xc6lx45

Members
  • Content Count

    740
Reputation Activity

  1. Like
    xc6lx45 got a reaction from SigProcbro in Zynq 7000 Baremetal with webpack.   
    you can single-step through the FSBL, so that's probably a "yes".
    Note that documentation may become an issue (regardless of webpack or paying license): The ARM core is Xilinx-customized, so the ARM documentation helps only to a point. For example, non-standard use of the cache controller is such a topic.
    If you do want to use Xilinx libraries (but no OS) then forget everything I've written. It's a straightforward design flow => click through the menus to generate a new e.g. "Hello World" SDK project in standalone mode. Use an unmodified FSBL that loads your application. It's the obvious way ahead if you want to use external DRAM in your project, which is probably the case.
  2. Like
    xc6lx45 got a reaction from mjacome in Vivado Design Suite for BASYS 3: Mass Installation Inquiry   
    I'm quite sure you can use one account (I have done so on several PCs myself with Webpack).
    Looking at that license file, it says
    HOSTID=ANY
    To me, this looks like (but someone correct me if I'm wrong) the free webpack license isn't even tied to one specific machine.
  3. Like
    xc6lx45 reacted to asmi in Public service announcement: PLL locking   
    You can force the config logic to wait for PLL/MMCM locks before GSR deassertion and design startup. RTFM: UG472, table 3-7, parameter STARTUP_WAIT. But you've got to be careful with this option, as the design will never start if one of the clocks is not present at startup - a typical case being HDMI input, or just about any non-MGT high-speed input for that matter. So it's fine to use it for system clock(s), but it's a definite "NO" for IO clocks.
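    For reference, here's roughly what that looks like on a 7-series MMCM instance in Verilog - a sketch only, with made-up clock settings and signal names (not from the original post):
    // Sketch: MMCME2_BASE with STARTUP_WAIT="TRUE", so end-of-startup waits
    // for LOCKED. Use this for system clocks only (see the caveat above).
    MMCME2_BASE #(
        .CLKIN1_PERIOD   (10.0),   // assumed 100 MHz reference
        .CLKFBOUT_MULT_F (10.0),   // VCO = 1000 MHz
        .CLKOUT0_DIVIDE_F(10.0),   // 100 MHz output
        .STARTUP_WAIT    ("TRUE")  // hold off startup until this MMCM locks
    ) mmcm_sys_i (
        .CLKIN1   (clk_100_in),
        .CLKFBIN  (clkfb),
        .CLKFBOUT (clkfb),
        .CLKOUT0  (clk_sys),
        .RST      (1'b0),
        .PWRDWN   (1'b0),
        .LOCKED   (mmcm_locked)
    );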
  4. Like
    xc6lx45 reacted to hamster in RISC-V RV32I CPU/controller   
    I just had a look at the J1b source, and saw something of interest (well, at least to weird old me):
    4'b1001: _st0 = st1 >> st0[3:0];
    ....
    4'b1101: _st0 = st1 << st0[3:0];
    A 32-bit shifter takes two and a half levels of 4-input, 2-select MUXes per input bit PER DIRECTION (left or right), and the final selection between the two takes another half a LUT, so about 160 LUTs in total (which agrees with the numbers above).
    However, if you optionally reverse the order of bits going in, and then also reverse them going out of the shifter, then the same shifter logic can do both left and right shifts.
    This needs only three and a half levels of LUT6s, and no output MUX is needed. That is somewhere between 96 and 128 LUTs, saving maybe up to 64 LUTs.
    It's a few more lines of quite ugly code, but it might save ~10% of the logic and may not hurt performance (unless the shifter becomes the critical path...).
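    To make the trick concrete, here's a rough Verilog sketch of the idea (my own code, not the J1b source): optionally bit-reverse the operand on the way in and the result on the way out, so a single right-shifter serves both directions.
    // Sketch: one shifter for both directions via conditional bit reversal.
    module shared_shifter (
        input  wire [31:0] din,
        input  wire [4:0]  amount,
        input  wire        shift_left,   // 1 = left shift, 0 = right shift
        output wire [31:0] dout
    );
        function [31:0] bitrev;
            input [31:0] x;
            integer i;
            begin
                for (i = 0; i < 32; i = i + 1)
                    bitrev[i] = x[31-i];
            end
        endfunction

        wire [31:0] pre     = shift_left ? bitrev(din) : din;
        wire [31:0] shifted = pre >> amount;   // the only shifter in the design
        assign dout = shift_left ? bitrev(shifted) : shifted;
    endmodule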
  5. Like
    xc6lx45 reacted to hamster in RISC-V RV32I CPU/controller   
    I've just posted my holiday project to Github - Rudi-RV32I - https://github.com/hamsternz/Rudi-RV32I
    It is a 32-bit CPU, memory and peripherals for a simple RISC-V microcontroller-sized system for use in an FPGA.
    It is a very compact implementation, using under 750 LUTs and as few as two block RAMs - less than 10% of an Artix-7 15T.
    All instructions can run in a single cycle, at around 50 MHz to 75 MHz. Actual performance currently depends on the complexity of the system bus.
    It has full support for the RISC-V RV32I instructions, and has supporting files that allow you to use the RISC-V GNU toolchain (i.e. standard GCC C compiler) to compile programs and run them on your FPGA board. 
    Here is an example of the sort of code I'm running on it - a simple echo test that counts characters on the GPIO port I have connected to the LEDs:
    // These match the addresses of the peripherals on the system bus.
    volatile char *serial_tx       = (char *)0xE0000000;
    volatile char *serial_tx_full  = (char *)0xE0000004;
    volatile char *serial_rx       = (char *)0xE0000008;
    volatile char *serial_rx_empty = (char *)0xE000000C;
    volatile int  *gpio_value      = (int *)0xE0000010;
    volatile int  *gpio_direction  = (int *)0xE0000014;

    int getchar(void) {
        // Wait until the RX-empty status is zero, then read the character
        while(*serial_rx_empty) {
        }
        return *serial_rx;
    }

    int putchar(int c) {
        // Wait until the TX-full status is zero, then output the character
        while(*serial_tx_full) {
        }
        *serial_tx = c;
        return c;
    }

    int puts(char *s) {
        int n = 0;
        while(*s) {
            putchar(*s);
            s++;
            n++;
        }
        return n;
    }

    int test_program(void) {
        puts("System restart\r\n");
        /* Run a serial port echo */
        *gpio_direction = 0xFFFF;
        while(1) {
            putchar(getchar());
            *gpio_value = *gpio_value + 1;
        }
        return 0;
    }

    As it doesn't have interrupts it isn't really a general-purpose CPU, but somebody might find it useful for command and control of a larger FPGA project (converting button presses or serial data into control signals). It is released under the MIT license, so you can do pretty much whatever you want with it.
    Oh, all resources are inferred, so it is easily ported to different vendor FPGAs (unlike vendor IP controllers)
  6. Like
    xc6lx45 got a reaction from JColvin in hard working FPGA...   
    Happy new year
     
  7. Like
    xc6lx45 got a reaction from Arjun in Why we need SOC (Procesor + FPGA), if we can do our all work with FPGA???   
    Hi,
    learning a new language well is a major investment => constant cost. Picking an inadequate language / technology / platform is a cost multiplier.
    Which one hurts more? For a small project the learning effort dominates so you tend to stick with the tools you've got. Try this in a large project and the words "uphill battle" or "deathmarch" will come to life...
    There's a human component to this question: say my local expert has decades of experience with FORTH coding on relay logic - you can bet what his recommendation will be, backed by some very quick prototyping within a day or two.
    And if you have ...
    >> someone good in verilog or vhdl,
    ... who is opposed to learning C, you have interesting days ahead...
    Ultimately, implementing non-critical, sequential functionality in FPGA fabric is a dead end for several reasons. Start with cost - a LUT is much, much more expensive than its functional equivalent in RAM on a processor. Build time is another. The "dead end" may well stretch all the way to success but don't lose sight of it. You will see it clearly when it's right in front of your nose.
    Now this is highly subjective, but my first guess (knowing nothing about the job, assuming it's not small and not geared towards either side by e.g. performance requirements) is that implementation on Zynq would take me 3..10x less effort than using HDL only. The gap may be even bigger when requirements change mid-project (again, this is highly subjective, but you have considerably more freedom in C to keep things "simple and stupid", use floats where it's not critical, have direct access to the debug UART, ...).
    On the other hand, Zynq is a very complex platform and someone needs to act as architect - it may well be that the "someone good in verilog" will get it right first time in a HDL-only design but need architectural iterations on Zynq because the first design round was mainly for learning. Take your pick.
    Most likely, Zynq is the best choice if you plan medium-/long term, and the (low-volume!) pricing seems quite attractive compared to Artix.
     
  8. Like
    xc6lx45 got a reaction from Arjun in Why we need SOC (Procesor + FPGA), if we can do our all work with FPGA???   
    ... some numbers. Yes, apples are not oranges - this is about orders of magnitude, not at all a scientific analysis, and maybe slightly biased.
    Take the Zynq 7010. It has 17600 LUTs. Let's count each as 64 bits => 1.1 MBit for the logic functions of my application (if you like, add 2.1 MBit BRAM => 3.2 MBit).
    Now the ARM processor: While it's probably only a small add-on in terms of silicon area / cost (compare with the equivalent Artix - it's even cheaper - weird world...) it includes
    256 kB on-chip memory
    512 kB on-chip L2 cache
    which is 6.1 MBit
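    Spelling the arithmetic out (same assumptions as above):
    17600 LUTs x 64 bit = 1,126,400 bit ≈ 1.1 MBit (plus 2.1 MBit BRAM ≈ 3.2 MBit)
    (256 kB + 512 kB) x 8 = 6144 kbit ≈ 6.1 MBit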
    So we've already got several times the amount of "on-chip floorspace" for the application logic, and it'll probably run faster than FPGA logic since it's ASIC technology, not reprogrammable logic - it typically clocks at 666 MHz (-1 speed grade), where a non-tuned / non-pipelined design on the PL side will probably end up between 100 and 200 MHz.
    Needless to say, offloading application logic to DRAM or flash is trivial, where an RTL-only implementation hits the end of the road - somewhat stretchable by buying a bigger chip, maybe partial reconfiguration, or biting the bullet and adding a soft-core CPU which will be so pathetically slow that the ARM will hop circles around it on one leg. Right, I forgot - the above-mentioned 7010 actually has two of them.
  9. Like
    xc6lx45 got a reaction from Arjun in Why we need SOC (Procesor + FPGA), if we can do our all work with FPGA???   
    This is really something to consider in the long term. X and A have a strong interest in making us use their respective processor offerings. Nothing ever is free, and we may pay the price later when e.g. some third-party vendor (think China) shows up with more competitive FPGA silicon but I'd need a year to migrate my CPU-centric design.
    For industrial project reality, accepting vendor lock-in may be the smaller evil but if you have the freedom to look ahead strategically (personal competence development is maybe the most obvious reason for doing so, maybe also government funding) there may be wiser options.
    This is at least what keeps me interested in soft-core CPUs, even though their absolute KPIs are abysmally bad.
  10. Like
    xc6lx45 got a reaction from JColvin in hard working FPGA...   
    1920x1080, 60 FPS, every pixel is recalculated for each new frame. Standard Julia set with 29 iterations limit.
    100 % DSP utilization on a CMOD A7 (35T): 9e9 multiplications per second at 25 bits, still running on USB power, if getting a little warm.
    Probably more to come later ... stay tuned 🙂
     
     
  11. Like
    xc6lx45 got a reaction from infpgaadv in Advanced topics   
    one recommendation, but check out whether it works for you:
    Keshab K. Parhi "VLSI Digital Signal Processing Systems: Design and Implementation"
    It's a very old (pre-FPGA) book, but it'll be as relevant in 20 years since the theory does not change.
    You can find a condensed version in the lecture slides. Pick what appears interesting:
    https://www.win.tue.nl/~wsinmak/Education/2IN35/Parhi/
    -------
    Reading through the Xilinx documentation might be a good idea. There are those people who read manuals and those who don't. Usually it's easy to tell the difference... I'd skim quickly over parts that don't seem relevant at the moment (which may be 99 %, definitely too much material to read cover-to-cover) and spend time with those parts that seem interesting or immediately relevant.
    ------------
    For a practical example regarding timing, speed optimization and the critical path, you can try this simple project: implement a pseudorandom sequence (e.g. 9 or 24 bits) and compare it against a same-size number that is an input to the block (not hardcoded, e.g. set by switches). This is a simple AD-converter and you can test with a LED that it works. Then try to run at as high a clock speed as you can manage, e.g. 300 or 400 MHz. Understand all the warnings, fix those that are relevant (some are not, but you should understand why), especially the ones related to inputs ("switches") and outputs ("LED").
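    If it helps, here is one possible shape for that experiment - a sketch with an arbitrarily chosen 9-bit maximal-length LFSR polynomial and my own signal names:
    // Sketch: 9-bit LFSR (x^9 + x^5 + 1) compared against a switch-set threshold.
    // The LED's average brightness tracks the value on the switches.
    module prbs_compare (
        input  wire       clk,
        input  wire [8:0] threshold,   // from switches (synchronize/register in a real design)
        output reg        led
    );
        reg [8:0] lfsr = 9'h1;         // must not start at all-zeros

        always @(posedge clk) begin
            lfsr <= {lfsr[7:0], lfsr[8] ^ lfsr[4]};
            led  <= (lfsr < threshold);
        end
    endmodule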
  12. Like
    xc6lx45 got a reaction from SmashedTransistors in pipeline granularity   
    >>thus i have a tendency to over-pipeline my design
    read the warnings: if a DSP48 ends up with pipeline registers it cannot utilize, the tools will complain. Similarly for BRAM - it needs to absorb some levels of registers to reach nominal performance. I'd check the timing report.
    At 100 MHz you are maybe at 25..30 % of the nominal DSP performance of an Artix, but I wouldn't aim much higher without good reason (200 MHz may still be realistic but the task gets much harder).
    A typical number I'd expect could be four cycles for a multiplication in a loop (e.g. an IIR).
    Try to predict resource usage - if FFs are abundant, I'd make that "4" an "8" to leave some margin for register rebalancing: an "optimal" design will become problematic in P&R when utilization goes up (but obviously, FF count is only a small fraction of BRAM bits, so I wouldn't overdo it).
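    As a hedged illustration of giving the tools registers to absorb (module and parameter names are my own, not from the thread): a signed multiply followed by a parameterizable number of pipeline stages, which retiming can usually pull into the DSP48's internal register stages.
    // Sketch: multiply with STAGES pipeline registers behind it.
    // Making STAGES an "8" instead of a "4" leaves margin for register rebalancing.
    module pipelined_mult #(
        parameter W = 18,
        parameter STAGES = 4
    ) (
        input  wire                  clk,
        input  wire signed [W-1:0]   a,
        input  wire signed [W-1:0]   b,
        output wire signed [2*W-1:0] p
    );
        reg signed [2*W-1:0] pipe [0:STAGES-1];
        integer i;

        always @(posedge clk) begin
            pipe[0] <= a * b;
            for (i = 1; i < STAGES; i = i + 1)
                pipe[i] <= pipe[i-1];
        end

        assign p = pipe[STAGES-1];
    endmodule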
  13. Like
    xc6lx45 got a reaction from [email protected] in I bricked my CMOD-A7   
    Thinking aloud: Is it even possible to "brick" an Artix from Flash? On Zynq it is if the FSBL breaks JTAG, and the solution to the problem without boot mode jumpers is to short one of the flash pins to GND via a paper-clip at power-up. But on Artix? Can't remember having seen such a thing. Through EFUSE, yes, but that's a different story.
    If you like, you can try this if it's a 35T (use ADC capture at 700 k, it stresses the JTAG port to capacity). For example, it might give an FTDI error. Or if it works, you know that JTAG is OK.
     
  14. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in Verilog Simulator   
    As a 2nd opinion, I would not recommend Verilator to learn the language. It does work on Windows (MSYS) but I'd ask for a good reason why you need Verilator in the first place instead of a conventional simulator.
    Have a look at iverilog / gtkwave: http://iverilog.icarus.com/
    It works fine from standard Windows (no need to create a virtual machine). You'd call it through the command line, though (hint: create a .bat file with the simulation commands to keep them together with the project; the above-mentioned MSYS environment is also pretty good for this, e.g. use a makefile or shell script).
  15. Like
    xc6lx45 got a reaction from RFtmi in Using USB 2.0 on Cora Z7 board   
    Reading between the lines (apologies if I'm wrong, this is based solely on four sentences you wrote so prove me wrong): I see someone ("need the full speed ...") who'll have a hard time in embedded / FPGA land. For example, what is supposed to sit on the other end of the cable? A driver, yes, but for what protocol and where does that come from?
    Have you considered Ethernet? It's relatively straightforward for passing generic data and you could use multiple ports for different signals to keep the software simple. UDP is less complex than TCP/IP and will drop excess data (which may be what I want, e.g. when single-stepping code on the other end with a debugger).
  16. Like
    xc6lx45 got a reaction from chaitusvk in Amplitude modulation with DDS generator   
    You could use the multiplication operator "*" in Verilog (similar in VHDL).
    For example, scale the "mark" sections (level 10) by 1024 and the "space" sections (level 3) by 307. This will increase the bit width from 12 to 22 bits, so discard the lowest 10 bits and you are back at 12 bits. Pay attention to "signed" signals at the input and output, otherwise the result will be garbled.
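    A possible Verilog sketch of that scaling (signal names and the 12-bit widths are my own assumptions):
    // Sketch: scale a signed 12-bit DDS sample by 1024 ("mark") or 307 ("space"),
    // then drop the lowest 10 bits to get back to 12 bits.
    module ask_scaler (
        input  wire               clk,
        input  wire signed [11:0] dds_in,
        input  wire               mark,     // 1 = "mark" symbol, 0 = "space"
        output reg  signed [11:0] dds_out
    );
        wire signed [11:0] gain = mark ? 12'sd1024 : 12'sd307;  // ~1.0 and ~0.3
        wire signed [23:0] prod = dds_in * gain;                // signed 12x12 multiply

        always @(posedge clk)
            dds_out <= prod >>> 10;   // arithmetic shift keeps the sign
    endmodule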
  17. Like
    xc6lx45 got a reaction from [email protected] in xadc_zynq   
    I'd recommend you spend a working week "researching" the electrical-engineering aspects.
    The ADC may look like just an afterthought to the DSP, but it will require significant engineering resources (plan for several / many man-months). Long is the list of bright-eyed students / researchers / engineers / managers who have learned the hard way that there is a bit more to the problem than finding two boards with the same connector...
    Hint: check how much latency you can tolerate and research "digitizer" cards for PC (or the PXI platform). If you don't need a closed-loop real-time system, don't design for a closed-loop real-time system.
  18. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in Enevlope Detection using FPGA board   
    yes, for an application with basic requirements, like receiver gain control, this will probably work just fine (it's equivalent to an analog envelope detector). Now, it needs a fairly high bandwidth margin between the modulation and the carrier, and that may make it problematic in more sophisticated DSP applications (say, "polar" signal processing where I try to reconstruct the signal from the envelope) where the tolerable noise level is orders of magnitude lower.
     
     
  19. Like
    xc6lx45 got a reaction from jpeyron in Enevlope Detection using FPGA board   
    Well, yes and no. The question I'd ask is: can you use a local oscillator somewhere in your signal path with a 90-degree offset replica? In many cases this is trivially easy ("trivially" because I can e.g. divide digitally from double frequency or, somewhat less trivially, use, say, a polyphase filter). In any case, it's probably easier on the LO than on the information signal, because the LO is a single discrete frequency at a time, whereas the Hilbert transform approach needs to deal with the information signal bandwidth.
    If so, downconvert with sine and cosine ("direct conversion") and the result will be just the same. After lowpass filtering: square, add, take the square root - there's your envelope. When throughput / cost matters (think "envelope tracking" on cellphones) it is not uncommon to design the RTL in square-of-envelope units to avoid the square root operation. Or, if accuracy is not that critical, consider a nonlinear bit-level approximation - see "Root of Less Evil", R. Lyons.
    Of course, the Hilbert transform is a viable alternative - just a FIR filter (if complex-valued).
    In case you can't tell the answer right away, I recommend you experiment in the design tools with what happens if you try to reach 0 Hz (hint: "time-bandwidth product, Mr. Heisenberg". Eventually it boils down to fractional bandwidth, and phase-shifting DC remains an unsolved problem...).
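    And for the "nonlinear bit-level approximation" mentioned above, a sketch of the classic max-plus-half-min magnitude estimate (my own signal names and widths; crude, but needs neither a multiplier nor a square root):
    // Sketch: |v| ~= max(|I|,|Q|) + min(|I|,|Q|)/2 - fine for e.g. gain control.
    module mag_approx (
        input  wire               clk,
        input  wire signed [15:0] i_in,
        input  wire signed [15:0] q_in,
        output reg         [16:0] mag
    );
        wire [15:0] abs_i = i_in[15] ? -i_in : i_in;
        wire [15:0] abs_q = q_in[15] ? -q_in : q_in;
        wire [15:0] vmax  = (abs_i > abs_q) ? abs_i : abs_q;
        wire [15:0] vmin  = (abs_i > abs_q) ? abs_q : abs_i;

        always @(posedge clk)
            mag <= vmax + (vmin >> 1);
    endmodule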
     
     
  20. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in Enevlope Detection using FPGA board   
    Well, yes and no. The question I'd ask is: can you use a local oscillator somewhere in your signal path with a 90-degree offset replica? In many cases this is trivially easy ("trivially" because I can e.g. divide digitally from double frequency or, somewhat less trivially, use, say, a polyphase filter). In any case, it's probably easier on the LO than on the information signal, because the LO is a single discrete frequency at a time, whereas the Hilbert transform approach needs to deal with the information signal bandwidth.
    If so, downconvert with sine and cosine ("direct conversion") and the result will be just the same. After lowpass filtering: square, add, take the square root - there's your envelope. When throughput / cost matters (think "envelope tracking" on cellphones) it is not uncommon to design the RTL in square-of-envelope units to avoid the square root operation. Or, if accuracy is not that critical, consider a nonlinear bit-level approximation - see "Root of Less Evil", R. Lyons.
    Of course, the Hilbert transform is a viable alternative - just a FIR filter (if complex-valued).
    In case you can't tell the answer right away, I recommend you experiment in the design tools with what happens if you try to reach 0 Hz (hint: "time-bandwidth product, Mr. Heisenberg". Eventually it boils down to fractional bandwidth, and phase-shifting DC remains an unsolved problem...).
     
     
  21. Like
    xc6lx45 reacted to jpeyron in How to program Cmod 7 board?   
    Hi @macellan,
    The USB JTAG Bridge on the Cmod A7 35T or 15T is usable for both configuring the FPGA through JTAG and UART communication as discussed in the reference manual here in section 2 FPGA Configuration and section 5 USB-UART Bridge. Here are the Cmod A7 projects on Digilent's GitHub. The Cmod-A7-35T-GPIO project uses the USB UART Bridge.  
    Also, the Cmod A7 can be demanding when it comes to cables. It's very common with FPGA boards that some cable works with device X but not device Y (there are many low-quality cables on the market and they fail gradually). I'd suggest making sure you have a quality USB A to Micro-B cable.
    best regards,
    Jon
  22. Like
    xc6lx45 got a reaction from PoojaN in Arty A7 takes a lot of time to load after power up   
    Hi,
    check your options for bitstream generation. There is a clock frequency setting that determines the time it takes to move the data from flash to FPGA.
  23. Like
    xc6lx45 got a reaction from Foisal Ahmed in Higher frequency in the Aritx-7 FPGA layout   
    Nothing to worry about if only one is up at a time. It would mean that the frequencies of adjacent oscillators affect each other if they are running at the same time ("injection pulling"), to the point that they agree on a common frequency ("locking").
    Consider the oscillator as an amplifier with a feedback loop. The feedback path plus phase shift lead to a fairly narrow frequency response around the oscillation frequency (or harmonically related frequencies). Weird things can happen with the gain - while it is unity in average steady-state operation, the circuit can get highly sensitive to external interference that is (near-)correlated with the oscillator's own signal.
    Wikipedia:
    Perhaps the first to document these effects was Christiaan Huygens, the inventor of the pendulum clock, who was surprised to note that two pendulum clocks which normally would keep slightly different time nonetheless became perfectly synchronized when hung from a common beam.
  24. Like
    xc6lx45 got a reaction from pgmaser in Xilinx Tools FPGA and ARM Coding ?   
    Hi,
    just a thought, reading and guessing what you're after (and I may be wrong). I see a risk that you're underestimating the difficulty / workload / learning curve substantially, by orders of magnitude. There is nothing that cannot be done, but there are so many things that need to be done, and they stack up. Many people have learned the hard way that there's a long distance between features on vendors' slideware and functional features in a design.
    Zynq would seem the logical choice. A soft core doesn't even get close to 2x 666 MHz, 256 kB of processor memory and 512 kB of cache, plus the FPGA.
    You might do the experiment, get some cheap recent board, set yourself the goal of a basic "pipe-cleaning" exercise to bring up the basic features using only the tools (not just build a ready-made example project, stealing line-by-line is OK). If you eventually start to feel this takes too much work for stuff that "should work", this is just the point I'm trying to make 🙂
    If you're getting into the technology, I would strongly suggest first getting a "simple" Artix board, e.g. a CMOD A7, for the FPGA end of the "pipe-cleaning". A Zynq contains almost exactly the same FPGA but it is less accessible because it is "owned" by the PS (= ARM) side. The small ones need only a no-cost license (Vivado "Webpack").
  25. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in IIR compiler   
    IIR filters are more challenging for several reasons (bitwidth / coefficient quantization, internal gain boosting / biquad Q, limit cycles, nonlinear group delay, non-feedforward data flow, ...)
    You will probably find that once you've gone all the way through a fixed point implementation, IIR filters are not as attractive as suggested by the MAC count.
    Of course, they do exist and it may work just fine (depends also on the values of the coefficients you're trying to implement).

    Your filtering problem from the other post had a fairly narrow passband and a huge stopband. This is a very expensive use of a FIR filter...

    If you use a more sophisticated (multirate) architecture, you'll be able to get the same or better filtering with maybe 1..5 % of the MAC count.

    One approach is:
    - design an inexpensive band stop that suppresses the alias band of the following decimation step
    - discard every 2nd sample (said alias band folds over your wanted signal, but we've made sure there is no significant energy there)
    - repeat the procedure as many times as possible
    - design a final filter that provides steep edges and equalizes the sum of all earlier stages
    The point is that the cost of later stages gets much lower because the sampling rate drops (you may actually find most of the MACs get used in the first stage, and the last one is basically for free thanks to the much lower sampling rate).
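    Just to make the "cheap stage, then drop every 2nd sample" idea concrete, a deliberately over-simplified Verilog sketch of one decimate-by-2 stage (my own code; the (1,2,1)/4 smoother has a null at half the sampling rate, i.e. in the band that folds onto DC after decimation - a real design would use a proper halfband or CIC stage):
    // Toy decimate-by-2 stage: (1,2,1)/4 lowpass, then keep every 2nd output.
    // Chain several; the final sharp/equalizing filter runs at the low rate.
    module decim2_stage #(
        parameter W = 16
    ) (
        input  wire                clk,
        input  wire                in_valid,
        input  wire signed [W-1:0] in_sample,
        output reg                 out_valid,
        output reg  signed [W-1:0] out_sample
    );
        reg signed [W-1:0] z1 = 0, z2 = 0;
        reg                phase = 1'b0;

        wire signed [W+1:0] acc = in_sample + z1 + z1 + z2;   // 1*x + 2*z1 + 1*z2

        always @(posedge clk) begin
            out_valid <= 1'b0;
            if (in_valid) begin
                z2    <= z1;
                z1    <= in_sample;
                phase <= ~phase;
                if (phase) begin               // discard every 2nd result
                    out_sample <= acc >>> 2;   // /4 keeps unity DC gain
                    out_valid  <= 1'b1;
                end
            end
        end
    endmodule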

    Now this isn't trivial, people like me get paid for this... fortunately there aren't (yet) wizards for everything.