Posts posted by hamster

  1. On this topic I've been making an audio DSP board using the CMOD A7, where additional noise is a real pain.

    My initial prototype board had some audio noise problems - I couldn't hear it, but I could measure it. I initially thought it was due to the CMOD-A7 and could not be fixed, but eventually put it down to quite a few different causes:

    - I had nearly shorted the output of one of the DACs to GND, which was causing spikes on the power rail. Once fixed, things were a lot better, but not perfect.

    - I had not made any real attempt to stitch the top fill to the ground plane on the bottom - after all, it was a hack.

    - I didn't have any series resistors in the I2S lines. I added 50 ohm ones (just picked a value out of the air - I might look at this again).

    - I had a few capacitor bodges standing up in the air, which could only make things worse

    - I was measuring very close to the FPGA, with a high impedance scope probe

    So I addressed all of these in the next prototype, and made up a test jig allowing me to measure 30cm from the board, and things are much better - to the point that I can't reliably measure any additional noise in the audio band.

    I guess what I am trying to say is that even with just one GND pin the CMOD-A7 can be part of a low noise audio system, but you have to put some extra thinking and work in to make it happen. 

    This may or may not be of use to your use-case.

  2. The last of the parts came in and the new board is up and running.

    Here's the old and new boards side by side, and the spectrum of a 10kHz test tone going from the ADC, through the FPGA, and then to the DAC (top = new board, middle = old board, bottom = no board in the loop).

    The additional work I did on grounding on the PCB has paid off, with a very good noise floor - better than I can measure with the tools I have to hand.




  3. 3 hours ago, xc6lx45 said:

    For comparison, I got the following LUT counts for James Bowman's J1B (16 bit instruction, 32 bit ALU) CPU which I know quite well:

    * 673 LUTs = 3.3% utilization of A7-35 with 32 stack levels in distributed RAM (replacing the original shift register based stack which does not look efficient on Xilinx 7 series)
    * 526 LUTs if reducing the +/- 32 bit barrel shifter to +/- 1 bit, but the performance penalty is severe (e.g. IMM values need to be constructed from shifts).
    * 453 LUTs if further allowing one BRAM18 for each of the two stacks. This includes a UART and runs at slightly more than 100 MHz but memory/IO need two instructions / two cycles.

    So the RISC "overhead" does not seem that dramatic. It's slightly bigger and somewhat slower, but has baseline opcodes (e.g. arithmetic shift and subtract, if I read it correctly) that J1B needs to emulate in SW.

    It would be interesting to know where the memory footprint goes when I use (soft) floats. I've done the experiment in the recent past with MicroBlaze MCS, and did not like what I saw. On J1B I need about 320 bytes for (non IEEE 754) float + - * /. It is painfully slow without any hardware support, but it keeps the boat afloat, so to speak.

    Using C instead of bare metal assembly would be tempting.... I just wonder how much effort it takes to install the toolchain.


    I just had a look at the J1b source, and saw something of interest (well, at least to weird old me):

            4'b1001: _st0 = st1 >> st0[3:0];
            4'b1101: _st0 = st1 << st0[3:0];

    A 32-bit shifter takes two and a half levels of 4-input (2-select) MUXes per input bit PER DIRECTION (left or right), and the final selection between the two directions takes another half a LUT, so about 160 LUTs in total (which agrees with the numbers above).

    However, if you optionally reverse the order of the bits going into the shifter, and then reverse them again on the way out, the same shifter logic can do both left and right shifts.

    This needs only three and a half levels of LUT6s, and no output MUX is needed. That is somewhere between 96 and 128 LUTs, saving maybe up to 64 LUTs.

    It's a few more lines of quite ugly code, but might save ~10% of logic and may not hit performance (unless the shifter becomes the critical path...).
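    The bit-reversal trick above can be modeled in C - this is a software sketch of the idea, not the Verilog itself, and `bit_reverse` / `shift_both` are illustrative names:

    ```c
    #include <stdint.h>

    /* Reverse the bit order of a 32-bit word (swap-in-halves method). */
    static uint32_t bit_reverse(uint32_t x) {
        x = ((x & 0x55555555u) << 1)  | ((x >> 1)  & 0x55555555u);
        x = ((x & 0x33333333u) << 2)  | ((x >> 2)  & 0x33333333u);
        x = ((x & 0x0F0F0F0Fu) << 4)  | ((x >> 4)  & 0x0F0F0F0Fu);
        x = ((x & 0x00FF00FFu) << 8)  | ((x >> 8)  & 0x00FF00FFu);
        x = (x << 16) | (x >> 16);
        return x;
    }

    /* One shared right-shifter serves both directions: for a left shift,
       reverse the bits on the way in and again on the way out. */
    static uint32_t shift_both(uint32_t v, unsigned amount, int left) {
        if (left)
            return bit_reverse(bit_reverse(v) >> amount);
        return v >> amount;
    }
    ```

    In hardware the two `bit_reverse` steps are free-of-logic rewiring, which is why only the single shifter's LUTs remain.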

  4. The toolchain is pretty simple to build but takes a while - for me it was just a case of cloning https://github.com/riscv/riscv-gnu-toolchain, creating /opt/riscv (and changing its ownership), then running './configure' with the correct options, then 'make'.  There are a whole lot of different instruction set options and ABIs, so I definitely recommend building from source rather than downloading prebuilt images.

    At the moment I haven't included any of the stdlib or soft floating point. I'll add that to the "todo someday" list.

  5. I've just posted my holiday project to Github - Rudi-RV32I - https://github.com/hamsternz/Rudi-RV32I

    It is a 32-bit CPU, memory and peripherals for a simple RISC-V microcontroller-sized system for use in an FPGA.

    It is a very compact implementation, using under 750 LUTs and as little as two block RAMs - under 10% of an Artix-7 15T.

    All instructions run in a single cycle, at around 50MHz to 75MHz. Actual performance depends on the complexity of the system bus.

    It has full support for the RISC-V RV32I instructions, and has supporting files that allow you to use the RISC-V GNU toolchain (i.e. standard GCC C compiler) to compile programs and run them on your FPGA board. 

    Here is an example of the sort of code I'm running on it - a simple echo test that counts characters on the GPIO port that I have connected to the LEDs:

    // These match the address of the peripherals on the system bus.
    volatile char *serial_tx        = (char *)0xE0000000;
    volatile char *serial_tx_full   = (char *)0xE0000004;
    volatile char *serial_rx        = (char *)0xE0000008;
    volatile char *serial_rx_empty  = (char *)0xE000000C;
    volatile int  *gpio_value       = (int  *)0xE0000010;
    volatile int  *gpio_direction   = (int  *)0xE0000014;
    int getchar(void) {
      // Wait until there is a character to read
      while(*serial_rx_empty) {
      }
      // Read and return the character
      return *serial_rx;
    }

    int putchar(int c) {
      // Wait until there is space in the transmit buffer
      while(*serial_tx_full) {
      }
      // Output the character
      *serial_tx = c;
      return c;
    }

    int puts(char *s) {
      int n = 0;
      while(*s) {
        putchar(*s++);
        n++;
      }
      return n;
    }

    int test_program(void) {
      puts("System restart\r\n");
      /* Run a serial port echo, counting characters on the LEDs */
      *gpio_direction = 0xFFFF;
      while(1) {
        putchar(getchar());
        *gpio_value = *gpio_value + 1;
      }
      return 0;
    }

    As it doesn't have interrupts it isn't really a general purpose CPU, but somebody might find it useful for command and control of a larger FPGA project (converting button presses or serial data into control signals). It is released under the MIT license, so you can do pretty much whatever you want with it.

    Oh, and all resources are inferred, so it is easily ported to different vendor FPGAs (unlike designs built around vendor IP blocks).

  6. WAV files are the simplest to work with.

    1. WAV files have a small header, and then the rest is raw sample data, usually stereo pairs of 16-bit signed numbers. Just write a small program in your favorite scripting language to print out the data after about the first 64 bytes.

    2. For phone-quality audio, you need a bandwidth of 300Hz to 3kHz - this needs around 8000 samples per second, at about 8-bit sample depth. You could use u-law or a-law compression to increase the dynamic range (https://en.wikipedia.org/wiki/Μ-law_algorithm).

    3. That works out to 8 kilobytes per second, if you play raw 8-bit samples.
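    The u-law companding mentioned in point 2 follows a logarithmic curve. Here is a minimal C sketch of the continuous form of that curve - note that real G.711 u-law uses a piecewise-linear 8-bit approximation of this, and `mulaw_compress` is just an illustrative name:

    ```c
    #include <math.h>

    /* Continuous mu-law companding curve (mu = 255): maps a sample in
     * [-1, 1] to [-1, 1], giving more resolution to quiet signals
     * near zero at the expense of loud ones. */
    #define MU 255.0

    double mulaw_compress(double x) {
        double sign = (x < 0.0) ? -1.0 : 1.0;
        return sign * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
    }
    ```

    A quiet sample like 0.01 comes out at roughly 0.23, which is how the 8-bit representation keeps acceptable dynamic range for speech.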

    Oh, and to convert data from a WAV file to lower sample rates (e.g. from 48kS/s to 8kS/s) you can't just drop five out of six samples - you need to first filter off the frequencies greater than half the target sample rate. It's not that challenging to do in code (usually just a couple of 'for' loops around something like "out[x] += in[x+j] * filter[j]"), but generating the magic values for the filter can be interesting.
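    The "couple of 'for' loops" can be sketched in C like this. The coefficient values below are placeholders that merely sum to one, not a properly designed low-pass filter - a real design would compute them (e.g. a windowed sinc) with a cutoff below 4kHz:

    ```c
    #include <stddef.h>

    /* Decimate by 6 (e.g. 48 kS/s down to 8 kS/s), applying a short FIR
     * low-pass filter before dropping samples. Placeholder coefficients. */
    #define DECIMATE   6
    #define FILTER_LEN 5

    static const float filter[FILTER_LEN] = {0.1f, 0.2f, 0.4f, 0.2f, 0.1f};

    void decimate(const float *in, size_t in_len, float *out, size_t out_len) {
        for (size_t x = 0; x < out_len; x++) {
            out[x] = 0.0f;
            for (size_t j = 0; j < FILTER_LEN; j++) {
                size_t idx = x * DECIMATE + j;   // one output per 6 inputs
                if (idx < in_len)
                    out[x] += in[idx] * filter[j];
            }
        }
    }
    ```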


  7. The "DC and Switching Characteristics" datasheet tells you the delays in the primitives, but can't tell you the routing delays. The only way to truly know is to build the design in Vivado, and then look at the timing report.

    Inference of DSP blocks and features is pretty good as long as your design is structured to map onto the DSP slices. There are little gotchas like not attempting to reset registers in the DSP slice that don't support it.

    Skim reading the DSP48 User Guide will pay off many times over in time saved from not having to redesign stuff over and over to help it map to the hardware. 

  8. My view: if you want to learn low-level stuff (e.g. VHDL/Verilog coding), buy a board with lots of buttons, LEDs, switches and different I/O over a more application-specific development board. I think that the Basys3 is pretty good for this, and better than the Arty. Once you have sharpened your skills, then look for a board that will support your projects.

    If you want to initially work at a systems level, using IP blocks and so on, then look for a board that has interfaces that support your area of interest. Debugging H/W when you are also debugging FPGA designs is no fun. A Zynq-based board (e.g. Zybo) would be good, as it already has a CPU that is much better (faster, less power, better features) than a CPU you could implement in the FPGA fabric. Just be warned that with a Zynq system the SDRAM memory is usually on the far side of the processor system, so you don't get direct access to it - you need to access it over an AXI interface and compete with the CPU for bandwidth.

  9. 2 hours ago, skylape said:

    Would something like this work: https://www.xilinx.com/support/documentation/ip_documentation/div_gen/v5_1/pg151-div-gen.pdf ? It is an IP wizard from Xilinx.

    It may well do, but not knowing *all* the details of what you are doing means I can't offer you useful advice. 

  10. 54 minutes ago, Andrew Touma said:

    Thank you very much for the assistance! I was able to make the necessary corrections and my project is running smoothly now. I also found some other errors in the logic of my code, which I have corrected as well. 

    Yay! Glad to have helped. 

  11. If you are dividing by a constant you can multiply by the inverse. If you only have a small number of different divisors, you could consider a lookup table of inverses.

    Otherwise you need to implement a binary division algorithm yourself, to meet your throughput and latency needs.

    Division by arbitrary numbers is quite expensive - best avoided if at all possible.
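    As a concrete sketch of the multiply-by-the-inverse trick, here is the well-known fixed-point reciprocal for unsigned division by 10 (the same transformation compilers apply; `div10` is just an illustrative name). The magic constant is ceil(2^35 / 10) = 0xCCCCCCCD, and the pair (multiply, shift right by 35) is exact for all 32-bit unsigned inputs:

    ```c
    #include <stdint.h>

    /* x / 10 without a divider: multiply by the fixed-point reciprocal
     * 0xCCCCCCCD (= ceil(2^35 / 10)), then shift right by 35. */
    static uint32_t div10(uint32_t x) {
        return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
    }
    ```

    In an FPGA this maps to a single DSP-friendly multiply plus wiring for the shift, which is far cheaper than an iterative divider.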

  12. Oh, I found the source of some of the noise, and it wasn't the FPGA (phew). I had inadvertently bodged the output RF filtering capacitor onto the wrong side of the output resistor, causing it to pull lots of current when switching. A bit more bodging to the correct side and it is now a lot better.

    Now if I short out the inductors on the DAC's power supply I see the FPGA noise, and when I remove the short the noise goes away.


  13. So a long time between updates.

    The board has arrived, and has been built and tested - I left off some RF filtering caps on the output. Initial 16-bit audio pass-through went well.

    The CMOD-A7 is quite noisy - there are quite a lot of high frequency spikes and ringing from the FPGA, even when I split the power supply so the CMOD is powered by a linear bench supply. It turns out a lot of the noise was through the air, not via the PCB or power trace, and due to the high impedance scope probe. I could probe straight on the ground clip and still see the noise.

    However, on listening tests it is fine - even at high volumes there is minimal hiss. Moving to 24-bit I2S mode at 48kHz also works well.

    Last night I wrote a simple SDRAM memory controller for the CMOD-A7, and it is now storing samples into RAM and playing them back. I might play with some filters over the next few days.

    I am happy enough with it that I am doing a second version of the board, with a nicer layout, better grounding, and electrolytic caps (because that is what audio nerds are supposed to use!)

    It is quite a fun project to tinker with. All I need is more time alone at home to play the HiFi really loud!



  14. 2 hours ago, greengun said:

    @hamster why would neither work? The XC7A75T has 8 6.6 Gb/s transceivers. Shouldn't that be enough for 4k60 in and out? (4 transceivers each, one for each TMDS pair).


    There is also this: https://www.xilinx.com/products/intellectual-property/hdmi.html shouldn’t that work on the Arty A7?

    The transceivers are on dedicated pins, and are seldom connected to the HDMI sockets unless the board was expressly designed for video.

    As mentioned earlier, this isn't a restriction of the FPGA - the FPGA can handle this data rate. It is a restriction imposed by choices made by the development board's designers.

  15. 5 hours ago, zygot said:

    Have you actually driven or received 40 Gbps data on the Genesys2? That would be interesting.

    I've done a total of 10.3125 Gb/s, or maybe 10.8 Gb/s over the four lanes... I can't remember if it was 2.7 Gb/s or 2.65 Gb/s per lane. It was a while ago...

    That is just enough for 4k60.

  16. None of these options will deliver 2160p.

    To do 2160p (and even 1080p) you need hardware specifically engineered for that purpose, not a generic multipurpose development board. 

    The only Digilent board that I am pretty sure could support a 2160p 60Hz display is the Genesys2, using the DisplayPort interface. However, even that might be restricted to 4:2:2 YCC formats (not 4:4:4 RGB).

  17. 57 minutes ago, zygot said:

    Hopefully, when Digilent spins the next CMOD they will provide a better arrangement for using the configuration circuitry with an external power supply. Even as a standalone module configured from flash, one Vcc pin and one GND pin is less than ideal, and likely problematic if the FPGA is driving or receiving a number of single-ended signals.

    Anything that uses 0.1" headers is more of a "use on a breadboard for learning/experiments" module than a serious module for integration...

  18. What do you mean by 'handle'?

    The FPGA fabric could process the video stream, but unless the FPGA transceivers are connected to the HDMI sockets it won't be able to. On the Z7 the HDMI connectors are on the standard I/O pins. 

    Also, HDMI 2.0 is required for that video rate, and the spec isn't openly published. 



  19. Wahoo! Success!

    Thanks for your help @zygot, @D@n & @xc6lx45, and for your faith that the CMOD wasn't completely bricked.

    Last night, on my way to bed, I walked past my bench and had one last try.

    Plugged the CMOD A7 into a powered USB, and then booted up my Linux PC. Once booted and signed in I cleared the kernel message buffer with a "sudo dmesg -c".

    I then plugged the hub into the PC, and both FTDI channels came up and stayed up.

    As a test, I then removed and replugged the CMOD into the hub, and had the usual fault happen - both channels came up, but the JTAG interface disappeared.

    I repeated the process from cold, and got the same results.


    With at least my invalid image in the flash, the FTDI JTAG interface or driver does not initialize correctly if plugged into a PC directly.

    By using a powered hub, and giving the CMOD enough time BEFORE you connect the hub to your PC, you can restore JTAG functionality.

    ... phew, I am glad I wasn't in a hurry to break out the soldering iron and hot air..

    @JColvin Are any of the back-room team aware of this sort of thing happening and how to work around it? Maybe they could investigate what is going on, publish a process on how to recover and it might save a few warranty claims / product failures. I am sure that there will be people other than me who have experienced this issue.