ADC to FFT using Lattice FPGA

Michael Astahov · March 8, 2020

Hello, this is my first post in this forum.

I'm working on a project in which I need to sample data from an ADC (ADS5463), run an FFT on the sampled data, and view the results.
The sampling clock is 400 MHz, and my FPGA runs from the DRY clock coming from the ADC, which is 200 MHz (fs/2).
I'm sampling the data through a DDR interface using a Lattice IP (GDDRX1_RX.SCLK.Aligned interface), which captures the 12-bit DDR data onto a 24-bit bus (bits 11:0 hold the positive-edge data and bits 23:12 hold the negative-edge data).
Next, I store this data in two FIFOs, one for the positive-edge data and another for the negative-edge data.
My next step, which I'm currently working on, is to feed this data into the FFT IP module that Lattice provides. (https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=2ahUKEwiBl_HfzovoAhVKY5oKHfNPBt0QFjABegQIAhAB&url=http%3A%2F%2Fwww.latticesemi.com%2Fview_document%3Fdocument_id%3D28236&usg=AOvVaw3HSzLdNneCLsy5wEoUnUOx)
I attached timing diagrams (timings.pdf).
The FFT IP I'm generating has a 12-bit input/output width, so I need to time the input flags so that it takes the first sample from the positive-edge FIFO, the next sample from the negative-edge FIFO, and so on as a stream. Of course, I'm paying attention to all the flags the IP requires.
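
For reference, the bus layout and read order described above can be modeled in a few lines of Python. This is only a sketch of the data ordering, not of the Lattice IP: the function names are made up for illustration, and it assumes the positive-edge sample (bits 11:0) is the earlier of the two in time.

```python
def unpack_ddr_word(word24):
    """Split one 24-bit capture word into (positive_edge, negative_edge) samples."""
    positive = word24 & 0xFFF           # bits 11:0
    negative = (word24 >> 12) & 0xFFF   # bits 23:12
    return positive, negative


def deinterleave(words):
    """Flatten 24-bit words into one 12-bit stream, positive-edge sample first.

    Assumption: the positive-edge sample is the earlier one in time.
    """
    samples = []
    for word in words:
        positive, negative = unpack_ddr_word(word)
        samples.append(positive)
        samples.append(negative)
    return samples


words = [0x002001, 0x004003]
print(deinterleave(words))  # [1, 2, 3, 4]
```

Note that the 12-bit stream has twice as many entries as the 24-bit stream, which is why a single 12-bit output would have to run at twice the word rate.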

I'd like to ask a few guideline questions about how to do this correctly.
1. Do I need a state machine that detects when the FIFO is full and only then reads the data into the FFT input? Or can I start writing to the FFT without a state machine, using just a counter register that indicates when read enable is asserted?

2. Do I need to fill the FIFO and then read it until it's empty, or can I write to the FIFO and read from it into the FFT continuously?

3. Are there any guidelines for doing this task correctly? I've never done this before. From my perspective, I would just wait for the ready flag from the FFT IP and read_enable from the FIFO and then start feeding data to the FFT IP, but I was told that more timing management is needed.

thanks.

timings.pdf

D@n · March 9, 2020

@Michael Astahov,

Welcome to Digilent's forums! These forums are primarily centered around Xilinx boards, so ... asking about lattice boards has you a ways from home here.

May I ask why you are separating your stream into two FIFOs? That doesn't quite sound right.

You should be able to write to the FFT any time you have data available. If your sample clock is 400MHz, I would imagine that would be on every clock cycle. It will more often/likely be that you have to drop data. Have you thought about how you might do that?

Also, you didn't mention if your FIFOs were synchronous or asynchronous. That's kind of important. Is the whole system running on a 400MHz clock? That seems kind of high for a Lattice chip.

Also, if you've never done this before, then your first priority should be to set up a simulation where you can read inputs from the outside world, give them to the FFT, and then check the results. Without such a simulation, you will find it very difficult to pinpoint problems later. As an example, here's a design of a scrolling raster for a Nexys Video board which includes a simulation of the incoming data, the FFT, and the outgoing video. It can be kind of painful to run, but I doubt I would've gotten everything working on the actual hardware without it.
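
A simulation like that needs stimulus. As a minimal sketch, the Python below generates a quantized sine and packs it into 24-bit words that a testbench could read from a file. The function names are hypothetical, and two things are assumptions to check against the ADC datasheet and the DDR IP: that the ADC codes are unsigned offset binary, and that the earlier sample of each pair sits in bits 11:0.

```python
import math


def adc_codes(n, tone_hz, fs_hz, bits=12):
    """Quantize a full-scale sine to unsigned codes (offset binary is assumed)."""
    mid = 1 << (bits - 1)
    top = (1 << bits) - 1
    codes = []
    for k in range(n):
        x = math.sin(2 * math.pi * tone_hz * k / fs_hz)
        code = int(round(mid + (mid - 1) * x))
        codes.append(max(0, min(top, code)))  # clamp to the valid code range
    return codes


def pack_ddr(codes):
    """Pack consecutive sample pairs into 24-bit words, first sample in bits 11:0."""
    if len(codes) % 2:
        raise ValueError("need an even number of samples")
    return [codes[i] | (codes[i + 1] << 12) for i in range(0, len(codes), 2)]


codes = adc_codes(8, 50e6, 400e6)   # 50 MHz tone sampled at 400 MHz
for word in pack_ddr(codes):
    print(f"{word:06X}")            # one hex word per line for a testbench to read
```

Running the same samples through a software FFT gives the reference result to compare the simulated FFT output against.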

Good luck!

Dan

Michael Astahov · March 9, 2020

The sampling clock is 400 MHz (the clock going to the ADC), but the clock coming into the FPGA is the DRY signal from the ADC (200 MHz).
The ADC works in DDR mode, so I'm saving the negative-edge and positive-edge data in different FIFOs. The FIFOs are synchronous, using one clock (200 MHz).
I write to the FIFOs until they fill up (full signal asserted), and the FFT reads the data from them (1st clock cycle -> positive-edge FIFO, 2nd clock cycle -> negative-edge FIFO, and so on).

I think I have to use two FIFOs because the DDR interface latches the 12-bit ADC data onto a 24-bit bus, so I need to separate the data before I use it.

xc6lx45 · March 9, 2020

Hi,

>> Any guideline how to make this task correctly? I never did this before..

are you able to do the FFT offline on a PC? If so, write the data to memory, build a simple state machine that prints hex-formatted numbers to a UART port, and transfer them to a PC for analysis. It'll save you 90% of the work. Also, most real-world algorithms are more complex than just an FFT, and you don't want to experiment with algorithms in fixed-point RTL.
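
On the PC side, the analysis can start as small as the sketch below. It assumes the UART log holds one hex-formatted 12-bit sample per line (that format is an assumption, as are the function names) and uses a naive DFT from the standard library only; for real captures a proper FFT library would be the obvious replacement.

```python
import cmath
import math


def parse_hex_dump(text, bits=12):
    """Parse whitespace-separated hex words into unsigned codes; blank lines are skipped."""
    mask = (1 << bits) - 1
    return [int(token, 16) & mask for token in text.split()]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT of the mean-removed samples; returns |X[k]| for k = 0..N/2."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]       # remove the DC offset of the ADC codes
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


# Eight samples of a tone at fs/4, as they might appear in a UART log.
dump = "800\nFFF\n800\n001\n800\nFFF\n800\n001\n"
mags = dft_magnitudes(parse_hex_dump(dump))
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)  # 2
```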

And do you really need a full-blown FFT, or can you correlate out one frequency at a time with a sine/cosine NCO, a complex-valued multiplier and a complex-valued accumulator? A real-world implementation would use the Goertzel algorithm (https://en.wikipedia.org/wiki/Goertzel_algorithm).
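
To make the single-frequency idea concrete, here is a floating-point sketch of the Goertzel recurrence for one DFT bin. It is a software illustration only (the function name is made up), not fixed-point RTL.

```python
import math


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of `samples`, via the Goertzel recurrence."""
    n = len(samples)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2   # one multiply and two adds per sample
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


# Unit-amplitude tone placed exactly on bin 3 of a 32-sample block.
tone = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
print(round(goertzel_power(tone, 3)))  # 256, i.e. (N/2)**2
print(round(goertzel_power(tone, 5)))  # 0
```

Per sample, the loop needs one real multiplication by a constant, which is what makes it attractive compared with a full FFT when only a few frequencies matter.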

D@n · March 9, 2020

@Michael Astahov,

This sounds like a recipe for failure. Try going back to the Lattice manual and figuring out how to push both data values to the same FIFO on the same clock cycle. It's in there. If you don't, you'll be struggling forever to figure out how to keep the two FIFOs synchronized with each other, and you'll never be certain that you have the right answer. Further, using FIFOs to cross clock domains, such as from negative to positive edges, while possible, is a bit of a trick to get right.

Dan

Michael Astahov · March 10, 2020

So you're saying there is a way to fill two cells of the FIFO in one clock cycle? For example, the ADC data arrives on a 24-bit bus, and on the rising clock edge I latch the data into the FIFO so that the 1st cell of the FIFO is adc_data[11:0] and the 2nd cell is adc_data[23:12]?

I didn't find such a technique in the Lattice documentation, but if you say there is a way, I'll certainly explore further.

xc6lx45 · March 10, 2020

I don't know which device family you are using, but for example for MachXO3 locate this document

http://www.latticesemi.com/-/media/LatticeSemi/Documents/ApplicationNotes/IK/ImplementingHighSpeedInterfaceswithMachXO3-Devices.ashx?document_id=50122

The high-speed generic DDR (GDDR) interfaces are supported through the built-in gearing logic in the Programmable I/O (PIO) cells. This gearing is necessary to support high-speed I/O while reducing the performance requirement on the FPGA fabric.

The idea is to instantiate a PIO cell that captures data on both clock edges for you. Do not think in terms of rising and falling clock edges on an FPGA for general logic: Verilog and VHDL simply offer more flexibility than FPGA technology does. There is also a lot of ASIC- or simulation-driven (or just clueless) material on the web that causes confusion.

Unless you're into specialist applications, e.g. low power, stick with one clock edge for the whole design. Reality check: it might seem a clever idea to use both edges, especially given that Lattice devices often have invertible clock inputs. But the hardware is never perfectly symmetrical, so, through duty-cycle uncertainty, it will always sacrifice some timing margin compared with the equivalent single-edge design running from a double-frequency clock. It also gains nothing, because standard logic cells (exception: the PIO above) with an edge-sensitive input will only trigger on one edge at a time; this goes down to the transistor-level circuitry.

Then, use a 24-bit FIFO. You will not be able to deinterleave the data in real time: that would mean running at 400 MHz. Do not underestimate the difficulty of reaching even 200 MHz. I suggest you do some early synthesis trials to avoid hitting the wall with a design that works in simulation but can never close timing on hardware. I don't know all the Lattice families by heart, but "Lattice", 200 MHz and what I read between the lines in your first post ring an alarm bell for me. For example, it comes as a surprise how many logic levels you really end up with when you've worked with 6-/7-input LUTs before and suddenly are down to four.

D@n · March 10, 2020

@Michael Astahov,

No, I'm saying that you can get an I/O driver which will produce both results on the same clock edge, and a FIFO that writes data twice as wide for which you can read out the data every clock period.

The problem is that logic transitioned on the positive edge of a clock is in a separate clock domain from logic transitioned on the negative edge of the same clock, and crossing from one clock domain to another tends to be a pain that needs to be carefully managed. It's not something I would recommend for a beginner. Instead, for a beginner, I'd recommend you build your design entirely in the same clock domain (if possible). Worse, clock domain crossings (CDCs) are known for their potential for synthesis-simulation mismatch and bugs that are very difficult to track down. My recommendation is simply that you do all your work on the same clock edge.

There's a common beginner misperception that you can work on both edges of the clock to get twice the work done. This isn't the case. Most tools can't handle the timing properly between both edges of the same clock to even know if your design will work. It's usually better to run at twice the clock speed. In your case, thankfully, you can just run your algorithm twice as wide and you should be just as well off.

Dan

Michael Astahov · March 10, 2020

I appreciate the help.

I'm using a Lattice ECP3. I'll try to explain myself again; my English is not perfect.

I'm not working on both clock edges, but my ADC provides data on both clock edges. I attached a figure below from my ADC datasheet (ADS5463).
I'm already using the I/O high-speed interface (GDDRX1_RX.SCLK.Aligned), which latches the 12-bit data from the ADC and outputs a 24-bit bus every clock cycle. I attached a figure of this interface architecture too.
The clock I'm using is external and not important for now. Let's assume my sampling frequency is 100 MHz, so the FPGA gets a 50 MHz clock.

I have already run tests on the synthesized design (using the Reveal Analyzer that Lattice provides in their Diamond software). Everything has worked fine so far: I do get all the data from the ADC into the FIFO and out of the FIFO output.

The point of my question is how I should use this 24-bit bus to keep working with the ADC data in the FPGA (for example, to provide the data to the FFT). The 24-bit bus as it is doesn't help me; I need 12-bit data, because only a 12-bit value is a meaningful point on the sine wave.
So, for example, if I create a 24-bit-wide FIFO, what can I do next with this 24-bit data? I need only the 12-bit data. How can I use all the data without losing half of it?

D@n · March 10, 2020

@Michael Astahov,

At this point, you really need to find a Lattice/ECP5 forum to get advice. The ECP5 has hardware specific FIFOs attached to their I/Os, and you aren't likely to find ECP5 engineers here (you might) who would be familiar with how those FIFOs operate.

You should be aware that there is also an open source tool-chain for the ECP5. You might also try some of the open source forums as well.

Dan

xc6lx45 · March 11, 2020

>> what can I do next with this 24bit data? I need only the 12bit data, How can I use all the data without losing half of the data.

you could write a state machine that alternates between low and high 12 bit word. But that would raise the 12-bit output rate to two times the 24-bit input rate, which is probably not feasible.

A typical design would continue processing two words in parallel, e.g. a DSP algorithm designed to consume two input words per clock cycle. With typical ADC applications, you'd use even more parallel words, e.g. 5: say, a FIR filter that loads 5 new values into the delay chain and shifts 5 steps ahead per cycle. If it doesn't decimate, this is followed by five independent coefficient sets operating on the shared delay line, giving 5 output samples ("phases") of the signal. This is only a simple example; it really depends on what you intend to do with the data.
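
The parallel-processing idea can be sketched in software with two phases instead of five. The Python model below is only an illustration with made-up function names: each loop iteration stands for one clock cycle, loads two new samples into a shared delay line, shifts two steps ahead, and evaluates the taps on two offset windows to give two outputs. The serial version is there as a reference to check against.

```python
def fir_serial(samples, taps):
    """Reference filter: one sample per step, y[n] = sum(taps[i] * x[n - i])."""
    hist = [0] * len(taps)
    out = []
    for x in samples:
        hist = [x] + hist[:-1]
        out.append(sum(h * c for h, c in zip(hist, taps)))
    return out


def fir_two_per_clock(words, taps):
    """Same filter, but each step consumes two samples and produces two outputs."""
    hist = [0] * (len(taps) + 1)      # shared delay line, one slot longer than the taps
    out = []
    for x0, x1 in words:              # x0 is the earlier sample, x1 the later one
        hist = [x1, x0] + hist[:-2]   # shift two steps ahead per "clock"
        y0 = sum(h * c for h, c in zip(hist[1:], taps))  # output aligned with x0
        y1 = sum(h * c for h, c in zip(hist, taps))      # output aligned with x1
        out.append((y0, y1))
    return out


taps = [1, 2, 3]
samples = [1, 2, 3, 4, 5, 6]
pairs = list(zip(samples[0::2], samples[1::2]))
parallel = [y for pair in fir_two_per_clock(pairs, taps) for y in pair]
print(parallel == fir_serial(samples, taps))  # True
```

In hardware, the two sums would be two sets of multipliers running side by side, so the clock rate stays at the word rate while the sample rate is doubled.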

Michael Astahov · March 11, 2020

I'm using the Lattice FFT Compiler, and I didn't see a way in the datasheet to insert 2 samples per clock cycle.

Never mind, I have results using 2 FIFOs and a mux that switches between the FIFOs' data going to the FFT. I thought maybe there was a better way.
Thanks for the help.

FPGA-IPUG-02045-2-1-FFT-Compiler-IP-Core.pdf

xc6lx45 · March 11, 2020

are you sure that your whole architecture makes sense? FFT size / memory is limited, so at such a high rate you can transform only a very short burst. For example, to isolate power line hum (which is very likely something you'll find in the data) you need at least 20 ms...
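
A quick back-of-the-envelope check of that point, using the 400 MHz sampling rate from the first post. The 1024-point length is only an illustrative value, not a figure taken from the FFT IP documentation, and the helper names are made up.

```python
def capture_time_s(n_points, fs_hz):
    """Time span covered by one block of n_points samples."""
    return n_points / fs_hz


def bin_width_hz(n_points, fs_hz):
    """Frequency spacing between adjacent FFT bins."""
    return fs_hz / n_points


def points_for_duration(duration_s, fs_hz):
    """Number of samples needed to cover duration_s (rounded to the nearest sample)."""
    return round(duration_s * fs_hz)


FS_HZ = 400e6      # sampling rate from the first post
N_POINTS = 1024    # illustrative FFT length only

print(capture_time_s(N_POINTS, FS_HZ))      # about 2.56 microseconds per block
print(bin_width_hz(N_POINTS, FS_HZ))        # 390625.0 Hz per bin
print(points_for_duration(20e-3, FS_HZ))    # 8000000 samples to span 20 ms
```

So a block of that size covers a few microseconds, while spanning 20 ms at the full rate would take millions of points unless the data is decimated first.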
