# Clarification for FFT implementation in FPGA

## Question

Hi,

I am trying to compute the FFT of a synthesized square wave of frequency 100Hz. The 100Hz signal has to be sampled at 1kHz, so I set the clock period of the FIFO, the FFT IP core and the other blocks to 1ms. I have attached a screenshot of the design file and the simulation results.

From the simulation output, it can be observed that the buffer stores 512 samples and then unbuffers them after 512*1ms. But there is no output from the FFT block. I would like to know whether my approach is correct, or whether I am making a mistake in the way the blocks are integrated.

Help much appreciated.

Regards,

Subash

## Recommended Posts

Glad to hear that you found the problem!

Now that you've gotten this far, and asked about it, here are some things I might've done.  I'll let you ponder whether this would offer a "better" solution or not.

1. I'd run the FPGA at a system clock speed of 80-100MHz.
2. I'd either replace what you are doing with pipelined FFT mode, or I'd run using burst I/O mode at the higher clock rate and use a FIFO to avoid data loss.  Something within me just loathes the idea of losing data when processing an FFT--even though it may be irrelevant for most applications.
3. If the SQRT is a problem, I might consider |A|+|B| as an alternative.  It's ad-hoc, doesn't maintain the nice mathematical properties of the SQRT, etc, but it will work for *some* applications.  (Being a math purist, this would not be my first choice)
4. A SQRT can be done with a lookup table, provided a floating point input--I'm not sure how Xilinx is doing theirs.  In other words, if you shift your data by 2N bits so that the top two bits are non zero (01, 10, or 11), then the table will work--but to go back to fixed point you'll then need to shift your data back by N bits.
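
To make suggestions 3 and 4 concrete, here is a minimal software sketch of both ideas. It is not how the Xilinx CORDIC core works; the 32-bit input width, the 8-bit table address and the names `abs_sum_magnitude`, `lut_sqrt` and `SQRT_TABLE` are assumptions chosen only for illustration.

```python
import math

WIDTH = 32       # assumed bit width of the unsigned input Re^2 + Im^2
TABLE_BITS = 8   # assumed lookup-table address width

# Table addressed by the top TABLE_BITS bits of the normalized input.
SQRT_TABLE = [math.isqrt(i << (WIDTH - TABLE_BITS)) for i in range(1 << TABLE_BITS)]


def abs_sum_magnitude(re: int, im: int) -> int:
    """Ad-hoc estimate |A| + |B|: never below the true magnitude, at most sqrt(2) times it."""
    return abs(re) + abs(im)


def lut_sqrt(x: int) -> int:
    """Approximate sqrt(x) using normalization plus one table lookup."""
    if not 0 <= x < (1 << WIDTH):
        raise ValueError("x must fit in WIDTH unsigned bits")
    if x == 0:
        return 0
    n = 0
    # Shift left two bits at a time until the top two bits are 01, 10 or 11.
    while (x << (2 * n)) >> (WIDTH - 2) == 0:
        n += 1
    normalized = x << (2 * n)
    index = normalized >> (WIDTH - TABLE_BITS)
    # sqrt(x * 4**n) == sqrt(x) * 2**n, so undo the scaling with an N-bit shift.
    return SQRT_TABLE[index] >> n


re, im = 3000, -4000
print("abs-sum estimate:", abs_sum_magnitude(re, im))
print("table estimate:  ", lut_sqrt(re * re + im * im))
print("exact magnitude: ", math.hypot(re, im))
```

With these assumed widths the table has 256 entries and the result is good to roughly 1%; a wider table address trades memory for accuracy.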

These are just some ideas to consider.  Whether or not they work in your application is up to you to decide.

Dan

Hi @subasheee,

I have not worked with Xilinx's FFT. Here is a forum thread that goes through getting their FFT project working correctly.

thank you,

Jon

Hi @jpeyron

Thanks for the link. I learned some details about the FFT and was able to compute an FFT using the FFT IP core v9.0.

Regards,

Subash

Hi @D@n

I went through the link provided by @jpeyron for the FFT, which was informative. I was able to perform an FFT on sine signals synthesized with the DDS compiler.
I tried the FFT for sine signals of various frequencies; the details are as follows.

FFT implementation: Radix-2 burst I/O, Natural order, 16384, 1 channel, Fixed-point, scaled, truncation, Use 3-multiplier structure, Use CLB-logic

Sine signal 1: sampling frequency: 100kHz, Signal frequency: 48.828Hz, FIFO size:16384, Transform length: 16384, Frequency resolution: 100kHz/16384
Calculated BIN index for peak value= 48.828Hz/(100kHz/16384) = 8 , XK_index for peak = 41

Sine signal 2: sampling frequency: 50kHz, Signal frequency: 48.828Hz, FIFO size:16384, Transform length: 16384, Frequency resolution: 50kHz/16384
Calculated BIN index for peak value= 48.828Hz/(50kHz/16384) = 16 , XK_index for peak = 49

DC signal: sampling frequency: 50kHz, Signal frequency: 0.7Hz, FIFO size:16384, Transform length: 16384, Frequency resolution: 50kHz/16384
Calculated BIN index for peak value= 0.7Hz/(50kHz/16384) = 0.23 ~ 0, XK_index for peak = 33

If you observe, even for the DC signal (0.7Hz) the peak reported by XK_index is at bin 33. If you subtract this offset from the XK_index of each signal, the calculated BIN index matches the position of the peak in the FFT output:

Signal 1: XK_index for peak = 41, XK_index for peak (DC) = 33, 41-33 = 8

Signal 2: XK_index for peak = 49, XK_index for peak (DC) = 33, 49-33 = 16

Signal 3: XK_index for peak = 33, XK_index for peak (DC) = 33, 33-33 = 0
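
The arithmetic above can be double-checked with a short script. This is only a sketch that reuses the numbers quoted in this post; the function name `expected_bin` and the constant names are made up for the example.

```python
def expected_bin(signal_hz: float, sample_rate_hz: float, n_fft: int) -> float:
    """Bin index k = f / (fs / N) for a tone at signal_hz."""
    return signal_hz / (sample_rate_hz / n_fft)


N_FFT = 16384
OBSERVED_OFFSET = 33  # XK_index of the 0.7Hz "DC" peak reported above

# (label, signal frequency in Hz, sampling frequency in Hz, observed XK_index)
cases = [
    ("Signal 1", 48.828, 100e3, 41),
    ("Signal 2", 48.828, 50e3, 49),
    ("Signal 3 (DC)", 0.7, 50e3, 33),
]

for label, f_hz, fs_hz, xk_index in cases:
    k = expected_bin(f_hz, fs_hz, N_FFT)
    corrected = xk_index - OBSERVED_OFFSET
    print(f"{label}: expected bin {k:.2f}, observed {xk_index}, corrected {corrected}")
```

In all three cases the corrected index equals the rounded expected bin, which points to a constant delay somewhere in the data path rather than a wrong transform.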

So, my questions are:
1. Why is there an offset in the XK_index values? Am I missing some configuration setting?
2. Can I buffer the signal into the FIFO at a low rate (10kHz), unbuffer it at a high rate (1MHz), and then carry out the FFT? That way I could reduce the FFT computation time without compromising frequency resolution. But since the FFT block then reads the samples at 1MHz, does that mean the FFT block takes the sampling frequency to be 1MHz?

Help is much appreciated.

Regards,
Subash

Hi @subasheee,

Congratulations!  You are getting farther than many who have written to this blog.

I often counsel folks not to use burst mode, such as you are using, but the pipeline mode instead.  This is due to the added logic and complexity of properly setting up the valid and ready wires on the input, and the difficulty I personally have validating someone's design at a distance (i.e. via this forum).  From what you've written above, it looks like you've solved and gotten past this problem.  I'll hope so.

You never told me the sample rate (or clock rate even) of the FFT.  Are you running it at 1MHz?  or 100MHz?  (Best performance would be at 100MHz or so ...)

But your question is about the FFT offset.  You need to be aware that the first valid output from the FFT will not be the first sample out of the FFT, nor will it be the first sample one FFT length later.  Look at the FFT manual--there's a flag that's used to note the first valid value from the FFT (and the last IIRC).  You'll need to synch to that value, or you'll have these offset problems you noted above.

As for whether or not you can buffer the incoming data ... of course you can!  Will it help?  That might depend upon what you are trying to do.  I don't think you need to do this.  Adjust the incoming valid signal instead, and the FFT should naturally buffer itself.  Perhaps once you fill up the FFT you can give it a whole bunch of enable signals (I forget what the wire is called, CE perhaps?).  You should be able to clock the Xilinx FFT at 100MHz or more.  Your real driving factor in your computational delay is not the speed of the processing, but the speed of the input samples.

As for resolution, an FFT is really a multi-rate signal processing tool.  If you have samples coming in at 1MHz, but you only want to resolve frequencies between 0-10 kHz, then you'll want to filter and downsample the data going into the FFT by a factor of (1MHz/20 kHz=) 50.  Oh, back to resolution, don't forget that windowing can help.
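
As a rough illustration of the last two paragraphs, the sketch below works out the decimation factor, the bin spacing and the time needed just to collect one frame. The 16384-point length is borrowed from the configuration earlier in the thread, and the function name `decimation_plan` is made up for this example.

```python
def decimation_plan(input_rate_hz: float, max_freq_hz: float, n_fft: int):
    """Return (decimation factor, bin spacing in Hz, time to collect one frame in s)."""
    output_rate_hz = 2.0 * max_freq_hz       # Nyquist rate for the band of interest
    factor = input_rate_hz / output_rate_hz
    bin_spacing_hz = output_rate_hz / n_fft
    frame_time_s = n_fft / output_rate_hz
    return factor, bin_spacing_hz, frame_time_s


# 1MHz input, band of interest 0-10kHz, 16384-point FFT
factor, spacing, frame_time = decimation_plan(1e6, 10e3, 16384)
print(f"decimate by {factor:.0f}: bin spacing {spacing:.3f} Hz, frame takes {frame_time:.4f} s")

# Same FFT length fed directly at 1MHz, for comparison
_, spacing_raw, frame_time_raw = decimation_plan(1e6, 500e3, 16384)
print(f"no decimation: bin spacing {spacing_raw:.3f} Hz, frame takes {frame_time_raw:.6f} s")
```

The bin spacing is always the reciprocal of the frame collection time, so finer resolution costs waiting for more input regardless of how fast the FFT core itself is clocked.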

Hmm ... reading my comments above, they sound rather jumbled.  Feel free to write back with more questions if this doesn't make any sense.

Dan

Hi @D@n,

Thanks for your inputs; my responses are as follows. Do let me know if I am unclear or need to provide further details.

1.     I often counsel folks not to use burst mode, such as you are using, but the pipeline mode instead.  This is due to the added logic and complexity of properly setting up the valid and ready wires on the input.

Please see the attached image of the design file. s_axis_data_tvalid is always set to 1. I am not concerned about data loss during the FFT computation.

2.     You never told me the sample rate (or clock rate even) of the FFT.  Are you running it at 1MHz?  or 100MHz?  (Best performance would be at 100MHz or so ...)

The FFT clock is the same as the sampling frequency, 100kHz or 50kHz. So far I have only run a behavioral simulation in Vivado 2017.4.

3.     Look at the FFT manual--there's a flag that's used to note the first valid value from the FFT (and the last IIRC).  You'll need to synch to that value, or you'll have these offset problems you noted above.

Do you mean m_axis_data_tvalid and m_axis_data_tlast? If so, I am synchronizing with these two flags. Please see the attached figure named “signals”. You can also see TUSER, i.e. XK_index, in the figure named “signals1”. For this behavioral simulation, the specifications are: DDS sine frequency 97.65Hz, DDS clock = 100kHz, FFT clock = 100kHz and FFT size = 16384.

4.     If you have samples coming in at 1MHz, but you want frequency resolution between 0-10 kHz, then you'll want to filter and downsample the FFT by a factor of (1MHz/20 kHz=) 50.

Yes, I agree with your comment. I did the downsampling and then carried out the FFT. The DDS compiler generated a 48.8Hz sinusoidal signal at 1MS/s, which was downsampled to 10kS/s and given as input to the FFT IP core. The clock of the FFT IP core was 1MHz. I thought the clock frequency of the FFT fixed the sampling frequency, but I was wrong. Apparently the FFT clock has nothing to do with the sampling frequency of the signal. The sampling frequency is decided by the sample time of the input signal to the FFT block.

So altogether, the unresolved issue is the 33-sample offset in XK_index. I am synchronizing with m_axis_data_tvalid and m_axis_data_tlast. Do I need to sync with some other signal?

Regards,

Subash

I see you are ignoring the event_* signals.  If you take a look at them, I think you might discover that you are getting out of synch with the FFT by (for example) providing data to it when it isn't ready to receive data.  This would render your last signal out of order.

Perhaps you are misunderstanding how the "Burst I/O" of the FFT works.  It isn't ideally suited for processing incoming sample data.  The pipeline mode is better for that.  The problem with burst I/O is that the FFT core won't accept sample data while it's busy, whereas pipeline mode will always accept sample data.
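
A toy model can show why this matters when the input valid is held high and the core's ready is ignored. This is only a sketch of the idea, not of the Xilinx core's real timing: the function name `burst_io_dropped` and the `busy_ticks` figure are hypothetical, and a core that is never busy stands in for pipeline mode.

```python
def burst_io_dropped(total_samples: int, n_fft: int, busy_ticks: int):
    """Count samples accepted and lost when a core ignores its input while busy.

    One sample arrives per tick. After n_fft samples are loaded, the core is
    busy for busy_ticks ticks and every sample arriving in that time is lost.
    """
    accepted = dropped = 0
    loaded = 0   # samples loaded into the current frame
    busy = 0     # remaining busy ticks
    for _ in range(total_samples):
        if busy > 0:
            busy -= 1
            dropped += 1
        else:
            accepted += 1
            loaded += 1
            if loaded == n_fft:
                loaded = 0
                busy = busy_ticks
    return accepted, dropped


# Hypothetical numbers: 8-point frames, core busy for 12 sample periods per frame
print("burst-like:   ", burst_io_dropped(100, 8, 12))
# A core that is never busy accepts everything, as pipeline mode would
print("pipeline-like:", burst_io_dropped(100, 8, 0))
```

Clocking the core much faster than the sample rate, or placing a FIFO in front of it, shrinks the busy window measured in sample periods, which is the other way to avoid the loss.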

Dan

Hi @D@n

Thanks for your reply. I now understand the difference between pipeline and burst modes.

I figured out the reason for the 33-sample delay between m_axis_data_tuser and m_axis_data_tdata. I used a multiplier, an adder and a square root calculator to compute the magnitude of the FFT, which is sqrt(Re^2+Imag^2). The adder and multiplier took 2 clock cycles and the square root block (CORDIC) took 31 clock cycles. So the magnitude derived from m_axis_data_tdata is 33 clock cycles behind m_axis_data_tuser.
Do you have any better suggestions for calculating the magnitude of the FFT, sqrt(Re^2+Imag^2), without introducing the delay?
Of course I can overclock the multiplier, adder and square root block to reduce the latency, but other than that, is there a better solution?
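
For reference, pipeline latency like this is usually matched rather than removed: the index travels through a delay line of the same depth as the magnitude path so the two line up again. The sketch below models that in software; the `DelayLine` class is hypothetical, and `k * k` merely stands in for the real magnitude computation.

```python
from collections import deque


class DelayLine:
    """Fixed-depth shift register: step() returns the value pushed `depth` steps ago."""

    def __init__(self, depth: int, fill=None):
        self._regs = deque([fill] * depth, maxlen=depth)

    def step(self, value):
        out = self._regs[0]       # oldest entry leaves the register chain
        self._regs.append(value)  # newest entry enters; maxlen drops the oldest
        return out


LATENCY = 2 + 31  # multiplier/adder cycles plus CORDIC cycles, as measured above

magnitude_path = DelayLine(LATENCY)  # stand-in for the 33-cycle magnitude pipeline
index_path = DelayLine(LATENCY)      # matching delay applied to the bin index

aligned = []
for k in range(100):
    magnitude = magnitude_path.step(k * k)  # k * k is a placeholder "magnitude" of bin k
    index = index_path.step(k)
    if index is not None:                   # skip the first LATENCY warm-up cycles
        aligned.append((index, magnitude))

print(aligned[:3])
```

Every pair in `aligned` holds an index together with the value computed from that same index, so the 33-cycle offset no longer shows up downstream.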

Regards,
Subash

On 2/27/2018 at 9:25 PM, D@n said:

Glad to hear that you found the problem!

Now that you've gotten this far, and asked about it, here are some things I might've done.  I'll let you ponder whether this would offer a "better" solution or not.

1. I'd run the FPGA at a system clock speed of 80-100MHz.
2. I'd either replace what you are doing with pipelined FFT mode, or I'd run using burst I/O mode at the higher clock rate and use a FIFO to avoid data loss.  Something within me just loathes the idea of losing data when processing an FFT--even though it may be irrelevant for most applications.
3. If the SQRT is a problem, I might consider |A|+|B| as an alternative.  It's ad-hoc, doesn't maintain the nice mathematical properties of the SQRT, etc, but it will work for *some* applications.  (Being a math purist, this would not be my first choice)
4. A SQRT can be done with a lookup table, provided a floating point input--I'm not sure how Xilinx is doing theirs.  In other words, if you shift your data by 2N bits so that the top two bits are non zero (01, 10, or 11), then the table will work--but to go back to fixed point you'll then need to shift your data back by N bits.

These are just some ideas to consider.  Whether or not they work in your application is up to you to decide.

Dan

Dear @D@n

Sorry for the late reply, and thanks for your responses. Your suggestions are helpful. I considered your suggestions 1 and 2. For the square root, I simply delayed m_axis_data_tuser by 33 samples using the same square root block.

Regards,

Subash