
Latency and throughput of FFT processors


farhanazneen

Question

As we all know, pipelined processors have higher latency but shorter overall execution time than non-pipelined processors. Why, then, is the latency of the pipelined FFT processor lower than that of the radix-2 and radix-4 FFTs, according to the Xilinx FFT IP?

Latency of the pipelined FFT processor according to the Xilinx FFT IP: 8341 cycles

Latency of the radix-2 FFT according to the Xilinx FFT IP: 33009 cycles

Latency of the radix-4 FFT according to the Xilinx FFT IP: 14483 cycles


The latency reports for the different processors, as given by the Xilinx IP, are attached below.

https://www.xilinx.com/support/documentation/ip_documentation/xfft/v9_0/pg109-xfft.pdf 

Pages 41 to 43 show block diagrams of the different FFT processors. Please let me know why the pipelined FFT processor has lower latency, and how to find its throughput.

pipelined latency.PNG

radix 2 latency.PNG

radix4 latency.PNG


3 answers to this question


@D@n sir, pipelining is used to get high throughput, but at the same time the latency also increases (the execution time of each instruction goes up because of the overhead of pipeline control). This is the known drawback of pipelining: pipeline latency. So why does the pipelined FFT (Xilinx FFT IP) have lower latency here, when a pipelined technique normally has higher latency than a non-pipelined one?

If the pipelined FFT processor really does have lower latency, how can I justify it? I have always learnt that a pipeline has higher latency because of its overheads. Please let me know, sir, as I have been asked this question many times.


@farhanazneen,

I've only created pipelined FFT implementations so far.  I have yet to create a block FFT, and so have yet to understand how a block FFT might be faster/better/cheaper.

That said, the pipelined FFT uses information internally as soon as it is calculated and therefore becomes available.  It must therefore have the lowest latency.

I'm not yet certain how this block FFT is built, but my guess is that they have one processing stage (as opposed to log_2(N)) that gets applied to the data, adjusted for the coefficients of the next stage and processed again, etc.  This reuse will create a *LOT* of latency, and much more latency than pipelined--as you are noticing.
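The guess above can be turned into a rough cycle-count model. This is only a sketch under stated assumptions, not how the Xilinx core is actually built: the block FFT is assumed to have one butterfly engine finishing one butterfly per clock, with loading and unloading not overlapped with processing, while the pipelined FFT is assumed to take one sample per clock and to start unloading once the frame is in. Pipeline depth and control overhead are ignored. The thread does not state the transform length; N = 4096 is an assumption, picked because the modelled counts then land close to the figures reported in the question. The function names are made up for illustration.

```python
import math

def burst_cycles(n, radix):
    """Rough cycles per frame for a block FFT reusing ONE butterfly engine.

    Assumed model: load n samples, make log_radix(n) passes of n/radix
    butterflies (one butterfly per clock), then unload n samples.
    """
    passes = int(math.log2(n)) // int(math.log2(radix))
    compute = passes * (n // radix)
    return n + compute + n

def pipelined_cycles(n):
    """Rough latency for a pipelined FFT: n cycles in, n cycles out.

    Assumed model: every stage works as soon as its inputs arrive, so
    the compute time hides behind the load; pipeline depth is ignored.
    """
    return 2 * n

def samples_per_clock(n, cycles_between_frames):
    """Throughput as samples accepted per clock, averaged over a frame."""
    return n / cycles_between_frames

N = 4096  # assumed transform length, not given in the thread

# Latency estimates next to the figures reported in the question.
for name, est, reported in [
    ("pipelined", pipelined_cycles(N), 8341),
    ("radix-2 burst", burst_cycles(N, 2), 33009),
    ("radix-4 burst", burst_cycles(N, 4), 14483),
]:
    print(f"{name:14s} model {est:6d} cycles, reported {reported} cycles")

# Throughput: the pipelined core can (by assumption) accept a new frame
# every N clocks; a burst core must finish one frame before the next.
print("pipelined     ", samples_per_clock(N, N))
print("radix-2 burst ", samples_per_clock(N, burst_cycles(N, 2)))
print("radix-4 burst ", samples_per_clock(N, burst_cycles(N, 4)))
```

To turn samples per clock into samples per second, multiply by the clock frequency. Under this model the extra latency of the block FFTs comes entirely from reusing one engine for every pass, which is a different trade-off from CPU instruction pipelining, where the non-pipelined design has no such reuse penalty.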

Dan


Archived

This topic is now archived and is closed to further replies.
