I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd.
I have been experimenting with the FIR compiler GUI and have observed the following.
For one output per cycle (No overclocking),
The filter coefficients [1 2 3 4 0 1 2 3 4 ] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)?
The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. We have just the two multipliers here? Why do we need 5 DSP slices?
For both of the filter coefficients, no BRAM units are used.
When I increase the clock frequency, the number of DSP slices used goes down (although, I do not see a clean division here i.e. increasing the clock frequency by a factor of 2 does not reduce the DSP slices by half), but the number of BRAM units used increases. For example, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices, (down from 9), and 5 BRAM blocks (previously 0 in the not overclocked case) when overclocked by a factor of 2. Similarly, overclocking by a factor of 3 reduces the number of DSP slices to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count?
Note: I have also posted this question to Xilinx Forums, but have not received any feedback.
Question
aabbas02
I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd.
I have been experimenting with the FIR compiler GUI and have observed the following.
For one output per cycle (No overclocking),
The filter coefficients [1 2 3 4 0 1 2 3 4 ] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)?
The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. We have just the two multipliers here? Why do we need 5 DSP slices?
For both of the filter coefficients, no BRAM units are used.
When I increase the clock frequency, the number of DSP slices used goes down (although, I do not see a clean division here i.e. increasing the clock frequency by a factor of 2 does not reduce the DSP slices by half), but the number of BRAM units used increases. For example, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices, (down from 9), and 5 BRAM blocks (previously 0 in the not overclocked case) when overclocked by a factor of 2. Similarly, overclocking by a factor of 3 reduces the number of DSP slices to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count?
Note: I have also posted this question to Xilinx Forums, but have not received any feedback.
Link to comment
Share on other sites
13 answers to this question
Recommended Posts
Archived
This topic is now archived and is closed to further replies.