I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd.

I have been experimenting with the FIR compiler GUI and have observed the following.

For one output per cycle (No overclocking),
The filter coefficients [1 2 3 4 0 1 2 3 4 ] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)?
The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. We have just the two multipliers here? Why do we need 5 DSP slices?
For both of the filter coefficients, no BRAM units are used.

When I increase the clock frequency, the number of DSP slices used goes down (although, I do not see a clean division here i.e. increasing the clock frequency by a factor of 2 does not reduce the DSP slices by half), but the number of BRAM units used increases. For example, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices, (down from 9), and 5 BRAM blocks (previously 0 in the not overclocked case) when overclocked by a factor of 2. Similarly, overclocking by a factor of 3 reduces the number of DSP slices to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count?

Note: I have also posted this question to Xilinx Forums, but have not received any feedback.

I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd.

I have been experimenting with the FIR compiler GUI and have observed the following.

For one output per cycle (No overclocking),

The filter coefficients [1 2 3 4 0 1 2 3 4 ] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)?

The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. We have just the two multipliers here? Why do we need 5 DSP slices?

For both of the filter coefficients, no BRAM units are used.

When I increase the clock frequency, the number of DSP slices used goes down (although, I do not see a clean division here i.e. increasing the clock frequency by a factor of 2 does not reduce the DSP slices by half), but the number of BRAM units used increases. For example, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices, (down from 9), and 5 BRAM blocks (previously 0 in the not overclocked case) when overclocked by a factor of 2. Similarly, overclocking by a factor of 3 reduces the number of DSP slices to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count?

Note: I have also posted this question to Xilinx Forums, but have not received any feedback.

## Share this post

## Link to post

## Share on other sites