aabbas02

Members
  • Content Count

    6
  • Joined

  • Last visited

  1. Actually, I can do either. I just want to show that mixed symmetry also reduces to the symmetric or anti-symmetric case. I don't have any specific latency/throughput requirement. Thanks!
  2. Ah! I do not think it could have been said better because it really describes my situation. I have filter coefficients that have mixed symmetry ( [1 2 3 4 0 -4 3 -2 1] is an example of what I mean by mixed symmetry ). I want to be able to argue that this is just as good as ( [1 2 3 4 0 -4 -3 -2 - 1] ) or ( [1 2 3 4 0 4 3 2 1] ), but the Xilinx (or Altera) tools do not seem to think so. @D@n , I tried out a half-band filter, but it didn't reduce the number of DSP Slices consumed. I think the IP prefers to use the full number, i.e 25 DSP slices for 49 tap half-band filter, because of the input chain shift requirement, as @xc6lx45 suggested earlier. The Xilinx Document for FIR Compiler, PG149 , does not explicitly state any reduction in the number of DSP slices. It only mentions minimizing arithmetic requirements by exploiting coefficient symmetry on pages 25,26.
  3. Thank you! So, I feel I should have been more clear about why I am looking into this question. I am going through some old papers on FIR design for specific applications and nearly all the papers suggest a "proposed filter structure", using which the number of multipliers comes down. Now, what I gather from this discussion here is that, while useful if one were laying out the actual hardware of the circuit, the 'proposed filter structure' is not readily understood by the IP (except for a few special cases like symmetry and antisymmetry of the coefficients about some central tap value). With that limitation of the IP, it then becomes reasonable to think of designing a filter that has a structure which can be exploited by the IP. This would be desirable if one wanted to use off the shelf vendor IP but still have some degree of optimality in terms of resource consumption. Thank you for your responses! I'll be going over them again to make sure I have the correct idea.
  4. Dan, Thank you so much! Your post answers so many questions. So, I never intended for the filter coefficients to be [1 2 3 4 0 1 2 3 4] - I understand that this is not the definition of symmetry. With the coefficients [1 2 3 4 0 4 3 2 1], the DSP slice usage goes down (as expected). I was reading up on the DSP48E1 Slice element here . Turns out that the element comes with something called a pre-adder (Fig. 2-14 in the linked document) that is able to exploit symmetry. For example, see page 25,26 of PG 149 . The Xilinx document recognizes and optimizes 2 coefficient structures: even symmetric and odd symmetric. One would expect that this could easily be extended to coefficient structures that have a mixture of both even and odd symmetry (for eg. [1 2 3 4 0 -4 3 -2 1], but this does not appear to be so). Is it possible to make the IP optimize mixed symmetry too (The presence of the pre-adder in the slice seems to suggest that this should be (easily) possible). Thanks!
  5. Thank you! I think I understand what you guys mean. It's just that I have come across talk of the number of multipliers in a filter being a measure of the filter's complexity. I just expected the implementation of filters to take that into account; however, what I gather from reading your posts is that the IP still (even when there are a large number of zeros in the filter coefficients) uses the full number of DSP slices because it's easier to shift the input chain? <--- Is this an accurate conclusion? But, I am still a little short on understanding why we end up using BRAM units when we overclock to reduce the number of DSP slices. I understand zygot's reasoning that one cannot force complete usage of any particular resource (BRAM block in this case). But what necessitates the usage of BRAM for overclocking in the first place? Is the BRAM being used to store the input chain? Of course, I also see why it's important to move on and let the vendor IP do what it has to do, but a little more context would be really nice too. Thanks!
  6. I am trying to investigate how sparsity (the number of zeroes in the filter coefficients) affects the implementation of a filter on the FPGA. Specifically, I am interested in the number of BRAM blocks and DSP slices used for a sparse filter versus a non-sparse filter of the same length. I assume the filter coefficients to be symmetric and the number of taps to be odd. I have been experimenting with the FIR compiler GUI and have observed the following. For one output per cycle (No overclocking), The filter coefficients [1 2 3 4 0 1 2 3 4 ] use 9 DSP slices. Shouldn't this be 4 DSP slices (if we use the symmetry of the coefficients)? The filter coefficients [1 0 0 0 0 0 0 0 1] use 5 DSP slices. We have just the two multipliers here? Why do we need 5 DSP slices? For both of the filter coefficients, no BRAM units are used. When I increase the clock frequency, the number of DSP slices used goes down (although, I do not see a clean division here i.e. increasing the clock frequency by a factor of 2 does not reduce the DSP slices by half), but the number of BRAM units used increases. For example, the filter coefficients [1 2 3 4 0 1 2 3 4] use 6 DSP slices, (down from 9), and 5 BRAM blocks (previously 0 in the not overclocked case) when overclocked by a factor of 2. Similarly, overclocking by a factor of 3 reduces the number of DSP slices to 4, and the BRAM count goes to 3. Is there some relation between the number of DSP slices and the BRAM count? Note: I have also posted this question to Xilinx Forums, but have not received any feedback.