Leaderboard


Popular Content

Showing content with the highest reputation on 04/28/19 in all areas

  1. 1 point
    Just be aware that most of the "legacy" material on FIR filters limits itself to what can be presented conveniently. Numerical optimization is the tool of choice and there is no "cheating". Or, taking one step back to the filter specs, there is usually no need to specify a flat stopband and it can significantly reduce the required filter size (credits to Prof. Fred Harris) This only as.example where I can avoid unnecessary constraints from using a ready-made design process by writing my own solver. Which is actually not that hard, basing it on fminsolve or fminunc in Matlab / Octave. BTW, one reference on this topic I found useful: author = {Mathias Lang}, title = {Algorithms for the Constrained Design of Digital Filters with Arbitrary Magnitude and Phase Responses}, year = {1999} http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.9336 The title alone is interesting - "Arbitrary Magnitude and Phase Responses" - no one says a digital filter needs to be flat or linear phase (of course, at the expense of symmetry). Sometimes I wonder, our thinking seems to often get stuck in patterns and "templates". Take analog filters, for example: Chebyshev, Butterworth or Bessel, which one do I pick? But those are just corners of the design space, and the best filter for a given application is most likely somewhere in-between, if I can only design it (which is, again, a job for a numerical optimizer, even though this one is more difficult).
  2. 1 point
    So, if I am getting the point of the previous two posts from xclx45 and Dan, make your own filter in Verilog or VHDL and figure out the details ( signed, unsigned, fixed point, etc, .... etc ). You can instantiate DSP48E primitives (difficult) or just let the synthesis tool do it from your HDL code (easier). Debating how things should be verses how they are when using third party scripts to generate unreadable code seems like a waste of time to me... If you don't like what you get then design what you want. If you can make the time to write your own IP ( would be nice to not depend on a particular vendor's DSP architecture ) you'll learn a lot and save a lot of time later. If a vendor's IP doesn't make timing closure for your design a nightmare and you don't have the time to figure out all the details just let the IP handle it. I suspect that trying to optimize the FIR Compiler will be frustrating at best. I once had to come up with pipelined code to calculate signal strength in dB at a minimum sustained data rate. My approach to converting 32-bit integers to logarithms used a combination of Taylor Series expansion and look-up tables. I had a few versions**. One was straight VHDL so that I could compare Altera and Xilinx DSP tiles. One instantiated DSP48 tile primitives for a particular Xilinx device. These were fixed point designs. There's theory and there's practical experience.. they are usually not the same. ** I played with a number of approaches based on extremely limited specifications so there were quite a few versions. Every time I presented one the requirements changed and so did the complexity and resource requirements. I should mention that my intent for mentioning this experience is not to denigrate the information presented by others or to claim superiority in any way. When getting advice it's important to put that into context. A lot of times facts aren't necessarily relevant to solving a particular problem. If I haven't made this clear I've never had the experience that vendor IP optimizes resource usage... in fact quite the opposite. This is why in a commercial setting companies are willing to pay to develop their own IP. Sometimes FPGA silicon costs overshadow development costs.
  3. 1 point
    The word "overclocking" may be even misleading - this architecture is used when the data rate is significantly lower than the multiplier's speed. Inputs to one (expensive) multiplier are multiplexed, so it can do the work for several or all taps. The "multiplexing" itself will get fairly expensive in logic fabric due to the number of data bits and coefficients. To the rescue comes the BRAM, which is essentially a giant demultiplexer-multiplexer combo, with a football field of flipflops in-between. You can find an example for this approach here. Out-of-the-box, it's unfortunately quite complicated because it does arbitrary rational resampling. Setting the rates equal for a standard FIR filter, you end up with only a few lines of remaining code. BTW, multiplier count as design metric is probably overrated nowadays for several reasons (the IP tool resource usage is already more practical, e.g. BRAM count may become the bottleneck). If you can get this book through your library, you might have a look e.g. at chapter 6 (background only, this is a very old book): Keshab K. Parhi VLSI Digital Signal Processing Systems: Design and Implementation
  4. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, So let's start at the top. An N-point FIR filter, such as this one, requires N multiplies and N-1 adds. Let's just count multiplies, though, for now. Let's now look at some optimizations you might apply. A complex multiply usually requires 4 multiplies. If you have complex input and taps, that's 4N real multiplies. There's a trick you can use to drop this to 3N multiplies. If the filter is real, and the incoming signal is complex, you can drop this to 2N multiplies by filtering each of the real and imaginary channels separately. If your filter taps are fixed, a good optimizer should be able to replace the zeros with a constant output, the ones with data pass through, etc. Implementing taps with two or three bits this way is possible. This only works, though, if the filter taps are fixed. Many of the common FIR filter developments generate linear phase filters. These filters are symmetric about a common point. With a little bit of work, you can use that to reduce the number of multiplies (for dynamic taps) down to (N-1)/2. There is a very significant class of filters called half band filters. In a half band filter, every other tap (other than the center tap) is zero. With a bit of work, you can then drop your usage down to (N-1)/4 multiplies. This optimization applies to Hilbert transforms as well. In the given list above, I've assumed that you need to operate at one sample in and one sample out per clock. In that case, there's no time or room for memory, since all of the stages need their data on every clock. I should also point out that I'm counting multiplies, not DSP slices. If your multiplies are smaller than 18x18 bits, you may be able to use a single DSP slice per multiply. If they are larger, you might find yourself using many DSP slices per multiply. It depends. Let's now consider the case where you have an N sample filter but you are only going to provide it with one sample of data every N+ samples (some number more than N). You could then multiplex this multiply and implement an N-point filter with 1 multiply and 2^(ceil(log_2(N)) RAM elements. If your filter was symmetric, you could process a filter roughly twice as long, or a sample rate roughly twice as fast while still using a single multiply. The half-band and hilbert tricks can apply to (roughly) double your filter size or your data rate again. In these cases, however, you can't spare the multiply at all, since it is used to implement every coefficient. That should give you both ends of the spectrum. Now, while I don't understand how Xilinx's FIR compiler works, I can say that if I were to make one I would allow a user to design crosses between these two ends of the spectrum. In the case of a such a cascaded filter, however, you may find it difficult to continue to implement the optimizations we've just discussed, simply because the cascaded structure becomes more difficult to deal with. (By cascade, I mean cascade in implementation and not filtering a stream and then filtering it again.) Looking at your two designs, none of the optimizations I've mentioned would apply. In the high speed filters we started out with, sure, you might manage to remove a multiply or two by multiplying by zero. On the other hand, you can't share multiplies across elements if you do so. For example, you mentioned [1 2 3 4 0 1 2 3 4]. Sure, it's got repeating coefficients, but this is actually a really obscure filtering structure, and not likely one that I (were I Xilinx) would pay to support. Try [ 1 2 3 4 0 4 3 2 1] instead and see if things change. Similarly for [ 1 0 0 0 0 0 0 0 1]. Yeah, it's symmetric about the mid-point, but in all of my symmetric filtering applications I'm assuming the mid point has a value of 2^N-1 or similar so that it doesn't need a multiply. Your choice of filters offers no such optimization, so who knows how it might work. Hope this helps to shed some light on things, Dan
  5. 1 point
    Xilinx encrypts the FIR compiler code so it's difficult to figure out what's going on with your experiments. You should read PG149 which is the IP product Guide for the FIR Compiler. It does indicate when coefficient symmetry optimizations might or won't be made. It also has a link to a Resource Utilization "Spreadsheet" for some FPGA family devices depending on your customized specifications. Oddly, that bit of guidance doesn't show any usages where significant numbers of BRAMS are ever used. The guide does have a section titled "Resource Considerations" that has some interesting information but not enough to answer you questions. I suspect that you'll have to understand actual resource utilization for your application empirically. (SOP from my experience when I really need optimal performance) It's always a good idea to be familiar with all device and IP documentation though, in my experience, the IP documentation rarely addresses all of my questions when I'm deep into a design and committed to IP that's not mine. Again, my guess is that when you specify a clock rate, sample rate, data and coefficient widths, etc resource utilization increases with increasing throughput. The same thing happens if you pipeline your code to maximize data rates. But I'm only guessing. I don't use the FIR Compiler so my views are an extrapolation of experience with other IP from all FPGA vendors that I have used.
  6. 1 point
    D@n

    MMCM dynamic clocking

    @rangaraj, It's a shame you only know VHDL coding, since the Verilog code I posted above would give you the ability to generate an arbitrary clock--unencumbered by the constraints of the PLL, with frequency resolution in the milli-Hertz range (100MHz/2^32). Perhaps you want to take another look at it? Sure, it would have some phase noise, but ... that could be beaten now if necessary by using an OSERDESE2 component. I've got an example design I'm working on that does just that and should knock the phase noise down to 1.25ns or better. Dan
  7. 1 point
    hamster

    MMCM dynamic clocking

    Hey, something else I just saw when reading the clocking guide was: MMCM Counter Cascading The CLKOUT6 divider (counter) can be cascaded with the CLKOUT4 divider. This provides a capability to have an output divider that is larger than 128. CLKOUT6 feeds the input of the CLKOUT4 divider. There is a static phase offset between the output of the cascaded divider and all other output dividers. And: CLKOUT4_CASCADE : Cascades the output divider (counter) CLKOUT6 into the input of the CLKOUT4 divider for an output clock divider that is greater than 128, effectively providing a total divide value of 16,384. So that can divide a 600 MHz VCO down to 36.6 kHz.
  8. 1 point
    hamster

    MMCM dynamic clocking

    I feel a bit bad about posting a minor novel here, but here is an example of going from "5 cycles on, 5 off" (i.e. divide by 10) to "10 on, 10 off" (device by 20). The VCO is initially to 800 MHz with CLK0 being VCO divide by 8.... so after config you get 100MHz. Push the button and you get 800/20 = 40MHz, release the button and you get 80MHz. It is all really hairy in practice! EDIT: Through experimentation I just found that you don't need to reset the MMCM if you are not changing the VCO frequency. So the 'rst' signal in the code below isn't needed (and LOCKED will stay asserted). -------------------------------------------------------------------------------------------------------- -- Playing with the MMCM DRP ports. -- see https://www.xilinx.com/support/documentation/application_notes/xapp888_7Series_DynamicRecon.pdf -- for the Dynamic Reconviguration Port addresses -------------------------------------------------------------------------------------------------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; library UNISIM; use UNISIM.VComponents.all; entity mmcm_reset is Port ( clk_100 : in STD_LOGIC; btn_raw : in STD_LOGIC; led : out STD_LOGIC_VECTOR (15 downto 0)); end mmcm_reset; architecture Behavioral of mmcm_reset is signal btn_meta : std_logic := '0'; signal btn : std_logic := '0'; signal speed_select : std_logic := '0'; signal counter : unsigned(26 downto 0) := (others => '0'); signal debounce : unsigned(15 downto 0) := (others => '0'); signal clk_switched : std_logic := '0'; signal clk_fb : std_logic := '0'; type t_state is (state_idle_fast, state_go_slow_1, state_go_slow_2, state_go_slow_3, state_idle_slow, state_go_fast_1, state_go_fast_2, state_go_fast_3); signal state : t_state := state_idle_fast; ----------------------------------------------------------------------------- --- This is the CLKOUT0 ClkReg1 address - the only register to be played with ----------------------------------------------------------------------------- signal daddr : std_logic_vector(6 downto 0) := "0001000"; signal do : std_logic_vector(15 downto 0) := (others => '0'); signal drdy : std_logic := '0'; signal den : std_logic := '0'; signal di : std_logic_vector(15 downto 0) := (others => '0'); signal dwe : std_logic := '0'; signal rst : std_logic := '0'; begin MMCME2_ADV_inst : MMCME2_ADV generic map ( BANDWIDTH => "OPTIMIZED", -- Jitter programming (OPTIMIZED, HIGH, LOW) CLKFBOUT_MULT_F => 8.0, -- Multiply value for all CLKOUT (2.000-64.000). CLKFBOUT_PHASE => 0.0, -- Phase offset in degrees of CLKFB (-360.000-360.000). -- CLKIN_PERIOD: Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz). CLKIN1_PERIOD => 10.0, CLKIN2_PERIOD => 0.0, -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for CLKOUT (1-128) CLKOUT1_DIVIDE => 1, CLKOUT2_DIVIDE => 1, CLKOUT3_DIVIDE => 1, CLKOUT4_DIVIDE => 1, CLKOUT5_DIVIDE => 1, CLKOUT6_DIVIDE => 1, CLKOUT0_DIVIDE_F => 8.0, -- Divide amount for CLKOUT0 (1.000-128.000). -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for CLKOUT outputs (0.01-0.99). CLKOUT0_DUTY_CYCLE => 0.5, CLKOUT1_DUTY_CYCLE => 0.5, CLKOUT2_DUTY_CYCLE => 0.5, CLKOUT3_DUTY_CYCLE => 0.5, CLKOUT4_DUTY_CYCLE => 0.5, CLKOUT5_DUTY_CYCLE => 0.5, CLKOUT6_DUTY_CYCLE => 0.5, -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for CLKOUT outputs (-360.000-360.000). CLKOUT0_PHASE => 0.0, CLKOUT1_PHASE => 0.0, CLKOUT2_PHASE => 0.0, CLKOUT3_PHASE => 0.0, CLKOUT4_PHASE => 0.0, CLKOUT5_PHASE => 0.0, CLKOUT6_PHASE => 0.0, CLKOUT4_CASCADE => FALSE, -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE) COMPENSATION => "ZHOLD", -- ZHOLD, BUF_IN, EXTERNAL, INTERNAL DIVCLK_DIVIDE => 1, -- Master division value (1-106) -- REF_JITTER: Reference input jitter in UI (0.000-0.999). REF_JITTER1 => 0.0, REF_JITTER2 => 0.0, STARTUP_WAIT => FALSE, -- Delays DONE until MMCM is locked (FALSE, TRUE) -- Spread Spectrum: Spread Spectrum Attributes SS_EN => "FALSE", -- Enables spread spectrum (FALSE, TRUE) SS_MODE => "CENTER_HIGH", -- CENTER_HIGH, CENTER_LOW, DOWN_HIGH, DOWN_LOW SS_MOD_PERIOD => 10000, -- Spread spectrum modulation period (ns) (VALUES) -- USE_FINE_PS: Fine phase shift enable (TRUE/FALSE) CLKFBOUT_USE_FINE_PS => FALSE, CLKOUT0_USE_FINE_PS => FALSE, CLKOUT1_USE_FINE_PS => FALSE, CLKOUT2_USE_FINE_PS => FALSE, CLKOUT3_USE_FINE_PS => FALSE, CLKOUT4_USE_FINE_PS => FALSE, CLKOUT5_USE_FINE_PS => FALSE, CLKOUT6_USE_FINE_PS => FALSE ) port map ( -- Clock Outputs: 1-bit (each) output: User configurable clock outputs CLKOUT0 => clk_switched, CLKOUT0B => open, CLKOUT1 => open, CLKOUT1B => open, CLKOUT2 => open, CLKOUT2B => open, CLKOUT3 => open, CLKOUT3B => open, CLKOUT4 => open, CLKOUT5 => open, CLKOUT6 => open, -- Dynamic Phase Shift Ports: 1-bit (each) output: Ports used for dynamic phase shifting of the outputs PSDONE => open, -- Feedback Clocks: 1-bit (each) output: Clock feedback ports CLKFBOUT => clk_fb, CLKFBOUTB => open, -- Status Ports: 1-bit (each) output: MMCM status ports CLKFBSTOPPED => open, CLKINSTOPPED => open, LOCKED => open, -- Clock Inputs: 1-bit (each) input: Clock inputs CLKIN1 => clk_100, CLKIN2 => '0', -- Control Ports: 1-bit (each) input: MMCM control ports CLKINSEL => '1', PWRDWN => '0', -- 1-bit input: Power-down RST => rst, -- 1-bit input: Reset -- DRP Ports: 16-bit (each) output: Dynamic reconfiguration ports DCLK => clk_100, -- 1-bit input: DRP clock DO => DO, -- 16-bit output: DRP data DRDY => DRDY, -- 1-bit output: DRP ready -- DRP Ports: 7-bit (each) input: Dynamic reconfiguration ports DADDR => DADDR, -- 7-bit input: DRP address DEN => DEN, -- 1-bit input: DRP enable DI => DI, -- 16-bit input: DRP data DWE => DWE, -- 1-bit input: DRP write enable -- Dynamic Phase Shift Ports: 1-bit (each) input: Ports used for dynamic phase shifting of the outputs PSCLK => '0', PSEN => '0', PSINCDEC => '0', -- Feedback Clocks: 1-bit (each) input: Clock feedback ports CLKFBIN => clk_fb ); speed_change_fsm: process(clk_100) begin if rising_edge(clk_100) then di <= (others => '0'); dwe <= '0'; den <= '0'; case state is when state_idle_fast => if speed_select = '1'then state <= state_go_slow_1; -- High 10 Low 10 di <= "0001" & "001010" & "001010"; dwe <= '1'; den <= '1'; end if; when state_go_slow_1 => if drdy = '1' then state <= state_go_slow_2; end if; when state_go_slow_2 => rst <= '1'; state <= state_go_slow_3; when state_go_slow_3 => rst <= '0'; state <= state_idle_slow; when state_idle_slow => di <= (others => '0'); if speed_select = '0' and drdy = '0' then state <= state_go_fast_1; -- High 5 Low 5 di <= "0001" & "000101" & "000101"; dwe <= '1'; den <= '1'; end if; when state_go_fast_1 => if drdy = '1' then state <= state_go_fast_2; end if; when state_go_fast_2 => rst <= '1'; state <= state_go_fast_3; when state_go_fast_3 => rst <= '0'; state <= state_idle_fast; end case; end if; end process; dbounce_proc: process(clk_100) begin if rising_edge(clk_100) then if speed_select = btn then debounce <= (others => '0'); elsif debounce(debounce'high) = '1' then speed_select <= not speed_select; else debounce <= debounce + 1; end if; -- Syncronise the button btn <= btn_meta; btn_meta <= btn_raw; end if; end process; show_speed_proc: process(clk_switched) begin if rising_edge(clk_switched) then counter <= counter + 1; led(7 downto 0) <= std_logic_vector(counter(counter'high downto counter'high-7)); end if; end process; led(15) <= speed_select; end Behavioral;