  1. Audio processing

    It isn't ideal. This is a common case of half-generalized code. The declared type for the coefficients is generic, but in practice it is restricted to a specific width. My guess is that the original design had known sizes for coefs/inputs but also made them generics for "good practice". Many practical designs are never tested outside of the original use case. That the filter clearly uses signed values as "std_logic_vector" is also concerning. Also, the values seem to be scaled in a sub-optimal manner -- none are close to max magnitude. It is as if someone assumed intermediate stages had to have the same bit width as the output and also didn't know that FPGAs have had 18-bit multipliers for over a decade. In both Verilog and VHDL the coefs should either be loaded from a file or be loaded dynamically after the FPGA is programmed. (Also, this is clearly a symmetric FIR filter where a copy-paste error made it asymmetric, and it then doesn't use the optimal symmetric implementation. This is also probably not ideal for audio, as attack/decay is possibly a bigger concern than the time-independent spectrum.) --edit: This comment might have been harsh. It might be that the original poster is also the person I criticized here. In that case I would have been critical of the person asking the community for help. That was not my intent, as my post was intended to help that person solve their issues.
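
    To illustrate the scaling point, here is a minimal Python sketch of quantizing coefficients so that the largest one uses the full signed 18-bit range. The tap values and the name quantize_coefficients are made up for illustration; they are not the coefficients from the thread.

```python
def quantize_coefficients(coefs, width=18):
    """Scale floats so the largest magnitude lands on the signed full-scale value."""
    full_scale = (1 << (width - 1)) - 1       # 131071 for an 18-bit signed word
    peak = max(abs(c) for c in coefs)
    scale = full_scale / peak
    return [round(c * scale) for c in coefs], scale


# Hypothetical symmetric low-pass taps, for illustration only.
taps = [0.02, 0.11, 0.24, 0.30, 0.24, 0.11, 0.02]
q, scale = quantize_coefficients(taps)
```

    The returned scale is an arbitrary gain that has to be removed after the accumulator. Rounding the scale down to a power of two would make that step a plain shift, at the cost of up to one bit of coefficient precision.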
  2. Non-clocked synchronous circuits

    Using posedge on fabric logic can have a few issues. If the fabric-generated clock doesn't come directly from a register -- e.g. if you have c = ( a == b ) -- then you can generate glitches. As new values of a and b propagate through the logic, the condition might be met one or more times within a normal cycle. This can generate short pulses which might trigger some registers but not others. This is also true for async set/reset logic. When a fabric-generated clock comes from a register or otherwise doesn't have glitches, the clock might be ok to use. There are still some issues. First, this design style is more prone to generating a larger number of clocks, which might exhaust the clock routing for a given clock region. Second, the clock will have routing delays that change from build to build as well as over temperature. This means the clock must be treated as asynchronous to other clocks in the design. These are not insurmountable issues -- you can create directed routing constraints (DiRt) to ensure the same routing is used each build, and you can ensure safe clock-domain-crossing logic. However, this requires extra effort in design/sim/constraints. This is another issue -- the fabric clocks appear easier to use. Add to this that they often work fine, and they teach novices bad practices. The fabric-generated clocks can also have additional jitter, duty-cycle distortion, etc. This generally isn't an issue as these clocks tend to be run at Fmax/10 or lower. For the original post, the synthesis tool generally is allowed to optimize the circuit. It is possible the tools will decide to share adder logic or other logic when they can detect mutual exclusion. The tools might opt to place the majority of the ALU into a DSP48 slice, for example.
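
    The glitch on c = ( a == b ) can be sketched with a simple skew model in Python. It assumes each changing bit of a may arrive at a different time and ignores everything else real hardware does (LUT internals, pulse filtering), so treat it as an illustration only. The function names are hypothetical.

```python
from itertools import permutations


def transient_values(old, new, width):
    """Values a bus can briefly show while changing old -> new under bit skew."""
    changing = [i for i in range(width) if ((old ^ new) >> i) & 1]
    seen = set()
    for order in permutations(changing):
        value = old
        for bit in order[:-1]:      # the final flip lands on `new`, which is not transient
            value ^= 1 << bit
            seen.add(value)
    return seen


def can_glitch(a_old, a_new, b, width):
    """True if (a == b) is 0 before and after the change but may pulse to 1 in between."""
    stable_low = a_old != b and a_new != b
    return stable_low and b in transient_values(a_old, a_new, width)
```

    For example, can_glitch(0b01, 0b10, 0b11, 2) is True because a can pass through 0b11 on the way. A single-bit change such as 0b00 to 0b01 has no transient value in this model.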
  3. Difference between BRAM, DRAM and DMA

    BRAM is "block RAM": a small, fast, internal memory that can be accessed every cycle. DRAM is an external RAM that is large, but has some overhead issues and also sends data back over multiple cycles. DMA is a scheme where a CPU can ask the memory controller to move data between DRAM and another device with a short command. E.g., if you need to send 1 kB of data to a network card, the CPU issues only a few commands instead of manually reading/writing every byte.
  4. Stupid Q: What does a loop in a HDL really mean?

    A shift register actually has fairly simple representations in both VHDL and Verilog. In VHDL: sr <= sr(sr'length-2 downto 0) & something; In Verilog: sr <= {sr[`SRLEN-2:0], something}; Depending on how optimized the simulation/synthesis tool is, there could be a difference in performance between this and a loop-based version.
  5. Stupid Q: What does a loop in a HDL really mean?

    Loops can be used to describe HW. Normally this isn't done to describe complex HW though. Examples of loops that can be useful in VHDL are: parity (xor-reduce), any (or-reduce), none (not or-reduce), all (and-reduce), bit-reverse, bit-count (maybe), GF(2) inner product, etc. More complex logic like long division or shift-add multiplication can infer much more logic. Very complex logic like a sorting algorithm would result in long synthesis times and likely a large, slow design.
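
    The simple reductions listed above can be sketched in Python as follows. Each loop has a fixed bound, which is what lets a synthesis tool unroll the HDL equivalent into a chain or tree of gates. The function names are illustrative, and bits are modelled as lists of 0/1 integers.

```python
def parity(bits):
    """xor-reduce: 1 if an odd number of bits are set."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc


def any_set(bits):
    """or-reduce; 'none' is simply the inverse of this."""
    acc = 0
    for b in bits:
        acc |= b
    return acc


def all_set(bits):
    """and-reduce."""
    acc = 1
    for b in bits:
        acc &= b
    return acc


def bit_reverse(bits):
    """Pure wiring in hardware: output i takes input n-1-i."""
    n = len(bits)
    return [bits[n - 1 - i] for i in range(n)]


def bit_count(bits):
    """Infers adders rather than single gates, so it costs more than the others."""
    total = 0
    for b in bits:
        total += b
    return total


def gf2_inner_product(a, b):
    """and each pair of bits, then xor-reduce the results."""
    acc = 0
    for x, y in zip(a, b):
        acc ^= x & y
    return acc
```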
  6. One-hot encoding mystery

    VHDL uses the sensitivity list as a hint for the simulator. VHDL-2008 adds a wildcard, process(all), for the sensitivity list. Verilog is a bit different as it uses the sensitivity list to determine if something infers clocked logic and if there is an async set/reset. Verilog-2001 added @(*) for combinatorial always blocks. SystemVerilog adds always_comb and always_ff to ensure the inferred logic matches the intended type. @Trickstart: In your post, the "diagram" process is not sensitive to the I input in simulation. If the value of I changes and the value of state does not change, the value of next_state will not change -- in simulation. The synthesis tools will ignore the sensitivity list and issue a "simulation mismatch" warning. I generally search synthesis logs for "simulation mismatch" as a result. There is a slightly newer/better template that places the reset condition at the end of the process, either inside the if (rising_edge(clk)) statement, or after it for async resets. This allows the data path logic -- which might not need a reset -- to be in the same process as the control logic that does need a reset. The issue with the traditional approach appears when signals are left out of the reset case: reset then becomes an input to the logic driving those registers. FPGA vendors often advise resetting only control logic as well, which can lead to developers placing related logic in several different processes. (This is similar to an older design style where every register was in its own process -- as if it were a schematic element.) The three-process style is not common anymore, nor are two-process state machines. More devs do what @D@n describes, as the single-process style is less verbose, needs less micro-management, and also simulates faster. IMO, this is a failing of both VHDL and Verilog, as the single-process FSM is more prone to major logic errors -- things that can only be caught with really good simulation. 
2-3 process versions are much more likely to produce logic errors that also generate synthesis warnings. The single-process style also has issues when you need combinatorial logic to be based on FSM transitions on the same cycle. This occurs in some cases when a process interacts with a process in another module, mostly for FIFO empty/full. There are actually a lot of nifty ways to represent FSMs and to write them. The main issue is checking to see if the tools actually do what you want. In terms of encodings, one-hot is just one case. Really, no encoding type is the best in general, and tools don't normally support anything but the basics. Four-hot, for example, might make more sense for an FPGA that uses LUTs, as you can have a small encoding with efficient, but not instant, decoding. Likewise, a factored approach with a "state-group" selector and a "state within state group" might be better. Even a 1-hot + state group (now more state bits than states) could lead to higher performance when lots of logic depends on being in any of a group of states. --edit: for the 1-hot. The logic is "if input is 0, transition to state 0 else if state isn't state0 transition to 2 else transition to 1." In this case, you know that being in state2 is equivalent to not being in state0 and not being in state1.
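
The one-hot logic in the edit above can be modelled in a few lines of Python. This is a sketch that assumes three states with one bit each and only legal one-hot values; the constant and function names are illustrative, not taken from the thread's code.

```python
# One bit per state (one-hot). Only these three values are legal.
S0, S1, S2 = 0b001, 0b010, 0b100


def next_state(state, i):
    """'if input is 0 go to state0, else if not in state0 go to state2, else go to state1'."""
    if i == 0:
        return S0
    if not (state & S0):     # with one-hot, "not in state0" is a single-bit test
        return S2
    return S1


def in_state2(state):
    """For legal one-hot values, state2 is implied by the other two bits being clear."""
    return not (state & S0) and not (state & S1)
```

Walking the machine with the input held at 1 gives S0, S1, S2, S2, and any 0 on the input returns it to S0.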
  7. Audio processing

    There are three phases to this design. Phase 1 -- get the mic input to work at all. Phase 2 -- develop the DSP algorithm for the FPGA. Phase 3 -- implement and test. One thing you can try is getting the mic input into a file that you can use for testing DSP algorithms. It is better to test DSP on a computer first, as the iteration time is much lower. You can also try DSP algorithms on voice recordings from the internet to get a basic idea of what the processing might sound like. Finally, implementation. For this design, I suggest using a DSP48, a small RAM, and a small FSM. FIR filters are nicer, as the fixed-point models are predictable and fairly easy. IIR filters are probably more applicable, but fixed-point models could have more issues. In either case though, you are looking at generating coefficients in MATLAB/Octave or something like that.
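
    As a starting point for the computer-side testing, here is a minimal fixed-point FIR model in Python. It is a sketch under assumptions: integer samples, coefficients already quantized with frac_bits fractional bits, and one multiply-accumulate per inner-loop step, roughly what a single DSP48 plus a small sample RAM and FSM would do. The name fir_fixed and the example values are made up.

```python
def fir_fixed(samples, coefs_q, frac_bits):
    """Integer FIR: one output per input, one multiply-accumulate per inner step."""
    history = [0] * len(coefs_q)          # stands in for the small sample RAM
    out = []
    for x in samples:
        history = [x] + history[:-1]      # newest sample first
        acc = 0                           # wide accumulator (48 bits on a DSP48 slice)
        for c, h in zip(coefs_q, history):
            acc += c * h
        out.append(acc >> frac_bits)      # drop the fractional bits (rounds toward -inf)
    return out


# Taps 0.25, 0.5, 0.25 with 16 fractional bits; impulse of height 1000.
coefs_q = [16384, 32768, 16384]
impulse_response = fir_fixed([1000, 0, 0, 0], coefs_q, 16)
```

    Because everything is integer arithmetic, the same input file should produce bit-identical results in an RTL simulation, which makes the model usable as a reference for the testbench.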
  8. FPGA based PWM generation

    This is not correct. For the ramp case it mostly works. The ramp is a test signal that minimizes the importance of the signed/unsigned input differentiation. Whether to invert the msb depends on whether the input is signed or unsigned. A test signal that alternates between +1 and -1 would show this best. After all, this would be 0x0001 and 0xFFFF if the msb is not inverted vs 0x8001 and 0x7FFF if it is. This is ~0% and ~100% vs ~50% and ~50%. Is this in comparison to a traditional PWM system or a traditional PDM system?
  9. FPGA based PWM generation

    I'm not entirely certain what your argument is. The PDM waveforms can be different when the 15 lsbs of the ramp's first half do not match the 15 lsbs of the ramp's second half. In that case, the values provided to the comparator will be different. The analog waveform should look similar, except for any DC offset (which is not possible with the AC-coupled input) or any phase shift (which is not visible without an external reference). --Added: I will answer the question I think is being discussed w.r.t. encoding. At a minimum, an input value (A) that is considered higher in value than a second input (B) should result in a higher density of the PDM output: sumOfOnesPerPeriod(A) > sumOfOnesPerPeriod(B). This is true for all encodings -- unsigned, signed 2's complement, signed 1's complement, zig-zag, etc. For unsigned, this property is automatic. For signed 2's complement, this becomes a simple operation of inverting the msb. For 1's complement this is a bit more difficult because 1's complement addition has feedback from the msb to the lsb. For zig-zag this would be (i >>> 1) ^ -(i & 1) ^ 0x8000.
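
    The 2's complement and zig-zag conversions above can be checked with a short Python model. It assumes 16-bit words and the usual zig-zag mapping (0, -1, 1, -2, ... encoded as 0, 1, 2, 3, ...). The helper names are illustrative, and 1's complement is left out.

```python
MASK = 0xFFFF


def twos_complement_word(v):
    """16-bit 2's complement encoding of a signed value."""
    return v & MASK


def zigzag_word(v):
    """16-bit zig-zag encoding: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return ((v << 1) ^ (v >> 15)) & MASK


def offset_from_twos_complement(word):
    """Invert the msb."""
    return (word ^ 0x8000) & MASK


def offset_from_zigzag(word):
    """(i >>> 1) ^ -(i & 1) ^ 0x8000, truncated to 16 bits."""
    return ((word >> 1) ^ -(word & 1) ^ 0x8000) & MASK
```

    Both conversions give value + 32768 for every signed 16-bit value, so a larger input always presents a larger number to the comparator.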
  10. FPGA based PWM generation

    First, VHDL doesn't define "+" for std_logic_vector. Second, VHDL (for numeric_std and std_logic_arith) defines size(A+B) = max(size(A), size(B)). "c<=a+b" would not be a synthesis error. This would be a logical error. Verilog is similar, but includes the LHS in the size calculation as well. Last, mod is defined differently for VHDL vs Verilog when negative numbers are used. That point is just a note. I haven't really looked at the FPGA implementation because it doesn't answer any interesting questions. It does not include the reconstruction filter and does not provide any estimate of the output spectrum. It also does not include the effects of finite rise/fall times or other driver imperfections.
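
    Both points can be illustrated with a small Python model. The first function mimics a sum that is kept at the operand width. The other two contrast the two remainder conventions: VHDL mod takes the sign of the right operand, while Verilog % (like VHDL rem) takes the sign of the left one. The function names are made up for this sketch.

```python
def add_fixed_width(a, b, width):
    """Sum truncated to `width` bits: the carry out is silently lost."""
    return (a + b) & ((1 << width) - 1)


def mod_sign_of_divisor(a, b):
    """Result takes the sign of b, like VHDL `mod` (and Python `%`)."""
    return a % b


def mod_sign_of_dividend(a, b):
    """Result takes the sign of a, like Verilog `%` and VHDL `rem`."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r
```

    For example, add_fixed_width(200, 100, 8) gives 44 rather than 300, and for -7 and 4 the two conventions give 1 and -3 respectively.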
  11. FPGA based PWM generation

    You can also pick up ~3 bits using OSERDES and maybe 2 bits from a faster I/O clock.
  12. Do you have an example of a VHDL digital filter to filter noise?

    What does the entire system look like? DSP applications are much easier to implement on a computer. If the ADC is low-rate data, a CPU/GPU implementation might be much easier to do and also has the advantage of using floating-point math. If the ADC is multichannel, you might consider adding a second ADC input that can be used as a noise reference. This can be used with an adaptive filter to remove the noise. If the sampling rate is fairly low, this could be done on the FPGA with normal adaptive filtering methods. The concept is that the spectral content of the noise reference is correlated with the noise in the main channel. If the data is post-processed, you can also remove the noise from a single recording under some assumptions. If the FPGA is to be used and the sampling rate is low, it still makes sense to serialize the operations for the filter to reduce resources. This can be done in different ways. The DSP48 slices and BRAMs can be used together to create a mini-DSP that executes a simple set of instructions. This is actually easier than it sounds, as there doesn't need to be any branching or anything else that is difficult in a normal CPU. Software pipelining is also easy to do. This is easier to do in VHDL, or with a script that generates the RTL for this application.
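
    The noise-reference idea can be tried on a computer first. Below is a minimal LMS noise canceller in floating-point Python. LMS is one common adaptive filtering method, picked here as an assumption since the post doesn't name one. The test signal, the leak path, the tap count and the step size mu are all made up for illustration.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Subtract the part of `primary` that is predictable from `reference`."""
    w = [0.0] * taps
    hist = [0.0] * taps
    cleaned = []
    for d, x in zip(primary, reference):
        hist = [x] + hist[:-1]
        y = sum(wi * hi for wi, hi in zip(w, hist))        # estimate of the noise in d
        e = d - y                                          # cleaned output sample
        w = [wi + mu * e * hi for wi, hi in zip(w, hist)]  # LMS weight update
        cleaned.append(e)
    return cleaned, w


random.seed(0)
N = 4000
signal = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical coupling path from the noise source into the main channel.
leak = [0.6 * noise[n] - 0.3 * (noise[n - 1] if n else 0.0) for n in range(N)]
primary = [s + l for s, l in zip(signal, leak)]
cleaned, weights = lms_cancel(primary, noise)
```

    After convergence the weights should approximate the leak path and cleaned should track signal closely. A fixed-point FPGA version would need its own word-length study for the weights and the step size.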
  13. Using the Rj45 and Ethernet controller

    I don't think you can access the PS* (ARM) pins from the PL (FPGA) section of the SoC. You likely need to have ethernet connected to the ARM.
  14. FPGA based PWM generation

    The system is AC coupled, so no DC output. You can directly use a 16b unsigned value if you consider yourself to have a 16b unsigned input. The DC component will be removed. If you consider yourself to have a 16b signed 2's complement input, you would need to invert the msb. Otherwise -1 would map to ~100% duty ratio, and -32k would map to 50%. (0 would map to 0% and +32k would map to ~50%.) Inverting the msb has the effect of converting the 16b signed value into a 16b unsigned value that is equal to x + 32*1024. -1 maps to ~50%, -32k maps to 0%, 0 maps to 50% and 32k maps to ~100%. This design maximizes the number of transitions. Each transition has a cost in terms of parasitics in the circuit. The goal of maximizing transitions is to move more energy to higher frequencies that the RCRC filter can reject more easily -- the goal being to reduce ripple within a PWM period. There are other options in terms of arranging the bits. You can also get 2x rate or 4x rate PWM signals, for example.
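
    The duty ratios quoted above can be reproduced with a few lines of Python. This sketch assumes the duty ratio is simply the 16-bit word divided by 65536; the function names are illustrative.

```python
def duty(word16):
    """Duty ratio if the 16-bit word is treated as unsigned."""
    return (word16 & 0xFFFF) / 65536


def signed_to_offset(x):
    """Invert the msb of a 16-bit 2's complement value: equals x + 32768."""
    return (x & 0xFFFF) ^ 0x8000


# (value, duty without inversion, duty with the msb inverted)
table = [(x, duty(x), duty(signed_to_offset(x))) for x in (-32768, -1, 0, 32767)]
```

    Without the inversion -1 lands just under 100% and 0 lands at 0%. With it the four values run monotonically from 0% to just under 100%.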
  15. FPGA based PWM generation

    Can you be more specific? For example, differences in comparisons can affect the design. IIRC the original blog post used "<=" and "<" in two different snippets. The difference between comparisons is either a small offset or a phase inversion or both.
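
    The effect of the comparison operator can be counted directly. This Python sketch assumes a plain PWM built from a free-running counter and a comparator, with an 8-bit width to keep it short. It is not the blog's code, which isn't reproduced here.

```python
import operator


def ones_per_period(x, compare, width=8):
    """Number of counter values in one period for which compare(counter, x) is true."""
    return sum(1 for count in range(1 << width) if compare(count, x))


lt = ones_per_period(100, operator.lt)   # output high while counter <  x
le = ones_per_period(100, operator.le)   # output high while counter <= x
gt = ones_per_period(100, operator.gt)   # output high while counter >  x
ge = ones_per_period(100, operator.ge)   # output high while counter >= x
```

    With x = 100 these give 100, 101, 155 and 156 ones per 256-count period: "<=" adds one count compared to "<", and the ">" forms invert the output, with or without the extra count.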