xc6lx45

Members
  • Content Count: 740
Reputation Activity

  1. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler 7.2 stopband   
    I think you should go back to the basics. Set up a simple FIR filter, e.g. with the impulse response [1 2 3 4 5 6 7], and get your simulation to the point where an input of [... 0, 1, 0, ... 0] gives something that resembles [1 2 3 4 5 6 7] at the output. In your simulation this is not the case (the filter has 123 different coefficients, yet your simulation shows only a single output value).
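    If it helps, here is a completely self-contained sanity check along those lines. This is a plain behavioral FIR written only for illustration (it is not the FIR Compiler core, and all names are made up):
    `timescale 1ns/1ps
    module tb_fir_impulse;
       // 7-tap FIR with impulse response [1 2 3 4 5 6 7]; a single-sample impulse at the
       // input should simply replay the coefficients at the output, one per clock.
       reg               clk = 0;
       reg signed [15:0] x = 0;
       reg signed [15:0] taps [0:6];
       reg signed [15:0] dly  [0:6];
       reg signed [31:0] y;
       integer i, k, n;

       always #5 clk = ~clk;   // 100 MHz

       initial begin
          for (i = 0; i < 7; i = i + 1) begin
             taps[i] = i + 1;  // coefficients [1 2 3 4 5 6 7]
             dly[i]  = 0;
          end
          for (n = 0; n < 20; n = n + 1) begin
             @(posedge clk);
             x <= (n == 0) ? 16'sd1 : 16'sd0;   // ... 0, 1, 0, 0, ...
          end
          $finish;
       end

       always @(posedge clk) begin
          // delay line of input samples
          for (k = 6; k > 0; k = k - 1)
             dly[k] <= dly[k-1];
          dly[0] <= x;
          // direct-form sum of products (uses the pre-update delay line)
          y = 0;
          for (k = 0; k < 7; k = k + 1)
             y = y + dly[k] * taps[k];
          if (y != 0)
             $display("y = %0d", y);   // expect 1 2 3 4 5 6 7 on consecutive clocks
       end
    endmodule
    Once something like this behaves as expected, the same impulse stimulus can be pointed at the FIR Compiler instance and the two outputs compared.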
  2. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler 7.2 stopband   
    yes, it's the highest positive 16-bit signed number, 32767 (0x8000 is the smallest negative number, -32768).
    You could also consider 0x4000, which is a single set bit; it makes your coefficients easier to recognize in the output (because multiplying by it is just a bit shift).
  3. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler 7.2 stopband   
    true but it's a five-line job e.g. in Verilog
    reg [15:0] counter = 0;
    reg [15:0] impulse = 0;

    always @(posedge clk) begin
       counter <= counter + 1;
       impulse <= (counter == 0) ? 16'h7FFF : 0;
    end
    plus the protocol interface (e.g. trigger a new valid sample if counter[7:0] == 0)
  4. Like
    xc6lx45 got a reaction from SmashedTransistors in FIR compiler 7.2 stopband   
    ... and how about a simple impulse response test (feed a stream of zeroes with an occasional 1 and check that the filter coefficients appear at the output).
    Just wondering, isn't there a "ready / valid" interface also at the output if you expand the port with "+"?
  5. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler 7.2 stopband   
    ... and how about a simple impulse response test (feed a stream of zeroes with an occasional 1 and check that the filter coefficients appear at the output).
    Just wondering, isn't there a "ready / valid" interface also at the output if you expand the port with "+"?
  6. Like
    xc6lx45 got a reaction from SmashedTransistors in Increasing the clock frequency to 260 MHz   
    Hi,
    reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff"
    - Before anything else, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this common beginner mistake) is that the design tries to sample them at the high clock rate, which creates a near-impossible timing problem. Don't move on before this is understood, fixed and verified.
    - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens.
    - Once the obvious problems are fixed, identify the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be done systematically by inserting delay registers. The output is generated e.g. one clock cycle later, but the design can run at a higher clock, so overall performance improves.
    - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good.
    - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 failing paths). It can lead you down a "rabbit hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance).
    - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important)
    - There are a few tricks, like forcing redundant registers to remain separate. For example, say I have a dozen identical blocks that run on a common fast system clock and are critical to timing. Step 1: I register each block's 32-bit input locally to relax timing. Step 2: I declare these registers as DONT_TOUCH, because the tools would otherwise notice they are logically equivalent and merge them into one shared instance (a minimal sketch follows below).
    - For BRAMs and DSP blocks, check the documentation for where extra registers are needed (they get absorbed into the BRAM or DSP as dedicated hardware registers). This is the only way to reach the device's specified memory or DSP performance.
    - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
    - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing).
    - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design.
    Oops, this almost qualifies as a "long" answer ...
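    To make the "keep redundant registers separate" point concrete, here is a minimal sketch (module and signal names are invented; DONT_TOUCH is the Vivado RTL attribute):
    // Sketch only: each of the identical blocks re-registers the shared 32-bit input
    // locally. DONT_TOUCH keeps the per-block copies from being merged back into one
    // shared register during optimization, which would re-create the long routes.
    module block_with_local_reg (
       input  wire        clk,
       input  wire [31:0] data_in,   // same bus feeds all the identical blocks
       output reg  [31:0] result
    );
       (* DONT_TOUCH = "true" *) reg [31:0] data_local = 32'd0;

       always @(posedge clk) begin
          data_local <= data_in;          // extra register stage at the block boundary
          result     <= data_local + 1;   // placeholder for the block's real logic
       end
    endmodule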
     
     
  7. Like
    xc6lx45 got a reaction from SmashedTransistors in Beginner DSP Projects   
    Well, if you want my opinion, DSP on FPGA is a fairly specialized niche application. It's a long walk to come up with a project that really fits into that niche and justifies an FPGA (rather, pair a $0.50 FPGA for programmable IO with one or more high-end DSPs for the number crunching if someone claims "we need an FPGA").
    For studying, it can be "interesting" in the sense that you get to know quite a few dragons on a first-name basis. But then, is it productive to spend weeks on fixed-point math when everybody else uses floats on a DSP / CPU and "time-to-market" is the #1 priority? Maybe not. DSP is more fun in Matlab (Octave). And there is no point in using an FPGA for performance unless you have exhausted the options at algorithm level (again, there are exceptions, e.g. well-defined brute-force filtering problems).
    A lot of the online material is "sponsored" by companies that sell FPGA silicon by the square meter (Yessir. We have Floats!). But this is largely for the desperate and ill-informed (of course, there are viable use cases - say high-volume basestations, or automotive with end-of-life commitments a decade or two out. As said, a niche application).
    When you take the direct route, you'll run into a question like "how on earth could I implement an audio mixing console when the FPGA has only 96 multipliers". Challenge me or anybody who has read some books and you'll find it can be done on a single multiplier (say, 100 MHz at a 96 kHz sample rate gives about 1040 multiplier cycles per sample, or roughly 86 per sample per channel for 12 channels. It's just an example; in reality I'd use a few multipliers, with "maintainability" of the code my major concern). The point is, the skill ceiling is fairly high but so is the design effort. It only makes sense if I plan to sell at least a hundred gazillion devices.
    On the other hand, if you separate DSP and FPGA, you'll find that a lot of the Matlab (Octave) magic maps 1:1 to real life on any modern CPU platform by pulling e.g. the "Eigen" library into your C++ code.
     
     
  8. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler Amplitude   
    I must admit I didn't read the manual of that tool... but "Coefficient Fractional Bits" should do the required scaling (my guess: "20" is the correct value, given your 120 dB peak gain).
    I'd try to re-run the wizard with fresh default settings to get rid of the error.
    Have you checked your input file - are there very large numbers (such as 32-bit-ish, like 2e9)? It may be that the tool uses 32 bits internally and needs some headroom (guessing...); try to export to a lower range. For example, sign bit + 1 integer bit + 16 fractional bits (18 total) would seem reasonable.
  9. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler Amplitude   
    Well, I'm not familiar with this particular tool. But at a quick glance, I think you need to set "Quantization" to e.g. "quantize only". Then the field "Coefficient fractional bits" becomes enabled.
    If it doesn't fall into place easily, start with a simple example, e.g. import a 0 0 256 0 0 FIR with 8 fractional bits and you should see 0 dB across the frequency response.
     
    Hint: for an order-of-magnitude cross-check you can get the DC gain of a FIR filter by summing its coefficients. In your example, this should be around 0.0001 for ~ -80 dB @ 0 Hz
    And PS: one possible explanation (this is a long shot) is that you designed for 12-bit coefficients and exported in 32 bits => +20 bits.
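    For the record, the two rules of thumb above written out (plain arithmetic, nothing tool-specific):
    \[ H(0) = \sum_k h_k \quad\text{(DC gain = sum of the coefficients)}, \qquad 20\log_{10}(2^{n}) \approx 6.02\,n\ \text{dB}, \;\text{so } n = 20 \text{ extra bits} \approx 120.4\ \text{dB}. \]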
  10. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in FIR compiler Amplitude   
    My first guess is that the tool needs to know the position of the binary point in your number format. It's off by 20 bits (a factor of 2^20 = 1048576, i.e. about 120 dB).
    Fixed point knows only integers, so it's a matter of interpretation.
  11. Like
    xc6lx45 got a reaction from OvidiuD in FTDI chip not recognized anymore   
    I think it's Linux...
    Try the "dmesg" command immediately after plugging or unplugging. It should show some related events.
    And the obvious: try a different computer and a different cable. Cables in particular fail often.
  12. Like
    xc6lx45 got a reaction from hearos in FTDI chip not recognized anymore   
    I think it's Linux...
    Try the "dmesg" command immediately after plugging or unplugging. It should show some related events.
    And the obvious: try a different computer and a different cable. Cables in particular fail often.
  13. Like
    xc6lx45 got a reaction from [email protected] in FTDI chip not recognized anymore   
    I think it's Linux...
    Try the "dmesg" command immediately after plugging or unplugging. It should show some related events.
    And the obvious: try a different computer and a different cable. Cables in particular fail often.
  14. Like
    xc6lx45 got a reaction from JColvin in Is C or C++ faster?   
    You should maybe give more background. Otherwise the answers will be only as good as your question.
    C is faster because...
    C++ is faster because...
  15. Like
    xc6lx45 got a reaction from jpeyron in busbridge3: High-speed FTDI/FPGA interface   
    Hi,
    a quick update: I released a new version 1.1 that supports variable address width in the protocol.
    Functionality is unchanged, but performance will improve for scattered writes and reads in the lower address range: the 0x000000xx range saves 3 bytes per transaction, 0x0000xxxx saves 2 bytes, and 0x00xxxxxx saves 1 byte.
    There was also a missing -datapath_only in the constraints, which made the timing report hard to read (the intention behind the set_max_delay constraint was simply "tool, don't make this path between clock domains any slower than x ns end-to-end").
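    For reference, such a constraint looks roughly like this in XDC (the clock names and the delay value are placeholders, not the actual busbridge3 constraint):
    set_max_delay -datapath_only -from [get_clocks clk_ftdi] -to [get_clocks clk_user] 10.000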
  16. Like
    xc6lx45 got a reaction from jpeyron in fpga kit   
    >> is this can be used for controlling of power switches like MOSFET ,IGBT?
    "yes but" Electronics 101 applies (look at the data sheet of your component: E.g. how much Vds remains with Vgs=3.3 V given the switched current?)
    You might want to use some intermediate driver circuit for the simple reason of protecting the FPGA against magic-smoke incidents in the power side. For e.g. a traffic light demo, a 10k series resistor to the gate will prevent the worst but this wrecks the switching speed for e.g. PWM.
     
  17. Like
    xc6lx45 got a reaction from ahmedengr.bilal in Car Detection using p5cam on Zybo z7 10   
    >> new to fpga and zybo
    >> i want to use open cv but i dont know where to start from
    Just thinking aloud: independently of making the hardware work, it might be a good idea to forget everything about hardware and video for a while. Spend some time with OpenCV and offline bitmap examples on a standard Linux machine, say a virtual Linux box or a Raspberry Pi. I can speak only for myself, but I'd rather fight my dragons one at a time, not all at once.
  18. Like
    xc6lx45 got a reaction from [email protected] in FPGA for network protocol conversion   
    Hi,
    I can tell you this much: estimating the required FPGA size is a challenging topic in general. Chances are high that an abstract analysis that's not based on experience will completely miss the point.
    For the PC, I think you are describing operating system overhead, not hardware limitations. Before I'd consider an FPGA, I'd have a look at a bare-metal implementation, which can probably be approximated by editing the network card driver in a Linux distribution (e.g. use one CPU core per inbound card and keep it in a spinlock waiting for data, to avoid the slow context switch etc. on interrupt). If your code otherwise runs from cache, a modern e.g. 5 GHz CPU is a force to be reckoned with.
  19. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in Increasing the clock frequency to 260 MHz   
    Maybe one comment: In the ASIC world, "floorplanning" is an essential design phase, where you slice and dice the predicted silicon area and give each design team their own little box. The blocks are designed and simulated independently, and come together only at a fairly late phase.
    ASIC differs from FPGA in some major ways:
    - ASIC IOs have a physical placement e.g. along the pad ring. We don't want to run sensitive signals across the chip, need to minimize coupling for mixed-signal etc. In comparison, FPGAs are probably more robust (a complex design will definitely consider the layout, especially on larger devices. But on smaller eval boards, the first restrictions I'll probably run into are logical e.g. which clock is available where, not geometrical).
    - For ASICs, we need the floorplan to design the power distribution network as its own sub-project (and many a bright-eyed startup has learned about electromigration the hard way).
    - In the ASIC world, we need to worry about wide and fast data paths both regarding power and area - transistors are tiny but metal wires are not.
    You might have a look at "partial reconfiguration", here the geometry of the layout plays some role.
     
  20. Like
    xc6lx45 got a reaction from [email protected] in Increasing the clock frequency to 260 MHz   
    Hi,
    reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff"
    - Before anything else, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this common beginner mistake) is that the design tries to sample them at the high clock rate, which creates a near-impossible timing problem. Don't move on before this is understood, fixed and verified.
    - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens.
    - Once the obvious problems are fixed, identify the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be done systematically by inserting delay registers. The output is generated e.g. one clock cycle later, but the design can run at a higher clock, so overall performance improves.
    - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good.
    - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 failing paths). It can lead you down a "rabbit hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance).
    - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important)
    - There are a few tricks, like forcing redundant registers to remain separate. For example, say I have a dozen identical blocks that run on a common fast system clock and are critical to timing. Step 1: I register each block's 32-bit input locally to relax timing. Step 2: I declare these registers as DONT_TOUCH, because the tools would otherwise notice they are logically equivalent and merge them into one shared instance.
    - For BRAMs and DSP blocks, check the documentation for where extra registers are needed (they get absorbed into the BRAM or DSP as dedicated hardware registers). This is the only way to reach the device's specified memory or DSP performance.
    - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
    - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing).
    - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design.
    Oops, this almost qualifies as a "long" answer ...
     
     
  21. Like
    xc6lx45 got a reaction from Ahmed Alfadhel in Increasing the clock frequency to 260 MHz   
    Hi,
    reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff"
    - Before anything else, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this common beginner mistake) is that the design tries to sample them at the high clock rate, which creates a near-impossible timing problem. Don't move on before this is understood, fixed and verified.
    - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens.
    - Once the obvious problems are fixed, identify the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be done systematically by inserting delay registers. The output is generated e.g. one clock cycle later, but the design can run at a higher clock, so overall performance improves.
    - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good.
    - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 failing paths). It can lead you down a "rabbit hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance).
    - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important)
    - There are a few tricks, like forcing redundant registers to remain separate. For example, say I have a dozen identical blocks that run on a common fast system clock and are critical to timing. Step 1: I register each block's 32-bit input locally to relax timing. Step 2: I declare these registers as DONT_TOUCH, because the tools would otherwise notice they are logically equivalent and merge them into one shared instance.
    - For BRAMs and DSP blocks, check the documentation for where extra registers are needed (they get absorbed into the BRAM or DSP as dedicated hardware registers). This is the only way to reach the device's specified memory or DSP performance.
    - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
    - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing).
    - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design.
    Oops, this almost qualifies as a "long" answer ...
     
     
  22. Like
    xc6lx45 got a reaction from jpeyron in power supply   
    a fuse?
  23. Like
    xc6lx45 got a reaction from charlieho in How to connect an external FIFO to FPGA   
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design.
    It's not a showroom piece, but I think it did work as expected (whatever that is...). For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for an 18-bit width.
    module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData,
                o_empty, o_full, o_error, o_nItems, o_nFree);
       parameter DATABITS = -1;
       parameter ADDRBITS = -1;
       localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}};
       localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1};
       localparam DATA_X = {{(DATABITS){1'bx}}};

       input wire i_clk;
       input wire i_push;
       input wire i_reset;
       input wire [DATABITS-1:0] i_pushData;
       input wire i_pop;
       output reg o_popAck = 1'b0;
       output wire [DATABITS-1:0] o_popData;
       output reg o_error = 1'b0;
       output wire [31:0] o_nItems;
       output wire [31:0] o_nFree;
       output wire o_empty;
       output wire o_full;

       reg popAckB = 1'b0;
       reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0];
       reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO;
       reg [ADDRBITS-1:0] popPtr = ADDR_ZERO;
       reg [DATABITS-1:0] readReg = DATA_X;
       reg [DATABITS-1:0] readRegB = DATA_X;
       wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr;
       wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr;
       assign o_popData = o_popAck ? readReg : DATA_X;

       // === items counter ===
       // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements)
       reg [ADDRBITS:0] nItems;
       assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems};
       assign o_nFree = (1 << ADDRBITS) - nItems;
       localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1};
       assign o_empty = nItems == 0;
       assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}};

       always @(posedge i_clk) begin
          // === preliminary assignments ===
          readRegB <= DATA_X;
          popAckB <= 1'b0;

          case ({i_push, i_pop})
            2'b10: nItems <= nItems + NITEMS_ONE;
            2'b01: nItems <= nItems - NITEMS_ONE;
            default: begin end
          endcase

          o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty);

          // === output register (delay 1) ===
          o_popAck <= popAckB;
          readReg <= readRegB;

          pushPtr <= nextPushPtr;
          popPtr <= nextPopPtr;

          if (i_push)
            mem[pushPtr] <= i_pushData;

          if (i_pop) begin
             readRegB <= mem[popPtr];
             popAckB <= 1'b1;
          end

          if (i_reset) begin
             pushPtr <= ADDR_ZERO;
             popPtr <= ADDR_ZERO;
             o_error <= 1'b0;
             o_popAck <= 1'b0;
             popAckB <= 1'b0;
             readReg <= DATA_X;
             readRegB <= DATA_X;
             nItems <= 0;
          end
       end
    endmodule
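    And a small instantiation sketch for the 131072 x 18 case mentioned above (the wrapper and its signal names are mine; only the FIFO ports and parameters come from the module itself):
    // Sketch only: parameterize the FIFO above for 2^17 = 131072 entries of 18 bits.
    module fifo_128k_x18 (
       input  wire        clk,
       input  wire        rst,
       input  wire        push,
       input  wire [17:0] pushData,
       input  wire        pop,
       output wire        popAck,
       output wire [17:0] popData,
       output wire        empty,
       output wire        full,
       output wire        err,
       output wire [31:0] nItems,
       output wire [31:0] nFree
    );
       FIFO #(.DATABITS(18), .ADDRBITS(17)) u_fifo (
          .i_clk(clk), .i_reset(rst),
          .i_push(push), .i_pushData(pushData),
          .i_pop(pop), .o_popAck(popAck), .o_popData(popData),
          .o_empty(empty), .o_full(full), .o_error(err),
          .o_nItems(nItems), .o_nFree(nFree)
       );
    endmodule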
  24. Like
    xc6lx45 got a reaction from charlieho in How to connect an external FIFO to FPGA   
    Yes, you can combine more than one block RAM.
    There is more than one way to implement a FIFO. If I had to do it for myself, I'd write it in plain Verilog; it's about two or three screens of code if the interface requirements are "clean" (such as a single clock, and the freedom to allow a few clock cycles of latency before a pushed value appears at the output).
    I didn't check but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need.
    With "expensive" I meant just that, it costs a lot of money to use half an FPGA just for memory.
  25. Like
    xc6lx45 got a reaction from jpeyron in How to connect an external FIFO to FPGA   
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design.
    It's not a showroom piece, but I think it did work as expected (whatever that is...). For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for an 18-bit width.
    module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData,
                o_empty, o_full, o_error, o_nItems, o_nFree);
       parameter DATABITS = -1;
       parameter ADDRBITS = -1;
       localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}};
       localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1};
       localparam DATA_X = {{(DATABITS){1'bx}}};

       input wire i_clk;
       input wire i_push;
       input wire i_reset;
       input wire [DATABITS-1:0] i_pushData;
       input wire i_pop;
       output reg o_popAck = 1'b0;
       output wire [DATABITS-1:0] o_popData;
       output reg o_error = 1'b0;
       output wire [31:0] o_nItems;
       output wire [31:0] o_nFree;
       output wire o_empty;
       output wire o_full;

       reg popAckB = 1'b0;
       reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0];
       reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO;
       reg [ADDRBITS-1:0] popPtr = ADDR_ZERO;
       reg [DATABITS-1:0] readReg = DATA_X;
       reg [DATABITS-1:0] readRegB = DATA_X;
       wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr;
       wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr;
       assign o_popData = o_popAck ? readReg : DATA_X;

       // === items counter ===
       // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements)
       reg [ADDRBITS:0] nItems;
       assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems};
       assign o_nFree = (1 << ADDRBITS) - nItems;
       localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1};
       assign o_empty = nItems == 0;
       assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}};

       always @(posedge i_clk) begin
          // === preliminary assignments ===
          readRegB <= DATA_X;
          popAckB <= 1'b0;

          case ({i_push, i_pop})
            2'b10: nItems <= nItems + NITEMS_ONE;
            2'b01: nItems <= nItems - NITEMS_ONE;
            default: begin end
          endcase

          o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty);

          // === output register (delay 1) ===
          o_popAck <= popAckB;
          readReg <= readRegB;

          pushPtr <= nextPushPtr;
          popPtr <= nextPopPtr;

          if (i_push)
            mem[pushPtr] <= i_pushData;

          if (i_pop) begin
             readRegB <= mem[popPtr];
             popAckB <= 1'b1;
          end

          if (i_reset) begin
             pushPtr <= ADDR_ZERO;
             popPtr <= ADDR_ZERO;
             o_error <= 1'b0;
             o_popAck <= 1'b0;
             popAckB <= 1'b0;
             readReg <= DATA_X;
             readRegB <= DATA_X;
             nItems <= 0;
          end
       end
    endmodule