xc6lx45

Members
  • Content Count: 450
  • Joined
  • Last visited
  • Days Won: 29

xc6lx45 last won the day on May 18

xc6lx45 had the most liked content!

About xc6lx45

Contact Methods

  • Website URL
    https://www.linkedin.com/in/markus-nentwig-380a4575/

Profile Information

  • Gender
    Male
  • Location
    MUC
  • Interests
    RF / DSP / algorithms / systems / implementation / characterization / high-speed PA test and creative abuse of Pedal Steel Guitars

  1. Hi, reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers, but here's my pick of the "important stuff":
     - Before anything else, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this beginner mistake) is that the design tries to sample them at the high clock rate, which creates a near-impossible timing problem. Don't move on before this is understood, fixed and verified.
     - Speaking of "verified": read the detailed timing analysis and understand it. It will take a few working hours to make sense of it, but this is where a large part of "serious" design work happens.
     - Once the obvious problems are fixed, I need to understand what the so-called "critical path" in the design is, and improve it. For a feedforward-style design (no feedback loops) this can be done systematically by inserting delay registers. The output is generated e.g. one clock cycle later, but the design can run at a higher clock rate, so overall performance improves.
     - Don't worry about floorplanning yet (if ever) - it comes in when the "automatic" intelligence of the tools fails, and they are very good.
     - Do not optimize on a P&R result that fails timing catastrophically (as in your example, where almost 2000 paths fail). It can lead into a "rabbit hole" where you optimize non-critical paths, which is usually a bad idea for long-term maintenance.
     - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense. Even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization and focus on what is important.
     - There are a few tricks, like forcing redundant registers to remain separate. Example: I have a dozen identical blocks that run on a common, fast system clock, share a 32-bit input, and are critical to timing. Step 1: I sample that input into a 32-bit register at each block's input to relax timing. Step 2: I declare these registers as DONT_TOUCH, because the tools would otherwise notice they are logically equivalent and try to use one shared instance (see the sketch after this list). This is just one example.
     - For BRAMs and DSP blocks, check the documentation for where extra registers are needed (they get absorbed into the BRAM or DSP as dedicated hardware registers). This is the only way to reach the device's specified memory or DSP performance.
     - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
     - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible, but you need to pay attention to what you're doing and design for it (e.g. a MicroBlaze with the wrong settings will most likely not make it through timing).
     - You might also have a look at the options ("strategy"), but don't expect any miracles on a bad design.
     Oops, this almost qualifies as a "long" answer ...
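     Since the DONT_TOUCH trick is easier to show than to describe, here is a minimal Verilog sketch of the idea; the module name, signal names and placeholder logic are mine, not from the original design:

         // One local copy of the shared 32-bit input per block instance; the
         // DONT_TOUCH attribute stops synthesis from merging the logically
         // equivalent registers of the different instances into one shared copy.
         module blockWithLocalInputReg(
            input  wire        i_clk,
            input  wire [31:0] i_bus,          // shared, timing-critical input
            output reg  [31:0] o_result = 32'd0);

            (* DONT_TOUCH = "true" *) reg [31:0] busReg = 32'd0;

            always @(posedge i_clk) begin
               busReg   <= i_bus;              // relaxes timing from the shared source into this block
               o_result <= busReg + 32'd1;     // placeholder for the block's actual logic
            end
         endmodule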
  2. Hi, you might look at the open-source xc3sprog utility; it shows how it's done. Never mind the name, it also works with 7-series parts with minor modifications (such as the IDCODE and flash ID). I remember there is some header in the .bit file that is quite obviously for documentation purposes (open it in a text editor). But then, AFAIK it does no harm, since the FPGA looks for a "magic" 32-bit word to recognize the start of the binary block. That is at least the case for JTAG-based upload (not sure about flash, I guess it's the same but I don't know). You might have a quick look into the configuration guide https://www.xilinx.com/support/documentation/user_guides/ug470_7Series_Config.pdf to see if it says anything about preparing a bitstream for flash.
  3. Hi, not sure if I understand this correctly but are you sure this can be done with this chip? It sounds like functionality internal to the microchip firmware.
  4. Hi, just a thought, looking at your diagram from a large distance. Most likely you have some two-pin (non-grounded) power supplies. The problem is that their output floats at half the mains voltage, set by a high-resistance (megaohm) voltage divider. Ironically, this is to protect the power supply against ESD / charge buildup on the secondary side that could break the transformer's insulation. With such a supply, if you accidentally disconnect the ground connection to your circuit, you have half the AC voltage on the supply pin. I'd double-check all involved power supplies and make sure your connection scheme has a well-established ground even if some random cable comes loose.
  5. Hi, this may be a typo, but "daddr" is not a "register". It's an input to the XADC. The sequence is (see the sketch below):
     - wait for eoc
     - take the output from "chan", put it into daddr (zero-padded with two high bits) and raise den with dwe=0
     - when drdy goes up, get the result from dout
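     To make that concrete, here is a minimal Verilog sketch of the handshake. It assumes the signals come from an XADC instance (the port names eoc, channel, drdy, DO and the DRP inputs daddr/den/dwe follow the XADC primitive); the module name and local signal names are just for illustration:

         // Drives the XADC DRP read as described above: on eoc, put the finished
         // channel number into daddr and pulse den with dwe=0; when drdy comes
         // back, capture the conversion result from DO.
         module xadcDrpRead(
            input  wire        dclk,         // same clock as the XADC DCLK
            input  wire        eoc,          // end-of-conversion from the XADC
            input  wire [4:0]  channel,      // channel that just finished converting
            input  wire        drdy,         // DRP read-data valid
            input  wire [15:0] drpDo,        // connect to the XADC DO port
            output reg  [6:0]  daddr = 7'd0,
            output reg         den = 1'b0,
            output wire        dwe,
            output reg  [15:0] result = 16'd0);

            assign dwe = 1'b0;               // read access only

            always @(posedge dclk) begin
               den <= 1'b0;                  // den is a one-cycle strobe
               if (eoc) begin
                  daddr <= {2'b00, channel}; // zero-pad the channel to 7 address bits
                  den   <= 1'b1;             // request the DRP read, dwe = 0
               end
               if (drdy)
                  result <= drpDo;           // capture the conversion result
            end
         endmodule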
  6. Thinking of which... actually I do have a plain-Verilog FIFO around from an old design. It's not a showroom piece, but I think it did work as expected (whatever that is...). For 131072 elements you'd set ADDRBITS to 17, and DATABITS to 18 for 18-bit width.

         module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData, o_empty, o_full, o_error, o_nItems, o_nFree);
            parameter DATABITS = -1;
            parameter ADDRBITS = -1;
            localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}};
            localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1};
            localparam DATA_X = {{(DATABITS){1'bx}}};

            input wire i_clk;
            input wire i_push;
            input wire i_reset;
            input wire [DATABITS-1:0] i_pushData;
            input wire i_pop;
            output reg o_popAck = 1'b0;
            output wire [DATABITS-1:0] o_popData;
            output reg o_error = 1'b0;
            output wire [31:0] o_nItems;
            output wire [31:0] o_nFree;
            output wire o_empty;
            output wire o_full;

            reg popAckB = 1'b0;
            reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0];
            reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO;
            reg [ADDRBITS-1:0] popPtr = ADDR_ZERO;
            reg [DATABITS-1:0] readReg = DATA_X;
            reg [DATABITS-1:0] readRegB = DATA_X;
            wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr;
            wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr;
            assign o_popData = o_popAck ? readReg : DATA_X;

            // === items counter ===
            // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements)
            reg [ADDRBITS:0] nItems;
            assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems};
            assign o_nFree = (1 << ADDRBITS) - nItems;
            localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1};
            assign o_empty = nItems == 0;
            assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}};

            always @(posedge i_clk) begin
               // === preliminary assignments ===
               readRegB <= DATA_X;
               popAckB <= 1'b0;

               case ({i_push, i_pop})
                  2'b10: nItems <= nItems + NITEMS_ONE;
                  2'b01: nItems <= nItems - NITEMS_ONE;
                  default: begin end
               endcase

               o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty);

               // === output register (delay 1) ===
               o_popAck <= popAckB;
               readReg <= readRegB;

               pushPtr <= nextPushPtr;
               popPtr <= nextPopPtr;

               if (i_push)
                  mem[pushPtr] <= i_pushData;

               if (i_pop) begin
                  readRegB <= mem[popPtr];
                  popAckB <= 1'b1;
               end

               if (i_reset) begin
                  pushPtr <= ADDR_ZERO;
                  popPtr <= ADDR_ZERO;
                  o_error <= 1'b0;
                  o_popAck <= 1'b0;
                  popAckB <= 1'b0;
                  readReg <= DATA_X;
                  readRegB <= DATA_X;
                  nItems <= 0;
               end
            end
         endmodule
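     For completeness, a minimal instantiation sketch for that 131072 x 18 configuration; the wrapper module and signal names are placeholders, not part of the original post:

         // 2^17 = 131072 entries of 18 bits, as discussed above
         module fifoExample(
            input  wire        clk,
            input  wire        rst,
            input  wire        push,
            input  wire [17:0] pushData,
            input  wire        pop,
            output wire        popAck,
            output wire [17:0] popData,
            output wire        empty,
            output wire        full,
            output wire        fifoError);

            wire [31:0] nItems, nFree;   // occupancy counters, unused in this wrapper

            FIFO #(.DATABITS(18), .ADDRBITS(17)) fifo_i
              (.i_clk(clk), .i_reset(rst),
               .i_push(push), .i_pushData(pushData),
               .i_pop(pop), .o_popAck(popAck), .o_popData(popData),
               .o_empty(empty), .o_full(full), .o_error(fifoError),
               .o_nItems(nItems), .o_nFree(nFree));
         endmodule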
  7. Yes, you can combine more than one block RAM. There is more than one way to implement a FIFO. If I had to do it myself, I'd write it in plain Verilog; it's about two or three screen lengths of code if the interface requirements are "clean" (such as a single clock, and the freedom to allow a few clock cycles of latency before the first input appears at the output). I didn't check, but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need. With "expensive" I meant just that: it costs a lot of money to use half an FPGA just for memory.
  8. Well, to be honest, I didn't read the datasheet of the high-capacity devices with 9 Mbit, so that one isn't even EOL. Well, it depends. Have a look at https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf table 6. There are 325 of those 36 Kb blocks on the FPGA (11,700 Kb in total), so you'd need about 1/4 of the FPGA's total block RAM for this memory. Technically feasible and easiest to implement, but a very expensive FIFO. Now, connecting this chip so that it reaches its full performance potential (e.g. 225 MHz) is not straightforward. From between the lines ... >> Do I must use all pins of external FIFO? ... I read that you don't have hands-on experience with e.g. CMOS ICs (the answer is: you must drive every input at all times, unless the data sheet explicitly says otherwise, or strange things will happen. "Strange" in the sense that the circuit may respond to waving your hands over it, and that's not an exaggeration). If I'm correct in this, it may be a good idea to get a few CD4017s or some other standard CMOS chip with simple functionality for < $1 and use them to bring up your FPGA IOs first. If the FIFO chip doesn't work because of IO issues, it will be near-impossible to debug.
  9. You might give a bit more information so you're not mistaken for a lazy student. My first thought is simply "do not". The component is EOL, and you can get the same functionality from the FPGA's BRAM with a LOT less hassle.
  10. https://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf page 3
  11. >> is it possible to present DVFS on it.
      >> For now I now about clock wizard, DCM, PLL for different clock generation (frequency) but this is not frequency scaling mi right?
      You may have your own answer there. Is this some university project? Have you done your own research? For example, this has all the right keywords: https://highlevel-synthesis.com/2017/04/12/voltage-scaling-on-xilinx-zynq
  12. If it helps, UARTs are extremely robust against frequency error, on the order of a percent (the protocol effectively spends ~10% of the throughput on synchronization). The closest integer UART divider will probably work just fine (see the sketch below).
      >> Synchronizer theory works only for clock domains that have totally independent sources.
      Not sure what you mean by that. A CDC needs to function at any possible phase delta between the two clocks. If the two clocks come from the same source and are co-periodic over some interval, the random distribution of the phase looks different, but it's just a special case and should still work.
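      As a rough illustration, assume a 100 MHz clock and 115200 baud (both numbers are my assumptions, not from this thread): 100e6 / 115200 = 868.06, so the closest integer divider of 868 gives 100e6 / 868 = 115207 baud, an error of about 0.006 % - far below the few-percent tolerance of a UART. A minimal baud-tick generator around that divider could look like this:

         // Generates a one-clock strobe o_tick at roughly the baud rate,
         // here 100 MHz / 868 = 115207 baud (about 0.006 % off the nominal 115200).
         module baudTick #(parameter DIVIDE = 868)
            (input  wire i_clk,
             output reg  o_tick = 1'b0);

            reg [$clog2(DIVIDE)-1:0] count = 0;

            always @(posedge i_clk) begin
               o_tick <= 1'b0;
               if (count == DIVIDE-1) begin
                  count  <= 0;
                  o_tick <= 1'b1;
               end else
                  count <= count + 1'b1;
            end
         endmodule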
  13. Just be aware that most of the "legacy" material on FIR filters limits itself to what can be presented conveniently. Numerical optimization is the tool of choice, and there is no "cheating". Or, taking one step back to the filter specs: there is usually no need to specify a flat stopband, and dropping that requirement can significantly reduce the required filter size (credits to Prof. Fred Harris). This is only one example where I can avoid the unnecessary constraints of a ready-made design process by writing my own solver, which is actually not that hard, basing it on fminsearch or fminunc in Matlab / Octave. BTW, one reference on this topic I found useful: Mathias Lang, "Algorithms for the Constrained Design of Digital Filters with Arbitrary Magnitude and Phase Responses", 1999. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.9336 The title alone is interesting - "Arbitrary Magnitude and Phase Responses" - no one says a digital filter needs to be flat or linear-phase (at the expense of symmetry, of course). Sometimes I wonder, our thinking often seems to get stuck in patterns and "templates". Take analog filters, for example: Chebyshev, Butterworth or Bessel, which one do I pick? But those are just corners of the design space, and the best filter for a given application is most likely somewhere in between, if only I can design it (which is, again, a job for a numerical optimizer, even though this one is more difficult).
  14. The word "overclocking" may even be misleading - this architecture is used when the data rate is significantly lower than the multiplier's speed. The inputs of one (expensive) multiplier are multiplexed so that it can do the work for several or all taps. The multiplexing itself would get fairly expensive in logic fabric due to the number of data bits and coefficients. To the rescue comes the BRAM, which is essentially a giant demultiplexer-multiplexer combo with a football field of flip-flops in between. You can find an example of this approach here. Out of the box it's unfortunately quite complicated, because it does arbitrary rational resampling; setting the rates equal for a standard FIR filter, you end up with only a few lines of remaining code. A bare-bones sketch of the shared-multiplier idea follows below. BTW, multiplier count as a design metric is probably overrated nowadays, for several reasons (the IP tool's resource usage report is already more practical, and e.g. BRAM count may become the bottleneck). If you can get this book through your library, you might have a look e.g. at chapter 6 (background only, as this is a very old book): Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation.
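      To illustrate only the shared-multiplier idea (not the BRAM-based resampler mentioned above), here is a bare-bones Verilog sketch; the module name, parameter values and placeholder coefficients are all assumptions for the example:

         // Time-multiplexed FIR: a single multiplier serves all NTAPS taps,
         // one multiply-accumulate per clock. A new sample may be pushed only
         // after the previous result has appeared on o_valid.
         // In a real design the sample and coefficient arrays would live in
         // BRAM (with registered reads), as discussed above.
         module serialFir #(parameter NTAPS = 8, DATABITS = 18, COEFBITS = 18)
            (input  wire                       i_clk,
             input  wire                       i_valid,   // new-sample strobe
             input  wire signed [DATABITS-1:0] i_sample,
             output reg                        o_valid = 1'b0,
             output reg  signed [DATABITS+COEFBITS+$clog2(NTAPS)-1:0] o_result = 0);

            reg signed [DATABITS-1:0] delayLine[0:NTAPS-1];
            reg signed [COEFBITS-1:0] coef[0:NTAPS-1];
            integer k;
            initial for (k = 0; k < NTAPS; k = k + 1) coef[k] = 1; // placeholder coefficients

            reg [$clog2(NTAPS)-1:0] tap = 0;    // which tap the shared multiplier works on
            reg                     busy = 1'b0;
            reg signed [DATABITS+COEFBITS+$clog2(NTAPS)-1:0] acc = 0;

            always @(posedge i_clk) begin
               o_valid <= 1'b0;
               if (i_valid) begin
                  // shift the delay line and start a new accumulation pass
                  for (k = NTAPS-1; k > 0; k = k - 1)
                     delayLine[k] <= delayLine[k-1];
                  delayLine[0] <= i_sample;
                  tap  <= 0;
                  acc  <= 0;
                  busy <= 1'b1;
               end else if (busy) begin
                  // one multiply-accumulate per clock on the shared multiplier
                  acc <= acc + delayLine[tap] * coef[tap];
                  if (tap == NTAPS-1) begin
                     o_result <= acc + delayLine[tap] * coef[tap];
                     o_valid  <= 1'b1;
                     busy     <= 1'b0;
                  end
                  tap <= tap + 1'b1;
               end
            end
         endmodule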