Popular Content

Showing content with the highest reputation since 04/25/19 in all areas

  1. 2 points

    Increasing the clock frequency to 260 MHz

    Hi, reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff" - Before, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this "beginner mistake") is that the design tries to sample them at the high clock rate. Which creates a near-impossible problem. Don't move further before this is understood, fixed and verified. - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens. - Once the obvious problems are fixed, I need to understand what is the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be systematically done by inserting delay registers. The output is generated e.g. one clock cycle later but the design is able to run at a higher clock so overall performance improves. - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good. - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 paths that fail). It can lead into a "rabbit's hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance) - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important) - There are a few tricks like forcing redundant registers to remain separate. Example, I have a dozen identical blocks that run on a common, fast 32-bit system clock and are critical to timing. Step 1, I sample the clock into a 32-bit register at each block's input to relax timing, and step 2) I declare these register as DONT_TOUCH because the tools would otherwise notice they are logically equivalent and try to use one shared instance. This as an example. - For BRAMs and DSP blocks, check the documentation where extra registers are needed (that get absorbed into the BRAM or DSP using a dedicated hardware register). This is the only way to reach the device's specified memory or DSP performance. - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register. - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing). - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design. Ooops, this almost qualifies as "long" answer ...
  2. 2 points
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design. It's not a showroom piece but I think it did work as expected (whatever that is...) For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for 18 bit width. module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData, o_empty, o_full, o_error, o_nItems, o_nFree); parameter DATABITS = -1; parameter ADDRBITS = -1; localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}}; localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1}; localparam DATA_X = {{(DATABITS){1'bx}}}; input wire i_clk; input wire i_push; input wire i_reset; input wire [DATABITS-1:0] i_pushData; input wire i_pop; output reg o_popAck = 1'b0; output wire [DATABITS-1:0] o_popData; output reg o_error = 1'b0; output wire [31:0] o_nItems; output wire [31:0] o_nFree; output wire o_empty; output wire o_full; reg popAckB = 1'b0; reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0]; reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO; reg [ADDRBITS-1:0] popPtr = ADDR_ZERO; reg [DATABITS-1:0] readReg = DATA_X; reg [DATABITS-1:0] readRegB = DATA_X; wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr; wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr; assign o_popData = o_popAck ? readReg : DATA_X; // === items counter === // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements) reg [ADDRBITS:0] nItems; assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems}; assign o_nFree = (1 << ADDRBITS) - nItems; localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1}; assign o_empty = nItems == 0; assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}}; always @(posedge i_clk) begin // === preliminary assignments === readRegB <= DATA_X; popAckB <= 1'b0; case ({i_push, i_pop}) 2'b10: nItems <= nItems + NITEMS_ONE; 2'b01: nItems <= nItems - NITEMS_ONE; default: begin end endcase o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty); // === output register (delay 1) === o_popAck <= popAckB; readReg <= readRegB; pushPtr <= nextPushPtr; popPtr <= nextPopPtr; if (i_push) mem[pushPtr] <= i_pushData; if (i_pop) begin readRegB <= mem[popPtr]; popAckB <= 1'b1; end if (i_reset) begin pushPtr <= ADDR_ZERO; popPtr <= ADDR_ZERO; o_error <= 1'b0; o_popAck <= 1'b0; popAckB <= 1'b0; readReg <= DATA_X; readRegB <= DATA_X; nItems <= 0; end end endmodule
  3. 1 point

    FPGA for network protocol conversion

    Hi, I can tell you this much that estimating required FPGA size is a challenging topic in general. Chances are high that an abstract analysis that's not based on experience will completely miss the point. For the PC, I think you are describing operating system overhead, not hardware limitations. Before I'd consider FPGA, I'd have a look at a bare-metal implementation, which can probably be approximated by editing the network card driver in a linux distribution (e.g. use one CPU core per inbound card and keep it in a spinlock waiting for data to avoid the slow context switch etc on interrupt). If your code runs otherwise from cache, a modern e.g. 5 GHz CPU is a force to be reckoned with.
  4. 1 point

    Microblaze to communicate to VHDL

    Thx, will look at it. Rob
  5. 1 point
    While not wrong, I'd caution that this advice might be overly optimistic. There's a reason why termination was given it's name; it generally needs to be as close to the source or terminus of a driven signal depending on the type of termination. It's one thing to lay out an FPGA PCB with the smallest available components to implement most inter-standard conversion schemes. It's quite another to get to work... on the first try... when you don't have past experience doing this sucessfully. While it's possible to implement AC coupled termination for such connections it's a risky business for those who don't know how to analyze and understand the design. Starting with a board that has its FPGA pins already assigned isn't going to work in your favor. Finally, if your source transfers data at a rate higher than the reference clock you need to understand all of the issues and limitations of using ISERDES2. None of this is to say that you can't do what you want to do, but you can damage your board and expend a lot of time and effort trying to fix the unfixable. 250 Mbps isn't an extreme data rate for Artix but it isn't trivial either. It's a lot easier to violate AC and DC IO specifications near logic switching events than you probably realize.
  6. 1 point
    Hi @aliff saad, I would suggest to make sure that there are no spaces in your paths. I downloaded the library and placed it here: C:\Users\jpeyron\Documents\Arduino\libraries\SparkFun_LSM9DS1_Arduino_Library-master\examples\LSM9DS1_Basic_SPI I then opened the arduino ide using the digilent core and ran the LSM9DS1_Basic_SPI.ino with no issue as shown in the attached screen shot below. best regards, Jon
  7. 1 point
    Maybe one comment: In the ASIC world, "floorplanning" is an essential design phase, where you slice and dice the predicted silicon area and give each design team their own little box. The blocks are designed and simulated independently, and come together only at a fairly late phase. ASIC differs from FPGA in some major ways: - ASIC IOs have a physical placement e.g. along the pad ring. We don't want to run sensitive signals across the chip, need to minimize coupling for mixed-signal etc. In comparison, FPGAs are probably more robust (a complex design will definitely consider the layout, especially on larger devices. But on smaller eval boards, the first restrictions I'll probably run into are logical e.g. which clock is available where, not geometrical). - For ASICs, we need the floorplan to design the power distribution network as an own sub-project (and many a bright-eyed startup has learned electromigration the hard way). - In the ASIC world, we need to worry about wide and fast data paths both regarding power and area - transistors are tiny but metal wires are not. You might have a look at "partial reconfiguration", here the geometry of the layout plays some role.
  8. 1 point
    Floorplanning is where you start before designing a new board. Once you've assigned pins and created a PCB your options for meeting timing for a particularly complex, dense, and high clock rate design are limited. Of course you will need to have a reasonably 'close to final version' of your FPGA design to start with so that the tools can select the best pin locations. For a general purpose development board like the one you are using only a few interfaces need to be 'optimized' for speed; and of course the speed grade of the parts on the board have a large impact on limiting the performance of any design. It is not always possible to select an arbitrary clock rate for any application for a particular board and always meet timing. On the other hand it's easy to create a design that doesn't have a chance to operate at a desired clock rate when a better conceived design might. Providing the tools with good guidance in the form of constraints is often the key to achieving a particular performance goal, though don't expect Vivado to turn a poor design into a greate design.
  9. 1 point

    Waveforms SDK calibration

    Hi @Thore The calibration is implicitly used in the SDK. The set or got voltage values with API functions are always corrected based on the device calibration parameters.
  10. 1 point

    Axi DMA from all memory

    Hi @Rickdegier, Welcome to the Digilent forums! I am not the most confident on this topic, but I have used the DMA some. The most important facet here is to make sure that your buffer is actually contained in the DDR memory. Different parts of the program can be placed in different memories using the linker script in your application project's src folder (lscript.ld). You should check that file to make sure that your global arrays are placed in the DDR. Second, if the data cache is enabled (likely), you should make sure to flush and invalidate the buffer memory area around your SimpleTransfer calls (functions to do this are in xil_cache.h). Lastly, I personally have had more success using malloc to create my buffers than using global or local arrays - I'm not sure why this is, from a cursory google search, it looks like the DMA will allow transfers into program memory when you aren't careful. You may want to reach out to Xilinx on their forums. Thanks, Arthur
  11. 1 point
    That's amazing to hear! I appreciate all of your help. I will update the board files now. Take care, Justen
  12. 1 point

    power supply

    a fuse?
  13. 1 point

    hdmi ip clocking error

    Hi @askhunter, I did a little more searching and found a forum thread here where the customer is having a similar issue. A community member also posted a pass through zynq project that should be useful for your project. best regards, Jon
  14. 1 point
    For the Protocol / SPI-I2C /Spy mode you should specify the approximate (or highest) protocol frequency which will be used to filter transient glitches, like ringing on clock signal transition. The Errors you get indicate the signals are not correctly captured. - make sure to have proper grounding between the devices/circuits - use twisted wires (signal/ground) to reduce EMI - use logic analyzer and/or scope to verify the captured data / voltage levels at higher sample rate at least 10x the protocol frequency Like here in the Logic Analyzer you can see a case when the samples are noisy:
  15. 1 point
    Hi, I indeed have a sd card with the correct files in. Anyway, i found a way to make it work without the LVLSHFT here : https://github.com/NicholsKyle/ECE387_SimonSays/wiki/My-Design#schematic-drawing. Now I'm struggling to display correctly on the MTDs, it's certainly my code the problem. But thanks for your help @jpeyron.
  16. 1 point

    hdmi ip clocking error

    Hi @askhunter, Please attach a screen shot of your vivado block design. Have you tried changing the MMCM to PLL in the DVI2RGB IP Core? best regards, Jon
  17. 1 point

    JTAG-HS2 firmware erased by accident

    Hi Jon Many thanks - cable now working Many Thanks Martin
  18. 1 point
    Hi @askhunter, I believe that you would only need the more recent DVI2RGB IP Core and the IF folder in the Vivado library. best regards, Jon
  19. 1 point
    Thank you Dan. Your answer helps clarify things greatly for me on this subject. I appreciate you taking the time to help me out. -Sean
  20. 1 point
    Yes, you can combine more than one block RAM. There is more than one way to implement a FIFO. If I had to do it for myself, I'd write it in plain Verilog, it's about two or three screen lengths of code if the interface requirements are "clean" (such as, one clock and freedom to leave a few clock cycles of latency, before the first input appears at the output). I didn't check but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need. With "expensive" I meant just that, it costs a lot of money to use half an FPGA just for memory.
  21. 1 point
    @longboard, Yeah, that's really confusing isn't it? At issue is the fact that many of these chips are specified in Mega BITS not BYTES. So the 1Gib is mean to refer to a one gigabit memory, which is also a 128 megabyte memory. That's what the parentheses are trying to tell you. Where this becomes a real problem is that I've always learned that a MiB is a reference to a million bytes, 10^6 bytes, rather than a mega byte, or 2^20 bytes. The proper acronyms, IMHO, should be Gb, GB, Mb, and MB rather than GiB or MiB which are entirely misleading. As for the memory, listed as 16 Meg x 8 x 8, that's a reference to 8-banks of 16-mega words or memory, where each word is 8-bits wide. In other words, the memory has 16MB*8 or 128MB of storage. You could alternatively say it had 1Gb of memory, which would be the same thing, but this is often confused with 1GB of memory--hence the desire for the parentheses again. Dan
  22. 1 point
    I'm going to echo @xc6lx45 and suggest that you reconsider. Does your Kintex have insufficient BRAM for an on-chip FIFO? Using BRAM would be so much easier. If you choose to use the external FIFO, you'll have to adapt the logic that you use to interface to the FIFO to use the signaling of this other FIFO. Sometimes it helps to post more context: what do you want the system to be able to do that it does not now?
  23. 1 point
    Hi @Esti.A, The first error that you get is the following one: ERROR: [Common 17-179] Fork failed: Cannot allocate memory This kind of error is generated when your machine does not have enough RAM memory. Please post here the configuration of your system (CPU, OS, RAM, ..). Do you have swap enabled? Also, have a look on this post.
  24. 1 point
    you might give a bit more information to not be mistaken for a lazy student. My first thought is simply "do not". The component is EOL and you can have the same using the FPGA's BRAM with a LOT less hassle.
  25. 1 point


    Hi @jpeyron, Thankyou for your kind reply. I will check the code that you have sent. The board that am using is Zynq ZC702, evalutaion kit. NO, I didn't try any auxillary channel. Today i will try and update you. Thankyou once again
  26. 1 point
    Thanks much, Jon. Best Cuikun
  27. 1 point

    Dynamic voltage and frequency scaling

    Here are two additional articles I have read on the technique being applied to a zynq. https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842065/Zynq-7000+AP+SoC+Low+Power+Techniques+part+5+-+Linux+Application+Control+of+Processing+System+-+Frequency+Scaling+More+Tech+Tip https://github.com/tulipp-eu/tulipp-guidelines/wiki/Dynamic-voltage-and-frequency-scaling-(DVFS)-on-ZC702
  28. 1 point

    Dynamic voltage and frequency scaling

    https://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf page 3
  29. 1 point

    Dynamic voltage and frequency scaling

    >> is it possible to present DVFS on it. >> For now I now about clock wizard, DCM, PLL for different clock generation (frequency) but this is not frequency scaling mi right? you may have your own answer there. This is some university project? Have you done your own research? For example, this has all the right keywords: https://highlevel-synthesis.com/2017/04/12/voltage-scaling-on-xilinx-zynq
  30. 1 point
    Hi @Phil_D The gain switch is adjusted automatically based on the selected scope range. At 500mV/div (5Vpk2pk ~0.3mV resolution) or lower the high gain is used with and above this the low gain (50Vpk2pk w ~3mV resolution). In case you specify trigger level out of the screen (5Vpk2pk) or offset higher/lower than +/- 2.5V the low gain will be used for the trigger source channel. This will be noted on the screen with red warning text. The attenuation is a different thing. This option lets you specify the external attenuation or amplification on the signals which enter the scope inputs and the data is scaled accordingly. Like, if you use a 10x scope probe, the scope input will actually get 1/10th of the original signal, but specifying 10x attenuation the signal is scaled to show values on the probe. In this case the 500mV/div (5Vpk2pk) low/high gain limit moves up to 5V/div (50Vpk2pk) and the low gain up to 50V/div If you have an external 100x amplifier on the scope input you can specify 0.01x attenuation. With this you will have 5mV/div (50mVpk2pk ~0.003mV resolution) for high gain.
  31. 1 point

    HS1 HS3 to Spartan 6 in Linux

    Hello Daniel What do you get with: djtgcfg enum and after that djtgcfg init <device name> Which flavour of Linux are you using? Have you installed digilent adept and utilities? Billy
  32. 1 point

    Stuck in SDK

    Kris, Printing via xil_print is using STDIO by default and could reconfigured in Vivado and SDK. With little info you provided a lot left for guessing. My last recommendation is to pay attention to a configuration of your build in SDK. The interrupt interrupt service routine (ISR) might not work if GCC compiler has optimization ON. To make it work the variable in the ISR should be declared volatile. Debug build usually has flag -O0 that optimization none. Good luck!
  33. 1 point

    Understand Resource Usage for FIR Compiler

    @aabbas02, What's the data rate of the filter compared to the number of taps? As in, are you going to need to accept one sample per clock into this filter, or will you have reliable idle clock periods to work with? I'm wondering, if you have to build this, how easy/hard it might be. Dan
  34. 1 point
    So, if I am getting the point of the previous two posts from xclx45 and Dan, make your own filter in Verilog or VHDL and figure out the details ( signed, unsigned, fixed point, etc, .... etc ). You can instantiate DSP48E primitives (difficult) or just let the synthesis tool do it from your HDL code (easier). Debating how things should be verses how they are when using third party scripts to generate unreadable code seems like a waste of time to me... If you don't like what you get then design what you want. If you can make the time to write your own IP ( would be nice to not depend on a particular vendor's DSP architecture ) you'll learn a lot and save a lot of time later. If a vendor's IP doesn't make timing closure for your design a nightmare and you don't have the time to figure out all the details just let the IP handle it. I suspect that trying to optimize the FIR Compiler will be frustrating at best. I once had to come up with pipelined code to calculate signal strength in dB at a minimum sustained data rate. My approach to converting 32-bit integers to logarithms used a combination of Taylor Series expansion and look-up tables. I had a few versions**. One was straight VHDL so that I could compare Altera and Xilinx DSP tiles. One instantiated DSP48 tile primitives for a particular Xilinx device. These were fixed point designs. There's theory and there's practical experience.. they are usually not the same. ** I played with a number of approaches based on extremely limited specifications so there were quite a few versions. Every time I presented one the requirements changed and so did the complexity and resource requirements. I should mention that my intent for mentioning this experience is not to denigrate the information presented by others or to claim superiority in any way. When getting advice it's important to put that into context. A lot of times facts aren't necessarily relevant to solving a particular problem. If I haven't made this clear I've never had the experience that vendor IP optimizes resource usage... in fact quite the opposite. This is why in a commercial setting companies are willing to pay to develop their own IP. Sometimes FPGA silicon costs overshadow development costs.
  35. 1 point
    The word "overclocking" may be even misleading - this architecture is used when the data rate is significantly lower than the multiplier's speed. Inputs to one (expensive) multiplier are multiplexed, so it can do the work for several or all taps. The "multiplexing" itself will get fairly expensive in logic fabric due to the number of data bits and coefficients. To the rescue comes the BRAM, which is essentially a giant demultiplexer-multiplexer combo, with a football field of flipflops in-between. You can find an example for this approach here. Out-of-the-box, it's unfortunately quite complicated because it does arbitrary rational resampling. Setting the rates equal for a standard FIR filter, you end up with only a few lines of remaining code. BTW, multiplier count as design metric is probably overrated nowadays for several reasons (the IP tool resource usage is already more practical, e.g. BRAM count may become the bottleneck). If you can get this book through your library, you might have a look e.g. at chapter 6 (background only, this is a very old book): Keshab K. Parhi VLSI Digital Signal Processing Systems: Design and Implementation
  36. 1 point

    Understand Resource Usage for FIR Compiler

    @aabbas02, So let's start at the top. An N-point FIR filter, such as this one, requires N multiplies and N-1 adds. Let's just count multiplies, though, for now. Let's now look at some optimizations you might apply. A complex multiply usually requires 4 multiplies. If you have complex input and taps, that's 4N real multiplies. There's a trick you can use to drop this to 3N multiplies. If the filter is real, and the incoming signal is complex, you can drop this to 2N multiplies by filtering each of the real and imaginary channels separately. If your filter taps are fixed, a good optimizer should be able to replace the zeros with a constant output, the ones with data pass through, etc. Implementing taps with two or three bits this way is possible. This only works, though, if the filter taps are fixed. Many of the common FIR filter developments generate linear phase filters. These filters are symmetric about a common point. With a little bit of work, you can use that to reduce the number of multiplies (for dynamic taps) down to (N-1)/2. There is a very significant class of filters called half band filters. In a half band filter, every other tap (other than the center tap) is zero. With a bit of work, you can then drop your usage down to (N-1)/4 multiplies. This optimization applies to Hilbert transforms as well. In the given list above, I've assumed that you need to operate at one sample in and one sample out per clock. In that case, there's no time or room for memory, since all of the stages need their data on every clock. I should also point out that I'm counting multiplies, not DSP slices. If your multiplies are smaller than 18x18 bits, you may be able to use a single DSP slice per multiply. If they are larger, you might find yourself using many DSP slices per multiply. It depends. Let's now consider the case where you have an N sample filter but you are only going to provide it with one sample of data every N+ samples (some number more than N). You could then multiplex this multiply and implement an N-point filter with 1 multiply and 2^(ceil(log_2(N)) RAM elements. If your filter was symmetric, you could process a filter roughly twice as long, or a sample rate roughly twice as fast while still using a single multiply. The half-band and hilbert tricks can apply to (roughly) double your filter size or your data rate again. In these cases, however, you can't spare the multiply at all, since it is used to implement every coefficient. That should give you both ends of the spectrum. Now, while I don't understand how Xilinx's FIR compiler works, I can say that if I were to make one I would allow a user to design crosses between these two ends of the spectrum. In the case of a such a cascaded filter, however, you may find it difficult to continue to implement the optimizations we've just discussed, simply because the cascaded structure becomes more difficult to deal with. (By cascade, I mean cascade in implementation and not filtering a stream and then filtering it again.) Looking at your two designs, none of the optimizations I've mentioned would apply. In the high speed filters we started out with, sure, you might manage to remove a multiply or two by multiplying by zero. On the other hand, you can't share multiplies across elements if you do so. For example, you mentioned [1 2 3 4 0 1 2 3 4]. Sure, it's got repeating coefficients, but this is actually a really obscure filtering structure, and not likely one that I (were I Xilinx) would pay to support. Try [ 1 2 3 4 0 4 3 2 1] instead and see if things change. Similarly for [ 1 0 0 0 0 0 0 0 1]. Yeah, it's symmetric about the mid-point, but in all of my symmetric filtering applications I'm assuming the mid point has a value of 2^N-1 or similar so that it doesn't need a multiply. Your choice of filters offers no such optimization, so who knows how it might work. Hope this helps to shed some light on things, Dan
  37. 1 point
    Xilinx encrypts the FIR compiler code so it's difficult to figure out what's going on with your experiments. You should read PG149 which is the IP product Guide for the FIR Compiler. It does indicate when coefficient symmetry optimizations might or won't be made. It also has a link to a Resource Utilization "Spreadsheet" for some FPGA family devices depending on your customized specifications. Oddly, that bit of guidance doesn't show any usages where significant numbers of BRAMS are ever used. The guide does have a section titled "Resource Considerations" that has some interesting information but not enough to answer you questions. I suspect that you'll have to understand actual resource utilization for your application empirically. (SOP from my experience when I really need optimal performance) It's always a good idea to be familiar with all device and IP documentation though, in my experience, the IP documentation rarely addresses all of my questions when I'm deep into a design and committed to IP that's not mine. Again, my guess is that when you specify a clock rate, sample rate, data and coefficient widths, etc resource utilization increases with increasing throughput. The same thing happens if you pipeline your code to maximize data rates. But I'm only guessing. I don't use the FIR Compiler so my views are an extrapolation of experience with other IP from all FPGA vendors that I have used.
  38. 1 point
    Amen. And that's a good view for all Xilinx IP. The structure for a simple digital filter is not that complex; you can implement them in HDL. I've done that. Xilinx IP is convenient but usually not the best approach when you have concerns about using up limited resources. The issue for using very fast resources like BRAM and DSP slices is that they are placed in particular locations throughout the device with limited routing resources for the signals between them or other logic. You can let Xilinx balance throughput, resource usage, logic placement, and throughput or you can try to do that yourself. Trying to use 100% of every BRAM or DSP resource in order to minimize the number of BRAM or DSP resources used is not easy. In my experience FPGA vendors are content to have their IP wizards make the customer think that he needs a larger and more expensive device. So that's the trade-off; let the vendors' tools do the work to save time or write your own IP and be responsible for taking care of all the little details that the IP hides form you. I've spent some time experimenting with DSP resources from various FPGA vendors. They are complicated with a lot of modes and depending on how you use them throughput can decline substantially from the ideal. Just read the user's guide and switching specs in the datasheet to get the idea. Generally the DSP slices are arranged to perform optimally with certain topologies but not all. Implementing designs that are iterative or have feedback can get ugly; especially when you try and fit that into a larger design using most of the devices resources. As a general rule, in my experience, use vendor IP and don't ask a lot of questions or design your own IP and be prepared to learn how to handle a lot of details that aren't obvious. Time verses convenience.
  39. 1 point
    Hi, [1 2 3 4 0 1 2 3 4] is not symmetric in a linear-phase sense. That would be e.g. [1 2 3 4 0 4 3 2 1]. You could exploit the shared coefficients manually, see e.g. Figure 3 for the general concept. But this case is so unusual that I doubt tools will take it into account. The tool does nothing magical. If performance matters more than design time, you'll always get better results for one specific problem with manual design. One performance metric is multiplier utilization (e.g. assuming you design for a 200 MHz clock, one DSP delivering 200M operations / second performs at 100 %. Reaching 50+ % is a realistic goal for a simple / single rate structure). For example, do I want to use an expensive BRAM at all, when I could use ring shift registers for delay line and coefficients. Then you only need a small controlling state machine around it that does a full circular shift for each sample, muxing in the new input sample every full cycle (the BRAM makes more sense when the filter serves many channels in parallel, then FF count becomes an issue).
  40. 1 point

    HID protocol on Basys3

    From the Basys3 Reference Manual: The Auxiliary Function microcontroller hides the USB HID protocol from the FPGA and emulates an old-style PS/2 bus. The microcontroller behaves just like a PS/2 keyboard or mouse would. This means new designs can re-use existing PS/2 IP cores. I've written code for Digilent's PS/2 interface and have never configured the keyboard. Be aware that pressing some keys send more than one keycode byte. I don't have the Baysys3 board so I can't confirm that the PS/2 emulation is exactly as claimed. [edit] One of the Forums around here is the Project Vault where users can (should) post code and projects. You might find my submission 'Fast Inter-board Interface' to have some interesting stuff to get you started. Look in the HDL source for the Genesys.
  41. 1 point
    My my... I'm not sure what LFSR you are using but mine have a shift enable input so that I can use any clock that's available but update the LFSR output at almost any update rate needed. You can create a counter to control the shift enable so that it's synchronous with whatever logic is running at 8 KHz and needs data at that rate. It's typical in a design to have lots of parts of the logic changing states at lots of different frequencies. You don't want separate clock domains for all of those rates even if those clocks are derived and phase coherent. Sometimes, for high speed applications you do need a higher, phase coherent clock; like in video where there might be a reference clock and a higher but synchronous pixel clock. In general it's best to have the minimum number of clock domains in a design that you can get away with. FPGA devices don't have long analog or combinatorial delay lines on the order of microseconds or milliseconds. The Series 7 devices do allow adding very small delays to signals coming into FPGA pins via the IDELAY2 primitive. If your device has outputs on pins on an HP bank you can also add similar small delays to output signals using the ODELAY2 primitive. Synchronous delays lines using counters and enables as I mentioned before are the normal way to achieve teh equivalent of the analog delay line that used to be part of some digital logic long long ago.
  42. 1 point

    A UART Based Debugger Tool

    Here's a utility for debugging and testing your code in hardware and uses any IO pin to send an ASCII representation of any signal through a hardware UART interface. If you don't have a UART on you FPGA board there are TTL USB UART breakout boards and cables that allow any spare IO pin to become a UART interface. This code is functionally the same as one recently released by Hamster but developed independently for the Fast Data Interface project. I recommend comparing the different coding styles. I decided to release this as a separate project as there are likely more people interested in this one that the other. This project contains test bench code. UartDebuggerR3.zip
  43. 1 point
    HI @Sandrine In Sync mode the trigger is not available. The I2S interpreter needs to see the transitions on the clock signal, so if you use Sync mode select Edge option (sample on both edges) for Clock signal. Repeated captures for the Logic Analyzer can be done from Script tool like this: for(var c = 0; c < 10 && wait(); c++){ print(c) Logic.run() Logic.wait() }
  44. 1 point
    Hi @aliff saad, Welcome to the forums. Here is the resource center for the PmodNAV. Here is a completed INO file, basically a main file, for the PmodNAV and an Arduino. The error is stating that LSM9DS1 is not assigned to a data type I.E. int, char... or in the case of the INO it is a class. Did you also download the SparkFun LSM9DS1 Arduino Library and add the src files to your project? cheers, Jon
  45. 1 point

    Zybo Z7-10 audio passthrough

    Hi @jpeyron The DMA audio demo uses the d_axi_i2s_audio IP core, which has a S2MM output and MM2S input. The first thing I've tried was to route the output directly into the input, which didn't work. In addition, the way the C code handles recording is by configuring the DMA block to record, then telling the i2s core to store N bytes from the input into a register. The HDMI demo works by reading video data into a series of video buffers, and displaying image data from a series of frame buffers. I can make an HDMI passthrough by pointing the display output buffer to the video input buffer. I'm wondering if there's a similar solution for audio. I've had some trouble getting that instructables project to work. The i2s controller looks fine, but the SerialEffects block doesn't seem to match the block diagram (which is really blurry). I'll try it again and see if I missed something. Does that d_axi_i2s_audio IP core have any documentation?
  46. 1 point

    MMCM dynamic clocking

    @rangaraj, It's a shame you only know VHDL coding, since the Verilog code I posted above would give you the ability to generate an arbitrary clock--unencumbered by the constraints of the PLL, with frequency resolution in the milli-Hertz range (100MHz/2^32). Perhaps you want to take another look at it? Sure, it would have some phase noise, but ... that could be beaten now if necessary by using an OSERDESE2 component. I've got an example design I'm working on that does just that and should knock the phase noise down to 1.25ns or better. Dan
  47. 1 point

    MMCM dynamic clocking

    Hey, something else I just saw when reading the clocking guide was: MMCM Counter Cascading The CLKOUT6 divider (counter) can be cascaded with the CLKOUT4 divider. This provides a capability to have an output divider that is larger than 128. CLKOUT6 feeds the input of the CLKOUT4 divider. There is a static phase offset between the output of the cascaded divider and all other output dividers. And: CLKOUT4_CASCADE : Cascades the output divider (counter) CLKOUT6 into the input of the CLKOUT4 divider for an output clock divider that is greater than 128, effectively providing a total divide value of 16,384. So that can divide a 600 MHz VCO down to 36.6 kHz.
  48. 1 point

    MMCM dynamic clocking

    I feel a bit bad about posting a minor novel here, but here is an example of going from "5 cycles on, 5 off" (i.e. divide by 10) to "10 on, 10 off" (device by 20). The VCO is initially to 800 MHz with CLK0 being VCO divide by 8.... so after config you get 100MHz. Push the button and you get 800/20 = 40MHz, release the button and you get 80MHz. It is all really hairy in practice! EDIT: Through experimentation I just found that you don't need to reset the MMCM if you are not changing the VCO frequency. So the 'rst' signal in the code below isn't needed (and LOCKED will stay asserted). -------------------------------------------------------------------------------------------------------- -- Playing with the MMCM DRP ports. -- see https://www.xilinx.com/support/documentation/application_notes/xapp888_7Series_DynamicRecon.pdf -- for the Dynamic Reconviguration Port addresses -------------------------------------------------------------------------------------------------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; library UNISIM; use UNISIM.VComponents.all; entity mmcm_reset is Port ( clk_100 : in STD_LOGIC; btn_raw : in STD_LOGIC; led : out STD_LOGIC_VECTOR (15 downto 0)); end mmcm_reset; architecture Behavioral of mmcm_reset is signal btn_meta : std_logic := '0'; signal btn : std_logic := '0'; signal speed_select : std_logic := '0'; signal counter : unsigned(26 downto 0) := (others => '0'); signal debounce : unsigned(15 downto 0) := (others => '0'); signal clk_switched : std_logic := '0'; signal clk_fb : std_logic := '0'; type t_state is (state_idle_fast, state_go_slow_1, state_go_slow_2, state_go_slow_3, state_idle_slow, state_go_fast_1, state_go_fast_2, state_go_fast_3); signal state : t_state := state_idle_fast; ----------------------------------------------------------------------------- --- This is the CLKOUT0 ClkReg1 address - the only register to be played with ----------------------------------------------------------------------------- signal daddr : std_logic_vector(6 downto 0) := "0001000"; signal do : std_logic_vector(15 downto 0) := (others => '0'); signal drdy : std_logic := '0'; signal den : std_logic := '0'; signal di : std_logic_vector(15 downto 0) := (others => '0'); signal dwe : std_logic := '0'; signal rst : std_logic := '0'; begin MMCME2_ADV_inst : MMCME2_ADV generic map ( BANDWIDTH => "OPTIMIZED", -- Jitter programming (OPTIMIZED, HIGH, LOW) CLKFBOUT_MULT_F => 8.0, -- Multiply value for all CLKOUT (2.000-64.000). CLKFBOUT_PHASE => 0.0, -- Phase offset in degrees of CLKFB (-360.000-360.000). -- CLKIN_PERIOD: Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz). CLKIN1_PERIOD => 10.0, CLKIN2_PERIOD => 0.0, -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for CLKOUT (1-128) CLKOUT1_DIVIDE => 1, CLKOUT2_DIVIDE => 1, CLKOUT3_DIVIDE => 1, CLKOUT4_DIVIDE => 1, CLKOUT5_DIVIDE => 1, CLKOUT6_DIVIDE => 1, CLKOUT0_DIVIDE_F => 8.0, -- Divide amount for CLKOUT0 (1.000-128.000). -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for CLKOUT outputs (0.01-0.99). CLKOUT0_DUTY_CYCLE => 0.5, CLKOUT1_DUTY_CYCLE => 0.5, CLKOUT2_DUTY_CYCLE => 0.5, CLKOUT3_DUTY_CYCLE => 0.5, CLKOUT4_DUTY_CYCLE => 0.5, CLKOUT5_DUTY_CYCLE => 0.5, CLKOUT6_DUTY_CYCLE => 0.5, -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for CLKOUT outputs (-360.000-360.000). CLKOUT0_PHASE => 0.0, CLKOUT1_PHASE => 0.0, CLKOUT2_PHASE => 0.0, CLKOUT3_PHASE => 0.0, CLKOUT4_PHASE => 0.0, CLKOUT5_PHASE => 0.0, CLKOUT6_PHASE => 0.0, CLKOUT4_CASCADE => FALSE, -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE) COMPENSATION => "ZHOLD", -- ZHOLD, BUF_IN, EXTERNAL, INTERNAL DIVCLK_DIVIDE => 1, -- Master division value (1-106) -- REF_JITTER: Reference input jitter in UI (0.000-0.999). REF_JITTER1 => 0.0, REF_JITTER2 => 0.0, STARTUP_WAIT => FALSE, -- Delays DONE until MMCM is locked (FALSE, TRUE) -- Spread Spectrum: Spread Spectrum Attributes SS_EN => "FALSE", -- Enables spread spectrum (FALSE, TRUE) SS_MODE => "CENTER_HIGH", -- CENTER_HIGH, CENTER_LOW, DOWN_HIGH, DOWN_LOW SS_MOD_PERIOD => 10000, -- Spread spectrum modulation period (ns) (VALUES) -- USE_FINE_PS: Fine phase shift enable (TRUE/FALSE) CLKFBOUT_USE_FINE_PS => FALSE, CLKOUT0_USE_FINE_PS => FALSE, CLKOUT1_USE_FINE_PS => FALSE, CLKOUT2_USE_FINE_PS => FALSE, CLKOUT3_USE_FINE_PS => FALSE, CLKOUT4_USE_FINE_PS => FALSE, CLKOUT5_USE_FINE_PS => FALSE, CLKOUT6_USE_FINE_PS => FALSE ) port map ( -- Clock Outputs: 1-bit (each) output: User configurable clock outputs CLKOUT0 => clk_switched, CLKOUT0B => open, CLKOUT1 => open, CLKOUT1B => open, CLKOUT2 => open, CLKOUT2B => open, CLKOUT3 => open, CLKOUT3B => open, CLKOUT4 => open, CLKOUT5 => open, CLKOUT6 => open, -- Dynamic Phase Shift Ports: 1-bit (each) output: Ports used for dynamic phase shifting of the outputs PSDONE => open, -- Feedback Clocks: 1-bit (each) output: Clock feedback ports CLKFBOUT => clk_fb, CLKFBOUTB => open, -- Status Ports: 1-bit (each) output: MMCM status ports CLKFBSTOPPED => open, CLKINSTOPPED => open, LOCKED => open, -- Clock Inputs: 1-bit (each) input: Clock inputs CLKIN1 => clk_100, CLKIN2 => '0', -- Control Ports: 1-bit (each) input: MMCM control ports CLKINSEL => '1', PWRDWN => '0', -- 1-bit input: Power-down RST => rst, -- 1-bit input: Reset -- DRP Ports: 16-bit (each) output: Dynamic reconfiguration ports DCLK => clk_100, -- 1-bit input: DRP clock DO => DO, -- 16-bit output: DRP data DRDY => DRDY, -- 1-bit output: DRP ready -- DRP Ports: 7-bit (each) input: Dynamic reconfiguration ports DADDR => DADDR, -- 7-bit input: DRP address DEN => DEN, -- 1-bit input: DRP enable DI => DI, -- 16-bit input: DRP data DWE => DWE, -- 1-bit input: DRP write enable -- Dynamic Phase Shift Ports: 1-bit (each) input: Ports used for dynamic phase shifting of the outputs PSCLK => '0', PSEN => '0', PSINCDEC => '0', -- Feedback Clocks: 1-bit (each) input: Clock feedback ports CLKFBIN => clk_fb ); speed_change_fsm: process(clk_100) begin if rising_edge(clk_100) then di <= (others => '0'); dwe <= '0'; den <= '0'; case state is when state_idle_fast => if speed_select = '1'then state <= state_go_slow_1; -- High 10 Low 10 di <= "0001" & "001010" & "001010"; dwe <= '1'; den <= '1'; end if; when state_go_slow_1 => if drdy = '1' then state <= state_go_slow_2; end if; when state_go_slow_2 => rst <= '1'; state <= state_go_slow_3; when state_go_slow_3 => rst <= '0'; state <= state_idle_slow; when state_idle_slow => di <= (others => '0'); if speed_select = '0' and drdy = '0' then state <= state_go_fast_1; -- High 5 Low 5 di <= "0001" & "000101" & "000101"; dwe <= '1'; den <= '1'; end if; when state_go_fast_1 => if drdy = '1' then state <= state_go_fast_2; end if; when state_go_fast_2 => rst <= '1'; state <= state_go_fast_3; when state_go_fast_3 => rst <= '0'; state <= state_idle_fast; end case; end if; end process; dbounce_proc: process(clk_100) begin if rising_edge(clk_100) then if speed_select = btn then debounce <= (others => '0'); elsif debounce(debounce'high) = '1' then speed_select <= not speed_select; else debounce <= debounce + 1; end if; -- Syncronise the button btn <= btn_meta; btn_meta <= btn_raw; end if; end process; show_speed_proc: process(clk_switched) begin if rising_edge(clk_switched) then counter <= counter + 1; led(7 downto 0) <= std_logic_vector(counter(counter'high downto counter'high-7)); end if; end process; led(15) <= speed_select; end Behavioral;
  49. 1 point

    How to program Arty flash

    I figured out why the sck signal is not getting implemented- For some reason, "PARAM_VALUE.C_USE_STARTUP" is not set to zero, so when the file "board.xit" does not see this variable =0, it does not implement the sck pin (L16). I cheated and took out this test in "board.xit" (because I don't know where that PARAM_VALUE gets set). I added constraints for the pin: set_property PACKAGE_PIN L16 [get_ports qspi_flash_0_sck_io] set_property IOSTANDARD LVCMOS33 [get_ports qspi_flash_0_sck_io] and I now get a clock to the QSPI flash when the bootloader runs. One last step to solve- where to put my user program in flash that the bootloader will copy into DDR. The tools don't tell me how large the FPGA config file is (with compression on). I looked at the file size of the .bin file and rounded up to the next 1K, but I wish there was a programmatic way to do this from Vivado. Hurray- On power-up I can now config the FPGA, run the bootloader in block ram, which then copies my large user program from flash to DDR and executes.
  50. 1 point
    We have a few demo projects using the Basys2 in RF communications that we haven't gotten had time to document yet. Although it's different, there are a lot of similiarities that you may be able to use. Take a look at the attached Zip File. The demo project is using the Pmod RF1 to communicate keyboard presses to another Basys2 that is controlling a speaker. PmodRF1 Basys demo project by digilentinc, on Flickr Here is a link to get the files for the project: https://www.dropbox.com/s/83winp3zyio7or7/Basys2AudioRfPmodHID.zip?dl=0 Hope that helps!