Leaderboard


Popular Content

Showing content with the highest reputation since 04/23/19 in all areas

  1. 2 points
    xc6lx45

    Increasing the clock frequency to 260 MHz

    Hi, reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff":
    - Before anything else, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this "beginner mistake") is that the design tries to sample them at the high clock rate, which creates a near-impossible problem. Don't move further before this is understood, fixed and verified.
    - Speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens.
    - Once the obvious problems are fixed, you need to understand what the so-called "critical path" in the design is and improve it. For a feedforward-style design (no feedback loops) this can be done systematically by inserting delay registers. The output is generated e.g. one clock cycle later but the design is able to run at a higher clock, so overall performance improves.
    - Don't worry about floorplanning yet (if ever). This comes in when the "automatic" intelligence of the tools fails. But they are very good.
    - Do not optimize on a P&R result that fails timing catastrophically (as in your example, where almost 2000 paths fail). It can lead into a "rabbit's hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance).
    - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization and focus on what is important).
    - There are a few tricks, like forcing redundant registers to remain separate. Example: I have a dozen identical blocks that run on a common, fast 32-bit system clock and are critical to timing. Step 1, I sample the clock into a 32-bit register at each block's input to relax timing, and step 2, I declare these registers as DONT_TOUCH because the tools would otherwise notice they are logically equivalent and try to use one shared instance. This is just an example.
    - For BRAMs and DSP blocks, check the documentation for where extra registers are needed (they get absorbed into the BRAM or DSP using a dedicated hardware register). This is the only way to reach the device's specified memory or DSP performance.
    - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
    - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible, but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing).
    - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design.
    Ooops, this almost qualifies as a "long" answer ...
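The point about inserting delay registers into a feedforward path can be illustrated with a rough numeric sketch. The block delays below are invented for illustration (real numbers come from the timing report), and the model ignores setup time, clock skew and routing, so treat it as a back-of-the-envelope picture rather than a timing analysis.

```python
# Hypothetical combinational delays (ns) of four logic blocks on one feedforward path.
# These values are made up; a real design would take them from the timing report.
BLOCK_DELAYS_NS = [1.5, 2.0, 1.8, 1.7]


def split_into_stages(block_delays_ns, cut_points):
    """Return per-stage delays when a pipeline register is placed after each
    block index listed in cut_points."""
    stages, current = [], 0.0
    for index, delay in enumerate(block_delays_ns):
        current += delay
        if index in cut_points:
            stages.append(current)
            current = 0.0
    if current > 0.0:
        stages.append(current)
    return stages


def fmax_mhz(stage_delays_ns):
    """The clock is limited by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)


unpipelined = split_into_stages(BLOCK_DELAYS_NS, cut_points=set())
pipelined = split_into_stages(BLOCK_DELAYS_NS, cut_points={1})  # register after block 1

print("unpipelined:", unpipelined, round(fmax_mhz(unpipelined), 1), "MHz")
print("pipelined:  ", pipelined, round(fmax_mhz(pipelined), 1), "MHz")
```

With these made-up numbers the single 7 ns path cannot reach 260 MHz, while two balanced stages can, at the cost of one extra clock cycle of latency.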
  2. 2 points
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design. It's not a showroom piece but I think it did work as expected (whatever that is...) For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for 18 bit width.

    module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData, o_empty, o_full, o_error, o_nItems, o_nFree);
       parameter DATABITS = -1;
       parameter ADDRBITS = -1;
       localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}};
       localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1};
       localparam DATA_X = {{(DATABITS){1'bx}}};

       input wire i_clk;
       input wire i_push;
       input wire i_reset;
       input wire [DATABITS-1:0] i_pushData;
       input wire i_pop;
       output reg o_popAck = 1'b0;
       output wire [DATABITS-1:0] o_popData;
       output reg o_error = 1'b0;
       output wire [31:0] o_nItems;
       output wire [31:0] o_nFree;
       output wire o_empty;
       output wire o_full;

       reg popAckB = 1'b0;
       reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0];
       reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO;
       reg [ADDRBITS-1:0] popPtr = ADDR_ZERO;
       reg [DATABITS-1:0] readReg = DATA_X;
       reg [DATABITS-1:0] readRegB = DATA_X;
       wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr;
       wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr;

       // data is only valid while o_popAck is high
       assign o_popData = o_popAck ? readReg : DATA_X;

       // === items counter ===
       // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements)
       reg [ADDRBITS:0] nItems = 0;
       // zero-pad the (ADDRBITS+1)-bit counter to 32 bits
       assign o_nItems = {{(32-ADDRBITS-1){1'b0}}, nItems};
       assign o_nFree = (1 << ADDRBITS) - nItems;
       localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1};
       assign o_empty = nItems == 0;
       assign o_full = nItems == {1'b1, {(ADDRBITS){1'b0}}};

       always @(posedge i_clk) begin
          // === preliminary assignments ===
          readRegB <= DATA_X;
          popAckB <= 1'b0;

          case ({i_push, i_pop})
            2'b10: nItems <= nItems + NITEMS_ONE;
            2'b01: nItems <= nItems - NITEMS_ONE;
            default: begin end
          endcase

          o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty);

          // === output register (delay 1) ===
          o_popAck <= popAckB;
          readReg <= readRegB;

          pushPtr <= nextPushPtr;
          popPtr <= nextPopPtr;

          if (i_push)
            mem[pushPtr] <= i_pushData;

          if (i_pop) begin
             readRegB <= mem[popPtr];
             popAckB <= 1'b1;
          end

          // synchronous reset overrides everything above
          if (i_reset) begin
             pushPtr <= ADDR_ZERO;
             popPtr <= ADDR_ZERO;
             o_error <= 1'b0;
             o_popAck <= 1'b0;
             popAckB <= 1'b0;
             readReg <= DATA_X;
             readRegB <= DATA_X;
             nItems <= 0;
          end
       end
    endmodule
  3. 1 point
    Hi @aliff saad, I would suggest making sure that there are no spaces in your paths. I downloaded the library and placed it here: C:\Users\jpeyron\Documents\Arduino\libraries\SparkFun_LSM9DS1_Arduino_Library-master\examples\LSM9DS1_Basic_SPI I then opened the Arduino IDE using the Digilent core and ran LSM9DS1_Basic_SPI.ino with no issue, as shown in the attached screen shot below. best regards, Jon
  4. 1 point
    Hi @aliff saad, Please attach a screen shot of the Arduino IDE errors. Please also post the path where you have installed the LSM9DS1 library. best regards, Jon
  5. 1 point
    Maybe one comment: In the ASIC world, "floorplanning" is an essential design phase, where you slice and dice the predicted silicon area and give each design team their own little box. The blocks are designed and simulated independently, and come together only at a fairly late phase. ASIC differs from FPGA in some major ways:
    - ASIC IOs have a physical placement, e.g. along the pad ring. We don't want to run sensitive signals across the chip, need to minimize coupling for mixed-signal designs, etc. In comparison, FPGAs are probably more robust. A complex design will definitely consider the layout, especially on larger devices, but on smaller eval boards the first restrictions I'll probably run into are logical (e.g. which clock is available where), not geometrical.
    - For ASICs, we need the floorplan to design the power distribution network as its own sub-project (and many a bright-eyed startup has learned electromigration the hard way).
    - In the ASIC world, we need to worry about wide and fast data paths regarding both power and area: transistors are tiny but metal wires are not.
    You might have a look at "partial reconfiguration"; here the geometry of the layout plays some role.
  6. 1 point
    artvvb

    Axi DMA from all memory

    Hi @Rickdegier, Welcome to the Digilent forums! I am not the most confident on this topic, but I have used the DMA some. The most important facet here is to make sure that your buffer is actually contained in the DDR memory. Different parts of the program can be placed in different memories using the linker script in your application project's src folder (lscript.ld). You should check that file to make sure that your global arrays are placed in the DDR. Second, if the data cache is enabled (likely), you should make sure to flush and invalidate the buffer memory area around your SimpleTransfer calls (functions to do this are in xil_cache.h). Lastly, I personally have had more success using malloc to create my buffers than using global or local arrays. I'm not sure why this is. From a cursory Google search, it looks like the DMA will allow transfers into program memory when you aren't careful. You may want to reach out to Xilinx on their forums. Thanks, Arthur
  7. 1 point
    That's amazing to hear! I appreciate all of your help. I will update the board files now. Take care, Justen
  8. 1 point
    Zorroslade000

    Adding rs232refcomp to Microblaze

    Thx, That seems to work. Also, now I know where the examples are. Rob
  9. 1 point
    xc6lx45

    power supply

    a fuse?
  10. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, I did a little more searching and found a forum thread here where the customer is having a similar issue. A community member also posted a pass through zynq project that should be useful for your project. best regards, Jon
  11. 1 point
    For the Protocol / SPI-I2C / Spy mode you should specify the approximate (or highest) protocol frequency, which will be used to filter transient glitches like ringing on clock signal transitions. The errors you get indicate the signals are not correctly captured.
    - make sure to have proper grounding between the devices/circuits
    - use twisted wires (signal/ground) to reduce EMI
    - use the logic analyzer and/or scope to verify the captured data / voltage levels at a higher sample rate, at least 10x the protocol frequency
    Like here in the Logic Analyzer you can see a case when the samples are noisy:
  12. 1 point
    Hi, I do indeed have an SD card with the correct files on it. Anyway, I found a way to make it work without the LVLSHFT here: https://github.com/NicholsKyle/ECE387_SimonSays/wiki/My-Design#schematic-drawing. Now I'm struggling to display correctly on the MTDs; the problem is certainly my code. But thanks for your help @jpeyron.
  13. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, Please attach a screen shot of your vivado block design. Have you tried changing the MMCM to PLL in the DVI2RGB IP Core? best regards, Jon
  14. 1 point
    Hi @askhunter, I believe that you would only need the more recent DVI2RGB IP Core and the IF folder in the Vivado library. best regards, Jon
  15. 1 point
    Hi @askhunter, Here is the newest version of the DVI2RGB IP Core. Please add the full vivado library folder to the IP repository. There are mandatory files that need to be included for the DVI2RGB IP Core to work. What development board are you using? best regards, Jon
  16. 1 point
    zygot

    How to connect an external FIFO to FPGA

    I have a few random thoughts on the subject ( is anyone surprised? ) I looked over an old project where just for fun I used a 128Kx32 single clock FIFO built with BRAM. It was for the Nexys Video Artix device, which has the same 36Kb BRAMs. It used 116 BRAMs and worked at 100 MHz with a mid-range speed part. 36Kb/9 = 4096 bytes plus parity, 131072/4096 = 32 9-bit BRAMs, 4x32 = 128 BRAMs to implement a 128Kx32 FIFO, so Vivado must have found some way to save 12 BRAMs. If you have 18-bit data that's fine, as the BRAMs can be organized as 4Kx9 where the extra bit is meant for parity. From experience I can tell you that using the parity bit for data can get tricky but is entirely possible. If you need a dual clock FIFO then expect to use more BRAMs. If there isn't much else in your design, timing won't be a problem. If you are trying to place 116 BRAMs into a complicated high speed design then you will find yourself needing to learn about timing closure strategies. If I don't have to worry about resource usage or timing issues, I'd use an HDL to implement RAM or FIFO structures as it's portable, more or less. Otherwise I tend to just bite the bullet and use the vendor's tools to implement resources like block memory, PLLs, and such, as these resources aren't really that compatible between vendors and I usually do care about resource usage and timing. Also, vendor IP 'wizards' sometimes create constraints for them and take care of a lot of little details, which ultimately saves time.
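The BRAM arithmetic in the post above can be written out as a small sketch. It assumes the naive tiling described there (each 36Kb block used as 4096 entries of 9 bits); the function name is made up for illustration, and a synthesis tool may pack the memory differently, as the 116 blocks reported by Vivado suggest.

```python
import math


def bram36_estimate(depth, width, bram_depth=4096, bram_width=9):
    """Naive count of 36Kb BRAMs for a depth x width memory, tiling 4Kx9 blocks.

    This is only the back-of-the-envelope estimate from the post, not what a
    synthesis tool will necessarily produce.
    """
    columns = math.ceil(width / bram_width)   # blocks side by side to reach the width
    rows = math.ceil(depth / bram_depth)      # blocks stacked to reach the depth
    return rows * columns


print(bram36_estimate(131072, 32))  # the 128Kx32 FIFO from the post
print(bram36_estimate(131072, 18))  # an 18-bit wide FIFO of the same depth
```

For the 18-bit case the width fits exactly into two 9-bit columns, which is why the parity bits end up carrying data.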
  17. 1 point
    Thank you Dan. Your answer helps clarify things greatly for me on this subject. I appreciate you taking the time to help me out. -Sean
  18. 1 point
    Yes, you can combine more than one block RAM. There is more than one way to implement a FIFO. If I had to do it for myself, I'd write it in plain Verilog, it's about two or three screen lengths of code if the interface requirements are "clean" (such as, one clock and freedom to leave a few clock cycles of latency, before the first input appears at the output). I didn't check but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need. With "expensive" I meant just that, it costs a lot of money to use half an FPGA just for memory.
  19. 1 point
    Well, to be honest, I didn't read the datasheet for the high-capacity devices with 9M bits. So this one isn't even EOL. Well, it depends. Have a look at https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf table 6. There are 325 of those 36 Kb blocks on the FPGA (11700 Kb in total), so you need about 1/4 of the total FPGA for memory. Technically feasible and easiest to implement but a very expensive FIFO. Now, connecting this chip to realize its full performance potential (e.g. 225 MHz) is not straightforward. From between the lines ... >> Do I must use all pins of external FIFO? ... I read that you don't have hands-on experience with e.g. CMOS ICs (the answer is "you must drive any input at any time, unless the data sheet says explicitly otherwise", or strange things will happen. "Strange" in the sense that the circuit may respond to waving my hands over it, and that's not an exaggeration). If I'm correct in this, it may be a good idea to get e.g. a few CD4017 or some other standard CMOS chip with simple functionality for < $1 and use this to bring up your FPGA IOs. If the FIFO chip doesn't work because of IO issues, it will be near-impossible to debug.
  20. 1 point
    @longboard, Yeah, that's really confusing isn't it? At issue is the fact that many of these chips are specified in mega BITS not BYTES. So the 1Gib is meant to refer to a one gigabit memory, which is also a 128 megabyte memory. That's what the parentheses are trying to tell you. Where this becomes a real problem is that I've always learned that a MiB is a reference to a million bytes, 10^6 bytes, rather than a mega byte, or 2^20 bytes. The proper acronyms, IMHO, should be Gb, GB, Mb, and MB rather than GiB or MiB, which are entirely misleading. As for the memory, listed as 16 Meg x 8 x 8, that's a reference to 8 banks of 16 mega words of memory, where each word is 8 bits wide. In other words, the memory has 16MB*8 or 128MB of storage. You could alternatively say it had 1Gb of memory, which would be the same thing, but this is often confused with 1GB of memory, hence the desire for the parentheses again. Dan
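The "16 Meg x 8 x 8" arithmetic can be checked with a short sketch. It assumes "Meg" means 2^20 words, as is usual for memory organizations; the function name is made up for illustration. As a side note, the IEC binary prefixes define Gib as 2^30 bits and MiB as 2^20 bytes, so a "1Gib (128MiB)" label is self-consistent under that convention.

```python
MEG = 2 ** 20  # assumption: "Meg" in a memory organization means 2^20 words


def organization_bits(words_per_bank, word_bits, banks):
    """Total capacity in bits of a memory organized as words x width x banks."""
    return words_per_bank * word_bits * banks


total_bits = organization_bits(16 * MEG, word_bits=8, banks=8)
total_bytes = total_bits // 8

print(total_bits / 2 ** 30, "Gib")    # capacity in units of 2^30 bits
print(total_bytes / 2 ** 20, "MiB")   # capacity in units of 2^20 bytes
```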
  21. 1 point
    I'm going to echo @xc6lx45 and suggest that you reconsider. Does your Kintex have insufficient BRAM for an on-chip FIFO? Using BRAM would be so much easier. If you choose to use the external FIFO, you'll have to adapt the logic that you use to interface to the FIFO to use the signaling of this other FIFO. Sometimes it helps to post more context: what do you want the system to be able to do that it does not now?
  22. 1 point
    Hi @Esti.A, The first error that you get is the following one: ERROR: [Common 17-179] Fork failed: Cannot allocate memory This kind of error is generated when your machine does not have enough RAM. Please post the configuration of your system here (CPU, OS, RAM, ..). Do you have swap enabled? Also, have a look at this post.
  23. 1 point
    revathi

    xadc_zynq

    Hi @jpeyron, Thank you for your kind reply. I will check the code that you have sent. The board that I am using is the Zynq ZC702 evaluation kit. No, I didn't try any auxiliary channel. I will try today and update you. Thank you once again
  24. 1 point
    Thanks much, Jon. Best Cuikun
  25. 1 point
    kwilber

    Dynamic voltage and frequency scaling

    Here are two additional articles I have read on the technique being applied to a zynq. https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842065/Zynq-7000+AP+SoC+Low+Power+Techniques+part+5+-+Linux+Application+Control+of+Processing+System+-+Frequency+Scaling+More+Tech+Tip https://github.com/tulipp-eu/tulipp-guidelines/wiki/Dynamic-voltage-and-frequency-scaling-(DVFS)-on-ZC702
  26. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    https://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf page 3
  27. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    >> is it possible to present DVFS on it. >> For now I now about clock wizard, DCM, PLL for different clock generation (frequency) but this is not frequency scaling mi right? you may have your own answer there. This is some university project? Have you done your own research? For example, this has all the right keywords: https://highlevel-synthesis.com/2017/04/12/voltage-scaling-on-xilinx-zynq
  28. 1 point
    Hi @Phil_D The gain switch is adjusted automatically based on the selected scope range. At 500mV/div (5Vpk2pk, ~0.3mV resolution) or lower the high gain is used, and above this the low gain (50Vpk2pk with ~3mV resolution). In case you specify a trigger level off the screen (5Vpk2pk) or an offset beyond +/- 2.5V, the low gain will be used for the trigger source channel. This will be noted on the screen with red warning text. The attenuation is a different thing. This option lets you specify the external attenuation or amplification applied to the signals which enter the scope inputs, and the data is scaled accordingly. For example, if you use a 10x scope probe, the scope input will actually get 1/10th of the original signal, but by specifying 10x attenuation the signal is scaled to show the values at the probe. In this case the 500mV/div (5Vpk2pk) low/high gain limit moves up to 5V/div (50Vpk2pk) and the low gain up to 50V/div. If you have an external 100x amplifier on the scope input you can specify 0.01x attenuation. With this you will have 5mV/div (50mVpk2pk, ~0.003mV resolution) for high gain.
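The scaling described above can be sketched as plain arithmetic. The constants are the approximate figures quoted in the post; the function and key names are made up for illustration and are not part of any WaveForms API.

```python
# Figures quoted in the post, at the scope input (no external attenuation).
HIGH_GAIN_LIMIT_V_PER_DIV = 0.5   # high gain is used at or below 500 mV/div
HIGH_GAIN_RESOLUTION_V = 0.3e-3   # ~0.3 mV
LOW_GAIN_RESOLUTION_V = 3e-3      # ~3 mV


def displayed_ranges(attenuation):
    """Scale the input-referred figures by the attenuation setting
    (10 for a 10x probe, 0.01 for an external 100x amplifier)."""
    return {
        "high_gain_limit_v_per_div": HIGH_GAIN_LIMIT_V_PER_DIV * attenuation,
        "high_gain_resolution_v": HIGH_GAIN_RESOLUTION_V * attenuation,
        "low_gain_resolution_v": LOW_GAIN_RESOLUTION_V * attenuation,
    }


probe_10x = displayed_ranges(10)        # limit moves up to 5 V/div
amplifier_100x = displayed_ranges(0.01) # limit moves down to 5 mV/div
print(probe_10x)
print(amplifier_100x)
```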
  29. 1 point
    Vivado is complaining that there are active (not commented) pins in the constraint file that do not have matching port names in your design. Open your design_1_wrapper.v file and reconcile the port names specified there with the constraints file. It is not uncommon to have to change the name of a pin in the constraints file to match the port name in the wrapper. This could happen for example if you used "Make external" on an i/o pin from an IP block. One thing that has helped in the past was to delete the top level wrapper and regenerate it. Sometimes when you make pins external after generating the wrapper, there can be inconsistencies between port and pin naming. Xilinx UG903, page 42 and following elaborates on the scoping mechanism Vivado uses.
  30. 1 point
    Notarobot

    Stuck in SDK

    Kris, Printing via xil_printf uses STDIO by default and can be reconfigured in Vivado and SDK. With the little info you provided, a lot is left to guessing. My last recommendation is to pay attention to the configuration of your build in SDK. The interrupt service routine (ISR) might not work if the GCC compiler has optimization ON. To make it work, the variable in the ISR should be declared volatile. A Debug build usually has the flag -O0, which means no optimization. Good luck!
  31. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, What's the data rate of the filter compared to the number of taps? As in, are you going to need to accept one sample per clock into this filter, or will you have reliable idle clock periods to work with? I'm wondering, if you have to build this, how easy/hard it might be. Dan
  32. 1 point
    So, if I am getting the point of the previous two posts from xc6lx45 and Dan: make your own filter in Verilog or VHDL and figure out the details ( signed, unsigned, fixed point, etc, .... etc ). You can instantiate DSP48E primitives (difficult) or just let the synthesis tool do it from your HDL code (easier). Debating how things should be versus how they are when using third party scripts to generate unreadable code seems like a waste of time to me... If you don't like what you get then design what you want. If you can make the time to write your own IP ( would be nice to not depend on a particular vendor's DSP architecture ) you'll learn a lot and save a lot of time later. If a vendor's IP doesn't make timing closure for your design a nightmare and you don't have the time to figure out all the details, just let the IP handle it. I suspect that trying to optimize the FIR Compiler will be frustrating at best. I once had to come up with pipelined code to calculate signal strength in dB at a minimum sustained data rate. My approach to converting 32-bit integers to logarithms used a combination of Taylor Series expansion and look-up tables. I had a few versions**. One was straight VHDL so that I could compare Altera and Xilinx DSP tiles. One instantiated DSP48 tile primitives for a particular Xilinx device. These were fixed point designs. There's theory and there's practical experience.. they are usually not the same. ** I played with a number of approaches based on extremely limited specifications so there were quite a few versions. Every time I presented one the requirements changed and so did the complexity and resource requirements. I should mention that my intent in mentioning this experience is not to denigrate the information presented by others or to claim superiority in any way. When getting advice it's important to put that into context. A lot of times facts aren't necessarily relevant to solving a particular problem. 
If I haven't made this clear I've never had the experience that vendor IP optimizes resource usage... in fact quite the opposite. This is why in a commercial setting companies are willing to pay to develop their own IP. Sometimes FPGA silicon costs overshadow development costs.
  33. 1 point
    The word "overclocking" may be even misleading - this architecture is used when the data rate is significantly lower than the multiplier's speed. Inputs to one (expensive) multiplier are multiplexed, so it can do the work for several or all taps. The "multiplexing" itself will get fairly expensive in logic fabric due to the number of data bits and coefficients. To the rescue comes the BRAM, which is essentially a giant demultiplexer-multiplexer combo, with a football field of flipflops in-between. You can find an example for this approach here. Out-of-the-box, it's unfortunately quite complicated because it does arbitrary rational resampling. Setting the rates equal for a standard FIR filter, you end up with only a few lines of remaining code. BTW, multiplier count as design metric is probably overrated nowadays for several reasons (the IP tool resource usage is already more practical, e.g. BRAM count may become the bottleneck). If you can get this book through your library, you might have a look e.g. at chapter 6 (background only, this is a very old book): Keshab K. Parhi VLSI Digital Signal Processing Systems: Design and Implementation
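The multiplexed-multiplier idea above can be modelled in a few lines. This is a behavioral sketch only, not the linked resampler: one shared multiply-accumulate is stepped once per tap, and a circular buffer stands in for the BRAM holding the delay line. The class and attribute names are made up for illustration.

```python
class SerialFir:
    """FIR filter that reuses a single multiplier for all taps."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.mem = [0] * len(self.coeffs)  # circular sample memory (BRAM stand-in)
        self.write_ptr = 0
        self.multiplies = 0                # how often the shared multiplier was used

    def push(self, sample):
        """Accept one input sample and return one output sample."""
        n = len(self.coeffs)
        self.mem[self.write_ptr] = sample
        acc = 0
        for k in range(n):                 # one multiplier "clock" per tap
            acc += self.coeffs[k] * self.mem[(self.write_ptr - k) % n]
            self.multiplies += 1
        self.write_ptr = (self.write_ptr + 1) % n
        return acc


fir = SerialFir([1, 2, 3, 4])
impulse_response = [fir.push(x) for x in [1, 0, 0, 0, 0]]
print(impulse_response, fir.multiplies)
```

The model makes the trade-off visible: each output costs as many multiplier cycles as there are taps, so the approach only fits when the sample rate is well below the multiplier's clock rate.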
  34. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, So let's start at the top. An N-point FIR filter, such as this one, requires N multiplies and N-1 adds. Let's just count multiplies, though, for now. Let's now look at some optimizations you might apply.
    - A complex multiply usually requires 4 multiplies. If you have complex input and taps, that's 4N real multiplies. There's a trick you can use to drop this to 3N multiplies.
    - If the filter is real, and the incoming signal is complex, you can drop this to 2N multiplies by filtering each of the real and imaginary channels separately.
    - If your filter taps are fixed, a good optimizer should be able to replace the zeros with a constant output, the ones with data pass through, etc. Implementing taps with two or three bits this way is possible. This only works, though, if the filter taps are fixed.
    - Many of the common FIR filter developments generate linear phase filters. These filters are symmetric about a common point. With a little bit of work, you can use that to reduce the number of multiplies (for dynamic taps) down to (N-1)/2.
    - There is a very significant class of filters called half band filters. In a half band filter, every other tap (other than the center tap) is zero. With a bit of work, you can then drop your usage down to (N-1)/4 multiplies. This optimization applies to Hilbert transforms as well.
    In the list above, I've assumed that you need to operate at one sample in and one sample out per clock. In that case, there's no time or room for memory, since all of the stages need their data on every clock. I should also point out that I'm counting multiplies, not DSP slices. If your multiplies are smaller than 18x18 bits, you may be able to use a single DSP slice per multiply. If they are larger, you might find yourself using many DSP slices per multiply. It depends.
    Let's now consider the case where you have an N sample filter but you are only going to provide it with one sample of data every N+ clocks (some number more than N). You could then multiplex this multiply and implement an N-point filter with 1 multiply and 2^(ceil(log_2(N))) RAM elements. If your filter was symmetric, you could process a filter roughly twice as long, or a sample rate roughly twice as fast, while still using a single multiply. The half-band and Hilbert tricks can apply to (roughly) double your filter size or your data rate again. In these cases, however, you can't spare the multiply at all, since it is used to implement every coefficient.
    That should give you both ends of the spectrum. Now, while I don't understand how Xilinx's FIR compiler works, I can say that if I were to make one I would allow a user to design crosses between these two ends of the spectrum. In the case of such a cascaded filter, however, you may find it difficult to continue to implement the optimizations we've just discussed, simply because the cascaded structure becomes more difficult to deal with. (By cascade, I mean cascade in implementation and not filtering a stream and then filtering it again.)
    Looking at your two designs, none of the optimizations I've mentioned would apply. In the high speed filters we started out with, sure, you might manage to remove a multiply or two by multiplying by zero. On the other hand, you can't share multiplies across elements if you do so. For example, you mentioned [1 2 3 4 0 1 2 3 4]. Sure, it's got repeating coefficients, but this is actually a really obscure filtering structure, and not likely one that I (were I Xilinx) would pay to support. Try [ 1 2 3 4 0 4 3 2 1] instead and see if things change. Similarly for [ 1 0 0 0 0 0 0 0 1]. Yeah, it's symmetric about the mid-point, but in all of my symmetric filtering applications I'm assuming the mid point has a value of 2^N-1 or similar so that it doesn't need a multiply. Your choice of filters offers no such optimization, so who knows how it might work. Hope this helps to shed some light on things, Dan
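Two of the multiply-saving tricks mentioned above can be sketched in a few lines. The post does not say which 3-multiply trick is meant; the version below is the common textbook rearrangement and is shown as an assumption. The symmetric-filter function uses a pre-adder so that each mirrored coefficient pair shares one multiply. All function names are made up for illustration.

```python
def complex_mul_3(a, b, c, d):
    """(a + jb) * (c + jd) using three real multiplies instead of four."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2   # (real part, imaginary part)


def symmetric_fir_output(window, half_coeffs, center_coeff):
    """One output of an odd-length symmetric FIR (coefficient k == coefficient N-1-k).

    Uses (N-1)/2 multiplies for the mirrored pairs plus one for the center tap;
    the center multiply disappears if the center coefficient is fixed, as the
    post assumes.
    """
    n = len(window)
    mid = n // 2
    acc = center_coeff * window[mid]
    for k in range(mid):
        acc += half_coeffs[k] * (window[k] + window[n - 1 - k])  # pre-add, one multiply
    return acc


coeffs = [1, 2, 3, 4, 0, 4, 3, 2, 1]   # the symmetric example from the post
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # arbitrary test samples
direct = sum(c * x for c, x in zip(coeffs, window))
folded = symmetric_fir_output(window, coeffs[:4], coeffs[4])
print(direct, folded, complex_mul_3(1, 2, 3, 4))
```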
  35. 1 point
    Amen. And that's a good view for all Xilinx IP. The structure for a simple digital filter is not that complex; you can implement them in HDL. I've done that. Xilinx IP is convenient but usually not the best approach when you have concerns about using up limited resources. The issue with using very fast resources like BRAM and DSP slices is that they are placed in particular locations throughout the device, with limited routing resources for the signals between them or other logic. You can let Xilinx balance throughput, resource usage, and logic placement, or you can try to do that yourself. Trying to use 100% of every BRAM or DSP resource in order to minimize the number of BRAM or DSP resources used is not easy. In my experience FPGA vendors are content to have their IP wizards make the customer think that he needs a larger and more expensive device. So that's the trade-off: let the vendors' tools do the work to save time, or write your own IP and be responsible for taking care of all the little details that the IP hides from you. I've spent some time experimenting with DSP resources from various FPGA vendors. They are complicated, with a lot of modes, and depending on how you use them throughput can decline substantially from the ideal. Just read the user's guide and switching specs in the datasheet to get the idea. Generally the DSP slices are arranged to perform optimally with certain topologies but not all. Implementing designs that are iterative or have feedback can get ugly, especially when you try to fit that into a larger design using most of the device's resources. As a general rule, in my experience, use vendor IP and don't ask a lot of questions, or design your own IP and be prepared to learn how to handle a lot of details that aren't obvious. Time versus convenience.
  36. 1 point
    Hi, [1 2 3 4 0 1 2 3 4] is not symmetric in a linear-phase sense. That would be e.g. [1 2 3 4 0 4 3 2 1]. You could exploit the shared coefficients manually, see e.g. Figure 3 for the general concept. But this case is so unusual that I doubt tools will take it into account. The tool does nothing magical. If performance matters more than design time, you'll always get better results for one specific problem with manual design. One performance metric is multiplier utilization (e.g. assuming you design for a 200 MHz clock, one DSP delivering 200M operations / second performs at 100 %. Reaching 50+ % is a realistic goal for a simple / single rate structure). For example, do I want to use an expensive BRAM at all, when I could use ring shift registers for the delay line and coefficients? Then you only need a small controlling state machine around it that does a full circular shift for each sample, muxing in the new input sample every full cycle (the BRAM makes more sense when the filter serves many channels in parallel, because then FF count becomes an issue).
  37. 1 point
    I'm really not too interested in spending a lot of time fixing people's code or teaching HDLs.. but... Why did you comment out the enable from your port and make it a local signal? I don't think that you quite understand the concept of enables. What do you suppose is going on with your concurrent assignment to i_en? Try to figure out what it is that your code is doing. What's being clocked and what's not? Look at where you assign values to the signal counter ( that's where the answer to your question will be found if your grasp of VHDL for synthesis is sufficient ). Why is counter type integer? What do you suppose happens when the synthesis tool tries to use an unconstrained integer to implement a counter? Don't try and stuff all of your logic into one process; put your counter into its own process. Look around for some examples of implementing a counter in VHDL. My impression is that you haven't quite grasped the basic concepts of the VHDL that you are trying to use. Here's my suggestion: Create a standalone LFSR entity, a standalone counter entity, and a toplevel entity that instantiates both the LFSR and counter components. Don't use type integer anywhere in your code. Only use if..elsif..else statements in your code. See if you can shift your LFSR only when the counter reaches a certain value. Write a testbench to exercise your toplevel source file. You may not get exactly what you want at first but you will have a nice little project that, along with some help from the simulator, can help you learn VHDL.
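Before writing the VHDL, the suggested structure (a standalone counter, a standalone LFSR, and a top level that shifts the LFSR only when the counter reaches a certain value) can be prototyped as a small behavioral model, which also gives reference values for the testbench. This is a sketch under assumptions: the 8-bit width, the tap positions and all names are chosen for illustration and do not come from the original code.

```python
class Counter:
    """Free-running counter that wraps after reaching the terminal value."""

    def __init__(self, terminal):
        self.terminal = terminal
        self.value = 0

    def tick(self):
        """Advance one clock; return True on the clock where the terminal count is seen."""
        hit = self.value == self.terminal
        self.value = 0 if hit else self.value + 1
        return hit


class Lfsr:
    """8-bit XOR-feedback shift register (illustrative tap choice: bits 7, 5, 4, 3)."""

    def __init__(self, seed=1):
        self.state = seed & 0xFF   # an all-zero seed would lock up an XOR LFSR

    def shift(self):
        s = self.state
        feedback = ((s >> 7) ^ (s >> 5) ^ (s >> 4) ^ (s >> 3)) & 1
        self.state = ((s << 1) | feedback) & 0xFF


def run_top(clocks, terminal):
    """Top level: the counter's terminal-count pulse acts as the LFSR enable."""
    counter, lfsr = Counter(terminal), Lfsr()
    trace = []
    for _ in range(clocks):
        if counter.tick():   # enable is high for one clock per counter period
            lfsr.shift()
        trace.append(lfsr.state)
    return trace


print(run_top(8, terminal=3))
```

In the HDL version the counter would be a constrained vector rather than an unbounded integer, which is exactly the point raised above about type integer.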
  38. 1 point
    zygot

    RISC-V on Nexys A7?

    @Dan The viewpoint that I presented wasn't meant to be the only reasonable one. True, being able to say that you implemented a RISC-V processor on your Nexys-A7 doesn't involve life-threatening feats of derring-do. Though it might sound like I'm trying to discourage FPGA beginners from following recipes to accomplish what they aren't capable of accomplishing on their own, that's not my intent. I'm merely suggesting that aeon20 listen to what he's said and re-evaluate his goals. I'd point out that the Wright brothers, who had no recipe and little in the way of tutorials ( they were not the first ones to fly or even attempt to fly ), did have considerable practical experience in the mechanics of the parts of their experimental planes. My point is that they were leveraging their expertise in one area to try and accomplish a goal in another area. So I view your example as supporting my point. By the way, building airplanes from a kit is a real thing. I've used recipes from others in building software applications for a particular brand of Linux that I want to use. I really don't want to figure out how all of the libraries, frameworks, scripts and tools used to build the application work; I just want to use the application on a particular version of a particular distribution of Linux. Sometimes this doesn't work out, as in order to get my application I need to build the framework or tool from scratch and it ends up being more work than I want to put into it. When it does succeed I still don't know how all of the dependencies ( and there can be a LOT of dependencies ) work and I don't care. If someone wants to play around with RISC-V there are development boards with silicon implementations of the processor that will be much higher performance than anything implemented in a low end FPGA. So the motivation must be different. Some will see using a recipe to build an application as the same thing as using a recipe to build a soft-processor. I would disagree.
I'm not questioning the validity of anyone's motivation. I'm suggesting that there might be a more rewarding path.
  39. 1 point
    @askhunter It's not clear from your pictures what it is that you are referring to since the time scales are different. The purpose of post place and route timing simulation is to show the relative signal path delays in your implemented design as well as possible inferred latches or registers hidden by IP. The RTL simulation merely indicates if your behavioural logic is performing as you intended ( assuming that the testbench is well designed ). It is merely a simplified (no delays, no setup, no hold times) idealistic representation of simple logic. If the timing simulation doesn't give the same results as the RTL simulation then it's unlikely that your hardware will behave as you intend either. In the typical professional setting a lot of people are working on parts of a large design effort simultaneously. No one can afford to schedule a design effort where everything is done sequentially. In such a case timing simulations become a very important indicator of the risk of projects not making deadlines. It simply isn't possible to create a lot of hardware, software, test protocols etc. sequentially or even in parallel and, 2 weeks before shipment, throw all that stuff together for the first time and then figure out why things don't work. So we have a lot of ways to do simulation that offer increasingly more accurate, and hence reliable, views of how our design ( after it's been optimized, re-worked and reduced to LUT equations ) might actually work in a system before having to run it in hardware. When there are 10 engineers doing parts of 1 large FPGA design and all of those parts are integrated it's not uncommon for some of them to start failing due to limited routing resources and clock line limitations.
  40. 1 point
    attila

    Save continous data to file in WaveForms

    Hi @Andras At the moment you have WAV RIFF WAVE export under Scope/View/Logging/Script/Example.
  41. 1 point
    zygot

    A UART Based Debugger Tool

    Here's a utility for debugging and testing your code in hardware that uses any IO pin to send an ASCII representation of any signal through a hardware UART interface. If you don't have a UART on your FPGA board there are TTL USB UART breakout boards and cables that allow any spare IO pin to become a UART interface. This code is functionally the same as one recently released by Hamster but developed independently for the Fast Data Interface project. I recommend comparing the different coding styles. I decided to release this as a separate project as there are likely more people interested in this one than the other. This project contains test bench code. UartDebuggerR3.zip
  42. 1 point
    bogdan.deac

    OpenCV and Pcam5-c

    Hi @Esti.A, If you clone the repo you obtain the "source code" for the platform and you have to generate the platform by yourself. This is a time-consuming and complicated task and is not recommended if you do not understand SDSoC very well. I advise you to download the latest SDSoC platform release from here. You will obtain a zip file that contains the SDSoC platform already built. After that, you can follow these steps to create your first project.
  43. 1 point
    bogdan.deac

    OpenCV and Pcam5-c

    Hi @Esti.A, SDx, which includes SDSoC (Software Defined System on Chip), is a development environment that allows you to develop a computer vision application, in your case, using C/C++ and the OpenCV library. The targets of SDx-built applications are Xilinx systems on chip (SoC) (Zynq-7000 or Zynq UltraScale+). The Xilinx SoC architecture has two main components: an ARM processor (single or multi core) named the Processing System (PS) and an FPGA, named the Programmable Logic (PL). Using SDx to build an application for an SoC allows you to choose which functions from your algorithm are executed in PS and which ones are executed in PL. SDx will generate all data movers and dependencies that you need to move data between PS, DDR memory and PL. The PL is suitable for operations that can be easily executed in parallel. So if you choose a median filter function to be executed in PL, instead of PS, you will obtain a better throughput from your system. As you said, you can use OpenCV to develop your application. You have to take into account that the OpenCV library was developed with CPU architecture in mind. So the library was designed to obtain the best performance on some specific CPU architectures (x86-64, ARM, etc.). If you try to accelerate an unmodified OpenCV function in PL using SDx you will obtain poor performance. To overcome this issue, Xilinx has developed xfopencv, which is a subset of the OpenCV library functions. The functionality of the xfopencv functions and the OpenCV functions is the same, but the xfopencv functions are implemented with the FPGA architecture in mind. xfopencv was developed in C/C++ following a coding guideline. When you are building a project, the C/C++ code is given as input to the Xilinx HLS (High Level Synthesis) tool, which converts it to HDL (Hardware Description Language) that is then synthesized for the FPGA. The above-mentioned coding guideline provides information about how to write C/C++ code that will be implemented efficiently in the FPGA.
To get a better understanding of xfopencv consult this documentation. So SDx helps you to obtain better performance by offloading the PS and by taking advantage of the parallel execution capabilities of the PL. Have a look at the SDSoC documentation. For more details check this. An SoC is a complex system composed of a Zynq (ARM + FPGA), DDR memory and many types of peripherals. On top of those, one can run a Linux distribution (usually Petalinux, from Xilinx), and the user application runs on top of the Linux distribution. The user application may access the DDR memory and different types of peripherals (PCam in your case). Also, it may accelerate some functions in the FPGA to obtain better performance. To simplify the development pipeline Xilinx provides an abstract way to interact with this system, named the SDSoC platform. The SDSoC platform has two components, a Software Component and a Hardware Component, which together describe the system from the hardware up to the operating system. Your application will interact with this platform. You are not supposed to know all the details about this platform. This was the idea, to abstract things. Usually, the SDSoC platforms are provided by the SoC development board vendors, like Digilent. All you have to do is to download the latest SDSoC platform release from github. You have to use SDx 2017.4. You don't have to build your own SDSoC platform. This is a complex task. You can follow these steps in order to build your first project that will use the PCam and the Zybo Z7 board. The interaction between the PCam and the user application is done in the following way: there is an IP in the FPGA that acquires the live video stream from the camera, and the video stream is written into DDR memory. This pipeline is abstracted by the SDSoC platform. The user application can access the video frames through Video4Linux (V4L2). The Live I/O for PCam demo shows you how to do this. I suggest you read the proposed documentation to obtain the basic knowledge needed for SDSoC project development. Best regards, Bogdan D.
  44. 1 point
    Hi @Sandrine In Sync mode the trigger is not available. The I2S interpreter needs to see the transitions on the clock signal, so if you use Sync mode select the Edge option (sample on both edges) for the Clock signal. Repeated captures for the Logic Analyzer can be done from the Script tool like this: for (var c = 0; c < 10 && wait(); c++) { print(c); Logic.run(); Logic.wait(); }
  45. 1 point
    aytli

    Zybo Z7-10 audio passthrough

    Hi @jpeyron The DMA audio demo uses the d_axi_i2s_audio IP core, which has a S2MM output and MM2S input. The first thing I've tried was to route the output directly into the input, which didn't work. In addition, the way the C code handles recording is by configuring the DMA block to record, then telling the i2s core to store N bytes from the input into a register. The HDMI demo works by reading video data into a series of video buffers, and displaying image data from a series of frame buffers. I can make an HDMI passthrough by pointing the display output buffer to the video input buffer. I'm wondering if there's a similar solution for audio. I've had some trouble getting that instructables project to work. The i2s controller looks fine, but the SerialEffects block doesn't seem to match the block diagram (which is really blurry). I'll try it again and see if I missed something. Does that d_axi_i2s_audio IP core have any documentation?
  46. 1 point
    D@n

    MMCM dynamic clocking

    @rangaraj, It's a shame you only know VHDL coding, since the Verilog code I posted above would give you the ability to generate an arbitrary clock--unencumbered by the constraints of the PLL, with frequency resolution in the milli-Hertz range (100MHz/2^32). Perhaps you want to take another look at it? Sure, it would have some phase noise, but ... that could be beaten now if necessary by using an OSERDESE2 component. I've got an example design I'm working on that does just that and should knock the phase noise down to 1.25ns or better. Dan
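    The Verilog referred to above is not reproduced on this page, but the 100MHz/2^32 figure suggests a 32-bit phase accumulator, so here is a rough Python model of that general technique. Treat it as a sketch under that assumption: the function names are invented, and the output clock is simply taken from the accumulator's top bit.

```python
ACC_BITS = 32
F_CLK_HZ = 100e6   # system clock assumed from the 100MHz/2^32 figure


def tuning_word(f_out_hz, f_clk_hz=F_CLK_HZ):
    """Step added to the accumulator on every system clock edge."""
    return round(f_out_hz * 2**ACC_BITS / f_clk_hz)


def actual_frequency(step, f_clk_hz=F_CLK_HZ):
    """Average output frequency produced by a given step."""
    return step * f_clk_hz / 2**ACC_BITS


def count_rising_edges(step, n_clocks):
    """Simulate the accumulator and count rising edges of its MSB."""
    acc, prev_msb, edges = 0, 0, 0
    for _ in range(n_clocks):
        acc = (acc + step) % 2**ACC_BITS   # wraps like a 32-bit register
        msb = acc >> (ACC_BITS - 1)
        if msb and not prev_msb:
            edges += 1
        prev_msb = msb
    return edges


resolution_hz = actual_frequency(1)                 # smallest frequency step
edges = count_rising_edges(2**30, 400)              # step of 2^30 divides the clock by 4
step_1khz = tuning_word(1000.0)
```

    The output edges can only land on system clock edges, which is where the phase noise mentioned in the post comes from: at 100 MHz each edge may be up to one 10 ns clock period late.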
  47. 1 point
    hamster

    MMCM dynamic clocking

    Hey, something else I just saw when reading the clocking guide was:
    "MMCM Counter Cascading: The CLKOUT6 divider (counter) can be cascaded with the CLKOUT4 divider. This provides a capability to have an output divider that is larger than 128. CLKOUT6 feeds the input of the CLKOUT4 divider. There is a static phase offset between the output of the cascaded divider and all other output dividers."
    And:
    "CLKOUT4_CASCADE : Cascades the output divider (counter) CLKOUT6 into the input of the CLKOUT4 divider for an output clock divider that is greater than 128, effectively providing a total divide value of 16,384."
    So that can divide a 600 MHz VCO down to 36.6 kHz.
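    The arithmetic behind that last figure, as a quick Python check. The only inputs are the numbers quoted above: a maximum divide of 128 per counter and a 600 MHz VCO.

```python
MAX_DIVIDE_PER_COUNTER = 128   # per-output divider limit quoted from the clocking guide
VCO_HZ = 600e6                 # VCO frequency used in the post

# Cascading CLKOUT6 into CLKOUT4 multiplies the two divide values.
max_cascaded_divide = MAX_DIVIDE_PER_COUNTER * MAX_DIVIDE_PER_COUNTER

# Lowest output frequency reachable with both counters at their maximum.
min_output_hz = VCO_HZ / max_cascaded_divide
min_output_khz = round(min_output_hz / 1e3, 1)
```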
  48. 1 point
    hamster

    MMCM dynamic clocking

    I feel a bit bad about posting a minor novel here, but here is an example of going from "5 cycles on, 5 off" (i.e. divide by 10) to "10 on, 10 off" (divide by 20). The VCO is initially set to 800 MHz with CLK0 being VCO divide by 8, so after config you get 100MHz. Push the button and you get 800/20 = 40MHz, release the button and you get 80MHz. It is all really hairy in practice! EDIT: Through experimentation I just found that you don't need to reset the MMCM if you are not changing the VCO frequency. So the 'rst' signal in the code below isn't needed (and LOCKED will stay asserted).

    --------------------------------------------------------------------------------------------------------
    -- Playing with the MMCM DRP ports.
    -- see https://www.xilinx.com/support/documentation/application_notes/xapp888_7Series_DynamicRecon.pdf
    -- for the Dynamic Reconfiguration Port addresses
    --------------------------------------------------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    library UNISIM;
    use UNISIM.VComponents.all;

    entity mmcm_reset is
        Port ( clk_100 : in  STD_LOGIC;
               btn_raw : in  STD_LOGIC;
               led     : out STD_LOGIC_VECTOR (15 downto 0));
    end mmcm_reset;

    architecture Behavioral of mmcm_reset is
        signal btn_meta     : std_logic := '0';
        signal btn          : std_logic := '0';
        signal speed_select : std_logic := '0';
        signal counter      : unsigned(26 downto 0) := (others => '0');
        signal debounce     : unsigned(15 downto 0) := (others => '0');
        signal clk_switched : std_logic := '0';
        signal clk_fb       : std_logic := '0';

        type t_state is (state_idle_fast,
                         state_go_slow_1, state_go_slow_2, state_go_slow_3,
                         state_idle_slow,
                         state_go_fast_1, state_go_fast_2, state_go_fast_3);
        signal state : t_state := state_idle_fast;

        -----------------------------------------------------------------------------
        --- This is the CLKOUT0 ClkReg1 address - the only register to be played with
        -----------------------------------------------------------------------------
        signal daddr : std_logic_vector(6 downto 0)  := "0001000";
        signal do    : std_logic_vector(15 downto 0) := (others => '0');
        signal drdy  : std_logic := '0';
        signal den   : std_logic := '0';
        signal di    : std_logic_vector(15 downto 0) := (others => '0');
        signal dwe   : std_logic := '0';
        signal rst   : std_logic := '0';
    begin

    MMCME2_ADV_inst : MMCME2_ADV
        generic map (
            BANDWIDTH            => "OPTIMIZED",  -- Jitter programming (OPTIMIZED, HIGH, LOW)
            CLKFBOUT_MULT_F      => 8.0,          -- Multiply value for all CLKOUT (2.000-64.000).
            CLKFBOUT_PHASE       => 0.0,          -- Phase offset in degrees of CLKFB (-360.000-360.000).
            -- CLKIN_PERIOD: Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz).
            CLKIN1_PERIOD        => 10.0,
            CLKIN2_PERIOD        => 0.0,
            -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for CLKOUT (1-128)
            CLKOUT1_DIVIDE       => 1,
            CLKOUT2_DIVIDE       => 1,
            CLKOUT3_DIVIDE       => 1,
            CLKOUT4_DIVIDE       => 1,
            CLKOUT5_DIVIDE       => 1,
            CLKOUT6_DIVIDE       => 1,
            CLKOUT0_DIVIDE_F     => 8.0,          -- Divide amount for CLKOUT0 (1.000-128.000).
            -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for CLKOUT outputs (0.01-0.99).
            CLKOUT0_DUTY_CYCLE   => 0.5,
            CLKOUT1_DUTY_CYCLE   => 0.5,
            CLKOUT2_DUTY_CYCLE   => 0.5,
            CLKOUT3_DUTY_CYCLE   => 0.5,
            CLKOUT4_DUTY_CYCLE   => 0.5,
            CLKOUT5_DUTY_CYCLE   => 0.5,
            CLKOUT6_DUTY_CYCLE   => 0.5,
            -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for CLKOUT outputs (-360.000-360.000).
            CLKOUT0_PHASE        => 0.0,
            CLKOUT1_PHASE        => 0.0,
            CLKOUT2_PHASE        => 0.0,
            CLKOUT3_PHASE        => 0.0,
            CLKOUT4_PHASE        => 0.0,
            CLKOUT5_PHASE        => 0.0,
            CLKOUT6_PHASE        => 0.0,
            CLKOUT4_CASCADE      => FALSE,        -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE)
            COMPENSATION         => "ZHOLD",      -- ZHOLD, BUF_IN, EXTERNAL, INTERNAL
            DIVCLK_DIVIDE        => 1,            -- Master division value (1-106)
            -- REF_JITTER: Reference input jitter in UI (0.000-0.999).
            REF_JITTER1          => 0.0,
            REF_JITTER2          => 0.0,
            STARTUP_WAIT         => FALSE,        -- Delays DONE until MMCM is locked (FALSE, TRUE)
            -- Spread Spectrum: Spread Spectrum Attributes
            SS_EN                => "FALSE",      -- Enables spread spectrum (FALSE, TRUE)
            SS_MODE              => "CENTER_HIGH", -- CENTER_HIGH, CENTER_LOW, DOWN_HIGH, DOWN_LOW
            SS_MOD_PERIOD        => 10000,        -- Spread spectrum modulation period (ns) (VALUES)
            -- USE_FINE_PS: Fine phase shift enable (TRUE/FALSE)
            CLKFBOUT_USE_FINE_PS => FALSE,
            CLKOUT0_USE_FINE_PS  => FALSE,
            CLKOUT1_USE_FINE_PS  => FALSE,
            CLKOUT2_USE_FINE_PS  => FALSE,
            CLKOUT3_USE_FINE_PS  => FALSE,
            CLKOUT4_USE_FINE_PS  => FALSE,
            CLKOUT5_USE_FINE_PS  => FALSE,
            CLKOUT6_USE_FINE_PS  => FALSE
        )
        port map (
            -- Clock Outputs: 1-bit (each) output: User configurable clock outputs
            CLKOUT0      => clk_switched,
            CLKOUT0B     => open,
            CLKOUT1      => open,
            CLKOUT1B     => open,
            CLKOUT2      => open,
            CLKOUT2B     => open,
            CLKOUT3      => open,
            CLKOUT3B     => open,
            CLKOUT4      => open,
            CLKOUT5      => open,
            CLKOUT6      => open,
            -- Dynamic Phase Shift Ports: 1-bit (each) output: Ports used for dynamic phase shifting of the outputs
            PSDONE       => open,
            -- Feedback Clocks: 1-bit (each) output: Clock feedback ports
            CLKFBOUT     => clk_fb,
            CLKFBOUTB    => open,
            -- Status Ports: 1-bit (each) output: MMCM status ports
            CLKFBSTOPPED => open,
            CLKINSTOPPED => open,
            LOCKED       => open,
            -- Clock Inputs: 1-bit (each) input: Clock inputs
            CLKIN1       => clk_100,
            CLKIN2       => '0',
            -- Control Ports: 1-bit (each) input: MMCM control ports
            CLKINSEL     => '1',
            PWRDWN       => '0',      -- 1-bit input: Power-down
            RST          => rst,      -- 1-bit input: Reset
            -- DRP Ports: 16-bit (each) output: Dynamic reconfiguration ports
            DCLK         => clk_100,  -- 1-bit input: DRP clock
            DO           => DO,       -- 16-bit output: DRP data
            DRDY         => DRDY,     -- 1-bit output: DRP ready
            -- DRP Ports: 7-bit (each) input: Dynamic reconfiguration ports
            DADDR        => DADDR,    -- 7-bit input: DRP address
            DEN          => DEN,      -- 1-bit input: DRP enable
            DI           => DI,       -- 16-bit input: DRP data
            DWE          => DWE,      -- 1-bit input: DRP write enable
            -- Dynamic Phase Shift Ports: 1-bit (each) input: Ports used for dynamic phase shifting of the outputs
            PSCLK        => '0',
            PSEN         => '0',
            PSINCDEC     => '0',
            -- Feedback Clocks: 1-bit (each) input: Clock feedback ports
            CLKFBIN      => clk_fb
        );

    speed_change_fsm: process(clk_100)
        begin
            if rising_edge(clk_100) then
                di  <= (others => '0');
                dwe <= '0';
                den <= '0';
                case state is
                    when state_idle_fast =>
                        if speed_select = '1' then
                            state <= state_go_slow_1;
                            -- High 10 Low 10
                            di  <= "0001" & "001010" & "001010";
                            dwe <= '1';
                            den <= '1';
                        end if;
                    when state_go_slow_1 =>
                        if drdy = '1' then
                            state <= state_go_slow_2;
                        end if;
                    when state_go_slow_2 =>
                        rst   <= '1';
                        state <= state_go_slow_3;
                    when state_go_slow_3 =>
                        rst   <= '0';
                        state <= state_idle_slow;
                    when state_idle_slow =>
                        di <= (others => '0');
                        if speed_select = '0' and drdy = '0' then
                            state <= state_go_fast_1;
                            -- High 5 Low 5
                            di  <= "0001" & "000101" & "000101";
                            dwe <= '1';
                            den <= '1';
                        end if;
                    when state_go_fast_1 =>
                        if drdy = '1' then
                            state <= state_go_fast_2;
                        end if;
                    when state_go_fast_2 =>
                        rst   <= '1';
                        state <= state_go_fast_3;
                    when state_go_fast_3 =>
                        rst   <= '0';
                        state <= state_idle_fast;
                end case;
            end if;
        end process;

    dbounce_proc: process(clk_100)
        begin
            if rising_edge(clk_100) then
                if speed_select = btn then
                    debounce <= (others => '0');
                elsif debounce(debounce'high) = '1' then
                    speed_select <= not speed_select;
                else
                    debounce <= debounce + 1;
                end if;
                -- Synchronise the button
                btn      <= btn_meta;
                btn_meta <= btn_raw;
            end if;
        end process;

    show_speed_proc: process(clk_switched)
        begin
            if rising_edge(clk_switched) then
                counter <= counter + 1;
                led(7 downto 0) <= std_logic_vector(counter(counter'high downto counter'high-7));
            end if;
        end process;

        led(15) <= speed_select;
    end Behavioral;
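    The two magic constants written to DI in the state machine can be cross-checked with a few lines of Python. This is a sketch based only on how the posted VHDL builds the word (a fixed "0001" top nibble, then a 6-bit high time, then a 6-bit low time); the helper names are made up, and the exact meaning of the top four bits should be taken from XAPP888 rather than from this snippet.

```python
def clkreg1(high_time, low_time, top_nibble=0b0001):
    """Pack a ClkReg1 word the same way the VHDL concatenation does."""
    if not (0 <= high_time < 64 and 0 <= low_time < 64):
        raise ValueError("high/low time must fit in 6 bits")
    return (top_nibble << 12) | (high_time << 6) | low_time


def output_mhz(vco_mhz, high_time, low_time):
    """Output frequency when the counter is high for high_time and low for low_time VCO cycles."""
    return vco_mhz / (high_time + low_time)


slow_word = clkreg1(10, 10)            # "0001" & "001010" & "001010"
fast_word = clkreg1(5, 5)              # "0001" & "000101" & "000101"
slow_mhz = output_mhz(800, 10, 10)     # button pressed
fast_mhz = output_mhz(800, 5, 5)       # button released
```

    Printing the words with format(slow_word, "016b") gives a bit string that can be compared directly against the literals in the VHDL.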
  49. 1 point
    davec

    How to program Arty flash

    I figured out why the sck signal is not getting implemented- For some reason, "PARAM_VALUE.C_USE_STARTUP" is not set to zero, so when the file "board.xit" does not see this variable =0, it does not implement the sck pin (L16). I cheated and took out this test in "board.xit" (because I don't know where that PARAM_VALUE gets set). I added constraints for the pin:
    set_property PACKAGE_PIN L16 [get_ports qspi_flash_0_sck_io]
    set_property IOSTANDARD LVCMOS33 [get_ports qspi_flash_0_sck_io]
    and I now get a clock to the QSPI flash when the bootloader runs. One last step to solve- where to put my user program in flash that the bootloader will copy into DDR. The tools don't tell me how large the FPGA config file is (with compression on). I looked at the file size of the .bin file and rounded up to the next 1K, but I wish there was a programmatic way to do this from Vivado. Hurray- On power-up I can now config the FPGA, run the bootloader in block ram, which then copies my large user program from flash to DDR and executes.
  50. 1 point
    We have a few demo projects using the Basys2 in RF communications that we haven't had time to document yet. Although it's different, there are a lot of similarities that you may be able to use. Take a look at the attached Zip File. The demo project is using the Pmod RF1 to communicate keyboard presses to another Basys2 that is controlling a speaker. PmodRF1 Basys demo project by digilentinc, on Flickr Here is a link to get the files for the project: https://www.dropbox.com/s/83winp3zyio7or7/Basys2AudioRfPmodHID.zip?dl=0 Hope that helps!