Leaderboard


Popular Content

Showing content with the highest reputation since 04/27/19 in all areas

  1. 2 points
    xc6lx45

    Increasing the clock frequency to 260 MHz

    Hi, reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff" - Before, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this "beginner mistake") is that the design tries to sample them at the high clock rate. Which creates a near-impossible problem. Don't move further before this is understood, fixed and verified. - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens. - Once the obvious problems are fixed, I need to understand what is the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be systematically done by inserting delay registers. The output is generated e.g. one clock cycle later but the design is able to run at a higher clock so overall performance improves. - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good. - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 paths that fail). It can lead into a "rabbit's hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance) - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important) - There are a few tricks like forcing redundant registers to remain separate. Example, I have a dozen identical blocks that run on a common, fast 32-bit system clock and are critical to timing. Step 1, I sample the clock into a 32-bit register at each block's input to relax timing, and step 2) I declare these register as DONT_TOUCH because the tools would otherwise notice they are logically equivalent and try to use one shared instance. This as an example. - For BRAMs and DSP blocks, check the documentation where extra registers are needed (that get absorbed into the BRAM or DSP using a dedicated hardware register). This is the only way to reach the device's specified memory or DSP performance. - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register. - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing). - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design. Ooops, this almost qualifies as "long" answer ...
  2. 2 points
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design. It's not a showroom piece but I think it did work as expected (whatever that is...) For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for 18 bit width. module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData, o_empty, o_full, o_error, o_nItems, o_nFree); parameter DATABITS = -1; parameter ADDRBITS = -1; localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}}; localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1}; localparam DATA_X = {{(DATABITS){1'bx}}}; input wire i_clk; input wire i_push; input wire i_reset; input wire [DATABITS-1:0] i_pushData; input wire i_pop; output reg o_popAck = 1'b0; output wire [DATABITS-1:0] o_popData; output reg o_error = 1'b0; output wire [31:0] o_nItems; output wire [31:0] o_nFree; output wire o_empty; output wire o_full; reg popAckB = 1'b0; reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0]; reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO; reg [ADDRBITS-1:0] popPtr = ADDR_ZERO; reg [DATABITS-1:0] readReg = DATA_X; reg [DATABITS-1:0] readRegB = DATA_X; wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr; wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr; assign o_popData = o_popAck ? readReg : DATA_X; // === items counter === // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements) reg [ADDRBITS:0] nItems; assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems}; assign o_nFree = (1 << ADDRBITS) - nItems; localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1}; assign o_empty = nItems == 0; assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}}; always @(posedge i_clk) begin // === preliminary assignments === readRegB <= DATA_X; popAckB <= 1'b0; case ({i_push, i_pop}) 2'b10: nItems <= nItems + NITEMS_ONE; 2'b01: nItems <= nItems - NITEMS_ONE; default: begin end endcase o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty); // === output register (delay 1) === o_popAck <= popAckB; readReg <= readRegB; pushPtr <= nextPushPtr; popPtr <= nextPopPtr; if (i_push) mem[pushPtr] <= i_pushData; if (i_pop) begin readRegB <= mem[popPtr]; popAckB <= 1'b1; end if (i_reset) begin pushPtr <= ADDR_ZERO; popPtr <= ADDR_ZERO; o_error <= 1'b0; o_popAck <= 1'b0; popAckB <= 1'b0; readReg <= DATA_X; readRegB <= DATA_X; nItems <= 0; end end endmodule
  3. 1 point
    malexander

    Pin mapping for JTAG-SMT2-NC?

    There are directional buffers between the FT232 and the actual GPIO pads on the module. The direction of these buffers is controlled by the logic level that's applied on the OE0, OE1, and OE2 nets. Placing a '0' on the OEn net will configure the applicable buffer as an output and allow the signal to be driven out of the applicable GPIO pad. Placing a '1' on the OEn net will configure the associated buffer as an input and allow the FT232H to read the state of the applicable GPIO pad.
  4. 1 point
    xc6lx45

    FPGA for network protocol conversion

    Hi, I can tell you this much that estimating required FPGA size is a challenging topic in general. Chances are high that an abstract analysis that's not based on experience will completely miss the point. For the PC, I think you are describing operating system overhead, not hardware limitations. Before I'd consider FPGA, I'd have a look at a bare-metal implementation, which can probably be approximated by editing the network card driver in a linux distribution (e.g. use one CPU core per inbound card and keep it in a spinlock waiting for data to avoid the slow context switch etc on interrupt). If your code runs otherwise from cache, a modern e.g. 5 GHz CPU is a force to be reckoned with.
  5. 1 point
    Zorroslade000

    Microblaze to communicate to VHDL

    Thx, will look at it. Rob
  6. 1 point
    Hi @aliff saad, I would suggest to make sure that there are no spaces in your paths. I downloaded the library and placed it here: C:\Users\jpeyron\Documents\Arduino\libraries\SparkFun_LSM9DS1_Arduino_Library-master\examples\LSM9DS1_Basic_SPI I then opened the arduino ide using the digilent core and ran the LSM9DS1_Basic_SPI.ino with no issue as shown in the attached screen shot below. best regards, Jon
  7. 1 point
    jpeyron

    Vivado sysnthesis fail..Pcam

    Hi @Sduru, I found a xilinx forum that that discusses this issue here. For their project the .project and .cproject files were referencing an obsolete hw_platform that no longer existed. They manually edited the files to the new hw_platform--now the design worked. Did you use the 2018.2 project from the release page here? Did you import the fsbl and pcam_vdma_hdmi from the sdk_appsrc folder in the Zybo-Z7-20-Pcam-5C-2018.2.1 folder? best regards, Jon
  8. 1 point
    Maybe one comment: In the ASIC world, "floorplanning" is an essential design phase, where you slice and dice the predicted silicon area and give each design team their own little box. The blocks are designed and simulated independently, and come together only at a fairly late phase. ASIC differs from FPGA in some major ways: - ASIC IOs have a physical placement e.g. along the pad ring. We don't want to run sensitive signals across the chip, need to minimize coupling for mixed-signal etc. In comparison, FPGAs are probably more robust (a complex design will definitely consider the layout, especially on larger devices. But on smaller eval boards, the first restrictions I'll probably run into are logical e.g. which clock is available where, not geometrical). - For ASICs, we need the floorplan to design the power distribution network as an own sub-project (and many a bright-eyed startup has learned electromigration the hard way). - In the ASIC world, we need to worry about wide and fast data paths both regarding power and area - transistors are tiny but metal wires are not. You might have a look at "partial reconfiguration", here the geometry of the layout plays some role.
  9. 1 point
    attila

    Waveforms SDK calibration

    Hi @Thore The calibration is implicitly used in the SDK. The set or got voltage values with API functions are always corrected based on the device calibration parameters.
  10. 1 point
    artvvb

    Axi DMA from all memory

    Hi @Rickdegier, Welcome to the Digilent forums! I am not the most confident on this topic, but I have used the DMA some. The most important facet here is to make sure that your buffer is actually contained in the DDR memory. Different parts of the program can be placed in different memories using the linker script in your application project's src folder (lscript.ld). You should check that file to make sure that your global arrays are placed in the DDR. Second, if the data cache is enabled (likely), you should make sure to flush and invalidate the buffer memory area around your SimpleTransfer calls (functions to do this are in xil_cache.h). Lastly, I personally have had more success using malloc to create my buffers than using global or local arrays - I'm not sure why this is, from a cursory google search, it looks like the DMA will allow transfers into program memory when you aren't careful. You may want to reach out to Xilinx on their forums. Thanks, Arthur
  11. 1 point
    That's amazing to hear! I appreciate all of your help. I will update the board files now. Take care, Justen
  12. 1 point
    Zorroslade000

    Adding rs232refcomp to Microblaze

    Thx, That seems to work. Also, now I know where the examples are. Rob
  13. 1 point
    xc6lx45

    power supply

    a fuse?
  14. 1 point
    Hi Fonak, Thanks for your response. Unfortunately, the author or the developer removed the program from the web (probably due to the commercial importance). However, he still has some plugins (eg. audio analyser suite etc.) on his youtube account with download links. I think, the hardware section has several different modifications such as you suggested to use (LT1210, I am not sure from this driver, perhaps a smaller one such as OPA569 etc. can be a lighter version). If I would access to a source code perhaps I would give a try to modificate it with some effort. However, I am not able to reach it anymore. Btw, if you need previous versions of the advertised software on the video I may have in my download folder which I can share privately with you. Best wishes, Ferda
  15. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, I did a little more searching and found a forum thread here where the customer is having a similar issue. A community member also posted a pass through zynq project that should be useful for your project. best regards, Jon
  16. 1 point
    For the Protocol / SPI-I2C /Spy mode you should specify the approximate (or highest) protocol frequency which will be used to filter transient glitches, like ringing on clock signal transition. The Errors you get indicate the signals are not correctly captured. - make sure to have proper grounding between the devices/circuits - use twisted wires (signal/ground) to reduce EMI - use logic analyzer and/or scope to verify the captured data / voltage levels at higher sample rate at least 10x the protocol frequency Like here in the Logic Analyzer you can see a case when the samples are noisy:
  17. 1 point
    Hi, I indeed have a sd card with the correct files in. Anyway, i found a way to make it work without the LVLSHFT here : https://github.com/NicholsKyle/ECE387_SimonSays/wiki/My-Design#schematic-drawing. Now I'm struggling to display correctly on the MTDs, it's certainly my code the problem. But thanks for your help @jpeyron.
  18. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, Please attach a screen shot of your vivado block design. Have you tried changing the MMCM to PLL in the DVI2RGB IP Core? best regards, Jon
  19. 1 point
    mjdbishop

    JTAG-HS2 firmware erased by accident

    Hi Jon Many thanks - cable now working Many Thanks Martin
  20. 1 point
    Hi @Mohammad12gsj, I have sent you a PM about this. best regards, Jon
  21. 1 point
    Hi @askhunter, Here is the newest version of the DVI2RGB IP Core. Please add the full vivado library folder in the ip repository. There is mandatory files that need to be included for the DVI2RGB IP Core to work. What development board are you using? best regards, Jon
  22. 1 point
    zygot

    How to connect an external FIFO to FPGA

    I have a few random thoughts on the subject ( is anyone surprised? ) I looked over an old project where just for fun I used a 128Kx32 single clock FIFO built with BRAM. It was for the Nexys Video Artix device which has the same 36Kb BRAMs. it used 116 BRAMs and worked at 100 Mhz with a mid-range speed part. 36Kb/9 = 4096 bytes plus parity, 131072/4906 = 32 9-bit BRAMs, 4x32 = 128 BRAMs to implement a 128Kx32 FIFO so Vivado must have found some way to save 12 BRAMs. If you have 18-bit data that's fine as the BRAMs can be organized as 32Kx9 where the extra bit is meant for parity. From experience I can tell you that using the parity bit for data can get tricky but is entirely possible. If you need a dual clock FIFO then expect to use more BRAMs. If there isn't much else in your design timing won't be a problem. If you are trying to place 116 BRAMs into a complicated high speed design then you will find yourself needing to leanr about timing closure strategies. If I don't have to worry about resource usage, timing issues I'd use an HDL to implement RAM or FIFO structures as it's portable, more or less. I tend to just bite the bullet and use the vendors tools to implement resources like block memory, PLLs, and such as these resources aren't really that compatible between vendors and I usually do care about resource usage and timing. Also, vendor IP 'wizards' sometimes creates constraints for them and take care of a lot of little details that ultimately save time.
  23. 1 point
    Thank you Dan. Your answer helps clarify things greatly for me on this subject. I appreciate you taking the time to help me out. -Sean
  24. 1 point
    Yes, you can combine more than one block RAM. There is more than one way to implement a FIFO. If I had to do it for myself, I'd write it in plain Verilog, it's about two or three screen lengths of code if the interface requirements are "clean" (such as, one clock and freedom to leave a few clock cycles of latency, before the first input appears at the output). I didn't check but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need. With "expensive" I meant just that, it costs a lot of money to use half an FPGA just for memory.
  25. 1 point
    Well, to be honest, I didn't read the datasheet to the high-capacity devices with 9M bits. So this one isn't even EOL. Well, it depends. Have a look at https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf table 6. There are 325 of those 36 kB blocks on the FPGA (11700 kB in total), so you need about 1/4 of the total FPGA for memory. Technically feasible and easiest to implement but a very expensive FIFO. Now, connecting this chip to realize its full performance potential (e.g. 225 MHz) is not straightforward. From between the lines ... >> Do I must use all pins of external FIFO? ... I read that you don't have hands-on experience with e.g. CMOS ICs (the answer is "you must drive any input at any time, unless the data sheet says explicitly otherwise", or strange things will happen. "Strange" in a sense that the circuit may respond to waving my hands over it, and that's not an exaggeration). If I'm correct in this, it may be a good idea to get e.g. a few CD4017 or some other standard CMOS chip with simple functionality for < $1 and use this to bring up your FPGA IOs. If the FIFO chip doesn't work because of IO issues, it will be near-impossible to debug.
  26. 1 point
    fpga_babe

    HID protocol on Basys3

    Thank you Jon. I will check it.
  27. 1 point
    @longboard, Yeah, that's really confusing isn't it? At issue is the fact that many of these chips are specified in Mega BITS not BYTES. So the 1Gib is mean to refer to a one gigabit memory, which is also a 128 megabyte memory. That's what the parentheses are trying to tell you. Where this becomes a real problem is that I've always learned that a MiB is a reference to a million bytes, 10^6 bytes, rather than a mega byte, or 2^20 bytes. The proper acronyms, IMHO, should be Gb, GB, Mb, and MB rather than GiB or MiB which are entirely misleading. As for the memory, listed as 16 Meg x 8 x 8, that's a reference to 8-banks of 16-mega words or memory, where each word is 8-bits wide. In other words, the memory has 16MB*8 or 128MB of storage. You could alternatively say it had 1Gb of memory, which would be the same thing, but this is often confused with 1GB of memory--hence the desire for the parentheses again. Dan
  28. 1 point
    you might give a bit more information to not be mistaken for a lazy student. My first thought is simply "do not". The component is EOL and you can have the same using the FPGA's BRAM with a LOT less hassle.
  29. 1 point
    revathi

    xadc_zynq

    Hi @jpeyron, Thankyou for your kind reply. I will check the code that you have sent. The board that am using is Zynq ZC702, evalutaion kit. NO, I didn't try any auxillary channel. Today i will try and update you. Thankyou once again
  30. 1 point
    Thanks much, Jon. Best Cuikun
  31. 1 point
    kwilber

    Dynamic voltage and frequency scaling

    Here are two additional articles I have read on the technique being applied to a zynq. https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842065/Zynq-7000+AP+SoC+Low+Power+Techniques+part+5+-+Linux+Application+Control+of+Processing+System+-+Frequency+Scaling+More+Tech+Tip https://github.com/tulipp-eu/tulipp-guidelines/wiki/Dynamic-voltage-and-frequency-scaling-(DVFS)-on-ZC702
  32. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    https://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf page 3
  33. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    >> is it possible to present DVFS on it. >> For now I now about clock wizard, DCM, PLL for different clock generation (frequency) but this is not frequency scaling mi right? you may have your own answer there. This is some university project? Have you done your own research? For example, this has all the right keywords: https://highlevel-synthesis.com/2017/04/12/voltage-scaling-on-xilinx-zynq
  34. 1 point
    Hi @Phil_D The gain switch is adjusted automatically based on the selected scope range. At 500mV/div (5Vpk2pk ~0.3mV resolution) or lower the high gain is used with and above this the low gain (50Vpk2pk w ~3mV resolution). In case you specify trigger level out of the screen (5Vpk2pk) or offset higher/lower than +/- 2.5V the low gain will be used for the trigger source channel. This will be noted on the screen with red warning text. The attenuation is a different thing. This option lets you specify the external attenuation or amplification on the signals which enter the scope inputs and the data is scaled accordingly. Like, if you use a 10x scope probe, the scope input will actually get 1/10th of the original signal, but specifying 10x attenuation the signal is scaled to show values on the probe. In this case the 500mV/div (5Vpk2pk) low/high gain limit moves up to 5V/div (50Vpk2pk) and the low gain up to 50V/div If you have an external 100x amplifier on the scope input you can specify 0.01x attenuation. With this you will have 5mV/div (50mVpk2pk ~0.003mV resolution) for high gain.
  35. 1 point
    billium

    HS1 HS3 to Spartan 6 in Linux

    Hello Daniel What do you get with: djtgcfg enum and after that djtgcfg init <device name> Which flavour of Linux are you using? Have you installed digilent adept and utilities? Billy
  36. 1 point
    Notarobot

    Stuck in SDK

    Kris, Printing via xil_print is using STDIO by default and could reconfigured in Vivado and SDK. With little info you provided a lot left for guessing. My last recommendation is to pay attention to a configuration of your build in SDK. The interrupt interrupt service routine (ISR) might not work if GCC compiler has optimization ON. To make it work the variable in the ISR should be declared volatile. Debug build usually has flag -O0 that optimization none. Good luck!
  37. 1 point
    Hi Ferda I am currently at the stage of designing an external adapter PCB similar to this presented video (I read the whole post about it on eevblog). This adapter should at least solve some of the problems you're talking about (eg stronger output based on LT1210 or BUF634). PS. Do you have a link to the program that the author of the attached video use?
  38. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, What's the data rate of the filter compared to the number of taps? As in, are you going to need to accept one sample per clock into this filter, or will you have reliable idle clock periods to work with? I'm wondering, if you have to build this, how easy/hard it might be. Dan
  39. 1 point
    So, if I am getting the point of the previous two posts from xclx45 and Dan, make your own filter in Verilog or VHDL and figure out the details ( signed, unsigned, fixed point, etc, .... etc ). You can instantiate DSP48E primitives (difficult) or just let the synthesis tool do it from your HDL code (easier). Debating how things should be verses how they are when using third party scripts to generate unreadable code seems like a waste of time to me... If you don't like what you get then design what you want. If you can make the time to write your own IP ( would be nice to not depend on a particular vendor's DSP architecture ) you'll learn a lot and save a lot of time later. If a vendor's IP doesn't make timing closure for your design a nightmare and you don't have the time to figure out all the details just let the IP handle it. I suspect that trying to optimize the FIR Compiler will be frustrating at best. I once had to come up with pipelined code to calculate signal strength in dB at a minimum sustained data rate. My approach to converting 32-bit integers to logarithms used a combination of Taylor Series expansion and look-up tables. I had a few versions**. One was straight VHDL so that I could compare Altera and Xilinx DSP tiles. One instantiated DSP48 tile primitives for a particular Xilinx device. These were fixed point designs. There's theory and there's practical experience.. they are usually not the same. ** I played with a number of approaches based on extremely limited specifications so there were quite a few versions. Every time I presented one the requirements changed and so did the complexity and resource requirements. I should mention that my intent for mentioning this experience is not to denigrate the information presented by others or to claim superiority in any way. When getting advice it's important to put that into context. A lot of times facts aren't necessarily relevant to solving a particular problem. If I haven't made this clear I've never had the experience that vendor IP optimizes resource usage... in fact quite the opposite. This is why in a commercial setting companies are willing to pay to develop their own IP. Sometimes FPGA silicon costs overshadow development costs.
  40. 1 point
    The word "overclocking" may be even misleading - this architecture is used when the data rate is significantly lower than the multiplier's speed. Inputs to one (expensive) multiplier are multiplexed, so it can do the work for several or all taps. The "multiplexing" itself will get fairly expensive in logic fabric due to the number of data bits and coefficients. To the rescue comes the BRAM, which is essentially a giant demultiplexer-multiplexer combo, with a football field of flipflops in-between. You can find an example for this approach here. Out-of-the-box, it's unfortunately quite complicated because it does arbitrary rational resampling. Setting the rates equal for a standard FIR filter, you end up with only a few lines of remaining code. BTW, multiplier count as design metric is probably overrated nowadays for several reasons (the IP tool resource usage is already more practical, e.g. BRAM count may become the bottleneck). If you can get this book through your library, you might have a look e.g. at chapter 6 (background only, this is a very old book): Keshab K. Parhi VLSI Digital Signal Processing Systems: Design and Implementation
  41. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, So let's start at the top. An N-point FIR filter, such as this one, requires N multiplies and N-1 adds. Let's just count multiplies, though, for now. Let's now look at some optimizations you might apply. A complex multiply usually requires 4 multiplies. If you have complex input and taps, that's 4N real multiplies. There's a trick you can use to drop this to 3N multiplies. If the filter is real, and the incoming signal is complex, you can drop this to 2N multiplies by filtering each of the real and imaginary channels separately. If your filter taps are fixed, a good optimizer should be able to replace the zeros with a constant output, the ones with data pass through, etc. Implementing taps with two or three bits this way is possible. This only works, though, if the filter taps are fixed. Many of the common FIR filter developments generate linear phase filters. These filters are symmetric about a common point. With a little bit of work, you can use that to reduce the number of multiplies (for dynamic taps) down to (N-1)/2. There is a very significant class of filters called half band filters. In a half band filter, every other tap (other than the center tap) is zero. With a bit of work, you can then drop your usage down to (N-1)/4 multiplies. This optimization applies to Hilbert transforms as well. In the given list above, I've assumed that you need to operate at one sample in and one sample out per clock. In that case, there's no time or room for memory, since all of the stages need their data on every clock. I should also point out that I'm counting multiplies, not DSP slices. If your multiplies are smaller than 18x18 bits, you may be able to use a single DSP slice per multiply. If they are larger, you might find yourself using many DSP slices per multiply. It depends. Let's now consider the case where you have an N sample filter but you are only going to provide it with one sample of data every N+ samples (some number more than N). You could then multiplex this multiply and implement an N-point filter with 1 multiply and 2^(ceil(log_2(N)) RAM elements. If your filter was symmetric, you could process a filter roughly twice as long, or a sample rate roughly twice as fast while still using a single multiply. The half-band and hilbert tricks can apply to (roughly) double your filter size or your data rate again. In these cases, however, you can't spare the multiply at all, since it is used to implement every coefficient. That should give you both ends of the spectrum. Now, while I don't understand how Xilinx's FIR compiler works, I can say that if I were to make one I would allow a user to design crosses between these two ends of the spectrum. In the case of a such a cascaded filter, however, you may find it difficult to continue to implement the optimizations we've just discussed, simply because the cascaded structure becomes more difficult to deal with. (By cascade, I mean cascade in implementation and not filtering a stream and then filtering it again.) Looking at your two designs, none of the optimizations I've mentioned would apply. In the high speed filters we started out with, sure, you might manage to remove a multiply or two by multiplying by zero. On the other hand, you can't share multiplies across elements if you do so. For example, you mentioned [1 2 3 4 0 1 2 3 4]. Sure, it's got repeating coefficients, but this is actually a really obscure filtering structure, and not likely one that I (were I Xilinx) would pay to support. Try [ 1 2 3 4 0 4 3 2 1] instead and see if things change. Similarly for [ 1 0 0 0 0 0 0 0 1]. Yeah, it's symmetric about the mid-point, but in all of my symmetric filtering applications I'm assuming the mid point has a value of 2^N-1 or similar so that it doesn't need a multiply. Your choice of filters offers no such optimization, so who knows how it might work. Hope this helps to shed some light on things, Dan
  42. 1 point
    Xilinx encrypts the FIR compiler code so it's difficult to figure out what's going on with your experiments. You should read PG149 which is the IP product Guide for the FIR Compiler. It does indicate when coefficient symmetry optimizations might or won't be made. It also has a link to a Resource Utilization "Spreadsheet" for some FPGA family devices depending on your customized specifications. Oddly, that bit of guidance doesn't show any usages where significant numbers of BRAMS are ever used. The guide does have a section titled "Resource Considerations" that has some interesting information but not enough to answer you questions. I suspect that you'll have to understand actual resource utilization for your application empirically. (SOP from my experience when I really need optimal performance) It's always a good idea to be familiar with all device and IP documentation though, in my experience, the IP documentation rarely addresses all of my questions when I'm deep into a design and committed to IP that's not mine. Again, my guess is that when you specify a clock rate, sample rate, data and coefficient widths, etc resource utilization increases with increasing throughput. The same thing happens if you pipeline your code to maximize data rates. But I'm only guessing. I don't use the FIR Compiler so my views are an extrapolation of experience with other IP from all FPGA vendors that I have used.
  43. 1 point
    Hi, [1 2 3 4 0 1 2 3 4] is not symmetric in a linear-phase sense. That would be e.g. [1 2 3 4 0 4 3 2 1]. You could exploit the shared coefficients manually, see e.g. Figure 3 for the general concept. But this case is so unusual that I doubt tools will take it into account. The tool does nothing magical. If performance matters more than design time, you'll always get better results for one specific problem with manual design. One performance metric is multiplier utilization (e.g. assuming you design for a 200 MHz clock, one DSP delivering 200M operations / second performs at 100 %. Reaching 50+ % is a realistic goal for a simple / single rate structure). For example, do I want to use an expensive BRAM at all, when I could use ring shift registers for delay line and coefficients. Then you only need a small controlling state machine around it that does a full circular shift for each sample, muxing in the new input sample every full cycle (the BRAM makes more sense when the filter serves many channels in parallel, then FF count becomes an issue).
  44. 1 point
    PG_R

    RGMII (ZCU102-Zybo Z7)

    Thanks Jon and Zygot, I appreciate the explanation. Regards
  45. 1 point
    zygot

    A UART Based Debugger Tool

    Here's a utility for debugging and testing your code in hardware and uses any IO pin to send an ASCII representation of any signal through a hardware UART interface. If you don't have a UART on you FPGA board there are TTL USB UART breakout boards and cables that allow any spare IO pin to become a UART interface. This code is functionally the same as one recently released by Hamster but developed independently for the Fast Data Interface project. I recommend comparing the different coding styles. I decided to release this as a separate project as there are likely more people interested in this one that the other. This project contains test bench code. UartDebuggerR3.zip
  46. 1 point
    Hi @aliff saad, Welcome to the forums. Here is the resource center for the PmodNAV. Here is a completed INO file, basically a main file, for the PmodNAV and an Arduino. The error is stating that LSM9DS1 is not assigned to a data type I.E. int, char... or in the case of the INO it is a class. Did you also download the SparkFun LSM9DS1 Arduino Library and add the src files to your project? cheers, Jon
  47. 1 point
    hamster

    MMCM dynamic clocking

    Hey, something else I just saw when reading the clocking guide was: MMCM Counter Cascading The CLKOUT6 divider (counter) can be cascaded with the CLKOUT4 divider. This provides a capability to have an output divider that is larger than 128. CLKOUT6 feeds the input of the CLKOUT4 divider. There is a static phase offset between the output of the cascaded divider and all other output dividers. And: CLKOUT4_CASCADE : Cascades the output divider (counter) CLKOUT6 into the input of the CLKOUT4 divider for an output clock divider that is greater than 128, effectively providing a total divide value of 16,384. So that can divide a 600 MHz VCO down to 36.6 kHz.
  48. 1 point
    hamster

    MMCM dynamic clocking

    I feel a bit bad about posting a minor novel here, but here is an example of going from "5 cycles on, 5 off" (i.e. divide by 10) to "10 on, 10 off" (device by 20). The VCO is initially to 800 MHz with CLK0 being VCO divide by 8.... so after config you get 100MHz. Push the button and you get 800/20 = 40MHz, release the button and you get 80MHz. It is all really hairy in practice! EDIT: Through experimentation I just found that you don't need to reset the MMCM if you are not changing the VCO frequency. So the 'rst' signal in the code below isn't needed (and LOCKED will stay asserted). -------------------------------------------------------------------------------------------------------- -- Playing with the MMCM DRP ports. -- see https://www.xilinx.com/support/documentation/application_notes/xapp888_7Series_DynamicRecon.pdf -- for the Dynamic Reconviguration Port addresses -------------------------------------------------------------------------------------------------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; library UNISIM; use UNISIM.VComponents.all; entity mmcm_reset is Port ( clk_100 : in STD_LOGIC; btn_raw : in STD_LOGIC; led : out STD_LOGIC_VECTOR (15 downto 0)); end mmcm_reset; architecture Behavioral of mmcm_reset is signal btn_meta : std_logic := '0'; signal btn : std_logic := '0'; signal speed_select : std_logic := '0'; signal counter : unsigned(26 downto 0) := (others => '0'); signal debounce : unsigned(15 downto 0) := (others => '0'); signal clk_switched : std_logic := '0'; signal clk_fb : std_logic := '0'; type t_state is (state_idle_fast, state_go_slow_1, state_go_slow_2, state_go_slow_3, state_idle_slow, state_go_fast_1, state_go_fast_2, state_go_fast_3); signal state : t_state := state_idle_fast; ----------------------------------------------------------------------------- --- This is the CLKOUT0 ClkReg1 address - the only register to be played with ----------------------------------------------------------------------------- signal daddr : std_logic_vector(6 downto 0) := "0001000"; signal do : std_logic_vector(15 downto 0) := (others => '0'); signal drdy : std_logic := '0'; signal den : std_logic := '0'; signal di : std_logic_vector(15 downto 0) := (others => '0'); signal dwe : std_logic := '0'; signal rst : std_logic := '0'; begin MMCME2_ADV_inst : MMCME2_ADV generic map ( BANDWIDTH => "OPTIMIZED", -- Jitter programming (OPTIMIZED, HIGH, LOW) CLKFBOUT_MULT_F => 8.0, -- Multiply value for all CLKOUT (2.000-64.000). CLKFBOUT_PHASE => 0.0, -- Phase offset in degrees of CLKFB (-360.000-360.000). -- CLKIN_PERIOD: Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz). CLKIN1_PERIOD => 10.0, CLKIN2_PERIOD => 0.0, -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for CLKOUT (1-128) CLKOUT1_DIVIDE => 1, CLKOUT2_DIVIDE => 1, CLKOUT3_DIVIDE => 1, CLKOUT4_DIVIDE => 1, CLKOUT5_DIVIDE => 1, CLKOUT6_DIVIDE => 1, CLKOUT0_DIVIDE_F => 8.0, -- Divide amount for CLKOUT0 (1.000-128.000). -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for CLKOUT outputs (0.01-0.99). CLKOUT0_DUTY_CYCLE => 0.5, CLKOUT1_DUTY_CYCLE => 0.5, CLKOUT2_DUTY_CYCLE => 0.5, CLKOUT3_DUTY_CYCLE => 0.5, CLKOUT4_DUTY_CYCLE => 0.5, CLKOUT5_DUTY_CYCLE => 0.5, CLKOUT6_DUTY_CYCLE => 0.5, -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for CLKOUT outputs (-360.000-360.000). CLKOUT0_PHASE => 0.0, CLKOUT1_PHASE => 0.0, CLKOUT2_PHASE => 0.0, CLKOUT3_PHASE => 0.0, CLKOUT4_PHASE => 0.0, CLKOUT5_PHASE => 0.0, CLKOUT6_PHASE => 0.0, CLKOUT4_CASCADE => FALSE, -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE) COMPENSATION => "ZHOLD", -- ZHOLD, BUF_IN, EXTERNAL, INTERNAL DIVCLK_DIVIDE => 1, -- Master division value (1-106) -- REF_JITTER: Reference input jitter in UI (0.000-0.999). REF_JITTER1 => 0.0, REF_JITTER2 => 0.0, STARTUP_WAIT => FALSE, -- Delays DONE until MMCM is locked (FALSE, TRUE) -- Spread Spectrum: Spread Spectrum Attributes SS_EN => "FALSE", -- Enables spread spectrum (FALSE, TRUE) SS_MODE => "CENTER_HIGH", -- CENTER_HIGH, CENTER_LOW, DOWN_HIGH, DOWN_LOW SS_MOD_PERIOD => 10000, -- Spread spectrum modulation period (ns) (VALUES) -- USE_FINE_PS: Fine phase shift enable (TRUE/FALSE) CLKFBOUT_USE_FINE_PS => FALSE, CLKOUT0_USE_FINE_PS => FALSE, CLKOUT1_USE_FINE_PS => FALSE, CLKOUT2_USE_FINE_PS => FALSE, CLKOUT3_USE_FINE_PS => FALSE, CLKOUT4_USE_FINE_PS => FALSE, CLKOUT5_USE_FINE_PS => FALSE, CLKOUT6_USE_FINE_PS => FALSE ) port map ( -- Clock Outputs: 1-bit (each) output: User configurable clock outputs CLKOUT0 => clk_switched, CLKOUT0B => open, CLKOUT1 => open, CLKOUT1B => open, CLKOUT2 => open, CLKOUT2B => open, CLKOUT3 => open, CLKOUT3B => open, CLKOUT4 => open, CLKOUT5 => open, CLKOUT6 => open, -- Dynamic Phase Shift Ports: 1-bit (each) output: Ports used for dynamic phase shifting of the outputs PSDONE => open, -- Feedback Clocks: 1-bit (each) output: Clock feedback ports CLKFBOUT => clk_fb, CLKFBOUTB => open, -- Status Ports: 1-bit (each) output: MMCM status ports CLKFBSTOPPED => open, CLKINSTOPPED => open, LOCKED => open, -- Clock Inputs: 1-bit (each) input: Clock inputs CLKIN1 => clk_100, CLKIN2 => '0', -- Control Ports: 1-bit (each) input: MMCM control ports CLKINSEL => '1', PWRDWN => '0', -- 1-bit input: Power-down RST => rst, -- 1-bit input: Reset -- DRP Ports: 16-bit (each) output: Dynamic reconfiguration ports DCLK => clk_100, -- 1-bit input: DRP clock DO => DO, -- 16-bit output: DRP data DRDY => DRDY, -- 1-bit output: DRP ready -- DRP Ports: 7-bit (each) input: Dynamic reconfiguration ports DADDR => DADDR, -- 7-bit input: DRP address DEN => DEN, -- 1-bit input: DRP enable DI => DI, -- 16-bit input: DRP data DWE => DWE, -- 1-bit input: DRP write enable -- Dynamic Phase Shift Ports: 1-bit (each) input: Ports used for dynamic phase shifting of the outputs PSCLK => '0', PSEN => '0', PSINCDEC => '0', -- Feedback Clocks: 1-bit (each) input: Clock feedback ports CLKFBIN => clk_fb ); speed_change_fsm: process(clk_100) begin if rising_edge(clk_100) then di <= (others => '0'); dwe <= '0'; den <= '0'; case state is when state_idle_fast => if speed_select = '1'then state <= state_go_slow_1; -- High 10 Low 10 di <= "0001" & "001010" & "001010"; dwe <= '1'; den <= '1'; end if; when state_go_slow_1 => if drdy = '1' then state <= state_go_slow_2; end if; when state_go_slow_2 => rst <= '1'; state <= state_go_slow_3; when state_go_slow_3 => rst <= '0'; state <= state_idle_slow; when state_idle_slow => di <= (others => '0'); if speed_select = '0' and drdy = '0' then state <= state_go_fast_1; -- High 5 Low 5 di <= "0001" & "000101" & "000101"; dwe <= '1'; den <= '1'; end if; when state_go_fast_1 => if drdy = '1' then state <= state_go_fast_2; end if; when state_go_fast_2 => rst <= '1'; state <= state_go_fast_3; when state_go_fast_3 => rst <= '0'; state <= state_idle_fast; end case; end if; end process; dbounce_proc: process(clk_100) begin if rising_edge(clk_100) then if speed_select = btn then debounce <= (others => '0'); elsif debounce(debounce'high) = '1' then speed_select <= not speed_select; else debounce <= debounce + 1; end if; -- Syncronise the button btn <= btn_meta; btn_meta <= btn_raw; end if; end process; show_speed_proc: process(clk_switched) begin if rising_edge(clk_switched) then counter <= counter + 1; led(7 downto 0) <= std_logic_vector(counter(counter'high downto counter'high-7)); end if; end process; led(15) <= speed_select; end Behavioral;
  49. 1 point
    davec

    How to program Arty flash

    I figured out why the sck signal is not getting implemented- For some reason, "PARAM_VALUE.C_USE_STARTUP" is not set to zero, so when the file "board.xit" does not see this variable =0, it does not implement the sck pin (L16). I cheated and took out this test in "board.xit" (because I don't know where that PARAM_VALUE gets set). I added constraints for the pin: set_property PACKAGE_PIN L16 [get_ports qspi_flash_0_sck_io] set_property IOSTANDARD LVCMOS33 [get_ports qspi_flash_0_sck_io] and I now get a clock to the QSPI flash when the bootloader runs. One last step to solve- where to put my user program in flash that the bootloader will copy into DDR. The tools don't tell me how large the FPGA config file is (with compression on). I looked at the file size of the .bin file and rounded up to the next 1K, but I wish there was a programmatic way to do this from Vivado. Hurray- On power-up I can now config the FPGA, run the bootloader in block ram, which then copies my large user program from flash to DDR and executes.
  50. 1 point
    We have a few demo projects using the Basys2 in RF communications that we haven't gotten had time to document yet. Although it's different, there are a lot of similiarities that you may be able to use. Take a look at the attached Zip File. The demo project is using the Pmod RF1 to communicate keyboard presses to another Basys2 that is controlling a speaker. PmodRF1 Basys demo project by digilentinc, on Flickr Here is a link to get the files for the project: https://www.dropbox.com/s/83winp3zyio7or7/Basys2AudioRfPmodHID.zip?dl=0 Hope that helps!