Leaderboard


Popular Content

Showing content with the highest reputation since 04/21/19 in all areas

  1. 2 points
    xc6lx45

    Increasing the clock frequency to 260 MHz

    Hi, reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff" - Before, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this "beginner mistake") is that the design tries to sample them at the high clock rate. Which creates a near-impossible problem. Don't move further before this is understood, fixed and verified. - speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens. - Once the obvious problems are fixed, I need to understand what is the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be systematically done by inserting delay registers. The output is generated e.g. one clock cycle later but the design is able to run at a higher clock so overall performance improves. - Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good. - Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 paths that fail). It can lead into a "rabbit's hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance) - You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important) - There are a few tricks like forcing redundant registers to remain separate. Example, I have a dozen identical blocks that run on a common, fast 32-bit system clock and are critical to timing. Step 1, I sample the clock into a 32-bit register at each block's input to relax timing, and step 2) I declare these register as DONT_TOUCH because the tools would otherwise notice they are logically equivalent and try to use one shared instance. This as an example. - For BRAMs and DSP blocks, check the documentation where extra registers are needed (that get absorbed into the BRAM or DSP using a dedicated hardware register). This is the only way to reach the device's specified memory or DSP performance. - Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register. - Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing). - You might also have a look at the options ("strategy") but don't expect any miracles on a bad design. Ooops, this almost qualifies as "long" answer ...
  2. 2 points
    Thinking of which... actually I do have a plain-Verilog FIFO around from an old design. It's not a showroom piece but I think it did work as expected (whatever that is...) For 131072 elements you'd set ADDRBITS to 17 and DATABITS to 18 for 18 bit width. module FIFO(i_clk, i_reset, i_push, i_pushData, i_pop, o_popAck, o_popData, o_empty, o_full, o_error, o_nItems, o_nFree); parameter DATABITS = -1; parameter ADDRBITS = -1; localparam ADDR_ZERO = {{(ADDRBITS){1'b0}}}; localparam ADDR_ONE = {{(ADDRBITS-1){1'b0}}, 1'b1}; localparam DATA_X = {{(DATABITS){1'bx}}}; input wire i_clk; input wire i_push; input wire i_reset; input wire [DATABITS-1:0] i_pushData; input wire i_pop; output reg o_popAck = 1'b0; output wire [DATABITS-1:0] o_popData; output reg o_error = 1'b0; output wire [31:0] o_nItems; output wire [31:0] o_nFree; output wire o_empty; output wire o_full; reg popAckB = 1'b0; reg [DATABITS-1:0] mem[((1 << ADDRBITS)-1):0]; reg [ADDRBITS-1:0] pushPtr = ADDR_ZERO; reg [ADDRBITS-1:0] popPtr = ADDR_ZERO; reg [DATABITS-1:0] readReg = DATA_X; reg [DATABITS-1:0] readRegB = DATA_X; wire [ADDRBITS-1:0] nextPushPtr = i_push ? pushPtr + ADDR_ONE : pushPtr; wire [ADDRBITS-1:0] nextPopPtr = i_pop ? popPtr + ADDR_ONE : popPtr; assign o_popData = o_popAck ? readReg : DATA_X; // === items counter === // note: needs extra bit (e.g. 4 slots may hold [0, 1, 2, 3, 4] elements) reg [ADDRBITS:0] nItems; assign o_nItems = {{{31-ADDRBITS-1}{1'b0}}, nItems}; assign o_nFree = (1 << ADDRBITS) - nItems; localparam NITEMS_ONE = {{(ADDRBITS){1'b0}}, 1'b1}; assign o_empty = nItems == 0; assign o_full = nItems == {1'b1, {{ADDRBITS}{1'b0}}}; always @(posedge i_clk) begin // === preliminary assignments === readRegB <= DATA_X; popAckB <= 1'b0; case ({i_push, i_pop}) 2'b10: nItems <= nItems + NITEMS_ONE; 2'b01: nItems <= nItems - NITEMS_ONE; default: begin end endcase o_error <= (i_push && ~i_pop && o_full) || (i_pop && o_empty); // === output register (delay 1) === o_popAck <= popAckB; readReg <= readRegB; pushPtr <= nextPushPtr; popPtr <= nextPopPtr; if (i_push) mem[pushPtr] <= i_pushData; if (i_pop) begin readRegB <= mem[popPtr]; popAckB <= 1'b1; end if (i_reset) begin pushPtr <= ADDR_ZERO; popPtr <= ADDR_ZERO; o_error <= 1'b0; o_popAck <= 1'b0; popAckB <= 1'b0; readReg <= DATA_X; readRegB <= DATA_X; nItems <= 0; end end endmodule
  3. 1 point
    Floorplanning is where you start before designing a new board. Once you've assigned pins and created a PCB your options for meeting timing for a particularly complex, dense, and high clock rate design are limited. Of course you will need to have a reasonably 'close to final version' of your FPGA design to start with so that the tools can select the best pin locations. For a general purpose development board like the one you are using only a few interfaces need to be 'optimized' for speed; and of course the speed grade of the parts on the board have a large impact on limiting the performance of any design. It is not always possible to select an arbitrary clock rate for any application for a particular board and always meet timing. On the other hand it's easy to create a design that doesn't have a chance to operate at a desired clock rate when a better conceived design might. Providing the tools with good guidance in the form of constraints is often the key to achieving a particular performance goal, though don't expect Vivado to turn a poor design into a greate design.
  4. 1 point
    attila

    Waveforms SDK calibration

    Hi @Thore The calibration is implicitly used in the SDK. The set or got voltage values with API functions are always corrected based on the device calibration parameters.
  5. 1 point
    That's amazing to hear! I appreciate all of your help. I will update the board files now. Take care, Justen
  6. 1 point
    jpeyron

    Bluetooth stack in verilog

    Hi @Aamirnagra, Unfortunately, I have not worked with implementing the BLE protocol with HDL nor have i found any examples. We do have a Pmod BLE along with the Pmod BLE IP Core usable with microblaze that facilitates the uart communication. best regards, Jon
  7. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, I did a little more searching and found a forum thread here where the customer is having a similar issue. A community member also posted a pass through zynq project that should be useful for your project. best regards, Jon
  8. 1 point
    For the Protocol / SPI-I2C /Spy mode you should specify the approximate (or highest) protocol frequency which will be used to filter transient glitches, like ringing on clock signal transition. The Errors you get indicate the signals are not correctly captured. - make sure to have proper grounding between the devices/circuits - use twisted wires (signal/ground) to reduce EMI - use logic analyzer and/or scope to verify the captured data / voltage levels at higher sample rate at least 10x the protocol frequency Like here in the Logic Analyzer you can see a case when the samples are noisy:
  9. 1 point
    jpeyron

    hdmi ip clocking error

    Hi @askhunter, Please attach a screen shot of your vivado block design. Have you tried changing the MMCM to PLL in the DVI2RGB IP Core? best regards, Jon
  10. 1 point
    mjdbishop

    JTAG-HS2 firmware erased by accident

    Hi Jon Many thanks - cable now working Many Thanks Martin
  11. 1 point
    Hi @askhunter, I believe that you would only need the more recent DVI2RGB IP Core and the IF folder in the Vivado library. best regards, Jon
  12. 1 point
    Hi @Mohammad12gsj, I have sent you a PM about this. best regards, Jon
  13. 1 point
    Hi @askhunter, Here is the newest version of the DVI2RGB IP Core. Please add the full vivado library folder in the ip repository. There is mandatory files that need to be included for the DVI2RGB IP Core to work. What development board are you using? best regards, Jon
  14. 1 point
    jpeyron

    Zybo HDMI Input not working

    Hi @nakedtofu, Welcome to the Digilent Forums! I would suggest using the version of vivado that this project was made for and is verified to work with. Upgrading projects to newer versions of Vivado can become a non-trivial task. IP Cores can have changes that require configuration alterations to other IP Cores for them to interact correctly. We currently do not have an eta for when we will be upgrading this project to a newer version of Vivado. If you are going to continue in the newer version of vivado this might be a long road ahead of you before having a verified project. best regards, Jon
  15. 1 point
    Thank you Dan. Your answer helps clarify things greatly for me on this subject. I appreciate you taking the time to help me out. -Sean
  16. 1 point
    Yes, you can combine more than one block RAM. There is more than one way to implement a FIFO. If I had to do it for myself, I'd write it in plain Verilog, it's about two or three screen lengths of code if the interface requirements are "clean" (such as, one clock and freedom to leave a few clock cycles of latency, before the first input appears at the output). I didn't check but I think there is an "IP block wizard" for FIFOs in Vivado that may do what you need. With "expensive" I meant just that, it costs a lot of money to use half an FPGA just for memory.
  17. 1 point
    Well, to be honest, I didn't read the datasheet to the high-capacity devices with 9M bits. So this one isn't even EOL. Well, it depends. Have a look at https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf table 6. There are 325 of those 36 kB blocks on the FPGA (11700 kB in total), so you need about 1/4 of the total FPGA for memory. Technically feasible and easiest to implement but a very expensive FIFO. Now, connecting this chip to realize its full performance potential (e.g. 225 MHz) is not straightforward. From between the lines ... >> Do I must use all pins of external FIFO? ... I read that you don't have hands-on experience with e.g. CMOS ICs (the answer is "you must drive any input at any time, unless the data sheet says explicitly otherwise", or strange things will happen. "Strange" in a sense that the circuit may respond to waving my hands over it, and that's not an exaggeration). If I'm correct in this, it may be a good idea to get e.g. a few CD4017 or some other standard CMOS chip with simple functionality for < $1 and use this to bring up your FPGA IOs. If the FIFO chip doesn't work because of IO issues, it will be near-impossible to debug.
  18. 1 point
    fpga_babe

    HID protocol on Basys3

    Thank you Jon. I will check it.
  19. 1 point
    @longboard, Yeah, that's really confusing isn't it? At issue is the fact that many of these chips are specified in Mega BITS not BYTES. So the 1Gib is mean to refer to a one gigabit memory, which is also a 128 megabyte memory. That's what the parentheses are trying to tell you. Where this becomes a real problem is that I've always learned that a MiB is a reference to a million bytes, 10^6 bytes, rather than a mega byte, or 2^20 bytes. The proper acronyms, IMHO, should be Gb, GB, Mb, and MB rather than GiB or MiB which are entirely misleading. As for the memory, listed as 16 Meg x 8 x 8, that's a reference to 8-banks of 16-mega words or memory, where each word is 8-bits wide. In other words, the memory has 16MB*8 or 128MB of storage. You could alternatively say it had 1Gb of memory, which would be the same thing, but this is often confused with 1GB of memory--hence the desire for the parentheses again. Dan
  20. 1 point
    I'm going to echo @xc6lx45 and suggest that you reconsider. Does your Kintex have insufficient BRAM for an on-chip FIFO? Using BRAM would be so much easier. If you choose to use the external FIFO, you'll have to adapt the logic that you use to interface to the FIFO to use the signaling of this other FIFO. Sometimes it helps to post more context: what do you want the system to be able to do that it does not now?
  21. 1 point
    Hi @Esti.A, The first error that you get is the following one: ERROR: [Common 17-179] Fork failed: Cannot allocate memory This kind of error is generated when your machine does not have enough RAM memory. Please post here the configuration of your system (CPU, OS, RAM, ..). Do you have swap enabled? Also, have a look on this post.
  22. 1 point
    you might give a bit more information to not be mistaken for a lazy student. My first thought is simply "do not". The component is EOL and you can have the same using the FPGA's BRAM with a LOT less hassle.
  23. 1 point
    revathi

    xadc_zynq

    Hi @jpeyron, Thankyou for your kind reply. I will check the code that you have sent. The board that am using is Zynq ZC702, evalutaion kit. NO, I didn't try any auxillary channel. Today i will try and update you. Thankyou once again
  24. 1 point
    Thanks much, Jon. Best Cuikun
  25. 1 point
    kwilber

    Dynamic voltage and frequency scaling

    Here are two additional articles I have read on the technique being applied to a zynq. https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842065/Zynq-7000+AP+SoC+Low+Power+Techniques+part+5+-+Linux+Application+Control+of+Processing+System+-+Frequency+Scaling+More+Tech+Tip https://github.com/tulipp-eu/tulipp-guidelines/wiki/Dynamic-voltage-and-frequency-scaling-(DVFS)-on-ZC702
  26. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    https://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf page 3
  27. 1 point
    xc6lx45

    Dynamic voltage and frequency scaling

    >> is it possible to present DVFS on it. >> For now I now about clock wizard, DCM, PLL for different clock generation (frequency) but this is not frequency scaling mi right? you may have your own answer there. This is some university project? Have you done your own research? For example, this has all the right keywords: https://highlevel-synthesis.com/2017/04/12/voltage-scaling-on-xilinx-zynq
  28. 1 point
    Hi @Phil_D The gain switch is adjusted automatically based on the selected scope range. At 500mV/div (5Vpk2pk ~0.3mV resolution) or lower the high gain is used with and above this the low gain (50Vpk2pk w ~3mV resolution). In case you specify trigger level out of the screen (5Vpk2pk) or offset higher/lower than +/- 2.5V the low gain will be used for the trigger source channel. This will be noted on the screen with red warning text. The attenuation is a different thing. This option lets you specify the external attenuation or amplification on the signals which enter the scope inputs and the data is scaled accordingly. Like, if you use a 10x scope probe, the scope input will actually get 1/10th of the original signal, but specifying 10x attenuation the signal is scaled to show values on the probe. In this case the 500mV/div (5Vpk2pk) low/high gain limit moves up to 5V/div (50Vpk2pk) and the low gain up to 50V/div If you have an external 100x amplifier on the scope input you can specify 0.01x attenuation. With this you will have 5mV/div (50mVpk2pk ~0.003mV resolution) for high gain.
  29. 1 point
    Vivado is complaining that there are active (not commented) pins in the constraint file that do not have matching port names in your design. Open your design_1_wrapper.v file and reconcile the port names specified there with the constraints file. It is not uncommon to have to change the name of a pin in the constraints file to match the port name in the wrapper. This could happen for example if you used "Make external" on an i/o pin from an IP block. One thing that has helped in the past was to delete the top level wrapper and regenerate it. Sometimes when you make pins external after generating the wrapper, there can be inconsistencies between port and pin naming. Xilinx UG903, page 42 and following elaborates on the scoping mechanism Vivado uses.
  30. 1 point
    benl

    Aliasing in Waveforms Live

    Oh, absolutely agree / understand that some form of decimation and interpolation is required to display the waveform (though my DSP is getting pretty rusty). The current implementation is clearly inadequate to the point of not being useful: the waveform displayed when zoomed out is a very inaccurate representation of the signal. You should at least be able to get a somewhat accurate view of the signal's envelope. The same goes for the buffer view at the top; at present it's not useful to inform on much beyond the relative position of the display within the buffer - ideally you'd be able to use it to pick out envelope changes and transients (the latter would certainly benefit from a peak-detection mechanism). In a nutshell, it'd be useful if WFL presented the signal in a similar fashion as the same data on a deep-memory oscilloscope. Ta, Ben
  31. 1 point
    billium

    HS1 HS3 to Spartan 6 in Linux

    Hello Daniel What do you get with: djtgcfg enum and after that djtgcfg init <device name> Which flavour of Linux are you using? Have you installed digilent adept and utilities? Billy
  32. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, What's the data rate of the filter compared to the number of taps? As in, are you going to need to accept one sample per clock into this filter, or will you have reliable idle clock periods to work with? I'm wondering, if you have to build this, how easy/hard it might be. Dan
  33. 1 point
    So, if I am getting the point of the previous two posts from xclx45 and Dan, make your own filter in Verilog or VHDL and figure out the details ( signed, unsigned, fixed point, etc, .... etc ). You can instantiate DSP48E primitives (difficult) or just let the synthesis tool do it from your HDL code (easier). Debating how things should be verses how they are when using third party scripts to generate unreadable code seems like a waste of time to me... If you don't like what you get then design what you want. If you can make the time to write your own IP ( would be nice to not depend on a particular vendor's DSP architecture ) you'll learn a lot and save a lot of time later. If a vendor's IP doesn't make timing closure for your design a nightmare and you don't have the time to figure out all the details just let the IP handle it. I suspect that trying to optimize the FIR Compiler will be frustrating at best. I once had to come up with pipelined code to calculate signal strength in dB at a minimum sustained data rate. My approach to converting 32-bit integers to logarithms used a combination of Taylor Series expansion and look-up tables. I had a few versions**. One was straight VHDL so that I could compare Altera and Xilinx DSP tiles. One instantiated DSP48 tile primitives for a particular Xilinx device. These were fixed point designs. There's theory and there's practical experience.. they are usually not the same. ** I played with a number of approaches based on extremely limited specifications so there were quite a few versions. Every time I presented one the requirements changed and so did the complexity and resource requirements. I should mention that my intent for mentioning this experience is not to denigrate the information presented by others or to claim superiority in any way. When getting advice it's important to put that into context. A lot of times facts aren't necessarily relevant to solving a particular problem. If I haven't made this clear I've never had the experience that vendor IP optimizes resource usage... in fact quite the opposite. This is why in a commercial setting companies are willing to pay to develop their own IP. Sometimes FPGA silicon costs overshadow development costs.
  34. 1 point
    D@n

    Understand Resource Usage for FIR Compiler

    @aabbas02, So let's start at the top. An N-point FIR filter, such as this one, requires N multiplies and N-1 adds. Let's just count multiplies, though, for now. Let's now look at some optimizations you might apply. A complex multiply usually requires 4 multiplies. If you have complex input and taps, that's 4N real multiplies. There's a trick you can use to drop this to 3N multiplies. If the filter is real, and the incoming signal is complex, you can drop this to 2N multiplies by filtering each of the real and imaginary channels separately. If your filter taps are fixed, a good optimizer should be able to replace the zeros with a constant output, the ones with data pass through, etc. Implementing taps with two or three bits this way is possible. This only works, though, if the filter taps are fixed. Many of the common FIR filter developments generate linear phase filters. These filters are symmetric about a common point. With a little bit of work, you can use that to reduce the number of multiplies (for dynamic taps) down to (N-1)/2. There is a very significant class of filters called half band filters. In a half band filter, every other tap (other than the center tap) is zero. With a bit of work, you can then drop your usage down to (N-1)/4 multiplies. This optimization applies to Hilbert transforms as well. In the given list above, I've assumed that you need to operate at one sample in and one sample out per clock. In that case, there's no time or room for memory, since all of the stages need their data on every clock. I should also point out that I'm counting multiplies, not DSP slices. If your multiplies are smaller than 18x18 bits, you may be able to use a single DSP slice per multiply. If they are larger, you might find yourself using many DSP slices per multiply. It depends. Let's now consider the case where you have an N sample filter but you are only going to provide it with one sample of data every N+ samples (some number more than N). You could then multiplex this multiply and implement an N-point filter with 1 multiply and 2^(ceil(log_2(N)) RAM elements. If your filter was symmetric, you could process a filter roughly twice as long, or a sample rate roughly twice as fast while still using a single multiply. The half-band and hilbert tricks can apply to (roughly) double your filter size or your data rate again. In these cases, however, you can't spare the multiply at all, since it is used to implement every coefficient. That should give you both ends of the spectrum. Now, while I don't understand how Xilinx's FIR compiler works, I can say that if I were to make one I would allow a user to design crosses between these two ends of the spectrum. In the case of a such a cascaded filter, however, you may find it difficult to continue to implement the optimizations we've just discussed, simply because the cascaded structure becomes more difficult to deal with. (By cascade, I mean cascade in implementation and not filtering a stream and then filtering it again.) Looking at your two designs, none of the optimizations I've mentioned would apply. In the high speed filters we started out with, sure, you might manage to remove a multiply or two by multiplying by zero. On the other hand, you can't share multiplies across elements if you do so. For example, you mentioned [1 2 3 4 0 1 2 3 4]. Sure, it's got repeating coefficients, but this is actually a really obscure filtering structure, and not likely one that I (were I Xilinx) would pay to support. Try [ 1 2 3 4 0 4 3 2 1] instead and see if things change. Similarly for [ 1 0 0 0 0 0 0 0 1]. Yeah, it's symmetric about the mid-point, but in all of my symmetric filtering applications I'm assuming the mid point has a value of 2^N-1 or similar so that it doesn't need a multiply. Your choice of filters offers no such optimization, so who knows how it might work. Hope this helps to shed some light on things, Dan
  35. 1 point
    Xilinx encrypts the FIR compiler code so it's difficult to figure out what's going on with your experiments. You should read PG149 which is the IP product Guide for the FIR Compiler. It does indicate when coefficient symmetry optimizations might or won't be made. It also has a link to a Resource Utilization "Spreadsheet" for some FPGA family devices depending on your customized specifications. Oddly, that bit of guidance doesn't show any usages where significant numbers of BRAMS are ever used. The guide does have a section titled "Resource Considerations" that has some interesting information but not enough to answer you questions. I suspect that you'll have to understand actual resource utilization for your application empirically. (SOP from my experience when I really need optimal performance) It's always a good idea to be familiar with all device and IP documentation though, in my experience, the IP documentation rarely addresses all of my questions when I'm deep into a design and committed to IP that's not mine. Again, my guess is that when you specify a clock rate, sample rate, data and coefficient widths, etc resource utilization increases with increasing throughput. The same thing happens if you pipeline your code to maximize data rates. But I'm only guessing. I don't use the FIR Compiler so my views are an extrapolation of experience with other IP from all FPGA vendors that I have used.
  36. 1 point
    Amen. And that's a good view for all Xilinx IP. The structure for a simple digital filter is not that complex; you can implement them in HDL. I've done that. Xilinx IP is convenient but usually not the best approach when you have concerns about using up limited resources. The issue for using very fast resources like BRAM and DSP slices is that they are placed in particular locations throughout the device with limited routing resources for the signals between them or other logic. You can let Xilinx balance throughput, resource usage, logic placement, and throughput or you can try to do that yourself. Trying to use 100% of every BRAM or DSP resource in order to minimize the number of BRAM or DSP resources used is not easy. In my experience FPGA vendors are content to have their IP wizards make the customer think that he needs a larger and more expensive device. So that's the trade-off; let the vendors' tools do the work to save time or write your own IP and be responsible for taking care of all the little details that the IP hides form you. I've spent some time experimenting with DSP resources from various FPGA vendors. They are complicated with a lot of modes and depending on how you use them throughput can decline substantially from the ideal. Just read the user's guide and switching specs in the datasheet to get the idea. Generally the DSP slices are arranged to perform optimally with certain topologies but not all. Implementing designs that are iterative or have feedback can get ugly; especially when you try and fit that into a larger design using most of the devices resources. As a general rule, in my experience, use vendor IP and don't ask a lot of questions or design your own IP and be prepared to learn how to handle a lot of details that aren't obvious. Time verses convenience.
  37. 1 point
    Hi, [1 2 3 4 0 1 2 3 4] is not symmetric in a linear-phase sense. That would be e.g. [1 2 3 4 0 4 3 2 1]. You could exploit the shared coefficients manually, see e.g. Figure 3 for the general concept. But this case is so unusual that I doubt tools will take it into account. The tool does nothing magical. If performance matters more than design time, you'll always get better results for one specific problem with manual design. One performance metric is multiplier utilization (e.g. assuming you design for a 200 MHz clock, one DSP delivering 200M operations / second performs at 100 %. Reaching 50+ % is a realistic goal for a simple / single rate structure). For example, do I want to use an expensive BRAM at all, when I could use ring shift registers for delay line and coefficients. Then you only need a small controlling state machine around it that does a full circular shift for each sample, muxing in the new input sample every full cycle (the BRAM makes more sense when the filter serves many channels in parallel, then FF count becomes an issue).
  38. 1 point
    zygot

    HID protocol on Basys3

    From the Basys3 Reference Manual: The Auxiliary Function microcontroller hides the USB HID protocol from the FPGA and emulates an old-style PS/2 bus. The microcontroller behaves just like a PS/2 keyboard or mouse would. This means new designs can re-use existing PS/2 IP cores. I've written code for Digilent's PS/2 interface and have never configured the keyboard. Be aware that pressing some keys send more than one keycode byte. I don't have the Baysys3 board so I can't confirm that the PS/2 emulation is exactly as claimed. [edit] One of the Forums around here is the Project Vault where users can (should) post code and projects. You might find my submission 'Fast Inter-board Interface' to have some interesting stuff to get you started. Look in the HDL source for the Genesys.
  39. 1 point
    @askhunter It's not clear from your pictures what it is that you are referring to since the times scales are different. The purpose of post place and route timing simulation is to show the relative signal path delays in your implemented design as well as possible inferred latches or registers hidden by IP. The RTL simulation merely indicates if your behavioural logic is performing as you intended ( assuming that the testbench is well designed ). It is merely a simplified (no delays, no setup, no hold times) idealistic representation of simple logic. If the timing simulation doesn't give the same results as the RTL simulation then it's unlikely that your hardware will behave as you intend either. In the typical professional setting a lot of people are working on parts of a large design effort simultaneously. No one can afford to schedule a design effort where everything is done sequentially. In such a case timing simulations become a very important indicator of risks of projects not making deadlines. It simply isn't possible to create a lot of hardware, software, test protocols etc sequentially or even in parallel and 2 weeks before shipment throw all that stuff together for the first time and then figure out why things don't work. So we have a lot of ways to do simulation that offer increasingly more accurate, and hence reliable, views of how our design ( after it's been optimized, re-worked and reduced to LUT equations ) might actually work in a system before having to run it in hardware. When there are 10 engineers doing parts of 1 large FPGA design and all of those parts are integrated it's not uncommon for some of them to start failing due to limited routing resources and clock line limitations.
  40. 1 point
    My my... I'm not sure what LFSR you are using but mine have a shift enable input so that I can use any clock that's available but update the LFSR output at almost any update rate needed. You can create a counter to control the shift enable so that it's synchronous with whatever logic is running at 8 KHz and needs data at that rate. It's typical in a design to have lots of parts of the logic changing states at lots of different frequencies. You don't want separate clock domains for all of those rates even if those clocks are derived and phase coherent. Sometimes, for high speed applications you do need a higher, phase coherent clock; like in video where there might be a reference clock and a higher but synchronous pixel clock. In general it's best to have the minimum number of clock domains in a design that you can get away with. FPGA devices don't have long analog or combinatorial delay lines on the order of microseconds or milliseconds. The Series 7 devices do allow adding very small delays to signals coming into FPGA pins via the IDELAY2 primitive. If your device has outputs on pins on an HP bank you can also add similar small delays to output signals using the ODELAY2 primitive. Synchronous delays lines using counters and enables as I mentioned before are the normal way to achieve teh equivalent of the analog delay line that used to be part of some digital logic long long ago.
  41. 1 point
    PG_R

    RGMII (ZCU102-Zybo Z7)

    Thanks Jon and Zygot, I appreciate the explanation. Regards
  42. 1 point
    Hi @Ciprian, After some time, I managed to solve this issue. In fact, It was a problem in the hardware and device tree configuration. I discovered it when probing with another example project named Zybo-hdmi-out (https://github.com/Digilent/Zybo-hdmi-out). However, as this project is for a previous version of Vivado, I tested with Vivado 2017.4. Surprisingly, it worked fine but with another pixel-format in the device tree. The Zybo-base-linux project which I used, has a pixel format in DRM device tree configuration set to "rgb888", however, for the Zybo-hdmi-out, it displayed correctly with pixel-format "xrgb8888". If I use other pixel formats, no output is displayed in both cases. Going deep into the configuration of both projects, I discovered that there are some differences in the VDMA and Subset converter settings, which changed to the configuration in Zybo-hdmi-out, solves the problem of colors and rendering, considering also a pixel format in the devicetree equal to "xrgb8888". I attached the images of both configurations. In addition to this, I managed to update the design for the Vivado version I use (2018.2) with no more differences that a change in the AXI memory interconnect replaced by the AXI Smart connect in the newer version, which is added automatically when using Vivado autoconnect tool for the VDMA block. Hope this information could help others which run in the same issue. Thanks for your help. Luighi Vitón
  43. 1 point
    if you can bring it up once in Vivado HW manager (maybe with the help of an external +5 V supply), you might be able to erase the flash. If not, you may be able to prevent loading the bitstream from flash e.g. by pulling INIT_B low (R32 is on the bottom of the board, its label slightly below the CE mark). See https://www.xilinx.com/support/documentation/user_guides/ug470_7Series_Config.pdf "INIT_B can externally be held Low during power-up to stall the power-on configuration sequence at the end of the initialization process. When a High is detected at the INIT_B input after the initialization process, the FPGA proceeds with the remainder of the configuration sequence dictated by the M[2:0] pin settings.""
  44. 1 point
    zygot

    A UART Based Debugger Tool

    Here's a utility for debugging and testing your code in hardware and uses any IO pin to send an ASCII representation of any signal through a hardware UART interface. If you don't have a UART on you FPGA board there are TTL USB UART breakout boards and cables that allow any spare IO pin to become a UART interface. This code is functionally the same as one recently released by Hamster but developed independently for the Fast Data Interface project. I recommend comparing the different coding styles. I decided to release this as a separate project as there are likely more people interested in this one that the other. This project contains test bench code. UartDebuggerR3.zip
  45. 1 point
    well, by default your signal is between 0 V and Vref. The opamp circuit has a gain of 2 (range 0.. 2 VRef) but subtracts a constant VRef (range now -VRef..Vref). It'll just shift the waveform on the scope, and double its AC magnitude.
  46. 1 point
    JColvin

    OpenLogger ADC resolution + exporting

    Hi @sgrobler, Our design engineer who designed the OpenLogger did an end-to-end analysis to determine the end number of bits of the OpenLogger. This is what they ended up doing in a summarized fashion: <start> They sampled 3 AAA battery inputs to the SD card at 250 kS/s and set the OpenLogger sample rate to 125 kS/s and then took 4096 samples; they then took the raw data stored on the SD card and converted it to a CSV file and exported the data for processing. Their Agilent scope read the battery pack at 4.61538 V and as they later found from FFT results the OpenLogger read 4.616605445 V, leading to a 0.001226445 V or ~1.2mV difference, which is presuming the Agilent is perfect (which it is not), but it was nice to see that the values worked out so closely. They calculated the RMS value of the full 4096 samples in both the time domain and using Parseval's theorem in the frequency domain as well, both of which came up with the same RMS value of 4616.606689 mV, which is very close to the DC battery voltage of 4616 mV. Because RMS is the same as DC voltage, this gives the previously mentioned DC value of 4.616605445 V. They can then remove the DC component from the total RMS value to find the remaining energy (the total noise, including analog, sampling, and quantization noise) of the OpenLogger from end-to-end. With the input range of +/- 10V input, this produces an RMS noise of 1.5mV. At the ADC input, there is a 3V reference and the analog input front end attenuates the input by a factor of 0.1392, so the 1.5mV noise on the OpenLogger is 0.2088mV at the ADC. With the 16 bits (65536 LSBs) over 3V, 0.0002088V translates to ~4.56 LSBs of noise. The ENOB is a power of 2, so log(4.56)/log(2) results in 2.189 bits, giving us a final ENOB of 16 - 2.189 = ~13.8 bits. Note though that this ENOB of 13.8 bits is based on system noise and not dynamic range, so for non-DC inputs (which will likely be measured at some point) the end number of bits is not easily determined. The datasheet for the ADC used in the OpenLogger (link) shows that the ADC itself gives an ENOB of about 14.5 bits at DC voltage (so the 13.8 bits is within that range), but at high frequencies, this of course rolls off to lower ENOB at higher frequency inputs. Thus, they cannot fully predict what the compound ENOB would be over the dynamic range, but they suspect it all mixes together and is 1 or 1.5 bits lower than the ADC ENOB response. </end> Let me know if you have questions or would like to see the non-abbreviated version of his analysis. Thanks, JColvin
  47. 1 point
    bogdan.deac

    OpenCV and Pcam5-c

    Hi @Esti.A, If you clone the repo you obtain the "source code" for the platform and you have to generate the platform by yourself. This is a time consuming and complicated task and is not recommended if you do not understand SDSoC very well. I advise you to download the last SDSoC platform release from here. You will obtain a zip file that contains the SDSoC platform already build. After that, you can follow these steps to create your first project.
  48. 1 point
    D@n

    MMCM dynamic clocking

    @rangaraj, It's a shame you only know VHDL coding, since the Verilog code I posted above would give you the ability to generate an arbitrary clock--unencumbered by the constraints of the PLL, with frequency resolution in the milli-Hertz range (100MHz/2^32). Perhaps you want to take another look at it? Sure, it would have some phase noise, but ... that could be beaten now if necessary by using an OSERDESE2 component. I've got an example design I'm working on that does just that and should knock the phase noise down to 1.25ns or better. Dan
  49. 1 point
    hamster

    MMCM dynamic clocking

    I feel a bit bad about posting a minor novel here, but here is an example of going from "5 cycles on, 5 off" (i.e. divide by 10) to "10 on, 10 off" (device by 20). The VCO is initially to 800 MHz with CLK0 being VCO divide by 8.... so after config you get 100MHz. Push the button and you get 800/20 = 40MHz, release the button and you get 80MHz. It is all really hairy in practice! EDIT: Through experimentation I just found that you don't need to reset the MMCM if you are not changing the VCO frequency. So the 'rst' signal in the code below isn't needed (and LOCKED will stay asserted). -------------------------------------------------------------------------------------------------------- -- Playing with the MMCM DRP ports. -- see https://www.xilinx.com/support/documentation/application_notes/xapp888_7Series_DynamicRecon.pdf -- for the Dynamic Reconviguration Port addresses -------------------------------------------------------------------------------------------------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; library UNISIM; use UNISIM.VComponents.all; entity mmcm_reset is Port ( clk_100 : in STD_LOGIC; btn_raw : in STD_LOGIC; led : out STD_LOGIC_VECTOR (15 downto 0)); end mmcm_reset; architecture Behavioral of mmcm_reset is signal btn_meta : std_logic := '0'; signal btn : std_logic := '0'; signal speed_select : std_logic := '0'; signal counter : unsigned(26 downto 0) := (others => '0'); signal debounce : unsigned(15 downto 0) := (others => '0'); signal clk_switched : std_logic := '0'; signal clk_fb : std_logic := '0'; type t_state is (state_idle_fast, state_go_slow_1, state_go_slow_2, state_go_slow_3, state_idle_slow, state_go_fast_1, state_go_fast_2, state_go_fast_3); signal state : t_state := state_idle_fast; ----------------------------------------------------------------------------- --- This is the CLKOUT0 ClkReg1 address - the only register to be played with ----------------------------------------------------------------------------- signal daddr : std_logic_vector(6 downto 0) := "0001000"; signal do : std_logic_vector(15 downto 0) := (others => '0'); signal drdy : std_logic := '0'; signal den : std_logic := '0'; signal di : std_logic_vector(15 downto 0) := (others => '0'); signal dwe : std_logic := '0'; signal rst : std_logic := '0'; begin MMCME2_ADV_inst : MMCME2_ADV generic map ( BANDWIDTH => "OPTIMIZED", -- Jitter programming (OPTIMIZED, HIGH, LOW) CLKFBOUT_MULT_F => 8.0, -- Multiply value for all CLKOUT (2.000-64.000). CLKFBOUT_PHASE => 0.0, -- Phase offset in degrees of CLKFB (-360.000-360.000). -- CLKIN_PERIOD: Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz). CLKIN1_PERIOD => 10.0, CLKIN2_PERIOD => 0.0, -- CLKOUT0_DIVIDE - CLKOUT6_DIVIDE: Divide amount for CLKOUT (1-128) CLKOUT1_DIVIDE => 1, CLKOUT2_DIVIDE => 1, CLKOUT3_DIVIDE => 1, CLKOUT4_DIVIDE => 1, CLKOUT5_DIVIDE => 1, CLKOUT6_DIVIDE => 1, CLKOUT0_DIVIDE_F => 8.0, -- Divide amount for CLKOUT0 (1.000-128.000). -- CLKOUT0_DUTY_CYCLE - CLKOUT6_DUTY_CYCLE: Duty cycle for CLKOUT outputs (0.01-0.99). CLKOUT0_DUTY_CYCLE => 0.5, CLKOUT1_DUTY_CYCLE => 0.5, CLKOUT2_DUTY_CYCLE => 0.5, CLKOUT3_DUTY_CYCLE => 0.5, CLKOUT4_DUTY_CYCLE => 0.5, CLKOUT5_DUTY_CYCLE => 0.5, CLKOUT6_DUTY_CYCLE => 0.5, -- CLKOUT0_PHASE - CLKOUT6_PHASE: Phase offset for CLKOUT outputs (-360.000-360.000). CLKOUT0_PHASE => 0.0, CLKOUT1_PHASE => 0.0, CLKOUT2_PHASE => 0.0, CLKOUT3_PHASE => 0.0, CLKOUT4_PHASE => 0.0, CLKOUT5_PHASE => 0.0, CLKOUT6_PHASE => 0.0, CLKOUT4_CASCADE => FALSE, -- Cascade CLKOUT4 counter with CLKOUT6 (FALSE, TRUE) COMPENSATION => "ZHOLD", -- ZHOLD, BUF_IN, EXTERNAL, INTERNAL DIVCLK_DIVIDE => 1, -- Master division value (1-106) -- REF_JITTER: Reference input jitter in UI (0.000-0.999). REF_JITTER1 => 0.0, REF_JITTER2 => 0.0, STARTUP_WAIT => FALSE, -- Delays DONE until MMCM is locked (FALSE, TRUE) -- Spread Spectrum: Spread Spectrum Attributes SS_EN => "FALSE", -- Enables spread spectrum (FALSE, TRUE) SS_MODE => "CENTER_HIGH", -- CENTER_HIGH, CENTER_LOW, DOWN_HIGH, DOWN_LOW SS_MOD_PERIOD => 10000, -- Spread spectrum modulation period (ns) (VALUES) -- USE_FINE_PS: Fine phase shift enable (TRUE/FALSE) CLKFBOUT_USE_FINE_PS => FALSE, CLKOUT0_USE_FINE_PS => FALSE, CLKOUT1_USE_FINE_PS => FALSE, CLKOUT2_USE_FINE_PS => FALSE, CLKOUT3_USE_FINE_PS => FALSE, CLKOUT4_USE_FINE_PS => FALSE, CLKOUT5_USE_FINE_PS => FALSE, CLKOUT6_USE_FINE_PS => FALSE ) port map ( -- Clock Outputs: 1-bit (each) output: User configurable clock outputs CLKOUT0 => clk_switched, CLKOUT0B => open, CLKOUT1 => open, CLKOUT1B => open, CLKOUT2 => open, CLKOUT2B => open, CLKOUT3 => open, CLKOUT3B => open, CLKOUT4 => open, CLKOUT5 => open, CLKOUT6 => open, -- Dynamic Phase Shift Ports: 1-bit (each) output: Ports used for dynamic phase shifting of the outputs PSDONE => open, -- Feedback Clocks: 1-bit (each) output: Clock feedback ports CLKFBOUT => clk_fb, CLKFBOUTB => open, -- Status Ports: 1-bit (each) output: MMCM status ports CLKFBSTOPPED => open, CLKINSTOPPED => open, LOCKED => open, -- Clock Inputs: 1-bit (each) input: Clock inputs CLKIN1 => clk_100, CLKIN2 => '0', -- Control Ports: 1-bit (each) input: MMCM control ports CLKINSEL => '1', PWRDWN => '0', -- 1-bit input: Power-down RST => rst, -- 1-bit input: Reset -- DRP Ports: 16-bit (each) output: Dynamic reconfiguration ports DCLK => clk_100, -- 1-bit input: DRP clock DO => DO, -- 16-bit output: DRP data DRDY => DRDY, -- 1-bit output: DRP ready -- DRP Ports: 7-bit (each) input: Dynamic reconfiguration ports DADDR => DADDR, -- 7-bit input: DRP address DEN => DEN, -- 1-bit input: DRP enable DI => DI, -- 16-bit input: DRP data DWE => DWE, -- 1-bit input: DRP write enable -- Dynamic Phase Shift Ports: 1-bit (each) input: Ports used for dynamic phase shifting of the outputs PSCLK => '0', PSEN => '0', PSINCDEC => '0', -- Feedback Clocks: 1-bit (each) input: Clock feedback ports CLKFBIN => clk_fb ); speed_change_fsm: process(clk_100) begin if rising_edge(clk_100) then di <= (others => '0'); dwe <= '0'; den <= '0'; case state is when state_idle_fast => if speed_select = '1'then state <= state_go_slow_1; -- High 10 Low 10 di <= "0001" & "001010" & "001010"; dwe <= '1'; den <= '1'; end if; when state_go_slow_1 => if drdy = '1' then state <= state_go_slow_2; end if; when state_go_slow_2 => rst <= '1'; state <= state_go_slow_3; when state_go_slow_3 => rst <= '0'; state <= state_idle_slow; when state_idle_slow => di <= (others => '0'); if speed_select = '0' and drdy = '0' then state <= state_go_fast_1; -- High 5 Low 5 di <= "0001" & "000101" & "000101"; dwe <= '1'; den <= '1'; end if; when state_go_fast_1 => if drdy = '1' then state <= state_go_fast_2; end if; when state_go_fast_2 => rst <= '1'; state <= state_go_fast_3; when state_go_fast_3 => rst <= '0'; state <= state_idle_fast; end case; end if; end process; dbounce_proc: process(clk_100) begin if rising_edge(clk_100) then if speed_select = btn then debounce <= (others => '0'); elsif debounce(debounce'high) = '1' then speed_select <= not speed_select; else debounce <= debounce + 1; end if; -- Syncronise the button btn <= btn_meta; btn_meta <= btn_raw; end if; end process; show_speed_proc: process(clk_switched) begin if rising_edge(clk_switched) then counter <= counter + 1; led(7 downto 0) <= std_logic_vector(counter(counter'high downto counter'high-7)); end if; end process; led(15) <= speed_select; end Behavioral;
  50. 1 point
    We have a few demo projects using the Basys2 in RF communications that we haven't gotten had time to document yet. Although it's different, there are a lot of similiarities that you may be able to use. Take a look at the attached Zip File. The demo project is using the Pmod RF1 to communicate keyboard presses to another Basys2 that is controlling a speaker. PmodRF1 Basys demo project by digilentinc, on Flickr Here is a link to get the files for the project: https://www.dropbox.com/s/83winp3zyio7or7/Basys2AudioRfPmodHID.zip?dl=0 Hope that helps!