SmashedTransistors reacted to zygot in Digilent CMOD A7 Disconnects and/or does not Program
Hopefully, when Digilent spins the next CMOD they will provide a better arrangement for using the configuration circuitry with an external power supply. Even as a standalone module configured from flash, one Vcc pin and one GND pin is less than ideal, and likely problematic if the FPGA is driving or receiving a number of single-ended signals.
SmashedTransistors reacted to Christian Klein in Digilent CMOD A7 Disconnects and/or does not Program
I've seen similar issues in the forum, but no real solution.
I have 5 CMOD A7 boards and only 2 of them behave properly.
The other 3 do not accept a program and sometimes disconnect.
I get the following errors:
ERROR: [Labtools 27-3165] End of startup status: LOW
ERROR: [Common 17-39] 'program_hw_devices' failed due to earlier errors.
I have tried:
- many different USB cables
- Different ports on the PC
- Powered USB hubs
- Vivado Lab 2017.2 and 2018.1
- 2 different bitstream files
The boards that work always work, no matter the cable or USB port.
Anything else I should try?
SmashedTransistors reacted to zygot in pipeline granularity
First off, you are clearly working with a board that I am unfamiliar with. Regardless, I have no way to provide a decently helpful answer to your question. I have been in the situation where I had to make architectural changes to a design (including a completely different approach) to meet timing, especially one that needs to be incorporated into a larger overall design that uses most of the logic and memory resources, runs at a high clock rate and has multiple clock domains. A big factor is how much of the device resources your design will use, how the logic interconnect works, how the clock routing works for your device etc.
What I can say with some confidence is that you should experiment with a slower clock and scaled back design. Simulate it. Get a bitstream and look over the timing report to get an idea of path delays. Figure out how many LUTs are needed for the basic design elements. Then start scaling things up. Do timing simulation for each step as you go. It is not uncommon when working on an ambitious project to get to a conclusion faster by starting off with a few simpler preliminary design projects that help you get answers to unknowns. In engineering the shortest path (in time) between two points (start and finish) is not necessarily a straight line.
It's not been my experience that you can 'fix' a design that doesn't run properly at a certain clock rate by adding a few registers here or there. Pipelining strategies, in my experience, are more of an architectural, holistic effort that starts with the basic elements and continues as they are grouped into larger entities.
SmashedTransistors reacted to xc6lx45 in pipeline granularity
>>thus i have a tendency to over-pipeline my design
Read the warnings. If a DSP48 has pipeline registers it cannot utilize, it will complain. Similar for BRAM - it needs to absorb some levels of registers to reach nominal performance. I'd check the timing report.
At 100 MHz you are maybe at 25..30 % of the nominal DSP performance of an Artix, but I wouldn't aim much higher without good reason (200 MHz may still be realistic but the task gets much harder).
A typical number I'd expect could be four cycles for a multiplication in a loop (e.g. IIR).
Try to predict resource usage - if FFs are abundant, I'd make that "4" an "8" to leave some margin for register rebalancing: An "optimal" design will become problematic in P&R when utilization goes up (but obviously, FF count is only a small fraction of BRAM bits so I wouldn't overdo it)
SmashedTransistors got a reaction from jpeyron in BASYS3 and Axoloti
I'll take one step after another and the forums are quite a good source of knowledge.
So far, I plan to start with very basic schemes in order to understand how Vivado works.
Then I will work on communicating with the Axoloti through SPI.
SmashedTransistors reacted to OvidiuD in BASYS3 and Axoloti
I'm very glad you're looking forward to your project and I have to admit it actually seems very interesting!
Don't hesitate to ask questions on our forum whenever you have a question, I'm sure someone will always do their best to help you and eventually succeed by working with you.
Best regards, Ovidiu
SmashedTransistors reacted to xc6lx45 in FIR compiler 7.2 stopband
... and how about a simple impulse response test (feed a stream of zeroes with an occasional 1 and check that the filter coefficients appear at the output).
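The impulse response test described above can be sketched in software before trying it on the core. This is only a behavioral model, not the FIR Compiler itself: the coefficient values, the helper names `fir_filter` and `impulse_train`, and the spacing between impulses are all made up for illustration. The spacing has to be longer than the filter so that consecutive responses do not overlap.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def impulse_train(length, period):
    """A stream of zeroes with a single 1 every `period` samples."""
    return [1 if n % period == 0 else 0 for n in range(length)]


# Hypothetical integer coefficients, chosen only so they are easy to recognize.
coeffs = [3, -1, 4, 1, -5]
period = 8                      # must be larger than len(coeffs)
stimulus = impulse_train(24, period)
response = fir_filter(coeffs, stimulus)

# After every impulse the coefficients should appear at the output, in order.
for start in range(0, len(stimulus), period):
    assert response[start:start + len(coeffs)] == coeffs
```

On real hardware the same pattern would show up delayed by the core's latency and possibly scaled by its output format, so the comparison there is against a shifted (and maybe scaled) copy of the coefficient list.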
Just wondering, isn't there a "ready / valid" interface also at the output if you expand the port with "+"?
SmashedTransistors reacted to xc6lx45 in Increasing the clock frequency to 260 MHz
reading between the lines of your post, you're just "stepping up" one level in FPGA design. I don't do long answers but here's my pick on the "important stuff"
- Before, take one step back from the timing report and fix asynchronous inputs and outputs (e.g. LEDs and switches). Throw in a bunch of extra registers, or even "false-path" them. The problem (assuming this "beginner mistake") is that the design tries to sample them at the high clock rate. Which creates a near-impossible problem. Don't move further before this is understood, fixed and verified.
- speaking of "verified": Read the detailed timing analysis and understand it. It'll take a few working hours to make sense of it but this is where a large part of "serious" design work happens.
- Once the obvious problems are fixed, I need to understand what is the so-called "critical path" in the design and improve it. For a feedforward-style design (no feedback loops) this can be systematically done by inserting delay registers. The output is generated e.g. one clock cycle later but the design is able to run at a higher clock so overall performance improves.
- Don't worry about floorplanning yet (if ever) - this comes in when the "automatic" intelligence of the tools fails. But, they are very good.
- Do not optimize on a P&R result that fails timing catastrophically (as in your example - there are almost 2000 paths that fail). It can lead into a "rabbit hole" where you optimize non-critical paths (which is usually a bad idea for long-term maintenance).
- You may adjust your coding style based on the observations, e.g. throw in extra registers where they will "probably" make sense (even if those paths don't show up in the timing analysis, the extra registers allow the tools to essentially disregard them in optimization to focus on what is important)
- There are a few tricks like forcing redundant registers to remain separate. Example: I have a dozen identical blocks that run on a common, fast 32-bit system clock and are critical to timing. Step 1, I sample the clock into a 32-bit register at each block's input to relax timing, and step 2, I declare these registers as DONT_TOUCH because the tools would otherwise notice they are logically equivalent and try to use one shared instance. This is just an example.
- For BRAMs and DSP blocks, check the documentation where extra registers are needed (that get absorbed into the BRAM or DSP using a dedicated hardware register). This is the only way to reach the device's specified memory or DSP performance.
- Read the warnings. Many relate to timing, e.g. when the design forces a BRAM or DSP to bypass a hardware register.
- Finally, 260 MHz on Artix is already much harder than 130 MHz (very generally speaking). Usually feasible but you need to pay attention to what you're doing and design for it (e.g. a Microblaze with the wrong settings will most likely not make it through timing).
- You might also have a look at the options ("strategy") but don't expect any miracles on a bad design.
Ooops, this almost qualifies as "long" answer ...
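To put rough numbers on the "insert delay registers into a feedforward critical path" point from the list above, here is a small back-of-the-envelope sketch. All delays are hypothetical and not taken from any timing report; `min_period_ns` and `fmax_mhz` are illustrative helpers, and the model ignores routing, clock skew and everything else a real timing analysis accounts for.

```python
def min_period_ns(segments, reg_overhead_ns):
    """Shortest usable clock period: the slowest register-to-register segment
    plus a fixed register overhead (clock-to-Q plus setup)."""
    return max(sum(seg) for seg in segments) + reg_overhead_ns


def fmax_mhz(period_ns):
    return 1000.0 / period_ns


# Hypothetical logic delays (ns) of four blocks chained in a feedforward path.
delays = [1.5, 2.0, 1.8, 1.2]
overhead = 0.5                   # hypothetical clock-to-Q + setup, in ns

one_stage = min_period_ns([delays], overhead)                         # no extra registers
two_stage = min_period_ns([delays[:2], delays[2:]], overhead)         # one register in the middle
three_stage = min_period_ns([[1.5], [2.0], [1.8, 1.2]], overhead)     # two extra registers

target_period = 1000.0 / 260.0   # a 260 MHz clock needs a period of about 3.85 ns

for name, period in [("1 stage", one_stage), ("2 stages", two_stage), ("3 stages", three_stage)]:
    verdict = "meets" if period <= target_period else "misses"
    print(f"{name}: {period:.2f} ns, {fmax_mhz(period):.1f} MHz, {verdict} 260 MHz")
```

Each extra register adds one clock cycle of latency, but the achievable clock rate is set by the slowest remaining segment, which is why balancing the segments matters as much as the number of registers.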
SmashedTransistors reacted to [email protected] in Implementation SPI basys3
So, to handle the DRC violation, stick these two lines into your XDC file:
set_property CFGBVS VCCO [current_design]
set_property CONFIG_VOLTAGE 3.3 [current_design]
At least ... those are the lines that I placed into my own Basys3 XDC file.
As for your logic, ...
I would *highly* recommend that you do not transition your logic on either the SCK or the CS pins. Use a system wide clock instead. Transitioning on SCK and CS is going to ... set you up for problems when you wish to do anything else with your chip. (That's part of what Vivado is complaining about in those errors--these clocks (SCK and CS) have no timing associated with them--but if you stop using them as clocks and just use them as logic, the errors will go away.)
To synchronize the external pins to your system wide clock, clock all of your inputs into flip flops twice before using them. Otherwise, you'll struggle with unpredictable things happening within your design. (See metastability ...)
After clocking your inputs twice, look for a SCK line that is high with the previous SCK low, and for CS to be low on both clocks. This logic test, at the speed of your board clock (100MHz), should be sufficient to detect a rising clock edge. On the 8th rising clock edge, a byte has been given to you. Send that byte to the rest of your design, together with a logic "strobe" (true for only one clock) telling the rest of your logic that you have received such a byte. (You might wish to pass another line indicating if this was the first byte received since CS went low.)
The approach outlined above, when applied with a 100MHz clock, should still have no problems dealing with SPI clocks 25MHz or above--even though all of your logic is running at 100MHz. You can see a discussion of this, along with other common Digilent forum requests, on the ZipCPU blog.
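The receive logic described above can be tried out as a cycle-by-cycle software model before writing any HDL. The following is only a sketch under assumptions that are not in the post: SPI mode 0 with data sampled on the rising SCK edge, MSB first, an active-low CS, and made-up names (`SpiByteReceiver`, `simulate_transfer`). Each call to `tick` stands for one edge of the system clock, and every register is updated from the values it held before that edge.

```python
class SpiByteReceiver:
    """Behavioral model: all pins pass through two flip-flops, then plain logic."""

    def __init__(self):
        # Two-stage synchronizers: index 0 is the first flop, index 1 the second.
        self.sck_sync = [0, 0]
        self.cs_sync = [1, 1]      # CS idles high (assumed active low)
        self.mosi_sync = [0, 0]
        self.prev_sck = 0          # synchronized SCK from the previous clock
        self.prev_cs = 1           # synchronized CS from the previous clock
        self.shift = 0
        self.bit_count = 0

    def tick(self, sck, cs_n, mosi):
        """One system clock edge. Returns (strobe, byte)."""
        # Logic only ever sees the outputs of the second flops.
        sck_now = self.sck_sync[1]
        cs_now = self.cs_sync[1]
        mosi_now = self.mosi_sync[1]
        strobe, byte = False, None

        if cs_now == 1:
            # Not selected: drop any partial byte.
            self.bit_count = 0
            self.shift = 0
        elif sck_now == 1 and self.prev_sck == 0 and self.prev_cs == 0:
            # SCK high with previous SCK low, CS low on both clocks: rising edge.
            self.shift = ((self.shift << 1) | mosi_now) & 0xFF
            self.bit_count += 1
            if self.bit_count == 8:
                strobe, byte = True, self.shift   # strobe is true for this one clock
                self.bit_count = 0

        # Register updates, all based on the values from before this edge.
        self.prev_sck, self.prev_cs = sck_now, cs_now
        self.sck_sync = [sck, self.sck_sync[0]]
        self.cs_sync = [cs_n, self.cs_sync[0]]
        self.mosi_sync = [mosi, self.mosi_sync[0]]
        return strobe, byte


def simulate_transfer(value, ticks_per_half=2):
    """Shift one byte in; SCK is low then high for ticks_per_half system clocks each."""
    rx = SpiByteReceiver()
    received = []

    def step(sck, cs_n, mosi):
        strobe, byte = rx.tick(sck, cs_n, mosi)
        if strobe:
            received.append(byte)

    for _ in range(4):                     # CS low, let the synchronizers settle
        step(0, 0, 0)
    for bit in range(7, -1, -1):           # MSB first (assumption)
        mosi = (value >> bit) & 1
        for sck in (0, 1):
            for _ in range(ticks_per_half):
                step(sck, 0, mosi)
    for _ in range(4):                     # release CS and flush the pipeline
        step(0, 1, 0)
    return received


print([hex(b) for b in simulate_transfer(0xA5)])
```

With `ticks_per_half=2` the modeled SCK runs at one quarter of the system clock, which corresponds to 25MHz on a 100MHz board clock. The model is ideal: it has no metastability, jitter or setup/hold effects, so it only checks the logic, not the timing margins.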
SmashedTransistors reacted to [email protected] in Beginner DSP Projects
You should thank @zygot for such sensible advice: build it in Matlab or Octave, get it working, then port to hardware. Let me add another step in the middle, though, that I'm sure @zygot would agree with: Octave, then simulation, then hardware. After that, the sky's the limit! Well, you might want to study a particular application of interest as well. DSP is such a varied field, and so many things from so many fields are called DSP that ... well, it's hard for me to pontificate from here.
Still, if you are interested in some examples, feel free to read some of ZipCPU's DSP articles online. (The ZipCPU is the name of a CPU/processor I've built, and now blog about under the name ZipCPU.) They tend to hit on many topics surrounding DSP theory and implementation. Indeed, I recently posted a rather cool simulation demo of a spectrogram to github. There's a nice screenshot available there too in order to give you an idea of how far you might get with simulation.
Perhaps these ideas might stir up in your mind a project you'd like to try?
SmashedTransistors reacted to xc6lx45 in Beginner DSP Projects
Well, if you want my opinion, DSP on FPGA is a fairly specialized niche application. It's a long walk to come up with a project that really fits into that niche, justifying an FPGA (rather pair a $0.50 FPGA for programmable IO with one or more high-end DSPs for the number crunching if someone claims "we need an FPGA").
For studying, it can be "interesting" in a sense that you get to know quite a few dragons on a first-name basis. But then, is it productive to spend weeks on fixed point math when everybody else uses floats on a DSP / CPU and "time-to-market" is #1 priority? Maybe not. DSP is more fun in Matlab (Octave). And there is no point in FPGA for performance unless you have exhausted the options at algorithm level (again, with exceptions, e.g. well-defined brute-force filtering problems).
A lot of the online material is "sponsored" by companies that sell FPGA silicon by the square meter (Yessir. We have Floats!). But this is largely for the desperate and ill-informed (of course, there are viable use cases - say high volume basestations or automotive with need for EOL in a decade or two. As said, a niche application).
When you take the direct route, you'll run into a question like, "how on earth could I implement an audio mixing console when the FPGA has only 96 multipliers". Challenge me or anybody who has read some books and you'll find it can be done on a single multiplier (say, 100 MHz at 96 kHz gives about 1041 clock cycles per sample, or 86 multiplications per sample for each of 12 channels. It's just an example. In reality I'd use a few, with "maintainability" of the code my major concern). The point is, the skill ceiling is fairly high but so is the design effort. It only makes sense if I plan to sell at least a hundred gazillion devices.
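The single-multiplier arithmetic above is easy to check. A minimal sketch, assuming one multiplication per clock cycle per multiplier and ignoring any pipeline fill or control overhead (the function name is made up for illustration):

```python
def multiplies_per_channel(clock_hz, sample_rate_hz, channels, multipliers=1):
    """Multiplications available to each channel within one sample period,
    assuming every multiplier completes one multiplication per clock cycle."""
    cycles_per_sample = clock_hz // sample_rate_hz   # whole clock cycles per sample
    return (cycles_per_sample * multipliers) // channels


cycles = 100_000_000 // 96_000                                   # 1041 cycles per sample
budget = multiplies_per_channel(100_000_000, 96_000, channels=12)

print(cycles, budget)   # 1041 86
```

Whether 86 multiplications per channel per sample is enough depends entirely on the filters and mixing matrix involved, which is the design effort the post is talking about.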
On the other hand, if you separate DSP and FPGA, you'll find that a lot of the Matlab (Octave) magic maps 1:1 to real life on any modern CPU platform by importing e.g. the "Eigen" library into my C++ code.