
D@n
Everything posted by D@n

  1. I've actually seen complaints of this often. I'm not sure I've seen the "right" solution yet, although I think I recall individuals having success by forcing Vivado to repackage their IP from scratch.
  2. You can convert PDM to PCM before the FFT or after. That's an engineering choice; both will work. Often the PCM conversion costs less logic and gives you more headroom in the FFT--but that may require more engineering than you are up to at present.

Try this: simulate and verify every component individually. Dump the outputs of each component to a text file that you can read and verify with Matlab. Make sure you get what you want along the way.

Just as a heads up, frequency estimation is often more difficult than just taking an FFT of an incoming waveform. I've done it. The FFT approach was the first one I tried. Depending on the details of the sample rate and the FFT size, you may have too many bins or not enough. You are also very likely to be distracted by harmonics.

If you've never "seen" the output of an FFT given real-life signals, you should do that first. It's very instructive. The "best" way to do that is often via a scrolling raster, but ... video requires memory, often external memory, and ... that's likely to be a bit beyond a beginner's first project. For now, perhaps you should consider capturing data and running it through the algorithm you'd like on your PC (Matlab? Octave?) to see what real-life data looks like. Perhaps you might even wish to read audio from your computer's sound card, to get an understanding of how rich real data sets can be.

Unfortunately, that 7-segment display isn't going to give you enough feedback to know how to debug an algorithm like this, applied to so rich a data set as the real world. You will need software help along the way.

Dan
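The simplest possible PDM-to-PCM conversion is an accumulate-and-dump (boxcar) decimator. The sketch below is only an illustration under assumptions not taken from the post above: a 1-bit PDM input already in the clock domain, a decimation rate of 64, and invented module and signal names. A real design would likely follow up with a proper CIC or FIR low-pass filter.

```verilog
// Hypothetical boxcar decimator: sums 64 one-bit PDM samples
// into one 7-bit unsigned PCM sample.  All names are illustrative.
module pdm_boxcar #(
	parameter LGDECIM = 6            // decimate by 2^6 = 64
) (
	input  wire            i_clk,
	input  wire            i_pdm,    // 1-bit PDM input, synchronous to i_clk
	output reg             o_valid,  // high for one cycle per PCM sample
	output reg [LGDECIM:0] o_pcm     // unsigned PCM output, 0..64
);
	reg [LGDECIM-1:0] counter = 0;
	reg [LGDECIM:0]   accum   = 0;

	always @(posedge i_clk)
	begin
		counter <= counter + 1;
		// o_valid goes high on the same edge that o_pcm updates
		o_valid <= (counter == {(LGDECIM){1'b1}});
		if (counter == {(LGDECIM){1'b1}})
		begin
			o_pcm <= accum + i_pdm;   // dump: 63 prior samples + this one
			accum <= 0;
		end else
			accum <= accum + i_pdm;   // accumulate
	end
endmodule
```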
  3. @House, Have you considered how much logic a multiply requires? How about a divide? There are no shortcuts to division, and the low-level hardware guys haven't provided many shortcuts for multiplication either. Both are very computationally intensive components. They'll both use a lot of LUTs, and likely keep you from meeting timing.

The solution to the expensive multiplies is to use a DSP element. Those, however, have requirements to them. You have to use them "right", or ... they'll get turned into LUTs and you won't meet timing (or area) again. My general rule of thumb when building multiplies using DSP elements is to take no more than one clock per multiply:

always @(posedge clk)
if (CE)
    product <= A * B;

If you use any more logic than that in an always block, you're likely to break Vivado's algorithm that packs multiplication logic into DSPs.

When not using DSP elements, I take (roughly) one clock per bit. You can read about my algorithm here. While the tricks I used to make the multiply use less logic work for divides, I know of no other tricks to make divides work. You're roughly stuck with one clock per bit, and no hardware acceleration.

Other things can be made cheaper, but those don't appear to be your problem at present.

Dan
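A one-clock-per-bit multiply can be sketched as a shift-and-add state machine. The fragment below is a rough, unsigned illustration with invented names and parameters; it is not the optimized algorithm from the linked article, just a minimal instance of the technique.

```verilog
// Hypothetical shift-and-add multiplier: one clock per bit of B.
// Unsigned only; names and widths are illustrative.
module slowmpy #(
	parameter W = 16
) (
	input  wire           i_clk,
	input  wire           i_start,
	input  wire [W-1:0]   i_a, i_b,
	output reg            o_done,   // high for one cycle when o_p is valid
	output wire [2*W-1:0] o_p
);
	reg               busy = 1'b0;
	reg [$clog2(W):0] count = 0;
	reg [2*W-1:0]     acc, a_shift;
	reg [W-1:0]       b;

	assign o_p = acc;

	always @(posedge i_clk)
	begin
		o_done <= 1'b0;
		if (i_start && !busy)
		begin
			busy    <= 1'b1;
			count   <= 0;
			acc     <= 0;
			a_shift <= { {(W){1'b0}}, i_a };
			b       <= i_b;
		end else if (busy)
		begin
			if (b[0])
				acc <= acc + a_shift;   // add (A << count) when this bit is set
			a_shift <= a_shift << 1;
			b       <= b >> 1;
			count   <= count + 1;
			if (count == W-1)
			begin
				busy   <= 1'b0;
				o_done <= 1'b1;         // acc settles on the same edge
			end
		end
	end
endmodule
```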
  4. Yes, I remember that day well. I remember taking a computer architecture course in graduate school, and much of what we studied was the PowerPC. My, how things change.
  5. You are not going to get Vivado on an RPi. There's no money in it. You are more likely to get some other design tool, such as the open source Yosys+nextpnr combination, to run on an RPi. I know folks who have run the two on an RPi to date, just not for Xilinx chips (yet). I know because ... I've had to debug some of it. (Welcome to open source: you get what you work for.) While I doubt it yet works for Xilinx, you are more than welcome to roll up your sleeves and get to work at it. Dan
  6. I've seen crazier things before--like the AF sergeant who instructed me to install a piece of PC/Windows software on a Sparc machine. Much like I expected, it didn't work. In this case, I'll wait for the forum request for help to arrive from the individual trying to install Vivado on an RPi before making such a declaration. 😄 Dan
  7. Loading Image Onto FPGA

@digility, When I first started working with FPGAs, my first project was a serial port to Wishbone bus translator. I call it a "debugging bus". Using this abstraction, I could then create and send bus commands to the FPGA from a C++ program. The interface involved has been so robust that I've now used it on (just about) every project I've worked on since. Were I to attempt to load an image onto an FPGA, this would be how I would do it. A couple of C++ programs would adjust the image as necessary from whatever format it came in on to whatever format I'd need, and off I'd go.

@zygot does present some good architectural questions to consider. I've (personally) stuck with 7-bit ASCII. Over time, I've started using the 8th bit (the MSB) to multiplex two streams together (a CPU console, getting mux'd with the debugging bus), so full 8-bit ASCII is quite possible.

The maximum data rate achievable then depends upon the protocol. One protocol I built and demonstrated in my blog takes 11 characters (8 hexadecimal digits for a 32-bit word, 1 control character, a carriage return, and a newline) to send 32 bits of information (29% efficiency). I'm more commonly using 6 characters for this purpose (53%). More recently, I've been testing an approach that will send 32 bits of information in only 5 characters (64% efficiency), while still maintaining the ability to use the serial port as a CPU console. On one project, I've even gone so far as to run this protocol over a network--so the FPGA can go underwater. In all cases, I've continued to maintain (nearly) the same C++ interface description as I started with.

For the record, while the FTDI chip declares it can send data at 12 Mbaud, I've only been successful at 4 Mbaud, and only reliable at 1 Mbaud across multiple circuit boards. I wouldn't set my expectations any higher than that. (The GbE interface, however, goes *REALLY* fast ... but that's another story, and takes a lot of work to get going--since you'd typically need to support the ARP and ICMP protocols before even touching the UDP/IP required for something like this. That's just a lot of work for a student project.)

The issue I would worry about, however, is ... where are you going to store this image? The Basys3 device doesn't have a lot of memory, and you are likely to need that memory for other things as well. I suppose you might just feed the image directly into your algorithm. (I've done that too ...) But either way, you should think through what you want to do and any consequences associated with it. Is it possible that the serial interface will go too fast? Do you need handshaking? All things to consider.

Dan
  8. Yes, it is possible to store bitstreams on an RPi and then configure an FPGA from there via JTAG. My 10Gb Ethernet switch project does exactly that. We use the openFPGALoader tool to do the loading:

openFPGALoader -c digilent --ftdi-channel 0 toplevel.bit

There is no reason or need to install Vivado on a Raspberry Pi. Dan
  9. @reddish, Out of curiosity ... the CMod A7 has two QSPI SCK pins. I've enjoyed this feature in the past, because it allows me to run the flash at 2x the speed: one pin has DDR capability, while the other is used for configuration only. My questions, therefore: are you using both pins in your design at all? If not, are you using the STARTUPE2 primitive in your design? If so, are you overriding the DONE pin at all? Not sure if any of these might help, but they are at least worth looking into. Dan
  10. Perhaps this might help: The conversation follows this post, and has some recommendations and resources attached as well. Dan
  11. @zygot One of the things that made the MAXIM chip ideal for GNSS handling is that it was primarily an RF to baseband samples chip. It did everything. If you wanted to change how you did your DSP, that was all on you. Adding a new satellite? As long as it was in the same frequency range and used (roughly) the same bandwidth as GPS, the chip would work nicely. Yes, I can imagine this same chip working two decades later. Why? Because FCC frequency allocations only change slowly over time, and I doubt the DOD would want to do anything to cause the (currently working) GPS receiver base to suddenly fail with a new upgrade. Dan
  12. @reddish, My first FPGA project was to build a GPS correlator given an FPGA and a MAX2769. It's doable, even as a first project. As with anything, the trick is to break the project down into bite-sized chunks, and to remember to plan on the need for debugging along the way. Dan
  13. Welcome to hardware design. You now get to decide which memory to use. In general, there are four places you can store results: off chip, in block RAM, in distributed RAM, and in flip-flops (FFs). The first three of these are typically called "memory", whereas FF storage isn't really "random access", so the term "memory" is rarely used to describe it. Off-chip memory comes in many varieties as well, such as SRAM, SDRAM, DDR SDRAM, DDR[2-5] SDRAM, etc. Of these options, FFs seem to be the most appropriate storage location for a counter. Such a counter might look very much like this one, used to measure one clock rate in terms of another. It's quite similar to what you wish to do. Dan
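A counter held entirely in FFs is just a register that increments. As a minimal sketch (the window length, widths, and names are illustrative assumptions, not taken from the post above), an event counter over a fixed window might look like:

```verilog
// Hypothetical windowed event counter, stored entirely in flip-flops.
module event_counter #(
	parameter WINDOW = 100_000_000   // e.g. one second at an assumed 100 MHz
) (
	input  wire       i_clk,
	input  wire       i_event,   // one pulse per event, already in i_clk domain
	output reg [31:0] o_count    // events counted in the last full window
);
	reg [31:0] window_timer = 0, running = 0;

	always @(posedge i_clk)
	if (window_timer == WINDOW-1)
	begin
		window_timer <= 0;
		o_count      <= running + (i_event ? 1 : 0);  // publish the result
		running      <= 0;                            // and start over
	end else begin
		window_timer <= window_timer + 1;
		if (i_event)
			running <= running + 1;
	end
endmodule
```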
  14. The counting part is the easy part. Beware of clock domain crossings and button bouncing. Moving the data to an external computer will take some work; how much work depends on how you wish to do it. Pushing a counter through a serial port isn't all that hard to do. Dan
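For the button bouncing just mentioned, the usual recipe is a two-FF synchronizer followed by a hold-off timer. The sketch below is an illustration only: the timer width and all names are assumptions, and a real design would size the hold-off to the board's actual bounce time.

```verilog
// Hypothetical synchronizer + debouncer: o_pressed pulses once per press.
module debounce (
	input  wire i_clk,
	input  wire i_btn,      // asynchronous button input
	output reg  o_pressed
);
	reg [1:0]  sync  = 2'b00;   // two-FF synchronizer into the clock domain
	reg        state = 1'b0;    // last accepted (debounced) button level
	reg [15:0] timer = 0;       // hold-off interval (width is illustrative)

	always @(posedge i_clk)
	begin
		sync      <= { sync[0], i_btn };
		o_pressed <= 1'b0;
		if (timer != 0)
			timer <= timer - 1;          // ignore changes while bouncing
		else if (sync[1] != state)
		begin
			state     <= sync[1];
			timer     <= 16'hffff;       // restart the hold-off
			o_pressed <= sync[1];        // pulse on the press edge only
		end
	end
endmodule
```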
  15. If the XDC file is correct, the only thing left missing should be the instantiation of an OBUFDS. Here's an HDMI example instantiating such an OBUFDS. (Yes, it's Verilog ... where I don't have to worry about which library I'm using for 10-bit integers.) Dan
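For reference, an OBUFDS instantiation is nearly a one-liner. OBUFDS is the Xilinx primitive with ports I (input), O (positive output), and OB (negative output); the net names in this fragment are invented for illustration, and the pin pair would be constrained in the XDC.

```verilog
// Hypothetical differential output buffer instantiation.
wire o_sig;          // single-ended signal from your logic (illustrative)
wire pin_p, pin_n;   // differential pads, constrained in the XDC

OBUFDS obufds_inst (
	.I(o_sig),       // single-ended input
	.O(pin_p),       // positive half of the pair
	.OB(pin_n)       // negative half of the pair
);
```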
  16. @engrpetero, Let's see if I can answer this. I've now been around this bush a couple of times, and learned something new each time around, but here's what seems to work for me:

I formally verify all bus interaction. It's just too easy to mess that up and end up with a design that just hangs, and at the AXI-Lite level the formal proof is pretty easy. Where possible, I'll formally verify the rest of the logic within each module as well. However, my formal proofs tend to end at the leaf level. They don't aggregate up very well into larger designs. They're useful, therefore, just not sufficient.

Going up a level is a bit more of a challenge, but I usually end up with a test structure similar to the picture below. Components include:

1) A test script. A good test will often have many test scripts to choose from, and possibly even a perl (or other) script to run either one or all of the scripts.

2) Some kind of bus functional model (BFM). Xilinx provides their AXI VIP for this purpose. When using a BFM, your Verilog test script (#1) might start to read like "software"--read this register, write that register, do this computation, etc. It all becomes quite task oriented.

3) A model of any external (typically off-FPGA) device you wish to test your design against. I've been known to add "on-FPGA" models as well, just for testing--"device" logic that has no other purpose but to help you instrument and verify your design's interactions. For example, I've used a GPIO module to set an "error" condition flag that I can then see in a VCD trace, or a "trace" flag that I can use to turn tracing on and off in large designs.

4) Many projects have required a RAM model of some type for my IP to interact with.

Finally, the last component of such a test structure is 5) the design under test.

This structure seems to work nicely across many of the designs I've done. An alternative structure I've used in the past is one where the ZipCPU acts as a CPU in this same model. In this structure, the ZipCPU software acts as the "test script". This takes a bit more work to set up, however, since you now need an interconnect of some sort, address assignments, etc., in order to have something the CPU can interact with. Were this approach not so expensive computationally (it takes a lot of work to simulate a CPU and its infrastructure, in addition to everything else), I'd use it more often--if for no other reason than that you can use this approach to verify the software you'll eventually be using to interact with your device.

This software test infrastructure is something I'd like to recommend to ARM users, but I don't think Xilinx provides a suitable PS model to make this happen. That means, when simulating on a Zynq, you're either forced to use the BFM/AXI VIP model or a CPU that's not relevant to your application. Perhaps there's a way around this, but I'm unaware of it.

Regarding test "sufficiency", a test is often judged "sufficient" if it checks every logic path through your device under test. Well, *every* is a hard target to hit. Sometimes I end up settling for 90-95%. Measuring how many logic paths you hit is often known as a "coverage" metric, and so IP customers often want coverage metrics for the IP they purchase. Sadly, "coverage" feels good but ... isn't good enough. I know that Xilinx's IP engineers are proud of their coverage metrics, yet I've still found some nasty bugs in their IP via formal verification approaches. Still, it's a quantifiable measure, and it is better than nothing.

As an example of an IP that supposedly has good coverage metrics, consider what happens when you attempt to both read and write from Xilinx's Quad SPI IP on the same clock cycle. The read operation will complete with the write operation's parameters. (Or is it vice versa? I forget ...) This is something the AXI VIP is just not capable of testing, given how it is set up. A good formal proof will find this--but that means you need to be doing formal verification in the first place.

In the end, your goal is always going to be finding the bugs in your design the fastest way possible. A good Verilator lint check can take seconds, vs. a minute for Vivado to give you a single syntax error. A good formal proof might take less than a minute, vs. watching your design hang in hardware and not knowing what's going on. A good simulation will go a long way towards generating confidence in your design once you do get to hardware--however, a good simulation might take longer to run than it does to build your design for hardware, put it on the hardware, and run a test on it. At least the simulation won't require you to pull out a probe to verify things.

I've written more about the CPU simulation approach to testing here. Perhaps that might help answer some more of your questions.

Dan
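To give a flavor of the AXI-Lite formal checks mentioned above, here is a rough fragment that tracks outstanding write requests and asserts that responses never outrun them. It is deliberately simplified (it tracks only the AW channel, ignores W, and the counter width is arbitrary); the signal names follow AXI-Lite convention, and everything else is an assumption for illustration.

```verilog
// Hypothetical formal fragment, written inside an AXI-Lite slave.
reg [3:0] outstanding_writes = 0;   // width is illustrative

always @(posedge S_AXI_ACLK)
if (!S_AXI_ARESETN)
	outstanding_writes <= 0;
else case ({ S_AXI_AWVALID && S_AXI_AWREADY,
             S_AXI_BVALID  && S_AXI_BREADY })
	2'b10:   outstanding_writes <= outstanding_writes + 1; // new request
	2'b01:   outstanding_writes <= outstanding_writes - 1; // response retired
	default: begin end                                     // both or neither
endcase

`ifdef FORMAL
	// A write response may only be offered while a request is outstanding
	always @(*)
	if (S_AXI_BVALID)
		assert(outstanding_writes > 0);
`endif
```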
  17. If you have critical warnings, definitely clean them up before submitting anything; they should all be resolved before you expect the tool to perform. I also recommend passing a Verilator -Wall test. Indeed, I've been doing just that this morning for one of my own projects ... Once all that passes, write me again with more detail, and we can try to understand together what's going on. Dan
  18. @kr.mk1, Unfortunately, no. While I have built several QSPI flash controllers in my time, I haven't ever had to work through EMIO or MIO to do so. In this case, I'd suggest you follow @zygot's advice and search the TRM. You might find that you cannot access these pins directly from RTL. I just don't know. It's been a long time since I looked through the Ultrascale+'s TRM. Dan
  19. @zygot, You may have to explain this one to me. If someone wants to build an RTL flash controller, and the Zynq EMIO functionality prevents them from interacting directly with the flash via EMIO, why wouldn't an external flash be useful? Something that could be found on a PMOD, as @JColvin linked? Indeed, I've used that device with great success for exactly this purpose. Why would it make a difference if this test was done on a Zynq vs an Artix, if all the work is being done in the RTL? Or, in the case of the Zynq, as an AXI slave of some type implemented in RTL? Dan
  20. Sounds like the editor wars. From my humble position, I'd just note there's more open source support for Verilog than for VHDL. Dan
  21. You could always buy a PMod flash, and use that. Dan
  22. Most any of the FPGA boards (non-Zynq) should work. They all need a flash to hold their configuration, and that flash is also made available to users to do ... whatever with. Dan
  23. @engrpetero, Yes, that is me. I am the author of the ZipCPU blog and Twitter feed. To prove it, let me predict that the next article will be about formally verifying an SD card block data receiver. I might even add some hardware lessons learned--since I now have the device (somewhat) running in hardware. (Yes, the article is mostly written ...)

But back to your struggles ... device lockup can be a challenge to debug. It's usually caused by a bus slave (or master) that doesn't obey the rules of the bus. I'm not sure if you are using AXI or AXI-Lite; there are more rules for AXI than AXI-Lite, and AXI-Lite is usually the easiest to work with.

Lockup with AXI-Lite is typically caused by a request that doesn't get any acknowledgment. Classic examples would be 1) the number of (BVALID && BREADY)s doesn't match the number of (AWVALID && AWREADY)s or the number of (WVALID && WREADY)s, or likewise 2) the number of (RVALID && RREADY)s doesn't match the number of (ARVALID && ARREADY)s. Bus lockups in AXI-Lite can also take place if you expect AWVALID && WVALID to arrive on the same clock cycle, or if you wait for READY before asserting VALID--such as waiting on BREADY before asserting BVALID. Another problem I've seen recently had to do with someone getting the address of their peripheral wrong, so the design then locked up when they tried to access a non-existent device.

The ways to deal with this problem are (easiest to hardest): 1) formally verify anything before it touches hardware (you knew I would say that), 2) use an internal logic analyzer to watch these signals, or 3) if all else fails, assign some LEDs to the task. Key signals to look for would be VALIDs stuck without READY; or count the above signals, requests vs responses, and key an LED to the counters not matching. Another useful LED might be a toggle to just tell you whether your IP was accessed at all from the ARM.

Others on this forum will tell you the easiest way to deal with this is not to use AXI at all--but it is kind of hard to avoid with the basic Zynq type of platforms. (Just giving you a heads up, lest this conversation get off track ...)

Dan
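The request-vs-response counter trick from option (3) above might be sketched like this, with invented names, an arbitrary counter width, and only the write channel shown:

```verilog
// Hypothetical lockup detector: light an LED if write requests and
// write responses stay mismatched.  All names are illustrative.
reg [7:0] aw_reqs   = 0, b_acks = 0;
reg       o_led     = 1'b0;
reg       o_touched = 1'b0;   // second LED: did the ARM ever reach us?

always @(posedge S_AXI_ACLK)
begin
	if (S_AXI_AWVALID && S_AXI_AWREADY)
	begin
		aw_reqs   <= aw_reqs + 1;
		o_touched <= 1'b1;
	end
	if (S_AXI_BVALID && S_AXI_BREADY)
		b_acks <= b_acks + 1;
	// A steady mismatch means a write went unacknowledged
	o_led <= (aw_reqs != b_acks);
end
```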
  24. Are you using Xilinx's AXI or AXI-Lite slave templates? If so, you should know they are both quite broken. They can both cause your design to lock up. They've been broken for years. Dan