Everything posted by [email protected]

  1. @shlomishab, Absolutely! This is the job of the engineer: to know how to break a project up into subcomponents, and to measure which of those subcomponents are at fault. This is also what makes FPGA design difficult: it is a challenge to debug designs once they are placed on a circuit board. Indeed, debugging FPGAs is perhaps the hardest part of the job. It requires a methodology and discipline that isn't necessarily carried over when coming to FPGA design from other fields. Why? Because you can't "see" into the FPGA. Unlike software, where you can stop with a debugger at every step and examine every variable, you can't do that with an FPGA. (You can do it with simulation ...) Worse, it's hard to examine variables from within an FPGA at all. Here are some good rules of thumb:

Your first step in debugging is what's known as "static" checking, sometimes called "linting" in the software world. I like to use "verilator -Wall" for this--but that only works with Verilog modules. Vivado will also present warnings to you when it synthesizes your designs. (Verilator's warnings are more usable ...) When things aren't working, look over the warnings. It might save you hours of debugging.

*EVERY* module within a design should have a "bench test" of some type--some way of determining that it works before it ever moves forward. In the larger companies I've counseled over the years, a bug "escaping from unit test" is a *BIG* deal that gets a lot of engineering attention. It happens, but your goal should be to keep it from happening--it simply "costs" a lot more work to find a bug after a module has left unit test. I do most of my bench testing using formal methods. Others set up rather elaborate simulation scripts to find bugs in individual modules. The difficult part of building simulation scripts is that ... you can't always predict everything that will go wrong. Formal methods often help you pick out things that might go wrong.

When you purchase IP from someone else, or otherwise acquire it, look for this bench test. Vivado tries to make this easier by building a simulation script for you when you choose to create a custom IP. I don't use this script. I don't trust it. Their simulation script misses too many bugs. For example, I know of (and have blogged about) bugs in every one of their AXI slave examples--both AXI and AXI-lite--and their AXI-stream master is also broken. These bugs don't show up under their simulations, but they do show up under a formal methods check.

SymbioticEDA makes a product called SymbiYosys which you can use to formally verify Verilog designs. They sell a SymbioticEDA Suite for handling SystemVerilog and VHDL designs. They've also posted a similar product called mcy (mutation coverage with Yosys) which can be used to determine whether or not your test bench is good enough. That is, if some piece of logic gets mutated (i.e. broken), can your test bench catch it? Evaluating your test bench to determine if it is "good enough" is the purpose of mcy. (A minimal formal bench test is sketched below.)

Once *EVERY* component has been formally verified (or otherwise bench tested), and only then, should your methodology move on to an integrated test. Integrated tests are still simulation tests. Why? Because, in simulation, you can see every variable, every value, at every time step. Sure, it's slow. Sure, it's clunky. However, it's much easier to figure out what's going right or wrong from simulation than it is from hardware.
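Here's the bench-test sketch promised above: a trivial saturating counter with formal properties attached. This is an illustration I've written for this post--not from any particular design--and SymbiYosys would drive the `ifdef FORMAL block from a short .sby script:

    // A trivial saturating counter, with formal properties attached.
    // Illustrative names only; SymbiYosys exercises the FORMAL block.
    module satcounter #(
        parameter WIDTH = 8
    ) (
        input  wire             i_clk, i_reset, i_inc,
        output reg [WIDTH-1:0]  o_count
    );
        initial o_count = 0;
        always @(posedge i_clk)
        if (i_reset)
            o_count <= 0;
        else if (i_inc && !(&o_count))  // hold at all-ones: no wrap
            o_count <= o_count + 1'b1;

    `ifdef FORMAL
        reg f_past_valid;
        initial f_past_valid = 1'b0;
        always @(posedge i_clk)
            f_past_valid <= 1'b1;

        // The counter may step by at most one per clock, and only up
        always @(posedge i_clk)
        if (f_past_valid && !$past(i_reset))
            assert((o_count == $past(o_count))
                || (o_count == $past(o_count) + 1'b1));
    `endif
    endmodule

The nice thing about properties like these is that the solver, not your imagination, picks the stimulus--which is how formal checks catch the cases a hand-written simulation script never thinks to try.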
Only after your design passes an integrated simulation test should you ever move forward onto real hardware. In the digital design field, this usually means FPGAs. For some of us, the design work stops at the FPGA level. For others, it goes on from FPGAs to ASICs, but FPGAs are usually a good first step before ASICs. Debugging a design within an FPGA is usually much harder than in simulation, but with the tradeoff that the FPGA can run at full speed (or close to it), whereas the simulation cannot.

In order to debug a design from within an FPGA, you'll need a special piece of FPGA logic sometimes called an In-circuit Logic Analyzer (ILA). I like to call them "internal scopes". This will cost you logic and block RAM resources within your FPGA. Using a "scope" you can capture a limited number of values from within your design. As an example, I might capture 32 bits for 1024 clock cycles and read them back later. Inside a device with thousands of flip-flops and millions of clock cycles per second, this is like trying to drink the ocean through a straw. There's an art and a science to getting the right values, and to capturing them at the right time. Sometimes even the scope fails. In these cases, I like to use LEDs to try to debug what's going on. Using an LED, you can often debug missing-clock problems, clocks that fail to lock, and more. Sometimes an external scope helps too, and Digilent's Digital Discovery has been invaluable to me.

Returning to the idea of using graphics on an FPGA, feel free to check out my own video simulation here. Since that article was posted, I've written AXI versions of the various demonstrators. Once you can run your design in simulation, then feel free to try running it in actual hardware. Then, when/if it doesn't work, feel free to write back telling us which piece isn't working, whether it's failing in simulation or in hardware, and be specific--isolate the problem as much as you can, so we can then help you. Or, if you can't isolate it, tell us what you've tried, and we might be able to offer suggestions--similar to those above. Dan
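P.S. The heart of such an "internal scope" is small enough to sketch here. This is an illustration only (not Vivado's ILA, and much simpler than my own scope): once triggered, it records 32 bits per clock into a 1024-entry block RAM, which you'd then read back over whatever debug bus you have.

    // Bare-bones internal scope: after a trigger, capture 32 bits
    // per clock cycle for 1024 cycles, then stop and hold.
    module miniscope (
        input  wire        i_clk,
        input  wire        i_trigger,
        input  wire [31:0] i_data,
        input  wire [9:0]  i_raddr,   // readback: your debug bus
        output reg  [31:0] o_rdata
    );
        reg [31:0] mem [0:1023];
        reg [9:0]  waddr;
        reg        armed, done;

        initial { armed, done, waddr } = 0;
        always @(posedge i_clk)
        begin
            if (i_trigger)
                armed <= 1'b1;
            if (armed && !done)
            begin
                mem[waddr] <= i_data;
                waddr <= waddr + 1'b1;
                if (waddr == 10'h3ff)
                    done <= 1'b1;   // buffer full: freeze contents
            end
            o_rdata <= mem[i_raddr];
        end
    endmodule

A real scope would also record continuously before the trigger and support a programmable holdoff, but that's the essence of it.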
  2. @rashimkavel7, Welcome to the fun! I'll second @JColvin's responses above. There's lots of ways you can go about this. Don't forget proper engineering discipline: be sure to break the project into parts, and verify each of the parts separately before trying to do everything at once. Failing to do so seems to be the most common problem folks have when using FFTs. (Well, that and AXI-stream signaling, generating a proper clock, etc ...) I'm also not sure what you want with Linux support. Do you want to run Linux on the FPGA? Or do you want to interact with the design from a Linux host nearby? Both are quite reasonable, although the former is a bigger challenge than the latter. Dan
  3. @shlomishab, Can anyone help you? Perhaps, but you'll have to do some more digging into what's going on. From what you've given above it'd be hard to know where to start looking. Let me recommend you start by breaking the problem up in a proper engineering fashion. Break up the operation into steps, and then check for success or failure at the end of each step. Incidentally, this is much easier to do in simulation ... Dan
  4. @ank, Hard to say. Depends upon your algorithm. Are you intending to run the algorithm in the FPGA portion of the board? Does it use fixed point or floating point? How many multiplies will you need? etc., etc. Dan
  5. I think I'm also going to have to disagree here. I've written several FIFOs over the years. Rather than trying to maintain several FIFOs, each with slightly and subtly different purposes, it helps to parameterize them. I will agree that you can go overboard with parameterization--such as Xilinx, whose FIFO generator has nearly 100 parameters--but I haven't gotten near that point.

Another example: having built several designs, each with a wonderful purpose, I often want to get those same designs to run on multiple pieces of hardware. Even with RTL coding, there are hardware differences. For example, iCE40s don't have distributed RAM and require all RAM reads to be registered--unlike Xilinx. To be able to use the same design across both iCE40s and Xilinx chips, therefore, I need subtle changes between the two versions. (See the sketch below for what this can look like.)

Another example: some chips have more logic, others less. Building a CPU that will run on both a Spartan 6LX4 and an Artix-7 200T requires either that the CPU be limited to the extremely few resources of the Spartan 6, or that it be parameterized so it can support both the Spartan 6 (no caches, no pipelining) and an Artix-7, where I have lots of logic to spare. Being able to adjust a design to the hardware space you have available is a productive use of parameters.

There is a challenge when using parameters, however: verification. If you have 20 boolean parameters, that roughly means you need a million test bench configurations to check every combination and know they all work. So ... it's a trade-off. Still, I find parameters quite valuable. Dan
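P.S. To illustrate the kind of parameter I'm talking about, here's a sketch (not one of my production cores) of a synchronous FIFO where a single parameter absorbs the iCE40-vs-Xilinx read-port difference mentioned above. Note that the registered option changes the read latency--glossed over here.

    // Parameterized synchronous FIFO sketch.  OPT_REG_READ selects a
    // registered read port (required for iCE40 block RAM) vs. a
    // combinatorial one (fine for Xilinx distributed RAM).
    module sfifo #(
        parameter       DW = 8,          // data width
        parameter       LGFLEN = 4,      // log2(FIFO depth)
        parameter [0:0] OPT_REG_READ = 1'b0
    ) (
        input  wire          i_clk,
        input  wire          i_wr,
        input  wire [DW-1:0] i_data,
        input  wire          i_rd,
        output wire [DW-1:0] o_data,
        output wire          o_full, o_empty
    );
        reg [DW-1:0]   mem [0:(1<<LGFLEN)-1];
        reg [LGFLEN:0] wr_ptr, rd_ptr;   // one extra bit: full vs empty

        initial { wr_ptr, rd_ptr } = 0;
        always @(posedge i_clk)
        if (i_wr && !o_full)
        begin
            mem[wr_ptr[LGFLEN-1:0]] <= i_data;
            wr_ptr <= wr_ptr + 1'b1;
        end

        always @(posedge i_clk)
        if (i_rd && !o_empty)
            rd_ptr <= rd_ptr + 1'b1;

        assign o_empty = (wr_ptr == rd_ptr);
        assign o_full  = (wr_ptr == { ~rd_ptr[LGFLEN], rd_ptr[LGFLEN-1:0] });

        generate if (OPT_REG_READ)
        begin : ICE40_STYLE              // registered read port
            reg [DW-1:0] r_data;
            always @(posedge i_clk)
                r_data <= mem[rd_ptr[LGFLEN-1:0]];
            assign o_data = r_data;
        end else begin : XILINX_STYLE    // combinatorial read port
            assign o_data = mem[rd_ptr[LGFLEN-1:0]];
        end endgenerate
    endmodule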
  6. @JColvin, Thank you! Dan
  7. @macethorpe, Exactly! Although I wouldn't reinstall Vivado, personally. I think you'd be spinning your wheels to do so. I've been disappointed before to notice that some menu options only become available depending upon what mode of the design process you are in. You might want to check both pre- and post-implementation. One correction to my comment above--I see all the options before ever running synthesis, not after. I got there by right-clicking "Generate Bitstream" and then working through the options presented there. Incidentally, it's really easy to convert a .bit file to a .bin file--you just throw away the header of the .bit file. I think it's about 36 bytes, if I recall properly. It's usually pretty obvious upon inspection. Dan
  8. @macethorpe, I just tried with Vivado 2020.1 and had no problems getting the menu to come up. In my case, I had the hardware manager open before right clicking on the bitstream settings. Dan
  9. @macethorpe, Is it doing everything but building the bin file? 'cause I seem to recall needing to set an option in order to get it to generate the bin file in the first place. Dan
  10. @macethorpe, The newer Arty uses a Spansion flash device rather than the Micron device used earlier. That will make a difference in how it is programmed. Dan
  11. @mor.paul, Every Zynq board I've seen has plenty of memory for this purpose. There's also an incoming AXI3 port on the Zynq, which your design can use to write to the SDRAM on the board. Indeed, this is a common use case. You just need an IP core to write your samples to memory. Xilinx has a "data mover" IP they use for this purpose. The idea is that you configure this IP to transfer data to memory, and then off it goes. You just need to feed it with 1) an AXI stream containing the data to be saved, 2) the address to write to, 3) the amount of data to write, and 4) a start command to tell it when to begin. Beware, there are some unexpected "features" when using this IP--for example, it will stop transferring on TLAST, and it will lock up if you give it data before it's properly configured--but in general it should work well enough for you.

I have some designs of my own that do the same, though I haven't yet tried integrating them into Vivado's board design flow. This design, for example, acts very much like a scope--saving data to memory until some holdoff period following a trigger. It's designed for debugging an FPGA design where you know where and when things go wrong, but not necessarily what led up to them. Here's another that works/acts very much like Xilinx's stream-to-memory (S2MM) data mover, with the differences that 1) it leaves TREADY low when you aren't using it, and 2) it stops after the given number of samples has been transferred, rather than on the TLAST signal. (TREADY and TLAST are AXI-stream signals; see the sketch below.) Indeed, my S2MM converter was recently tried in a comparison with Xilinx's and (much to my pleasant surprise) it "just worked".

Using a stream-to-memory approach like any of the above will require some ARM software. Specifically, you'll need to 1) make certain that the physical memory you are writing to is at a known virtual address, 2) make certain the memory isn't cached at the time of the write, and 3) find some method for getting the samples from your design to a nearby computer (Ethernet?). Some of this is done for you when working with Xilinx's examples; some of it you'll still need to do on your own. The good news is that, again, this is a common enough operation. Ideally, you'd be limited only by the size of the SDRAM, but practically you'll be sharing it with Linux, so ... there's a limit to how big a memory region you can capture to. On the other hand, if you dropped Linux, you could store to a larger region--but then you'd need to do more work to figure out how to control the network hardware so it would send data using a known protocol. It's all about compromises and tradeoffs, like any other engineering field. Dan
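P.S. Here's the stream-side handshaking involved, in sketch form: a sink that accepts exactly N beats and then stops. This is an illustration of the contract, not any particular core--a real S2MM would also issue the AXI write bursts to memory, which I've left out entirely.

    // Accepts exactly i_len beats from an AXI stream, then stops.
    // Ignores TLAST on purpose: the count, not the stream, ends it.
    module stream_sink #(
        parameter DW = 32, LW = 16
    ) (
        input  wire          S_AXI_ACLK,
        input  wire          S_AXI_ARESETN,
        input  wire          i_start,
        input  wire [LW-1:0] i_len,
        input  wire          S_AXIS_TVALID,
        output reg           S_AXIS_TREADY,
        input  wire [DW-1:0] S_AXIS_TDATA,
        output reg           o_done
    );
        reg [LW-1:0] remaining;

        initial { S_AXIS_TREADY, o_done, remaining } = 0;
        always @(posedge S_AXI_ACLK)
        if (!S_AXI_ARESETN)
            { S_AXIS_TREADY, o_done, remaining } <= 0;
        else if (i_start && !S_AXIS_TREADY)
        begin   // idle: load the count and raise TREADY
            remaining     <= i_len;
            S_AXIS_TREADY <= (i_len != 0);
            o_done        <= 1'b0;
        end else if (S_AXIS_TVALID && S_AXIS_TREADY)
        begin   // one beat accepted
            remaining <= remaining - 1'b1;
            if (remaining == 1)
            begin
                S_AXIS_TREADY <= 1'b0;  // leave TREADY low when idle
                o_done        <= 1'b1;
            end
        end
    endmodule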
  12. @mor.paul, Sounds like a fun project! Dan
  13. @mor.paul, Those errors don't look familiar to me. (Which doesn't really tell you much.) You might need to do some digging there. I will point out one thing: Digilent, and Xilinx for that matter, have had problems with user demonstrations. The root cause of these problems tends to be the way Vivado manages projects across versions. IP changes from one version to the next. Interfaces change from one version to the next. Each new revision risks making demonstrations built with previous revisions obsolete. Indeed, this is also a problem with a lot of instructional material. In general you can trust old RTL sources to remain stable, but not necessarily the interfaces and IP they interact with. One solution Digilent has often offered is to tell users which particular download a demonstration is known to work with. When I met with one of their engineers some time back, they recommended the last release of the year as the one with the fewest bugs in it, so you might wish to try the design under 2018.3 and see whether the problem persists. (I'm not going to recommend 2019.2, where they introduced Vitis. Note the dates on this Xilinx forum post, from initial complaint to final Xilinx response.) Dan
  14. @mor.paul, What is AXI? As @zygot said, it's the bus protocol the ARM within the Zynq (sometimes called the PS, or Processing System) uses to communicate with the FPGA logic (also called the PL, or Programmable Logic). More on that in a moment.

For now, understand that FPGA logic is not software. It's reconfigurable hardware. When building FPGA logic, you'll be using tools to route wires within a chip, to configure flip-flops for your logic, and to build designs out of Look-Up Tables, or LUTs. LUTs are boolean functions: N bits in, one bit out. The terminology changes a bit as well. Software engineers write programs. Hardware engineers build designs. Another surprise to software engineers when dealing with hardware is that the minimum data unit is one bit, not 8 bits--but I digress.

Unlike @zygot, I'm still working on my first Zynq design. I've done many basic FPGA designs on chips without an ARM processor within them, but that Zynq processor can be a challenge to work with. I'm personally not sure why the Zynq is a recommended device for beginners, given the challenges I've had so far when working with it. Most of what I know, therefore, comes from helping users with questions on forums such as this one, Reddit, or Xilinx's forums.

Before discussing AXI, let me define a bus protocol in general. A "bus" in this context is something that allows a CPU to talk to memory and peripherals--no longer the original concept of a group of wires shared between multiple electrical drivers. Here, the CPU issues read or write commands in response to load or store assembly instructions. Each command comes with the address to read from or write to, and write commands also carry the data to be written to that address. A simple C statement, like "volatile int *ptr = 0xa0...; int a; a = *ptr;", will cause a read command to be issued from the ARM to the PL. (Getting the address right is important, but also something I'm not going to address here and now.) A similar C statement, such as "*ptr = a + 1;", will cause a write command to be issued across the bus. Well, okay, neither is quite true--there's the issue of the cache, where the CPU reads values into a cache to speed up its operation and then only writes values out again later. If I recall correctly, the ARM cache is disabled by default for the addresses that talk to the PL, but you can turn it back on if you want higher-speed transactions.

The key here is that you, as the hardware designer, get to choose what happens when a request comes in to write to, or read from, a specific address. While you can make the value at an address act like memory, it doesn't need to do so. You could make a peripheral that, upon every write, sends the character written to a serial port, or that on every read returns the most recent character received from a serial port. This is sometimes called memory-mapped I/O, and you can consider the entire memory map of the PL to be memory-mapped I/O. Again, you get to decide how it behaves. (I'll sketch an example below.)

That's the good news. Here's the bad news: AXI is not a trivial protocol. It is neither simple nor easy to master. Call me a dunce if you want, but it took me many months before I finally figured it out well enough to be confident when working with it. (Part of my problem was that I was trying to build a rather complex design, and didn't start out simple ...)
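Before the bad news continues, here's that memory-mapped I/O sketch. I've used a deliberately stripped-down bus (single-cycle reads and writes, no stalls--much simpler than AXI) so the idea isn't buried in protocol, and all of the names here are mine, purely for illustration:

    // Two-register peripheral: a write to address 0 sets some LEDs,
    // a read from address 1 returns a free-running counter.  The
    // point: *you* decide what each address does on a read or write.
    module simple_mmio (
        input  wire        i_clk,
        input  wire        i_wr, i_rd,
        input  wire        i_addr,     // one address bit: two registers
        input  wire [31:0] i_wdata,
        output reg  [31:0] o_rdata,
        output reg  [3:0]  o_led
    );
        reg [31:0] counter;

        initial { o_led, counter } = 0;
        always @(posedge i_clk)
            counter <= counter + 1'b1;

        always @(posedge i_clk)
        if (i_wr && (i_addr == 1'b0))
            o_led <= i_wdata[3:0];

        always @(posedge i_clk)
        if (i_rd)
            o_rdata <= (i_addr == 1'b0) ? { 28'h0, o_led } : counter;
    endmodule

On a Zynq the same idea sits behind an AXI-lite interface; the handshaking grows considerably, but the decide-what-an-address-does principle is identical.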
It doesn't help that Xilinx's example logic has been broken since at least 2016, and risks locking up your design--at which point you'll have no idea why it's failing. This leads to a common situation I call FPGA Hell, where the chip doesn't do what you want and you have no idea why not. Trust me, it's common. As of Vivado 2020.1, they still haven't fixed their demonstration designs, despite the fact that all of their instructional material uses them. (You can find "fixed" example designs on my blog for all of the broken demonstration designs but their AXI-stream master. Check out the topics page and search on AXI if you are struggling to find the articles.)

Some of the challenges associated with AXI include "back pressure": the idea that the slave can't return a response to a request from the CPU (actually the bus master) until the bus master is ready for it. (This "feature" breaks Vivado's demonstration designs if it's ever triggered; I'll sketch the safe pattern below.) A second challenge is that AXI permits burst requests, of 1-256 items at a time, but bursts can't cross 4kB boundaries. A third challenge is that every request is tagged with an ID, and responses for requests with different IDs may be returned out of order. (This also breaks Xilinx's AXI full slave demonstration design.) A fourth challenge is the idea of "narrow bursts"--something I discuss on my blog, and also broken in Xilinx's demonstration design. Suffice it to say that AXI is a non-trivial protocol. It also reads like it was designed by committee rather than by the engineer who needed to use it, since it has so many bells and whistles: Quality of Service guarantees, cache coding that ... doesn't relate to the CPU's cache, protection fields that just aren't well defined, a method for atomic accesses that really complicates CPU design, and much more. Someone is making a lot of money helping people deal with this complexity, and it's not your beginning FPGA developer.

Xilinx has done their level best (or worst, you decide) to help by creating an IP integrator, together with the illusion that you can just compose a design out of component parts. This is sold as an "easy" method of design. What they don't tell you is that many of these component parts are encrypted black boxes, making them very difficult to debug and work with. Indeed, this seems to describe the majority of forum requests: "I'm working with IP XYZ, and my design isn't doing what I expect. What's wrong?" Other components which are not black boxes have bugs within them that the unsuspecting user will need to work around (ethernet-lite, the S2MM data mover, etc.). If you find yourself in this situation, make sure you grab a trace from within your design "showing" or "illustrating" the problem when you post--it will make debugging across a forum a lot easier. (Yes, I've built my own capability for doing this. You don't need to build your own, or even to use mine; Vivado's seems to be much easier to use--even if not quite as capable. For example, I once grabbed a compressed trace containing several minutes of data from within one of my designs ...)

To make matters worse, most of the instructional material for Verilog/VHDL design doesn't really discuss how to debug a broken design. They'll tell you how to build a design, but not necessarily how to build a design and guarantee that it will work when you get to hardware. This is a problem, given that hardware is much harder than software to debug: you don't have access, for example, to every variable within a debugger.
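Here's that back-pressure sketch, since it's the item in the list above most likely to bite. The rule: once you raise xVALID, both it and its payload must hold steady until the far side raises xREADY. For the write-response channel of a hypothetical AXI-lite slave, the safe pattern looks something like this fragment (an illustration of the principle, not a drop-in fix):

    // Back-pressure-safe write response (inside an AXI-lite slave).
    // BVALID, once raised, must hold until the master asserts BREADY.
    always @(posedge S_AXI_ACLK)
    if (!S_AXI_ARESETN)
        axi_bvalid <= 1'b0;
    else if (!axi_bvalid || S_AXI_BREADY)
        // Only update when no response is pending, or when the
        // pending one is being accepted this very cycle.  Otherwise
        // BVALID (and BRESP) hold absolutely still.
        axi_bvalid <= write_accepted;  // your "write completed" strobe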
At best, when debugging in hardware, you can blink an LED, or capture a "trace" of signals describing what goes on within a portion of your design for a thousand clock cycles or so. This means that you, as a designer, need to learn early on to formally verify your designs and to simulate them, as well as how to blink LEDs and extract traces from a broken design. (Why "formal verification"? Because Xilinx's Verification IP won't reproduce the bugs in their designs, leaving you believing that your designs work when they do not ...)

Yeah, I suppose that's a long-winded welcome. Hopefully it gives you a bit of a reality check and some helpful expectations management. If you are still excited, then welcome to the forums! You'll find other users like yourself here--some, like myself and @zygot, quite opinionated. I figure that's a good thing. Dan
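P.S. The LED trick above can be as cheap as tying an LED to the top bit of a counter in the clock domain you're suspicious of:

    // Heartbeat: from a 100MHz clock, bit 25 toggles about 1.5x/sec.
    // If the LED never blinks, the clock isn't running (or the PLL
    // never locked)--and you've learned that without any ILA at all.
    module heartbeat (
        input  wire i_clk_suspect,   // the clock you're unsure of
        output wire o_led
    );
        reg [25:0] count = 0;
        always @(posedge i_clk_suspect)
            count <= count + 1'b1;
        assign o_led = count[25];
    endmodule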
  15. @Dannny, A "Video DMA" or "VDMA" is just a piece of RTL that copies memory from a frame buffer to a video stream. With a few exceptions, it's not all that hard to write. As an example, I recently wrote a frame-buffer-to-video-stream DMA here. You can find a demonstration of how that might be used to drive an HDMI signal here. I figure it takes about two days to get the AXI interaction right--provided you have a bit of experience and a formal tool for verification. As for @Ciprian's comments about the Linux kernel driver, I can't really comment. I tend to do all-RTL designs, and so I haven't really had much need of a Linux kernel driver. I will caution you about one thing: beware of the bandwidth requirements on your DDR3 memory. One of my earlier projects, using a 1080p/60Hz input and a similar output, didn't have the memory bandwidth I needed between the incoming image and the outgoing one. Something to consider, Dan
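P.S. To put rough numbers on that bandwidth caution: 1080p60 is 1920 x 1080 x 60 ≈ 124M pixels/second. Assuming 32 bits per pixel, that's roughly 500MB/s coming in and another 500MB/s going back out--about 1GB/s total, before the CPU or anything else touches memory. Against a 16-bit DDR3 part at 800MT/s (a common configuration, with a theoretical peak of only 1.6GB/s), that leaves very little margin once refresh and bank overhead are accounted for. Do this arithmetic for your own pixel format and memory part before committing to a through-memory design.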
  16. @zygot, The AXI-lite wrapper is really just a thin wrapper around a pair of FIFOs and some lighter UART cores. Indeed, I use those UART cores without AXI all the time. (Actually, I rarely use AXI myself ... which makes it all the more fun every time someone lets me know that my AXI stuff "just works".) Not only that, after building the original pair of cores (rxuart.v and txuart.v), I quickly determined that I had very little need of a UART core that supported changing baud rates or any protocol other than 8N1. Therefore I have another pair of cores there, rxuartlite.v and txuartlite.v, that are simpler yet. These can be subcomponents of the AXI-lite core linked above, or indeed used independently, as I often do. If you don't need AXI, then don't use it. Your call. On the other hand, if you are using a Zynq (an unstated background to this post), then AXI is almost a requirement. Dan
  17. @Antonio Fasano, Have you seen this IP core? Might be close to what you are looking for. Dan
  18. @hamster, Impressive! Would you mind sharing the hardware you chose to use? The clock speed of the delta-sigma converter? It also looks like you biased your sine wave negative by half a sample. Out of curiosity, was this on purpose, or just a result of the way you truncated floats to integers when generating your table? I'm a bit surprised by the second harmonic you show in your pictures. I had rather thought this technique would've done better than that. Do you have any suggestions as to what might have caused that harmonic? Either way, good fun, and thanks for sharing! Dan
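P.S. For what it's worth, here's why I asked about the truncation: if the table were generated by flooring reals to integers, every entry ends up low by half an LSB on average--a small negative DC bias of exactly the sort I'm describing. Adding 0.5 before the floor rounds instead, centering the wave. In table-generation terms (hypothetical names, and assuming a simulator that supports the SystemVerilog real-math functions):

    // Sine table generation, rounded rather than floored.
    module sinetable_gen;
        reg signed [7:0] sinetable [0:255];
        integer k;
        initial begin
            for (k = 0; k < 256; k = k + 1)
                // $floor alone leaves every entry low by ~0.5 LSB on
                // average; the +0.5 rounds, centering the wave.
                sinetable[k] = $rtoi($floor(
                    127.0 * $sin(6.283185307 * k / 256.0) + 0.5));
        end
    endmodule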
  19. @zygot, Ok, that's fair. I'll agree we've probably been reading different material, but this idea is new to me. Thank you for sharing it. I like your distinction, and I may choose to use it in the future. You didn't read far enough, though. The article I quoted didn't feature my own home-grown FIFO, but rather one I found online that came highly recommended to me by others. The article goes through how that FIFO works, and then how to go about formally verifying it--which also (BTW) involved pointing out any assumptions made when building it. I found it a very useful exercise, and one that helped prepare me for a job where I didn't have access to either Xilinx's or Intel's vendor libraries. That's fair, and it's a valid position as well. I'm glad it's worked for you so far, and I wish you the best using it into the future. Certainly I enjoy making my dollars by posting the grand "it does everything" item, and then building customized versions for individual customers. It's worked for me so far, and I've certainly enjoyed it. Even better, that article was among the top three all-time hits on the blog all last year, even ranking as high as #8 on a search for "asynchronous FIFOs"--so there are plenty of others interested in it who have also enjoyed reading it. Dan
  20. @zygot, I didn't think there were that many possible definitions for asynchronous FIFOs, so I'm not sure how our definitions might have gotten crossed. By my definition, an asynchronous FIFO is a FIFO that crosses clock domains. It may (or may not) be mapped to specific hardware. The pointers are typically passed across clock domains as Gray codes, as @amb5l has mentioned. The full/empty flags of the resulting FIFO can then be used to make certain that the FIFO isn't read while empty or written while full. Further, until full, the write side can operate at full speed. Likewise, until empty, the read side can operate at full speed. The pointers keep the data path itself away from the actual clock domain crossing. What other definition would you offer? Dan
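P.S. Since we're defining terms, the crossing at the heart of that definition is small enough to sketch (write-to-read direction shown; the read-to-write direction mirrors it, and the names here are mine for illustration):

    // Gray-coded pointer crossing, write domain to read domain.
    module gray_cross #(
        parameter LGFLEN = 4
    ) (
        input  wire            i_wclk, i_rclk,
        input  wire [LGFLEN:0] i_next_wr_ptr,  // binary, write side
        input  wire [LGFLEN:0] i_rgray,        // read pointer, Gray
        output wire            o_empty
    );
        // Gray-code the write pointer in its own clock domain first,
        // so the wires that cross carry a settled value.
        reg [LGFLEN:0] wgray = 0;
        always @(posedge i_wclk)
            wgray <= (i_next_wr_ptr >> 1) ^ i_next_wr_ptr;

        // Two flip-flop synchronizer in the read clock domain.  Only
        // one bit of a Gray code changes per increment, so a capture
        // taken mid-transition is off by at most one--safely stale.
        reg [LGFLEN:0] wgray_meta = 0, wgray_rclk = 0;
        always @(posedge i_rclk)
            { wgray_rclk, wgray_meta } <= { wgray_meta, wgray };

        // Empty is then a simple Gray-code compare in the read domain.
        assign o_empty = (wgray_rclk == i_rgray);
    endmodule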
  21. @amb5l, I think you'll find that an asynchronous FIFO is a much better approach to moving data streams across clock domains. I would also recommend against using the FDRE primitive within your design. It'll tie you to the Xilinx infrastructure, and make it a challenge to move your design to other (non-Xilinx) contexts. If you just use a straight VHDL register, Vivado should be able to infer the FDRE directly--so you shouldn't need to instantiate it. That would also leave you with a more generic, (more) vendor-independent design. Dan
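P.S. In Verilog terms (the VHDL version is the same idea in a plain clocked process), the inference alternative is just:

    // Vivado infers an FDRE (D flip-flop with clock enable and
    // synchronous reset) from this plain register--no primitive
    // instantiation, and nothing Xilinx-specific left in the source.
    module inferred_fdre (
        input  wire clk, rst, ce, d,
        output reg  q
    );
        always @(posedge clk)
        if (rst)
            q <= 1'b0;
        else if (ce)
            q <= d;
    endmodule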
  22. @amb5l, Thanks for sharing! It's always fun to have a project to share, and such projects can be a real encouragement for newcomers. Looking over your design, I found this piece of it which appears to include a clock domain crossing within it. Just from a desktop review, it also looks like it would drop or duplicate samples. Have you tested it by running a counter through it, at both of the proper clock rates, to see what happens? Dan
  23. @zygot, Now I think you understand exactly what I mean. The flash memory is quite useful for ... whatever purpose. It's especially useful when playing with CPUs, and wanting your design to start from a known program. Flash is known for being slow. The particular flash chips that've been used on the Arty are rated for a clock of up to 108MHz. If your system clock is anything lower than 108MHz, though, you'll either need an ODDR primitive to drive the pin--something not available to you when going through the STARTUPE2 primitive--or you'll suffer a 2x speed loss when talking to the flash. Since I like running at a 100MHz system clock (or near that amount) *and* a 100MHz QSPI_SCK, getting the full performance out of the flash requires a general-purpose I/O pin. Looking over the schematic and the reference manual, this I/O pin is connected to L16. It's not listed in the XDC file. Hence my question. My best guess is that either 1) they forgot to include it in the XDC file, or 2) they thought (for some reason) it would be better left out of the master XDC file. I had been afraid this capability was taken off the board, but seeing it on the schematic gives me some assurance that it's still around. Dan
  24. @zygot, I'm referring to the SPI SCK pin headed to the configuration flash. It's used during configuration, but it can also be used by a user design afterwards--either via the STARTUPE2 primitive, or (on the Arty, at least the way it was) via a second pin that was also tied to the same SCK wire leading to the flash. (The reference page shows a resistor between the two ...) Dan
  25. Xilinx requires the flash clock to be connected to a special pin. The official way of accessing this pin is through the STARTUPE2 primitive. This limits the clock speed to 1/2 of the system clock speed, since no ODDR primitive is available when going through STARTUPE2. (Yes, you could do a CDC, but that gets annoying ...) The Arty (used to?) have an alternate method of accessing this pin via a secondary I/O pin connected to the same line. According to the reference manual, this wire connects to pin L16. I just downloaded the A7100T master XDC file, however, and I don't see this secondary pin defined anywhere within it. Did it get removed? If so, does your reference information also need to change? If not, shouldn't it be put back into the XDC file? As a customer, I will be disappointed if it was removed, since it would mean that access to the flash is now 2x slower than before. Dan
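P.S. For reference, the full-rate access this secondary pin enables is the usual ODDR clock-forwarding pattern--sketched here from memory, so check the 7-series libraries guide before trusting the details. A real QSPI controller would also gate the clock with its enable rather than hard-wiring D1/D2 as below:

    // Forward the 100MHz system clock out an ordinary I/O pin as
    // QSPI_SCK, at the full clock rate.
    module qspi_sck_forward (
        input  wire i_clk,        // 100MHz system clock
        output wire o_qspi_sck    // to the secondary pin (L16)
    );
        ODDR #(
            .DDR_CLK_EDGE("SAME_EDGE"),
            .INIT(1'b0),
            .SRTYPE("SYNC")
        ) sck_oddr (
            .Q(o_qspi_sck),
            .C(i_clk),
            .CE(1'b1),
            .D1(1'b1),   // output rises with the clock edge ...
            .D2(1'b0),   // ... and falls half a cycle later
            .R(1'b0),
            .S(1'b0)
        );
    endmodule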