About [email protected]

  • Rank
    Prolific Poster

Profile Information

  • Gender
    Not Telling
  • Interests
    Building a resource efficient CPU, the ZipCPU!


  1. @zygot, The AXI-lite wrapper is actually just a lite wrapper around a pair of FIFOs and some lighter UART cores. Indeed, I use those UART cores without AXI all the time. (Actually, I rarely use AXI myself ... which makes it all the more fun every time someone lets me know that my AXI stuff "just works".) Not only that, after building the original pair of cores (rxuart.v and txuart.v), I quickly determined that I had very little need of a UART core that supported changing baud rates or any protocol other than 8N1. Therefore, I have another pair of cores there, rxuartlite.v and txuartlite.v, that are simpler yet. These can be subcomponents of the AXI-lite core linked above, or they can be used independently, as I often do. If you don't need AXI, then don't use it. Your call. On the other hand, if you are using a Zynq (an unstated background to this post), then AXI is almost a requirement. Dan
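For reference, the heart of an 8N1-only transmitter of the kind described above can be quite small. This is a hedged sketch, not the actual txuartlite.v: the entity name, ports, and the CLOCKS_PER_BAUD generic (e.g. 868 for 115200 baud from a 100 MHz clock) are illustrative assumptions.

```vhdl
-- Hypothetical minimal 8N1 transmitter sketch (NOT the actual txuartlite.v)
library ieee;
use ieee.std_logic_1164.all;

entity tx8n1 is
  generic (CLOCKS_PER_BAUD : natural := 868);  -- assumed: 100 MHz / 115200
  port (
    clk    : in  std_logic;
    i_wr   : in  std_logic;                    -- strobe: send i_data
    i_data : in  std_logic_vector(7 downto 0);
    o_busy : out std_logic;
    o_uart : out std_logic := '1');            -- the line idles high
end entity;

architecture rtl of tx8n1 is
  -- Shift register: stop bit (1), 8 data bits, start bit (0), sent LSB first
  signal sreg  : std_logic_vector(9 downto 0) := (others => '1');
  signal nbits : natural range 0 to 10 := 0;
  signal timer : natural range 0 to CLOCKS_PER_BAUD-1 := 0;
begin
  o_busy <= '1' when nbits /= 0 else '0';

  process(clk)
  begin
    if rising_edge(clk) then
      if nbits = 0 then
        o_uart <= '1';                  -- idle
        if i_wr = '1' then
          sreg  <= '1' & i_data & '0';  -- stop & data & start
          nbits <= 10;
          timer <= CLOCKS_PER_BAUD-1;
        end if;
      elsif timer /= 0 then
        timer <= timer - 1;             -- hold each bit for one baud period
      else
        o_uart <= sreg(0);              -- shift out, LSB first
        sreg   <= '1' & sreg(9 downto 1);
        nbits  <= nbits - 1;
        timer  <= CLOCKS_PER_BAUD-1;
      end if;
    end if;
  end process;
end architecture;
```

With no baud-rate changes and no parity options, the whole core reduces to one shift register and one counter, which is why the "lite" variants can be so much smaller.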
  2. @Antonio Fasano, Have you seen this IP core? Might be close to what you are looking for. Dan
  3. @hamster, Impressive! Would you mind sharing the hardware you chose to use? The clock speed of the delta-sigma converter? It also looks like you biased your sine wave negative by half a sample. Out of curiosity, was this on purpose, or just the result of the way you truncated floats to integers when generating your table? I'm a bit surprised by the second harmonic you show in your pictures. I had rather thought this technique would've done better than that. Do you have any suggestions as to what might have caused that harmonic? Either way, good fun, and thanks for sharing! Dan
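For readers unfamiliar with the technique being discussed, a first-order delta-sigma (1-bit) DAC can be as simple as an accumulator whose carry-out drives the pin. This is a generic sketch with assumed names and widths, not @hamster's design:

```vhdl
-- Sketch of a first-order delta-sigma DAC: the carry bit's average duty
-- cycle tracks i_val / 2**16. Names and the 16-bit width are assumptions.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dsdac is
  port (
    clk   : in  std_logic;               -- the (fast) modulator clock
    i_val : in  unsigned(15 downto 0);   -- unsigned sample input
    o_bit : out std_logic);              -- 1-bit output, to an RC filter
end entity;

architecture rtl of dsdac is
  -- One extra bit of accumulator: bit 16 is the carry-out / output stream
  signal acc : unsigned(16 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      acc <= ('0' & acc(15 downto 0)) + ('0' & i_val);
    end if;
  end process;

  o_bit <= acc(16);  -- the overflow bit becomes the bitstream
end architecture;
```

The analysis questions above (modulator clock speed, half-sample bias from truncation) matter because both directly shape the harmonics that show up after the output filter.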
  4. @zygot, OK, that's fair. I'll agree we've probably been reading different material, but this idea is new to me. Thank you for sharing it. I like your distinction, and I may choose to use it in the future. You didn't read far enough. The article I quoted didn't include my home-grown FIFO, but rather one I found online that came highly recommended to me by others. The article goes through explaining how that FIFO works, and then how to go about formally verifying it--which also (BTW) involved pointing out any assumptions made when building it. I found it a very useful exercise, and one that helped prepare me for a job where I didn't have access to either Xilinx's or Intel's vendor libraries. That's fair, and it's a valid position as well. I'm glad it's worked for you so far, and I wish you the best using it in the future. Certainly I enjoy making my dollars by posting the grand "it does everything" item, and then building the customized versions for individual customers. It's worked for me so far, and I've certainly enjoyed it. Even better, that article was among the top-3 all-time hits on the blog all last year, even ranking as high as #8 on a search for "Asynchronous FIFOs"--so there are plenty of others interested in it who have also enjoyed reading it. Dan
  5. @zygot, I didn't think there were that many possible definitions for asynchronous FIFOs, so I'm not sure how our definitions might have gotten crossed. By my definition, an asynchronous FIFO is a FIFO that crosses clock domains. It may (or may not) be mapped to specific hardware. The pointers are typically passed across clock domains via gray-coded pointers, as @amb5l has mentioned. The full/empty flags of the resulting FIFO can also be used to make certain that the FIFO isn't read while empty or written to while full. Further, until full, the write side can operate at full speed. Likewise, until empty, the read side can operate at full speed. The pointers keep the data path from getting too close to the actual clock domain crossing. What other definition would you offer? Dan
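The gray-coded pointer crossing described above can be sketched in a few lines. This is a generic illustration (names and the LGFIFO generic are assumptions, not any particular core): because adjacent gray codes differ in exactly one bit, a pointer sampled mid-transition is never more than one count stale.

```vhdl
-- Sketch: crossing a FIFO write pointer into the read clock domain
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_cross is
  generic (LGFIFO : natural := 4);  -- log2 of FIFO depth (assumed)
  port (
    wr_clk, rd_clk : in  std_logic;
    wr_ptr         : in  unsigned(LGFIFO downto 0);   -- binary write pointer
    rd_side_wr_ptr : out unsigned(LGFIFO downto 0));  -- gray, in rd domain
end entity;

architecture rtl of gray_cross is
  signal wgray              : unsigned(LGFIFO downto 0) := (others => '0');
  signal wgray_x1, wgray_x2 : unsigned(LGFIFO downto 0) := (others => '0');
begin
  -- Binary to gray in the write domain: g = b xor (b >> 1).
  -- Only one bit changes per increment, so a half-settled sample is
  -- at worst one count behind--never garbage.
  process(wr_clk)
  begin
    if rising_edge(wr_clk) then
      wgray <= wr_ptr xor ('0' & wr_ptr(LGFIFO downto 1));
    end if;
  end process;

  -- Standard two-flop synchronizer in the read domain
  process(rd_clk)
  begin
    if rising_edge(rd_clk) then
      wgray_x1 <= wgray;
      wgray_x2 <= wgray_x1;
    end if;
  end process;

  rd_side_wr_ptr <= wgray_x2;
end architecture;
```

The read pointer crosses the other way by the same mechanism, and the full/empty flags are then computed from a (possibly stale, but always safe) view of the far-side pointer.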
  6. @amb5l, I think you'll find that an asynchronous FIFO is a much better approach to moving data streams across clock domains. I would also recommend against using the FDRE primitive within your design. It'll tie you to the Xilinx infrastructure, and make it a challenge to move your design to other (non-Xilinx) contexts. If you just use a straight VHDL register, Vivado should be able to infer the FDRE directly--so you shouldn't need to instantiate it. That would also leave you with a more generic, (more) vendor-independent design. Dan
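To illustrate the inference point above: a plain clocked process with a synchronous reset and a clock enable is all Vivado needs to infer an FDRE, with no vendor primitive instantiated. A minimal sketch (names are illustrative):

```vhdl
-- Vendor-independent register; Vivado maps this onto an FDRE by itself
library ieee;
use ieee.std_logic_1164.all;

entity plain_reg is
  port (
    clk, rst, ce, d : in  std_logic;
    q               : out std_logic);
end entity;

architecture rtl of plain_reg is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= '0';      -- synchronous reset: maps to FDRE's R pin
      elsif ce = '1' then
        q <= d;        -- clock enable: maps to FDRE's CE pin
      end if;
    end if;
  end process;
end architecture;
```

The same source also synthesizes cleanly on non-Xilinx tools, which is the portability argument being made above.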
  7. @amb5l, Thanks for sharing! It's always fun to have a project to share, and such projects can be a real encouragement for newcomers. Looking over your design, I found this piece of it which appears to include a clock domain crossing within it. Just from a desktop review, it also looks like it would drop or duplicate samples. Have you tested it by running a counter through it, with both clocks at their proper rates, to see what would happen? Dan
  8. @zygot, Now I think you understand exactly what I mean. The flash memory is quite useful for ... whatever purpose. It's especially useful when playing with CPUs and wanting your design to start from a known program. Flash is known for being slow. The particular flash chips that've been used on the Arty are rated for a clock of 108 MHz. If you have a system clock of anything lower than 108 MHz, though, you'll either need to use an ODDR primitive to set the pin--something not available to you when going through the STARTUPE2 primitive--or suffer a 2x speed loss when going through flash. Since I like running at a 100 MHz system clock (or near that amount) *and* a 100 MHz QSPI_SCK, getting the full performance out of the flash requires a general-purpose I/O pin. Looking over the schematic and the reference manual, this I/O pin is connected to L16. It's not listed in the XDC file. Hence my question. My best guess is that either 1) they forgot to include it in the XDC file, or 2) someone thought (for some reason) it would be better left out of the master XDC file. I had been afraid this capability was taken off the board, but seeing it on the schematic gives me some assurance that it's still around. Dan
  9. @zygot, I'm referring to the SPI SCK pin headed to the configuration flash. It's used during configuration, but it can also be used by a user design later--using either the STARTUPE2 primitive, or (on the Arty, at least the way it was) using a second pin that was also tied to the same SCK wire leading to the flash. (The reference page shows a resistor between the two ...) Dan
  10. Xilinx requires the flash clock be connected to a special pin. The official way of accessing this pin is through the STARTUPE2 primitive. This limits the clock speed to 1/2 of the system clock speed, since there are no ODDR primitives available when going through the STARTUPE2 primitive. (Yes, you could do a CDC, but this gets annoying ...) The Arty (used to?) have an alternate method of accessing this pin via a secondary I/O pin also connected to this same line. According to the reference manual, this wire connects to pin L16. I just downloaded the A7100T master XDC file, however, and I don't see this secondary pin defined in the XDC file anywhere. Did it get removed? And if so, does your reference information also need to change? If not, shouldn't it be put back into the XDC file? As a customer, I will be disappointed if it was removed, since it will now mean that access to the flash is 2x slower than before. Dan
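For context, the ODDR trick referenced above forwards the full-rate system clock out a general-purpose pin (such as the L16 pin under discussion), which the STARTUPE2 path doesn't allow. A hedged sketch, with assumed entity and signal names; the ODDR instantiation itself follows the Xilinx 7-series primitive:

```vhdl
-- Forwarding a 100 MHz system clock to a general-purpose QSPI_SCK pin
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

entity qspi_sck_fwd is
  port (
    clk        : in  std_logic;   -- e.g. the 100 MHz system clock
    sck_en     : in  std_logic;   -- gate the clock when the bus is idle
    o_qspi_sck : out std_logic);  -- to the secondary SCK pin (L16)
end entity;

architecture rtl of qspi_sck_fwd is
begin
  -- D1 is presented on the rising edge, D2 on the falling edge, so
  -- (D1, D2) = (sck_en, '0') reproduces a gated copy of clk at the pin
  -- at the full clock rate--no divide-by-two required.
  u_oddr : ODDR
    generic map (
      DDR_CLK_EDGE => "SAME_EDGE",
      INIT         => '0',
      SRTYPE       => "SYNC")
    port map (
      Q  => o_qspi_sck,
      C  => clk,
      CE => '1',
      D1 => sck_en,   -- high half of each clock period while enabled
      D2 => '0',      -- low half
      R  => '0',
      S  => '0');
end architecture;
```

This is exactly the capability that's lost if the secondary pin is dropped from the XDC file: through STARTUPE2 alone, the SCK must be toggled from fabric logic at half the system clock rate.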
  11. @PaulJX, I get these errors often enough when I add new I/O pins to any design. They are readily fixed. Vivado is basically saying that it can't match all of the I/O pins in your design to those declared in your XDC file. Then, in an effort to be helpful, it tries to pick unused I/O pins to map your design to. (This is actually really bad for most designers, but it does have a purpose for new board designers.) The pins it picks, however, don't happen to have the same I/O standards as the pins currently in your design, and so the tool's helpfulness ends there. Personally, I think the tool should've stopped earlier, when it found an I/O declaration in my design that didn't match the one in the XDC, but either way, that's what's going on. Don't try to suppress the warning--fix the problem. Toggling random unused I/O pins might cause your design to toggle a pin that's actually connected to something, which could have ... undesirable effects. (My flash will never quite work the same as new again ...) Dan
  12. @zygot, You may be misunderstanding me. I'm not recommending HLS. I'm saying that in order to use it successfully, you really need to understand the background of how logic gets mapped to hardware. Once you have this background, however, you're typically a more advanced user who no longer needs or desires HLS. Hence my argument above that the marketing message is off--it's not a simple answer for new users. The more experienced users are the ones needed to make it work, and these are the very same individuals who (like yourself) don't need it and likely don't even want it. The example I have comes from counseling a master's student who was building a 1023/1024 Lagrange interpolator. (I think that was the rate he was using; it's been a while.) He was still a beginning designer at the time. His goal was to build a design in both HLS and RTL, and then to compare the two flows. Along the way, he became very frustrated more than once when slight and subtle changes he made to his design changed it from working nicely on the FPGA to no longer fitting on the FPGA. He was unable to explain why this was so. His conclusion was that a more experienced user could've understood this easily, but it would've required an intuition/insight into how the design was mapping onto the literal hardware to know. I wish I could point you to his dissertation on the topic, but since it wasn't in English I never maintained the link to it. It's a shame, too, since I would've loved to have it to reference. Dan
  13. @sgandhi, Perhaps I can offer a different perspective. If you are building an accelerator, then you need to know what your processing bottleneck is--otherwise you won't be building an accelerator. Half of your job will be to find whatever bottleneck is present in your system and fix it, moving the bottleneck somewhere else. It can feel like whack-a-mole. As an engineer, half of your job is knowing where that mole is.

Re: HLS. It's advertised as an awesome choice for beginners to get started. The problem beginners tend to have is that they don't understand how the constructs they create get mapped into hardware. Small, subtle, minor changes can make an HLS design go from fitting on a device to not fitting, and few beginners have the insight to see what's going on. The result is that HLS seems to work best for the more advanced user--the same advanced user who doesn't really need it. The second problem/reality with HLS is that the logic it generates tends to be bloated by 3x-6x (the last numbers I heard). That's going to force you to purchase a bigger FPGA than you need. Yes, all this stuff comes at a cost.

Re: Ethernet. I have used Ethernet to transmit data to/from FPGAs before. If you look over this example, you'll find a functional example including a PC sending data to an FPGA over UDP, the data getting processed on the FPGA, and then returned to the PC. It's doable. Xilinx's ethernet-lite core has some ugly bugs in it that still haven't been fixed as of 2020.1. If you look hard enough, you can find open source alternatives to their designs. That said, the devil is in the details. A slow Ethernet link could easily destroy any acceleration performance you hoped to achieve. I've had that happen to me a couple of times. You'll need to engineer that well, through and through, to make it worth your while. In my example, actual performance was dismal: the CPU ran instructions from SPI flash memory (*SLOW*), there was no SDRAM controller (*LIMITED ON-CHIP MEMORY*), and there was no hardware available for hardware-assisted data moves (*EVEN SLOWER*), so I'm pretty certain that my "accelerator" didn't. Still, it's a functioning example you are welcome to look over. Be aware that the Versa board it was built for is a non-Xilinx board. The Ethernet controller was copied from a separate Xilinx design, but that's another story.

Re: Protocol processing. While you can do network protocol processing in an FPGA, it's often simpler to do it in an attached CPU--such as the ARM in a Zynq, or within a MicroBlaze. I used a PicoRV32 RISC-V processor in my example. I think you'll find lots of examples of network protocol stacks in software, and few in RTL. That's not a bad thing. If you use a Linux operating system of some type, then the packet processing will appear to come for free with it. (Nothing is for free ...) Again, know where your bottleneck is, or your "accelerator" won't. Look for an Ethernet core that comes with an integrated packet DMA to memory. Beware of what the CPU's cache is doing when you copy data to memory behind the CPU.

Re: ASCII text vs. something else. ASCII text is much easier to comprehend and debug. Switching to a more native format could speed up your algorithm by a factor of 4x if not more. In one project I dealt with, the difference between ASCII and binary formats was greater than a factor of 1000x. Know your bottlenecks. Do your engineering. This choice is usually an easy one to make.

Re: Vitis. Xilinx's new Vitis software is supposed to help make data processing on an FPGA easy. It'll help you create a processing kernel: it'll generate the memory-to-RTL copy, you do your wonder in RTL, then it generates the RTL-to-memory copy for you. The memory movers are supposed to run at high speed. (I haven't checked myself, but I do know they aren't getting full bus bandwidth utilization--as per their data sheets.) Check out the Vitis manual for more information there. Perhaps that will help get you going. Perhaps it will only generate more questions. My bottom line is that I don't use the majority of Xilinx's IP, choosing instead to generate my own, so I really can't help you much there. Dan
  14. @José Enrique, @zygot, I'm not a VHDL expert, but this really looks wrong to me:

    proceso_contador : process (CLK, a_Reset, enable, contador)
    begin
      if a_reset = '1' or (contador = max_val) or enable = '0' then
        contador <= (others => '0');
      elsif rising_edge (CLK) then
        contador <= contador + 1;
      end if;
    end process proceso_contador;

That's not quite the right pattern for a synchronous reset, nor for an asynchronous reset. Is this even valid VHDL? Dan
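For contrast with the snippet quoted above, here is a hedged sketch of the standard asynchronous-reset pattern, reusing the post's own names (contador, max_val, enable): only the reset appears in the asynchronous branch, and the terminal-count/enable clears move inside the clocked branch, where synthesis tools expect them.

```vhdl
-- Corrected pattern (sketch): async reset branch first, everything
-- else under rising_edge(CLK). Signal declarations as in the post.
proceso_contador : process (CLK, a_reset)
begin
  if a_reset = '1' then
    contador <= (others => '0');          -- asynchronous reset only
  elsif rising_edge(CLK) then
    if enable = '0' or contador = max_val then
      contador <= (others => '0');        -- synchronous clear
    else
      contador <= contador + 1;
    end if;
  end if;
end process proceso_contador;
```

The original mixes combinational conditions (enable, contador = max_val) into the same if/elsif chain as the clock edge, which matches neither the synchronous- nor the asynchronous-reset template that synthesizers recognize.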
  15. Looks like a basic lowpass filter/decimator combination based upon a 3rd order CIC filter. Dan
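The structure named above--a CIC decimator--is three integrators running at the input rate, a rate change, and three combs at the output rate. A generic hedged sketch (widths, names, and the decimation ratio R are assumptions; the accumulators must hold the full IW + 3*log2(R) bits of internal growth):

```vhdl
-- Sketch of a 3rd-order CIC decimator (not any specific design)
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cic3 is
  generic (
    IW : natural := 12;    -- input width (assumed)
    AW : natural := 33;    -- accumulator width >= IW + 3*log2(R)
    R  : natural := 128);  -- decimation ratio (assumed)
  port (
    clk   : in  std_logic;
    i_smp : in  signed(IW-1 downto 0);
    o_stb : out std_logic := '0';         -- strobes once per R samples
    o_smp : out signed(AW-1 downto 0));
end entity;

architecture rtl of cic3 is
  type acc_t is array (1 to 3) of signed(AW-1 downto 0);
  signal integ          : acc_t := (others => (others => '0'));
  signal comb, comb_d   : acc_t := (others => (others => '0'));
  signal cnt            : natural range 0 to R-1 := 0;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Three cascaded integrators, updated on every input sample.
      -- Overflow wraps harmlessly as long as AW covers the full growth.
      integ(1) <= integ(1) + resize(i_smp, AW);
      integ(2) <= integ(2) + integ(1);
      integ(3) <= integ(3) + integ(2);

      o_stb <= '0';
      if cnt = R-1 then
        cnt <= 0;
        -- Three cascaded combs, run once per R samples (y = x - z^-1 x)
        comb_d(1) <= integ(3);
        comb(1)   <= integ(3) - comb_d(1);
        comb_d(2) <= comb(1);
        comb(2)   <= comb(1)  - comb_d(2);
        comb_d(3) <= comb(2);
        o_smp     <= comb(2)  - comb_d(3);
        o_stb     <= '1';
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;
end architecture;
```

The appeal is that the whole lowpass/decimate chain needs no multipliers--only adders and delays--which is why it shows up so often in front of sigma-delta converters.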