D@n

Everything posted by D@n

  1. Seriously? You didn't look at the material I offered, then. Allow me to post it here, at the expense of readability, since the following pictures were taken from the documentation I referenced above. Once you fire up the Memory Interface Generator (MIG) IP product guide, it will lead you through a series of dialog boxes used to configure the core.

     Step one is to create a new design. I like to use the AXI interface for my designs. There is another interface available, but I have yet to find sufficient documentation for it.

     Step two: skip picking any pin-compatible FPGAs--you'll only need to support the one that was already selected when you created your project.

     Step three: specify that you want a DDR3 SDRAM configuration.

     Step four: identify your chip, clock rate, voltage, data width, and the number of bank machines you want. Do note that you cannot go faster than about 325MHz with this board--the chip won't allow it. Likewise, you can't go slower than about 300MHz, since the DDR3 SDRAM spec doesn't allow that. Since this is rather high speed, and due to limitations of the Xilinx chip on the Arty, you'll need to use a 4:1 controller. Note that this is grayed out because the chip won't support the alternative 2:1 controller at this speed. The MIG controller will generate this 325MHz clock from the incoming clock you give it (100MHz), and give you an output clock which you can use for your design at 325/4 = 81.25MHz. Depending on whether or not your controller can handle out-of-order returns, you may wish to select "Normal" or "Strict" ordering. I used Normal when this was written; I've since rewritten my AXI controller for strict ordering--it's just simpler and easier to deal with. You can then choose what data width you want for your AXI bus. 128 bits is the natural data width for this memory; anything else will bloat the logic used for your memory controller with a data width converter. Depending on how you choose to generate your AXI requests, you may or may not want narrow burst support. (Xilinx's AXI example doesn't support narrow bursts, although they are part of the AXI spec.) This would also bloat your design, and I'm not sure it's really necessary. I have written an article on how to handle narrow burst support. Moving on, AXI allows out-of-order transactions. My original design used out-of-order support and allowed every transaction to have a different AXI ID, and thus to get returned in a different order. It was a pain to deal with. If you want to do this, though, an ID width of 5 with unique request IDs per beat will allow the controller unconstrained freedom to reorder your requests. If you choose to use strict ordering, you don't need to support any more ID width than the rest of your system requires.

     Step six, where you were asking your question, is spelled out here as well: a 100MHz input clock period works just fine. It will be multiplied up to 1.3GHz and then divided back down to 325MHz to support the clock rate we selected earlier. Read burst type really depends upon how your CPU's cache wishes to access memory. If you don't care, just pick sequential--it's easier to optimize for. Output driver and RTT impedance come straight from the Digilent documentation. This board has a chip-select pin on its DDR3 interface, so you'll need to enable that here. I also like Row-Bank-Column ordering, since it's easier to guarantee 100% throughput, although I have no idea if the MIG controller will guarantee non-stop throughput across banks. Perhaps I'm just superstitious that way.
     Step seven, clock inputs: since you will need a 200MHz reference clock (this isn't optional, nor is the frequency up to the user), and the only place you can get that on this board is from the 100MHz input clock, your only option is to select "No Buffer" for both of these clocks and to run your 100MHz through a PLL. You can then send the outputs of that PLL to the MIG; a sketch of that arrangement follows below. Picking 100MHz gives you greater flexibility for the rest of the clocks coming from that PLL. Reset type is up to you; I'm using active low. You won't need the debugging signals--if everything works well. (I have had problems before, but solved them without the debugging signals.) The board also provides you with an incoming voltage reference. I/O power reduction is the default, but not something I've experimented with, and unless you want to feed the core with the local temperature yourself, you'll need to enable the on-board XADC and give the MIG control over it.

     Step eight: you'll want to select the internal termination impedance of 50 Ohms.

     Step nine: since Digilent already built this board, you'll want to select a "fixed pin out"--the pins are already laid out and selected for you.

     Step ten: this one's a real pain if you don't want to reference my design as a place to start from, but ... you'll need to identify each of the external wires to the MIG for layout. You can reference my design and the UCF file within it to make this part easy, or you can painstakingly dig for that information. As I recall, I found this information in the Arty project file.

     Step eleven: the final step I took was to make the system reset, calibration complete, and error signals "No connect"--that keeps them from being assigned I/O buffers if they aren't directly connected to pins. You can still use these signals within your design if you want.

     Do be aware that the latency of the MIG controller is around 20+ clocks. It works ideally for high-throughput designs of 128 bits per beat, such as a cache might generate, but it's going to be a speed bump in any more random memory access pattern. Now, that's an awfully long post to answer a basic question which would likely have follow-ups, but it does show in gory detail how you can set up the MIG. If you want more detail, the project itself is an example design which you can reference--or not. Take your pick. Oh, and yes, this information was in the project's documentation which I referenced in my first response. Dan
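     As a rough sketch of that step-seven clocking arrangement (the module and signal names here are made up for illustration, and the PLLE2_BASE settings should be double-checked against your own part, speed grade, and constraints), a PLL can turn the board's 100MHz input into both the 200MHz reference clock and a 100MHz system clock for the MIG's "No Buffer" inputs:

         // Illustrative only: derive the 200MHz IDELAY reference clock and a
         // 100MHz system clock for the MIG from the board's 100MHz input.
         module clkgen (
             input  wire i_clk100,      // 100MHz board clock (already buffered)
             output wire o_clk_sys,     // 100MHz, for the MIG system clock input
             output wire o_clk_ref,     // 200MHz, for the MIG reference clock input
             output wire o_locked
         );
             wire clk_fb, clk_ref_raw, clk_sys_raw;

             PLLE2_BASE #(
                 .CLKIN1_PERIOD (10.000),  // 100MHz in
                 .DIVCLK_DIVIDE (1),
                 .CLKFBOUT_MULT (10),      // 1000MHz VCO
                 .CLKOUT0_DIVIDE(5),       // 1000MHz / 5  = 200MHz
                 .CLKOUT1_DIVIDE(10)       // 1000MHz / 10 = 100MHz
             ) pll_i (
                 .CLKIN1  (i_clk100),
                 .CLKFBIN (clk_fb),
                 .CLKFBOUT(clk_fb),
                 .CLKOUT0 (clk_ref_raw),
                 .CLKOUT1 (clk_sys_raw),
                 .RST     (1'b0),
                 .PWRDWN  (1'b0),
                 .LOCKED  (o_locked)
             );

             // One common arrangement: buffer the PLL outputs before handing
             // them to the MIG, since "No Buffer" means the MIG won't add one.
             BUFG bufg_ref (.I(clk_ref_raw), .O(o_clk_ref));
             BUFG bufg_sys (.I(clk_sys_raw), .O(o_clk_sys));
         endmodule

     The MIG user guide describes the clocking rules for the "No Buffer" option in more detail; treat the above as a starting point, not a drop-in answer.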
  2. @binry, You can see all the settings I used here. Do take note, though, that most of those who use the Arty platform tend to use the Vivado schematic editor; I'm one of the few who does not. Yes, I chose not to use buffers for the clocks specifically because they come from internal MMCM/PLLs. The other option is for clocks coming directly from pins into the MIG core. In those cases, the MIG will attach I/O buffers to the clock pins--something that can only be done to input pins before any other logic is applied. Since I needed the 200MHz clock, I went with the PLL first--forcing the MIG clocks to avoid instantiating I/O buffers, lest the design fail to map onto the Xilinx hardware. I also feed the 100MHz incoming clock, after passing through a PLL, directly into the MIG--so the incoming system clock rate is 100MHz rather than 166.67MHz. This is the value in the IP catalog setting, as linked above. This is separate from what the core wants as a "reference clock"--that needs to be 200MHz. The MIG then produces its own clock, which I then use as the system clock throughout my design. Dan
  3. @binry, I usually feed the MIG controller with the 100MHz clock as the system clock if possible; the MIG will handle dividing this as appropriate. The reference clock must be at 200MHz. This is to support the IO delay controller within the chip, which only accepts a 200MHz clock. Yes, you can connect the 100MHz system clock to the MIG. The MIG will generate a reset that you can use--based off of both when the PLLs settle and when its internal calibration is complete. I'm not familiar with the example project you cite. My own example Arty A7 project doesn't use the traffic generator at all. Dan
  4. Advanced topics

    For CDC, consider these articles on 1) basic CDCs, 2) formally verifying asynchronous designs, and 3) Asynchronous FIFOs. For speed optimizations, you'll need to learn about pipelining. Here's an article on pipeline control. Dan
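     As a minimal sketch of the technique from the first of those articles (the names here are invented for the example, and this only works for a single bit--multi-bit buses need Gray coding, a handshake, or an asynchronous FIFO as in the third article):

         // Two flip-flop synchronizer: moves one asynchronous bit into the
         // i_clk domain.  Adding (* ASYNC_REG = "TRUE" *) to these registers
         // is a good idea in Vivado so they stay packed together.
         module sync2ff (
             input  wire i_clk,
             input  wire i_async,    // signal from another clock domain
             output reg  o_sync
         );
             reg stage1;

             always @(posedge i_clk)
                 { o_sync, stage1 } <= { stage1, i_async };
         endmodule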
  5. Welcome to the forums! I'm an FPGA enthusiast myself, known for my blog. Feel free to check it out and let me know what you think of it. Dan
  6. @Kenny, If you are an FPGA beginner, then ... I would start somewhere else. I would recommend starting by learning how to debug FPGA designs. The problem with FPGA development is that, unlike software, you have little to no insight into what's going on within the FPGA. There are a couple of keys to success: 1) Being able to ensure your design works as intended before placing it onto the FPGA. Simulation and formal methods work wonders for this task. Unlike debugging on hardware, both of these approaches offer you the ability to investigate every wire/signal/connection within your design for bugs. If you are unfamiliar with these tools, then I would recommend my own tutorial on the topic. 2) Being able to debug your design once it gets to hardware. This should be a last resort, since it's so painful to do, but it is a needed resort. To do this, it helps to be able to give the hardware commands and see responses. It helps to be able to get traces from within the design showing how it is (or isn't) working. I discuss this sort of thing at length on my blog under the topic of the "debugging bus" (links to articles here), although others have used Xilinx's ILA + MicroBlaze for some of these tasks. Either way, welcome to the journey! Dan
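     To give a flavor of the formal side of that, here is a minimal sketch in the style used with SymbiYosys (the counter and its bound are invented for the example, not taken from any particular design):

         // A small counter together with an immediate assertion.  Under
         // SymbiYosys, the solver searches every reachable state for a
         // violation--something a hardware trace can never promise.
         module countdown (
             input  wire       i_clk,
             input  wire       i_start,
             output reg  [7:0] o_count
         );
             initial o_count = 0;
             always @(posedge i_clk)
                 if (i_start && o_count == 0)
                     o_count <= 8'd100;
                 else if (o_count != 0)
                     o_count <= o_count - 1;

         `ifdef FORMAL
             // The counter must never exceed its starting value.
             always @(*)
                 assert(o_count <= 8'd100);
         `endif
         endmodule

     The attraction is that the solver examines every reachable state, rather than only the ones a testbench happened to exercise.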
  7. @Kenny, Any entry-level board can and will do an FFT--the real question is how much of an FFT you want to do. I've done FFTs on an Artix-7 35T, so you should be okay with the S7-50. That said, size and complexity are tightly coupled with both the size of the FFT and the precision of the bits within it. If you want an example, this FFT Demo was built using Digilent's Nexys Video board. It works by reading data from an external Pmod MIC3 microphone, filtering the data, windowing it, FFT'ing it, taking the log of the magnitude, and then writing it to memory. The memory is then treated as a framebuffer and used to display a scrolling raster via an HDMI output. Optional software (also provided, tested on Ubuntu) will make the raster scroll in a simulation window. Dan
  8. Arty S7 Board Layout

    @ykaiwar, It sounds like there's an issue with the DDR3 SDRAM on your new board, but you've also described an issue with the ability to configure the FPGA at all. So let me ask about some basic things in between configuring an FPGA and getting the DDR3 SDRAM working: Can you turn an LED on? Can you turn it off? Can you make it blink? Can you mirror the serial port (you do have one, right?) from receive back to transmit and verify that it works? If these tests fail, then you aren't yet ready to discuss possible SDRAM problems. If they succeed, then you can start to bootstrap your FPGA's capability more and more until you know exactly what is and isn't working. Dan
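     For reference, the first couple of those checks can be as small as the sketch below (the port names are placeholders and need to match your XDC, and the loopback assumes a plain UART mirror is acceptable for your setup):

         // Minimal bring-up check: blink one LED from a 100MHz clock, and
         // loop the UART receive pin straight back to transmit.
         module bringup (
             input  wire i_clk,      // 100MHz board clock
             input  wire i_uart_rx,
             output wire o_uart_tx,
             output wire o_led
         );
             reg [26:0] counter = 0;

             always @(posedge i_clk)
                 counter <= counter + 1;

             assign o_led     = counter[26];   // toggles roughly every 0.67s at 100MHz
             assign o_uart_tx = i_uart_rx;     // serial mirror / loopback
         endmodule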
  9. Basys3 Memory

    @trian, Truly answering this question is rather complex, and will likely depend upon more details than you have shared. You'll need to check the camera's and display's I/O capabilities, the FPGA's I/O capabilities, pixel clock rates, the size of the display in pixels, your memory bandwidth, and more. If you intend to have a CPU on board, you'll also need to make sure you allocate space for its memory. That's a lot of analysis you'll need to do. It's doable, but expect to make a couple of mistakes along the way. We all do. Some of us seem to make more than others ... I personally tried to do something similar (framebuffer to VGA output) using the Basys3, only to discover in hindsight that it doesn't have a lot of internal RAM. The Basys3 doesn't come with any off-chip RAM, so if you want RAM, all you have to work with is the block RAM. In my own design, I only managed to scrounge 128kB together before running out of resources on chip. I was then sadly disappointed when I couldn't fit any decent-sized framebuffer on the Basys3. What good is a VGA when you don't have a framebuffer? Instead, I used flash memory as a ROM-based framebuffer. Flash, however, is slow, so it took a lot of work to get it fast enough. (I compressed my images.) This allowed me to present a basic slide-show at 640x480. I haven't tried higher resolutions (yet). At the time, I thought I'd never do any better and gave up trying. I later discovered that someone on Digilent's staff (I'll let her remain nameless ...), and often on the forum, had managed to get a good portion of PacMan running in this small memory space. (The project was never completed; collision detection, I hear, didn't (yet) work by the time the project needed to be handed in. Probably why she doesn't share more ...) So, careful engineering can overcome a lot of problems. You may find yourself limited only by your creativity. Certainly necessity is the mother of many inventions. In light of all of this, I'd recommend looking into an off-chip RAM of some type. Perhaps a HyperRAM? Be aware, though, if you are new at this then you'll have a challenge ahead of you getting something you are unfamiliar with working--even if you choose to use someone else's "proven" core. Dan
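     To put rough numbers on that analysis (the resolution and color depth here are only examples): a 640x480 frame at 8 bits per pixel needs 640*480 = 307,200 bytes, which is already more than the roughly 225kB of total block RAM on the Basys3's XC7A35T--before the rest of the design gets any. Even 4 bits per pixel needs 153,600 bytes. A hypothetical block-RAM framebuffer, mostly to show where the numbers come from:

         // Illustrative block-RAM framebuffer.  At these parameters the array
         // is 640*480*4 bits = 153,600 bytes--most of the chip's block RAM,
         // which is the point of the exercise.
         module framebuf #(
             parameter H_PIXELS = 640,
             parameter V_PIXELS = 480,
             parameter BPP      = 4                     // bits per pixel
         ) (
             input  wire                                  i_clk,
             input  wire                                  i_we,
             input  wire [$clog2(H_PIXELS*V_PIXELS)-1:0]  i_waddr,
             input  wire [BPP-1:0]                        i_wdata,
             input  wire [$clog2(H_PIXELS*V_PIXELS)-1:0]  i_raddr,
             output reg  [BPP-1:0]                        o_rdata
         );
             reg [BPP-1:0] mem [0:H_PIXELS*V_PIXELS-1];

             always @(posedge i_clk)
             begin
                 if (i_we)
                     mem[i_waddr] <= i_wdata;
                 o_rdata <= mem[i_raddr];
             end
         endmodule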
  10. @sab, Fascinating! Which core are you using? Is it a public core, a Xilinx core, a commercial core, or one of your own that you are evaluating? If you need a public core that can be used (mostly) cross-platform, then I can provide some that you can reference in your study. Let me also suggest that your coefficients need to be run-time settable for performance measurements. If they aren't, the synthesis tool might optimize away certain multiplies (multiplies by +/- 2^n, for example) and so bias your result. Dan
  11. @sab, Just on its face, I'm surprised you were able to implement an FIR filter in only 430 LUTs and one DSP--that sounds rather light. Tell me, though: how many coefficients did your FIR filter have? (I'm typically looking at 10+; this looks too small for that.) How many bits did each of those coefficients have? (I like between 8 and 16, to match the incoming ADC.) Were the coefficients fixed? (Vivado can do a lot of optimization on fixed coefficients, not so much on run-time programmable ones.) How many bits did each of the input samples have? How many bits were in the output? All of these have an effect on how much logic an FIR uses. Dan
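     For a sense of scale, a run-time-programmable FIR along the lines of these questions might look like the sketch below. This is a naive transpose-form filter, not anyone's production core; the tap count and widths are arbitrary, and with these parameters you'd expect roughly one multiplier (DSP or LUTs) per tap--which is why a single-DSP, 430-LUT result raises questions about the coefficient count.

         // Naive transpose-form FIR with run-time loadable coefficients.
         // Note the coefficients apply in reverse order relative to the
         // usual impulse-response indexing, as is normal for this form.
         module slowfir #(
             parameter NTAPS = 16,
             parameter IW = 12,       // input sample width
             parameter CW = 12,       // coefficient width
             parameter OW = IW+CW+4   // output width, with room for the sum to grow
         ) (
             input  wire                     i_clk,
             // Coefficient load port: write one tap at a time
             input  wire                     i_wr_coeff,
             input  wire [$clog2(NTAPS)-1:0] i_coeff_addr,
             input  wire signed [CW-1:0]     i_coeff,
             // Sample stream
             input  wire                     i_ce,
             input  wire signed [IW-1:0]     i_sample,
             output reg  signed [OW-1:0]     o_result
         );
             reg signed [CW-1:0] coeff [0:NTAPS-1];
             reg signed [OW-1:0] acc   [0:NTAPS-1];
             integer k;

             always @(posedge i_clk)
                 if (i_wr_coeff)
                     coeff[i_coeff_addr] <= i_coeff;

             // Transpose form: each stage adds its product to the previous partial sum
             always @(posedge i_clk)
             if (i_ce)
             begin
                 acc[0] <= i_sample * coeff[0];
                 for (k = 1; k < NTAPS; k = k + 1)
                     acc[k] <= acc[k-1] + i_sample * coeff[k];
                 o_result <= acc[NTAPS-1];
             end
         endmodule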
  12. @Tejna, Welcome to the forums! I'm an FPGA blogger, and so I'd invite you to check out any of the articles I've written. </shameless plug> Dan
  13. @ekazemi, The code I presented has user-selectable phase resolution, subject to the accuracy of the originating clock and some quantization error on the output. It can easily be adjusted to create a clock signal with whatever phase you want. Be aware, as with everything FPGA, the devil is in the details. For example, if you aren't careful, you could create a glitchy clock without intending to. On the other hand, the technique is simple enough as to offer lots of possibilities--which sounds just like what you are looking for. Dan
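     If the referenced code follows the usual phase-accumulator approach (I'm sketching the structure from memory here; the names and widths below are mine, not taken from the code in question), the heart of it is only a few lines:

         // Phase-accumulator "clock" generator: the top bit toggles at roughly
         // (i_step / 2^32) times the system clock rate.  The output is a logic
         // signal--fine to route off chip or use as a clock enable, but do not
         // feed it into @(posedge ...) logic inside the FPGA.
         module genclk (
             input  wire        i_clk,    // system clock
             input  wire [31:0] i_step,   // frequency word: f_out = (i_step/2^32)*f_clk
             input  wire [31:0] i_phase,  // phase offset, in 1/2^32-cycle units
             output wire        o_wave
         );
             reg [31:0] phase_acc = 0;

             always @(posedge i_clk)
                 phase_acc <= phase_acc + i_step;

             wire [31:0] phased = phase_acc + i_phase;
             assign o_wave = phased[31];   // MSB gives a ~50% duty-cycle square wave
         endmodule

     Adjusting i_phase shifts the nominal phase in very fine steps, though the actual edges still land on system-clock boundaries--hence the quantization error mentioned above.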
  14. If you are trying to create clock "glitches", then you definitely want to avoid using the MMCM. Try the above linked method. I think you'll find no problems at rates as slow as 16MHz. Dan
  15. @ekazemi What frequency rates are you trying to achieve? This method will get you user-controlled phase to within about 1ns or so, and it works nicely for sending something off-chip. But to your question, I have yet to try the dynamic interface of any clock management hard-cores. I've looked them over a couple of times, but ... not actually tried any. Dan
  16. I bricked my CMOD-A7

    No, the USB driver disconnecting from the JTAG endpoint could be part of a normal connection. I know when I connect one of my Digilent devices, it connects, recognizes the device, and then disconnects and reconnects with a different driver. This is normal. Hence, I don't (yet) have enough information from the dmesg dump above to know if there's a problem with the JTAG chain. One thing you can do is use the FTDI read-memory command to get a dump and see if it looks like what it should. (You should, for example, be able to read "CMOD S7" or something similar from the FTDI's flash to confirm it "looks" okay.) Dan
  17. I bricked my CMOD-A7

    @hamster, I think I'm going to agree with @xc6lx45, I don't think you've bricked this. You may have the flash in an unusual mode. Can you load and run a design on the S7 that doesn't use the flash, to see if that works? Dan
  18. @rivermoon, Go for it, and good luck! I've found that Wireshark is very useful when debugging network interactions; let me take a moment and suggest you look into it and try it out. Also, I'd love to hear back from you regarding your success when everything works like it should. So often these forum posts only discuss problems, and we never hear the successes here. That said, it's your call what you want to share. Dan
  19. @Jess, While I've done VHDL before, it's really not my strong suit, so I'd love to see some VHDL designers step in at this point. That said, you should never need to instantiate a flip-flop (FF) on your own. The tools should "just do it" for you. In particular, your code snippet (below) does exactly this:

         divisor : process(elclock)
             variable div_cont : integer := 0;
         begin
             if (rising_edge(elclock)) then
                 if (div_cont = max) then
                     temporal <= not temporal;
                     div_cont := 0;
                 else
                     div_cont := div_cont + 1;
                 end if;
             end if;
         end process divisor;

     Both div_cont and temporal will be implemented with FFs. That said, I'm not familiar enough with VHDL to catch the subtleties here. For example, I've never seen something set *after* the end of the if (rising_edge(clock)) block but still within the process. This might be a bad thing, or it might not; I'm not sure.

     The other thing to be aware of is that, in spite of its name, s_clock *IS* *NOT* *A* *CLOCK* *SIGNAL*! Sorry for yelling so loud, but I feel like a broken record when discussing this--it seems like every new HDL designer tries to use something like this to make a clock. Do not use s_clock like a clock. It is not a clock. It is a logic-generated signal. Most FPGAs have dedicated clock logic--PLLs and MMCMs as well as special clock buffers and routing networks--used to handle clocks. Logic-generated "clocks", like this one, don't get access to this special-purpose hardware. As a result, you are often setting yourself up for a disaster when your code actually meets real hardware. The problem specifically comes into play when you try to do something like:

         broken : process(s_clock)
         begin
             if (rising_edge(s_clock)) then
                 -- Your design is now broken
             end if;
         end process;

     The correct way to do this is to use some form of clock enable signal (ce below would be declared as a std_logic signal in the architecture), such as:

         divisor : process(elclock)
             variable div_cont : integer := 0;  -- This should probably also be limited in width
         begin
             if (rising_edge(elclock)) then
                 if (div_cont = max-1) then
                     ce <= '1';
                     div_cont := 0;
                 else
                     ce <= '0';
                     div_cont := div_cont + 1;
                 end if;
             end if;
         end process divisor;

         process(elclock)
         begin
             if (rising_edge(elclock)) then
                 if (ce = '1') then
                     -- Now you can put your rate-limited logic here
                     -- ... without worrying (as much) about simulation
                     -- vs synthesis bugs, or the synthesis tool doing
                     -- something unexpected
                 end if;
             end if;
         end process;

     That said, nothing prevents you from calling s_clock a clock, or outputting it on an output pin to examine with a scope. It's just that using its rising edge within your design will cause problems. Also, my apologies to all of the real VHDL designers out there for bugs I might be missing, but this is the basic concept of what you need to do. Finally, don't forget to make sure the name elclock matches the name of the incoming clock within your XDC file. It should be on the same pin that a hardware clock comes in on. Dan
  20. @Jess, Here's my example of how to make a frequency divider. Yeah, I know it's Verilog; the same principles apply for VHDL, although the syntax will be different. Remember: nothing other than a bona fide clock should be in the rising_edge section of a process. In my design, I have inputs to the design from the external world; i_clk is always one such input. The XDC has a clk input that I rename i_clk. These two files (my design and the XDC) then match, and so I get the clock at the rate it comes into the board. Be careful not to adjust the frequency yourself; use a PLL, MMCM, or other clock management resource. Making your own clock is a classic/common beginner mistake--so common that I have to wonder if the textbooks aren't teaching things wrong. Another classic mistake is to try to "create a clock" from scratch using syntax that can't be implemented. That doesn't work either. The clock coming into your design must have the same name as the one in your XDC file. I'd also recommend integer clock division with a clock-enable signal, which is what the link above describes and what the sketch below shows. Dan
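     For what it's worth, the clock-enable version of an integer divider boils down to something like this (the divide value is only an example):

         // Divide-by-N strobe: o_ce is high for one i_clk cycle out of every
         // DIVIDE_BY cycles.  Everything downstream keeps running from i_clk
         // and simply qualifies its logic with "if (o_ce)".
         module clkenable #(
             parameter DIVIDE_BY = 100    // e.g. 100MHz / 100 = one strobe per microsecond
         ) (
             input  wire i_clk,
             output reg  o_ce
         );
             reg [$clog2(DIVIDE_BY)-1:0] counter = 0;

             initial o_ce = 0;
             always @(posedge i_clk)
                 if (counter == DIVIDE_BY-1)
                 begin
                     counter <= 0;
                     o_ce    <= 1'b1;
                 end else begin
                     counter <= counter + 1;
                     o_ce    <= 1'b0;
                 end
         endmodule

     Downstream logic then stays on i_clk and wraps its work in "if (o_ce)", just like the VHDL clock-enable example in the previous post.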
  21. @Jess, You don't need to create the "code for the clock" in hardware. It's an external clock. It already exists coming into your design. You just need to bring it into your design and use it. Dan
  22. @josejose My suggestion would be to 3D print something. That said, my 3D printer doesn't have a bed big enough to print such a case. Perhaps 3D printing parts and pieces that could then be caulked together? @zygot's comment is well taken though: there be dragons here. (Not that I wouldn't want to do it myself, but ...) How will you handle the garden sprinkler system so the electronics don't get wet? Misty mornings? The greenhouse effect? Temperature extremes? These are all things that you'll need to think about, but not things that I have the experience or training to guide you through. Dan
  23. @josejose, The easy answer is to place a WiFi router in your garden. Connect the Arty to the router via a network cable, and then connect to that router from wherever--to include from your cell phone. You may need to guarantee a fixed IP address, depending on how deep you want or need to go into the DHCP stack. Dan
  24. @DanK, When I was in a similar situation, I opened the .prj files and found them to be (fairly) readable XML. From there I was able to figure out how to configure the MIG. Dan
  25. Verilog Simulator

    @xc6lx45, This is a valid question, and a common response I get when recommending Verilator. Let's examine a couple of points here.

     Verilog is a very large language, consisting of both synthesizable and non-synthesizable subsets. I've seen more than one student get these two subsets mixed up, using constructs like "always @* clk <= #4 !clk;" and struggling to figure out why their design either doesn't work or fails to synthesize. I've seen a lot of students and beginners try to use these non-synthesizable constructs to write "programs" rather than "designs": things like "if (reset) for(k=0; k<MEMSIZE; k=k+1) mem[k] = 0;", or "always @(*) sum = 0; @(posedge clk) if (A[0]) sum = B; @(posedge clk) if (A[1]) sum = sum + (B<<1);", etc. Since Verilator doesn't support #delays, nor does it support 'x values, in many ways it does a better job matching what the synthesizer and the hardware will do together, leaving less room for confusion.

     C++ Verilator-based wrappers can be used just as easily as Verilog for bench testing components. That said, the Verilog simulation language is a fairly poor scripting language for finding bugs in a module when compared to formal methods. There's been more than once that I've been deceived into thinking my design worked, only to find a couple of cases (or twenty) once I got to hardware where it didn't. Indeed, both Xilinx and Intel messed up their AXI demonstration designs--designs that passed simulation but not a formal verification check. As a result, many individuals have posted unsolved bugs on the forums, complained about design quality, etc. (Xilinx has been deleting posts that aren't flattering to their methodology. I'm not yet sure about Intel in this regard.) Formal methods tend not to have this problem. Why waste a student's time teaching a broken design methodology?

     So, if you aren't using Verilog for your bench test, then what other simulation-based testing do you need? Integration testing, where all the modules come together to interact with the hardware in some (potentially) very complex ways. At this point, you need hardware emulation, and Verilator provides a much better environment for integrating C/C++ hardware emulators into your design. My favorite example of this is building a VGA. VGAs are classically debugged using a scope and a probe, since the definition of "working" tends to be "what my monitor will accept." The problem with this is that you lose access to all of the internal signals when abandoning your simulation environment. On one project I was working on--this one for the Basys3, where there was a paucity of memory for a video framebuffer--I chose to use the flash and to place prior compressed frames onto it. I would then decompress these frames on the fly as they were being displayed. My struggle was then how to debug decompression failures, since I could only "see" them when the design ran from hardware. Verilator fixes this by allowing you to integrate a display emulator with your design, making it easier to find where in the VCD/trace output file the bug lies.

     Another example would be a flash simulation. Most of my designs include a 16MB flash emulation as part of their simulation. This allows me to debug flash interactions in a way that I doubt you could using iverilog: I can simulate things like reading from flash, and erasing and programming flash--even before I ever get to actual hardware, or perhaps after I've taken my design to hardware and then discovered a nasty bug.
     More than once I've found a bug after reading through all 16MB of flash memory, or in the middle of programming it, when something didn't read back properly. I'm not sure how I would debug this with iverilog.

     A third example would be SD-card simulation. I'm currently working with a Nexys Video design with an integrated SD card. It's not a challenge to create a 32GB FAT-based image on my hard drive and then serve sectors from it to my running Verilator simulation, but I'm not sure how I would do this from iverilog. So far in this project, I've been able to demonstrate an ability to read a file from the SD card--FAT system and all--and my next step will be writing data files to it via the FATFS library. I find this to be an important simulation requirement, something provided by Verilator and quite valuable.

     Finally, I tend to interact with many of my designs over the serial port. I find it valuable to interact with the simulation in (roughly) the same way as with hardware, and so I use a program to forward the serial port over a TCP/IP link. I can do the same from Verilator (try that with iverilog), and so all of the programs that interact with my designs can do so in the same fashion regardless of whether the design is running in simulation or in hardware.

     Yes, there are downsides to using Verilator. It doesn't support the non-synthesizable parts of the language; this is the price you pay for getting access to the fastest simulator on the market--even beating out the various commercial simulators out there. Verilator is an open source simulator, and so it doesn't have the encryption keys necessary to run encrypted designs--such as Vivado's FFT, or even the FIFO generator that's a core component of their S2MM, MM2S, and their interconnect ... and probably quite a few other components as well. This is one of the reasons why I've written alternative, open source designs for many of these common components (FFT, S2MM, MM2S, AXI interconnect, etc.). As to which components are "better", it's a mixed bag--but that's another, longer story for another day. Verilator does not support sub-clock timing simulations, although it can support multi-clock simulations. At the same time, most students don't need to know the details of sub-clock timing in their first course. (I'm not referring to clock-domain crossing issues here, since those are rarely simulated properly anyway.)

     Still, I find Verilator to be quite a valuable choice, and one I highly recommend learning early in the learning process. This is the reason why my beginner's Verilog tutorial centers around using both Verilator and SymbiYosys. Dan