Xilinx DDR Tutorial Part 3

In the second part of this DDR tutorial we created a simple data logger that recorded the states of 7 IO pins. When the recording was played back, any changes, up to the depth of PLAY_FIFO_inst, were reported over the USB UART. If you've ever wondered just how long mechanical contacts can bounce on and off, the demo can measure it for you. As a data logger it isn't very useful, primarily because of the slow UART link between the hardware and the PC. While it demonstrated a few ideas and concepts, the performance of the DDR interface wasn't very good. If you implemented the ILA IP in the NexysVideoDdrDemo.vhd entity, you probably noticed that for read commands there are about 22 ui_clk periods from the assertion of app_en to when app_rd_data_valid is asserted and the MIG UI provides the data that we requested. For the Nexys Video, ui_clk is 100 MHz, so in practice this works out to a read data rate of only about 60,000,000 bytes/s. Since the DDR is capable of a peak data rate of 1,600,000,000 bytes/s, this is fairly disappointing, to say the least. Feeding an active 1 GbE Ethernet PHY would require 125,000,000 bytes/s to stream data into a transmit packet, and once you've started sending a packet you can't tolerate any interruption in the data flow. This is a common application requirement.

In the third installment of the tutorial we'll discuss ways to increase the performance of our MIG UI read/write controller state machine. I'll also provide a hardware testbench to help measure that performance.

In NexysVideoDdrDemo we issued read and write requests to the MIG DDR3 controller one BL8 burst (8 transfers) at a time. For reads there is a significant delay from issuing the read command to when data actually appears at the MIG UI interface. So, how do we get better performance if we can't change the delay? The standard way to do this is pipelining.
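A quick way to see the size of the gap is to turn each rate into a cycle budget per UI data word. This Python sketch assumes ui_clk = 100 MHz and 16 bytes per UI data word (one BL8 burst on the 16-bit DDR3 bus), which matches the 1,600,000,000 bytes/s peak quoted above. A 1 GbE stream needs a new UI word at least every 12.8 ui_clk periods on average, whereas the baseline measured later in this part spends about 26.4 periods per read. The 22-period latency by itself would allow roughly 72.7 million bytes/s, so the remaining cycles presumably come from controller and state machine overhead around each request.

```python
# Cycle budget per UI data word for a target streaming rate.
# Assumptions: ui_clk = 100 MHz and 16 bytes per UI data word (one BL8 burst
# on the Nexys Video's 16-bit DDR3 bus), as in this tutorial's MIG design.

UI_CLK_HZ = 100_000_000
BYTES_PER_UI_WORD = 16


def peak_rate() -> int:
    """Bytes/s if the UI delivered a data word on every ui_clk period."""
    return UI_CLK_HZ * BYTES_PER_UI_WORD


def max_cycles_per_word(target_bytes_per_s: float) -> float:
    """Average ui_clk periods allowed per UI word to sustain the target rate."""
    return UI_CLK_HZ * BYTES_PER_UI_WORD / target_bytes_per_s


def rate_from_cycles_per_word(cycles_per_word: float) -> float:
    """Sustained bytes/s when each UI word costs this many ui_clk periods."""
    return UI_CLK_HZ * BYTES_PER_UI_WORD / cycles_per_word


if __name__ == "__main__":
    print(f"peak: {peak_rate():,} bytes/s")
    budget = max_cycles_per_word(125_000_000)  # 1 GbE payload rate
    print(f"1 GbE budget: <= {budget:.1f} ui_clk periods per UI word")
    measured = 1732039 / 65536  # rtimer / rcount from the baseline run
    print(f"baseline reads: {measured:.1f} periods per word, "
          f"{rate_from_cycles_per_word(measured):,.0f} bytes/s")
    print(f"latency alone (22 periods): "
          f"{rate_from_cycles_per_word(22):,.0f} bytes/s")
```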
If we can issue multiple back-to-back read commands without a gap, then we can receive the read data without gaps, and this improves the overall data throughput. Also, if we can overlap our read and write requests without waiting for all of the data to pass through the MIG DDR PHY, then we can improve performance further. One thing to keep in mind is that for external devices like DDR memory with a bidirectional data bus, there is a performance penalty for turning the FPGA and device drivers off and on without causing bus contention. For DDR memory behind a controller like the MIG this is doubly so. Doing a lot of switching between read and write DDR commands with short data requests isn't going to be very efficient. This is a consequence of how DDR memory works as well as of how the MIG IP works. Unfortunately, the MIG User Guide UG586 isn't very helpful with regard to pipelining and overlapped controller commands, but the MIG example project does offer some guidance. If we add an ILA to the MIG-generated example_top.v module to show the UI signals, then we might glean some insight into how to run the MIG UI for optimum performance. The example project also provides something that we haven't discussed yet: a way to run a simulation in Vivado to examine any signal of interest in the example design. When following an HDL design flow it's absolutely necessary to create testbenches for every module or component in your design. Hopefully, you've wondered about this glaring deficiency in the tutorial. The mig_7series_0_ex example project happens to provide a decent testbench to simulate example_top.v and all of its underlying modules. A DDR controller is a good example of an application that can cause headaches for straightforward simulation. The biggest one is that before you can use the MIG DDR controller you have to get past the calibration phase.
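A rough way to see why pipelining and grouping help is to count ui_clk periods in an idealized model. The Python sketch below is not a model of the MIG itself: it assumes a fixed 22-period read latency, one command accepted per ui_clk when pipelined, 16 bytes per UI word, and it ignores refresh, bank conflicts and app_rdy back-pressure. The read/write turnaround penalty is a hypothetical parameter chosen only to show why interleaving short reads and writes is costly.

```python
# Idealized ui_clk cycle counts: pipelined vs. one-at-a-time reads.
# Assumptions (not measured): ui_clk = 100 MHz, 16 bytes per UI data word,
# fixed read latency, one command accepted per ui_clk when pipelined, and no
# refresh, bank conflicts or app_rdy back-pressure.

UI_CLK_HZ = 100_000_000
BYTES_PER_UI_WORD = 16


def cycles_one_at_a_time(n_reads: int, latency: int) -> int:
    """Wait for each read's data before issuing the next read command."""
    return n_reads * latency


def cycles_pipelined(n_reads: int, latency: int) -> int:
    """Issue reads back-to-back; only the first latency is exposed."""
    return latency + n_reads - 1 if n_reads > 0 else 0


def bytes_per_second(n_reads: int, cycles: int) -> float:
    return n_reads * BYTES_PER_UI_WORD * UI_CLK_HZ / cycles


def direction_switches(ops: str) -> int:
    """Count read/write direction changes in a command string like 'RRWW'."""
    return sum(1 for a, b in zip(ops, ops[1:]) if a != b)


def cycles_with_turnaround(ops: str, turnaround: int) -> int:
    """One cycle per command plus a hypothetical penalty per R/W switch."""
    return len(ops) + turnaround * direction_switches(ops)


if __name__ == "__main__":
    n, latency = 65536, 22
    for name, cycles in (("one at a time", cycles_one_at_a_time(n, latency)),
                         ("pipelined", cycles_pipelined(n, latency))):
        print(f"{name:>13}: {cycles:>9} cycles, "
              f"{bytes_per_second(n, cycles):,.0f} bytes/s")
    # Same 8 commands, interleaved vs. grouped, with an assumed 8-cycle penalty.
    print(cycles_with_turnaround("RWRWRWRW", 8))  # 8 + 7 * 8 = 64
    print(cycles_with_turnaround("RRRRWWWW", 8))  # 8 + 1 * 8 = 16
```

In this model 65,536 pipelined reads get within a fraction of a percent of the 1,600,000,000 bytes/s peak. Real hardware will fall short of that, which is exactly what a hardware testbench is for.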
This can involve simulating milliseconds of actual time, resulting in a long wait before you start seeing actual DDR read and write operations. If the simulation time resolution is very fine (for the MIG testbench it's femtoseconds), you can generate such a large waveform file that using it in the simulator becomes untenable. In ISE, you could set a parameter to bypass calibration altogether and greatly shorten the simulation time and the resulting waveform data file. In Vivado this isn't the case for the MIG IP, but there is a parameter, SIM_BYPASS_INIT_CAL = "FAST", to shorten the calibration phase. The simulation is already set up for you in the MIG example project, so all you have to do is run it. Be aware that init_calib_complete doesn't get asserted until about 106 µs of simulated time, and getting there can take quite a while. Also, the default waveform doesn't include any of the UI signals, so you have to add them yourself. Running the simulation is very important. Not only does it demonstrate some good Verilog techniques, but it also shows you how to add a device simulation model, in this case a general-purpose DDR3 memory model, ddr3_model.sv. When your HDL application involves an external device, such as DDR memory, you have to add a way to simulate the behavior of the external device as well as the logic in your code. The MIG example project simulation also points out a frequent problem with simulation: if the testbench and device model don't match the logic in the bitstream or the behavior of the external device, then your simulation and the hardware won't agree. For a very complicated design such as the MIG example_top.v, with lots of modes to choose from, aligning the simulation with what's running on your hardware can be difficult. Even figuring out what the code is doing can be hard when there are a lot of hierarchical parameters to trace through, and the MIG code passes a lot of parameters down through its hierarchy.
Sometimes the quickest way to see what's going on in the hardware is to use the Vivado debug IP like the ILA and VIO. You definitely want to add an ILA to example_top that provides access to the UI if you are looking for clues to using the MIG UI efficiently. I'll point out that the logic that's generated has a lot to do with the options that were selected when creating the MIG with the IP Wizard. Selecting 'Sequential' as the read burst type had a profound effect on the design as well as on the amount of resources used to create the DDR3 interface. This also has an effect on performance. We also set the number of bank machines to 4, which was the default. The DDR3 device on the Nexys Video has 8 banks, so 8 bank machines is another possible choice. A countervailing effect of selecting IP options that might provide better performance for some applications is that more logic can make timing closure more difficult.

I'm going to end the discussion of part 3 of the tutorial without providing the final answers because it's vital that you work this out for yourself. I've certainly provided enough information and hints to get you where you want to be for your own project. If you thought you could simply drop DDR into your clever design project without first doing some preparation, this tutorial should have changed your opinion.

I've prepared a hardware testbench of sorts to serve as a platform for you to see if you can improve the DDR data throughput over the baseline in NexysVideoDdrDemo, that is, doing single-burst read and write operations. NexysVideoDdrTest.vhd is this platform. It's simpler than NexysVideoDdrDemo because it only uses the ui_clk domain and has no FIFO elastic storage. The design is similar to NexysVideoDdrDemo and even uses the same UART code to report statistics. You will want to create a new project.
Of course, as before, you will need to alter NexysVideoDdrTest.vhd to work with your board, but you've already done that, so this should go pretty smoothly. After creating a project and setting the project to VHDL and your board or FPGA device, you can add NexysVideoDdrTest.vhd, UART_DEBUGGER2.vhd, and YASUTX.vhd as source files. You can also add the constraints file that you made for the demo project; in my case this is NexysVideoDdrDemo.xdc. If you remember from the demo project, we had to re-create our MIG IP from scratch because we wanted to modify it. For the NexysVideoDdrTest project we can simply add ..\NexysVideoDdrDemo\NexysVideoDdrDemo.srcs\sources_1\ip\mig_7series_0\mig_7series_0.xci to our test project because we are going to use it as is. Don't copy that file into your test project; just reference it and re-generate it in the test project. Once all of the source files are in your project, you will need to create an MMCM and an ILA IP that match what's in NexysVideoDdrTest.vhd. Then you must create a FIFO for the REPORT_FIFO_inst component, just as you did for the demo project. Building the bitstream should be straightforward. Once you have a bitstream, configure your FPGA from the Vivado Hardware Manager. Connect your board's UART to a terminal application like PuTTY. The test is pretty simple. It uses the same one-burst-at-a-time reads and writes, but this time at the maximum ui_clk rate and UI data word width. The TEST_SM state machine writes TEST_WORDS UI data words to the DDR, then reads them back. The DATA_CHECK_PROC process checks for errors. There are states in TEST_SM that send five 32-bit words to the UART. The first word is the number of ui_clk periods that it takes to write all of the bytes to the DDR. The second word is the number of UI write data words that were written to the DDR. The third word is the number of ui_clk periods that it takes to read all of the bytes from the DDR.
The fourth word is the number of UI read data words that were read from the DDR. The fifth word is the final error count. TEST_WORDS is set to 65536 by default. For my Nexys Video MIG design each UI data word is 16 bytes, so this works out to 16 x 65536 = 1,048,576 bytes (1 MB). Of course you can change this to anything that you like. To run a test, simply press the hard reset button and then the button referenced in TEST_SM. Here is the output from one actual session on my board (with annotations added for clarity):

00052981 = 338305  --> wtimer --> 0.00338305 seconds
00010000 = 65536   --> wcount --> 1048576 bytes --> 309,949,897 bytes/s to write 1 MB
001A6DC7 = 1732039 --> rtimer --> 0.01732039 seconds
00010000 = 65536   --> rcount --> 1048576 bytes --> 60,539,976 bytes/s to read 1 MB
00000000 = 0       --> errors

So this is the baseline. See how close you can come to the peak data rate for your board and MIG design. For mine, this is 1,600 million bytes/s. You have all of the tools and information that you need. A nice goal for reading DDR data might be 75% of the peak rate, which is 1,200 million bytes/s for my design.
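If you'd rather not do the arithmetic by hand after every run, a small Python helper can decode the five words. This is a sketch that assumes the words are copied from the terminal as hex strings in the order wtimer, wcount, rtimer, rcount, errors, with ui_clk at 100 MHz and 16 bytes per UI data word; adjust the constants for your own MIG configuration. The names decode_report and target_cycles are hypothetical; target_cycles turns the 75% goal into a read timer value to aim for.

```python
# Decode the five 32-bit report words sent by TEST_SM over the UART.
# Assumptions: words are copied from the terminal as hex strings in the order
# wtimer, wcount, rtimer, rcount, errors; ui_clk = 100 MHz; 16 bytes per UI word.
from typing import Dict, Sequence

UI_CLK_HZ = 100_000_000
BYTES_PER_UI_WORD = 16
PEAK_BYTES_PER_S = UI_CLK_HZ * BYTES_PER_UI_WORD  # 1,600,000,000 here


def decode_report(hex_words: Sequence[str]) -> Dict[str, float]:
    """Turn the raw timer/count words into seconds, bytes/s and % of peak."""
    wtimer, wcount, rtimer, rcount, errors = (int(w, 16) for w in hex_words)
    write_rate = wcount * BYTES_PER_UI_WORD * UI_CLK_HZ / wtimer
    read_rate = rcount * BYTES_PER_UI_WORD * UI_CLK_HZ / rtimer
    return {
        "write_seconds": wtimer / UI_CLK_HZ,
        "write_bytes_per_s": write_rate,
        "read_seconds": rtimer / UI_CLK_HZ,
        "read_bytes_per_s": read_rate,
        "read_pct_of_peak": 100 * read_rate / PEAK_BYTES_PER_S,
        "errors": errors,
    }


def target_cycles(n_bytes: int, fraction_of_peak: float) -> float:
    """Hypothetical helper: timer value needed to move n_bytes at that rate."""
    return n_bytes * UI_CLK_HZ / (fraction_of_peak * PEAK_BYTES_PER_S)


if __name__ == "__main__":
    baseline = ["00052981", "00010000", "001A6DC7", "00010000", "00000000"]
    r = decode_report(baseline)
    print(f"write: {r['write_bytes_per_s']:,.0f} bytes/s")
    print(f"read:  {r['read_bytes_per_s']:,.0f} bytes/s "
          f"({r['read_pct_of_peak']:.2f}% of peak), errors = {r['errors']}")
    print(f"75% goal for 1 MB: rtimer <= {target_cycles(1048576, 0.75):,.0f}")
```

For the default 1 MB test, hitting 75% of peak means an rtimer value of roughly 87,381 ui_clk periods, compared with 1,732,039 in the baseline, so the baseline reads run at under 4% of peak.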