
Simple code for DDR3 SDRAM


rappysaha

Question

Hi,

I am very new to the field of FPGAs. I am currently working with the Genesys2 and have to control its DDR3 memory. I found some examples on the Digilent site for DDR3 that use a MicroBlaze processor, but in my case I don't have to use a MicroBlaze. I have to send some fixed value, such as 8-bit data (X"FF"), through the DDR3 memory, i.e. I will write that data into the Genesys2 DDR3 memory and read it back out. I have already gone through the Xilinx manual ug586, but it is still not clear to me how to start coding for the DDR3 memory. My questions are:

1) Is it possible to get example code for the DDR3 memory that does not use a MicroBlaze processor?

Or is there any suggestion for where to start on code to control the DDR3 memory?

I have to get this done one way or another, so any helpful suggestion will be appreciated.

Thank you.


12 answers to this question


@rappysaha,

Wow, I know what you mean!  Those are some wonderful, good, and hard questions to answer.  I'll tell you what I know:

  1. I am working on a project to handle DDR3 SDRAM memory apart from Xilinx's Memory Interface Generator (MIG).  I've been working on that project for about two months now.  If I had it working, I would commend that approach to you.  (I know I just linked to a project on OpenCores ... and yet OpenCores has been down for a couple of days now.  Kind of sad ...)  I've got (what I think is a) nice blog describing my efforts to date.  The last news I have is that my "logic" for controlling the DDR3 SDRAM works in my home-made DDR3 simulator, and that (in the simulator) I get about a 9-clock delay from request to completion, where the MIG generated core takes about 23 clocks from request to completion.  (These are 81.25MHz clock ticks, running on an Arty ...)
  2. Just as a little background:
    1. Every DDR3 memory transfer is 128 bits long.  Sure, the memory supports a 64 bit transfer mode, but that transfer mode still takes as many clocks as the 128-bit transfers, so you don't get any advantage there.  (This assumes a 16-bit bus transfer width, such as the Arty has.  Other bus transfer sizes will scale with the memory width.)
    2. Any transaction to/from the memory that is less than 128 bits is a misnomer--every request that crosses the DDR3 memory bus is 128 bits.
    3. It costs several clocks before you can write to the memory.  If you are familiar with SDRAM, there are 8 banks of memory within each chip.  To read/write memory, it must first be copied from the DRAM to an SRAM--this is called activating the bank.  If the bank was already activated with the wrong "row" of memory, then the bank must be closed, or, as the spec calls it, "precharged" before being activated.  This takes a clock.  Once the bank is activated, you may request to read/write on the memory bus.  The following clock starts a read/write, and the full read/write takes place on the clock following.  In all, a transaction may require one clock to precharge the bank, one clock to activate it on another row, one clock to issue the read/write command, one clock to start the bus going, one clock to transfer the data, and ... Xilinx's MIG adds another 20+ clocks on top of those 5 clocks of bus interactions.  (Keep in mind, the memory clock is going at 4x the speed of the "clock" of your interface.)
    4. The memory is very particular about what clock speeds it can and cannot support.  (This is why my own controller has, to date, been rewritten about five times ...)  The memory clock period cannot be longer than 3.3ns, and on the Arty it cannot be shorter than 3ns (the spec goes much faster ...)  The clock rate of the controller MIG gives you is likely to be 1/4 of the memory clock rate.
    5. The MIG can be hard to configure.  The Digilent how to's and device project files should help you do so.
    6. MIG wants to control your clock.   Therefore, source the clock for your whole design, and indeed your reset as well, from the MIG core.  MIG also wants your external clock input as well as a 200MHz clock input.  I find that I need to go through a PLL to generate these two clocks.  They then need to be passed to the core "unbuffered".
  3. When I finally got frustrated with my efforts above, I built a Wishbone to AXI4 bridge.  (I've since moved this project to github from OpenCores, as you may notice from the link address ...)  This project is very similar to a prior wishbone to AXI3 bridge written by another great, with the one exception that it pipelines memory accesses to the extent the MIG and AXI4 allows.  This means that you can issue one read (or write) command per clock, and let the memory deal with things.  Xilinx does require, when sending "pipelined" requests across their AXI4 bus, that your request cannot cross a 4kB boundary.  (You'll have to start a new request at the boundary--this is what the documentation says, I haven't tried it in practice ...)
  4. If you wish to look into the pipelined bridge I mention above ...
    1. You'll need to understand a touch about the pipelined mode within the Wishbone B4 specification.  OpenCores is down, or I'd point you to that spec ... so let me give you a couple details.  To initiate a transaction, raise the CYC and STB lines, while setting the address, write-enable, and data (if it's a write transaction).  The transaction request has been made on the same clock that STB is high and STALL (from the memory peripheral) is low.  Once you've finished requesting the transactions you wish, drop the STB line.  The transaction is complete when the ACK line goes high.  On that clock, if you requested data, the data is returned to you.  You'll then want to drop the CYC line.  (Go ahead and read the AXI4 spec--it's not nearly that simple ...)
    2. The bridge core (above) uses a 6-bit transaction identifier, and a 128-bit transaction width within MIG.  You'll need those numbers as you generate the core.  It also supports natural (rather than strict) ordering.  (If you select strict ordering, the portion of the bridge core that handles it ... isn't quite ready yet.  Use the non-strict bridge option--it'll cost you some extra logic and an extra clock, but it'll work.  If you really want strict ordering, you can help me get the 10 lines of code needed to get the strict ordering code working ...)
    3. If you only wish to write 8-bits, you'll need to still fill out at least 32-bits of a transaction on the bridge core.  You can use the SEL line to select which byte within the 32-bits you give it is actually written.  Alternatively, you can gather your write requests until they fill out a 32-bit word and write them then.  Still, the memory transaction itself is 128-bits, so the 8-bits you write will turn into a 128-bit transaction--even if all you wish to write is 8-bits.
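A minimal VHDL sketch of the pipelined Wishbone handshake described in the list above, writing a single byte through a 32-bit bus by way of the SEL lines. The entity name, the 26-bit address width, the `i_start`/`o_done` ports, and the fixed address of zero are all assumptions made for illustration; they are not taken from the bridge core itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-write Wishbone B4 (pipelined) master.
-- Writes the byte X"FF" to byte lane 0 of word address 0.
entity wb_byte_write is
  port (
    i_clk      : in  std_logic;
    i_reset    : in  std_logic;
    i_start    : in  std_logic;                      -- pulse to request one write
    o_done     : out std_logic;                      -- one-clock pulse on ACK
    o_wb_cyc   : out std_logic;
    o_wb_stb   : out std_logic;
    o_wb_we    : out std_logic;
    o_wb_addr  : out std_logic_vector(25 downto 0);  -- width is an assumption
    o_wb_data  : out std_logic_vector(31 downto 0);
    o_wb_sel   : out std_logic_vector(3 downto 0);
    i_wb_stall : in  std_logic;
    i_wb_ack   : in  std_logic
  );
end entity wb_byte_write;

architecture rtl of wb_byte_write is
  signal cyc : std_logic := '0';
  signal stb : std_logic := '0';
begin
  o_wb_cyc  <= cyc;
  o_wb_stb  <= stb;
  o_wb_we   <= '1';              -- this sketch only ever writes
  o_wb_addr <= (others => '0');
  o_wb_data <= x"000000FF";      -- the byte of interest sits in bits 7..0
  o_wb_sel  <= "0001";           -- only byte lane 0 is written

  process (i_clk)
  begin
    if rising_edge(i_clk) then
      o_done <= '0';
      if i_reset = '1' then
        cyc <= '0';
        stb <= '0';
      elsif cyc = '0' then
        -- Idle: raise CYC and STB together to start a transaction.
        if i_start = '1' then
          cyc <= '1';
          stb <= '1';
        end if;
      else
        -- The request is accepted on the clock where STB is high
        -- and STALL is low; drop STB after that.
        if stb = '1' and i_wb_stall = '0' then
          stb <= '0';
        end if;
        -- The transaction completes on ACK; drop CYC.
        if i_wb_ack = '1' then
          cyc    <= '0';
          stb    <= '0';
          o_done <= '1';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Even with only one SEL bit set, the write still becomes a full 128-bit transaction on the memory side, as noted above.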

If you haven't figured it out, this solution is not the one on the beaten path.  It works, though, and I'll be happy to discuss it further if you would like.

Dan


@D@n

Hi D@n,

I tried to follow your code, but as I am new it is not easy for me to follow the whole thing. Besides, I think I only want to use the MIG IP and not the AXI4 interface, as I am not using any soft-core processor in my project. DDR3 is only one part of my project, and I want to control it for simple data transfers. So if you could give any suggestion about how to start with the MIG IP, it would be very helpful. I have already gone through the user guide and the example. I have also uploaded my code here. Any suggestion will be very helpful. Thank you.

 

Hi @jpeyron

Digilent has example code for other interfaces, like VGA, and those examples are very easy to follow. In the case of DDR3, though, I think the examples are not so clear. I followed the site that you referred to, but I need something more specific if possible. Anyway, thank you.

 

MIg write and read.txt


@rappysaha,

I was afraid that would happen.  Okay, let's work with the user interface.

Do you have Xilinx's ug586, "7 Series FPGAs Memory Interface Solutions: User Guide"?  I'll admit it's not very comprehensible, but ... it's what we have to work with.

Looking at the guide, and your code, let me offer some pointers:

  1. Be careful of setting your state at the very first line of your process.  This isn't computer code, where the state is set on the first clock and then set to something else.  Still, having said that, it looks like your state machinery would still work for what you want--no corrections are necessary.
  2. Xilinx AXI documentation talks a lot about the ready signal and the enable signal, and concerns particular race conditions.  They recommend setting the enable signal before checking the ready signal, lest some race condition occur.  In your code, you wait for the ready signal on the write line before setting the enable signal.
  3. On page 156 of Xilinx's document, they show three potential write timing relationships compared to the command relationship.  I would recommend you use #1 or #2, rather than #3, because of this warning they give.
  4. Be aware of the condition whereby the app_rdy signal is high, but wdf_rdy is not or vice versa.  With your code as written, you might find yourself issuing a whole bunch of commands, but with no data to go with them 'cause their data fifo wasn't ready.
  5. You will need to assert the memory burst "END" command when you hit the last byte in a 32-byte group.  This could be the first data byte you send, if its address bits end in 2'b11.  The group is not defined by how much data you wish to send in total, but (at least as I understand it) by how much data will cross the interface.
  6. You haven't set the 'mask' bits (app_wdf_mask) anywhere in your code.  You'll probably want to make sure those are explicitly set to zero.  These bits allow write commands to only affect certain bytes in the memory.  If I understand correctly, any bit where the mask is '1' corresponds to a byte that is not written, whereas a mask bit of zero corresponds to a byte that is written.
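The pointers above can be sketched as a small state machine on the MIG user interface. This is only a sketch under stated assumptions, not a tested design: it assumes the core was generated in 4:1 mode (so one user-interface word is one full burst and app_wdf_end is asserted together with app_wdf_wren), and the entity name, the `write_done` port, and the 29-bit address and 256-bit data widths are placeholders that must be replaced by the values from your own generated core. It pushes the write data first and issues the command afterwards, which corresponds to the "data before command" timing case, and it raises each enable and then holds it until the matching ready is seen.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-shot writer for the MIG user interface.
-- Writes all ones (every byte X"FF") to address 0 once calibration is done.
entity mig_single_write is
  generic (
    ADDR_WIDTH : positive := 29;   -- assumption: copy from your MIG core
    DATA_WIDTH : positive := 256   -- assumption: copy from your MIG core
  );
  port (
    ui_clk              : in  std_logic;
    ui_clk_sync_rst     : in  std_logic;
    init_calib_complete : in  std_logic;
    app_rdy             : in  std_logic;
    app_wdf_rdy         : in  std_logic;
    app_en              : out std_logic;
    app_cmd             : out std_logic_vector(2 downto 0);
    app_addr            : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    app_wdf_wren        : out std_logic;
    app_wdf_end         : out std_logic;
    app_wdf_data        : out std_logic_vector(DATA_WIDTH-1 downto 0);
    app_wdf_mask        : out std_logic_vector(DATA_WIDTH/8-1 downto 0);
    write_done          : out std_logic
  );
end entity mig_single_write;

architecture rtl of mig_single_write is
  type state_t is (WAIT_CALIB, PUSH_DATA, SEND_CMD, FINISHED);
  signal state : state_t := WAIT_CALIB;
begin
  app_cmd      <= "000";             -- write command
  app_addr     <= (others => '0');
  app_wdf_data <= (others => '1');
  app_wdf_mask <= (others => '0');   -- mask bit 0 = byte is written

  -- Each enable is raised on entry to its state and held until the
  -- matching ready is seen; it is never gated on ready beforehand.
  app_wdf_wren <= '1' when state = PUSH_DATA else '0';
  app_wdf_end  <= '1' when state = PUSH_DATA else '0';
  app_en       <= '1' when state = SEND_CMD  else '0';
  write_done   <= '1' when state = FINISHED  else '0';

  process (ui_clk)
  begin
    if rising_edge(ui_clk) then
      if ui_clk_sync_rst = '1' then
        state <= WAIT_CALIB;
      else
        case state is
          when WAIT_CALIB =>
            if init_calib_complete = '1' then
              state <= PUSH_DATA;
            end if;
          when PUSH_DATA =>
            -- Data word is taken on the clock where wren and wdf_rdy are high.
            if app_wdf_rdy = '1' then
              state <= SEND_CMD;
            end if;
          when SEND_CMD =>
            -- Command is taken on the clock where en and rdy are high.
            if app_rdy = '1' then
              state <= FINISHED;
            end if;
          when FINISHED =>
            null;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Keeping the data and command steps in separate states costs a clock or two, but it avoids the situation in point 4 where commands are accepted while the write data FIFO is not ready.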

These are just some observations.  They come with no guarantee that, should you follow them, your code would work.  :P

Dan


Hi @D@n,

Thank you for your helpful suggestions. I really need them. The full read and write procedure may be large, so I have reduced it to a small part.

First, I want to see that when I drive app_cmd, app_addr and app_en, I get app_rdy back, as in the attached figure (3). I have attached my code as well. I ran a behavioral simulation using the force clock option, but app_rdy never goes high (fig. 4). However, when I load the code onto the board, I see LED (2) turn on, since I put the following logic in my code:

process (ui_clk)
begin
    if rising_edge(ui_clk) then
        app_cmd  <= "000";            -- "000" is the MIG write command
        app_addr <= app_addr + '1';
        app_en   <= '1';
        if (app_rdy = '1') then
            led <= X"02";             -- light LED 2 once app_rdy is seen high
        end if;
    end if;
end process;

Is there any way I can check it in real time? If you can provide any material (for real-time simulation), it will be very helpful. Any suggestion will be appreciated.

Rappy

 

MIg write and read.txt

3.PNG

4.PNG


@rappysaha,

A couple things I noticed:

  1. You don't want to increase your address unless the last command was accepted.  (i.e. app_rdy was high)
  2. Even if you don't want to use AXI4, you still might wish to take a look at that wb2axi converter.  One of the things it handles nicely is the various ready lines--both on the command bus, as well as the write bus associated with it.  Look at the o_axi_awvalid and o_axi_wvalid lines as examples--the first is the command valid, the second is the write bus valid.  As for the core, a new value is accepted by the core anytime (i_wb_stb)&&(!o_wb_stall) is true, so that should explain those lines within that logic.
  3. Be aware that it takes quite a bit of time for the MIG to start up.  I think this is why the MIG likes to output a reset signal--so it can hold your logic in reset until the MIG has started.  (We're talking about a ms here.  First MIG has to hold the DDR reset down for 200us, then it must clock it with the clock enable line held low for 500us, etc.--nothing you need to worry about, save that this must take place.)
  4. If starting the MIG were painful, you also need to be aware that the MIG itself will go out to lunch every now and then (roughly every 7.8us) so that it can refresh its memory.  During that period of time, the memory will be unavailable.  By my calculation, that's about a 57 clock penalty--but who knows how Xilinx actually implemented their MIG?
  5. I cannot comment on how well Xilinx's simulator accurately simulates a DDR3 memory at all.  I just don't know.
  6. While I have code that will simulate a DDR3 SDRAM, it doesn't have the interface you are working with and the interface it does have  ... doesn't yet work on any Xilinx chips.  (sorry--it's just another work in progress)
  7. As for checking in realtime, and on the hardware itself instead of via simulation--this is really what you want to do.  I highly recommend doing this.  All of my projects have included some kind of checking in real time into them, so I can see what is going on within the project.  I've debugged interactions with the ICAPE interface, QSPI flash, DDR3 memory, and now I'm working on the Arty's network card--all using this sort of approach.  I'd highly recommend it to you.  Can I say that again? 
  8. I'm sure others here on this forum can describe to you how to use Xilinx's ChipScope for that purpose.  I personally have never used it.  It might be simpler than what I'm about to discuss and propose to you.
  9. To see what's going on within the hardware, I use what I call a "wishbone scope".  The scope records until a trigger plus a programmable number of clocks.  So, for example, you might want to look for what happens when you assert the enable line and start recording there.  (Set the programmable delay to the size of the scope's memory)  Alternatively, you might wish to stop recording when an error condition takes place (set the delay to zero)--or later to pan through your logic from a start condition to ... however much later.  That said, doing this requires a lot of ... preliminary stuff to work.  You will need a means of communicating with your board and read/writing to the wishbone bus that the scope is parked on.  (It only takes a 1-bit address line, so even if you don't park it on a wishbone bus properly, the interface is fairly simple--but you'll still need to communicate with it from something external.)
  10. You can see some of the projects I have where I do this on GitHub.  There's a project using a XuLA2-LX25 board, one using a CMod-S6, and I'm now working on one using the Arty platform (this one uses the MIG, but via the AXI4 interface).  If you wish to cut/copy/paste, I'll warn you: none of these projects are simple, and the Arty one is still a work in progress.  My basic design works as follows: A host computer communicates via a standard protocol with the board.  The protocol was built so that I might use it no matter what the board's interface, even over PCIe if necessary.  I typically build a basic host program wbregs, just to read and write addresses on the board (like the scope configuration address) from the command line.  When that gets old, I build a C++ file to do what I need--such as reading from the scope.  From the RTL side, check out all the RTL files beginning with wbu--the top one is wbubus.v.  The interface is generic enough to be able to be run from a UART (the Arty), a JTAG/User command (the XuLA2-LX25), or even the Digilent's parallel DEPP interface.  Of course, my fear with even mentioning these is that they could easily overwhelm you like I did earlier in this thread.  (I would be overwhelmed personally ...)  At the same time, copying from such a project might be one way you could get started quickly--so I'll let you be the judge.
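Points 1 and 3 of the list above can be sketched as a variation on the process posted earlier: the address only advances on a clock where the command was actually accepted, and nothing is issued until the MIG reports that calibration is complete. This is a sketch under assumptions, not a drop-in fix: it issues read commands ("001") so that no write data is needed, and the entity name, the 29-bit address width, and the address step of 8 per command are placeholders to be checked against your own generated core.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical read-command generator for the MIG user interface.
entity mig_read_addr_gen is
  generic (
    ADDR_WIDTH : positive := 29;  -- assumption: copy from your MIG core
    ADDR_STEP  : positive := 8    -- assumption: address step per burst
  );
  port (
    ui_clk              : in  std_logic;
    ui_clk_sync_rst     : in  std_logic;
    init_calib_complete : in  std_logic;
    app_rdy             : in  std_logic;
    app_en              : out std_logic;
    app_cmd             : out std_logic_vector(2 downto 0);
    app_addr            : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    led                 : out std_logic_vector(7 downto 0)
  );
end entity mig_read_addr_gen;

architecture rtl of mig_read_addr_gen is
  signal en_r   : std_logic := '0';
  signal addr_r : unsigned(ADDR_WIDTH-1 downto 0) := (others => '0');
begin
  app_en   <= en_r;
  app_cmd  <= "001";                      -- read command
  app_addr <= std_logic_vector(addr_r);

  process (ui_clk)
  begin
    if rising_edge(ui_clk) then
      if ui_clk_sync_rst = '1' or init_calib_complete = '0' then
        -- Hold everything idle until the MIG has finished starting up.
        en_r   <= '0';
        addr_r <= (others => '0');
        led    <= x"00";
      else
        en_r <= '1';
        -- A command is accepted only on a clock where both app_en and
        -- app_rdy are high; advance the address on those clocks only.
        if en_r = '1' and app_rdy = '1' then
          addr_r <= addr_r + ADDR_STEP;
          led    <= x"02";
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

During refresh or any other stall, app_rdy drops and the same address is simply held on the bus until the MIG takes it.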

Let me know if this helps, and we can go from there,

Dan


hello @rappysaha and @D@n

Is there any progress on this question?

I'm also working with the 7 series MIG to communicate with DDR3. I am able to get the init_calib_complete signal high, and all other signals look correct, but I am not able to write and read back the required data.

I simply instantiate the MIG IP in a Vivado testbench example.

@rappysaha, where did you write the code mentioned above (in a testbench?), and does it work (did you receive the data correctly from the address)?

And @D@n, would you please elaborate on this:

"Be aware that it takes quite a bit of time for the MIG to start up.  I think this is why the MIG likes to output a reset signal--so it can hold your logic in reset until the MIG has started.  (We're talking about a ms here.  First MIG has to hold the DDR reset down for 200us, then it must clock it with the clock enable line held low for 500us, etc.--nothing you need to worry about, save that this must take place.)"

Thank you.

 


The Spartan 6 family had hard external memory controllers in most of the devices but you could also do a soft-controller in logic. For Series7 devices there is a loud absence of any mention of such a thing. Curiously, all Series7 devices have at least 1 PCIe hard controller, except Spartan7. There aren't many vendors offering cheap Artix boards that let you use the PCIe, or transceivers for that matter. I guess that Xilinx decided that the Series7 devices were so good that there was no need for a hard external memory controller. That's too bad as they also push the idea that every design has to have a MicroBlaze, which of course uses the ubiquitous DDR memory.

You shouldn't have a soft-processor in your design just to use the DDR memory on your board. Unfortunately, there aren't many FPGA development board vendors that want to help you do proper HDL designs, particularly ones that use external memory.  The two exceptions that I know of are Numato Labs and Opal Kelly. My opinion is that it's worth the cost of admission to buy a low cost FPGA board from a vendor that wants to make using the functionality that you've paid for as easy as possible.

I do suggest that you use the Mig IP as a basic External Memory controller and add whatever is needed to complete your design. What will be needed is a few state machines to do basic read and write operations to the controller. This is true regardless of the FPGA vendor external memory controller or their IP. You don't want to use AXI for pure HDL designs ever... unless you are connecting to an AXI bus as would be the case for ZYNQ PS/PL designs. You do want to implement the Mig using Verilog. I've done this for numerous boards and simply instantiate the Verilog modules into my VHDL design. You can't simulate those so you'll want to confirm the DDR functionality using all Verilog so that it can be simulated. The Mig IP has parameters to short-circuit the calibration phase to make the simulation length tolerable. The design that links your FPGA HDL application to the external memory controller is very much tied to the needs of your HDL application.
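As an illustration of the kind of small state machine meant here, the following is a minimal VHDL sketch of the read side: it issues one read command on the MIG user interface, waits for the data to come back, and compares it against an expected pattern. It is a sketch under assumptions rather than a working board design. The entity name, the `start`, `done` and `match` ports, the address of zero, the all-ones expected pattern, and the default widths are all hypothetical, and it assumes that a write to the same address has already completed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-shot read-and-compare on the MIG user interface.
entity mig_single_read_check is
  generic (
    ADDR_WIDTH : positive := 29;   -- assumption: copy from your MIG core
    DATA_WIDTH : positive := 256   -- assumption: copy from your MIG core
  );
  port (
    ui_clk            : in  std_logic;
    ui_clk_sync_rst   : in  std_logic;
    start             : in  std_logic;  -- e.g. "write finished" from your logic
    app_rdy           : in  std_logic;
    app_rd_data       : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    app_rd_data_valid : in  std_logic;
    app_en            : out std_logic;
    app_cmd           : out std_logic_vector(2 downto 0);
    app_addr          : out std_logic_vector(ADDR_WIDTH-1 downto 0);
    done              : out std_logic;
    match             : out std_logic
  );
end entity mig_single_read_check;

architecture rtl of mig_single_read_check is
  type state_t is (IDLE, SEND_CMD, WAIT_DATA, FINISHED);
  signal state : state_t := IDLE;
  -- Pattern assumed to have been written earlier (every byte X"FF").
  constant EXPECTED : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '1');
begin
  app_cmd  <= "001";                -- read command
  app_addr <= (others => '0');
  app_en   <= '1' when state = SEND_CMD else '0';
  done     <= '1' when state = FINISHED else '0';

  process (ui_clk)
  begin
    if rising_edge(ui_clk) then
      if ui_clk_sync_rst = '1' then
        state <= IDLE;
        match <= '0';
      else
        case state is
          when IDLE =>
            if start = '1' then
              state <= SEND_CMD;
            end if;
          when SEND_CMD =>
            -- Hold app_en until the command is accepted.
            if app_rdy = '1' then
              state <= WAIT_DATA;
            end if;
          when WAIT_DATA =>
            -- Read data arrives some clocks later, flagged by data_valid.
            if app_rd_data_valid = '1' then
              if app_rd_data = EXPECTED then
                match <= '1';
              else
                match <= '0';
              end if;
              state <= FINISHED;
            end if;
          when FINISHED =>
            null;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Driving `match` and `done` to LEDs gives a quick pass/fail indication on hardware before anything more elaborate is built around the controller.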

This is a 5 year old thread. I find it irksome that anyone would post exhaustive answers to such a post that mostly talks about all of the stuff they've done that has nothing to do with a real Verilog DDR design that works on a particular board. Complaining about latencies isn't very interesting when all that you've done with (supposedly) better performance in that area doesn't actually work. I'd much rather read over a posted design with source in the Project Vault than read about ideas that might or might not work. But if what you are selling is a blog then that's another matter.

For those wanting to learn how to use external memory I've had experience with the Mimas-A7 from Numato Labs and a few of the Opal Kelly boards. These vendors are more willing to provide demo source code to make their board offerings more usable. Once you see how a demo project works you can go on to doing your own custom DDR designs. For Intel FPGA boards Terasic provides enough information to figure out how to tie the Qsys memory controllers to logic designs. The older Cyclone V parts have hard external memory controllers. If your device has one of these you should use it, not only to free up resources and routing paths but for the best performance.

Selecting a board and then trying to make it do what you want to do for a particular project is tough sledding. The same goes for selecting a vendor. Do some homework to see what kind of support they offer before making a purchase. Unfortunately, it's been my experience that hoping for any particular board to be a good platform for some future, as yet un-imagined project, to go well is a rare thing indeed. One nice thing about the typical Digilent FPGA board is that there are numerous external devices like Ethernet and video that make them better general purpose platforms, or for learning projects. I'd be happier if they didn't toe the MicroBlaze connectivity line favored by Xilinx and provided good solid HDL support for their boards, but... I'm not holding my breath.


@lukum,

I'm going to agree with @zygot here, "This is a 5 year old thread."  It would make more sense to start new questions in a new thread.

The statement you are asking about above follows directly from the DDR3 specification from JEDEC.  Feel free to look it up if you have more questions.

Dan


Perhaps I'm missing something here. The thread is asking the question "Why is it so hard to do a design using the DDR memory on my board if I don't want to burden it with using resources for unnecessary things like a soft-processor IP?". This isn't by far an unusual question posted to this site. Asking the same question in a different thread just confuses those who are asking the same question and trying to find answers. As to the why, I have opinions. As to the how I've provided a reasonable, if not quite satisfying, answer.

I believe that some FPGA board vendors don't have financial incentives, like preferential pricing for components, so they are free to offer 'better' support. That opinion isn't based on any inside information... just decades of dealing with large semiconductor companies through the prism of a wide range of companies and organizations.


@zygot,

You and I are reading this question very differently.  It started out as, can I get a design that uses DDR3 without microblaze.  I think that question has already been answered.  The new question is how to get a design containing a MIG based DDR3 controller to operate in a test bench.  That's really a separate question.

Dan


2 hours ago, D@n said:

You and I are reading this question very differently. 

Yup. I'm not sure anyone simply wants to go to the bother of simulating a DDR connected interface without implementing it on hardware. As far as simulation goes, I've done this for the ATLYS using a Spartan 6 and a Bus Functional Model for the memory device that, as I recall, came from the memory vendor. As to finding a BFM for any old DDR3 device you might be stuck with, I haven't found this to be a sure thing. Perhaps someone will chime in and clear that up.

I don't see simulation as separate from design,  but one phase of the design. More to the point verification is part of the design process as I see it.


Archived

This topic is now archived and is closed to further replies.
