Cmod A7 SRAM project INSIDE


Recommended Posts

Hi guys,

Some time ago, I asked here for a SRAM controller for the external SRAM of the Cmod A7 FPGA board.

I've decided to make it myself. I've made a very basic and simple project. Next month, I will publish a much more precise and aggressive controller.

I've documented everything and made a full project to test the SRAM in a friendly way.

All the details (video, code, etc) here: https://www.hackster.io/salvador-canas/a-practical-introduction-to-sram-memories-using-an-fpga-i-3f3992

 

Link to post
Share on other sites
19 hours ago, Juan said:

Some time ago, I asked here for a SRAM controller for the external SRAM of the Cmod A7 FPGA board.

I've decided to make it myself.

Somewhere between those statements you forgot to mention my reply to your request pointing to code examples already posted here. Perhaps you didn't see it. Regardless, I'm happy that you made you own design. I think that the spirit of the Project Vault is to provide a bit of guidance and encouragement to beginners by posting code samples here. I'd hate to see it be over-run with click-bait, even if it's as innocent as fishing for for-profit views. I don't decide but I certainly can post my viewpoint and decide whether or not this is a forum that I will want to use in the future.

Edited by zygot
Link to post
Share on other sites

Hi,

generally I appreciate enthusiasm. But the amount of text you have written, including transistor-level SRAM cell schematics, seems grossly disproportionate to the actual length of Verilog code :)

Seriously, you have not implemented a "controller". Not in any conventional meaning of the word.
Now that's not very diplomatic of me. Still, I'm writing this in the spirit of constructive feedback. Harsh words, but then one of the worst things is people telling "you're doing great" when you are not.

How should I start... Think of writing a "controller" as composing a symphony. It may be a long one or a short one, loud or quiet, a good one or a bad one but it's definitely not something I'd try in my first week at music school. Why, because it better be perfect. It's supposed to turn a complex device into a simple one, anticipate all the problems and make sure they don't happen.

Example for "problems": your module implements combinational routing of address, outbound data and write-enable signal directly to the chip. The chip requires (datasheet, drawing page 14 and table page 12) that address and data are valid at least up to the deassertion of enable, or longer (very strictly, this is stated as t_ha/t_hd = 0 ns). For your "controller", the output timing is totally at the mercy of the driving signals.

And it even gets worse: For the FPGA design tool "synchronous logic" world, only the level at an associated clock edge matters. What the signal does in-between is don't-care. Not being aware of this is a fundamental misconception of modern "synchronous logic", and suddenly straightforward and systematic engineering becomes black arts. Here is the long story.
In your design, it means for example the write-enable signal may go "blip!" between clock cycles, courtesy of the outside circuitry. And this is totally OK for the design tools. But not so for the SRAM, which will randomly write garbage.

Edited by xc6lx45
Link to post
Share on other sites
On 10/30/2018 at 9:35 PM, xc6lx45 said:

Hi,

generally I appreciate enthusiasm. But the amount of text you have written, including transistor-level SRAM cell schematics, seems grossly disproportionate to the actual length of Verilog code :)

Seriously, you have not implemented a "controller". Not in any conventional meaning of the word.
Now that's not very diplomatic of me. Still, I'm writing this in the spirit of constructive feedback. Harsh words, but then one of the worst things is people telling "you're doing great" when you are not.

How should I start... Think of writing a "controller" as composing a symphony. It may be a long one or a short one, loud or quiet, a good one or a bad one but it's definitely not something I'd try in my first week at music school. Why, because it better be perfect. It's supposed to turn a complex device into a simple one, anticipate all the problems and make sure they don't happen.

Example for "problems": your module implements combinational routing of address, outbound data and write-enable signal directly to the chip. The chip requires (datasheet, drawing page 14 and table page 12) that address and data are valid at least up to the deassertion of enable, or longer (very strictly, this is stated as t_ha/t_hd = 0 ns). For your "controller", the output timing is totally at the mercy of the driving signals.

And it even gets worse: For the FPGA design tool "synchronous logic" world, only the level at an associated clock edge matters. What the signal does in-between is don't-care. Not being aware of this is a fundamental misconception of modern "synchronous logic", and suddenly straightforward and systematic engineering becomes black arts. Here is the long story.
In your design, it means for example the write-enable signal may go "blip!" between clock cycles, courtesy of the outside circuitry. And this is totally OK for the design tools. But not so for the SRAM, which will randomly write garbage.

You're right. Thank you very much for your comments. I appreciate them very much.

I've decided to make a more formal controller. Please take a look and share with me your comments: https://github.com/salcanmor/SRAM-tester-for-Cmod-A7-35T/blob/master/basic controller v2/sram_ctrl3.v

I think this new version is a good starting point, isn't it? and now I'm going to work in making this controller more aggresive to be mucher closer to the timing parameters of the datasheet. Probably I will use the cloking wizard IP, to create a 100MHz, since I cannot be aggresive with a 12MHz clock.

Finally, I will use the oscilloscope to demostrate how fast my controller is.

Link to post
Share on other sites

Hi,

I just had a very quick look. This is maybe more about style and opinions, but is there a reason to use a blocking assignment for the state variable?

To me as a human reader, this code

if(~start_operation)
  state_reg = idle;

suggests that the new value of state_reg would still be important for code further down during the same clock cycle. Functionally, it should be the same, though.

Link to post
Share on other sites

You could move the body of wr0 and rd0 into idle state.

As an optimization, this would reduce the write cycle time from 5 to 4 (+20 % throughput).

But, more importantly: Right now, "busy_signal_output" comes up only two cycles after start_operation. If the surrounding circuitry isn't aware of this, it might check in the next cycle, see no "busy" and assume the operation is complete.

edit: and mixing bitwidths 3, 4 and 5 (4:0=5) in the "localparam" statement seems dangerous.

 

Edited by xc6lx45
Link to post
Share on other sites
  • 2 weeks later...
On 11/7/2018 at 6:59 PM, xc6lx45 said:

You could move the body of wr0 and rd0 into idle state.

As an optimization, this would reduce the write cycle time from 5 to 4 (+20 % throughput).

But, more importantly: Right now, "busy_signal_output" comes up only two cycles after start_operation. If the surrounding circuitry isn't aware of this, it might check in the next cycle, see no "busy" and assume the operation is complete.

edit: and mixing bitwidths 3, 4 and 5 (4:0=5) in the "localparam" statement seems dangerous.

 

I've designed an improved version. It is the most efficient SRAM controller that can be done. It has an FSM with only 3 states. If possible, I would like to hear your opinion.

Thanks in advance.

Link: https://github.com/salcanmor/SRAM-tester-for-Cmod-A7-35T/blob/master/basic controller v2/sram_ctrl5.v

Link to post
Share on other sites

yes, this looks like what I'd expect.

Purely regarding style (opinion!), this

if(~start_operation)
              state_reg <= idle;
else begin

is redundant since state_reg retains its value until overwritten.

Same with this...

if(rw) begin
	address_to_sram_output[18:0]<=address_input[18:0];
...
end else  begin
	address_to_sram_output[18:0]<=address_input[18:0];
...                                                     

which puts the task of removing redundancy on the reader.

Both could be a coding style - spell out everything explicitly - that seems more appropriate for a much larger project where you immediately see the patterns (opinion).

-------------------------------------------------------------------

I'm wondering whether it is actually possible to solve the problem stateless, just using constraints If I specify Vivado a minimum delay for e.g. an enable signal relative to data, it will introduce dummy buffers. Works on a nanosecond time scale, which is exactly what this chip needs.

If anybody knows the answer or has done the exercise, I'm curious.

Link to post
Share on other sites
15 hours ago, xc6lx45 said:

yes, this looks like what I'd expect.

Purely regarding style (opinion!), this

if(~start_operation)
              state_reg <= idle;
else begin

is redundant since state_reg retains its value until overwritten.

Same with this...

if(rw) begin
	address_to_sram_output[18:0]<=address_input[18:0];
...
end else  begin
	address_to_sram_output[18:0]<=address_input[18:0];
...                                                     

which puts the task of removing redundancy on the reader.

Both could be a coding style - spell out everything explicitly - that seems more appropriate for a much larger project where you immediately see the patterns (opinion).

-------------------------------------------------------------------

I'm wondering whether it is actually possible to solve the problem stateless, just using constraints If I specify Vivado a minimum delay for e.g. an enable signal relative to data, it will introduce dummy buffers. Works on a nanosecond time scale, which is exactly what this chip needs.

If anybody knows the answer or has done the exercise, I'm curious.

you mean something like: https://github.com/salcanmor/SRAM-tester-for-Cmod-A7-35T/blob/master/basic controller v2/sram_ctrl6.v

I've checked the resource utilization in both designs (sram_ctrl6.v and sram_ctrl5.v) and it is exactly the same. However, for some reason that I don't know, when I move out address_to_sram_output[18:0]<=address_input[18:0] just like you said, the initial message of my design get spoiled.

This message is showed just after the bitstream is loaded (sram_ctrl6.v).

image.png.ddc8736226cc01ed8e3fb23950ae1e10.png

 

The right message is (sram_ctrl5.v).

image.png.5cd99ac0f3739c0028a2fe6c22574b2f.png

 

Link to post
Share on other sites
On 11/16/2018 at 4:38 PM, Juan said:

It is the most efficient SRAM controller that can be done. It has an FSM with only 3 states.

I always wince when someone declares an effort to be 'unbeatable' in some way. Not sure what you mean by 'most efficient'. Are you thinking.. lines of Verilog? Least number of LUTs or registers used? Smallest implemented area?

What counts in the end is that you can read and write the memory without bus contention and meet all of the timing requirements. The higher you clock your FSM the more granularity you get in timing control. Higher clock rates have there own baggage. Bus contention should be avoided for many reasons and your analysis has to extend through the output drivers on both devices. One measure of efficiency that's interesting would be minimizing the SRAM write/read cycle time that you can achieve with a guarantee of no bus contention; assuming that you need to maximize random read/write accesses rather than block read/write accesses. Get where I'm heading with this? Optimizing one element of performance for one project might not fit the needs of another project. It's hard to see that for something simple like using the CMOD-A7 SRAM.  What you'll eventually find when doing complicated FPGA designs is that modules that behave perfectly as intended in trivial designs often don't when integrated into those complicated ones. I mention it, in the context of maximizing performance,  because sometimes going for self-satisfying design goals can bite you in unpleasant places when you thought that they were done and perfect and need them later on. 

Aside from bragging rights ( some of us don't need to do much to brag ) I can't imagine a project that would work with the 'bestest' anything but not with something 80% 'bestest'. Nonetheless, I appreciate the desire to do things well and an SRAM interface is a good starting project.

Link to post
Share on other sites

BTW, just mentioning this: Since you have the code already on git, it seems odd to maintain multiple files with numbers.

Alternatives:

For example, it enables a one-click "diff" between versions (which, in this example, would be extremely helpful)

I can only encourage people to spend a day or two with git basics. Version management is as critical as coding, if much smaller in scope for the average user.

Link to post
Share on other sites
17 hours ago, xc6lx45 said:

>> However, for some reason that I don't know

I do know :) but you get one more opportunity to find it.
Hint, don't stare at the code for too long or it'll bite you (it's very simple).

  

I'm trying to figure out the reason, but I can't. I've run a functional simulation for both controllers, and the behaviour is the same.

it's not simple for me :( 

I've even tried things like:

if(start_operation)begin
  address_to_sram_output[18:0]<=address_input[18:0];

  if(rw) begin
    state_reg <= rd0;
  end

  else  begin
    register_for_writing_data[7:0]<=data_f2s[7:0];
    state_reg <= wr0;
  end

end

What's the reason for that error?

Link to post
Share on other sites

What I meant is that the old version updates address_input only on start_operation, but the new one updates it continuously.

Thinking about it, I'm not sure if this really is the root cause. The version from your last post would seem the most logical, though (for readability, not functionality).

I'm in a bit of a hurry right now, but I'd check all warnings. Can I explain every one to yourself? Also review the data sheet / timing.

I can only encourage you not to leave any unsolved problems behind. It's annoying and hard work but the potential insights are too valuable

.

 

Link to post
Share on other sites

my theory is the deassertion of WE in the transition of wr0 to idle.

Right now it's uncontrolled in a sense that the address might change nanoseconds before WE returns to non-asserted. Try to add a wr1 state that only changes the enable lines and leaves address / data untouched.

 

 

 

Link to post
Share on other sites

... actually, as it's the deassertion edge of WE that counts for level-driven inputs, I think the write logic needs to be like this:

- cycle 0: set address, data
- cycle 1: hold address, data, assert WE (stable address is necessary to not overwrite other addresses randomly)
- cycle 2: hold address, data, deassert WE (stable address and data is necessary since this ends the write operation)

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now