
Total Negative Slack Violation Cmod A7 35T


smarano

Question

I have a CMOD A7 35T.

I have created a design with a MicroBlaze, two custom IP cores, two UARTs, one SPI flash, an SRAM controller and a GPIO controller.
I use two different clocks: a 96 MHz clock for the MicroBlaze and a 128 MHz clock for the IP cores and UARTs. With these settings it works perfectly, but I want to increase its performance.
My idea is to use only one clock, 128 MHz, for all components. But with this setting I get a timing violation. Is there something I can do to solve this problem?
WNS and TNS are negative. Are there any synthesis or implementation strategies that can help me?

regards Stefano 


4 answers to this question


Welcome to some of my nightmares!  :D

Getting timing closure can be a challenge.  I gather it may be an even greater challenge if you are using someone else's IP (i.e. MicroBlaze).  As I tend to do my work in Verilog, my advice may not apply to the GUI approaches.  Still, here's what I do:

Read Xilinx's "HDL Coding Practices to Accelerate Design Performance."

Reduce logic--reduce, reduce, and reduce some more!  It helps if all your logic is captured in clocked register blocks, although it's not required.  In the clocked register blocks (think always @ blocks in Verilog), at least you can see the logic that must get calculated between clocks.  For the fastest designs, try to make it so your logic depends on no more than 6 inputs, which is what fits in a single LUT on a 7-series part like the Artix-7.  I often try to keep them to fewer than 8, but ... only as often as it's possible.  My worst ones depend upon up to 32 inputs.  Try to keep these high-input registers simple and few--don't use them in case statements or long if-then-else blocks.

Indeed, wherever and whenever possible, get rid of long if-then-else blocks.  Each branch only takes effect when all the prior conditions are false, so the later branches pick up every earlier condition as an extra input.  That can create priority logic you don't want or need.

Separate variables out.  If one variable needs a block such as if (A) then ... else if (B) then ... else if (C), (perhaps (A) is a reset) and another variable is a don't care on the reset condition, remove the if(A) section for that variable.

Always set variables whenever possible--don't rely on their prior value. Why?  Well, it just helps to get rid of one more input and hence reduce the logic.  If you can't do this, no big deal, just be sure you count the prior variable in your variable count which you use to know how difficult the logic is.
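The last two suggestions can be sketched roughly as follows.  This is only an illustration, not code from any real design: the module and signal names (split_reset, stb, in_data, valid, data) are made up, and it assumes downstream logic ignores data whenever valid is low.

```verilog
// Hypothetical example: all names here are illustrative assumptions.
module split_reset (
    input  wire        clk,
    input  wire        reset,
    input  wire        stb,      // "input is present this clock"
    input  wire [31:0] in_data,
    output reg         valid,
    output reg  [31:0] data
);
    // Control bit: this one really does need the reset branch.
    initial valid = 1'b0;
    always @(posedge clk)
        if (reset)
            valid <= 1'b0;
        else
            valid <= stb;

    // Data is a don't-care while valid is low, so it gets neither a
    // reset branch nor a dependence on its own prior value: it is
    // simply set on every clock.
    always @(posedge clk)
        data <= in_data;
endmodule
```

Here each bit of data depends on one input, while valid depends on two.  Had both been written in one if (reset) ... else if (stb) ... block, every data bit would have picked up reset, stb and its own previous value as inputs.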

Use block RAM wherever possible--it's faster and cheaper than distributed RAM.  Consult the guide (above) to see how to make certain your design is using block RAM. 
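As a rough sketch of what a block-RAM-friendly memory tends to look like: the key assumption is that the read is registered (the data shows up one clock after the address) and that the array itself has no reset.  The module name and parameters below are made up for illustration; check the synthesis report to confirm what the tool actually inferred.

```verilog
// Hypothetical simple dual-port memory; names and sizes are illustrative.
module bram_sketch #(
    parameter AW = 10,           // address width: 1024 words
    parameter DW = 32            // data width
) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] waddr,
    input  wire [DW-1:0] wdata,
    input  wire [AW-1:0] raddr,
    output reg  [DW-1:0] rdata
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    // Synchronous write
    always @(posedge clk)
        if (we)
            mem[waddr] <= wdata;

    // Synchronous (registered) read: rdata is valid one clock after raddr.
    // A combinational read (assign rdata = mem[raddr];) would typically
    // end up in distributed RAM instead.
    always @(posedge clk)
        rdata <= mem[raddr];
endmodule
```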

You may need to put pipeline delays on your bus structures too ... especially since the wide fanout associated with buses can make them timing nightmares.  (Xilinx's 5-bus AXI bus structure is ... an even worse timing nightmare.)

Now, let's move to CPUs: I'm still working on a task of getting a CPU running faster than 100MHz myself ;).  In this case, I click on the Implementation timing report, and look at the list of logic networks that don't meet timing (follow the red text to get there).  I then take those networks, and change/adjust their logic to get them to meet timing.  My preference is to move logic across the clock--make something happen one clock earlier, or one clock later.  If you can do that, you win with little consequence.  If not, I need to pipeline my logic by adjusting it so it takes two clocks instead of one.  If you do it right, you can process two things in that fashion (such as one instruction on the first clock, one on the next), etc.  For example, a five stage CPU pipeline may need to be extended to 9 stages.  This has its ups and downs: if you can keep the pipeline full, you've succeeded at running at a higher clock speed.  If you can't keep it full, every stall and pipeline flush will slow your pipeline down.
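To make the "take two clocks instead of one" idea concrete, here is a minimal sketch using a multiply-add.  It is not taken from any CPU; the module and signal names are made up for illustration.  In a single-clock version, result <= a * b + c would put the multiplier and the adder on the same path between registers.

```verilog
// Hypothetical two-stage multiply-add; names are illustrative.
module pipelined_mac (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire [31:0] c,
    output reg  [31:0] result
);
    reg [31:0] product;   // stage-1 register
    reg [31:0] c_d;       // c delayed one clock to stay aligned with product

    always @(posedge clk) begin
        // Stage 1: only the multiply sits between registers
        product <= a * b;
        c_d     <= c;
        // Stage 2: only the add sits between registers
        result  <= product + c_d;
    end
endmodule
```

The result now arrives two clocks after the inputs instead of one, but a new a, b, c can be accepted on every clock, which is the "process two things at once" part.  The cost shows up whenever something downstream has to wait for that extra clock.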

Now, let's come to other people's IP, which I think is what you are working/struggling with, right?  In that case, you may be just out of luck.  You might also be able to reconfigure their IP for speed instead of area or some such.  I know MB has such options, I've just never used it to know where they are.

There are some alternative CPUs to MB that I know can run faster.  Your mileage with them might vary, but feel free to look them up.  "picorv", as I recall, runs off a 200MHz clock, although each instruction takes 3 clocks to calculate ...

Hope this helps,

Dan


I've also been dealing with this extensively today, particularly in clock domain crossing (CDC) work that I've been up to.  (I've been trying to get 25MHz code associated with the Arty ethernet to talk to the rest of my logic at 81.25MHz ...)  If your timing violations are in clock domain crossings, you'll want to:

1. Mark the receiving register as an ASYNC_REG (i.e. place (* ASYNC_REG="TRUE" *) before the "reg" keyword, assuming you are using Verilog).  This keeps the simulations from failing: whenever the incoming value is changing and uncertain, the register uses the previous value instead.  Without this, your simulation values may quickly all become 'x', or unknown.  It also tells Vivado the register is part of a synchronizer, so the synchronizer flops get placed close together.

2. Add set_max_delay constraints to your .xdc file, one line per pair of variables that crosses between clocks.  Otherwise Vivado will fail timing on just about every variable associated with a clock transition.  This line should look something like ...

set_max_delay -datapath_only -from [get_cells -hier -filter {NAME=~ *module_name/var_name*}] -to [get_cells -hier -filter {NAME=~ *module/r_name*}] <fastest clock period>
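For reference, the receiving side of step 1 usually ends up as a two-flop synchronizer.  The following is a minimal sketch under the assumption that a single-bit signal is crossing; the module and signal names are made up.  Multi-bit values need something more (a handshake, Gray coding, or an asynchronous FIFO), since each bit could otherwise arrive on a different clock.

```verilog
// Hypothetical single-bit synchronizer; names are illustrative.
module sync2ff (
    input  wire dst_clk,    // clock of the receiving domain
    input  wire async_in,   // signal generated in the other clock domain
    output wire sync_out    // safe to use in the dst_clk domain
);
    // Both flops are marked so the tools treat them as a synchronizer chain.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_ff;

    initial sync_ff = 2'b00;
    always @(posedge dst_clk)
        sync_ff <= { sync_ff[0], async_in };

    assign sync_out = sync_ff[1];
endmodule
```

With this in place, the set_max_delay line would run from the sending register to sync_ff in the receiving module.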

Dan


Archived

This topic is now archived and is closed to further replies.
