jago

Nexys4DDR QSPI max Clock Frequency

Question

Hi,
I'm trying to read the Configuration Flash of the Nexys4DDR. I need to achieve a relatively high speed. 

Here is a short summary of what I'm trying to do:
My design will be controlled by an external master, and there is no way to delay the master's requests. The start address is latched first. After that I have about 1 us until the first read request arrives. The subsequent reads occur in a burst with a read cycle time of about 350 ns. Each read must deliver 16 bits of data to the master.

I've been thinking about using the QSPI flash as a kind of boot ROM, and now I'm trying to find out whether this is possible. I believe it could be done with some combination of a high frequency and the DDR and Quad I/O features of the S25FL128S.

As a first step, I got the SPI interface itself working using the Digilent SPI_If from the Nexys4DdrUserDemo. The SPI clock is output through the STARTUPE2 primitive. I could already read the device ID and some data successfully at 25 MHz, but at 50 MHz I'm reading garbage.

Then I tested the maximum configuration rate (CR, 4-bit width) to find out whether the problem is only in my design. The Artix-7 should be able to output a 100 MHz clock on the CCLK pin (FMCCK), and the QSPI flash should handle 133 MHz. But for me the maximum CR is 40: setting the CR to 50 causes the FPGA to never load from SPI. Also, when I attach my oscilloscope to the clock pin, configuration at CR 40 fails.

So, my questions are:
- What is (or should be) the maximum clock frequency of the Nexys4DDR QSPI design? 
- Is the FMCCK = 100 MHz only valid for configuration or is this also the maximum clock for the user design?
- Do I have to constrain some attributes of the QSPI I/Os to achieve a high clock rate?
- Can this be done with "normal" logic or do I have to deal with something like SERDES?

I'm using Vivado 2015.4 and have applied the constraints for "Clock signal" and "Quad SPI Flash" from Digilent's Nexys4DDR_Master.xdc.

Regards,
Jago

Recommended Posts

 OK, but the SPI_If stops clocking between two consecutive transfers. I think the communication could be messed up at this point. Or is there a way to disable the clock while the SPI_If is not active without adding another delay to the clock path?

My thinking was that you would keep SCLK free-running and control the writes with the CS pin. But I wouldn't bother with this test anymore, given my suggestions below.

 I will also have a look at this. Perhaps it's better to build a new component from scratch than to modify the SPI_If.

My thinking here has changed a bit, and I now think a logic-driven SCLK (as used in SPI_if) would be suitable. The answer record you pointed to certainly applies to your situation whether you use the Quad SPI core or make your own. With that in mind, I think it would be smarter to just run the clock slower for your application and use Quad mode reads. You should still be fine at 25 MHz, given:

7.5 ns (STARTUPE2 delay) + 14.5 ns (Quad SPI data-valid delay) + ~1 ns (trace delay) = 23 ns

This leaves 17 ns of slack (one period at 25 MHz is 40 ns) for the propagation of SCLK to the STARTUPE2 primitive and the setup time of the four data-in pins. Provided you don't put any logic between the register that generates SCLK and the STARTUPE2 primitive, you should be fine without specifying any timing constraints. The data-in pins should already infer a 10 ns maximum setup time (assuming you clock your design at 100 MHz), and routing delays between a register and the STARTUPE2 primitive shouldn't exceed 7 ns.
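
To make that budget concrete, here is a small Python sketch that subtracts the quoted delays from the SCLK period and checks what is left against the 10 ns setup and 7 ns routing allowances. The delay values are the estimates from this post, not numbers from a datasheet or a timing report, and the function names are made up for illustration.

```python
# Read-path budget for a logic-generated SCLK routed through STARTUPE2.
# Default delays are the estimates quoted in the post (assumptions, not
# values from a timing report).

STARTUPE2_NS = 7.5      # STARTUPE2 delay (estimate)
FLASH_VALID_NS = 14.5   # flash clock-to-data-valid, max
TRACE_NS = 1.0          # board trace delay (rough estimate)


def read_slack_ns(sclk_mhz: float) -> float:
    """Part of the SCLK period left over after the fixed read-path delays."""
    period_ns = 1000.0 / sclk_mhz
    return period_ns - (STARTUPE2_NS + FLASH_VALID_NS + TRACE_NS)


def budget_ok(sclk_mhz: float, setup_ns: float = 10.0, routing_ns: float = 7.0) -> bool:
    """True if the slack covers the worst-case setup and routing allowances."""
    return read_slack_ns(sclk_mhz) >= setup_ns + routing_ns


if __name__ == "__main__":
    for f in (25.0, 33.0, 50.0):
        print(f"{f:4.0f} MHz: slack {read_slack_ns(f):6.2f} ns, ok={budget_ok(f)}")
```

At 25 MHz the two allowances use up the 17 ns exactly, which works only because 10 ns and 7 ns are treated as upper bounds; at 33 MHz or 50 MHz the same budget no longer closes.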

Given this approach, it would probably be easiest to meet your 1 us latency by doing this design in pure HDL. I think you could make it work as a MicroBlaze design too, with a custom AXI slave for the parallel bus and the Quad SPI core, but (as you guessed) the latency restriction would be tight. Creating an AXI master core for the parallel bus would also work, but would be much more complex than just doing the whole thing in HDL.

IMO the easiest route would be to modify the SPI_if to work with both Quad mode and single SPI mode (I think you may need to initialize the Quad SPI flash in single SPI mode in order to get it into quad mode).
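
For reference, the single-SPI sequence that sets the quad-enable bit might look like the Python sketch below. The opcodes and bit position are from my reading of the S25FL128S datasheet (WREN 0x06, WRR 0x01 writing SR1 then CR1, QUAD = CR1 bit 1, QIOR 0xEB), so please double-check them against your datasheet revision. The helper is hypothetical and only builds the bytes an SPI_if-style controller would shift out in each chip-select frame.

```python
# Hypothetical helper: byte frames (one per CS-low period) that set the QUAD
# bit of an S25FL128S using plain single-SPI commands. Opcodes are assumptions
# based on the S25FL128S datasheet; verify before use.

WREN = 0x06   # Write Enable
WRR = 0x01    # Write Registers: status register 1, then configuration register 1
RDSR1 = 0x05  # Read Status Register 1 (bit 0 = WIP, write in progress)
RDCR = 0x35   # Read Configuration Register 1
QIOR = 0xEB   # Quad I/O Read, usable once QUAD is set

CR1_QUAD = 1 << 1  # QUAD enable bit in CR1


def quad_enable_frames(sr1: int, cr1: int) -> list:
    """Build the WREN and WRR frames, keeping the other SR1/CR1 bits unchanged.

    sr1 and cr1 are the current register values, read beforehand with RDSR1
    and RDCR. After the WRR frame, poll RDSR1 until WIP (bit 0) clears.
    """
    return [
        [WREN],
        [WRR, sr1 & 0xFF, (cr1 | CR1_QUAD) & 0xFF],
    ]


def wip(sr1: int) -> bool:
    """True while a register write is still in progress."""
    return bool(sr1 & 0x01)
```

Reading SR1 and CR1 first and writing them back with only the QUAD bit changed avoids clobbering block-protection or latency-code settings.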

Thanks for the detailed question; you seemed to anticipate most of my follow-up questions.

If I'm understanding correctly, your design needs to read from the Quad SPI flash with ~5.71 MB/s of bandwidth and <1 us initial latency. The bandwidth should definitely be doable, but I didn't look into the initial latency for initiating a read. I'm pretty sure it should be fine though.
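
A quick Python check of those numbers, assuming one 16-bit word every 350 ns and a continuous burst with no command or chip-select overhead (a best case). It reproduces the ~5.71 MB/s figure and shows the minimum SCLK for each bus width; the function names are just for illustration.

```python
# Bandwidth check for the access pattern in the question (assumed: one 16-bit
# word every 350 ns, SDR transfers, no command/CS overhead during the burst).

WORD_BITS = 16
CYCLE_NS = 350.0


def bandwidth_mb_per_s(word_bits: int = WORD_BITS, cycle_ns: float = CYCLE_NS) -> float:
    """Sustained read bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_word = word_bits / 8
    return bytes_per_word / (cycle_ns * 1e-9) / 1e6


def min_sclk_mhz(bus_width: int, word_bits: int = WORD_BITS, cycle_ns: float = CYCLE_NS) -> float:
    """Lowest SCLK that shifts one word per read cycle on a bus of the given width."""
    clocks_per_word = word_bits / bus_width
    return clocks_per_word * 1000.0 / cycle_ns


if __name__ == "__main__":
    print(f"bandwidth: {bandwidth_mb_per_s():.2f} MB/s")
    for width in (1, 2, 4):
        print(f"x{width}: SCLK >= {min_sclk_mhz(width):.2f} MHz")
```

With a 1-bit bus you would need roughly 46 MHz just to keep up, while a 4-bit bus needs only about 11.4 MHz, which is why Quad mode reads at a modest SCLK look attractive here.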

Here's the answers to your questions:

1) For post-configuration accesses, the maximum CCLK rate is 100 MHz (as specified by FMCCK). For configuration, this number is actually much harder to characterize because the FPGA's internal oscillator has a whopping ±50% frequency tolerance. This means that if you set ConfigRate to 50 MHz, the actual CCLK frequency can be anywhere from 25 MHz to 75 MHz. On top of that, the DIN pins have a setup time of 3 ns. Considering the 14.5 ns maximum clock-to-data-valid parameter of the SPI flash, it's not surprising that the DIN setup requirement might not be met at frequencies above 50 MHz. So, to be safe, I tend not to set ConfigRate higher than 25 MHz unless an application absolutely demands it.
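
Here is that reasoning as a small Python sketch. It uses the same simplified model as the paragraph above (flash clock-to-data-valid plus DIN setup must fit in one CCLK period, with trace and CCLK output delays ignored) together with the ±50% tolerance, so treat it as a rough screen rather than a timing sign-off.

```python
# Rough screen for configuration CCLK, using the simplified model from the post:
# one CCLK period must cover flash clock-to-data-valid plus DIN setup.

CCLK_TOLERANCE = 0.50   # +/-50% internal oscillator tolerance
FLASH_VALID_NS = 14.5   # flash clock-to-data-valid, max
DIN_SETUP_NS = 3.0      # FPGA DIN setup time


def cclk_range_mhz(config_rate_mhz: float) -> tuple:
    """Possible actual CCLK range (low, high) for a nominal ConfigRate."""
    return (config_rate_mhz * (1 - CCLK_TOLERANCE),
            config_rate_mhz * (1 + CCLK_TOLERANCE))


def max_safe_cclk_mhz() -> float:
    """Highest CCLK at which data-valid plus setup still fits in one period."""
    return 1000.0 / (FLASH_VALID_NS + DIN_SETUP_NS)


if __name__ == "__main__":
    limit = max_safe_cclk_mhz()
    for rate in (25, 40, 50):
        low, high = cclk_range_mhz(rate)
        print(f"ConfigRate {rate}: {low:.1f}-{high:.1f} MHz, worst case ok={high <= limit}")
```

Under this model a nominal 25 MHz stays below the ~57 MHz limit even at +50%, while nominal 40 and 50 MHz can exceed it in the worst case.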

2) I interpret the datasheet to suggest that CCLK max is always 100 MHz.

3) Adding constraints is the correct way to make sure the SCLK and data pins are properly timed. See my notes below.

4) You should be able to do this with normal logic.

My bet is that your design (that uses SPI_if) is having alignment issues with the clock and incoming data. An easy way to test this is to generate the 50 MHz SCLK separately from the SPI_if component using a PLL or MMCM (you can use the clocking wizard IP core for this). The input clock into the clocking wizard will need to be the onboard 100MHz oscillator, and you should output both a 50MHz clock (for SCLK) and a 100MHz clock (to clock the rest of your design). This will give a known phase relationship between the 50MHz SCLK and the system clock. You should connect the 50MHz clock directly to the USRCLK signal of the STARTUPE2 primitive, instead of using SCLK from the SPI_if component. Keep the rest of your design the same and test it to see if it works. If it doesn't, then adjust the phase of the 50MHz clock by about 45 degrees or so at the clocking wizard, rebuild, and try again. Eventually your design should work. 
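
If you try the phase sweep, it helps to know how far each step moves SCLK. This Python sketch converts clocking-wizard phase settings to nanoseconds for the 50 MHz case; the 45-degree step is just the suggestion above, and the step sizes the MMCM/PLL can actually produce depend on its configuration.

```python
def phase_sweep_ns(sclk_mhz: float = 50.0, step_deg: float = 45.0) -> list:
    """(phase in degrees, shift in ns) pairs covering one full SCLK period."""
    period_ns = 1000.0 / sclk_mhz
    steps = int(round(360.0 / step_deg))
    return [(k * step_deg, k * step_deg / 360.0 * period_ns) for k in range(steps)]


if __name__ == "__main__":
    for deg, ns in phase_sweep_ns():
        print(f"{deg:5.1f} deg -> {ns:5.2f} ns")
```

Each 45-degree step is 2.5 ns at 50 MHz, so eight builds cover a full period.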

This is not a good solution long term, because the amount of phase shift you need may change every time you modify your design. Also, temperature can affect propagation times, so your design may not work after the device warms up. The correct solution is to add timing constraints to your data pins and SCLK that properly take into account the trace delays (these should be negligible though), setup/hold requirements of the Quad SPI flash, and output valid/hold specs of the Quad SPI flash. Then the Xilinx tools will try hard to meet these timing requirements, and throw a critical warning if they are not met. 
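
To get a feel for the numbers such constraints are built from, here is a simplified Python sketch that computes when flash read data becomes valid at the FPGA pins and when the previous data may start changing, both measured from the SCLK edge leaving the FPGA logic. It treats the STARTUPE2 delay as part of the clock path and lumps the board traces into one value. The 7.5 ns, 14.5 ns and ~1 ns figures are the ones used in this thread; the 2.0 ns output hold time is a placeholder you should replace with the value from the S25FL128S datasheet. How these values map onto set_input_delay depends on how the forwarded SCLK is defined in your XDC.

```python
from dataclasses import dataclass


@dataclass
class ReadPathTiming:
    """Simplified read-path delays, all in ns (placeholder values, see lead-in)."""
    clock_path_ns: float   # SCLK from FPGA logic to the flash pin (STARTUPE2 delay)
    data_trace_ns: float   # board traces, lumped into one value
    t_valid_max_ns: float  # flash clock-to-output-valid, max
    t_hold_min_ns: float   # flash output hold after the clock edge, min

    def latest_arrival_ns(self) -> float:
        """New data is guaranteed valid at the FPGA pin no later than this."""
        return self.clock_path_ns + self.t_valid_max_ns + self.data_trace_ns

    def earliest_change_ns(self) -> float:
        """Previous data may start changing at the FPGA pin this early."""
        return self.clock_path_ns + self.t_hold_min_ns + self.data_trace_ns


if __name__ == "__main__":
    # t_hold_min_ns = 2.0 is a placeholder, not a datasheet value.
    t = ReadPathTiming(clock_path_ns=7.5, data_trace_ns=1.0,
                       t_valid_max_ns=14.5, t_hold_min_ns=2.0)
    print(f"latest arrival: {t.latest_arrival_ns():.1f} ns")
    print(f"earliest change: {t.earliest_change_ns():.1f} ns")
```

The latest-arrival figure reproduces the 23 ns from the 25 MHz budget; the earliest-change figure is what a hold check would use.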

Another note: I don't think the SPI_if component is designed to work correctly at 50 MHz. It generates SCLK in logic, but I believe you could achieve faster speeds if you clocked the component from SCLK itself and used an IDDR primitive to capture data on MISO and an ODDR primitive to output data on MOSI. Note that this adds design complexity, because you will need to cross from the system clock domain into the SCLK domain.

...Or you can just implement your design as a MicroBlaze system and use Xilinx's Quad SPI IP core. That's what I would do :)

Hi sbobrowicz,

thanks for your response. Unfortunately I haven't had time to test some of your suggestions yet. Hopefully I'll have some time over the Christmas holidays.

If I'm understanding correctly, your design needs to read from the Quad SPI flash with ~5.71 MB/s of bandwidth and <1 us initial latency.

This is correct.

You should connect the 50MHz clock directly to the USRCLK signal of the STARTUPE2 primitive

 OK, but the SPI_If stops clocking between two consecutive transfers. I think the communication could be messed up at this point. Or is there a way to disable the clock while the SPI_If is not active without adding another delay to the clock path?

The correct solution is to add timing constraints to your data pins and SCLK

I will look into how to add these constraints. I've never had to use them before, but I think it's time to learn how to deal with this important feature.

and used an IDDR primitive to capture data on MISO and ODDR primitive to output data on MOSI

 I will also have a look at this. Perhaps it's better to build a new component from scratch than to modify the SPI_If.

Or you can just implement your design as a microblaze system and use Xilinx's Quad SPI IP Core.

So far I haven't considered this, as I thought it would add too much delay. Maybe I'm just too inexperienced with this technology. It would be great to use an existing, tested IP core. I'm thinking about how this could be designed and connected to the MicroBlaze system to meet the access timings. Do you have any ideas about what would be a good approach?

Maybe I could make an AXI slave component that captures the latched address and sends an interrupt to the MicroBlaze processor. The processor then reads the data at this address using Xilinx's Quad SPI IP core and sends it back to my component, which holds the data in a FIFO until the external device reads it. But (without having done any calculations) I have some doubts that this would be fast enough: the MicroBlaze interrupt latency and the AXI bus will add delay. I will play with this a bit as soon as I have some time.

Or do you think it would be better to learn how to design an AXI master? I think this is much harder to implement, but as a master my component could request the data directly from the SPI core. At the moment I really don't know how complicated it is to design an AXI master. Some months ago I did the "Getting Started with MicroBlaze" tutorial and created an AXI slave for the 7-segment display. Using this slave, the processor can set the values to be displayed (and their brightness) without having to handle any interrupts. This component was not too hard to implement, but I have no other experience with MicroBlaze and AXI.

You should still be fine at 25 MHz

You're right. I was trying to be as fast as possible, but I only need to be as fast as my application requires. I also found out that the chip has an XIP mode in which the read command has to be sent only once. Once the chip is in this mode, the SPI master only needs to start each access with the read address. This saves 8 clock cycles for every address change, which is 320 ns at 25 MHz.
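
Here is a rough Python estimate of the first-word latency for a Quad I/O read at 25 MHz, with and without continuous-read (XIP) mode. It assumes an 8-clock single-SPI command, a 24-bit address and 8 mode bits sent on 4 lines, and 4 dummy clocks; the dummy count depends on the flash's latency-code setting, so check the S25FL128S datasheet for your clock rate. CS setup time and controller overhead are ignored.

```python
def first_word_latency_ns(sclk_mhz: float = 25.0, xip: bool = False,
                          dummy_clocks: int = 4, addr_bits: int = 24,
                          mode_bits: int = 8, word_bits: int = 16,
                          width: int = 4) -> float:
    """Clocks from CS low to the end of the first data word, converted to ns.

    Assumes SDR transfers, the command byte on one line, and address, mode and
    data bits on `width` lines. In XIP (continuous read) mode the command is
    skipped, but the mode bits are still sent on every access.
    """
    command_clocks = 0 if xip else 8
    clocks = (command_clocks
              + addr_bits // width
              + mode_bits // width
              + dummy_clocks
              + word_bits // width)
    return clocks * 1000.0 / sclk_mhz


if __name__ == "__main__":
    print(f"with command: {first_word_latency_ns():.0f} ns")
    print(f"XIP mode:     {first_word_latency_ns(xip=True):.0f} ns")
```

Under these assumptions the first word arrives after about 960 ns with the command and about 640 ns in XIP mode, the 320 ns difference being the 8 skipped command clocks. That leaves very little margin against the 1 us budget unless XIP is used.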

So I will follow your suggestion and implement it in logic starting at 25 MHz.

In case anyone is interested: I've successfully increased the configuration speed to 66 MHz. All I needed to do was set the configuration option "Enable the FPGA to use a falling edge clock for SPI data capture" to "YES". Now configuration no longer fails when I put my oscilloscope probe on the clock line. The measured frequency was about 68 MHz (due to the tolerance).
