• 0
Notarobot

What is the fastest way to save PL data

Question

Question to experts:

What is the fastest way you would recommend to save continuous data coming from the PL on a Zynq without involving the processor?

The data rate is expected to vary in the range 4-8 MB/s. The preferred processor operating mode is standalone. The options considered so far are BRAM, OCM and DDR3. All of these options seem to require custom HDL coding to interface with the Zynq memory. Before committing to such an effort I'd like to hear opinions from the community.

Thank you!

 


8 answers to this question

Recommended Posts

  • 1

The trick is your code does not need to infer a block memory generator. It will actually need to explicitly implement the block memory generator INTERFACE. This is because the block memory generator is already being instantiated in the block diagram.

You will need to design a state machine in VHDL that properly implements the interface. For a description of the signals (en, we, addr, etc.) you should refer to the block memory generator Product Guide. You can find the guide by double clicking the block memory generator IP and selecting Documentation in the upper left corner.

The end goal will be to create a custom IP core that contains this custom VHDL. Since you do not have an AXI interface on your core, this should be pretty easy. I believe you can just create a new project that targets the ZYBO and has the desired ports of the IP block as its top-level ports. Then I think you can run the Create and Package IP wizard from the Tools menu to convert the project to an IP core, so it can be inserted into your block diagram (which will be in a different Vivado project). I'd recommend simulating your project before you convert it to an IP core to help make sure it is functioning as expected.

BTW, if you have difficulty making your custom IP implement the BRAM interface, you can just expand the BRAM_PORTB interface on the block memory generator IP core and manually connect each of its signals to your IP core.
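To make the write-side behaviour concrete before writing any VHDL, here is a minimal cycle-level model in C of what such a state machine might drive on BRAM_PORTB. It is only a sketch to compare a VHDL simulation against, not the actual core. The byte addressing, the 4-bit `we`, the tiny `BRAM_WORDS` depth, the wrap-around of the write pointer, and all type and function names are assumptions made for illustration; the real signal behaviour should be taken from the block memory generator Product Guide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BRAM_WORDS 16u  /* hypothetical depth, kept tiny for illustration */

/* Signals the custom core drives on BRAM_PORTB during one clock cycle. */
typedef struct {
    bool     en;    /* port enable */
    uint8_t  we;    /* byte write enables, one bit per byte of din */
    uint32_t addr;  /* byte address (assumed, for a 32-bit wide port) */
    uint32_t din;   /* data word to store */
} bram_port_t;

/* Writer state: only the next byte address has to be remembered. */
typedef struct {
    uint32_t next_addr;
} bram_writer_t;

/* Compute the port signals for one cycle. When a word is valid, assert
 * en and all byte enables, then advance the pointer by one 32-bit word,
 * wrapping at the end of the memory. */
bram_port_t bram_writer_step(bram_writer_t *w, bool data_valid, uint32_t data)
{
    bram_port_t p = { false, 0u, w->next_addr, 0u };
    if (data_valid) {
        p.en  = true;
        p.we  = 0xFu;
        p.din = data;
        w->next_addr = (w->next_addr + 4u) % (BRAM_WORDS * 4u);
    }
    return p;
}

/* Model of the memory at the rising clock edge: a full word is stored
 * only when en and all four byte enables are set. */
void bram_clock(uint32_t mem[BRAM_WORDS], bram_port_t p)
{
    assert(p.addr / 4u < BRAM_WORDS);
    if (p.en && p.we == 0xFu) {
        mem[p.addr / 4u] = p.din;
    }
}
```

Calling `bram_clock(mem, bram_writer_step(&w, valid, data))` once per simulated clock cycle gives a reference memory image that the VHDL testbench output can be checked against.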

See the picture below for an example of what your end goal will be:

 

zynq_bram2.png

  • 0

Hi @Notarobot,

I have reached out to other forum members about your question. While we wait for more experienced forum members to respond, here is the Vivado AXI Reference PDF. On page 68 there is a table called Data Movement Method Comparison Summary that might be useful for judging the speed and complexity of each approach.

cheers,

Jon

  • 0

I would use an axi_dma core, connected to your custom HDL via an AXI stream bus, and connected to the DDR3 via one of the Zynq Processing System's AXI HP slave ports. The processor will need to set up the DMA transfers, but Xilinx has demo code for doing this standalone. The AXI stream bus is an easy-to-understand, FIFO-like interface, which should make integrating it into your design pretty straightforward. This method will give you much more bandwidth than you need, and access to all 512 MB of memory.

This is basically how we stream video data in our HDMI demos for the ZYBO, except that we use an axi_vdma core instead. They are very similar though.

  • 0

@jpeyron and @sbobrowicz

Thank you for your recommendations. I was hoping for support for the idea of writing PL data to BRAM and then letting the processor read and process it.

I have done various demos with AXI DMA and like the CDMA because it makes the PL a master. However, I am new to Zynq and don't yet feel comfortable with the complexity of CDMA. Besides, I will need to use an interrupt to notify the processor when a new packet is ready for processing.

Also, in this project the RAM should be available for writing from the PL and for reading by the processor. These processes are not synchronous but overlap for some time, and they might be working on the same block. This was the reason for considering dual-port BRAM. I might be wrong, but I read somewhere that DDR might have some issues with simultaneous access.

My goal is to find the simplest solution which tends to be the most reliable. Besides I have to leave room for upgrades.

Best regards!

  • 0

If you want to use BRAM, then you could use a scheme like the one in the attached picture. Using this method, you would attach your custom IP to BRAM_PORTB. To create this block diagram I basically just inserted a Zynq block and an AXI BRAM Controller block, and ran block automation, then connection automation with defaults. If you want to adjust the BRAM data width, you do it in the AXI BRAM Controller IP, and if you want to adjust the BRAM depth, you use the Address Editor to increase the address space size allocated to the AXI BRAM Controller (surprisingly, that will propagate correctly to the block memory generator).

I've never tested this scheme before, but it seems reasonable. This is probably the simplest method in terms of software complexity and the PL interface (the BRAM interface is really straightforward). The downside is that this will use up your BRAM in the PL, and there really is not very much in the Zynq 7010 on the ZYBO (~270 KB). Bandwidth-wise you should be fine as long as you don't need more than ~10 GB/s or so of bandwidth on the write side. The processor reads might be pretty slow due to the AXI interconnect, though. You will need to be sure to disable caching for the BRAM memory region too, unless you can figure out how to get the ACP port working (I've got no experience with it; it might just work automatically).
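To get a feel for how limiting the BRAM size is, the following small C helper estimates how long a buffer lasts at a given input rate. It is a rough sketch only: `buffer_fill_ms` is a hypothetical name, 1 KB is taken as 1024 bytes and 1 MB/s as 10^6 bytes per second, and it assumes the whole ~270 KB could be given to the buffer, which a real design would not allow.

```c
#include <assert.h>

/* Time in milliseconds until a buffer of capacity_bytes is full
 * when data arrives continuously at rate_bytes_per_s. */
double buffer_fill_ms(double capacity_bytes, double rate_bytes_per_s)
{
    assert(rate_bytes_per_s > 0.0);
    return 1000.0 * capacity_bytes / rate_bytes_per_s;
}
```

Under those assumptions, `buffer_fill_ms(270.0 * 1024.0, 8.0e6)` comes to roughly 35 ms and `buffer_fill_ms(270.0 * 1024.0, 4.0e6)` to roughly 69 ms, so the processor would have to drain the buffer well within that window.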

 

zynq_bram.PNG

  • 0

Hi @sbobrowicz

Thank you very much for your input. I've already created a similar design, but with two ports on the axi_bram_ctrl. I tested it in SDK, writing to and reading from BRAM and DDR3.

This side is absolutely clear, and copying from BRAM to DDR3 is sufficiently fast. According to my tests, sending 256 bytes from BRAM to DDR3 through the processor took 22493 clock cycles of the ARM private timer. Transferring the same block through DMA in poll mode took 2152 clock cycles. I should mention that the data width is 32 bits.

Now I would like to ask you to assist with the VHDL implementation on the PL side. In particular, I have a problem with inferring the block RAM created by the Block Memory Generator in my custom design. I could not find an example demonstrating such a case, and my VHDL knowledge is limited. In my understanding, the BRAM should be declared in my custom architecture and then used as a component that is called whenever a word of data is ready to be saved to BRAM.
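The cycle counts above can be turned into approximate throughput figures with a short C helper. This is a sketch under a stated assumption: `TIMER_HZ` is set to 325 MHz, i.e. half of a 650 MHz CPU clock, which is not given in the post and should be replaced with the real private timer frequency. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed private timer clock: half of a 650 MHz CPU clock.
 * Replace with the value from your own design. */
#define TIMER_HZ 325000000.0

/* Throughput in bytes per second for a transfer of `bytes`
 * that took `cycles` ticks of a timer running at timer_hz. */
double bytes_per_second(uint32_t bytes, uint32_t cycles, double timer_hz)
{
    assert(cycles > 0u);
    return (double)bytes * timer_hz / (double)cycles;
}

/* How many times faster the second measurement is than the first. */
double speedup(uint32_t slow_cycles, uint32_t fast_cycles)
{
    assert(fast_cycles > 0u);
    return (double)slow_cycles / (double)fast_cycles;
}
```

With the numbers quoted, `speedup(22493u, 2152u)` is a little over 10, independent of the timer frequency. If the 325 MHz assumption holds, the processor copy works out to roughly 3.7 MB/s and the polled DMA to roughly 39 MB/s for this 256-byte block; larger blocks would likely amortize the setup cost differently.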

In my understanding I will also need to create a custom IP module for integration into the block diagram design.

Comments and insights are highly appreciated, thank you!

 

  • 0

Quick addition, I just realized you shouldn't need to worry about caching issues because the Zynq should assume everything in the M_AXI_GP0 space is volatile. Also, the ACP port is not an option in this configuration, because it is designed for DMA accesses into DDR.
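On the processor side, reading the BRAM through the M_AXI_GP0 address space could look something like the C sketch below. The function name is hypothetical, and the base address would come from the Address Editor (it is not given in this thread). Note that `volatile` only stops the compiler from dropping or reordering the loads; whether the region is cached is a separate matter of the memory configuration discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n_words 32-bit words from a memory-mapped region into ordinary RAM.
 * The volatile qualifier forces the compiler to perform every load,
 * since the PL may change the contents at any time. */
void bram_read_words(volatile const uint32_t *bram, uint32_t *dst, size_t n_words)
{
    assert(bram != NULL && dst != NULL);
    for (size_t i = 0; i < n_words; ++i) {
        dst[i] = bram[i];
    }
}
```

On the target, `bram` would be a cast of the BRAM controller's base address, for example `(volatile const uint32_t *)BRAM_BASE_ADDR`, where `BRAM_BASE_ADDR` is a placeholder for whatever address your design assigns.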
