• 0
pyraetos

Best way to add custom logic to block design?


Hey,

I have a very novice question and really just need a high-level answer, but I'll get straight to the point! I'm using the Zybo Z7-10 with Vivado and Vitis 2019.2. This is what I would like to do, and I'm trying to do it in VHDL:

  1. Write some data from software to control registers that I define
  2. Perform some processing on this data
  3. Use DMA to write some results to DDR

I would like the firmware piece that does the processing to be a block in the BD.

I've gone through many forums, and it seems that at one time the preferred way was to package an IP. I found out about adding an RTL module, which seemed more appropriate because I want to be able to modify it quickly as I go, in the same project. Based on what I've read, I was thinking of making an RTL module with an AXI-Lite slave interface (not sure how to do the registers though?), then using an AXI-Stream master to pump the results to a Xilinx DMA IP block.
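On the register question: the AXI4-Lite slave template that Vivado's IP packager generates keeps an array of 32-bit slave registers, picks one using the word-address bits of AWADDR, and updates only the byte lanes whose WSTRB bit is set. The C model below is a software sketch of that decode, not HDL; the four-register map and the names `axil_write`/`axil_read` are hypothetical, and in the real design this logic lives in the VHDL slave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map: four 32-bit control registers at byte
   offsets 0x0, 0x4, 0x8 and 0xC of the peripheral's address window. */
#define NUM_REGS 4u

typedef struct {
    uint32_t reg[NUM_REGS];
} reg_file_t;

/* One AXI4-Lite write: bits [3:2] of the byte address select the register,
   and only the byte lanes whose WSTRB bit is set are updated. */
void axil_write(reg_file_t *rf, uint32_t awaddr, uint32_t wdata, uint8_t wstrb)
{
    assert(rf);
    uint32_t idx = (awaddr >> 2) & (NUM_REGS - 1u);
    for (unsigned lane = 0; lane < 4; lane++) {
        if (wstrb & (1u << lane)) {
            uint32_t mask = 0xFFu << (8u * lane);
            rf->reg[idx] = (rf->reg[idx] & ~mask) | (wdata & mask);
        }
    }
}

/* One AXI4-Lite read: the whole 32-bit register is returned. */
uint32_t axil_read(const reg_file_t *rf, uint32_t araddr)
{
    assert(rf);
    return rf->reg[(araddr >> 2) & (NUM_REGS - 1u)];
}
```

A 32-bit store from software to base + 0x4 arrives as a write to register 1 with WSTRB = 0xF; a byte store would set only one strobe bit, which is why the template handles lanes individually.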

I've been passing synthesis but getting various implementation errors ("failed to stitch checkpoint", "*.vhd is a black box") while doing trial and error with this. All I've done in terms of code is try to define the entity ports to have those two interfaces, either by copying from other IPs or by using the Language Template (for AXI-Stream).

Is there a good example in VHDL of a barebones AXI peripheral like this, that will pass Implementation? Once that works, I can get into adding those registers and the processing logic. Thank you!

Share this post


Link to post
Share on other sites

16 answers to this question


  • 0

@pyraetos,

If you want a bare-bones example of an AXI peripheral, you can go to the "IP packager" and ask it to generate an AXI-Lite slave IP for you. It'll give you a broken core. Worse, it's broken in a way that'll bust your design but pass Xilinx's VIP suite. That's okay, I wouldn't trust their AXI VIP suite for that purpose anyway. Gut their demo slave. Here's an example that will work. Here's a discussion demonstrating how to go about customizing that slave for a new purpose. As for the AXI master interface, you can add an AXI master interface when you make the slave core. It'll also give you an interface you can use; it's just ... a useless interface that doesn't tell you much about how to build anything AXI. This description/example of an AXI master might be more useful for you. Yes, these are Verilog examples.

Dan

  • 0
Posted (edited)
21 hours ago, pyraetos said:

This is what I would like to do, and I'm trying to do it in VHDL:

If you are willing to put in the effort to figure out the AXI/AMBA protocol, and need high data rates, then connecting your design directly to one of the AXI_HP slave ports is the way to go. I've gotten interested in @D@n's blogs since he got into his fight with FPGA vendor AXI IP. Hit 'em high! Hit 'em low! Go DAN GO!

I don't think that Dan has actually done much with the board design flow, as none of his AXI IP is packaged for that. I haven't visited his blog for a few weeks and he's always posting, so perhaps I'm not up to date. There are lots of details that seem to be missing in his experiments with regard to the board design flow.

Here's my approach to ZYNQ design. I don't use MicroBlaze ever.

I try to make the most basic board design possible and to make HDL-friendly interfaces external. Here are two that I like. Instantiate one or more BRAM controllers. Don't let the tools hook up the BRAM IP, because you want to do something different: create your own dual-clock, true dual-port BRAMs. The tools will limit the depth because of DMA limitations. Connect the A side of each BRAM to the BRAM controller and make the other side external ports. You can have as many BRAM interfaces as you want, but stick with one BRAM per BRAM controller.

Another AXI IP that's useful is the AXI Data FIFO. You can connect one side to an AXI DMA controller; the slave side is made external. This is also HDL-friendly, though you have to do a bit more reading about the AXI-Stream bus protocol.

Once you have your board design validated, generate it. Then have Vivado create an HDL wrapper that you control, not Vivado. This HDL file then gets instantiated into your top-level entity, or at least some level higher than the ZYNQ subsystem and AXI bus infrastructure. From there on you can just have an HDL flow project.

Since you are trying to DMA data directly into the PS DDR, I'll mention that I've had success with the AXI Virtual FIFO. I'd stick with one channel and aggregate your data to be 64-bit for simplicity. I've used that IP to DMA 32K 64-bit data samples at 142.8 MHz (32K × 8 bytes = 256KB, which is the limit) directly to the DDR and read them out later using the PS.

Oh, and with the BRAM approach you can use the PS PL330 DMA controller in your application code. It's a lot simpler than the AXI DMA controllers.

Well, I can recommend those approaches because I've used them successfully for quite a while. Don't get too excited about approaches that are only half-done or don't have a complete design that's proven them. You need to know that Dan has his own special approach to Verilog development. That's not a bad thing at all but it might not be the approach that you want to jump head first into.

Forgot to mention this. Unfortunately, VHDL is the poor cousin for Xilinx IP. If you want to instantiate an HDL MIG controller, then you have to use Verilog, at least to create the IP. Once you verify that it works, you can always copy the Verilog MIG IP into your VHDL project. I've done this many times. I haven't had much success trying to get a useful VHDL-created MIG (that is, with the project set to VHDL). You also might find that timing suffers a bit if you instantiate a Verilog MIG into your VHDL project. That's been my experience anyway. Don't even try to simulate the MIG IP with VHDL. There are other IP that require Verilog, and the IP for the two HDLs often don't offer the same options. Sometimes it's almost enough to make me want to go all Verilog... but so far I've managed at being just a semi-competent Verilog designer and sticking with my trusty (if sometimes frustrating) VHDL. It's really about time VHDL supported integers larger than signed 32-bit...

 

Edited by zygot

  • 0

@zygot,

No, I don't do much with Xilinx's board design features--as you guessed.  I like to have more control over my designs than the board design functionality offers.  Not only that, but I'm just about at the point where I can replace AXI board design entirely with my own RTL--I'm just missing an AXI3 to AXI4 bridge, and an upsizer and downsizer.  Once I have those last three components, I shouldn't need board design other than to get from the ARM on the Zynq into my design.  On top of that, I like having full control of the performance I can get.

That's not, however, why I don't package board design products with my IP nor why I don't discuss them on my blog.  My reason there is that I try to keep my blogs and cores (somewhat) FPGA independent to the maximum extent possible.  (I also dislike placing tool-chain produced files into github.)

As for having fun with Xilinx, someone contacted me about a "bug" in their S2MM core today.  It's a fun story, but somewhat off topic for the current thread.  (It turned out to be an unexpected "feature" after sufficient investigation.)  The fun part, as far as I was concerned, was that my own MM2S and S2MM controllers worked on hardware the first time they were placed onto the FPGA.  Yeah, 'gotta like that one.  :D

Dan

  • 0
46 minutes ago, D@n said:

I'm just about at the point where I can replace AXI board design entirely with my own RTL

That's what the pep cheer was all about (honestly, it was sincere). You are ahead of me on this in some respects. ZYNQ just happens to be a low percentage of my designs, so I'm happy to muddle along with the SDK tools (haven't had the need for Vitis yet) for those few occasions where ZYNQ is warranted. Even when my designs are ARM-based, the processor complex is not the main feature that the rest of the design flows around. There are plenty of people who approach FPGA design differently, and that's fine. To each their own. But I'm with you that, to the extent possible, the less reliant you are on vendor tools the better off you are. It's all a bit daunting for beginners though. I'm anxious to see your first real design; that is, ARM PS, HDL, and software application, all done without a board design or vendor SDK. It will be something!

This all reminds me that I've been interested in what's been the mysterious USB104A7, so I've been keeping up with what's getting posted. Not sure what the product is aimed at, but at least it has a SYZYGY port with DDR that can be used for storage (the 16-bit data interface is worrisome). All of the demo designs have a MicroBlaze! I don't understand why anyone would do that. Digilent needs to hire an HDL guy for many, many reasons. It's astonishing that board vendors even try to get by without one.

  • 0

Thanks for y'all's responses! I got my VHDL code working as an RTL module now, definitely needed to use the IP Packager to at least create the stubs for all the AXI ports and registers I needed. Now I'm thinking about the DMA part.

@zygot You mentioned a few options, I definitely want the data to be available in PS DDR, not BRAM. I was reading about the Data FIFO and Virtual FIFO. Can you elaborate on "you can connect one side to an AXI DMA controller. The slave side is made external." for the Data FIFO? Are you saying it would look like MyCustomBlock --> Data FIFO --> AXI DMA Controller --> DDR?

Also, I understand (I might be wrong) that the Virtual FIFO differs because it uses external memory like the DDR for the FIFO's data itself. Wouldn't this eliminate the need for a DMA block altogether? But how would I configure it to use a fixed address that I also need to read in PS? Also, it being a FIFO, won't I run into problems when I want to overwrite the DDR block with fresh data, since nobody is "popping" from the FIFO in the usual sense? Could I just put it in reset, then write the fresh data, and have it restart at the original DDR location?

Furthermore, with your help I'm understanding the way this is all hooking together at a high level pretty well! Actually implementing the interface from my custom block to the FIFO without using PS drivers is still daunting to me, though. What would the interface need to be on the BD? Right now my custom block has a master AXI interface, but can I connect this to the FIFO, or do I have to go through the Interconnect? All in all it's a fun learning process for a software guy like me :)

  • 0

@pyraetos,

How much data are you intending to write to DDR SDRAM?  A small amount or a large amount?  Will it be written in sequential fashion to the DDR3 SDRAM?  Will it be written to a circular buffer?  Is the transfer length known a priori?

If your goal is to write a large amount of data to the DDR SDRAM to a known address region, to incrementing (not random) addresses of a pre-determined length, then you may want to be using a data mover of some type.  Indeed, if you look under the hood of the virtual FIFO, you'll find it's built out of the basic data mover components.  There's the MM2MM, which reads from one location in memory and writes to another--I just call it a DMA.  This is the only one of the three not used in the Virtual FIFO.  There's the MM2S, which reads from memory and generates an AXI stream.  Then there's the S2MM, which reads from an AXI stream and writes the results to memory.  Not knowing anything more about your application, this "data mover" may be something you are interested in.

A data mover needs to be first configured with the amount of data to be transferred, and the memory (source/destination) addresses.  The S2MM core then waits for data, puts the data into a FIFO, and once a full AXI burst's worth of data has been received it then writes it to the AXI bus.  The MM2S core first reads the data from the bus into a FIFO and then feeds a stream from that data, gathering more data as space is available in the FIFO.
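To make the S2MM description concrete, here is a rough behavioral model in C. It is a sketch, not Xilinx's (or anyone's) implementation: a command sets the destination and byte count, incoming stream beats collect in a FIFO, and a burst is written whenever a full burst has accumulated or the command's last data arrives. The 32-bit beat width, the 16-beat burst size, the names, and "memory as an array" are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BURST_BEATS 16u   /* assumed maximum burst length, in 32-bit beats */

typedef struct {
    uint32_t *mem;            /* destination memory, modelled as an array */
    size_t    next_word;      /* word index the next burst writes to      */
    size_t    words_left;     /* words still expected for this command    */
    uint32_t  fifo[BURST_BEATS];
    size_t    fifo_count;     /* beats currently buffered                 */
    unsigned  bursts_issued;  /* number of write bursts sent so far       */
} s2mm_model_t;

/* Load a command: destination (as a word index) and byte count (BTT). */
void s2mm_command(s2mm_model_t *m, uint32_t *mem, size_t start_word, size_t btt)
{
    assert(btt > 0 && btt % sizeof(uint32_t) == 0);
    m->mem = mem;
    m->next_word = start_word;
    m->words_left = btt / sizeof(uint32_t);
    m->fifo_count = 0;
    m->bursts_issued = 0;
}

/* Offer one stream beat. Returns 1 if accepted (TREADY high), or 0 when no
   command is outstanding (this model holds TREADY low in that case). */
int s2mm_push(s2mm_model_t *m, uint32_t tdata)
{
    if (m->words_left == 0)
        return 0;
    m->fifo[m->fifo_count++] = tdata;
    m->words_left--;
    /* Write a burst once a full burst is buffered, or the command is complete. */
    if (m->fifo_count == BURST_BEATS || m->words_left == 0) {
        memcpy(&m->mem[m->next_word], m->fifo, m->fifo_count * sizeof m->fifo[0]);
        m->next_word += m->fifo_count;
        m->fifo_count = 0;
        m->bursts_issued++;
    }
    return 1;
}
```

With an 80-byte command, 20 beats produce one 16-beat burst and one 4-beat burst. Note that this model refuses data between commands; Dan's later post in this thread describes Xilinx's core behaving differently.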

Neither of these are small designs (2k+ LUTs or so), but they are ideally suited for transferring large amounts of data from one place to another.

As for your comment about how easy board design is to hook something together, well, enjoy it while it lasts.  Don't forget that you are going to need to debug your design when those boxes don't work like you are expecting. That's the part that seems to take the longest time.  Indeed, perhaps  85-95% of a project is trying to go from a design to a design that "just works".  Be aware that it can be a challenge to debug Xilinx's designs.  Since their source designs are encrypted, it can be difficult to know what wire to trace, or what the meaning of that wire might be.  Others have found it challenging to understand the behavior of a design when it doesn't match their expectations and the data sheet appears ambiguous.  In each of these cases, it helps to have the source code of the design you are attempting to interface with.  The good news is that there are open source data movers.  The bad news is that the interface isn't quite the same and ... there are differences in how they work.  You might need to pick one approach and stick with it.

Dan

  • 0
8 hours ago, pyraetos said:

I definitely want the data to be available in PS DDR, not BRAM

The EclypseZ7_PS_DMA.pdf shows a block diagram for using two BRAM with the PS PL330 DMA. The HDL has direct access to each BRAM and the ZYNQ processor can either access each one directly or DMA data into the DDR. You have to figure out some sort of formatting. This method is good for small packetized messages between the processor and HDL.

The EclypseZ7_VFIFO.pdf shows a block diagram using the Virtual FIFO IP. The Virtual FIFO has a DMA controller embedded. The nice thing is that you don't have to do any programming. The HDL can just stream data to the PS DDR directly.

Both of these methods require some attention to detail. Both use slow AXI GPIO for control and status signals between the HDL and PS. You can use the PS PLLs to create clocks for your HDL when appropriate, or for the BRAM the HDL can instantiate its own clock for read/write operations.

Note that these were experimental projects and I didn't necessarily use all of the interfaces in my HDL.

 

 

EclypseZ7_PS_DMA.pdf EclypseZ7_VFIFO.pdf

Edited by zygot

  • 0
52 minutes ago, D@n said:

How much data are you intending to write to DDR SDRAM?  A small amount or a large amount?  Will it be written in sequential fashion to the DDR3 SDRAM?  Will it be written to a circular buffer?  Is the transfer length known a priori?

All very useful things to define before thinking about implementation.

Dan's open source 'data movers' are great... unless you want to use them in the normal board design/SDK (Vitis) flow. You can just make one of the PS HP slave ports external and try your luck with one of them, but you'll have problems trying to create an application. I respect Dan's choice in not co-mingling his IP with the board design flow, but his approach will leave most people who are tied to that flow out of luck.

He hasn't posted a design demo to the Digilent Project Vault for some time so perhaps this might be a way to promote his cores. :)

This would 'close the loop' between his design philosophy and the rest of the Vivado world and still allow people to choose the flow that suits their needs. Just a thought....

Edited by zygot

  • 0
24 minutes ago, zygot said:

... but you'll have problems trying to create an application.

What problems are you anticipating?  Connecting wires from an RTL project to a board design wrapper?  That's easy--just connect the wires.  You can use the "generate IP" to get a wrapper with everything defined and then map wires from the wrapper to the design.  Making sure your memory is cache coherent with the CPU?  You need to do that both ways--same with making certain that the virtual memory mapper doesn't allocate your physical memory in a way you aren't expecting.  Am I missing something more?

Dan

  • 0
56 minutes ago, D@n said:

Am I missing something more?

Try using the SDK to create an application and then get back to me. By application I mean having a PS run software that does something interesting using your own HDL design. Hence the suggestion that you post a project using one of your AXI IP with the board flow and an SDK or Vitis software application that the typical Digilent user can replicate. You don't have to blog about it, though I'm betting that it would make for quite a few entertaining blog posts enlightening your readers as to why your methodology is superior. Since you've mentioned your AXI IP as a useful option on the Digilent website, it seems to be a reasonable thing to do. As you don't see any problems, I'm hoping to see your AXI master connected to a PS HP slave in a board design/SDK demo for whatever ZYNQ board you happen to be using at the moment soon... perhaps next week?

Don't get me wrong. I'm all for your approach as long as anyone can actually make a project for a ZYNQ target using your IP. Just close the loop, for educational purposes, and then go about your business.

Oh, and most people use the Xilinx Standalone software development tools. Burdening a ZYNQ 7000 on the typical FPGA board with a full-blown Linux OS rarely makes any sense except for educational projects. Xilinx supports FreeRTOS, so you could do your software application part using that as a useful alternative. You can look at the EclypseZ7 demos for reasons why demos done on Debian or PetaLinux are way harder than those demos should be. Most people reading the Digilent site are starting from near zero and are looking for guidance. For this post we are talking ZYNQ targets.

9 hours ago, pyraetos said:

All in all it's a fun learning process for a software guy like me

Unfortunately, the only two responses that you've gotten so far are from guys who would rather not be bothered with the preferred FPGA vendor 'board design' / vendor SDK tool flow issues. There are plenty of posts dealing with those problems. We just want to create HDL designs that might incorporate, when appropriate (to me that's not often), a nice ARM core complex.

Edited by zygot

  • 0
2 hours ago, D@n said:

What problems are you anticipating?  Connecting wires from an RTL project to a board design wrapper?

As I pointed out earlier all of my ZYNQ designs use a Vivado generated wrapper that's instantiated into my HDL hierarchy.

On a side note, I'd never contemplate trying to do a complex HDL DMA-into-DDR design without preliminary test projects. One method that's useful is to code a data generator that supplies n bytes to the DDR where the PS or PC interface can get at it later. A counter is good because the test application can just read successive bytes and test for monotonically incrementing data. Quick and painless, and a good way to make sure that there's no loss in data transfer. This is especially good for SDK standalone application development, where writing to PC memory is limited. Once the data path is verified, I create a subsequent project that adds more complexity. One or two project iterations usually does the job, unless I want to explore several different AXI/HDL options on a ZYNQ-based design. Very often, achieving a project objective through 2-3 project iterations is faster than trying to get the desired result in one attempt. It's also easier to review an earlier step in the process than to use a code versioning tool. But this is just what works for me.
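A sketch of the software half of that counter test, in C. `fill_counter` only stands in for the PL data generator so the check can be exercised without hardware, and `first_counter_error` is what a bare-metal application might run over the DMA destination buffer; both names are hypothetical, and a 32-bit counter is assumed. On a Zynq with the data cache enabled you would invalidate the buffer's cache lines first (for example with `Xil_DCacheInvalidateRange()` from the standalone BSP's `xil_cache.h`) so the CPU does not read stale data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the PL data generator: a free-running 32-bit counter. */
void fill_counter(uint32_t *buf, size_t n, uint32_t start)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = start + (uint32_t)i;   /* wraps modulo 2^32, as a 32-bit HDL counter would */
}

/* Return the index of the first word that is not the previous word + 1,
   or n if the whole buffer is a gap-free counter sequence. */
size_t first_counter_error(const volatile uint32_t *buf, size_t n)
{
    assert(n == 0 || buf != NULL);
    for (size_t i = 1; i < n; i++) {
        if (buf[i] != (uint32_t)(buf[i - 1] + 1u))
            return i;
    }
    return n;
}
```

A return value smaller than n points straight at the first dropped or duplicated word, which is usually enough to tell a handshake problem from an addressing problem.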

  • 0

@zygot,

44 minutes ago, zygot said:

On a side note I'd never contemplate trying to do a complex HDL DMA into DDR design without preliminary preparatory test projects. One method that's useful is to code a data generator that supplies n number of bytes to the DDR where the PS or PC interface can get at it later. A counter is good because the test application can just read successive bytes and test for monotonic incrementing data. Quick and painless and a good way to make sure that there's no loss in data transfer. ... But this is just what works for me.

Definitely.  This is certainly good advice that has worked for me as well.

It also sounds like you haven't (yet) read about my first iteration, then, so I'll at least point that article out to you.  The design has so far passed all my simulation testing--for what that's worth.  The (MM2MM) DMA built yesterday for an Artix-7 device, but only at about 110MHz, so I've been re-verifying it after rearranging some of the burst write length logic.  (AXI burst length determinations are a true challenge ...)  The result shouldn't be limited by clock rate any more than the rest of my design at that point, so once I finish the verification pass it should be fairly high quality.  Much to my surprise, the DMA uses quite a bit of logic (2.5k LUTs or so)--indeed, more than the ZipCPU in most of its configurations.  Hence my questions above (just to maintain a semblance of keeping this on topic), which in many ways drove at the question of whether or not all the extra logic of the DMA was really of value.
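As an illustration of why burst-length logic gets fiddly (this is not Dan's implementation), two rules from the AXI specification already constrain how a transfer is chopped into bursts: an INCR burst is limited to 256 beats in AXI4 (16 in AXI3), and no burst may cross a 4KB address boundary. The C function below counts the bursts those two rules alone require; `count_bursts` is a hypothetical name, and it assumes a power-of-two bus width and beat-aligned addresses and lengths.

```c
#include <assert.h>
#include <stdint.h>

/* Count the INCR bursts needed to move `bytes` bytes starting at `addr` on a
   bus `beat_bytes` wide, where each burst is at most `max_beats` long
   (256 for AXI4 INCR, 16 for AXI3) and must not cross a 4KB boundary.
   Assumes beat_bytes is a power of two and addr/bytes are beat-aligned. */
unsigned count_bursts(uint64_t addr, uint64_t bytes, unsigned beat_bytes,
                      unsigned max_beats)
{
    assert(beat_bytes > 0 && (beat_bytes & (beat_bytes - 1)) == 0);
    assert(max_beats > 0);
    assert(addr % beat_bytes == 0 && bytes % beat_bytes == 0);

    unsigned bursts = 0;
    while (bytes > 0) {
        uint64_t to_4k = 4096 - (addr & 4095);   /* bytes left in this 4KB page */
        uint64_t len   = (uint64_t)max_beats * beat_bytes;
        if (len > to_4k) len = to_4k;
        if (len > bytes) len = bytes;
        addr  += len;
        bytes -= len;
        bursts++;
    }
    return bursts;
}
```

For example, a 512-byte write on a 64-bit bus starting at 0xF00 needs two bursts even though 64 beats would fit in one, because it straddles the 0x1000 boundary. A real DMA also has to handle unaligned starts, partial beats, and FIFO space, which is where most of the extra logic goes.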

I also have the Zynq Ultrascale Plus device on my bench waiting for me to try this design in real hardware--and I'd like to compare my own approach with Xilinx's data movers in the process.  My plan was to use the project as a means of experimenting with FPGA DMAs.   I've even got an AXI performance monitor built that I want to add in and try out--just to see if it helps clear up the resulting data analysis at all.  Sadly, the tyranny of the day has so far gotten in the way of pushing this project further along ...

Dan

  • 0
1 hour ago, D@n said:

I also have the Zynq Ultrascale Plus device on my bench waiting for me to try this design in real hardware-

Really all that the typical person browsing the Digilent Forums wants is to use the Xilinx tools to do something with their FPGA platforms. If it has a ZYNQ, that means the whole deal: board design, HDL (hopefully) connecting an add-on board with some functionality, and a PS software application, all doing something interesting. A big problem for most ZYNQ boards is that the user interface is limited to a UART and Ethernet, and often involves the software development IDE to do configuration and report results. Simulation is great but doesn't represent a complete design. There are a lot of details standing between your ZYNQ bitstream and an application working on hardware.

It seems apparent to me that you haven't actually used the normal ZYNQ HW/SW tools to build a real application yet. I know that you've mentioned having a Terasic ARM based FPGA platform. I have a Zedboard with a bad Ethernet PHY that I don't use very much. Send me a note if you're interested in it. You need to at least have a few simple Standalone applications running on hardware using the tools that everyone else uses before selling a better approach.

One problem with UltraScale-based boards is that the IO is mostly 1.8V for PS MIO and 1.5V or less for PL logic. This makes it tough to integrate with hardware lying around. We all want to see your AXI master demo DMAing data between your Verilog design connected to GPIO pins and something interesting. I have an (initially) cheap Ultra96V2 ZYNQ UltraScale-based platform that I keep picking up and putting down because the IO makes it pretty unusable unless I buy more stuff that doesn't do what I want to do anyway... it's all too easy to get sucked into the wormhole.

Edited by zygot

  • 0
On 5/23/2020 at 7:07 AM, D@n said:

A data mover needs to be first configured with the amount of data to be transferred, and the memory (source/destination) addresses.  The S2MM core then waits for data, puts the data into a FIFO, and once a full AXI burst's worth of data has been received it then writes it to the AXI bus.  The MM2S core first reads the data from the bus into a FIFO and then feeds a stream from that data, gathering more data as space is available in the FIFO.

@D@n @zygot

I decided to go forward with your recommendation on using the DataMover. So now I have my custom module and the DataMover IP in my BD.

I tried to make a basic test of the DataMover, where in my custom HDL I write a magic 32-bit number every clock cycle to the AXI-Stream line going to DataMover S2MM. There is also a 72-bit AXI-Stream Command interface to the DataMover, where I put the address in DDR I'd like to write to, per PG022. I also assigned constants to the other fields, mostly all 0, and put 4 for BTT because for the test I want to set a 32-bit area of memory.
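For reference, here is a C sketch that packs the 72-bit S2MM command word using the field layout PG022 gives for 32-bit addresses: BTT in bits [22:0], Type in bit 23 (1 = INCR, 0 = FIXED), DSA in [29:24], EOF in bit 30, DRR in bit 31, SADDR in [63:32], TAG in [67:64] and reserved bits [71:68]. Treat that layout as something to double-check against the PG022 revision for your DataMover version; `dm_pack_cmd` is a hypothetical helper, and in the actual design these bits would be driven onto the command stream's TDATA from VHDL. Note that leaving Type at 0 selects a FIXED-address burst; a multi-beat write to consecutive addresses needs Type = 1 (INCR).

```c
#include <assert.h>
#include <stdint.h>

/* The 72-bit command, split into bits [63:0] and bits [71:64]. */
typedef struct {
    uint64_t lo;
    uint8_t  hi;
} dm_cmd_t;

/* Pack an S2MM command (32-bit address layout, as read from PG022). */
dm_cmd_t dm_pack_cmd(uint32_t saddr, uint32_t btt, int incr, int eof, uint8_t tag)
{
    assert(btt > 0 && btt < (1u << 23));   /* BTT field is at most 23 bits */
    assert(tag < 16u);                     /* TAG field is 4 bits wide */

    dm_cmd_t c;
    c.lo = (uint64_t)btt                   /* BTT   [22:0]  */
         | ((uint64_t)(incr != 0) << 23)   /* Type  [23]    */
         | ((uint64_t)(eof != 0) << 30)    /* EOF   [30]    */
         | ((uint64_t)saddr << 32);        /* SADDR [63:32] */
    /* DSA [29:24] and DRR [31] are left at 0. */
    c.hi = tag;                            /* TAG [67:64]; RSVD [71:68] = 0 */
    return c;
}
```

Under that layout, a 4-byte INCR write with EOF to 0x10000000 packs as 0x1000000040800004 in the low 64 bits, which is a handy value to compare against what your HDL actually drives (an ILA on the command stream shows it directly).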

I don't see any indication that it's writing when I read that location from software. Have I messed up the AXI protocol by doing these writes every clock cycle? I feel like it's something to do with valid, ready, last, and all the other wires I'm not too clear on. Additionally, the Vivado connection automation connected the DataMover's S2MM output to the AXI Interconnect IP. Is that right, and should it still be able to reach DDR?
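On the valid/ready question: in AXI-Stream a beat transfers only on a clock edge where both TVALID and TREADY are high, and once the source raises TVALID it must keep it (and its data) asserted until that handshake happens. Writing a new word every clock is therefore only safe while TREADY stays high. The cycle-by-cycle C model below is a sketch (`stream_sim` is a hypothetical name, and the source is assumed to always have data) showing what goes missing if the source advances its data whether or not the beat was accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulate `cycles` clocks of one AXI-Stream link whose source always has
   data (TVALID stays high). ready[c] is the sink's TREADY on clock c.
   A beat is received only when TVALID && TREADY. If hold_until_accepted is
   nonzero the source keeps TDATA stable until accepted (correct behavior);
   otherwise it advances TDATA every clock (the bug). Received values go to
   out[]; the return value is how many beats arrived. */
size_t stream_sim(const int *ready, size_t cycles, int hold_until_accepted,
                  uint32_t *out, size_t out_len)
{
    uint32_t tdata = 0;   /* value currently driven on TDATA */
    size_t received = 0;

    for (size_t c = 0; c < cycles; c++) {
        if (ready[c]) {                 /* TVALID && TREADY: handshake */
            assert(received < out_len);
            out[received++] = tdata;
            tdata++;                    /* move on to the next beat */
        } else if (!hold_until_accepted) {
            tdata++;                    /* bug: the unaccepted beat is lost */
        }
    }
    return received;
}
```

With a TREADY pattern of 1, 0, 1, 1, 0, 1 the correct source delivers 0, 1, 2, 3, while the buggy one delivers 0, 2, 3, 5: beats 1 and 4 silently disappear, and nothing downstream flags an error.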

 

Edit: I was thinking about other reasons for it not working. I do disable data cache in SW. I think since I'm running baremetal it should be ok to copy and paste a hardcoded pointer from SW into Vivado to use for the SADDR of the AXI-Stream Command? No odd virtual address translation going on?

Edited by pyraetos

  • 0
8 hours ago, pyraetos said:

I was thinking about other reasons for it not working. I do disable data cache in SW. I think since I'm running baremetal it should be ok to copy and paste a hardcoded pointer from SW into Vivado to use for the SADDR of the AXI-Stream Command? No odd virtual address translation going on?

Both good things to consider. I haven't used Vitis yet, but for the standalone (bare-metal) Xilinx SDK support the standard memtest has cache turned off. You need to be mindful of the data cache having stale data when looking at DMA memory. As long as your address matches those of the .mss file, you can use it as you normally would.

I've not used the data mover IP for a long while but usually the standalone IP recognized in your mss file has example projects demonstrating how to use them. I always run the memtest example for new projects as a sanity test. It will find any memory being used. It couldn't hurt to import one of the data mover example projects if one is available and look over the code. Sometimes, the IP needs a loopback arrangement in the hardware but you'll see that in the software example code commentary.

I suspect that you should be looking at the data generator code that the DataMover is connected to in your PL code. It also could be a clocking issue. I assume that the board design validated without messages. So you've gotten to the fun part... figuring out how to debug your first design. You know how to do this. Put extraneous instrumentation into your logic to see what's going on. You can add AXI GPIO or an AXI ILA. In this case extraneous logic is a way to replace assuming with knowing, and well worth the effort. See it as a learning exercise. Even if whatever is wrong is pretty silly and easily fixed, there's no one stopping by your desk asking for a project status update. Enjoy the fun part, because that's how you learn how to hunt grizzly bears without getting eaten by one.

Take time to explore the tool examples and documentation help. Xilinx IP also has HDL example projects and documentation that are worth reading, even if they don't necessarily provide all of the information that you are looking for.

Edited by zygot

  • 0

@pyraetos,

I've seen some strange behavior from Xilinx's S2MM data mover, certainly violating the principle of "least surprise"--that any hardware component should do what surprises the user the least.  Let's see, some of these include 1) ending a data transfer whenever TVALID && TLAST.  This was explained to me as necessary in order to handle network packets, where the packet size might not be known a priori.  Or how about 2) allowing TREADY to be high between commands, so that data sent to the data mover would get thrown in the trash whenever the core wasn't (yet) ready to receive anything.  You might want to be on the lookout for these "features" as you move forward.
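The two behaviors described above can be captured in a small C model, which helps predict how many bytes a test like the one earlier in this thread should actually leave in memory. It models the behavior as described in this post, not as verified against the IP; `s2mm_bytes_kept` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* tlast[i] is TLAST on stream beat i. Beats before cmd_start arrive while no
   command is loaded: TREADY is high, so they are accepted and discarded.
   From cmd_start on, beats are written until BTT bytes have been stored or a
   beat with TLAST is accepted, which ends the command early.
   Returns the number of bytes that actually land in memory. */
size_t s2mm_bytes_kept(const int *tlast, size_t n, size_t cmd_start,
                       uint32_t btt, unsigned beat_bytes)
{
    assert(beat_bytes > 0);
    size_t kept = 0;
    for (size_t i = cmd_start; i < n && kept < btt; i++) {
        kept += beat_bytes;
        if (tlast[i])
            break;              /* TVALID && TLAST: transfer ends here */
    }
    return kept < btt ? kept : btt;   /* a final partial beat is trimmed to BTT */
}
```

For example, with TLAST on beat 4 and the command loaded at beat 2, a 32-byte command stores only 12 bytes: beats 0 and 1 are dropped before the command exists, and beat 4 ends the transfer early.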

Dan
