zygot

Capture 4 channels of 120+ million ADC samples

Capture 4 channels of 120+ million ADC samples per channel. As a proof-of-concept project, I just completed a design using the Opal Kelly XEM7320 with 2 Digilent ADC1410 ZMODs in a direct-to-DDR, 4-channel, 100 MHz Fs ADC application.

Opal Kelly has a unique approach to FPGA development that generally appeals to software developers. For me, they do a few things that are usually show stoppers. For one, most boards have a closed configuration/PC interface; the only way to use their boards is with pre-compiled encrypted netlists. For another, they don't provide schematics, or even partial schematics. Their hardware is also priced substantially higher than, say, Digilent or Terasic boards. Still, I've had success with their products, which allows me to overlook the bad stuff.

It's great that Digilent has gotten into the SYZYGY game with two very good converter ZMODs. Not so great is their inability to make an FPGA platform product that showcases what amounts to an opportunity to use Xilinx Series7 Select IO in all of its glory. SYZYGY isn't the only way to go, though it is a worthy attempt at a plug-and-play, cross-platform, cross-vendor standard. Any board vendor can make an FPGA board that allows users to explore the full value of the Xilinx Series7 devices; they just have to want to and then design a board that allows it. Don't let marketing people design boards; hire an experienced FPGA person and let them design boards. Trust me, it'll all work out in the end and everyone will be happy. You'll make money and you will have a broader customer base buying your stuff.

The XEM7320 is the only non-ZYNQ board that Opal Kelly offers. It has an Artix 75T device and is priced competitively with the Eclypse-Z7. The XEM7320 has 2 standard ports and 1 transceiver port versus the Eclypse-Z7's 2 standard ports plus 2 PMOD ports. The XEM7320 has a USB 3.0 PC interface and the tools for configuring the device from within the software application. The important feature that the XEM7320 has with respect to SYZYGY is a 1 GB DDR device that ZMOD ADC samples can be written directly to.

I was able to bootstrap the Opal Kelly RamTester demo C++ and HDL code to make an application that can capture 128M 16-bit samples on 4 channels using 2 ADC1410 ZMODs. It's an all-HDL, mixed VHDL/Verilog design. No MicroBlaze. No board design. (Almost) no vendor tool version issues. No SDK or Vitis, just VS2019, which is free these days. I used the Digilent ADC1410 low-level controller code as a first cut, though I will be replacing that with a more appropriate approach later. The DDR is essentially just a very large FIFO.
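
As a quick sanity check on the 128M figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 1 GB DDR is 2**30 bytes, that all of it is available for samples, and that each sample occupies one 16-bit word. The function name is made up for this example.

```python
# Back-of-the-envelope capture depth for a DDR used as one big sample FIFO.
DDR_BYTES = 1 << 30        # assumption: "1 GB" means 2**30 bytes, all usable
CHANNELS = 4               # two dual-channel ADC ZMODs
BYTES_PER_SAMPLE = 2       # each sample stored as a 16-bit word


def samples_per_channel(ddr_bytes=DDR_BYTES, channels=CHANNELS,
                        bytes_per_sample=BYTES_PER_SAMPLE):
    """Samples per channel that fit when all channels share the DDR equally."""
    return ddr_bytes // (channels * bytes_per_sample)


if __name__ == "__main__":
    per_channel = samples_per_channel()
    print("samples per channel:", per_channel)
    print("in units of 1024*1024:", per_channel // (1 << 20))
    print("total samples:", per_channel * CHANNELS)
```

Under those assumptions the result is 128 * 1024 * 1024 samples per channel, which is where "over half a billion" samples in total comes from.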

The whole design's resource usage:

LUTS   <20%

LUTRAM 5%

FF  8%

BRAM <10%

DSP 2%

The VS2019 application configures the device and saves all data to an ASCII file for use with SCILAB or OCTAVE.

The only big issue at this point is viewing large 4-column matrices in a waveform rendering. I'm open to suggestions. What is the best way to render a plot of this:

d = [

float4, float3, float2, float1, //sample 0

...

...

float4, float3, float2, float1] //sample 1048575
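
One common way to make a waveform of this size plottable (a general technique, not something taken from this design) is min/max decimation: split the rows into buckets, keep only the minimum and maximum of each channel per bucket, and plot the resulting envelope. The sketch below assumes the data is already in memory as rows of four floats; `minmax_envelope` and `bucket` are names invented for the example.

```python
def minmax_envelope(rows, bucket):
    """Reduce rows of (ch4, ch3, ch2, ch1) floats to one (min, max) pair
    per channel for every `bucket` consecutive rows."""
    envelope = []
    for start in range(0, len(rows), bucket):
        block = rows[start:start + bucket]
        columns = list(zip(*block))      # transpose: one tuple per channel
        envelope.append([(min(col), max(col)) for col in columns])
    return envelope


if __name__ == "__main__":
    demo = [(0.0, 1.0, 2.0, 3.0), (1.0, 0.0, 5.0, -3.0), (2.0, 2.0, 2.0, 2.0)]
    for entry in minmax_envelope(demo, 2):
        print(entry)
```

With 1,048,576 rows and a bucket of 512 this leaves 2048 min/max pairs per channel, which is about as much as a screen can show anyway. Zooming in just means recomputing the envelope over a narrower row range.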

Thanks Digilent for bringing 100 Ms/s ADC/DAC options to the general Xilinx user audience. It's been a long time in coming.

[edit] Apologies to anyone reading the first version of this post, where I got the sample count wrong. If I had a marketing guy writing this, the title would have read "Capture over 1/2 Billion ADC samples". I hate it when the details deflate the high expectations of the attention-grabbing headline, even if over half a billion samples is technically correct. I should point out that so far I've only captured and uploaded 1*1024*1024 samples for all four channels, the reason being that I don't have a way to verify more data than that. I did start off with a 100 MHz 64-bit counter written to DDR, and that I was able to confirm. Even with a very fast USB 3.0 interface, converting bits to mV, formatting even 10*1024*1024 samples, and printing the data to a file takes more time than I want to spend at the moment.
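
The counter test mentioned above is easy to check on the host side. A minimal sketch, assuming the readback arrives as a byte buffer of little-endian unsigned 64-bit words that should increase by one per word (the byte order and the function name are assumptions for illustration, not details from the actual design):

```python
import struct


def first_counter_error(buf, start=None):
    """Return the index of the first 64-bit word that breaks the +1 sequence,
    or None if every word matches. Assumes little-endian unsigned words."""
    count = len(buf) // 8
    words = struct.unpack("<%dQ" % count, buf[:count * 8])
    if not words:
        return None
    expected = words[0] if start is None else start
    for index, word in enumerate(words):
        if word != expected:
            return index
        expected = (expected + 1) & 0xFFFFFFFFFFFFFFFF   # wrap like a 64-bit counter
    return None


if __name__ == "__main__":
    good = struct.pack("<4Q", 5, 6, 7, 8)
    bad = struct.pack("<4Q", 5, 6, 8, 9)
    print(first_counter_error(good), first_counter_error(bad))
```

A check like this scales to the full DDR contents because it never needs to format or print the data.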

[edit2] Gender and skill are not correlated at all. Don't select books or talent by the appearance of the cover.

Edited by zygot


I suppose that now I have to buy another DAC ZMOD so that I can test 4 ADC channels. I have a lot more confidence in a 4 channel DAC with the Eclypse-Z7 as a useful combination.


@zygot,

This sounds like a fun and perhaps even a nicely well paid task.  Nice.

Help me understand an overview here ... was that 4 ADCs of 100Msps coming in on each ADC?  How many bits per ADC?  (16'bits) How wide is the SDRAM you are working with?  Stored into DDR3 SDRAM via a virtual FIFO, right?  and then, you came off the board onto USB3 did you say?

Can you give me any indication of  how close you came to the throughput limits of either the SDRAM memory or the USB3 offboard transport you used?  Just trying to understand how much of a challenge it was to achieve your objectives here.

Were you using Xilinx's AXI crossbar, or was the virtual FIFO the only component that accessed memory?

Looking forward to hearing more of this fun project,

Dan


Two ADC ZMODs, each with a dual 14-bit (sign plus 13 bits) ADC. All four ADCs sample at 100 MHz. This is an Artix 75T and is comparable to the PL in an XC7Z020 device. No AXI anything. No processor cores. Just a couple of FIFOs, a DDR read/write controller, a MIG DDR soft controller, and some control logic. The ADC data is gain and offset calibrated, so some of the bits can be thought of as fractional. But the DDR data write rate for 4 channels is 100 million 64-bit words/s. The DDR has a 32-bit data path. It sure would be nice if the Artix had a hard DDR controller like the Spartan6 it replaces, as the Spartan6 DDR controller is both higher performance and more flexible. Unfortunately, the MIG is stuck with a 4:1 controller/DDR clock ratio, but it all works out.

The DDR, for this application, is essentially a giant FIFO. A small state machine controls how many words are captured, and after capture the host application reads out the contents of the DDR. If you want to capture 128M samples per channel you could just write the DDR and use it as a circular buffer, though this would complicate the design a bit. As I mentioned, this is just a proof-of-concept design at the moment. It shows the possibilities for the boards involved. SYZYGY is capable of demonstrating Series7 Select IO, and each SYZYGY port could support a quad 16-bit ADC, though the DDR would have to be beefier for 8 channels. Then there are the limitations of a small FPGA board and its power supplies. The FPGA never got above about 50C as measured with an infrared temperature sensor and has no heat sink. I'll probably add one at some point.

The USB interface is a vendor-provided encrypted netlist and somewhat odd to work with, but it can do sustained 350+ MB/s data transfers, befitting such a platform. The application was written in C++ using VS2019. As I mentioned, I used as much vendor-supplied code as possible to get off and running quickly.
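
To put the throughput question in numbers, here is a rough Python sketch using only the figures quoted in this post. It assumes "350+ MB/s" means 350 million bytes per second and that the 1 GB DDR is 2**30 bytes. The DDR clock rate is not stated in the thread, so the DDR's own peak bandwidth is deliberately left out.

```python
SAMPLE_RATE_HZ = 100_000_000     # 100 MHz Fs on all four ADCs
WORD_BYTES = 8                   # four 16-bit samples packed into one 64-bit word
USB_BYTES_PER_S = 350_000_000    # assumption: "350+ MB/s" taken as decimal MB
DDR_BYTES = 1 << 30              # assumption: 1 GB = 2**30 bytes


def write_rate_bytes_per_s(sample_rate_hz=SAMPLE_RATE_HZ, word_bytes=WORD_BYTES):
    """Sustained byte rate the ADC side pushes into the DDR."""
    return sample_rate_hz * word_bytes


def seconds_to_move(n_bytes, bytes_per_s):
    """Time needed to move n_bytes at a constant rate."""
    return n_bytes / bytes_per_s


if __name__ == "__main__":
    rate = write_rate_bytes_per_s()
    print("ADC write rate (MB/s):", rate / 1e6)
    print("time to fill DDR (s):", round(seconds_to_move(DDR_BYTES, rate), 2))
    print("time to read back over USB (s):",
          round(seconds_to_move(DDR_BYTES, USB_BYTES_PER_S), 2))
```

Under these assumptions the ADCs write about 800 MB/s, the DDR fills in roughly 1.3 s, and reading it back over USB takes a bit over 3 s. The capture rate is more than twice the USB rate, which is why the samples are buffered in DDR rather than streamed straight to the PC.
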
I had to do a few preliminary projects to create the MIG, so the early ones were all Verilog. Once things came into place I copied the MIG and instantiated all of the Verilog in my VHDL top-level entity. The USB interface allows for changing gain and offset calibration coefficients on the fly as well as analog front end gain and coupling options. It's all a lot cleaner and uses way fewer resources than AXI IP. Because the ADC data is aggregated into a 64-bit data word and written to an asymmetric FIFO (64-bit write, 1024 bit read), the application has to do some additional formatting to take 8-bit chars and create scaled real data in Volts. The ZMODs are nicely thought out and well implemented. A nod of appreciation to the design team responsible. Great job! It's really an adaptation of the Analog Explorer ADC design beefed up for a higher performance platform.
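
The host-side formatting step described above (bytes in, Volts out) might look roughly like the following Python sketch. The real application is C++, and the details here are assumptions for illustration only: each 64-bit word is taken to hold four little-endian signed 16-bit samples in channel order 1 to 4, and `volts_per_count` is a placeholder scale factor that would have to come from the actual gain setting and calibration.

```python
import struct


def unpack_words(buf, volts_per_count):
    """Split a byte buffer of 64-bit words into four lists of voltages.

    Assumed layout (hypothetical): four little-endian signed 16-bit samples
    per word, ordered channel 1, 2, 3, 4.
    """
    n_words = len(buf) // 8
    channels = ([], [], [], [])
    for samples in struct.iter_unpack("<4h", buf[:n_words * 8]):
        for channel, raw in zip(channels, samples):
            channel.append(raw * volts_per_count)
    return channels


if __name__ == "__main__":
    raw = struct.pack("<8h", 1, -1, 2, -2, 3, -3, 4, -4)   # two 64-bit words
    for number, data in enumerate(unpack_words(raw, 0.5), start=1):
        print("channel", number, data)
```

If the packing order or byte order in the real design differs, only the format string and the channel mapping need to change.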

The only Xilinx IP are the native MIG, the FIFOs, and an MMCM.

The thing that should impress people is the low resource utilization for the entire design. Try that with a Microblaze tricycle design approach!  In the real world no one uses MicroBlaze because it takes up most of the resources and leaves the designer with little left to work with.

I'm still looking for that FPGA board that lets Series7 really shine. Take the Genesys 2. Remove the FMC connector. Add 2-3 standard SYZYGY ports and a 4-lane SYZYGY transceiver port. Give the DDR a 64-bit data path, keep the mDP as is, call it Genesys3 and now we're talking business**. TI makes some nifty Ultrasound AFE devices that could work with a double-wide standard pod arrangement. Actually, there are a lot of possibilities. As I pointed out earlier, SYZYGY isn't necessary but is a nice format that allows for a wide range of FPGA design applications. For too long now, students and experienced engineers on a budget have had to make do with hardware that doesn't allow them to make use of the resources of the FPGA devices they are paying for. That's more than a shame.

I could have done a similar 4 channel ADC design with the Cyclone V GT development board and two DDC HSMC mezzanine cards and used the PCIe 4-lane Gen2 as a PC interface though it would be messy since that board isn't really designed to sit in a PC properly. I have used a PCIe extender cable with that board. It's just nice to be able to do this with a Xilinx device and tools. The PCIe data rate is high enough to stream data directly into the PC memory.

 

** [edit] If you are going to make a nice platform like this it should have at least a USB3.0 interface for interaction with a PC and these days most boards that aren't the lowest end should have at least 1 high quality programmable clock module. Designing a good FPGA board is hard work. Why waste the effort by doing something half baked?

Edited by zygot


An additional thought. It took me a lot less time to make use of the ZMOD ADC1410 for this application than it did trying to work out the overly complicated and poorly thought out Eclypse-Z7 supporting code... and I'm not close to being done with that as I write. Compare the two approaches. Compare the usability. With this platform I can write a PC application that configures the target FPGA, executes an application, and quickly gets data to a PC for analysis or storage. The Eclypse-Z7 can't do any of that. At this point it can't do anything without SDK or Vitis support.

[edit] I'm still hopeful that Digilent has the will to at least make the Eclypse-Z7 live up to its potential. It's flawed but still worth the investment in development time to allow users to do interesting things with. 

Edited by zygot

11 hours ago, Dan said:

was the virtual FIFO the only component that accessed memory?

If this was a ZYNQ platform and I was using the AXI Virtual FIFO I wouldn't be able to capture more than 4 channels of 32K samples each as there is a limit of 256KB with the core.

My philosophy is that pipes that just move data around should be as simple and fast as possible. The complexity should be in the logic. For ARM-based FPGA devices, the hard ARM complex is best suited to handle complexity on that end. The PL is where your logic resources are. There's absolutely no design justification for putting AXI busses between the ARM complex and the PL. A simple bus with address, data, and a handful of simple gating controls is ideal. Instead of 4-8 AXI pipes connecting the PS to the PL, with an application perhaps using only 1 or 2 of them, there should be one or two really wide, fast pipes, letting the user decide how best to use them for their application. I mean, customized application-specific logic is the reason for being for an FPGA. It sure can't compete with non-programmable logic. Now, if you are designing an ARM CPU without programmable logic, I could see where you might want to expose complicated AXI or AMBA busses to the user, who might want to hang all sorts of peripherals on them with a combination of low-speed, low-throughput and high-speed, high-throughput needs. For an FPGA with an embedded hard ARM core complex this makes no sense to me.


@zygot,

If you just want to move from an AXI interface to a simpler interface, you can convert it to either AXI-lite or WB without too much hassle.  I just might have some bridges to handle that conversion lying around--bridges that will keep the entire bus running at 100% capacity.  That would handle your criteria of "a simple bus with address, data, and a handful of simple gating controls".  It's unfortunate that AXI-lite is a second class citizen in Xilinx-land.  The AXI-lite protocol is quite capable, but many of the AXI peripherals that use it are not (cough, like the AXI BRAM controller that'd drop AXI-lite throughput to 25%).  Thankfully, the MIG core doesn't seem to mind one way or another.

One of my own criteria when building my AXI data movers was that they should be able to handle 100% throughput even across burst boundaries.  Judging from Xilinx's spec, Xilinx's cores don't do this, and so there is a throughput difference between the two implementations.  A second difference is that I never limited the transfer to 256kB ... :D  Of course, I don't have a virtual FIFO to offer.  Never thought of building one.  If I did have to hack something together in an afternoon, it'd be a WB FIFO that then used a WB to AXI conversion (while maintaining 100% throughput ...)  Indeed, I did manage to build a fully verified WB stream to memory converter in a single morning, whereas the AXI equivalent took several days to get right.  Yes, there's a cost for all this added complexity.

I think I might disagree with you about CPU design being one potential or even necessary user of such a complex bus structure.  It doesn't need to be so.  Indeed, IMHO AXI is waaayyy over designed--but that's another story.  That said, I've been burned with cache coherency issues, so I can see a purpose for a protocol that would help the CPU maintain cache coherency.  It's just that ... AXI isn't that.

Dan

38 minutes ago, Dan said:

If you just want to move from an AXI interface to a simpler interface, you can convert it to either AXI-lite or WB without too much hassle.

I think that our wires are crossed.

Except for some potential ZYNQ designs where it's unavoidable, I don't want to see or touch anything AXI. Even with ZYNQ I can usually step around them by employing BRAMs or AXI FIFO streaming interfaces where I'm not worried about efficiency or data throughput. I don't see much use for all the hassle of incorporating an ARM into an FPGA design except for rare applications. For something like making a mid to high performance FPGA platform, a ZYNQ solution is just going to get in the way. Now, it's possible to design a ZYNQ board that lets users do advanced IO designs. You just have to be willing to put enough PS DDR on the board to support its OS, with room for some large data buffers that might connect to the PL. You also have to provide the PL with its own DDR to support those advanced IO applications. Unfortunately, a moderately priced ZYNQ platform that does that is a rarity... offhand, none come to mind.

Yes, I can see using a well designed AXI Master packaged in a way that's suitable for the Vivado flow so I can quickly build a standalone application in the SDK or Vitis. Got one of those?

No I have no interest in experimenting with lots of 'bus adapter' IP that involves busses that I want to avoid like the plague.

Really, I just want a nice non-ARM FPGA platform that doesn't prevent me from using all of that nice Series7 Select IO to its fullest capability... and a 1 GbE or, better yet, 1/2.5/5/10 gigabit port, and a usable PCIe or USB 3.x platform/PC interconnect. The old Z7000 is really not ideal for a nice Linux/FPGA development platform. In theory the UltraScale ZYNQ families now have the processor horsepower to keep up with a Raspberry Pi 4, but the IO requires too much additional hardware to use the PL with most existing stuff that you might want to use it with. So you can get a cheap UltraScale board for less than a decent Z7020 board, but you can't afford to connect it to anything without a lot of expense and effort building your own adapter boards. I suppose that this is the price of technical advancement.

One point of this exercise is that a low-end plain old FPGA platform that has been reasonably well designed, using only HDL, can do a lot more than a comparable ZYNQ platform, with a lot less pain, a shorter design time, and far fewer logic resources. The design can also be reused in a different project with some future version of Vivado without spending weeks resolving broken IP instantiations and code deprecation.

"I think I might disagree with you about CPU design being one potential or even necessary user of such a complex bus structure.  It doesn't need to be so."

My reference here is to a standalone general purpose microcontroller IC that a customer might use on a PCB. Even then I'm not advocating for it, just possibly willing to concede that it might serve a useful purpose. I'd still rather have a simple external data pipe to deal with.


I did figure out a way to scroll through 4M samples in a text file and plot 16K sample segments at a time using OCTAVE. Not pretty but usable. SCILAB is, unfortunately, just unable to handle matrices larger than about 64KB. No doubt an issue with how they handle memory.
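
The same segment-at-a-time idea can be sketched outside OCTAVE. The Python below is only an illustration and assumes a text file with one sample row per line and four values per row separated by whitespace or commas. `read_segment` is a made-up helper, not part of any tool mentioned here.

```python
from itertools import islice


def read_segment(lines, first_row, n_rows):
    """Return up to n_rows rows of floats starting at first_row.

    `lines` is any iterable of text lines, for example a freshly opened
    file, so the whole data set never has to sit in memory at once.
    """
    rows = []
    for line in islice(lines, first_row, first_row + n_rows):
        fields = line.replace(",", " ").split()
        if fields:                       # skip blank lines
            rows.append([float(x) for x in fields])
    return rows


if __name__ == "__main__":
    demo = ["%d %d %d %d\n" % (i, i + 1, i + 2, i + 3) for i in range(10)]
    print(read_segment(demo, 4, 3))
```

Scrolling then amounts to calling the helper again with a different `first_row`, for instance in steps of 16 * 1024 rows, and handing each segment to whatever plotting tool is at hand.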

I have a few things to clean up before getting moving on developing a ZMOD demo project that lets them be useful and avoid the dreaded storage closet where old and useless things sit hoping for a second chance...

40 minutes ago, [email protected] said:

Any special reason you aren't using a binary file?

Yeah, because I'm a salmon in a bathtub when it comes to writing software. Frankly, hopping back and forth between HDL and C doesn't seem to be getting easier with time.

I tend to like text based data files for debugging or development because for one, I can usually spot most things that are wrong in NotePad++, and two because I don't have to write a GUI application to render data into a visual format. With graphics application software I'm more of a toad in the toilet.... Of course when it becomes necessary I can put in the effort to do all of that if it becomes an important tool that I can reuse. Otherwise, I tend to make do with what I have readily available.

If this project was a full blown complete application I'd have to either format my data to be used with scope waveform viewers or write my own application to handle somewhat large datasets involving plotting, scrolling, zooming etc. Not today.

OCTAVE is pretty well written and current so it turns out that, while it might take 20 seconds to read in a 13 MB text file into memory, slicing the matrix and plotting manageable segments is both quick and a workable solution... for now.
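
For comparison with the text-file approach, a binary dump of raw 16-bit samples needs very little code. The sketch below is in Python rather than the C++ of the actual application and assumes signed 16-bit samples written in the machine's native byte order. The helper names are invented for the example.

```python
from array import array


def write_binary(samples, stream):
    """Write signed 16-bit samples to a binary stream (native byte order)."""
    stream.write(array("h", samples).tobytes())


def read_binary(stream):
    """Read back every signed 16-bit sample from a binary stream."""
    data = array("h")
    data.frombytes(stream.read())
    return data.tolist()


if __name__ == "__main__":
    import io

    buffer = io.BytesIO()
    write_binary([1, -2, 300], buffer)
    buffer.seek(0)
    print(read_binary(buffer))
```

Each sample costs two bytes on disk this way, at the price of no longer being able to eyeball the data in a text editor, which is the trade-off described above.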

Edited by zygot


If I forget to set the ADC gain calibration to something other than 0 I remind myself of a really quick and fast way to initialize 1 GB of DDR to all zeros. Sometimes this is helpful and sometimes not so much...


Geez, I forgot that the baremetal DAC demo only drives one channel of a two-channel DAC. It's not encouraging when a product vendor can't even figure out how to demo their own products satisfactorily... perhaps buying another DAC ZMOD isn't such a swell idea after all. There are easier ways to get the job done.... SIGH!
