
Transfer some huge files from PC to FPGA via Ethernet


fpga_123

Question

Hello,

I am working with some very large files, each containing millions of short reads stored as text, and I am processing them on an FPGA. My block design has a Zynq UltraScale+ processor and a few custom IPs created with HLS. The hardware is exported to SDK, where I am developing a bare-metal application. At present the short-read file sits on the SD card and is read into DDR memory. I have tested this application and it works.

However, I would like to keep the file on my computer and transfer the data from it in blocks (say 10 or 20 lines) over Ethernet. Each block would be processed by the hardware in the FPGA, and the results sent back to the laptop/PC and displayed. Once a block is done, the next 10 or 20 lines are transferred from the short-read file and processed the same way.

I have already worked with the lwIP echo server and web server applications from Xilinx, but I am not sure how to proceed from there. I have searched for tutorials and documentation but could not find anything that made the approach clear. I understand that a client needs to be created on my PC and that there must be a server on the FPGA, but how exactly should the flow go? How do I use the Ethernet options in the block design and in the SDK programming? How am I supposed to write the client-side application on the PC? Should I start from the TCP Perf server template or the lwIP echo server template in SDK when creating the bare-metal application?
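The PC-side flow described above (send a block of lines, wait for the result, send the next block) could be sketched roughly as follows. This is only a sketch under assumptions that are not in the thread: the board runs a TCP server, and both sides agree on a simple framing of a 4-byte big-endian length followed by the payload. The names `read_blocks`, `send_frame`, `recv_frame` and `process_file` are hypothetical helpers, not part of any Xilinx or lwIP API.

```python
import socket
import struct
from itertools import islice

BLOCK_LINES = 20  # lines per block; the question mentions 10 or 20


def read_blocks(lines, block_lines=BLOCK_LINES):
    """Yield successive lists of up to block_lines lines from an iterable."""
    it = iter(lines)
    while True:
        block = list(islice(it, block_lines))
        if not block:
            return
        yield block


def send_frame(sock, payload: bytes) -> None:
    """Send one frame: 4-byte big-endian length, then the payload (assumed framing)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes; TCP is a byte stream, so one recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock) -> bytes:
    """Receive one length-prefixed frame and return its payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def process_file(sock, lines, block_lines=BLOCK_LINES):
    """Send each block, wait for its result frame, and yield the result bytes."""
    for block in read_blocks(lines, block_lines):
        send_frame(sock, "".join(block).encode("ascii"))
        yield recv_frame(sock)
```

In use, `sock` would come from something like `socket.create_connection((board_ip, port))` and `lines` from an open text file, with the address and port being whatever the board's server is configured for. A server that simply echoes the byte stream back unchanged would return each frame as-is, which makes it a convenient first test before the board-side code calls the accelerator.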

Any help is highly appreciated in this regard.

 

Thanks,


7 answers to this question


8 hours ago, sgandhi said:

I am doing some processing on it in FPGA

Sometimes it's hard to figure out what the point of a design is. As I understand it you want to send large blocks of ASCII-type char data from your PC to an ARM processor for some type of manipulation and then receive back blocks of similar data. You don't mention if there is any data reduction involved. As I'm betting that all of this processing would be done a lot faster and easier on your PC I'm guessing that this is a student project. If not, then a bit of planning before starting on the project implementation would have been wise. You always knew that lots of data was going to be transmitted back and forth between your PC and remote processing hardware. Working out how to perform that data transfer in an efficient way that is easy to implement and satisfies the project requirements would be the first step of a good design. Then you would choose your platform for the remote hardware accordingly. If this is a student project then I'm assuming that you are expected to demonstrate your competence in some field or other. You might have needlessly made your assignment harder than necessary as well. This happens.

Back to your questions. It seems as if you understand what the best interface for communication between your host PC and FPGA platform is and the scope of what you need to do to make it happen. This will be easier to do with Linux running on both the host PC and ARM processors. You might think about trying to do raw Ethernet processing. With the same tools and OS on both platforms you only need to figure out how to implement the coding part once.

Just curious, is the PL of your ZYNQ UltraScale doing anything to make itself useful?

It's unfortunate that I just read your previous post after having replied to this one. I'm more convinced that this is a student project.


On 6/9/2020 at 7:10 AM, zygot said:

Just curious, is the PL of your ZYNQ UltraScale doing anything to make itself useful?


Yes, the PL is working on the hardware acceleration of the design. Thank you for the reply. 


While it's certainly possible to do ASCII string manipulation in logic, an FPGA isn't the first thought that comes to mind when I think about doing it. But, like most people reading this forum, I don't use the HLS flow. It seems to be more of a curiosity than a working tool to me. Again, I haven't done much with it since C to logic became available, so I'm not a good person to make an evaluation. There are certainly uses for an FPGA platform for text-based work, but it's hard for me to imagine that the time penalty of the two-way trip between PC host and external FPGA hardware accelerator for data transfer can be overcome with a typical ZYNQ platform. Still, I do understand the idea of proof of concept versus final product. As we both realize, I don't understand the objective of your project, but if 'acceleration' is the goal, a board designed for high speed data transfer like the Terasic C5P would seem to be a more suitable option. Altera doesn't offer HLS but does have analogous tools. None of the FPGA vendors are interested in making Ethernet an easy and free option for connectivity.
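The time-penalty argument can be put into a back-of-the-envelope model. The sketch below is a deliberately crude illustration: it assumes one block is sent, processed, and returned before the next one starts (no pipelining), and every number in the usage note is made up for illustration, not measured. `offload_time` and `offload_wins` are hypothetical names.

```python
def offload_time(bytes_out, bytes_back, link_bytes_per_s, rtt_s, fpga_compute_s):
    """Estimated wall time in seconds for one block sent to the board and back.

    Assumes strictly sequential operation: fixed round-trip latency, plus
    time on the wire for both directions, plus compute time on the board.
    """
    transfer_s = (bytes_out + bytes_back) / link_bytes_per_s
    return rtt_s + transfer_s + fpga_compute_s


def offload_wins(pc_compute_s, **offload_args):
    """True if offloading one block beats simply computing it on the PC."""
    return offload_time(**offload_args) < pc_compute_s
```

For example, with a hypothetical 2000-byte block each way, a 125e6 bytes/s link, 200 µs of round-trip latency and 50 µs of board compute, `offload_time` comes to about 282 µs, so the PC would have to need more than that per block for the offload to pay off. Larger blocks or overlapping transfer with compute change the picture, which is why measuring the actual link matters.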

I'm sorry that you didn't get more useful help with your questions, but you seem to be outside our normal experience zone.


@sgandhi,

Perhaps I can offer a different perspective.

  1. If you are building an accelerator, then you need to know what your processing bottleneck is--otherwise you won't be building an accelerator. :D  Half of your job will be to find whatever bottleneck is present in your system and fix that, moving the bottleneck somewhere else.  It can feel like whack-a-mole.  As an engineer, half of your job is knowing where that mole is.
  2. Re: HLS.  It's advertised as an awesome choice for beginners to get started.  The problem beginners tend to have is that they don't understand how the constructs they create get mapped into hardware.  Small, subtle, minor changes can make an HLS design go from fitting on a device to not fitting and few beginners have the insight to see what's going on.  The result is that HLS seems to work best for the more advanced user--the same advanced user who doesn't really need it.  The second problem/reality with HLS is that the logic it generates tends to be bloated by 3x-6x (last numbers I heard).  That's going to force you to purchase a bigger FPGA than you need.  Yes, all this stuff comes at a cost.
  3. I have used ethernet to transmit data to/from FPGAs before.  If you look over this example, you'll find a functional example including a PC sending data to an FPGA over UDP, the data getting processed on the FPGA, and then returned to the PC.  It's doable.  Xilinx's ethernet-lite core has some ugly bugs in it that still haven't been fixed as of 2020.1.  If you look hard enough, you can find open source alternatives to their designs.  That said, the devil is in the details.  A slow ethernet link could easily destroy any acceleration performance you hoped to achieve.  I've had that happen to me a couple of times.  You'll need to engineer that well through and through to make it worth your while.  In my example, actual performance was dismal: the CPU ran instructions from SPI flash memory (*SLOW*), there was no SDRAM controller (*LIMITED ON-CHIP MEMORY*), there was no hardware available for hardware assisted data moves (*EVEN SLOWER*) and so I'm pretty certain that my "accelerator" didn't.  Still, it's a functioning example you are welcome to look over.  Be aware that the Versa board it was built for is a non-Xilinx board.  The ethernet controller was copied from a separate Xilinx design, but that's another story.
  4. Re: Protocol processing.  While you can do network protocol processing in an FPGA, it's often simpler to do it in an attached CPU--such as the ARM in a Zynq or within a MicroBlaze.  I used a PicoRV32 RISC-V processor in my example.  I think you'll find lots of examples of network protocol stacks in software, few in RTL.  That's not a bad thing.  If you use a Linux operating system of some type, then the packet processing will appear to come for free with it.  (Nothing is for free ...)  Again, know where your bottleneck is or your "accelerator" won't.
  5. Look for an ethernet core that comes with an integrated packet DMA to memory.  Beware of what the CPU's cache is doing when you copy data to memory behind the CPU.
  6. Re: ASCII text vs something else.  ASCII text is much easier to comprehend and debug.  Switching to a more native format could speed up your algorithm by a factor of 4x if not more.  In one project I dealt with, the difference between ASCII and binary formats was greater than a factor of 1000x.  Know your bottlenecks.  Do your engineering.  This choice is usually an easy one to make.
  7. Xilinx's new Vitis software is supposed to help make data processing on an FPGA easy.  It'll help you create a processing kernel: it'll generate the memory to RTL copy, you do your wonder in RTL, then it generates the RTL back to memory copy for you.  The memory movers are supposed to be at high speed.  (I haven't checked to know, myself, but I do know they aren't getting full bus bandwidth utilization--as per their data sheets.)  Check out the Vitis manual for more information there.
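As an illustration of item 6, here is one possible binary format. It rests on an assumption the thread never confirms: that the short reads are DNA sequences using only the letters A, C, G and T. Under that assumption each base fits in 2 bits instead of an 8-bit ASCII character, which shrinks the payload sent over the link to roughly a quarter. `pack_read` and `unpack_read` are hypothetical helpers; a read containing any other character (such as N) raises `KeyError` in this sketch.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed alphabet, 2 bits per base
BASES = "ACGT"


def pack_read(read: str) -> bytes:
    """Pack an A/C/G/T string at 4 bases per byte, first base in the high bits."""
    out = bytearray((len(read) + 3) // 4)
    for i, base in enumerate(read):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack_read(data: bytes, n_bases: int) -> str:
    """Inverse of pack_read; n_bases is needed because the last byte may be padded."""
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(n_bases)
    )
```

Because the last byte may carry padding, the read length has to travel with the packed data, for instance in a small header. Whether the on-board logic prefers this layout or some other one is a design choice to settle against the actual accelerator.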

Perhaps that will help get you going.  Perhaps it will only generate more questions.  My bottom line is that I don't use the majority of Xilinx's IP, choosing instead to generate my own, so I really can't help you much there.

Dan


22 minutes ago, D@n said:

The result is that HLS seems to work best for the more advanced user--the same advanced user who doesn't really need it.

While I agree with most of what you've added to the discussion (one that unfortunately isn't of much use to the person asking for help) I am at a complete loss in trying to process the particular comment above.

It's one that would seem to demand an example. Do you have an example of when you used HLS to do something that you couldn't do with the normal Vivado flow? Honestly, I'd like to see some compelling evidence that HLS is worth spending time to play with. The argument about excessive resource usage is a canard, as obviously HLS is meant to be an easy and quick way to use FPGA devices without having to be competent in FPGA development skills. I consider myself to be a somewhat advanced user who finds this comment completely mystifying. By the time you understand all of the limitations and gotchas of this flow it seems to be a bad choice over just doing what you and I prefer to do, which is creating our own IP using an HDL and using the normal flow. If you want to target any FPGA from any vendor with work that you've already done, HLS takes on the appearance of being restricted to a hard existence, like being bitten by a vampire. But my mind is open to compelling evidence should anyone provide that.

"Re: ASCII text vs something else.  ASCII text is much easier to comprehend and debug"

Perhaps I misread the intent of the project but it seems to be manipulating strings specifically, though we haven't been given enough information to know what exactly that entails.


@zygot,

You may be misunderstanding me.  I'm not recommending HLS.  I'm saying that in order to use it successfully you really need to understand the background of how logic gets mapped to hardware.  Once you have this background, however, you're typically a more advanced user who no longer needs or desires HLS.  Hence my argument above that the marketing message is off--it's not a simple answer for new users.  The more experienced users are the ones you need to make it work, and these are the very same individuals who (like yourself) don't need it and likely don't even want it.

The example I have comes from counseling a master's student who was building a 1023/1024 Lagrange interpolator.   (I think that was the rate he was using, it's been a while.)  He was still a beginning designer at the time.  His goal was to build a design in both HLS and RTL, and then to compare the two flows.  Along the way he became very frustrated more than once when slight and subtle changes he made to his design changed it from working nicely on the FPGA to no longer fitting on the FPGA.  He was unable to explain why this was so.  His conclusion was that a more experienced user could've understood this easily, but it would've required an intuition/insight into how the design was mapping onto the literal hardware to know.

I wish I could point you to his dissertation on the topic, but since it wasn't in English I never maintained the link to it.  It's a shame, too, since I would've loved to have it to reference.

Dan

 


49 minutes ago, D@n said:

You may be misunderstanding me.  I'm not recommending HLS.

Fair enough.

49 minutes ago, D@n said:

I'm saying that in order to use it successfully you really need to understand the background of how logic gets mapped to hardware.

Well, the key word here is "successfully". Not for a carefully tailored design example that Xilinx might provide, but as a normal flow for all potential projects. Understanding how logic gets mapped to hardware is required for all flows; it's just more obvious to users of regular Vivado tools. I appreciate your experience with one guy on one project but I think that you have to base an assessment of how well a flow works on actual personal experience. That's where you come face to face with the particulars of following it. HLS can't resolve timing closure or logic placement and optimization any better than the normal flow because it's just another layer of abstraction, similar to the 'board design' flow, between you and the final steps of implementation. At the end of the day you have to face the same problems. The problem with 'shortcut' flows is that you are working under the false impression that the mystery and hard work of FPGA development has been taken care of for you.

For HLS the situation is probably more grim for those adopting it, because it allows the user to have a somewhat skewed perspective on FPGA development that might entice one to tackle a problem with an approach that they wouldn't attempt if they were using a strictly HDL flow.

All of this reminds me of the days when Microsoft would wow the mainframe crowd with demos of using Visual Basic; a tool that was touted as letting you fire your software team and have your secretary build applications. We all know how well that worked out, and how secretarial positions have flourished in the intervening decades since...


Archived

This topic is now archived and is closed to further replies.
