
The Complete HLS Procedure


Newport_j

Question

I am confused about something. I have worked through the Xilinx High-Level Synthesis tutorial (UG871, Dec. 2017 version), except for the last two chapters.

But something is not clear.

Let me give an example:

I have some very large C programs whose execution I would like to speed up. They are well structured: each program is made up of many subprograms that are, of course, much smaller.

The programs are written in C, and they compile and run as they should. However, when I profile them, it is clear that most of the execution time is spent in only a few subprograms, which together account for about 98% of the run time. So out of, say, 80 subprograms in a program, only about eight are candidates for high-level synthesis.

In other words, the majority of the code is left untouched and is certainly not synthesized to Verilog or VHDL.

I should say that I have been programming GPUs for a long time, and there too it is almost always the case that only parts of a big program are worth modifying. Most of the code in a GPU-accelerated program is left untouched; it is just C code.

I am seeing the same thing now that I have started high-level synthesis programming. However, there seems to be a disconnect here.

In GPU programming we always worry about the bandwidth of the CPU-GPU bus: how much data can pass between the CPU and the GPU, and how fast.

I see nothing like this in high-level synthesis. I am not even sure how the FPGA is interfaced with the computer. I am guessing it is through a USB cable or a PCI Express connection, but I am really not sure.

That is the reason for this post.

The Xilinx HLS tutorial never discusses this aspect of the process: interfacing with the main program. I assume the information exists; I just have not seen it.

For instance, how does one integrate a synthesized C subprogram into the rest of the program? The tutorial seems to be silent on this matter.

I said above that I have not worked through the last two chapters of the HLS tutorial in UG871.

I think these last two chapters may cover what I am looking for. I am interested in your thoughts, since I really want to speed up a large, complex program, if for no other reason than to justify this expense to my sponsors.

I have done a lot of high-level synthesis this year, but I am looking for practical answers to these questions.

Sorry about the long-winded post, but I just want to use HLS to speed up programs that already run; they just do not run fast enough.

Any help appreciated. Thanks in advance.

 

Respectfully,

Newport_j

2 answers to this question


Generally you have C code you want to optimize, of which only a part can be optimized, so you are right so far. This is due to various limitations in the way HLS transforms your C code into HDL. Basically, once you start writing code for HLS, you need to understand how that code will be synthesised in order to obtain the best performance. You can find more details about the directives in UG902.

Unfortunately I have not worked with CPU-GPU acceleration, so I don't know exactly how it works. I assume you have the GPU and the CPU in the same PC/laptop. If that is the case, you can offload some functions to the GPU without knowing the details of the interface between the CPU and the GPU: you just give it the task, and when it's finished, it's finished. Now, if the GPU has to do a lot of small tasks, there will be a lot of back and forth between the GPU and the CPU for reading and writing data. In that case the bandwidth between the processors becomes an issue. At least this is how I understand the issue; please correct me if I'm wrong or the info is incomplete.

When it comes to FPGAs, most of the time you don't have an FPGA inside a PC/laptop (I have not heard of one so far, but there might be one somewhere). So you have to choose how to interface with your FPGA (USB, PCI, Ethernet, etc.) based on the speed and bandwidth you need. Since you are on the Digilent forum, I assume you have one of our boards or are considering buying one. You either have to make do with the high-speed interface available on your dev board, or buy a board that suits your needs. The point is that the authors of UG871 cannot know what HLS will be used for or how. So they do not focus on the bandwidth between the FPGA and whatever it connects to; they assume you will choose an interface that can handle the desired data at the desired speed.

There is also the case where you implement a soft-core processor in the FPGA and do not want to send data to a processor outside the FPGA. In that case you will have to focus on the throughput of the soft-core CPU and the HLS core. Or you could use a Zynq, which has an ARM processor next to the FPGA fabric, but I digress...

Now, from the HLS perspective, the tools do take into account the throughput of the data flowing into the HLS core. They warn you about the limitations of the core through the initiation interval (II), which you will find in the synthesis report. The II is the number of clock cycles between successive inputs the core can accept, so the core's maximum throughput is the clock frequency divided by the II. This can be optimised using the HLS directives. If you need high-speed transmission of bulk data, you will need to give your HLS core an AXI-Stream interface. For example, if you want to accelerate your C code from a PC over PCI, it would look something like this:

CPU <=> PCI (PC side) <====== link ======> PCI (FPGA side) <=> PCI-to-AXI-Stream core <=> HLS core

Hope this has clarified some of your questions...

Ciprian 
