
busbridge3: High-speed FTDI/FPGA interface


xc6lx45


Hi,

As I may not have time for FPGA work for a while (I just started in a fascinating new role related to high-speed digital diaper changing), I decided to post this now.

Here's the GitHub repo (MIT-licensed).

  • The project provides a very fast (compared to a UART) interface to a Xilinx FPGA via JTAG, through the ubiquitous FTDI chip. Most importantly, it achieves a 125 µs response time (roundtrip latency), which is e.g. 20 to 1000 times faster than a USB sound card.
  • It also reaches significantly higher throughput than a UART, since it is based on the MPSSE mode of the FTDI chip.
  • Finally, it comes with a built-in bitstream uploader, which may be useful on its own.
  • I implemented only the JTAG state transitions that I need, but in principle the code can easily be copied and pasted for custom JTAG interfacing.
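For anyone doing that copy-and-paste JTAG work, the core of it is steering the standard IEEE 1149.1 TAP state machine with TMS bits. The Python sketch below is not busbridge3 code. It just encodes the standard TAP transition table and searches for the shortest TMS sequence between two states. How the bits are then clocked out (e.g. through MPSSE commands) is not shown, and the function names are illustrative.

```python
from collections import deque

# IEEE 1149.1 TAP controller: (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def tms_path(start, goal):
    """Shortest TMS bit sequence that moves the TAP from start to goal (breadth-first search)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, bits = queue.popleft()
        if state == goal:
            return bits
        for tms in (0, 1):
            nxt = TAP_NEXT[state][tms]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, bits + [tms]))
    raise ValueError(f"no path from {start} to {goal}")

def run(start, bits):
    """Apply a TMS bit sequence and return the state the TAP ends up in."""
    state = start
    for tms in bits:
        state = TAP_NEXT[state][tms]
    return state
```

For example, starting from Run-Test/Idle, `tms_path` returns [1, 0, 0] to reach Shift-DR and [1, 1, 0, 0] to reach Shift-IR. Five TMS=1 clocks reach Test-Logic-Reset from any state.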

So what do you get:

  • On the software side an API (C#) that bundles transactions, e.g. scattered reads and writes, executes them in bulk and returns readback data
  • On the RTL side, a very basic 32-bit bus-style interface that outputs the write data and accepts readback data, which must be provided in time. See the caveats.
  • In general, a significant increase in complexity over a UART. The performance comes at a price. In other words, if a UART will do the job for you, DO NOT use this project.
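The bundling idea in the first bullet can be shown with a toy model. This is a hypothetical Python sketch, not the busbridge3 C# API. `Batch`, `write`, `read`, `execute` and the fake transport are made-up names, and a dict stands in for the FPGA. The point is only that all queued operations travel in a single call, so scattered accesses cost one round trip in total rather than one each.

```python
class Batch:
    """Hypothetical transaction queue: collect scattered reads and writes,
    execute them in one bulk call, then return the readback values in order."""

    def __init__(self):
        self.ops = []          # ("w", addr, value) or ("r", addr, None)
        self.n_reads = 0

    def write(self, addr, value):
        self.ops.append(("w", addr, value))

    def read(self, addr):
        """Queue a read; the return value is its index in execute()'s result list."""
        self.ops.append(("r", addr, None))
        self.n_reads += 1
        return self.n_reads - 1

    def execute(self, transport):
        """Hand all queued ops to `transport` in a single call (one USB round trip)."""
        readback = transport(self.ops)
        if len(readback) != self.n_reads:
            raise RuntimeError("readback count does not match queued reads")
        self.ops, self.n_reads = [], 0
        return readback


def make_fake_transport(memory):
    """Stand-in for the FPGA: applies the ops in order to a dict and records call sizes."""
    calls = []

    def transport(ops):
        calls.append(len(ops))
        out = []
        for kind, addr, value in ops:
            if kind == "w":
                memory[addr] = value & 0xFFFFFFFF   # 32-bit bus
            else:
                out.append(memory.get(addr, 0))
        return out

    return transport, calls
```

Queuing two writes and two reads and then calling `execute` once produces a single transport call containing four operations. Each read's value is found at the index that `read` returned.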

For more info, please see the repo's readme file.

For the CMOD A7-35, it should build right out of the box. For smaller FPGAs, comment out the block RAM and memory test routines, or reduce the memory size in top.v and Program.cs.

I hope this is useful.

When I talked to the FTDI people at Electronica last week, I did not get the impression that USB 3.0 will make the FT2232H obsolete for FPGA use any time soon. They have newer chips and modules, but these didn't seem nearly as convenient; e.g. the modules are large and require high-density connectors. In FPGA-land, I think USB 2.0 is going to stay...

Cheers

Markus


Thanks for the scoop before you move on to slogging poop... At first glance it looks nice and is definitely worth anyone's time to check out, with the aroma of solder wafting in the air. I have high expectations and intend, just for fun, to verify your latency claims, whatever they mean. It's very heartening to see worthwhile contributions that can motivate a desire to learn.

For performance I'll stick with Cypress USB solutions... even the old Atlys FX2 design keeps on giving...

THANKS, 


Thanks! If you manage to get it running, the latency figure should be printed to the console by this part of Program.cs:

sw2.Reset(); sw2.Start();
int nRep = 1000;
m.memTest32(memSize: 1,baseAddr: 0x87654321,nIter: nRep);
Console.WriteLine("roundtrip time "+((double)sw2.ElapsedMilliseconds/(double)nRep)+" ms (should be close to 0.125 ms from 8 kHz USB 2.0 microframe rate)");
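The "0.125 ms" figure is simply the microframe period: 1 s / 8000 = 125 µs. Below is a Python sketch of the same measurement pattern the C# snippet uses: time many blocking round trips, then divide by the count. `roundtrip` is a hypothetical callable standing in for one write-plus-readback transaction.

```python
import time

USB2_MICROFRAMES_PER_S = 8000                        # USB 2.0 high-speed microframe rate
MICROFRAME_MS = 1000.0 / USB2_MICROFRAMES_PER_S      # 0.125 ms

def mean_roundtrip_ms(roundtrip, n_rep=1000):
    """Average wall-clock time per call of `roundtrip`, in milliseconds.

    `roundtrip` is a hypothetical callable that performs one blocking
    transaction (write, then wait for the readback)."""
    start = time.perf_counter()
    for _ in range(n_rep):
        roundtrip()
    return (time.perf_counter() - start) * 1000.0 / n_rep
```

With real hardware, a result close to `MICROFRAME_MS` would suggest that each blocking transaction costs about one microframe.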

Throughput is another matter. I didn't profile it today, but if I recall correctly, it reaches 20+ Mbit/s out of the box. There is an FTDI whitepaper on optimal buffer settings, and there may be some room for tweaking at the FTDI driver level and in the Device Manager.

Starting from Visual Studio disables some of the JIT optimizations (I think). For profiling, run the .exe straight from Windows Explorer.

There was a missing line of code in bb3_lvl2_io.cs that only showed up with low-level JTAG read access, e.g. when using it as a plain JTAG interface to connect straight to BSCANE2, or something like that. It's fixed now.


@xc6lx45,

After a second, more thorough look, my reaction has changed to: really, really nice! Excellent work! For anyone wanting to go from a PC application to an FPGA application in one step, this is a fantastic tutorial.

[edited] And by the way, best wishes in your new endeavour. When you need a change of pace, stop by and drop off more of your worthwhile insights. We will all be poorer without them.


@xc6lx45

Hi,

I had no issues creating a bitstream in Vivado 2018.2 using your (imported) project file.

While trying to recreate your project, I ran into a snag with the C# code. I have VS2010 tools, so of course your project files and solutions are unusable. I did manage to use the SharpDevelop tool you suggested. The executable configured the board and then threw an unhandled exception (Arithmetic operation resulted in an overflow), but still terminated gracefully. I'm afraid my C# skills have become rusty.

I had to modify the path to top.bit since I was running from a different directory than you did (I had to create a new solution project); simple enough to resolve.

Let me make a few suggestions for such projects:

  • Identify exactly which tools (and versions) you used to create each component.
  • Since Microsoft tools are notorious for not playing nicely with other versions of themselves, you might want to anticipate that most users will have to do a workaround; not a complaint, just a thought. Assume that your audience might have a different development trajectory than you had when developing the project, especially for PC software.

I got too close to quit, but so far I haven't managed to verify that your project can be recreated. I know that you are interested in any feedback, as I am for any projects that I've posted.

regards


Thanks. Please check GitHub again; I added a fourth folder, "sharpDevelop_build" (SharpDevelop 5.1 from SourceForge).
This should run out of the box if a CMOD A7-35 is found and no other Digilent USB interface is present.

If more debugging is needed, my exception handler might get in the way. I'd comment out the try/catch. Alternatively, enable "pause on exception", as shown in the screenshot here: http://community.sharpdevelop.net/blogs/siegfried_pammer/archive/2014/08/27/customizing-quot-pause-on-handled-exceptions-quot-in-sharpdevelop-5.aspx

When porting to SharpDevelop, I ran into an issue with type casting and bit shifting: running the older code in SharpDevelop causes an overflow error. I don't know why; probably different interpretations (or versions?) of the standard. Whatever, it is fixed now by stating the problematic cast explicitly.

To get rid of the relative path to top.bit (it's ugly), change this line in Program.cs
bitstreamFile = @"..\..\..\busBridge3_RTL\busBridge3_RTL.runs\impl_1\top.bit"; Console.WriteLine("DEBUG: Trying to open bitstream from "+bitstreamFile);
e.g. to
bitstreamFile = "top.bit";
and copy top.bit into the folder where the .exe file is generated.


2 hours ago, xc6lx45 said:

I'd comment out the try/catch.

I created an executable using your sharpDevelop_build directory but got exactly the same results as with the project that I created. Commenting out the try/catch lines just produced an exception in memIf without the graceful termination. I'll try to debug it, but it may take a while. I'm running Win7 (far, far away from the internet).


Sorry, but it works on both my Windows 8 and Windows 7 machines (both 64-bit, though).

Does one of the changed lines from this commit look familiar? https://github.com/mnentwig/busbridge3/commit/887d1d3984dbf69ab5a7b72398d90ddde98ab0a9

If so, could you please try a clean checkout? Otherwise, the error message would be helpful (memTest is almost "toplevel", everything happens in there).

Link to comment
Share on other sites

The committed lines seem to be there. If I uncomment the memTest32 calls and comment out the try/catch I get the following:

System.OverflowException: Arithmetic operation resulted in an overflow.
   at busbridge3.memIf_cl.getUInt32(Int32 offset) in c:\Projects\busbridge3-master\busBridge3\bb3_lvl4_memIf.cs:line 362
   at busbridge3.memIf_cl.getUInt32(Int32 offset, Int32 num) in c:\Projects\busbridge3-master\busBridge3\bb3_lvl4_memIf.cs:line 373
   at busbridge3.memIf_cl.memTest32(Int32 memSize, UInt32 baseAddr, Int32 nIter) in c:\Projects\busbridge3-master\busBridge3\bb3_lvl4_memIf.cs:line 505
   at Program.Main2(String[] args) in c:\Projects\busbridge3-master\busmasterSw\Program.cs:line 162
   at Program.Main(String[] args) in c:\Projects\busbridge3-master\busmasterSw\Program.cs:line 8

I'm running Win7 64-bit as well.


10 hours ago, zygot said:

   at busbridge3.memIf_cl.getUInt32(Int32 offset) in c:\Projects\busbridge3-master\busBridge3\bb3_lvl4_memIf.cs:line 362

OK that is exactly where I made the change for sharpDevelop.

Could you please double-check that the line reads
retVal |= (UInt32)this.jtag.io.readData[offset++] << 24;
In the old version, it was (note the parentheses!)
retVal |= (UInt32)(this.jtag.io.readData[offset++] << 24);

The old one (with the parentheses around the shift) works in VS but not in SharpDevelop.


The "problem" with SharpDevelop was one of default settings: it had arithmetic overflow checking enabled, whereas my Visual Studio project disables it in the "Advanced" build options.
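Here is why the parenthesized form throws. In C#, shifting a `byte` first promotes it to `int`, so `readData[...] << 24` becomes a negative `Int32` whenever the byte is 0x80 or larger. Shifts are never overflow-checked, but an explicit `(UInt32)` conversion of a negative `int` is checked, and with overflow checking enabled that conversion raises `OverflowException`. Casting the byte to `UInt32` first makes the shift operate on an unsigned value, so nothing can go negative. The Python model below imitates both expressions. Python has no fixed-width integers, so the Int32 wraparound and the checked cast are emulated by hand, and the function names are illustrative.

```python
INT32_MAX = 2**31 - 1
UINT32_MASK = 0xFFFFFFFF

def to_int32(x):
    """Reinterpret the low 32 bits of x as a two's-complement Int32 (C# int)."""
    x &= UINT32_MASK
    return x - 2**32 if x > INT32_MAX else x

def checked_to_uint32(x):
    """C# (UInt32) conversion in a checked context: out-of-range values throw."""
    if not 0 <= x <= UINT32_MASK:
        raise OverflowError("Arithmetic operation resulted in an overflow.")
    return x

def old_expr(b):
    """(UInt32)(b << 24): byte promoted to int, shifted (never checked), then cast (checked)."""
    return checked_to_uint32(to_int32(b << 24))

def old_expr_unchecked(b):
    """Same expression with overflow checking off: the cast just reinterprets the bits."""
    return to_int32(b << 24) & UINT32_MASK

def new_expr(b):
    """(UInt32)b << 24: byte widened to uint first, then a 32-bit unsigned shift."""
    return (checked_to_uint32(b) << 24) & UINT32_MASK
```

`old_expr(0x7F)` gives 0x7F000000, but `old_expr(0x80)` raises. With checking off, the same bits come out as 0x80000000, which is why the old code appeared to work in Visual Studio.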

================================================================================

I added a second, simplified example on GitHub. It runs independently in the same design: the two examples use the USER1 and USER2 JTAG opcodes.

The new variant is "almost" like a UART, except that the data stream is continuous (e.g. the application FPGA code may be designed to send zero bytes when there is no return data).


8 hours ago, xc6lx45 said:

retVal |= (UInt32)this.jtag.io.readData[offset++] << 24;

Yeah, my eyesight isn't good enough to spot the parens on my own... The source for my last build (what I took to be the latest source from Git) was not the updated version. The fixed lines stopped throwing exceptions after I restored the rest of the code to your version and ran everything.

I downloaded your latest version this morning and installed it into a fresh directory, then built the bitstream and the C# application. I don't get any errors but am not sure what I am looking at. Your USER2_demo runs twice but doesn't provide any notification of success... so I guess that it passes. The application repeatedly prints out the lines:

  • configured test register delay...
  • round trip time...
  • margin 1:
  • margin 2:

The reported round-trip times are around 0.260 ms if the CMOD is plugged into a hub and around 0.06 ms if I plug it directly into a PC USB port. The one LED does blink at a 1 s interval.

I never do get to the "All tests pass..." declaration that I'm expecting from the program.cs source.

For what it's worth, last night, while I was playing around with the SharpDevelop debugger, it appeared that the second call to getUInt32 at line 505 in bb3_lvl4_memIf.cs was where the exceptions were happening.

I'm going to try and see if I can make the CMOD Verilog do something a bit more interesting than blink the LED...

Sorry for the troubles...

 


>> I never do get to the "All tests pass..." declaration that I'm expecting from the program.cs source.

True. It's edited to run forever (the loop counts up to -1, so it never terminates). Once you see repetitions in the console, it has passed all tests once.

I sometimes leave it running overnight but haven't yet witnessed any random bit errors.

With the USER2 "demo" it's just the same: it checks the correctness of the received data, and as long as there are no exceptions, it passes.
This could be used as a full-duplex throughput test, since there is no protocol overhead.

 


  • 2 months later...

I'm having a hard time comprehending how this project has gotten only 200 or so views when my demo project has 10 times that. I suspect that views may not be a good metric of interest. No one's talking, but I surmise that people (students) are getting some use out of the terminal-based UART user interface that I provide. I certainly do. Who knows? Will anyone take the time to provide feedback? Is there a multiverse?


Laughing out loud... Formula 1 performance is a niche business; combine harvesters bring home the money, and walking barefoot is the norm.

And why not: I'm even discouraging people from touching it as long as a UART does the job. As with fast cars, speed is largely overrated. Those who know otherwise: you know who you are.


18 hours ago, tcmichals said:

You can use JTAG and Fast Serial Mode

What this project offers is about the best throughput for the FTDI USB 2.0 devices, short of synchronous 245 FIFO mode. Digilent boards can't use this mode unless you modify the EEPROM settings and use the FTDI API. Digilent strongly discourages this, because a mistake can brick the device, and hence your board. I successfully did this for my Nexys Video but rarely take advantage of the effort that was involved; perhaps in one or two projects. (I count the learning experience as a positive project objective.) Still, we're talking about 20-30 MB/s instead of 8 MB/s sustained average throughput, which is rarely worth the effort. I still get a lot of mileage out of my Atlys and Genesys (Virtex-5) boards because they use the significantly more usable Cypress device, and I can get 35+ MB/s for large data transfers. More importantly, the Cypress device has a number of modes, and you can write HDL to support whichever one best suits a particular requirement. For USB, you should understand that throughput is not the only metric for a successful FPGA/PC application, and for some applications it is not the most important one. Ahhh, sometimes change isn't all that wonderful...


  • 3 months later...

Hi,

a quick update: I released a new version 1.1 that supports variable address width in the protocol.

Functionality is unchanged, but performance improves for scattered writes and reads in the lower address range: an address in the 0x000000xx range saves 3 bytes per transaction, 0x0000xxxx saves 2 bytes and 0x00xxxxxx saves 1 byte.
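Each saving is 4 minus the number of bytes the address actually needs. The post doesn't describe how version 1.1 encodes the address width on the wire, so this small Python sketch only reproduces the byte counts quoted above. The function names are illustrative.

```python
def addr_bytes(addr):
    """Smallest number of bytes (1..4) that can hold a 32-bit address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    n = 1
    while addr >> (8 * n):
        n += 1
    return n

def bytes_saved(addr):
    """Bytes saved per transaction versus always sending a full 4-byte address."""
    return 4 - addr_bytes(addr)
```

For example, `bytes_saved(0x12)` is 3, `bytes_saved(0x1234)` is 2, `bytes_saved(0x123456)` is 1 and `bytes_saved(0x87654321)` is 0.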

There was also a missing -datapath_only in the constraints, which made the timing report hard to read (the intention behind the set_max_delay constraint was simply "tool, don't make this path between clock domains any slower than x ns end-to-end").


  • 10 months later...

@xc6lx45

Your project is something I have been looking for for a long time! In the past I did some projects with a CMOD board and a UART, but that is a little bit too slow for my task. So I would like to include it in my block design (I am a real beginner FPGA programmer) and give it a try with my Arty board.

Actually, I don't really understand how to connect the data I want to send (for example a sine wave from a DDS generator) to the FTDI chip. Would I be at a disadvantage if I included it in a block diagram instead of using your code directly?

Thank you for any help :)

module top(
input wire clk, // originally the FPGA's own 65 MHz (-ish) ring oscillator via STARTUPE2 (commented out below); here an input port. Please use a proper crystal-based clock in any "serious" design
output wire [31:0] busAddr, // driven by busBridge3 (o_busAddr), so it must be an output
output wire [31:0] busData, // driven by busBridge3 (o_busData), so it must be an output
output wire LED, // we hijack the PROG_DONE LED, which is a fairly common board feature
output wire [7:0] dataTx,
output wire [7:0] dataRx
);
   // === STARTUPE2 block for board-independent clock and LED ===
   //STARTUPE2 iStartupE2(.USRCCLKO(1'b0), .USRCCLKTS(1'b0), .USRDONEO(LED), .USRDONETS(1'b0), .CFGMCLK(clk));
   
   // === fully independent byte-level (protocol-free) demo on USER2 port ===
   //USER2demo iUser2demo(.clk(clk));
   
   // === JTAG serial to byte-parallel (JTAG clock domain) ===
   
   wire 	    rxStrobeJtagClk;
   wire 	    txStrobeJtagClk;
   wire 	    syncStrobeJtagClk;
   wire 	    evtToggleJtagClk;
   //jtagByteIf if1(.i_dataTx(dataTx), .o_dataRx(dataRx), .o_tx(txStrobeJtagClk), .o_rx(rxStrobeJtagClk), .o_sync(syncStrobeJtagClk), .o_toggle(evtToggleJtagClk));
   
   // === CDX to application clock domain ===
   wire 	    evt;
   toggleDet iTd1(.i_clk(clk), .i_toggle(evtToggleJtagClk), .o_strobe(evt)); // evt changes late
   wire 	    rxStrobe 		= evt & rxStrobeJtagClk;
   wire 	    txStrobe 		= evt & txStrobeJtagClk;
   wire 	    syncStrobe	= evt & syncStrobeJtagClk;

   // === byte stream to bus interface ===

   wire 	    busWe;
   wire 	    busRe;
   reg [31:0] 	    busData_S2M; // readback data (slave-to-master)
   wire 	    busAck_S2M; // readback data valid (slave-to-master)
   busBridge3 if2
     (.i_CLK(clk),
      .i_dataRx(dataRx),
      .i_strobeRx(rxStrobe),
      .o_dataTx(dataTx),
      .i_strobeTx(txStrobe),
      .i_strobeSync(syncStrobe),
      .o_busAddr(busAddr),
      .o_busData(busData),
      .o_busWe(busWe),
      .o_busRe(busRe),
      .i_busData(busData_S2M),
      .i_busAck(busAck_S2M));
   ....  
     ....
     ....

 


Hi,

The first thing I'd try is to use FTDI's driver DLL for serial port access (assuming Windows) instead of the regular serial port functionality.

I think I managed to get a baud rate closer to 6 Mbit/s out of it, instead of the regular 900 kbit/s. Maybe that is already enough.
If you use busbridge3, be aware that most of the complexity is in the software, not on the FPGA.

Using block diagrams, I don't see any easy shortcuts. If you like, you can package the whole thing and make [busAddr, busData, busWe] the output interface for one-way traffic. (Note that return data is more complex, as it needs to be provided and acknowledged in time: JTAG traffic does not wait for the design. You may need to either guarantee data availability or implement your own protocol, similar to the FIFO concept below.)

As a starting point, I'd suggest the example implementation that blinks an LED. Replace the LED register with your own data sink and delete everything unused from the top-level C# file.

Hint: if you want to stream 16-bit (24-bit) data, use an address increment of 0 and a word width of 2 (3) bytes. For repeated writes, there will be near-zero overhead, so you should get fairly close to the 30 Mbit/s limit.
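To express the 30 Mbit/s figure in samples: with near-zero per-word overhead, the upper bound is the link rate divided by the number of bits per word. Here is a quick Python sketch of that arithmetic. The overhead parameter is an illustrative assumption, and real throughput will be somewhat lower.

```python
LINK_BIT_PER_S = 30e6   # the ~30 Mbit/s limit mentioned above

def max_sample_rate(word_bytes, overhead_bytes_per_word=0.0, link_bit_per_s=LINK_BIT_PER_S):
    """Upper bound on samples per second when each sample is one bus word.

    overhead_bytes_per_word: assumed protocol cost per word (near zero for
    repeated writes with address increment 0, per the hint above)."""
    return link_bit_per_s / (8 * (word_bytes + overhead_bytes_per_word))
```

This gives about 1.875 Msamples/s for 16-bit words and 1.25 Msamples/s for 24-bit words. That is the ceiling a DDS sine wave stream would have to fit under.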

However, be aware that the required flow control for real-time streaming (that is, driven by the FPGA's clock) is not trivial.

For example, you can build around a FIFO structure that discards excess input values, plus a counter that reports how many inbound values have actually been accepted into the FIFO.
Put the counter on a read-only register that resets the counter on a read event.
Send each data block sized to the maximum capacity of the FIFO, and then, at the end of the same USB transaction, query the counter. On the PC side, use the return value to calculate how many data points were discarded, and re-send them in the next packet.
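The FIFO/counter scheme can be tried out in a toy Python model before writing any HDL. Everything here is hypothetical: the class and function names are made up, and the FPGA's consumption is modeled as a fixed number of samples per transaction. One detail is an added assumption not stated above: once the FIFO has discarded a value, it keeps discarding until the counter is read. This keeps the accepted values a contiguous prefix of each block, even if the real FIFO drains while a block arrives, so "re-send the discarded ones" simply means "re-send everything after the first `accepted` values".

```python
from collections import deque
from itertools import islice

class FpgaFifoModel:
    """Toy model of the FPGA side: a bounded FIFO that discards excess input,
    plus an 'accepted' counter on a read-only, clear-on-read register."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()
        self.accepted = 0
        self.dropping = False   # assumption: after one discard, discard until the counter is read

    def write(self, value):
        if self.dropping or len(self.fifo) >= self.capacity:
            self.dropping = True          # value is discarded
            return
        self.fifo.append(value)
        self.accepted += 1

    def read_accepted_counter(self):
        n = self.accepted
        self.accepted = 0                 # the read event resets the counter
        self.dropping = False
        return n

    def drain(self, n):
        """The FPGA's own clock consumes up to n samples between transactions."""
        return [self.fifo.popleft() for _ in range(min(n, len(self.fifo)))]


def stream(samples, fpga, drain_per_transaction):
    """PC side: send a FIFO-sized block, query the counter at the end of the
    same (modeled) transaction, and keep the discarded values for the next one."""
    if drain_per_transaction < 1:
        raise ValueError("the FPGA must consume at least one sample per transaction")
    pending = deque(samples)
    received = []
    while pending or fpga.fifo:
        for value in islice(pending, fpga.capacity):   # one block, at most FIFO capacity
            fpga.write(value)
        accepted = fpga.read_accepted_counter()
        for _ in range(accepted):
            pending.popleft()                          # accepted values are done
        received.extend(fpga.drain(drain_per_transaction))
    return received
```

If the bookkeeping is right, the FPGA side receives every sample exactly once and in order, even though most blocks are only partially accepted.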

 


1 hour ago, Weevil said:

So I would like to include it in my block design (I am a real beginner FPGA programmer)

I don't feel that it's appropriate for me to answer the parts of your question that the author of this fine project will address. I do feel that replying to the snippet of your question shown above is quite appropriate.

This is indeed a terrific gift from @xc6lx45. It is not meant to be a project that, in and of itself, solves a problem. That's fine. I realize that it's easier and quicker to be handed something that comes in a format you feel comfortable with, but please take the long-term view. You will be much better off understanding Verilog and how this code functions than having a quick fix for whatever particular problem you need to solve. Don't fear a challenge; embrace and engage with it. In this case, see the project for what it is: a wonderful tutorial to be read and studied, offering lessons that you can make your own.


Archived

This topic is now archived and is closed to further replies.
