Everything posted by D@n

  1. @asmi I'm finding about one complaint in Xilinx's forums roughly every week to two weeks. The complaint is typically from a user whose design is locking up for what appear to be completely inexplicable reasons. Digging further typically reveals that they are using one of Xilinx's demo designs--either the AXI-lite or the AXI (full) one. Whether or not you run into one of these problems really depends on how you configure the interconnect. If you configure it for "area optimization", you aren't likely to see the bug. Another key criterion has to do with how the bus is driven. One user experienced lock-ups when issuing an unaligned access from a MicroBlaze CPU. (Help! The FPGA engineer left the company, and I'm just the S/W guy, but ...) Others have run into problems when trying to interact with one of these designs using a DMA of some sort. Another recent issue involved connecting a 64-bit AXI bus to a 32-bit Xilinx-generated demo peripheral. A key to triggering either bug would therefore seem to be two accesses in close succession--much as I outlined in my write-ups. As you can see, whether or not the bug gets triggered is highly dependent upon the use case. Worse, when attempting to apply hypothesis testing to find where to look for the bugs (if I change A, the design fails, therefore the bug is in A somewhere), you'll often get sent looking for the bug in the wrong part of the design. Dan
  2. @zygot, Lol. You won't find me putting money down for such a bet either. @zygot, At the risk of taking this thread far off topic, let me ask: what sort of demo would you like to see? I have an AXI performance measurement tool that needs to be tested out somewhere. It's a solution looking for a problem at this point--much like the demo you would like to see. So, again, what sort of demo would you like to see? What particular items are you interested in seeing? Things that would actually be useful to demonstrate? I make no promises about implementing such a demo in the near future (my contract time is currently overbooked), but I'd be glad to have the suggestion for when I get a spare moment later. Dan
  3. @lowena, The "official" answer to how to move data from the PL to the PS is that you should build an AXI bus master in your PL design that can write directly to PS memory. A program running in the PS can then check that memory for data written to it, and act accordingly. You'll need to beware of the data cache (turn it off--lest you read out-of-date information from within software), and the MMU (lest you write to the wrong memory address). Once you've dealt with those, writing an AXI master becomes quite doable. Xilinx will also try to push you towards using a DMA to move data from PL to PS. Using a DMA is not a bad idea. Just beware of the bugs in their S2MM (stream to memory-mapped) DMA implementation--lots of individuals have gotten hung up on those. (Xilinx's official answer to their S2MM bugs is that they are misunderstood features--but that's a whole other discussion.) There are also several ugly/nasty bugs in their example AXI slave designs--so much so that I'd recommend not using them. Better alternatives exist. Dan
  4. @hamster, Impressive! Would you mind sharing the hardware you chose to use? The clock speed of the delta-sigma converter? It also looks like you biased your sine wave negative by half a sample. Out of curiosity, was this on purpose or just the result of the way you truncated floats to integers when generating your table? I'm a bit surprised by the second harmonic you show in your pictures. I had rather thought this technique would've done better than that. Do you have any suggestions as to what might have caused that harmonic? Either way, good fun, and thanks for sharing! Dan
  5. @zygot, So ... I got to thinking: a virtual FIFO sounds like a really easy core to design--especially with an application like this one in mind. Unlike many AXI cores, a virtual FIFO should be able to hold its AXI burst length constant. It should also be able to maintain alignment with the memory it's working with--unlike other cores, which have to worry about crossing that 4kB boundary when handling transfers with sizes determined at runtime. Even better, there's no end--so you don't have to check for transferring too much. That just really simplifies the AXI master by quite a bit. So ... I got distracted. Here's what I came up with. You can see a generalized trace below--it's what you'd get from dropping the burst size from 256 down to 8, but at least it makes a decent picture. Xilinx declares that their design depends upon their S2MM and MM2S cores. That seemed a bit heavy-weight to me. Those cores require an interface, a programmable data length, a length that might end up different from the programmed length, a lot of TLAST processing, and more. If you just want a FIFO, you can dump all of that junk and make things simple. Thank you for pointing out the utility of something like this. It was a fun diversion to design. Dan
  6. @zygot, Any special reason you aren't using a binary file? It's a whole lot easier to scroll through samples in a binary file than it is through a text file ... fseek works nicely in that case. Dan
  7. @zygot, If you just want to move from an AXI interface to a simpler interface, you can convert it to either AXI-lite or WB without too much hassle. I just might have some bridges to handle that conversion lying around--bridges that will keep the entire bus running at 100% capacity. That would handle your criteria of "a simple bus with address, data, and a handful of simple gating controls". It's unfortunate that AXI-lite is a second class citizen in Xilinx-land. The AXI-lite protocol is quite capable, but many of the AXI peripherals that use it are not (cough, like the AXI BRAM controller that'd drop AXI-lite throughput to 25%). Thankfully, the MIG core doesn't seem to mind one way or another. One of my own criteria when building my AXI data movers was that they should be able to handle 100% throughput even across burst boundaries. Judging from Xilinx's spec, Xilinx's cores don't do this, and so there is a throughput difference between the two implementations. A second difference is that I never limited the transfer to 256kB ... Of course, I don't have a virtual FIFO to offer. Never thought of building one. If I did have to hack something together in an afternoon, it'd be a WB FIFO that then used a WB to AXI conversion (while maintaining 100% throughput ...) Indeed, I did manage to build a fully verified WB stream to memory converter in a single morning, whereas the AXI equivalent took several days to get right. Yes, there's a cost for all this added complexity. I think I might disagree with you about CPU design being one potential or even necessary user of such a complex bus structure. It doesn't need to be so. Indeed, IMHO AXI is waaayyy over designed--but that's another story. That said, I've been burned with cache coherency issues, so I can see a purpose for a protocol that would help the CPU maintain cache coherency. It's just that ... AXI isn't that. Dan
  8. @zygot, This sounds like a fun and perhaps even a nicely well-paid task. Nice. Help me understand an overview here ... was that 4 ADCs, each sampling at 100 Msps? How many bits per ADC? (16 bits?) How wide is the SDRAM you are working with? Stored into DDR3 SDRAM via a virtual FIFO, right? And then you came off the board onto USB3, did you say? Can you give me any indication of how close you came to the throughput limits of either the SDRAM memory or the USB3 offboard transport you used? Just trying to understand how much of a challenge it was to achieve your objectives here. Were you using Xilinx's AXI crossbar, or was the virtual FIFO the only component that accessed memory? Looking forward to hearing more of this fun project, Dan
  9. @RCB, Why are you converting things to sign magnitude form, vs just leaving them in twos complement again? I'm not certain what's going on. Were this my own project, I'd use an FFT I'd be able to "see" inside of so that I might debug the problem. Specifically, I'd look for overflow problems within the FFT--at least that's the only thing I can think of that might cause the bug you are referencing above. It doesn't make sense, though, that you'd have overflow with one FFT and not with an identical FFT that only differed in output ordering. You might wish to compare the .xml files of the two FFTs to see if they are truly the same as you believe. You might also wish to try dropping the amplitude by a factor of 4x or perhaps even 64x to see if that makes a difference. It might be that you have the scaling schedule messed up and that things are overflowing within. It might also be that you aren't looking at all of the output bits with your bit-cut selection above--I can't tell by just looking at it from here. Dan P.S. I don't work for Digilent, and do not get paid for answering forum posts.
  10. @RCB, Did you notice the glitch in your source signal in the second plot? It's in both data[] and frame_data. You'll want to chase down where that glitch is coming from. After looking at that source signal, I noticed that the incoming frequency of your first image didn't match the 1MHz frequency you described. At 1MHz, you should have one wavelength inside of 1us. In your first plot, it appears that one wavelength fits in 20us, for a frequency closer to 50kHz? Further, I don't get your comment about holding config_tvalid = 1. If you have created an FFT that isn't configurable ... then why are you configuring it? It's been a while since I've read the book on the configuration--did you hard code the scaling schedule into the FFT, or are you configuring that in real time? I can't tell from what you are showing. You also weren't clear about what config_tdata is. Was that the all zeros value you were sending? Finally, the difference you are seeing between natural order and bit-reversed order is not explained by the simple difference between the two orderings. There's something else going on in your design. Dan
  11. @Luke Abela, I recently had the opportunity to write a data processing application that used an FPGA as an "accelerator". Sadly, it probably slowed down processing, but the infrastructure is something you are more than welcome to examine and work with if you would like. Data was sent to the FPGA using UDP packets over ethernet, read on the FPGA, assembled into larger packets for an FFT engine, processed, and then returned. Dan
  12. @Davie, No, I don't think that will work. You can read many of my thoughts above. How about this, though: Why not build it, and try it, and then share with us the things you learned in the process? I'd be willing to look over anything you post, and see if I can offer any insights into things you get confused with along the way. Dan
  13. @Ahmed Alfadhel To understand what's going on, check out table 8 of the datasheet on page 15. Basically, the DAC produces outputs between zero and its maximum: an input code of zero maps to the minimum output, and a code of all ones maps to the maximum. In other words, you should be plotting your data as unsigned. To convert from your current twos complement representation to an unsigned representation where zero (idle) sits in the middle of the range, rather than at the far end, just toggle the MSB. Dan
  14. @FR, No, that's not quite right. Your data rate is not 100MHz, it is 61MHz. Therefore your bin separation should be 61MHz / 65536. Dan
  15. @FR, Since you haven't provided me with enough information to really answer what's going on, here are some guesses: You mentioned that your FFT and FIFO are both running at 100MHz. May I assume that this is your system clock rate? Looking at your image above, it appears as though you have a much lower data rate than 100MHz. Can you tell me what your data rate is? I notice that you are using a FIFO. Can you explain the purpose of this FIFO within your design? If the data rate going into the FFT is at 100MHz, then the FIFO really only makes sense if you have bursty data at a rate faster than 100MHz. I have strong reason to believe that your tlast isn't quite right. Can you verify that if TVALID && TLAST && !TREADY, then TLAST will remain true on the next clock? Indeed, is your TLAST generation done at the rate of your incoming data? Or is your counter independent of incoming data samples? I understand you double checked your FIFO with MATLAB. You can read about my experiences with double checking my FIFO here, and the problems that remained unchecked. These are just some ideas I had. They are by no means definitive. It is difficult to be definitive without more information about your design. Dan
  16. @skandigraun, I'm not a physicist, so others might correct me here, but as I understand things audio waves are compression waves. To "read" them, you need to create a diaphragm that will move as the compression wave moves, and then you can read the position of this diaphragm over time. The PMic does this with a MEMS microphone. Consider this to be the meaning of those twelve bits. Be careful with that twelfth bit: it is a sign bit. You may need to extend it to the left some to understand it properly. For example, { int v; v = (sample<<20)>>20; } (assuming a 32-bit int and an arithmetic right shift). It is possible to get volume by simply averaging the absolute values of the various samples. While crude, the estimate should work. Getting frequency is harder. Doing that requires a Fourier transform. However, sound is very often composed of many frequencies, as the attached picture shows. In that picture, time goes from left to right, frequency from bottom to top, and energy comes out of the page. It's taken from the opening of the Cathedral's recording of "Echoes from the Burning Bush." The clip starts with laughter, but otherwise has speech within it. I would particularly draw your attention to how speech has a fundamental frequency associated with it, followed by many harmonics of that same frequency--as shown in the picture. The result is that it can be difficult to say which frequency is in use, as many are present at the same time. One of the books I have on my shelf is Cohen's "Time Frequency Analysis." In it, Leon Cohen goes through and compares many algorithms for frequency evaluation. 
At one time I had a paper written that proved that the Short Time Fourier Transform, among his list but widely criticized, was the *only* frequency estimation method that preserved certain key properties of spectral energy estimation: 1) all energy values should be non-negative, 2) all frequency shifts should produce frequency shifts in the estimate, 3) time shifts should produce time shifts in the estimate, and 4) the estimate should achieve the "best" time-frequency resolution as measured by the uncertainty function. Perhaps I'll find a venue for publishing it in the future. For now, you might wish to study the discrete time Short Time Fourier Transform, which is appropriate for the data coming out of the PMic. At one time, I tried to build a digital tuner from sampled data. Such a tuner requires exactly what you are asking for: knowing the frequency of the incoming data. Further, it requires the assumption that there is only one incoming frequency, even when multiple are present (as the diagram shows). To get there, I evaluated the autocorrelation signal that I got by taking the Inverse Fourier Transform of the magnitude squared of the output of a Fourier transform, and looking for the biggest peak. This operation, taking place in time, usually but not always found the fundamental frequency I was looking for. One more thought: you can find forward and inverse Fourier transform code, in Verilog, here, just in case you need it. Hope that helps, Dan
  17. @Yannick, FPGAs can't represent fractions. Looking at your pictures above, you have 8-bit values coming out of your DDS. Hence, the range of these values should be (at most) between -128 and 127. According to the "unit circle" description above, 8-bit numbers are clipped to being between -64 and +64. (There really isn't any +/- 0.5 within an FPGA, but one might think of these values as representing +/- 0.5, since they are nearly half of their full range.) Multiplying two such values together should give you something in the range of -4096 and 4096 (you might think of this as -0.25 to 0.25). Although this could fit into 14 bits, you've got it in 16. Not a problem, just unused capacity. Moving on ... If your coefficients are 16-bits, then they should have values between +/- 32767 (ignoring -32768 for now). Multiplying your 16-bit value with a 16-bit coefficient nominally gives you a 32-bit value. (You are only using 14 bits, so you could spare a bit or two here if necessary ...) If you have 16 such coefficients, log_2(16)=4, so adding the results of these multiplies together might give you an additional 4 bits, bringing you to 36 bits. If you instead had 256 such coefficients, log_2(256)=8, so adding the result of the multiplies together would give you an additional 8 bits instead, bringing you to 40 bits. At this point, you are getting some really HUGE numbers. You and I both know that your signal isn't that big. How do you get back to what your signal was? To do that, you have to track the bit math and the multiplies. If you decide that you started with +/- 0.5 numbers, scaled by 2^7, then your next step left you with +/- 0.25 numbers scaled by 2^14 ... and so on. The reality, though, is you really don't have that many bits. You really only have about 8-bits of information, packed tightly into 40 bits. (Or ... not so tightly ) At this point, you need to drop some bits. 
Well, actually, you should've dropped the bits aggressively as you went along--but that's more of a logic-is-precious comment, rather than a how-it-must-be-done comment. You can figure out how many bits to drop by tracking the maximum value and the standard deviation of any noise working its way through your system. How do you go about shedding such bits? My first approach to doing so was to just drop the low order bits. While doable, this will introduce a DC bias into your result. (I had to dig into this when building my own FFT ...) The solution I found was convergent rounding ... but I'll let you look that one up. Dan P.S. ... I hadn't noticed that English wasn't your native language.
  18. @Yannick, Not quite sure what problem you are having as the plots look good from here. (I can't read the scale, though, on those images ...) In chart one, you create two 100 kHz signals and multiply them together. That will create a signal near DC, and a signal at 200 kHz. 200kHz is significantly above your filter cutoff, so ... it's gone. That leaves you with the signal near DC. (I assume you are sampling in the MHz range still ...) The fact that the signal near DC is not constant could be just a transient effect of your filter ... from here and with no more details I can't tell. In the second chart, the two 20kHz signals multiplied together create a signal at 40 kHz and one near zero again. (If this doesn't make sense, work the double angle trig formulas and it should) The 40kHz signal component is quite obvious on the chart. Since 40kHz is below your cutoff, it passes right through without a problem. You can also see an initial startup transient, much like I would expect. I would also expect the startup transients for your filter to be the same length--both for the DC transient as well as for the 40 kHz transient. I can't tell from your charts if there's a difference between the two transients -- since the two charts are on different time-scales. Going back to your explanation above, I'm not sure it makes sense. Ignoring the fixed point issues, a DDS should produce a sinewave between -1 and 1, not -1/2 and 1/2. Second, multiplying two sinewaves together should produce a value that is also between -1 and 1. Now if you add the fixed point issues back in, you'll need to multiply all of your numbers by 2^(N-1)-1 so that they will fit in an N bit number. Hence, your DDS output should be between -2^(N-1)+1 and 2^(N-1)-1. If you assume all your inputs have N bits, then multiplying your two values should then give you a result that fits in 2N bits. (It won't quite use up the whole range ... [2^(N-1)-1]^2 is 2^(2N-2)-2^N+1 ...) 
If you then run this through a filter having coefficients of N bits, your result will increase from 2N bits to 2N plus the number of bits in your filter taps, so in this example you'd end up with 3N bits for the multiply alone (neglecting the additional base-two logarithm of your filter length for the accumulator portion of the FIR). If your coefficients have 2N bits each, your result goes from 2N to 4N bits, etc. I'm not sure where in this sequence you would get either a [-1/2,1/2] range or a [0,1] range. Dan
  19. @Yannick, Looks good to me! When doing digital signal processing, you really want to plan to use the highest gain that your processing will support. Using integer math, a lowpass filter gain of 0dB means you have an allpass filter--not what you designed. In order to maintain your performance, the coefficients had to be turned into integers. What you should be looking for at this point is that the stop band remains as you would like it. Since it remains at about 40dB, I'd say it's about as good as the original. (You sure you only want 40dB? I was always taught 70dB as a rule of thumb ...) To know how much this filter will "amplify" your incoming signal, just add all the coefficients together. (Works for lowpass filters ...) As for how to make certain you aren't "amplifying" your signal, you sort of need to define what truth is in order to compare against it. Is it 12-bit resolution you want? Then after a filter, you may need to drop the lower bits to get back to 12. However, the devil is in the details in order to make certain that you maintain your 12-bit range throughout your processing chain. To handle things properly, you'll want to make certain that the constant 12'h800 and 12'h7ff signals pass through your processing chain (filter plus whatever else you will be doing to them) and turn into 12'h800 and 12'h7ff signals at the far end--without overflowing any of the math in the middle. Dan
  20. @mikeo2600, Welcome to the forum! I have an Arty as well, and love it. I'm very much an open source type, so you won't find me using any of the AXI peripherals and I tend to use the ZipCPU instead of MicroBlaze. Still, you can find what I've done with it on GitHub if you'd like. If you need any help getting your own IP up and running, I'd be glad to help out. Dan
  21. @MrKing, Welcome to the forum. Feel free to share your questions, as there tends to be a lot of learning going on here! Dan
  22. We're getting closer. Give me another holler on #digilent-fpga when you're ready to tackle this again, Dan
  23. @hirayaku, It looks like you may have changed or adjusted the parameters of the memory controller. For example, the Zybo runs with a 525MHz clock according to the board support files, not a 200MHz clock or a 533 MHz clock. Can you please check your memory controller configuration against the one in the board support files here, to check that it matches? Thanks, Dan
  24. @Krizzle, @bentwookie, @ChristopherN A couple of fun points, tangent to the conversation. First, if you start typing someone's screen name with an at (@) sign in front of it, you'll pull up a menu of possible screen names. Select from that menu @attila or @JColvin's screen names, and the name will get highlighted here. Further, when you submit your comment, they'll get a notice that it's there--so they know something on the forum referenced them. (You should get a similar notice, since I referenced you at the top of this post.) I know sometimes the engineers lose track of active conversations that have involved them for some time, but this will help bring it to their attention. Second ... It's the Christmas holiday. (Merry Christmas everyone!) Digilent is closed for a couple days. Further, I expect several individuals on their engineering staff (don't know who ...) will be taking more extended vacations to be with family. My point? It might take the "engineering team" in question a couple days, or even perhaps a week or two, to get back to you and look at this. Digilent isn't all that large, so I doubt the "team" you are referencing is that large either. If they haven't responded here in a week or so, may I suggest you consider highlighting their names (as above) and ... they'll notice. Dan
  25. D@n

    Would you rather use this as a venue for discussion?  Are you interested in further discussion?

    Dan

    2. D@n

      I wasn't certain where to go next with it.  Let me reread it again.

      Dan

    3. D@n

      Well, let me start with my own experience.  I have little experience designing for the FTDI chip--I've just used designs from other people who can tell me nothing about their designs.  ;)  I have never used the Cypress chip. 

      On one design I was a part of, the "end-point" controller you mention was a fully capable ARM.  This ARM then connected to the FPGA, but still provided the USB host with some useful interfaces: block device and serial port among them.  The FPGA itself was very small, providing just enough functionality for the mission of this board.  I enjoyed working with this design.

      The more recent design that I have insight into is the XuLA2 design.  This design connects the USB to the device JTAG, and offers little to no high speed transfer capability--just a really slow JTAG.  If you wish to communicate with the S6 on the XuLA board, you are limited to using the user command in JTAG--not very useful.  Further, the XuLA uses a PIC to implement its communication protocol across the USB.  As a final touch, the XuLA2 has a single LED on board that is controlled from the PIC, not the FPGA.  This has its plusses and ... minuses.

      I've tried to coach someone through using this XuLA JTAG interface.  He wants to do image processing with his FPGA.  However, the JTAG interface was dominating any speedup he might've received via the FPGA.  Sure, he could go to an FMC connector ... but that would require a full hardware redesign for him, and he's hoping to make do with what he has.  His latest approach has him connecting XuLA I/O to a RPi--but I'm getting off topic.

      My thought was simply this: expand upon this OpenSource capability, with a bigger PIC.  Provide USB endpoints for serial, SPI, some bit-banging, and ... who knows, maybe some other things as well--but whatever you do don't cripple the interface as the XuLA's is currently.  Further, the goal was to make this as generic and opensource as possible, so that other end point functions could be built as necessary.

      At the same time, if your view is that trying to stuff more functionality into an endpoint controller is useless, then ... either I need to drop my approach (which would mean an end to the discussion--save only to ask what approach might be better), or you are not interested in it (again an end to the discussion), or ... well, I'm not sure where to go next.  As I mentioned, I have no experience with the FTDI chip to press the discussion in that direction.

      Thoughts?

      Dan

    4. zygot

      Dan,

      Your last message helps clarify things. 

      As for the individual that you are coaching "image processing" is quite open-ended. Is it a PC or SBC communicating with a sensor? A small FPGA board tied to another FPGA board? You don't need answer those questions as they are mostly rhetorical. But you see my point, without know specifically what one wants to accomplish and what the basic requirements are... mechanical size, power requirements, thermal requirements, environmental... etc, etc, are then it's not possible to offer a solution. I had to look up XuLA2... it's a pretty small FPGA for generic image processing. (I still have one of Xess's first FPGA products sitting in a box somewhere.... though I haven't looked into them for a while). I have a LOT of USB interface experience, often involving a PC host application, and all of them had different requirements and unique FPGA interface design solutions. My only thought here is that starting with a fixed hardware platform and trying to cram a design into it rarely works unless the hardware and supporting software support are geared to that particular design. Successful engineering always starts with defining requirements, proposing a solution platform, working out the data rates of interfaces.... an idea that escapes even tech companies with scores of engineers who should know better.

      For what it's worth I've used FPGA boards from Terasic ( I like their NANO boards for prototyping ), Opal Kelly, Digilent, and of course Xilinx and Altera. I have yet to find a development board that does everything I want, or has the connectivity for any project. The ATLYS from Digilent was a nice product and generally well supported. Unfortunately, Digilent was bought out by NI and has decided that instead of wasting money paying engineers to support their products they will let "volunteers" do it for free in a forum environment.

      If you want to work with USB intelligently you have to get down and dirty into the nitty-gritty details by doing some designs using vendor (FTDI, Cypress, Atmel, etc.) tools. USB isn't the only interface around. 1G ethernet is a fine hose for transferring data, with a MAC or not, between boards or from a board to a PC or SBC. The Genesys2 has 4 lanes of high-speed transceivers conveniently connected to a DisplayPort connector. And then there's PCIe. Lately, I've been playing with ethernet PHYs and PCIe... haven't decided how to use that Genesys2 yet.  

      I suspect that this particular topic has been beaten to death by now but if you want to communicate further you can reach me at my email. eclektek@dejazzd.com.

       
