Everything posted by D@n

  1. @Josef, I've gotten burned too often by trying to do math in public and getting it wrong, so forgive me if I don't comment on what is and isn't possible for a given clock rate. The HyperRAM Pmod I cited above can handle one 16-bit transaction every 10ns once you get it going. Not sure if that gets you what you need or not. I will say that DDR controllers can be quite hard, SDRAM is complex but (eventually) quite doable, and SRAM and HyperRAM are both easier. Dan
  2. @Josef, If you've never built an SDRAM controller, it can be a challenge. The HyperRAM interface is a bit simpler to use, and the performance is just as good if not better. Dan
  3. @Josef, I had roughly the same problem on the Basys3 board: there wasn't enough on-board RAM for a frame buffer. I chose one solution, and have since discovered others who have used two other solutions. My solution was to store the image in flash. Since the flash was only questionably fast enough to drive a 25MHz pixel stream, I compressed the images using both a small number of bits per pixel (expanded with a programmable colormap) and run-length encoding. This worked great for static images and even a business slide show. If you are at all interested in this approach, one of the keys to my success was the VGA simulator. You can find an article discussing both how it works and how to use it here. I know others (at Digilent even!) have created a sprite-based capability. This allows video generation as part of a pipeline that "adds" items to the display as it moves through. In this manner, they were able to build things like pacman without using a frame buffer. A third approach that I've thought of using is to purchase some HyperRAM. One-bit squared is selling a HyperRAM that takes two Pmod ports, yet gives you access to higher speed memory than your Basys3 will likely be able to use. (You can slow it down, though, and still use it.) The logic to drive a HyperRAM isn't all that complex, and so quite doable. Hope this helps, Dan
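The compression scheme described above (a few bits per pixel expanded through a palette, plus run-length encoding) is easy to prototype offline before committing it to flash. Here is a minimal Python sketch; the (count, index) run format and the two-color palette are illustrative assumptions, not the actual OpenArty encoding:

```python
def rle_encode(pixels):
    """Encode a stream of palette indices as (run_length, index) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, p)   # extend the current run
        else:
            runs.append((1, p))               # start a new run
    return runs

def rle_decode(runs, palette):
    """Expand runs back into a stream of (r, g, b) colors via the palette."""
    out = []
    for count, idx in runs:
        out.extend([palette[idx]] * count)
    return out

# Small palette indices keep storage low; runs compress flat image regions.
palette = {0: (0, 0, 0), 1: (255, 255, 255)}
scanline = [0] * 100 + [1] * 20 + [0] * 8
runs = rle_encode(scanline)            # 3 runs instead of 128 pixels
colors = rle_decode(runs, palette)
```

A static slide with large flat areas compresses extremely well this way, which is what makes a marginal flash bandwidth workable.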
  4. D@n

    High speed output on PMOD ports

    @DPA, Have you looked at @zygot's Differential PMod Challenge at all? It may be exactly what you are looking for. Dan
  5. D@n


    @Junior_jessy, Ditto @xc6lx45's advice. I've spent decades of my life working on signal processing problems, voice processing included. The basic rules were: 1) get it running offline, and then 2) get it running within whatever special-purpose hardware (microcontroller, FPGA, etc.) is required. This lets you debug the algorithm where debugging is easy (Matlab or Octave), so that you only need to debug the hardware implementation afterwards. As an added benefit, you can send the same samples through both designs (offline software, online hardware) and look for differences, which you can then identify as bugs. Trust me, this is the fastest way to get your design working in VHDL. If you rush to the FPGA too fast, you'll 1) spend hours (days or weeks even!) debugging your algorithm, and 2) cement parts of your algorithm before you know they are working, resulting in more rework.

    Now, let's discuss your voice cross-correlation approach: It won't work. Here's why. Problem one: Voice has pitch. You can think of the "pitch" as a fundamental frequency of which there are many harmonics. Pitch is not a constant; it is a varying function. The same word can be said many different ways while the pitch subtly shifts around. A cross-correlation would force you to match pitch exactly, which will never happen. Problem two: Vocal cadence. You can say the word "Hello" at many different speeds and it will still be the same word. Hence, not only does your comparison need to stretch or shrink in frequency to accommodate pitch, it also needs to stretch or shrink in time to accommodate cadence. Problem three: Your mouth shapes the sound you make based upon the position of your jaw and your tongue (and probably a bit more). This acts upon the voice as a filter in frequency that doesn't scale with pitch. That is, as the pitch goes from deep bass to upper treble, the same mouth and tongue shape will filter the sound the same way. (This assumes you could say the same word twice and get the *same* mouth shape.) Problem four: Sounds composed of fundamentals with harmonics tend to do a number on cross-correlation approaches. Specifically, I've gotten a lot of false alarms from cross-correlations which, upon investigation, had nothing to do with what I was trying to correlate against. A flute (or other instrument), for example, might give a strong cross-correlation score if you are not careful.

    Four problems of this magnitude should be enough to suggest trying your algorithm in Matlab or Octave (I'd be boneheaded enough to do it in C++ personally) before jumping to the FPGA. Computers today have enough horsepower to do this task in real time, so you don't need an FPGA for it. (FPGA's are still fun, though, and I'd be tempted to implement the result in an FPGA anyway.) Were I you, having never worked with speech before, I'd start out not with the FPGA but rather with a spectral raster of frequency over time. I'm partial to a Hann window, though the 50% overlap (or more) is required, not optional, unless you wish to incur the wrath of Nyquist. FFT lengths of about 20-50ms are usually good choices for working with voice and seeing what's going on within it.

    Then, when returning to the FPGA, I would simulate *EVERYTHING* before touching your actual hardware. Make recordings while working with Octave, prove your algorithm in Octave on those recordings, then feed those recordings into your simulation to prove that the simulation works. Only at that point would I ever approach hardware. Oh, and ... I'd also formally verify everything before moving to hardware. Once formally verified, it's easy to make changes to your implementation and then re-verify that they will do what you want. You might need this if you get to the hardware only to find you need to shuffle logic from one clock tick to another because you aren't meeting timing. In that case, re-verifying what you are doing would be quite valuable. Those are just some things to think about, Dan
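A spectral raster of the kind described above is a good first offline experiment. Below is a minimal Python sketch (Python standing in for Octave here; the 8kHz sample rate, 256-sample window, and 50% hop are illustrative choices within the 20-50ms guidance): a Hann window, 50% overlap, and a naive DFT magnitude per frame. A real implementation would use an FFT instead of the naive DFT.

```python
import math

def hann(N):
    # Hann window: forces each frame smoothly to zero at its edges
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def spectral_raster(x, N=256, hop=128):
    """Magnitude spectra of Hann-windowed frames, 50% overlap."""
    w = hann(N)
    frames = []
    for start in range(0, len(x) - N + 1, hop):
        seg = [x[start + n] * w[n] for n in range(N)]
        mags = []
        for k in range(N // 2 + 1):   # naive DFT; use an FFT in practice
            re = sum(seg[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
            im = -sum(seg[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

# A 1 kHz tone sampled at 8 kHz: N=256 is a 32 ms window, and the
# tone should peak in bin 1000/8000 * 256 = 32.
fs = 8000.0
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(1024)]
raster = spectral_raster(tone)
peak_bin = max(range(len(raster[0])), key=lambda k: raster[0][k])
```

Plotting `raster` as an image (time across, frequency up) gives the raster described above; for voice you'd see the pitch harmonics move around from frame to frame.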
  6. D@n


    @zygot, Here's some light reading on the topic of the speed of the DDR3 SDRAM on the Arty A7: first, on Xilinx's forums, then here again on Digilent's. (Yes, I was dense, I needed to hear the answer twice before it finally registered.) Dan
  7. Start by downloading the spec for the HC06 bluetooth module you have. A quick Google search suggests this might be it. To actually connect the module, it looks like you'll need to do a bit of soldering--though all you appear to need to solder up are the two serial port wires. Dan
  8. D@n


    @oliviersohn, If you want to start simple, you might wish to try the tutorial I've been working on. I have several lessons yet to write, but you may find the first five valuable. They go over what it takes to make blinky, to make an LED "walk" back and forth, and then what it takes to get the LED to walk back and forth on request. The final lesson (currently) is a serial port lesson. My thought is to discuss how to get information back from the FPGA in a coming lesson. Dan
  9. D@n


    @oliviersohn, Is it possible to have a faster clock for the design? Yes. However, it can be so painful to do in practice that you likely won't. You'll need special circuitry within your design every time you cross from one clock "domain" into another. Single bits can cross clock domains; multiword data requires an asynchronous FIFO. This circuitry costs time (two clocks in the new domain) to do its work. Hence you'll lose two slow clocks going from your faster clock speed to the slower one, and two fast clocks going in the other direction. There be dragons here. It's doable, don't get me wrong, but ... there are some very incomprehensible bugs along the way. What speed are you hoping to run at? When I first picked up FPGA's, I was surprised to discover the "posted" speed from the vendor had little to no relationship with the speeds I could actually accomplish. For example, despite the 500MHz+ vendor figure, a 200MHz design is really pushing things. 100MHz tends to be "comfortable". However, you may find that the difference between 100MHz and 82MHz isn't all that sizable. Dan
  10. D@n


    @oliviersohn, Let me start out by disappointing you: the Arty's memory chips can run faster than the interface to them, so the interface speed is what matters. Xilinx's MIG will limit your design speed to about 82MHz or so. In each 12ns clock, the memory controller will allow you to read 16*8=128 bits. That's the good throughput number. The bad number is that it will take about 20 clocks from request to response. Yes, I was disappointed by the SDRAM when I first got it working. My OpenArty project includes instructions for how to set up the SDRAM interface if you want to use it from logic (i.e. without the MicroBlaze). (I'm still working on fixing the flash controller since Digilent swapped flash chips on the newer Artys ... but at this point the needed change works in simulation, needs to be tested on actual hardware, and then re-integrated with the SDRAM ... so that'll be working again soon.) You may find this blog post discussing how to perform a convolution on "slow" data (like audio) valuable to your needs. Dan
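To put numbers on the throughput-versus-latency tradeoff above, here's a quick Python back-of-the-envelope. The ~12.2ns clock, 128 bits per beat, and 20-clock latency come from the figures in the post; the burst lengths are illustrative:

```python
def effective_gbps(burst_beats, latency_clks=20, clk_ns=12.2, bits_per_beat=128):
    """Effective throughput (Gb/s) when every access pays a fixed latency."""
    total_ns = (latency_clks + burst_beats) * clk_ns
    return bits_per_beat * burst_beats / total_ns   # bits per ns == Gb/s

single = effective_gbps(1)     # isolated reads: latency dominates
burst = effective_gbps(256)    # long bursts amortize the 20-clock wait
peak = 128 / 12.2              # ~10.5 Gb/s ceiling, never quite reached
```

The point: random single-beat accesses see only a fraction of the headline bandwidth, while long streaming bursts (such as a frame-buffer or audio pipeline would issue) get close to it.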
  11. D@n

    Using Convolution Encoder

    @Ahmed Alfadhel, Convolutional encoders are easy things to build. As in *REALLY* easy things to build. Why not build your own? By using the library, you run the risk of not understanding how the library is built, how it works, or the connectivity it needs. Worse, you'll never be able to debug it if it goes wrong. On the other hand, if you build your own, you'll be using, what, 10-15 lines of Verilog? How hard could that be to debug? Dan
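As a sketch of how little is involved: here is the textbook rate-1/2, constraint-length-3 convolutional encoder (generator polynomials 7 and 5 octal--a common example choice, not necessarily what the library in question uses) modeled in a few lines of Python. The Verilog version is just a shift register plus two XOR trees:

```python
def parity(x):
    # XOR of all bits in x
    return bin(x).count("1") & 1

def conv_encode(bits, g1=0b111, g2=0b101):
    """Rate-1/2 convolutional encoder: 3-bit shift register, two tap sets."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0b111   # shift the new bit in
        out.append(parity(state & g1))       # output of generator g1
        out.append(parity(state & g2))       # output of generator g2
    return out

encoded = conv_encode([1, 0, 1, 1])   # classic worked example: 11 10 00 01
```

Every input bit produces two output bits, each the XOR of a fixed subset of the shift register--which is why the hardware version really is only a handful of lines.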
  12. D@n

    UART communication control with CMOD A7

    I've had mixed success with the Digilent chips above 1MBaud. I've used 1MBaud, 2MBaud, and even 4MBaud. 4MBaud works on one of my boards, but other boards require me to drop down a bit. I can't tell if this is a manufacturing-dependent observation, crystals with different tolerances for example, or if there are actually different FTDI parts on my boards that affect this. I do think you'll find dividing your 100MHz oscillator by 25, 50, or 100 easier than whatever you are doing to create a 1843200 baud stream. Dan
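The divisor arithmetic above is easy to check. Dividing 100MHz by a small integer gives exact rates, while 1843200 baud forces a non-integer divisor and therefore a small frequency error--as this Python check shows:

```python
CLOCK_HZ = 100_000_000

# Clean integer divisors yield exact baud rates
clean = {div: CLOCK_HZ // div for div in (25, 50, 100)}   # 4M, 2M, 1M baud

# 1843200 baud would need a divisor of ~54.25; rounding to the
# nearest integer leaves a residual baud-rate error
target = 1_843_200
divisor = round(CLOCK_HZ / target)             # 54
actual = CLOCK_HZ / divisor                    # ~1,851,852 baud
error_pct = 100 * (actual - target) / target   # ~0.47%
```

A half-percent error is usually within a UART's tolerance, but it's one more thing that can push a marginal link over the edge--whereas the 4M/2M/1M rates cost nothing.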
  13. Are you sure there's enough memory on a Basys3 board for the frame buffer? If so (and I don't think there is), you will find half of the frame buffer code you will need here--the half necessary to read from the frame buffer and create a video output stream from it. There's also a decent video simulator there, both camera and VGA output, that you can use to verify that your core is doing the right thing. Dan
  14. D@n

    Arty A7 flash chip

    Digilent, The original Arty-A7 had a flash chip from Micron on it. This chip is still listed as the flash chip on the schematic. The Micron chip had a number of design flaws, making it difficult to guarantee that the chip could be brought to a proper reset state upon startup. This has been discussed on the forum. I was recently informed that the newer A7 boards now have a Spansion flash device on them. While I like the Spansion flash better, may I encourage you to update your schematic and other Arty documentation to reflect this? Also, can you please share for the record what that new chip is? Thanks! Dan
  15. D@n


    @927465, You shouldn't need to constrain any clock outputs, unless the design uses them in an always @(posedge o_SCK) statement or some such. As I recall, the pull-ups are *REQUIRED*. That may well be why you are failing when you are. You might be able to enable them in your FPGA; that might help. A quick Google search turned up this specification. Looks like an updated version of what I'm using, so I'll start using it myself. Dan
  16. D@n


    Some notes from reading through this ... You do have a copy of the SD card spec, right? It'll help to understand all of those CMDs. Ok, looking through the spec: if DAT1 isn't otherwise being used, it can be used as an interrupt output in SDIO, so DAT1 low can make sense. DAT2, when unused, can be used as a "Read Wait" signal in SDIO mode. Shouldn't there be a CMD1 issued between steps 10 and 11? (I'm following my own script from here, p6.) There's a loop around your steps 12 (CMD55) and 13 (CMD41). This is appropriate--to wait for the card to initialize. CMD2 is useful for reading the CID register, something I skip in my sequence, but okay. You should be able to plug your card into any Linux O/S device and read the CID register as well--I think you can get it from the sysfs filesystem. This becomes a nice check that you've read the right value. Ok, got to the end. Some observations: Up until you start reading or writing, all interaction will take place (for SDIO, that is) on the CMD line. This includes both transmit to the device as well as receive from it. It is a bi-directional inout wire from that standpoint. CMD6 is not a valid command; ACMD6 is. However, the ACMD commands need to be preceded by a CMD55 in order to get access to them. Can you read the error response coming back from the card at all? Sounds like a pin might not be acting like it should be. (This is a very wild guess.) Can you double check your XDC file? Or try a different Pmod port and see if anything changes? Dan
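One concrete thing worth double-checking in a startup sequence like this is the CRC7 appended to each command, since cards reject commands with a bad CRC during initialization. Here's a small Python model of the SD CRC7 (polynomial x^7 + x^3 + 1), checked against the well-known CMD0 and CMD8 values:

```python
def crc7(data):
    """SD-card CRC7 over a byte sequence, MSB first (poly x^7 + x^3 + 1)."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            inbit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ inbit:
                crc ^= 0x09   # x^3 + x^0 (the x^7 term is the feedback itself)
    return crc

# CMD0 (GO_IDLE_STATE, arg 0): CRC7 = 0x4A, so the trailing byte is 0x95
cmd0 = crc7(bytes([0x40, 0x00, 0x00, 0x00, 0x00]))
# CMD8 (SEND_IF_COND, arg 0x1AA): CRC7 = 0x43, trailing byte 0x87
cmd8 = crc7(bytes([0x48, 0x00, 0x00, 0x01, 0xAA]))
```

If your logic's CRC bytes for CMD0 and CMD8 don't match these, the card will sit silently in idle and the whole sequence will look dead.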
  17. D@n

    Diving in

    @That_Guy, There are many FPGA tricks for making LO's, depending upon the quality of what you are looking for. You can use just the high bit of the phase as your sinewave output. Works great for low logic usage, poor for a quality sine wave, since you get a square wave with this method instead. Even still, a one-bit sine wave also works great for building simple PLL's. You can use a table lookup. Depending on the quality you need, though, the table size can grow exponentially. You can use a CORDIC. This will not only create the sine wave for your LO, but you can use the algorithm to apply the LO as well. Requires no multiplies. However, a fully pipelined implementation for a large number of bits can get quite costly. You can also apply some sort of interpolation to your table lookup. Just two multiplies and some 128-element tables, and you can get just about all the sine wave accuracy you need. Your choice. Decide what you need for your problem, then go for it. You can find a core generator that will generate any of the above algorithms to your specifications here. Don't forget to avoid both radians and degrees (measure phase as a binary fraction of a full circle instead), and don't get tripped up by the complexity of building an NCO (it's not hard at all--once you can generate the sine wave). Dan
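The last option above (table lookup plus interpolation) can be sized offline before committing to hardware. This Python sketch models a 128-entry table with linear interpolation (the 24-bit phase width is an illustrative choice, and a floating-point model stands in for the fixed-point hardware), then measures the worst-case error:

```python
import math

TBL_BITS = 7                       # 128-entry sine table
PHASE_BITS = 24                    # NCO phase accumulator width
TBL_SIZE = 1 << TBL_BITS
FRAC_BITS = PHASE_BITS - TBL_BITS  # phase bits left over for interpolation

TABLE = [math.sin(2 * math.pi * i / TBL_SIZE) for i in range(TBL_SIZE)]
SLOPE = [TABLE[(i + 1) % TBL_SIZE] - TABLE[i] for i in range(TBL_SIZE)]

def sin_lookup(phase):
    """Linear interpolation between adjacent table entries."""
    idx = (phase >> FRAC_BITS) & (TBL_SIZE - 1)
    frac = (phase & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)
    return TABLE[idx] + frac * SLOPE[idx]

# Worst-case error over a coarse sweep of the phase space
max_err = max(
    abs(sin_lookup(p) - math.sin(2 * math.pi * p / (1 << PHASE_BITS)))
    for p in range(0, 1 << PHASE_BITS, 997)
)
```

Even with only 128 entries, linear interpolation holds the error to a few parts in ten thousand--roughly 11-12 effective bits, often plenty for an LO.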
  18. D@n


    @927465, I don't have any code I can share, but I have done this before. Can you go through the startup sequence you are sending together with what you are receiving from the card? Perhaps we can debug this. Alternatively, why not monitor the MIO pins when the SD card is started up from the Linux driver in the PS, and use that as the guide that you need to emulate? Dan
  19. D@n

    GPS Pmod

    @HelplessGuy, I can't answer all of your questions, but maybe I can answer some of them. I also have a GPS Pmod, and I have enjoyed it. You can find the code I used to work with GPS within my OpenArty project. That includes not only serial port code, but also PPS tracking code. My example code does not include any AXI peripherals. If you want to use an AXI peripheral, you might find an AXI serial port to be more useful than an AXI GPIO peripheral, as most of the GPS processing you do will be with the serial port. To your question, yes, you can just plug it in. My first test with such a component (on the Arty, so PL alone) is to forward the serial port directly from the GPS Pmod to a serial port, with no logic in between. You can find a description of such a test in the Verilog tutorial I have been writing, on chapter one: wires. That should allow you to view the GPS serial port output, so you can start working on it from there (9600 baud, no parity, 8 bits, etc.). Since you are using a ZedBoard, you might find a bit more work is required to connect such a serial port to the linux kernel on the PS side of the chip. With a bit of Google-foo, I found this article. Hope that helps you out, Dan
  20. D@n

    Position from an Accelerometer

    Theoretically double integration should work. Practically, I've heard it has some serious problems. From a DSP standpoint, each integrator amplifies the noise infinitely at DC. So, to be successful, you'll need to anchor your results to reality on a periodic basis. I'd love to hear of your experience with it (or that of others). Dan
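The DC-amplification problem above is easy to demonstrate numerically: a tiny constant bias on the accelerometer, double-integrated, grows as t squared. This Python sketch uses an illustrative 0.01 m/s^2 uncorrected bias (a plausible figure for a MEMS part, chosen for the example):

```python
def double_integrate(accel, dt):
    """Rectangular integration: acceleration -> velocity -> position."""
    vel = pos = 0.0
    for a in accel:
        vel += a * dt
        pos += vel * dt
    return pos

dt = 0.01                     # 100 Hz sample rate
bias = 0.01                   # m/s^2 of uncorrected sensor bias
samples = [bias] * 10_000     # 100 seconds of "stationary" data

drift = double_integrate(samples, dt)   # ~0.5 * bias * t^2 = ~50 m
```

Fifty meters of position error after 100 seconds of sitting still--which is why the periodic anchoring to reality (GPS, zero-velocity updates, etc.) isn't optional.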
  21. D@n

    Old Knowledge seeking advice

    @Pelvart, If you are so new that you had to look up what an FPGA is, then let me welcome you to a very fun endeavor, and further suggest the EEVBLOG video on the same topic to you. Dan
  22. D@n

    Old Knowledge seeking advice

    @Pelvart, My turn to welcome you! I only started getting seriously interested in FPGA's about 3 years ago. My first board was a Digilent Basys3 board--a good beginner board with lots of peripherals. While I found it a bit pricey at first, the cheaper boards don't have as many peripherals to play with. A good example of a cheaper but (nearly) featureless board would be a TinyFPGA or perhaps even the MAX1000. Digilent also tends to do a better job of supporting their boards, though, so you will probably enjoy the Basys3 more as a beginner. If you are looking for FPGA resources, then I'd like to point you to some of my own favorites: asic-world.com, fpga4fun.com, Nandland, and (my personal favorite, created by yours truly) zipcpu.com. I'm also working on a draft Verilog tutorial, but sadly it remains far from complete. You can watch this space for future updates, though. Dan
  23. D@n

    latency and throughput of fft processors

    @farhanazneen, I've only created pipelined FFT implementations so far. I have yet to create a block FFT, and so to understand how a block FFT might be faster/better/cheaper. That said, a pipelined FFT uses information internally as soon as it is calculated and therefore becomes available. It should therefore have the lowest latency. I'm not yet certain how this block FFT is built, but my guess is that it has one processing stage (as opposed to log_2(N) stages) that gets applied to the data, adjusted for the coefficients of the next stage, applied again, and so on. This reuse will create a *LOT* of latency--much more than a pipelined implementation--as you are noticing. Dan
  24. D@n

    latency and throughput of fft processors

    @farhanazneen, I see your numbers, and double checked them. What I don't get, though, is why you would say that the latency of a pipelined FFT implementation is always greater than that of a block FFT implementation? Shouldn't it be the other way around? Dan
  25. D@n

    Bysis 3 16 bit multiplier project

    @Samson, I love @hamster's solution! Don't forget, though, if you use buttons you'll want to use at least a 2FF synchronizer and (depending on your purpose), you may want to debounce them as well. Dan
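Button debouncing like that mentioned above can be modeled in software before writing the Verilog. This Python sketch implements the common counter-based scheme (the stable-count threshold of 5 is illustrative; a real design sizes the count to several milliseconds of the sample clock):

```python
class Debouncer:
    """Accept a new input level only after it has been stable N samples."""
    def __init__(self, threshold=5, initial=0):
        self.threshold = threshold
        self.out = initial
        self.count = 0

    def sample(self, level):
        if level == self.out:
            self.count = 0          # input agrees with output: reset the count
        else:
            self.count += 1
            if self.count >= self.threshold:
                self.out = level    # held long enough: accept the new level
                self.count = 0
        return self.out

# A bouncy press: chatter, a solid press, release chatter, then idle
db = Debouncer()
trace = [db.sample(s) for s in [0, 1, 0, 1, 0] + [1] * 8 + [0, 1, 0] + [0] * 8]
presses = sum(1 for a, b in zip(trace, trace[1:]) if (a, b) == (0, 1))
```

Despite the chatter on both edges, the output rises exactly once--which is what you want feeding, say, a multiplier-start strobe. In hardware, the input would first pass through the 2FF synchronizer before reaching this counter.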