D@n

Members
  • Content count

    1712
  • Days Won

    126

D@n last won the day on September 26

D@n had the most liked content!

About D@n

  • Rank
    Prolific Poster

Contact Methods

  • Website URL
    https://github.com/ZipCPU

Profile Information

  • Gender
    Not Telling
  • Interests
    Building a resource efficient CPU, the ZipCPU!


  1. D@n

    Has anyone ported GNU Radio to a zynq development board

    @miner_tom, Last I recall, the Universal Software Radio Peripheral (USRP) that would feed the GNU Radio Companion was built within a Zynq. That would handle downconversion, initial filtering, and rate selection. You should be able to look this code up online. It was quite public. Dan
  2. @zygot, Last time I checked, such a capability was controlled under ITAR. Therefore, you won't find me discussing anything that good here. Dan
  3. @xc6lx45, I must not have been clear. In the multiple-system test, I just counted clocks between PPS events--there was no official "loop". Alternatively, you might say that the "loop" had infinite bandwidth. I expected some noise. I didn't expect it to be correlated. A noisy PPS might cause them to be correlated, as might other conditions (common power supply, temperature, etc.). The earlier post above, where I reported < 1us performance, discusses the results of a proper loop filter. Dan
  4. The presentation compared Verilator 4.0 with the big simulation vendors. 4.0 is a brand new release, so if you tried Verilator a while ago, it's probably 5x faster now. Dan
  5. Did you see the recent presentation showing that Verilator can beat the big vendor simulators in speed by 5x or more? Fun stuff, Dan
  6. For one of my projects, I did something very similar years ago. I had four clock sources and one PPS source. I then counted the number of clock ticks between PPS events. Much to my surprise, all four clocks adjusted their speed together over the course of several minutes, suggesting that the PPS was ... less than perfect. In the end, I think I knew less about what was going on than when I started: "The man with two watches never knows what time it is." Dan
  7. D@n

    Hi! I'm new here

    @Austin01, welcome to the forums! There are many types of engineers here. I tend to work on FPGAs myself. If that's your bent, then let me invite you to visit the ZipCPU blog. Dan
  8. @zygot, The project @hamster mentions above isn't really all that hard to do--I have one of my own. I use mine to calculate the absolute time of a disciplined counter within my own design. In my case, I've measured my Basys3 oscillator's frequency against the PPS to be slightly lower than 100MHz, but I don't remember just how close it was. (It was better than 100ppm as I recall, but that's being conservative.) Such a counter could easily be multiplied by a frequency and then used as a phase index for a GPS-synchronized audio output, as @hamster suggests. While such an output would be better than a tuning fork, that doesn't really answer your question above. Properly answering the question would require a well-disciplined OCXO or better--something I don't have. (I'm not even sure I could properly engineer the power rails for such an oscillator ...) Measuring time or frequency accuracy is tricky, especially since truth can be so hard to come by. After getting some counsel, I discovered that it's often done by comparing a timing measure against itself some time later. In my case, I compared my counter against the PPS one second later to see how close I was. Given my measures, I could regularly predict the next PPS to within about half a microsecond or better. This was a bit of a surprise for me, since I was hoping for 10ns or better, but without better tooling I can't tell whether the PPS or the local clock is responsible for not doing better. (The simulation achieved much better than 10ns resolution ... ) I suspect the local on-board oscillator. Dan
  9. @zygot, I'm still listening to your preaching, but not getting it. I just don't believe in Voodoo logic design. As background, I've helped the SymbioticEDA team build a PNR algorithm for iCE40 FPGAs, and I'm also peripherally aware of Google sponsoring similar work for the Xilinx chips. (Both are part of the nextpnr project.) This was how I knew to comment about registered I/Os. There are also metastability problems--problems that simulation won't necessarily reveal, post-PNR or not. (Well, you might get lucky ...) I trust you've been around long enough to avoid these. My point is, having looked into the internals of the various FPGAs, and having written and debugged PNR algorithms, I'm still looking for an example of a design that passes a logic simulation and a timing check, and yet fails a post-PNR simulation. My interest is to know whether something needs to be done within PNR to keep this from happening. Do you have such an example that you can share? (Other than latches--you have shared about those, but we already know that latches are bad.) If so, even better: can you explain the underlying phenomenology that caused the problem? Dan
  10. @zygot, I've only ever seen a couple of bugs where one placed solution would work and another that passed the same timing requirements would not. One bug was fixed by registering the outputs of the FPGA in the I/O elements. The same can be applied to the inputs: register them as soon as they enter the chip, in the I/O element. I've also been counseled to output clocks via ODDRs or OSERDESs. Together, these approaches have kept problems from cropping up in a PNR-dependent fashion. Are you unable to use these solutions? Or do these approaches not apply to your designs? Dan
  11. @zygot, I must be missing something. You've described a post-place-and-route simulation, but you haven't quite said why it was required. Shouldn't this simulation provide identical results to the pre-place-and-route simulation if the design meets timing? Can you share an example from your own experience of a time when a design met timing but failed post-place-and-route simulation? That's question #1. Question #2: Running a full simulation of any design can be quite costly. For a CPU, this might mean starting the CPU, going through the bootloader, and running whatever program follows. For a video system, it might mean working through several frames of video. When you use this post-place-and-route simulation, do you go through this much effort in simulation, or can you cut corners anywhere? Still skeptical, Dan
  12. D@n

    working of pipelined FFT architecture

    @farhanazneen, That task cannot be done without a reference implementation. There is no "theoretical" latency value without a hardware implementation that pins down details such as one clock per memory access, one clock per multiply, a particular schedule of operations, etc. However, if you use a reference implementation, then the result is no longer "theoretical" but rather "as applied". If you wish to use the Xilinx core as a reference implementation, then start a timer when the first sample is sent into the FFT and stop it when the first valid sample comes out of the FFT. Dan
  13. D@n

    working of pipelined FFT architecture

    @farhanazneen, The FFT I just pointed you at *is* a radix-2 FFT. An FFT can be pipelined and either radix-2 or radix-4 (or radix-8 and higher--but no one does that). It can also be a block FFT that is radix-2 or radix-4. The big difference between radix-2 and radix-4 is the number of inputs (and outputs) to the butterfly. A radix-2 butterfly consumes two inputs and produces two outputs. A radix-4 butterfly consumes four inputs and produces four outputs. If you follow that math, for the first stage of an N-point FFT using a decimation-in-frequency approach, a radix-4 algorithm will need to store the incoming values into memory until it has values k, k+N/4, k+N/2, and k+3*N/4, for k from 0 to N/4-1. The butterflies will then only operate 1/4 of the time, and need to wait for inputs the other 3/4. Similarly, the butterfly will produce four outputs at once: while one can move on to the next stage, all the others will need to go into memory. Hence, your memory requirements for this stage will go up from N block RAM points to 2N, although this new stage will now accomplish the work of two of the radix-2 stages. As for delays ... aside from filling memories, I'm not sure: I've never built a radix-4 FFT butterfly in HDL (yet). I'm not sure how I'd go about handling the three complex multiplies required. Right now, for my radix-2 FFT, I only have to deal with one complex multiply, which I can then turn into three real multiplies. With a radix-4 butterfly, does that mean I'd be using 12 real multiplies? Or would those 12 somehow need to be multiplexed to share DSP hardware? I'm not sure--I've never built one. Normally, you just accept the delay of the FFT in your code. Why are you so concerned about the delay? May I ask what application you are trying to solve? Dan
  14. D@n

    VHDL BASYS3 internal clock problems.

    @zygot, We are way off topic, yet I'd love to hear a reason (story) illustrating why timing simulation as you have described it is essential. Perhaps we could take this to a new topic/post? Dan
  15. D@n

    VHDL BASYS3 internal clock problems.

    @zygot, I've spent my time doing logic simulation, and not so much timing simulation. While I can simulate (logically) any logic running even at multiple clock rates, I've never gotten into the analog side of how/when transitions actually take place. In my opinion, I haven't needed it. Maybe there's something I'm missing. That's not to say there's no use for it--I just don't feel like I've needed it. As to your first question, Verilator simulates sequential (i.e. clocked) logic, and it does so very well. While it will also handle asynchronous logic, it doesn't necessarily model any timing delays in that process. Dan
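Item 1 above mentions the downconversion, initial filtering, and rate selection done in the FPGA ahead of GNU Radio. As a rough sketch of what those three steps mean (not the USRP's actual code), the Python below mixes a sample stream down by a local-oscillator frequency, uses a plain boxcar average as the low-pass filter, and keeps one output per `decim` inputs. The function name `ddc` and its parameters are hypothetical; a real radio would typically use better filters, such as CIC and half-band FIR stages.

```python
import cmath
import math


def ddc(samples, f_lo, fs, decim):
    """Hypothetical digital downconverter: mix down, boxcar low-pass, decimate.

    samples: list of complex (or real) input samples at rate fs
    f_lo:    frequency (Hz) to shift down to 0 Hz
    decim:   decimation factor (rate selection); output rate is fs / decim
    """
    # Downconversion: multiply by a complex local oscillator at -f_lo.
    step = 2 * math.pi * f_lo / fs
    mixed = [s * cmath.exp(-1j * step * n) for n, s in enumerate(samples)]

    # Initial filtering + rate selection: average each block of `decim`
    # samples (a boxcar filter) and keep one output per block.
    out = []
    for start in range(0, len(mixed) - decim + 1, decim):
        out.append(sum(mixed[start:start + decim]) / decim)
    return out


if __name__ == "__main__":
    fs, f_lo, decim = 1000.0, 100.0, 10
    # A real cosine at f_lo: after mixing, half lands at 0 Hz and half at -2*f_lo.
    x = [math.cos(2 * math.pi * f_lo * n / fs) for n in range(200)]
    y = ddc(x, f_lo, fs, decim)
    print(len(y), round(abs(y[0]), 6))
```

In this toy case the image at -2*f_lo completes a whole number of cycles per averaging block, so the boxcar removes it entirely and only the 0 Hz half of the cosine survives; with arbitrary frequencies the boxcar's rejection is much weaker.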
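Items 6 and 8 above describe counting local clock ticks between PPS edges, predicting the next PPS, and using the result as a phase index for a GPS-synchronized tone. A minimal Python sketch of that bookkeeping follows, under stated assumptions: the tick counts are made up, the prediction simply assumes the last second repeats (no proper loop filter, unlike the results mentioned in item 3), and the 32-bit phase accumulator is an illustrative choice. `estimate_clock`, `prediction_errors`, and `nco_step` are hypothetical names.

```python
def estimate_clock(counts_per_pps, nominal_hz):
    """Average the ticks counted between successive PPS edges.

    Returns (measured frequency in Hz, error relative to nominal in ppm).
    """
    measured_hz = sum(counts_per_pps) / len(counts_per_pps)
    ppm = (measured_hz - nominal_hz) / nominal_hz * 1e6
    return measured_hz, ppm


def prediction_errors(pps_counts):
    """Predict each PPS edge from the previous one-second interval.

    pps_counts: free-running counter values captured at each PPS edge.
    Returns the prediction errors in seconds (positive = PPS arrived late).
    Simplification: no loop filter, just "the last second repeats".
    """
    errors = []
    for i in range(2, len(pps_counts)):
        period = pps_counts[i - 1] - pps_counts[i - 2]  # ticks per second
        predicted = pps_counts[i - 1] + period
        errors.append((pps_counts[i] - predicted) / period)
    return errors


def nco_step(f_out_hz, clock_hz, acc_bits=32):
    """Phase increment for an acc_bits-wide accumulator that should wrap
    f_out_hz times per second when clocked at clock_hz."""
    return round(f_out_hz * (1 << acc_bits) / clock_hz)


if __name__ == "__main__":
    # Hypothetical tick counts between PPS edges for a nominal 100 MHz clock.
    counts = [99_999_990, 99_999_992, 99_999_988]
    hz, ppm = estimate_clock(counts, 100e6)
    print(f"{hz:.1f} Hz, {ppm:+.3f} ppm")
    print(nco_step(440.0, hz))
```

Computing the phase step from the measured rather than the nominal clock is what ties the tone to the PPS: the accumulator then wraps the intended number of times per GPS second, whatever the local oscillator's actual frequency.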
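Item 12 above suggests measuring FFT latency by starting a timer when the first sample goes in and stopping it when the first valid sample comes out. In a clocked simulation, that amounts to comparing two indices in per-clock valid traces. The sketch below assumes hypothetical 0/1 traces and uses a toy fixed-depth pipeline (`pipeline_valid`, a made-up stand-in) instead of a real FFT core. Since every FFT output depends on every input sample in the frame, a real core's first valid output cannot appear before the whole frame has gone in, so the measured number includes that fill time.

```python
def pipeline_valid(in_valid, depth):
    """Toy model of a fixed-latency pipeline: out_valid is in_valid delayed
    by `depth` clocks (trace truncated to the same length)."""
    if depth >= len(in_valid):
        return [0] * len(in_valid)
    return [0] * depth + list(in_valid[:len(in_valid) - depth])


def measure_latency(in_valid, out_valid):
    """Clocks from the first accepted input to the first valid output.

    Both arguments are per-clock 0/1 traces, as might be recorded from a
    cycle-based simulation. Raises ValueError if either flag never rises.
    """
    first_in = in_valid.index(1)
    first_out = out_valid.index(1)
    return first_out - first_in


if __name__ == "__main__":
    n_fft = 16  # hypothetical frame size: the frame must fill first
    extra = 5   # hypothetical pipeline delay after the frame is complete
    in_valid = [0] * 3 + [1] * 64
    out_valid = pipeline_valid(in_valid, n_fft + extra)
    print(measure_latency(in_valid, out_valid))
```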
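Item 13 above describes the radix-4 decimation-in-frequency first stage, which gathers x[k], x[k+N/4], x[k+N/2], and x[k+3*N/4], and the three complex multiplies it needs. The Python below is a floating-point model, not HDL, and the names `cmul3`, `fft_radix4`, and `dft` are hypothetical. It shows the radix-4 butterfly followed by three twiddle multiplies (the multiplies by +/-j inside the butterfly are only swaps and negations), with each complex multiply done using one common three-real-multiply form. It checks itself against a direct DFT and assumes N is a power of 4.

```python
import cmath


def cmul3(a, b):
    """Complex multiply a*b using three real multiplies instead of four."""
    ar, ai, br, bi = a.real, a.imag, b.real, b.imag
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return complex(k1 - k3, k1 + k2)  # (ar*br - ai*bi) + j(ar*bi + ai*br)


def fft_radix4(x):
    """Recursive radix-4 decimation-in-frequency FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    branches = [[], [], [], []]
    for k in range(q):
        # Gather x[k], x[k+N/4], x[k+N/2], x[k+3N/4] for one butterfly.
        a, b, c, d = x[k], x[k + q], x[k + 2 * q], x[k + 3 * q]
        # Radix-4 butterfly. Multiplying by +/-1j is only a swap and negate
        # in hardware, so no real multipliers are needed here.
        t = (a + b + c + d,
             a - 1j * b - c + 1j * d,
             a - b + c - d,
             a + 1j * b - c - 1j * d)
        branches[0].append(t[0])  # twiddle for branch 0 is 1: no multiply
        for r in (1, 2, 3):       # the three complex twiddle multiplies
            w = cmath.exp(-2j * cmath.pi * r * k / n)
            branches[r].append(cmul3(t[r], w))
    # Each branch is an N/4-point FFT; branch r yields outputs X[4m + r].
    out = [0j] * n
    for r, branch in enumerate(branches):
        for m, v in enumerate(fft_radix4(branch)):
            out[4 * m + r] = v
    return out


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n))
            for r in range(n)]


if __name__ == "__main__":
    x = [complex(i, (3 * i) % 5) for i in range(16)]
    err = max(abs(p - q) for p, q in zip(fft_radix4(x), dft(x)))
    print(f"max error vs. DFT: {err:.2e}")
```

With the three-multiply form, the three twiddles cost nine real multiplies per radix-4 butterfly; with the straightforward four-multiply form they cost twelve. The three-multiply form trades one multiply for extra additions in each complex product.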