xc6lx45

Members
  • Content Count

    595
  • Joined

  • Last visited

  • Days Won

    35

xc6lx45 last won the day on January 16

xc6lx45 had the most liked content!

About xc6lx45

  • Rank
    Prolific Poster

Contact Methods

  • Website URL
    https://www.linkedin.com/in/markus-nentwig-380a4575/

Profile Information

  • Gender
    Male
  • Location
    MUC
  • Interests
    RF / DSP / algorithms / systems / implementation / characterization / high-speed PA test and creative abuse of Pedal Steel Guitars

Recent Profile Visitors

2712 profile views
  1. Hi, you can search for the keywords "blocking" vs. "non-blocking" assignment in Verilog. You can also get a hint at the answer (and at "always @(posedge clk)") if you look at the schematic in your post. Note the triangle: "C" is an edge-sensitive input (which goes down to the transistor-level electronics that ultimately define the foundations of "synchronous" logic design as we know it). This is only a sneak preview of what a web search will reveal 🙂
  2. I'm quite sure you can use one account (I have done so on several PCs myself with WebPACK). Looking at that license file, it says HOSTID=ANY. To me, this looks like the free WebPACK license isn't even tied to one specific machine (but someone correct me if I'm wrong).
  3. Just wondering whether that is worth the trouble. Internally, the pins are all muxed to the same ADC pair. For example, this chip http://www.ti.com/product/CD74HC4051 could do the same anywhere in your design.
  4. Reality check:
    Production ATE equipment => 6+ digit price range
    Test lab equipment => 6+ digit price range
    Basys 3 => low 3 digit price range
    We can state with confidence it will be able to test a "subset" of features.
  5. Hi, I finally got around to cleaning up my fractals pet project. It includes 30 parallel fractal engines and makes good use of the chip's 90 hardware multipliers, with up to 30 * 3 * 200 MHz = 18 billion multiplications per second under full load. The demo still works on USB power; the chip does get a little warm, though. Other than clocking and ASYNC_REG for CDC, the code is 100 % inferred logic (portable). Here you can see it in action. Blurring and glitches are phone camera artifacts.

    There is some documentation (partly under construction), including an architecture diagram: https://github.com/mnentwig/forthytwo/tree/master/fractalsProject

    The control code runs on the 32-bit variant of James Bowman's J1B CPU. I use my own small "forthytwo.exe" compiler written in C#, which is included in the repo (starting from the link, navigate up). I've tried to deliver it in a way that does not require Visual Studio to build (the .exe for Windows is included, and you can build it from a .NET runtime. This may need some editing of the makefile, though).

    There is a bootloader variant included, so feel free to use it as a testbed for the J1B, using UART, LEDs, and switches. Note that anything image-related is done in RTL; the J1B only calculates vectors that define the image location in "fractal space". If anybody wants to go ahead minus the fractals part, I've done that work for you already (refImpl project). There are also Verilator simulations included, similar to the original J1B repo, that allow interactive (UART => keyboard) experiments with the microcontroller code.

    The bravest can try their hands on my hand-crafted assembler floating point implementation, which is a little bit rough in some places but generally works nicely as long as I keep the limitations in mind (e.g. no internal rounding, just truncation). Bonus: 2D matrix math feat. Bhaskara's 7th century sine approximation for the rotation matrix.

    Happy hacking! -Markus
  6. Yes but I had already proven that it works
  7. "Due to recent events": Please be aware that by default, the FPGA (Xilinx 7 series) wakes up with PLLs possibly still unlocked. This is explicitly stated in UG908 p. 70 and shouldn't surprise anyone who has read the manual 🙂 See the "BITSTREAM.STARTUP.LCK_CYCLE" option. "Recent events" means that yesterday, I found that my design (which starts running from the first clock cycle) worked 100 % reliably on one board but failed with 100 % certainty on the other board. Identical RTL with an initial delay in SW did work reliably on both boards, so this is a shy little bug ... Adding simple reset logic based on the .locked signals from the PLLs solved the issue; now both boards are functional.
  8. OK... you might still check whether the register survives retiming. Use the (* DONT_TOUCH = "true" *) attribute on muxr and similar registers if in doubt. BTW, if you want to design a board and are not using much of the chip's floorspace, check other FPGA brands, e.g. Lattice. They seem more straightforward in the external circuitry (see the "tinyfpga" boards for a reference, or the ICE40 EVB. AFAIK it has on-chip non-volatile memory so the flash should be optional. What's left on the board is mostly decoupling capacitors...). It's also somewhat slower, which should be more forgiving with regard to noise, signal integrity etc.
  9. Are you sure you are registering your combinational logic before the output driver? See my "digital hazards" comment a few posts back... It is a fairly common misunderstanding, where reality differs from simulation (*): synchronous design methodology guarantees correct signal levels at clock edges ONLY. It gives NO GUARANTEES WHATSOEVER between clocks. For example, when you have a sequence of 0 bits, there may be spikes between clock edges and the circuit is still functioning correctly as designed. (*) It will show to some extent if you run e.g. a "post-implementation timing simulation" with actual LUTs and routing delays. But as said, the first thing to do is check whether the outputs are registered, without any combinational logic on the output side.
  10. Check the warnings. Most likely, there is one related to this construct:
    >> if(rising_edge(i_clk) and i_clk ='1') then
    The 2nd part is redundant (can't have a rising edge without the clock being high). More importantly, inside the FPGA, information and clock signals are kept electrically separate (clocks drive edge-sensitive inputs, information signals drive level-sensitive inputs) and are mixed only in exceptional use cases like ASIC emulation. In a nutshell, use clocks only in "rising_edge" constructs => "synchronous" design methodology. Can you reduce the test case so it only shows the error and nothing else? Most likely, when that is done, you'll see the root cause for yourself.
  11. Thanks, that's a nice idea. I don't think it would be timing critical, and the complexity could be hidden away in a module.
  12. For comparison, I got the following LUT counts for James Bowman's J1B (16-bit instruction, 32-bit ALU) CPU, which I know quite well:
    * 673 LUTs = 3.3% utilization of A7-35 with 32 stack levels in distributed RAM (replacing the original shift register based stack, which does not look efficient on Xilinx 7 series)
    * 526 LUTs if reducing the +/- 32 bit barrel shifter to +/- 1 bit, but the performance penalty is severe (e.g. IMM values need to be constructed from shifts).
    * 453 LUTs if further allowing one BRAM18 for each of the two stacks.
    This includes a UART and runs at slightly more than 100 MHz, but memory/IO need two instructions / two cycles. So the RISC "overhead" does not seem that dramatic. It's slightly bigger and somewhat slower, but has baseline opcodes (e.g. arithmetic shift and subtract, if I read it correctly) that J1B needs to emulate in SW. It would be interesting to know where the memory footprint goes when I use (soft) floats. I've done the experiment in the recent past with MicroBlaze MCS, and did not like what I saw. On J1B I need about 320 bytes for (non IEEE 754) float + - * /. It is painfully slow without any hardware support, but it keeps the boat afloat, so to speak. Using C instead of bare metal assembly would be tempting.... I just wonder how much effort it takes to install the toolchain.
  13. I'd consider attaching a wire to the grounds, possibly from the bottom side. I'd leave the 3.3 V alone. BTW, are you aware of the problem of digital hazards? Essentially, you are not allowed to drive IO outputs from combinational logic but must register each output. This might be responsible for some of the noise and would cause weird problems when driving edge-sensitive inputs on fast external ICs, e.g. a shift register. You can use an additional higher frequency clock (say, 200 MHz) to not introduce more delay / jitter than necessary. Either use the same PLL with a 2nd output at a multiple of the signal frequency, or use a 2nd PLL at an arbitrary high frequency and make the timing to the original clock "don't-care" with a false_path constraint between the clocks, e.g.
    #set_false_path -from [get_clocks -of_objects [get_nets clk1]] -to [get_clocks -of_objects [get_nets clk2]]
    #set_false_path -from [get_clocks -of_objects [get_nets clk2]] -to [get_clocks -of_objects [get_nets clk1]]
  14. xc6lx45

    Cmod A7 programming

    Hi, isn't there an SDK menu option "Xilinx/program flash"? I'm checking this from a Zynq project but I think it looks the same. Hint: if you run into weird tool issues during the programming process, make sure the Vivado-side hardware manager is not interfering with the device.
  15. I doubt it will fix the whole problem, but I am fairly certain that 10 µF caps are unsuitable. 100 nF is more like it. Capacitor impedance is a V-shaped curve over frequency: you want to be on the left half with the frequencies of interest (also harmonics, not just the fundamental digital switching rate). Use a capacitor that's too big and you end up on the right half, where it goes up steeply (the capacitor has just become an inductor). All that said, the CMOD A7 is a great little board, but with a single GND pin it does have its limitations. You could have a look at Trenz TE 0726 ... I think I've helped them sell a few of those over the years 🙂 but then it's not nearly as convenient, e.g. bring your own JTAG and power. The "current return path" topic is something you should look into (much easier with a four-layer board). As a quick hack, note that there are additional ground connectors on the PMOD connector, on the opposite side of the board (which is good).
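
The V-shaped impedance curve mentioned in the last post can be illustrated with a rough series R-L-C model of a real capacitor. This is only a sketch: the ESL and ESR numbers below are made-up placeholders, not datasheet values, and the larger part is simply assumed to come with more parasitic inductance. Real parts should be checked against the manufacturer's impedance plots.

```python
import math

def capacitor_impedance(freq_hz, c_farad, esl_henry, esr_ohm):
    """|Z| of a series R-L-C model of a real capacitor (simplified)."""
    w = 2 * math.pi * freq_hz
    reactance = w * esl_henry - 1.0 / (w * c_farad)
    return math.hypot(esr_ohm, reactance)

def self_resonant_freq(c_farad, esl_henry):
    """Bottom of the 'V': above this frequency the part looks inductive."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_henry * c_farad))

# Hypothetical example parts (assumed values for illustration only)
parts = {
    "100 nF": dict(c_farad=100e-9, esl_henry=1e-9, esr_ohm=0.02),
    "10 uF": dict(c_farad=10e-6, esl_henry=5e-9, esr_ohm=0.05),
}

for name, p in parts.items():
    f0 = self_resonant_freq(p["c_farad"], p["esl_henry"])
    print(f"{name}: self-resonance at about {f0 / 1e6:.2f} MHz")
    for f in (1e5, 1e6, 1e7, 1e8):
        z = capacitor_impedance(f, **p)
        print(f"  {f / 1e6:7.1f} MHz -> |Z| = {z:.3f} ohm")
```

With these assumed values, the smaller capacitor has its impedance minimum at a much higher frequency than the larger one, which is the "left half vs. right half" argument in numbers.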