  1. Non-clocked synchronous circuits

    Thanks Dan, I read that post sometime last week. It did contain some ideas I hadn't considered before, especially the divide by pi example. And, after reading the various links you pointed me to, I think I'm getting a handle on what can go wrong with fabric-generated logic clocks.
  2. Non-clocked synchronous circuits

    @Piasa Thank you for the response. Makes sense. @D@n Thanks again, interesting articles. I love your blog... I'm learning a lot.
  3. Non-clocked synchronous circuits

    Thanks for the reply Dan. I've heard the comments about edges on non-clocks, but I've also seen plenty of example code where this is done on derived clocks. For example, the sample code for the Digilent PmodCLP (I reference here: ) includes a count-based microsecond clock rather than a 1MHz clock generated from some IP core. The code then goes on to use the posedge of this clock to drive a fair bit of the state machine logic. Lots of example code seems to do this, which makes me wonder whether there are times it is okay, or whether I should just not read too much into example code. I've not yet found an article which clearly outlines the reasons why using posedge on something other than a real clock is bad beyond a sentence or two of description. Most of it seemed to relate to the propagation of the clock signal and the lack of dedicated routing for any derived clocks. I like to understand the reasons behind the common wisdom whenever possible. Any recommended reads on this topic?
  4. Non-clocked synchronous circuits

    I was reading Dan Gisselquist's blog (aka @D@n), in particular this specific part of the post that goes into some detail about the ALU for his ZipCPU:

        always @(posedge i_clk)
        if (i_ce)
        begin
            c <= 1'b0;
            casez(i_op)
            4'b0000:{c,o_c } <= {1'b0,i_a}-{1'b0,i_b};// CMP/SUB
            4'b0001: o_c <= i_a & i_b; // BTST/And
            4'b0010:{c,o_c } <= i_a + i_b; // Add
            4'b0011: o_c <= i_a | i_b; // Or
            4'b0100: o_c <= i_a ^ i_b; // Xor
            4'b0101:{o_c,c } <= w_lsr_result[32:0]; // LSR
            4'b0110:{c,o_c } <= w_lsl_result[32:0]; // LSL
            4'b0111:{o_c,c } <= w_asr_result[32:0]; // ASR
            4'b1000: o_c <= w_brev_result; // BREV
            4'b1001: o_c <= { i_a[31:16], i_b[15:0] }; // LODILO
            4'b1010: o_c <= mpy_result[63:32]; // MPYHU
            4'b1011: o_c <= mpy_result[63:32]; // MPYHS
            4'b1100: o_c <= mpy_result[31:0]; // MPY
            default: o_c <= i_b; // MOV, LDI
            endcase
        end else // if (mpydone)
            // set the carry based upon a multiply result
            o_c <= (mpyhi)?mpy_result[63:32]:mpy_result[31:0];

    Dan makes the comment:

    "Each of the blocks in this figure takes up logic when implemented within hardware. As a result, even if i_op requests that the two values be subtracted, all of the other operations (addition, and, or, xor, etc.) will still be calculated. These other results, though, are just ignored. Thus, on the final clock of the ALU, all of the operations have been calculated, but only the result of the selected operation is stored into the output register." (bold emphasis mine)

    I found this a very interesting comment. Dan shows how there is effectively a multiplexer, driven by the opcode, on the output of each of the logic chains. It strikes me that this is quite a waste of power in general, so I wondered how, or even whether, it is possible to do things differently. Using one of the ZipCPU ops as an example, could one reliably implement something like the following?

        always @(posedge i_clk)
        if (i_ce)
        begin
            do_cmp_sub_op <= (i_op == 4'b0000) ? 1'b1 : 1'b0;
            ... rest of the op codes go here ...

        always @(posedge do_cmp_sub_op)
        begin
            {c,o_c } <= {1'b0,i_a}-{1'b0,i_b};
            do_cmp_sub_op <= 1'b0;
        end

    There are many reasons why this is not something you would do in real life in a CPU. Among other downsides, it would add extra logic and at least 2 clocks of additional latency, since the rising edge of the various do_xxxx registers would be one cycle behind, plus you'd need another cycle to turn the clock off so that you could catch the next rising edge. So clearly this isn't something one would do for CPU ops that only take 1 cycle.

    1) What are the various ways to have only some of the gates / logic working in a system while most of it is quiescent and only runs when needed?
    2) Does an if statement behave the same way as the case in this respect, i.e. is the logic for both i_ce == 1 and i_ce == 0 also run, with one result discarded on the input side of a multiplexer?
    3) What are the options and tradeoffs involved in deciding what to use as the triggers for logic?
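    To make the quoted comment concrete, here is a minimal sketch of the same structure at a much smaller scale. It is not Dan's code: the module name tiny_alu, the 8-bit width, the 2-bit opcode, and the w_* names are all assumptions made for illustration. Each candidate result is its own piece of combinational logic that is evaluated every cycle; the case statement only selects which one reaches the register, and i_ce only decides whether the register loads.

    ```verilog
    // Illustrative sketch only; names and widths are hypothetical.
    module tiny_alu (
        input  wire       i_clk,
        input  wire       i_ce,
        input  wire [1:0] i_op,
        input  wire [7:0] i_a,
        input  wire [7:0] i_b,
        output reg  [7:0] o_c
    );
        // Every candidate result exists as hardware and is computed
        // on every cycle, whatever i_op asks for.
        wire [7:0] w_sub = i_a - i_b;
        wire [7:0] w_and = i_a & i_b;
        wire [7:0] w_add = i_a + i_b;
        wire [7:0] w_or  = i_a | i_b;

        // The case statement describes a 4-to-1 multiplexer.
        reg [7:0] w_sel;
        always @(*)
            case (i_op)
            2'b00:   w_sel = w_sub;
            2'b01:   w_sel = w_and;
            2'b10:   w_sel = w_add;
            default: w_sel = w_or;
            endcase

        // Only the selected result is stored, and only when i_ce is high.
        always @(posedge i_clk)
            if (i_ce)
                o_c <= w_sel;
    endmodule
    ```

    Writing the intermediate wires out by hand describes the same hardware as putting the expressions directly inside the case arms; it just makes the unused results visible.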
  5. I just got a PmodCLP and downloaded the Verilog files here: Specifically, the Nexys 3 Verilog Example - ISE 14.2 code. Since I'm using an Artix Arty board, I had to modify the pins used, and the reset is reversed (on the Arty the ck_rst is 1 when pressed), but after that things worked fine. The Arty, like the Nexys 3, has a 100MHz clock. In looking at the code, however, I noticed two errors that make the code quite confusing:

    1) In the first always block:

        // This process counts to 100, and then resets. It is used to divide the clock signal.
        // This makes oneUSClock peak aprox. once every 1microsecond
        always @(posedge CLK) begin
            if(clkCount == 7'b1100100) begin
                clkCount <= 7'b0000000;
                oneUSClk <= ~oneUSClk;
            end
            else begin
                clkCount <= clkCount + 1'b1;
            end
        end

    Note that it is flipping oneUSClk once per 100 clocks at the 100MHz input rate. That results in a full cycle every 200 clocks, not 100. To get a 1MHz clock that has a rising edge every 1μs, you need to flip the clock every 0.5μs, or every 50 clock edges of the 100MHz clock. But replacing the 100 with 50 won't be correct either: you can't get 100 clock edges by counting from 0 to 100, that's 101 edges. And you can't get 50 edges by counting from 0 to 50; the correct code needs to count from 0 to 49 to get 50 edges.

    2) The second issue is that the delays are all based on this oneUSClk, while the delay counts are expressed in terms of the original 100MHz clock ticks. You can see that in this combinatorial logic:

        // Determines when count has gotten to the right number, depending on the state.
        assign delayOK = (
            ((stCur == stPowerOn_Delay) && (count == 21'b111101000010010000000)) ||        // 2000000 -> 20 ms
            ((stCur == stFunctionSet_Delay) && (count == 21'b000000000111110100000)) ||    // 4000 -> 40 us
            ((stCur == stDisplayCtrlSet_Delay) && (count == 21'b000000000111110100000)) || // 4000 -> 40 us
            ((stCur == stDisplayClear_Delay) && (count == 21'b000100111000100000000)) ||   // 160000 -> 1.6 ms
            ((stCur == stCharDelay) && (count == 21'b000111111011110100000))               // 260000 -> 2.6 ms - Max Delay for character writes and shifts
        ) ? 1'b1 : 1'b0;

    The delay for the stPowerOn_Delay is 2,000,000. That's correct for a 100MHz clock. For a 1MHz clock with a rising edge every 1μs, you only want a count of 20,000, not 2,000,000. All of the times are off by this same factor of 100. You can see in the block that increments the count variable that it is incremented on each posedge of oneUSClk, not the 100MHz CLK:

        // This process increments the count variable unless delayOK = 1.
        always @(posedge oneUSClk) begin
            if(delayOK == 1'b1) begin
                count <= 21'b000000000000000000000;
            end
            else begin
                count <= count + 1'b1;
            end
        end

    So, at one count per microsecond, the 20ms delay is actually a 2 second delay, 1.6ms is 160ms, etc.

    Finally, if you fix this, you will notice that things don't quite work anymore. The LCD display is now flashing and barely legible because it is now executing the final command (shift-left) every 2.6ms, which is far too fast. So you'll need to alter the delay for the final state stCharDelay, once the write is done, to a more appropriate number. I suggest 250,000μs, which means 4 characters scroll by per second, which is very legible. This is most easily done by adding an additional condition check for the final state delay like:

        // Determines when count has gotten to the right number, depending on the state.
        assign delayOK = (
            ((stCur == stPowerOn_Delay) && (count == 21'd020000)) ||          // -> 20 ms
            ((stCur == stFunctionSet_Delay) && (count == 21'd000040)) ||      // -> 40 us
            ((stCur == stDisplayCtrlSet_Delay) && (count == 21'd000040)) ||   // -> 40 us
            ((stCur == stDisplayClear_Delay) && (count == 21'd001600)) ||     // -> 1.6 ms
            ((stCur == stCharDelay && !writeDone) && (count == 21'd002600)) || // -> 2.6 ms - Max Delay for character writes and shifts
            ((stCur == stCharDelay && writeDone) && (count == 21'd250000))    // -> 250 ms - 1/4 second delay for shifts
        ) ? 1'b1 : 1'b0;

    where the delay before all the characters are written is 2.6ms and it jumps to 250ms once they are all written. I have attached a file which reflects the corrections I outlined above. PmodCLP.v
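    For readers without the attachment, the divider fix described in this post can be sketched as below. This is not the contents of the attached PmodCLP.v, just a minimal illustration that assumes a 100MHz CLK. The module name one_us_divider and the 6-bit counter width are my own choices, and the initial block assumes an FPGA flow that honors register initial values (add a reset if yours does not).

    ```verilog
    // Illustrative sketch only; module name and counter width are assumptions.
    // Divides a 100 MHz clock down to a 1 MHz square wave.
    module one_us_divider (
        input  wire CLK,       // 100 MHz input clock
        output reg  oneUSClk   // toggles every 50 CLK edges: 100-edge (1 us) period
    );
        reg [5:0] clkCount;    // 6 bits is enough to hold 0..49

        initial begin
            clkCount = 6'd0;
            oneUSClk = 1'b0;
        end

        always @(posedge CLK) begin
            if (clkCount == 6'd49) begin   // counting 0..49 spans 50 edges
                clkCount <= 6'd0;
                oneUSClk <= ~oneUSClk;
            end
            else begin
                clkCount <= clkCount + 1'b1;
            end
        end
    endmodule
    ```

    With this divider, oneUSClk has one rising edge per microsecond, so the decimal counts in the corrected delayOK expression read directly as microseconds.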