Piasa last won the day on May 29 2018

Piasa had the most liked content!

About Piasa

  • Rank
    Frequent Visitor


  1. I guess my statement isn't accurate. Mealy/Moore/Medvedev could all be written with two processes. I guess my real statement is that the single clocked process FSM is not what people want to write, logically. But the languages make this style common.
  2. How should the FSM handle 8 7 0 1 1 8 8 8 7 0 1 7 8 7 0 0 0 1 7? 87011 is not 87017. 8887017 contains 87017. But if 8701 was seen before, the 87017 seen occurs after 8887. 8700017 contains 870 and 17, but has two extra 0's. I don't think any FSM model allows async inputs that modify state.

    I've always found that Mealy is the FSM that devs want to write, even when they write Moore/Medvedev FSMs. Mealy provides next-state/next-value logic, which often avoids code duplication. But it uses more lines of code, so it is rarely used.

    For this context, VHDL and Verilog are basically the same. This problem doesn't make use of any Verilog/VHDL specific features and both languages handle the general logic design in very similar ways.

    In terms of using the CE on DFFs vs the CE on BUFGCE, just make sure you constrain the CE signal to the BUFGCE. The CE signal does have a setup/hold similar to a FF and shouldn't be changing close to a clock edge that affects logic. Because designs can have thousands of different CE signals for DFF control sets, but normally only 32 BUFGCE's, the CE feature of the BUFGCE is less used.
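
    As a rough illustration of the two-process Mealy style, here is a minimal sketch of a detector for 87017. It picks one possible answer to the questions above and is not the original poster's design: it assumes one digit per clock qualified by a `valid` strobe, restarts on any mismatch, and treats a mismatching 8 as the start of a new attempt. The entity name, port names, and state names are all made up for the example.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical detector for the digit sequence 8 7 0 1 7.
    entity seq_detect is
      port (
        clk   : in  std_logic;
        rst   : in  std_logic;             -- synchronous reset (assumed)
        valid : in  std_logic;             -- digit is meaningful this cycle
        digit : in  unsigned(3 downto 0);
        match : out std_logic              -- Mealy output: high on the final 7
      );
    end entity;

    architecture rtl of seq_detect is
      type state_t is (IDLE, S8, S87, S870, S8701);
      signal state, next_state : state_t;
    begin
      -- Combinational process: next-state and output logic.
      comb : process (state, valid, digit)
      begin
        next_state <= state;
        match      <= '0';
        if valid = '1' then
          -- Default on mismatch: an 8 restarts the sequence, anything else
          -- returns to IDLE. The case below overrides this on a good digit.
          if digit = 8 then
            next_state <= S8;
          else
            next_state <= IDLE;
          end if;
          case state is
            when IDLE  => null;
            when S8    => if digit = 7 then next_state <= S87;   end if;
            when S87   => if digit = 0 then next_state <= S870;  end if;
            when S870  => if digit = 1 then next_state <= S8701; end if;
            when S8701 => if digit = 7 then match <= '1';        end if;
          end case;
        end if;
      end process;

      -- Clocked process: state register only.
      reg : process (clk)
      begin
        if rising_edge(clk) then
          if rst = '1' then
            state <= IDLE;
          else
            state <= next_state;
          end if;
        end if;
      end process;
    end architecture;
    ```

    With these assumptions, 8887017 produces a match on its last digit, while 87011 and 8700017 do not. A different reading of the requirements would change the mismatch handling.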
  3. Not sure what you are trying to do. You have 1Hz going into "designated number", which connects to a 4 bit counter with a comment of "moore machine one cycle". The frequency divider has three outputs, of which two go to a decoder and a mux for reasons. The states also have names s0 to s5. The FSM seems to be "ignore input until a zero is seen, otherwise advance state". I don't see a relation to the diagram. I don't understand either the block diagram or the problem description or the proposed code. You need to specify the problem in a more understandable format or post this to a forum that has better mindreaders.
  4. Piasa


    Verilog has generate as well, but it is a bit different in the nested generate case. Also, make sure that the area/time trade-off makes sense. The goal is to describe a system that solves your problem. If you need to add 1000000 numbers, maybe you can add 100 at a time and solve the problem over several cycles.
  5. For FPGAs, take the input and make a bit-reversed version. This can be done with a function in VHDL:

        -- reverses a std_logic_vector and returns a value with the same range as input.
        function reverse(x : std_logic_vector) return std_logic_vector is
          variable xnml   : std_logic_vector(x'length-1 downto 0) := x;
          variable rev    : std_logic_vector(0 to x'length-1);
          variable result : std_logic_vector(x'range);
        begin
          for i in xnml'range loop
            rev(i) := xnml(i);
          end loop;
          result := rev;
          return result;
        end function;

        -- find leftmost 1 and return a 1 hot version
        function leftmost(x : std_logic_vector) return std_logic_vector is
          variable xnml : std_logic_vector(x'length-1 downto 0) := x;
          variable rev  : signed(x'length-1 downto 0);
        begin
          rev := signed(reverse(xnml));
          -- "and" is the bitwise operator; "&" would concatenate in VHDL
          return reverse(std_logic_vector(rev and (-rev)));
        end function;

    This requires the unary "-" function, which can be imported from std_logic_signed, or obtained by having a signed vector (as above, via numeric_std). If you want the index of the leftmost 1, that would be a priority encoder.
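
    For the index case mentioned at the end, a priority encoder can be sketched as a simple loop. This is only an illustration: the package name `prio_pkg` and function name `leftmost_index` are made up, and returning 0 when no bit is set is an arbitrary choice that a caller would have to pair with a separate "found" check.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    package prio_pkg is
      -- Hypothetical helper: index of the leftmost '1', counted from bit 0
      -- at the right of the normalized vector.
      function leftmost_index(x : std_logic_vector) return natural;
    end package;

    package body prio_pkg is
      function leftmost_index(x : std_logic_vector) return natural is
        -- Normalize to (N-1 downto 0) so the result does not depend on
        -- the range of the caller's vector.
        variable xnml : std_logic_vector(x'length-1 downto 0) := x;
      begin
        -- xnml'range runs from the highest index down to 0, so the first
        -- hit is the leftmost set bit.
        for i in xnml'range loop
          if xnml(i) = '1' then
            return i;
          end if;
        end loop;
        return 0;  -- no bit set; indistinguishable from "bit 0 set"
      end function;
    end package body;
    ```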
  6. This is a fun and short post on how addition/subtraction can be used for logic and not just numbers.

    The example I use is "x & -x". This expression takes a vector, finds the rightmost 1, and sets all bits to its left to 0. The bits to its right are already 0. If the input is 0, the expression returns an all-0 vector. Thus the expression gives either a 1-hot vector of the rightmost 1, or 0 if no bit is set to 1.

    There are a lot of interesting expressions, but some are imperfect for use. For example, "x | (x-1)" will find the rightmost 1 and set all bits to its right to 1. But if the input is 0 and a 1 isn't found, the result is all 1's.

    I've provided a short table of some of the expressions, what they do, and what they return in the all-0 or all-1 case. I think it is accurate but I haven't fully checked all of them. These are rarely used, but are useful for inductive logic.

    I've found this to be a useful interview question, as it allows someone to show they can take an expression like "x & (x-1)" and describe what it does from a practical perspective. It also allows them to describe why the implementation might be good or bad vs other coding choices. The use of the carry chain means the logic will hotspot more than versions that don't use the carry chain. But if the logic is very local this isn't an issue.

    +-----------------------------------------+------------+----------+
    | Task                                    | expression | not found|
    +-----------------------------------------+------------+----------+
    | find the rightmost 1:                   |            |          |
    |   leave unchanged (1)                   |            |          |
    |     leave bits on left unchanged        |            |          |
    |       leave bits on right unchanged (0) | x          | 0 -> 0   |
    |       set bits on right to 1:           | x | (x-1)  | 0 -> -1  |
    |     set bits on left to 0:              |            |          |
    |       leave bits on right unchanged (0) | x & -x     | 0 -> 0   |
    |       set bits on right to 1:           | x ^ (x-1)  | 0 -> -1  |
    |     set bits on left to 1:              |            |          |
    |       leave bits on right unchanged (0) | x | -x     | 0 -> 0   |
    |       set bits on right to 1:           | -1         | 0 -> -1  |
    +-----------------------------------------+------------+----------+
    | find the rightmost 1:                   |            |          |
    |   clear (0)                             |            |          |
    |     leave bits on left unchanged        |            |          |
    |       leave bits on right unchanged (0) | x & (x-1)  | 0 -> 0   |
    |       set bits on right to 1:           | x-1        | 0 -> -1  |
    |     set bits on left to 0:              |            |          |
    |       leave bits on right unchanged (0) | 0          | 0 -> 0   |
    |       set bits on right to 1:           | ~x & (x-1) | 0 -> -1  |
    |     set bits on left to 1:              |            |          |
    |       leave bits on right unchanged (0) | x ^ -x     | 0 -> 0   |
    |       set bits on right to 1:           | ~x | (x-1) | 0 -> -1  |
    +-----------------------------------------+------------+----------+
    | Task                                    | expression | not found|
    +-----------------------------------------+------------+----------+
    | find the rightmost 0:                   |            |          |
    |   leave unchanged (0)                   |            |          |
    |     leave bits on left unchanged        |            |          |
    |       leave bits on right unchanged (1) | x          | -1 -> -1 |
    |       set bits on right to 0:           | x & (x+1)  | -1 -> 0  |
    |     set bits on left to 0:              |            |          |
    |       leave bits on right unchanged (1) | x & (~x-1) | -1 -> -1 |
    |       set bits on right to 0:           | 0          | -1 -> 0  |
    |     set bits on left to 1:              |            |          |
    |       leave bits on right unchanged (1) | x | (~x-1) | -1 -> -1 |
    |       set bits on right to 0:           | ~x ^ (x+1) | -1 -> 0  |
    +-----------------------------------------+------------+----------+
    | find the rightmost 0:                   |            |          |
    |   set (1)                               |            |          |
    |     leave bits on left unchanged        |            |          |
    |       leave bits on right unchanged (1) | x | (x+1)  | -1 -> -1 |
    |       set bits on right to 0:           | x+1        | -1 -> 0  |
    |     set bits on left to 0:              |            |          |
    |       leave bits on right unchanged (1) | x ^ (x+1)  | -1 -> -1 |
    |       set bits on right to 0:           | ~x & (x+1) | -1 -> 0  |
    |     set bits on left to 1:              |            |          |
    |       leave bits on right unchanged (1) | -1         | -1 -> -1 |
    |       set bits on right to 0:           | ~x | (x+1) | -1 -> 0  |
    +-----------------------------------------+------------+----------+

    For cases where the all-0 or all-1 input results in an undesired value, you might need to either do a compare and mux, or extend the vector by 1 bit and use the msb for a mux. Hopefully some of these expressions are useful or educational.
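
    The "extend by 1 bit" idea can be sketched for "x | (x-1)" as follows. This is one possible reading of that suggestion, using numeric_std. The entity name `fill_right`, the generic `W`, and the `found` port are made up for the example. When x is 0, the subtraction wraps and the extra msb becomes 1, so it doubles as a "not found" flag.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical wrapper: set all bits to the right of the rightmost 1,
    -- but return all 0's (and found = '0') when the input has no 1.
    entity fill_right is
      generic (W : positive := 8);
      port (
        x     : in  std_logic_vector(W-1 downto 0);
        y     : out std_logic_vector(W-1 downto 0);
        found : out std_logic
      );
    end entity;

    architecture rtl of fill_right is
      signal xe, ye : unsigned(W downto 0);  -- one bit wider than x
    begin
      xe <= resize(unsigned(x), W+1);        -- zero-extend by one bit
      ye <= xe or (xe - 1);                  -- x | (x-1) on the wide vector

      -- For nonzero x the extra msb stays 0; for x = 0 the borrow
      -- ripples all the way up and sets it.
      found <= not ye(W);
      y     <= std_logic_vector(ye(W-1 downto 0)) when ye(W) = '0'
               else (others => '0');
    end architecture;
    ```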
  7. You want to use an MMCM or clocking IP core of some form in order to get a 104MHz clock that can be used to get 1000MSPS vs 25/26ths that rate.
  8. You can also import just "+" and "-" from std_logic_signed as well as the conversion functions from std_logic_arith. This way you still are required to specify signed/unsigned for "<", "*", etc... Importing "-" from std_logic_signed will give you the unary "-", which can be used for logical induction in expressions like (-x) and (x).
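
    A minimal sketch of the selective import, assuming the Synopsys std_logic_signed package is compiled into library ieee, as most vendor tools do (it is not part of the IEEE standard). The entity name `rightmost_one` and the fixed 8-bit width are made up for the example.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    -- Import only the "+" and "-" overloads (binary and unary) rather than
    -- the whole package, so "<", "*", etc. still require explicit
    -- signed/unsigned types.
    use ieee.std_logic_signed."+";
    use ieee.std_logic_signed."-";

    entity rightmost_one is
      port (
        x : in  std_logic_vector(7 downto 0);
        y : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of rightmost_one is
    begin
      -- Unary "-" comes from std_logic_signed, "and" from std_logic_1164.
      -- Result is a 1-hot vector of the rightmost 1, or all 0's.
      y <= (-x) and x;
    end architecture;
    ```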
  9. Piasa

    FIFO CDC and Gray codes

    You might be re-inventing the tree. j/k. FPGAs have async FIFOs as hard IP. They also have vendor supplied IP generators. For Xilinx, the "primitives guide" has more info. There is also coregen. Someone also mentioned there is a new way to create generic FIFOs that isn't the very-limited unimacros.

    A properly constrained design will ensure that the skew between bits in this bus will be less than one other-clock cycle. In newer Vivado, there is a constraint for this specifically. In older versions of Vivado and all of ISE, the bus was simply constrained to arrive to the other clock domain within one other-clock cycle. This is a little more strict, but is safe. The constraints guide has more info on what your options are based on tool version.

    But again, for a first design I would avoid manually creating a dual-clock FIFO unless it has some specific use that can't be met using pre-verified options. As @zygot mentions, there can be some unexpected latency differences for various flags. The Xilinx memory resources guide can give you some information for common use cases. For coregen'd FIFOs, the relevant user guide (product guide?) can also be useful.

    I also suggest creating overflow/underflow error flags and having some way to access them. These can be useful in simulation and can be useful to determine issues that occur in difficult to simulate cases.
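
    The overflow/underflow flags could look something like the sketch below. It assumes the FIFO exposes `full` in the write-clock domain and `empty` in the read-clock domain. The entity name, port names, and the clear inputs are made up for the example. Each flag is sticky and stays in its own clock domain.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical sticky error flags to sit next to a dual-clock FIFO.
    entity fifo_err_flags is
      port (
        wr_clk    : in  std_logic;
        wr_en     : in  std_logic;
        full      : in  std_logic;   -- FIFO full flag (wr_clk domain)
        wr_clear  : in  std_logic;   -- clears overflow (wr_clk domain)
        overflow  : out std_logic;
        rd_clk    : in  std_logic;
        rd_en     : in  std_logic;
        empty     : in  std_logic;   -- FIFO empty flag (rd_clk domain)
        rd_clear  : in  std_logic;   -- clears underflow (rd_clk domain)
        underflow : out std_logic
      );
    end entity;

    architecture rtl of fifo_err_flags is
      signal ovf, unf : std_logic := '0';
    begin
      -- Write attempted while full: latch until cleared.
      process (wr_clk)
      begin
        if rising_edge(wr_clk) then
          if wr_clear = '1' then
            ovf <= '0';
          elsif wr_en = '1' and full = '1' then
            ovf <= '1';
          end if;
        end if;
      end process;

      -- Read attempted while empty: latch until cleared.
      process (rd_clk)
      begin
        if rising_edge(rd_clk) then
          if rd_clear = '1' then
            unf <= '0';
          elsif rd_en = '1' and empty = '1' then
            unf <= '1';
          end if;
        end if;
      end process;

      overflow  <= ovf;
      underflow <= unf;
    end architecture;
    ```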
  10. Probably not. You just need to be aware that the register outputs have some delay from the logic that is in the register file. Registering outputs is "generally good" design. However, it isn't always needed or possible. It is up to you to decide the logical impact in this case and then compare against any performance benefits.
  11. "Last signal assignment wins". I'm not sure if you are pointing out that I missed a word or if you are questioning the LRM.
  12. There are several common things that should result in this. Linked-lists are one example, but there are probably a dozen others. Of course that is off topic from a code review of a register file for a cpu that doesn't currently have a C/C++ compiler.
  13. Just realized I didn't respond to this yesterday. By infrastructure, I mean clocks and general resets that are used at startup or otherwise not used during normal operating conditions. The distributed memory approach was presented a few posts up. It doesn't make sense for your goal of an easy to read, general implementation. In many designs, it is preferred to have output registers. This does add the 1 register delay. In some cases the unregistered outputs are unavoidable. When possible, having output registers is nice because the longest path won't be half in one file and half in another. For re-use, remember that you can't control how other people use your module. "last signal wins" is a reliable behavior. That said, it can be abused. There are structured uses where it can be very useful. However, it creates a bottom-to-top priority structure. For this reason, it should only be used in a manner that is unlikely to confuse a reader.
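
    A small sketch of the "last signal assignment wins" behavior and the bottom-to-top priority it creates. The entity and port names are made up for the example. When `load` and `clear` are both high, the later assignment in the process overrides the earlier one.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    -- Hypothetical register showing bottom-to-top priority.
    entity last_wins is
      port (
        clk   : in  std_logic;
        load  : in  std_logic;
        clear : in  std_logic;
        d     : in  std_logic_vector(7 downto 0);
        q     : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of last_wins is
      signal q_r : std_logic_vector(7 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          -- Lowest priority: written first.
          if load = '1' then
            q_r <= d;
          end if;
          -- Highest priority: written last, so it overrides the load
          -- above when both are asserted in the same cycle.
          if clear = '1' then
            q_r <= (others => '0');
          end if;
        end if;
      end process;

      q <= q_r;
    end architecture;
    ```

    A reader scanning from the top sees `load` first, even though `clear` has priority, which is the kind of confusion to watch for.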
  14. BRAM does, but then you can't read in the same cycle. A Xilinx targeted register file could use DMEM. This requires four copies of each register for the 4-6 read/cycle case, or two copies for the 1-3 read/cycle case. The DMEM have a 3-read, 1-write config. To get 4-6 ports means two copies. To get the two writes again means two copies and the addition of a small tag ram which can be implemented using registers. It is debatable if this is that much better as both should be small for modern FPGAs. It removes the input muxes for the priority logic as portA now only writes to two DMEM and portB only writes to the others. It also removes the output muxes as these are built into the DMEM. The clock to out is higher than registers, but I'm not sure if it is higher than register + LUT6. The complexity is that the OS either needs to clear the registers at start, or accept the bootup values could be random. Also, the bits per slice is lower, but given the lack of extra muxes this is probably not a concern. There are also benefits if the design can use either 32 registers or can use two sets of registers as the DMEM is a 32b config. This can be used for fast interrupt context swaps or for barrel processors. In terms of coherency, that is based on if the CPU can ensure it never has a write-write conflict and also can avoid read-before-write where write could now be from two ports. In one design, there was a custom written 32b adder that was instantiated in the same file as a 40+ bit adder that ran at the same clock. (neither adder was in the top 100 nets and the design met timing). One of these took hours to write, the other took seconds. I also notice some people add lots of pipeline stages for simple calculations. This can be fine, but each pipeline stage increases the chance that a future modification will have a pipeline error. 
Because this might only show up in rare cases, I take steps to ensure the pipeline naming scheme and intent is clear w.r.t. cycle vs sample delays. This is especially true when simplified assumptions about the pipeline are no longer true when the module is ported to a new application. This commentary is probably best suited for another thread as it does not relate to the topic of a CPU register file.
  15. I think in this case the priority logic is hard to remove in a way that isn't worse. Some (or all?) synthesis tools will ignore 'x' and '-' and instead replace them with '0'. This can add some extra logic to do something you specifically didn't care about. Also, I agree that small sandbox designs can be really fun and informative. IMO, devs tend to underestimate the FPGA in some ways which results in excessive pre-optimization.