• 0

Feedback on a register file design?


Solved by Piasa

Question

Hey everyone,

I've done the initial design of a register file (16x 32-bit registers, two write ports, four read ports) in VHDL as part of a larger project, but seeing as I am a relative newcomer to HDLs, I was hoping to get some feedback on my design, any errors I may have made, or any improvements I might want to make.

Here is the VHDL:

-- Register file

-- Two write ports, four read ports.
-- For performance reasons, this register file does not check for the same 
-- register being written on both write ports in the same cycle. CPU control
-- circuitry is responsible for preventing this condition from happening.

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

use work.cpu1_globals_1.all;
use work.func_pkg.all;

entity registerFile is
    port
    (
        clk : in std_logic;
        rst : in std_logic;
        writeEnableA : in std_logic;
        writeEnableB : in std_logic;
        readSelA, readSelB, readSelC, readSelD, writeSelA, writeSelB : in std_logic_vector(3 downto 0);
        data_inA, data_inB : in std_logic_vector(DATA_WIDTH - 1 downto 0);
        data_outA, data_outB, data_outC, data_outD : out std_logic_vector(DATA_WIDTH - 1 downto 0)
    );
end registerFile;

architecture behavioral of registerFile is

    type regArray is array (0 to 15) of std_logic_vector(DATA_WIDTH - 1 downto 0);
    signal registers : regArray := (others => (others => '0'));

begin

    data_outA <= registers(to_integer(unsigned(readSelA)));
    data_outB <= registers(to_integer(unsigned(readSelB)));
    data_outC <= registers(to_integer(unsigned(readSelC)));
    data_outD <= registers(to_integer(unsigned(readSelD)));

    registerFile_main : process(clk)
    
    begin
    
        if(rising_edge(clk)) then
        
            if(rst = '1') then
        	
                registers <= (others => (others => '0'));
                
            else
                
                if(writeEnableA = '1') then
                    registers(to_integer(unsigned(writeSelA))) <= data_inA;
                end if;
            	
                if(writeEnableB = '1') then
                    registers(to_integer(unsigned(writeSelB))) <= data_inB;
                end if;
                
            end if;
            
        end if;
        
    end process;
    
end behavioral;

This design is intended for use on FPGAs, hence the use of default values for the registers.

I appreciate any feedback you might have!

Thanks,

 - Curt

  • 0
  • Solution

It looks fine.  I would normally use a different port order and put each port on its own line for easy copy/paste.  I prefer to group ports by interface -- readA, doutA, readB, doutB, etc.  I also place interfaces in an output, input, config, infrastructure order.  In industry the order of infrastructure, input, output, config is more common.
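As an illustration of that layout, here is a sketch of the same entity with one port per line, grouped by interface, in the infrastructure, input, output order. It assumes DATA_WIDTH becomes a generic (the original takes it from a package), and the name registerFileOrdered is made up for this example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity registerFileOrdered is
    generic
    (
        -- Assumption: a generic replaces the package constant from the original post.
        DATA_WIDTH : positive := 32
    );
    port
    (
        -- Infrastructure
        clk          : in  std_logic;
        rst          : in  std_logic;
        -- Write interface A
        writeEnableA : in  std_logic;
        writeSelA    : in  std_logic_vector(3 downto 0);
        data_inA     : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        -- Write interface B
        writeEnableB : in  std_logic;
        writeSelB    : in  std_logic_vector(3 downto 0);
        data_inB     : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        -- Read interface A
        readSelA     : in  std_logic_vector(3 downto 0);
        data_outA    : out std_logic_vector(DATA_WIDTH - 1 downto 0);
        -- Read interface B
        readSelB     : in  std_logic_vector(3 downto 0);
        data_outB    : out std_logic_vector(DATA_WIDTH - 1 downto 0);
        -- Read interface C
        readSelC     : in  std_logic_vector(3 downto 0);
        data_outC    : out std_logic_vector(DATA_WIDTH - 1 downto 0);
        -- Read interface D
        readSelD     : in  std_logic_vector(3 downto 0);
        data_outD    : out std_logic_vector(DATA_WIDTH - 1 downto 0)
    );
end registerFileOrdered;
```

With one port per line, an instantiation's port map can be produced by copying the block and editing each line in place.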

For implementation, this probably infers registers.  It is possible to construct this with distributed memory, although it is more complex.  It isn't clear to me if the added complexity results in a better design at this size.

The design actually will have priority logic for data_inB if the same address is used.  This is because the last reached assignment will be used. 

Also, the logic has unregistered outputs.  Normally this isn't something that is desired.  This means critical timing paths could be due to logic in multiple modules.  Not sure if there is anything that can be done here though.

You can also add asserts for the "write to same address" case.  This can be helpful in simulation.
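A minimal sketch of such an assert, written as a small stand-alone checker so it compiles on its own. The entity name writeCollisionCheck is hypothetical; the port names mirror the ones in the original post, and the same process could equally be placed directly inside the registerFile architecture.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical simulation-only checker for the "write to same address" case.
entity writeCollisionCheck is
    port
    (
        clk          : in std_logic;
        rst          : in std_logic;
        writeEnableA : in std_logic;
        writeEnableB : in std_logic;
        writeSelA    : in std_logic_vector(3 downto 0);
        writeSelB    : in std_logic_vector(3 downto 0)
    );
end writeCollisionCheck;

architecture sim of writeCollisionCheck is
begin

    collision_check : process(clk)
    begin
        if rising_edge(clk) then
            -- Only meaningful when both ports actually write in this cycle.
            if rst = '0' and writeEnableA = '1' and writeEnableB = '1' then
                assert writeSelA /= writeSelB
                    report "registerFile: both write ports target the same register"
                    severity error;
            end if;
        end if;
    end process;

end sim;
```

The assert only reports in simulation; it does not change what the register file does when the condition occurs.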

  • 1

@zygot,

Oh, I'm around, but you guys have given me a lot of reading material to go through.  Well, that and I haven't been waiting on the synthesizer as much so I haven't been hitting reload on the Digilent forum page as often.

@CurtP,

What @zygot is trying to point out is that I've built my own CPU, the ZipCPU, as a similar labor of love.  It's not a forth machine, but a basic sixteen 32-bit register design, with two register sets of that size.  It's also small enough to fit on Digilent's CMod S6 while running a small O/S.  I'd love to offer you an example register file module, however my own register "file" never managed to get separated out from the rest of the CPU such as you are trying to do.  I struggled to find a clean way to do so, and so didn't.  If you are curious, you can search the main CPU source and look for "regset" and see how I handled it.

In particular, the ZipCPU register file accepts only one write to the register file per clock--not two.  I was convinced that two writes per clock would leave the CPU vulnerable to coherency problems--assuming the block RAMs even supported it.  This register set supports one write and three reads per clock.  Two of those reads are for an instruction; the third supports the debug port.  (You are thinking about how to debug your CPU already, aren't you?)

I've also written several instructional blogs on this and similar topics.  These cover my view that you should start building a CPU by first building its peripherals, then a debug port to access and test the peripherals, before starting on the CPU itself.  Further blog articles discuss how to build a debugging port into the CPU, and how to debug the CPU from that port when using a simulator as well as when online.  I've discussed pipelining strategies, and presented how the ZipCPU pipeline strategy works.  More recently, I've been working with formal methods.  I've therefore presented a demonstration of how formal methods can be used to verify that a bus component works as designed, and then offered a simple prefetch as an example.  I'm hoping to post again regarding how to build an instruction prefetch and cache, as well as how to formally verify that such a module works, but I haven't managed to clean up my code enough to present it, even though I've already presented why such a proof would be so valuable.

While I didn't use formal methods to build the CPU initially, I've been finding more bugs using formal methods than I had otherwise, so you might say that I've become a believer.

As a result, I'm right now in the process of formally verifying as much of the CPU's modules as I can.  I've managed to formally verify three separate prefetch modules, including the one with a cache, the memory access components, the instruction decoder.  I've also managed to formally verify several CPU related peripheral components, such as the (yet to be integrated) MMU, counters, timers, an interrupt controller, bus arbiters, bus delay components and more.  This has been my current focus with the CPU.  Once I finish it, I'm hoping to write about how to use the ZipCPU in case others are interested (and I know they are).

I know @zygot dislikes my blog, but you might find a lot of useful information available there to describe the things you've discussed above.

Dan

  • 1

@CurtP,

Simple pipelines aren't.  Indeed, debugging the pipeline with all of its corner cases has been a challenge for me and I just wanted to build the simplest pipeline I could.  You might wish to start planning for this ahead of time, since I was perpetually surprised by little nuances I wasn't expecting.  I mean, seriously, who would ever load a register from a value pointed to by the same register?  "LOD (R0),R0" ... it doesn't make sense, why would you do that?  Well, GCC created code that did that, which my CPU then needed to accommodate.

If you are interested in register renaming and/or out of order execution and stuff ... think now, before you start, about how you wish to represent the state information from within your CPU as you debug it.  This will be important to you.  Without a good way to view and inspect the problem, you won't be able to move forward to working code.

Will you be supporting unaligned instructions?  Classical RISC ISAs don't, but it's something to consider.

When I was designing my own instruction set, the requirement of only writing one register to the register set at a time prevented me from implementing such instructions as push/pop or iret.  In hindsight, GCC handled the missing push/pop so well you'd hardly know they are missing.  Indeed, the CPU is probably faster as a result.

Oh, I should mention regarding flags ... GCC (or any C compiler for that matter) will want the ability to compare and branch off of any signed or unsigned comparison.  That's =, !=, <, >, <=, and >=.  In other words, you will need to support (somehow) 11 conditions.  The ZipCPU sort of cheats and supports only 7 of these, but it's something to remember.  Also, it can be a hassle to get the sign bit and overflow bit of the flags right.  Don't forget to adjust the sign bit to keep it correct in case of overflow, or your extreme comparisons won't work.
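To make the sign/overflow point concrete, here is a small sketch of how compare flags can be derived from a subtraction. It is not taken from the ZipCPU; the entity name compareFlags and its ports are invented for illustration. The key line is the last one: signed "less than" is the sign bit XOR the overflow bit, not the sign bit alone.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical flag generator for the comparison a ? b, computed as a - b.
entity compareFlags is
    generic
    (
        DATA_WIDTH : positive := 32
    );
    port
    (
        a, b        : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        eq          : out std_logic;  -- a = b
        lt_unsigned : out std_logic;  -- a < b, operands read as unsigned
        lt_signed   : out std_logic   -- a < b, operands read as two's complement
    );
end compareFlags;

architecture rtl of compareFlags is
    -- One extra bit so the borrow out of the subtraction is visible.
    signal diff       : unsigned(DATA_WIDTH downto 0);
    signal z, n, c, v : std_logic;
begin

    diff <= resize(unsigned(a), DATA_WIDTH + 1) - resize(unsigned(b), DATA_WIDTH + 1);

    z <= '1' when diff(DATA_WIDTH - 1 downto 0) = 0 else '0';  -- zero
    n <= diff(DATA_WIDTH - 1);                                  -- sign of the result
    c <= diff(DATA_WIDTH);                                      -- borrow
    -- Signed overflow: the operands have different signs and the result's
    -- sign differs from the sign of a.
    v <= (a(DATA_WIDTH - 1) xor b(DATA_WIDTH - 1)) and
         (a(DATA_WIDTH - 1) xor diff(DATA_WIDTH - 1));

    eq          <= z;
    lt_unsigned <= c;
    lt_signed   <= n xor v;  -- sign corrected for overflow

end rtl;
```

The remaining comparisons follow from these three: for example, "less than or equal" is lt or eq, and "greater than or equal" is not lt.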

Looking over your ISA, I noticed ...

  1. You don't seem to have any conditional branch instructions.  Are these the j?? instructions?  Do you have a JSR instruction?
  2. I don't see any multiply or divide instructions.  I didn't have multiply or divide instructions in my first iteration, and needed to come back and add them in.  The ones I now have are three 32x32 bit multiplies returning the top 32 bits if signed, the top 32 bits if unsigned, and the bottom 32-bits.  I've also got two 32x32-bit divide instructions, one signed and one unsigned.  The compiler would love me to have a remainder function, or even a 64x32 divide, but in the ZipCPU architecture those require some software to accomplish.
  3. I didn't see any NOOP instruction.  That was another afterthought instruction of mine.  Sure, you could move register A to register A, but such an instruction might stall waiting for A to become available, whereas the NOOP doesn't need to read any registers.
  4. How about that memory access: will your ISA allow 8-bit byte access to memory?  I had to come back and add byte and halfword instructions into my ISA as an afterthought, when I couldn't get the C-library to compile without them.
  5. While from your description it doesn't sound like you'll struggle with this, I had to wrestle with the realities of linking when I first discovered how a linker worked.  There are two basic instructions the linker wants to adjust: load a value into a register, and jump to a location.  The first one was fairly easy: I took two instructions and I could load any value into any general purpose register.  The second one was harder, but I eventually wrote something similar to what you've described above.  I consider this the LOD (PC),PC instruction--or load the value at the next memory address in the instruction stream into the PC.  It's the only instruction I have like it, as all my other instructions fit into 32-bit words with no immediates following.
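For item 2, the three multiply results can be sketched in VHDL as below. This is only an illustration of which product bits each result takes, not the ZipCPU's implementation; the entity name mul32 and its port names are invented here, and a real design would likely pipeline the multiplier.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical combinational 32x32 multiplier exposing three 32-bit results.
entity mul32 is
    port
    (
        a, b        : in  std_logic_vector(31 downto 0);
        lo          : out std_logic_vector(31 downto 0);  -- bottom 32 bits
        hi_signed   : out std_logic_vector(31 downto 0);  -- top 32 bits, signed
        hi_unsigned : out std_logic_vector(31 downto 0)   -- top 32 bits, unsigned
    );
end mul32;

architecture rtl of mul32 is
    -- numeric_std "*" returns a result as wide as both operands together.
    signal prod_s : signed(63 downto 0);
    signal prod_u : unsigned(63 downto 0);
begin

    prod_s <= signed(a) * signed(b);
    prod_u <= unsigned(a) * unsigned(b);

    -- The bottom half is the same for signed and unsigned products,
    -- which is why a single "low" result is enough.
    lo          <= std_logic_vector(prod_u(31 downto 0));
    hi_signed   <= std_logic_vector(prod_s(63 downto 32));
    hi_unsigned <= std_logic_vector(prod_u(63 downto 32));

end rtl;
```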

If you are interested, you can see my own instruction cheat sheet here, or a longer discussion of the ISA here.

Good luck!  Holler if you get stuck, or when you discover you can't get as far in VHDL as I did in Verilog ... :P

Dan

 

  • 1
On 1/16/2018 at 7:14 AM, CurtP said:

When you say "infrastructure", are you referring to things like clock, reset, and other signals that propagate broadly through the design?

Regarding distributed memory versus registers -- you're correct that this design infers registers upon elaboration. What kinds of tradeoffs are involved in choosing which design to pursue? This register file will be the main GP registers for a superscalar design, so I want to be able to write 2 registers and read 4 registers per clock (when made possible by the pipeline), and I would like to be able to read back a written register the cycle immediately after it was written, if possible. Of course, none of these things should come at the cost of potential data corruption.

Regarding unregistered outputs -- is this because the outputs aren't in the clocked process? I had used this approach prior but noticed that this caused an extra cycle to elapse between when a register was written and when that same register's new value could be read back.

Regarding the priority for data_inB if the same address is used -- is this behavior reliable under FPGA implementation? I had assumed that it would cause some sort of contention that would lead to undefined values. I've often heard the phrase "last signal assignment wins", but wasn't sure if that was something that merely happens in simulation, or if it was a reliable implemented behavior.

Just realized I didn't respond to this yesterday.

By infrastructure, I mean clocks and general resets that are used at startup or otherwise not used during normal operating conditions.

The distributed memory approach was presented a few posts up.  It doesn't make sense for your goal of an easy to read, general implementation.

In many designs, it is preferred to have output registers.  This does add the 1 register delay.  In some cases the unregistered outputs are unavoidable.  When possible, having output registers is nice because the longest path won't be half in one file and half in another.  For re-use, remember that you can't control how other people use your module.
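One way to keep registered outputs without losing the value written in the same cycle is to forward the write data to the output register when the addresses match. The sketch below shows this for a single write port and a single read port; the entity name regFileRegisteredRead is hypothetical and DATA_WIDTH is assumed to be a generic.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical register file with a registered read port and write forwarding.
entity regFileRegisteredRead is
    generic
    (
        DATA_WIDTH : positive := 32
    );
    port
    (
        clk         : in  std_logic;
        writeEnable : in  std_logic;
        writeSel    : in  std_logic_vector(3 downto 0);
        data_in     : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        readSel     : in  std_logic_vector(3 downto 0);
        data_out    : out std_logic_vector(DATA_WIDTH - 1 downto 0)
    );
end regFileRegisteredRead;

architecture behavioral of regFileRegisteredRead is
    type regArray is array (0 to 15) of std_logic_vector(DATA_WIDTH - 1 downto 0);
    signal registers : regArray := (others => (others => '0'));
begin

    process(clk)
    begin
        if rising_edge(clk) then
            if writeEnable = '1' then
                registers(to_integer(unsigned(writeSel))) <= data_in;
            end if;

            -- Registered read: captures the value stored before this edge.
            data_out <= registers(to_integer(unsigned(readSel)));

            -- Forwarding: if the same register is being written on this edge,
            -- capture the new data instead of the stale stored value.
            if writeEnable = '1' and writeSel = readSel then
                data_out <= data_in;
            end if;
        end if;
    end process;

end behavioral;
```

The cost is an address comparator and a mux per read port in front of the output register; with two write ports and four read ports that is eight comparators.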

"last signal wins" is a reliable behavior.  That said, it can be abused.  There are structured uses where it can be very useful.  However, it creates a bottom-to-top priority structure.  For this reason, it should only be used in a manner that is unlikely to confuse a reader. 
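A common structured use is a default assignment at the top of the process, followed by overrides in increasing priority. The example below is a made-up illustration (the entity requestStrobe and its signals do not come from the thread).

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example of default-then-override assignments in one process.
entity requestStrobe is
    port
    (
        clk     : in  std_logic;
        request : in  std_logic;
        busy    : in  std_logic;
        strobe  : out std_logic
    );
end requestStrobe;

architecture rtl of requestStrobe is
begin

    process(clk)
    begin
        if rising_edge(clk) then
            -- Default: lowest priority, applies when nothing below fires.
            strobe <= '0';

            -- Overrides the default.
            if request = '1' then
                strobe <= '1';
            end if;

            -- Written last, so it has the highest priority.
            if busy = '1' then
                strobe <= '0';
            end if;
        end if;
    end process;

end rtl;
```

Reading from the bottom up gives the priority order: busy beats request, and request beats the default.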

  • 1

@CurtP,

Let's see ...

The ZipCPU doesn't officially implement a JSR instruction either, even though the compiler *really* wants one.  To deal with this case, I taught the assembler and disassembler that a particular two instruction combination was the JSR instruction: MOV 2+PC,R0 followed by JMP <address>.  Typically, this was implemented as a long jump to the address, since the assembler never knew where the address would be 'til link time, and the linker wanted to place a 32-bit address into the instruction stream somewhere.  As I mentioned before, my long jumps were implemented by loading the value following the current instruction word into the PC, and woodenly encoded as LW (PC),PC.

Actually ... the ZipCPU doesn't even have jump instructions per se, but the assembler hides this lack.  The ADD instruction provides the other alternative: ADD.C <offset>,PC adds, if the condition C is true, the given offset to the PC.  The assembler will quietly turn BRA, BNZ, BLT, etc. into this instruction if the target fits, and the disassembler replaces these instructions with their Bxx equivalents.

The C-library will require sub-word addressable memory for its string operations.  Plan on needing arbitrary 16-bit and 8-bit load and store capability, or giving up on the C-library and implementing portable code.

An unconditional jump does need the capability to load an arbitrary value into the PC, yes.  At issue, though, is how you will come back to your machine code and place that address into your instruction stream after compilation and assembly have both finished without knowing what the value should be.  GNU's binutils helps, but you'll still need to write the hooks for your own processor.

So, moving on to push and pop.  The most common case for these routines is when you want to add (or remove) an item from the stack.  In my case, GCC calculates the stack size ahead of time, and then subtracts the stack size for the whole routine upon startup.  Any register saves will be immediately placed into known positions on the  stack afterward.  Hence, the startup for a subroutine might look like:

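subroutine:
  SUB 24,SP      ; Reserve the whole stack frame up front
  STO R0,(SP)    ; Save registers at known offsets
  STO R1,4(SP)
  STO R2,8(SP)
  STO R3,12(SP)
  ... compiler generated user code goes here
  LOD (SP),R0    ; Restore the saved registers
  LOD 4(SP),R1
  LOD 8(SP),R2
  LOD 12(SP),R3
  ADD 24,SP      ; Release the stack frame
  JMP R0 ; This is the ZipCPU's return instruction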

The neat thing about how I've set up the bus is that only the first of these loads or stores will cost any bus delays.  The second and subsequent (in any string of them) will cost only one additional clock--depending, of course, on the speed of the memory at the other end.

For INT/IRET instructions ... the ZipCPU supports two modes: a user mode (where interrupts are enabled) and a supervisor mode (where interrupts are disabled).  On an interrupt or an exception, the CPU just switches register sets in order to switch modes.  The actual mode is kept in the flags register, so any write that changes this mode will cause the CPU to switch modes and hence register sets.  Incidentally, this makes it *really* easy to write interrupt routines: they are just written in "C" as part of the supervisor code.  When the supervisor is ready to switch to the user mode, it just issues a zip_rtu() command.  This turns into an OR 0x100,CC instruction which turns on the interrupt enabled bit and the CPU switches modes.  Incidentally ... getting the pipeline working for this, including all of the corner cases, was a real pain in the bitstream.
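One way to picture "switching register sets" in VHDL is a single 32-entry array where the mode bit supplies the top address bit. This is a guess at the general idea for illustration only, not the ZipCPU's actual code (which is Verilog); the entity bankedRegs and the userMode port are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical banked register file: two sets of sixteen 32-bit registers.
entity bankedRegs is
    port
    (
        clk         : in  std_logic;
        userMode    : in  std_logic;  -- assumed mode bit: '1' = user, '0' = supervisor
        writeEnable : in  std_logic;
        writeSel    : in  std_logic_vector(3 downto 0);
        data_in     : in  std_logic_vector(31 downto 0);
        readSel     : in  std_logic_vector(3 downto 0);
        data_out    : out std_logic_vector(31 downto 0)
    );
end bankedRegs;

architecture behavioral of bankedRegs is
    type regArray is array (0 to 31) of std_logic_vector(31 downto 0);
    signal registers  : regArray := (others => (others => '0'));
    signal writeIndex : std_logic_vector(4 downto 0);
    signal readIndex  : std_logic_vector(4 downto 0);
begin

    -- The mode bit selects the bank; the instruction selects the register in it.
    writeIndex <= userMode & writeSel;
    readIndex  <= userMode & readSel;

    data_out <= registers(to_integer(unsigned(readIndex)));

    process(clk)
    begin
        if rising_edge(clk) then
            if writeEnable = '1' then
                registers(to_integer(unsigned(writeIndex))) <= data_in;
            end if;
        end if;
    end process;

end behavioral;
```

Because neither bank is overwritten on a mode change, the user registers are still intact when the supervisor code runs, which matches the behavior described above.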

To implement a system call, I'd just call a function.  That function would contain the one assembler instruction, "LDI 0,CC", which would then disable interrupts, switching the CPU to supervisor mode--leaving all the user registers intact as though the function were actually called.  From supervisor mode, the software can do what it then likes with those register values.  There are other possibilities for entering supervisor mode as well.  For example, a division by zero error, hitting a debugging break point, at the conclusion of a single-stepped instruction, on a bus error, after hitting an illegal instruction, trying to execute an instruction from non-existent memory, etc.

When the supervisor code has dealt with whatever the exception was, it just calls zip_rtu() which executes a built-in RTU (return-to-userspace) instruction.  There are other built-ins to help out as well, such as zip_save_context(contextp); which stores the user registers into the array pointed by contextp and zip_restore_context(contextp) which does the reverse, etc.  Hence, to swap tasks, you set a timer interrupt.  When that interrupt goes off, you save the registers into an array associated with the current task, and then load the registers from the task you want to switch to.  Once you then return to userspace, the task swap is complete.

Still, the "tough" question early on is: how will you simulate your design, how will you visualize your pipeline, and how will you debug your software (and CPU) once you move to the actual hardware.  These are the real questions you need to answer up front and immediately.  Everything else follows from the answers you give to these questions.

Dan

  • 0
4 hours ago, Piasa said:

It looks fine.  I would normally use a different port order and put each port on its own line for easy copy/paste.  I prefer to group ports by interface -- readA, doutA, readB, doutB, etc.  I also place interfaces in an output, input, config, infrastructure order.  In industry the order of infrastructure, input, output, config is more common.

For implementation, this probably infers registers.  It is possible to construct this with distributed memory, although it is more complex.  It isn't clear to me if the added complexity results in a better design at this size.

The design actually will have priority logic for data_inB if the same address is used.  This is because the last reached assignment will be used. 

Also, the logic has unregistered outputs.  Normally this isn't something that is desired.  This means critical timing paths could be due to logic in multiple modules.  Not sure if there is anything that can be done here though.

You can also add asserts for the "write to same address" case.  This can be helpful in simulation.

Thanks for all the info! My background is software, primarily C/C++, so I am still learning the stylistic conventions of VHDL and HDLs in general as I go. It helps to have other engineers point me in the right direction.

A few questions, if you don't mind:

When you say "infrastructure", are you referring to things like clock, reset, and other signals that propagate broadly through the design?

Regarding distributed memory versus registers -- you're correct that this design infers registers upon elaboration. What kinds of tradeoffs are involved in choosing which design to pursue? This register file will be the main GP registers for a superscalar design, so I want to be able to write 2 registers and read 4 registers per clock (when made possible by the pipeline), and I would like to be able to read back a written register the cycle immediately after it was written, if possible. Of course, none of these things should come at the cost of potential data corruption.

Regarding unregistered outputs -- is this because the outputs aren't in the clocked process? I had used this approach prior but noticed that this caused an extra cycle to elapse between when a register was written and when that same register's new value could be read back.

Regarding the priority for data_inB if the same address is used -- is this behavior reliable under FPGA implementation? I had assumed that it would cause some sort of contention that would lead to undefined values. I've often heard the phrase "last signal assignment wins", but wasn't sure if that was something that merely happens in simulation, or if it was a reliable implemented behavior.

Regarding the asserts -- thank you for the suggestion. Someone else also recommended this, and I will be implementing it for simulation.

 

Thanks again for all the help!

- Curt

  • 0
1 hour ago, CurtP said:

I've often heard the phrase "last signal assignment wins", but wasn't sure if that was something that merely happens in simulation, or if it was a reliable implemented behavior.

What's going to happen here is that the synthesizer is going to generate your design and optimize out any signal assignments that cannot be reached (or that are reached but overridden by a later assignment) on a particular path through the process.  All possible paths that modify a particular signal are then strung into a MUX determining that signal's value.  There will be one such MUX for each different signal the process assigns to (and in this case each MUX will also be coupled to a register clocked by someClock).  It's not like a programming language where a series of assignments will be run in order.  So a process like:

someProc: process(someClock) is

begin
    if (rising_edge(someClock)) then
        mySignal <= oneSignal;
        mySignal <= anotherSignal;
        mySignal <= oneMoreSignal;
    end if;
end process someProc;

Will synthesize (elaborate) to a register called mySignal whose input connects to oneMoreSignal and whose clock input is connected to someClock.  oneSignal and anotherSignal won't even appear in the elaboration of someProc.
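Applied to the register file from the first post, the same rule means that port B wins when both write ports address the same register. The sketch below spells out that priority per register, which is roughly the mux structure described above. It is an illustration under assumptions: the entity name explicitPriorityRegs is made up, DATA_WIDTH is a generic, the reset is omitted, and only one read port is shown to keep it short.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical rewrite of the two-write-port logic with explicit priority.
entity explicitPriorityRegs is
    generic
    (
        DATA_WIDTH : positive := 32
    );
    port
    (
        clk          : in  std_logic;
        writeEnableA : in  std_logic;
        writeEnableB : in  std_logic;
        writeSelA    : in  std_logic_vector(3 downto 0);
        writeSelB    : in  std_logic_vector(3 downto 0);
        data_inA     : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        data_inB     : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
        readSel      : in  std_logic_vector(3 downto 0);
        data_out     : out std_logic_vector(DATA_WIDTH - 1 downto 0)
    );
end explicitPriorityRegs;

architecture behavioral of explicitPriorityRegs is
    type regArray is array (0 to 15) of std_logic_vector(DATA_WIDTH - 1 downto 0);
    signal registers : regArray := (others => (others => '0'));
begin

    data_out <= registers(to_integer(unsigned(readSel)));

    process(clk)
    begin
        if rising_edge(clk) then
            -- One priority mux per register: B is checked first, so B wins
            -- when both ports select register i in the same cycle.
            for i in registers'range loop
                if writeEnableB = '1' and unsigned(writeSelB) = i then
                    registers(i) <= data_inB;
                elsif writeEnableA = '1' and unsigned(writeSelA) = i then
                    registers(i) <= data_inA;
                end if;
            end loop;
        end if;
    end process;

end behavioral;
```

When the two write addresses differ, both writes still happen in the same cycle, just as in the original two-if version.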

Edited by Gau_Veldt
clarity
  • 0
9 minutes ago, Gau_Veldt said:

What's going to happen here is that the synthesizer is going to generate your design and optimize out any signal assignments that cannot be reached on a particular path through the process.  All possible paths that modify a particular signal are then strung into a MUX determining that signal's output.  There will be one such MUX for each different signal the process assigns to (and in this case each MUX will also be coupled to a register clocked by someClock).  It's not like a programming language where a series of assignments will be run in order.  So a process like:

someProc: process(someClock) is

begin
    if (rising_edge(someClock)) then
        mySignal <= oneSignal;
        mySignal <= anotherSignal;
        mySignal <= oneMoreSignal;
    end if;
end process someProc;

Will synthesize (elaborate) to a register called mySignal whose input connects to oneMoreSignal and whose clock input is connected to someClock.  oneSignal and anotherSignal won't even appear in the elaboration of someProc.

Thanks for the explanation!

  • 0

@CurtP,

I've been doing HDL designs and development for over 20 years and C/C++ for a lot longer. I started out as (and still consider myself to be) a digital engineer... (I won't be surprised if you say "what the heck is a digital engineer?"). I mention this only to say that the commentary that follows is colored heavily by one person's experience. Take it as a disclaimer if you wish. Feel free to ignore everything that follows.

In general I'd say that a background in coding computers is an impediment to learning FPGA design. To me FPGA design is a digital design endeavour using text and computer assistance instead of schematics. You have more than a syntax issue.... you have a conceptual issue to master. I heartily encourage you to get a few good textbooks that cover the conceptual aspects of both digital design and HDL development. Do read the documentation and reference manuals provided by the FPGA vendors. You are dealing with not just writing robust behavioural source "code" but understanding how the synthesis, place and route and timing tools interpret that source "code". There are some good examples of HDL in the Project vault to help get a sense of good coding styles. Do get comfortable learning how to write simulation test benches and understanding how to use simulation as the tool that it is. All tools can provide false and confusing information if you don't understand how they work and their limitations. As your designs get more complex and demanding so will the issues that pop up and have to be mastered. To me this is an attraction. Good journey.

A friendly whistle in the wind....

Edited by zygot
  • 0
4 minutes ago, zygot said:

@CurtP,

I've been doing HDL designs and development for over 20 years and C/C++ for a lot longer. I started out as (and still consider myself to be) a digital engineer... (I won't be surprised if you say "what the heck is a digital engineer?"). I mention this only to say that the commentary that follows is colored heavily by one person's experience. Take it as a disclaimer if you wish. Feel free to ignore everything that follows.

In general I'd say that a background in coding computers is an impediment to learning FPGA design. To me FPGA design is a digital design endeavour using text and computer assistance instead of schematics. You have more than a syntax issue.... you have a conceptual issue to master. I heartily encourage you to get a few good textbooks that cover the conceptual aspects of both digital design and HDL development. Do read the documentation and reference manuals provided by the FPGA vendors. You are dealing with not just writing robust behavioural source "code" but understanding how the synthesis, place and route and timing tools interpret that source "code". There are some good examples of HDL in the Project vault to help get a sense of good coding styles. Do get comfortable learning how to write simulation test benches and understanding how to use simulation as the tool that it is. All tools can provide false and confusing information if you don't understand how they work and their limitations. As your designs get more complex and demanding so will the issues that pop up and have to be mastered. To me this is an attraction. Good journey.

A friendly whistle in the wind....

Thanks for the advice! My journey into hardware design has been like drinking from a firehose of information so far, but that's part of the fun. You're correct that being accustomed to software development paradigms can be an impediment to learning hardware design. I think a lot of people see the superficial similarity between HDLs and C-style languages and assume that the process will be similar. It has been an interesting exercise to rework my thinking around describing the behavior of a circuit, rather than listing a series of sequential operations to be performed.

I think one advantage I have going in is that digital logic and integrated circuits have been a fascination of mine since I was a kid. Long before I ever even knew about HDLs, I was poring over Intel technical docs, and reading about the theory of machine design. Of course, once you dive in to actually designing a circuit, you quickly realize how much you -don't- know. But again, all part of the fun!

I have spent some time reading through Xilinx's guidelines for synthesis, but I haven't invested in any actual books on hardware design. Are there any particular ones that you recommend?

Thanks again,

- Curt

  • 0
19 minutes ago, Gau_Veldt said:

What's going to happen here is that the synthesizer is going to generate your design and optimize out any signal assignments that cannot be reached on a particular path through the process.  All possible paths that modify a particular signal are then strung into a MUX determining that signal's value. 

No, I'm uncomfortable with what this text is implying or stating.

The synthesis tool tries to interpret your source text and implement it using the resources of the target device. The simulator does the same. How the simulator and synthesis tool interpret what you intend from your source is not always the same. I know this from experience. The synthesis tool might infer a latch where you don't intend to have one (this is not generally a good thing). The synthesis tool will try to infer certain logic elements like latches, counters, and state machines and replace your code with "code" optimised for a particular FPGA device. You have some control over this behavior in the settings for the Vivado synthesis tool. The reality is that you can write crappy code and be lucky to end up with reasonably functional results because modern synthesis tools are pretty smart. What the synthesis tool can't do is replace flawed concepts or correct basic deficiencies in the design. This is why I encourage you to do the basic work of learning FPGA development from reliable sources. You might find a path to one immediate problem on a user's forum but don't mistake that success for mastery of FPGA development.
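As a concrete case of the unintended latch: in a combinational process, any signal that is not assigned on every path has to hold its old value on the missing paths, and the tool builds a latch to do so. A default assignment at the top of the process avoids that. The entity gatedValue below is a made-up minimal example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical minimal example of avoiding an inferred latch.
entity gatedValue is
    port
    (
        enable : in  std_logic;
        d      : in  std_logic;
        q      : out std_logic
    );
end gatedValue;

architecture rtl of gatedValue is
begin

    -- Combinational process: every signal it reads is in the sensitivity list.
    process(enable, d)
    begin
        -- Default assignment, so q is driven on every path through the process.
        -- Without this line, q would have to keep its previous value whenever
        -- enable = '0', and synthesis would infer a latch to store it.
        q <= '0';

        if enable = '1' then
            q <= d;
        end if;
    end process;

end rtl;
```

Synthesis tools normally report inferred latches as warnings, so the synthesis log is worth checking after changes to combinational processes.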

  • 0

I have found the schematic views of elaboration/synthesis/implementation to be very helpful for improving my understanding of VHDL and my target FPGA. It's one thing to see simulation results on a scope, but it's another to see the actual hardware that the tools generate from your VHDL. I try to ask myself as I go "what hardware will this create?", and am picking up best practices bit by bit. For every functional unit of a design that I create, I make sure to walk through the elaborated schematic and understand what each part is doing. I find that making small changes to the VHDL and seeing how it impacts the elaborated design is very useful.

  • 0
5 minutes ago, CurtP said:

Thanks for the advice! My journey into hardware design has been like drinking from a firehose of information so far, but that's part of the fun. You're correct that being accustomed to software development paradigms can be an impediment to learning hardware design. I think a lot of people see the superficial similarity between HDLs and C-style languages and assume that the process will be similar.

You are wise and entering this with "eyes wide open"; I am confident that you will be successful and be rewarded for your efforts.

As to textbooks, all of mine are currently boxed up as I am about to relocate.... Ashenden's VHDL textbook is a standard reference, though my copy is a bit dated. Sorry, the titles or authors of other books don't come to mind.

Altera's Quartus reference manual has some excellent guidance on how to write text syntax that their synthesis tool understands. They also have an excellent "cookbook" of coding syntax for various structures like registers, counters, state machines, etc. Of course, imitating good coding styles is not the same as understanding why they are good. Both Altera and Xilinx offer application notes as well as example designs with source code. The material can be daunting, even for seasoned professionals, so don't get discouraged by the learning curve.

Of course, sometimes you run into a problem that just doesn't get solved after hours of effort and this forum might help with that. The tools are not perfect and can be an impediment on their own.

  • 0
9 minutes ago, CurtP said:

I have found the schematic views of elaboration/synthesis/implementation to be very helpful for improving my understanding of VHDL and my target FPGA. It's one thing to see simulation results on a scope, but it's another to see the actual hardware that the tools generate from your VHDL. I try to ask myself as I go "what hardware will this create?", and am picking up best practices bit by bit. For every functional unit of a design that I create, I make sure to walk through the elaborated schematic and understand what each part is doing. I find that making small changes to the VHDL and seeing how it impacts the elaborated design is very useful.

Your instincts are very good. It's great to not have to give advice because the audience has already figured it out for themselves. Insight is hard to come by and can be as important as textbook knowledge. The RTL view of post-route logic (not the board design schematic) is a good way to verify that what you want is what you got, and it can help resolve confusion. Sometimes you just have to rework your logic to get what you want from the synthesis tool. I still create play projects just for the purpose of understanding the nuances of a particular design/syntax strategy, to see what the simulator says and how the bitstream behaves in real hardware.

  • 0

The "priority logic" is quite an important concept to understand.

In this example, it means that if both ports write to the same register, it is guaranteed that port B gets the final word, or "priority".

If this is the desired outcome, case closed. You need priority logic.
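
As a rough illustration of where that priority comes from, here is a minimal sketch. It assumes the two writes are coded as two sequential `if` statements inside one clocked process, with port A first, and it reuses the port names from the original post. The entity name `priority_write_sketch`, the single read port, and the fixed 32-bit width are simplifications made up for this example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical, cut-down register file used only to show write priority.
entity priority_write_sketch is
    port
    (
        clk : in std_logic;
        writeEnableA, writeEnableB : in std_logic;
        writeSelA, writeSelB, readSel : in std_logic_vector(3 downto 0);
        data_inA, data_inB : in std_logic_vector(31 downto 0);
        data_out : out std_logic_vector(31 downto 0)
    );
end priority_write_sketch;

architecture behavioral of priority_write_sketch is

    type regArray is array (0 to 15) of std_logic_vector(31 downto 0);
    signal registers : regArray := (others => (others => '0'));

begin

    data_out <= registers(to_integer(unsigned(readSel)));

    process(clk)
    begin
        if rising_edge(clk) then
            -- Port A schedules its write first.
            if writeEnableA = '1' then
                registers(to_integer(unsigned(writeSelA))) <= data_inA;
            end if;
            -- Port B schedules its write second. Within one process the
            -- last assignment to a signal element wins, so when
            -- writeSelA = writeSelB and both enables are high, the
            -- register ends up holding data_inB.
            if writeEnableB = '1' then
                registers(to_integer(unsigned(writeSelB))) <= data_inB;
            end if;
        end if;
    end process;

end behavioral;
```

Swapping the order of the two `if` statements would give port A the final word instead. Either way the synthesis tool has to build logic that honours the ordering, which is the "priority logic" being discussed.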

Now, often such "special cases" can be ruled out. They may never happen in a properly working design. If two parties try to write the same register at the same time, something has gone horribly wrong. Now what should your circuit do?

This is an almost philosophical argument. Without claiming to be the sole authority (ask your boss...), I'd make a strong point for the following:

Given "impossible" input, the circuit should output "undefined" signals.
Three reasons:

1) It is faster and smaller in implementation because logic can be reduced further. The lower layers of FPGA hell have you waiting for Vivado to tell after seven eternities that it was unable to close timing and / or ran out of space. Again. And again.
2) Simulation results will be nicely green when OK and ugly red when something has gone wrong (or a part of the circuit is idle). In the upper layers of FPGA hell you'll realize that you sampled one clock cycle too late, but the signal was usually still correct. Usually.
3) You don't implement logic that will never be tested (because the conditions that trigger it are impossible at the time of writing). Imagine you work with code that looks like it will do something, only to realize after much head-scratching that what looked like a feature was never completed but just a half-*ssed attempt to make it "robust" against stray alpha particles from outer space.

Personally, I "clear" every single FF bit after its useful life has ended (e.g. local bits and pieces in a state machine) by setting it to undefined.
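
To make the suggestion concrete, here is a hedged sketch of one way to write it. It assumes the same two-port write process as in the original post; the entity name `undefined_on_conflict_sketch` and the single read port are invented for this example. How a given synthesis tool treats the `'X'` assignment is tool-dependent, so treat this as a simulation aid first and check the synthesized result yourself.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical example: poison the register on an "impossible" double write.
entity undefined_on_conflict_sketch is
    port
    (
        clk : in std_logic;
        writeEnableA, writeEnableB : in std_logic;
        writeSelA, writeSelB, readSel : in std_logic_vector(3 downto 0);
        data_inA, data_inB : in std_logic_vector(31 downto 0);
        data_out : out std_logic_vector(31 downto 0)
    );
end undefined_on_conflict_sketch;

architecture behavioral of undefined_on_conflict_sketch is

    type regArray is array (0 to 15) of std_logic_vector(31 downto 0);
    signal registers : regArray := (others => (others => '0'));

begin

    data_out <= registers(to_integer(unsigned(readSel)));

    process(clk)
    begin
        if rising_edge(clk) then
            if writeEnableA = '1' then
                registers(to_integer(unsigned(writeSelA))) <= data_inA;
            end if;
            if writeEnableB = '1' then
                registers(to_integer(unsigned(writeSelB))) <= data_inB;
            end if;
            -- Both ports target the same register: the control logic was
            -- supposed to prevent this, so store 'X' rather than pick a
            -- winner. In simulation the register turns red immediately.
            if writeEnableA = '1' and writeEnableB = '1'
               and writeSelA = writeSelB then
                registers(to_integer(unsigned(writeSelA))) <= (others => 'X');
            end if;
        end if;
    end process;

end behavioral;
```

In simulation, any later read of the poisoned register propagates `'X'` downstream, so the control-logic bug is visible at once instead of being masked by an arbitrary winner.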
 

 

Edited by xc6lx45
  • 0

@xc6lx45

I'm scratching my head about how to respond to what you've contributed.... nope can't come up with anything.

I will reveal a secret; shhh... this is just between us. If you look down into the Vivado installation directory you can discover a lot of useful and accurate information. As an example in the ..//data/vhdl/src/ieee_2008 directory you can find the file std_logic_1164.vhdl that is the IEEE extension to VHDL that defines the states of STD_ULOGIC as:

  type STD_ULOGIC is ( 'U',             -- Uninitialized
                       'X',             -- Forcing  Unknown
                       '0',             -- Forcing  0
                       '1',             -- Forcing  1
                       'Z',             -- High Impedance   
                       'W',             -- Weak     Unknown
                       'L',             -- Weak     0       
                       'H',             -- Weak     1       
                       '-'              -- Don't care
                       );

You can also find the same basic information in a good textbook along with helpful and reliable commentary. ModelSim puts useful information onto your computer as well.

Don't take my word ( or anyone else's for that matter ) on any topic posted here as truth. Find the oases of reliably helpful information and remember where to go to when the desert has you seeing things that aren't there. 

 

 

Edited by zygot
  • 0
2 hours ago, zygot said:

You are wise and entering this with "eyes wide open"; I am confident that you will be successful and be rewarded for your efforts.

 

2 hours ago, zygot said:

Your instincts are very good. It's great to not have to give advice because the audience has already figured it out for themselves. Insight is hard to come by and can be as important as textbook knowledge.

Thank you for the kind words and encouragement!


The first big project I'm working on is.. (wait for it).. a general purpose CPU. Cliche, yes, I know. But I see it as a labor of love, an opportunity to learn many principles of machine design at once, and just something I've been really curious to do for a long time. I'm sure the world doesn't need yet another new ISA, but it's still fun to create one. And I'll be open-sourcing the final product, for whoever might find it useful.

One of my design goals (or I suppose you could call it a meta-design goal) is to prioritize the use of easy-to-read, behavioral VHDL so that people who read through the source can intuitively learn the ins and outs of how a CPU really works. I've looked at a lot of other CPU designs and found that it's often difficult to quickly discern what a portion of the code is actually doing and why, because of heavy use of structural and combinational syntax, which while generally more efficient, isn't as intuitive for a human to parse beyond a certain level of complexity (at least not for me).

So will it be the best/fastest/most technically competent CPU? No. But you wouldn't get that without going the ASIC route anyway (and having a lot more resources and expertise than I do). But I do think I can make a CPU that also serves as a learning aid for others curious about the inner-workings of CPUs.

Anyhow, I digress. Thanks again!

- Curt

 

 

  • 0
2 hours ago, zygot said:

@xc6lx45

I'm scratching my head about how to respond to what you've contributed.... nope can't come up with anything....

 

Hi Zygot,

you also disagreed with what Gau_Veldt wrote so clearly we're not on the same page.

For me, Gau_Veldt's statement is 100.0 % accurate, in the sense that it would hold up in court (note, he wrote "synthesis", not "optimization"). The synthesis process will do exactly that.

Priority decoders, and why to avoid them for performance reasons, are a basic textbook topic.
Avoiding them strips levels away from combinational logic. It's obvious once you get it.
And it has little to do with the language, but the resulting physical logic.

Edited by xc6lx45
  • 0

>> a general purpose CPU

2 hours ago, CurtP said:

The first big project I'm working on is.. (wait for it).. a general purpose CPU.

CurtP,

you could have a look at J1:
http://www.excamera.com/files/j1demo/verilog/j1.v

It's a beautiful design. Not that I'd propose Forth for any real-world work, but it's perfect for a simple hardware implementation. It might even make "business sense" in Picoblaze territory.
There is a VHDL port, too, but possibly not as concise.

 

 

 

  • 0

A CPU? Hello @D@n... are you there? D@n?

I am not at all surprised that you would select such a project. This is not a trivial project. Fortunately others are like-minded.... such as D@n. A solid grasp of the fundamentals would be a good prerequisite to my mind. D@n is likely a good deal more optimistic about this. I recently spent a few weeks working on a simple controller and found it challenging... though I was going for something quite efficient and scalable in terms of capability and speed. It's a naturally great project. I'd suggest a few less challenging projects to belt out as preparation (assuming that you want to do this from scratch). BTW, the Ashenden text uses a CPU example to illustrate its concepts. My design was completely from scratch.

 

  • 0

@xc6lx45,

9 hours ago, xc6lx45 said:

Given "impossible" input, the circuit should output "undefined" signals.
 

I had never heard of anyone actually using this practice before.

Some time ago, I was given this article which appears to describe the practice you are recommending above.  The individual who had given it to me suggested that ARM got themselves in a lot of trouble (i.e. stuff not working that should have) by using this practice.

While I'm not familiar with all the details, I did find the article fascinating--and now more so in light of your suggestion.

Dan

  • 0

I think in this case the priority logic is hard to remove in a way that isn't worse.

Some (or all?) synthesis tools will ignore 'x' and '-' and instead replace them with '0'.  This can add some extra logic to do something you specifically didn't care about.

 

Also, I agree that small sandbox designs can be really fun and informative.  IMO, devs tend to underestimate the FPGA in some ways which results in excessive pre-optimization.

  • 0
29 minutes ago, D@n said:

assuming the block RAM's even supported it.

BRAM does, but then you can't read in the same cycle.  A Xilinx targeted register file could use DMEM.  This requires four copies of each register for the 4-6 read/cycle case, or two copies for the 1-3 read/cycle case.  The DMEM have a 3-read, 1-write config.  To get 4-6 ports means two copies.  To get the two writes again means two copies and the addition of a small tag ram which can be implemented using registers. 

It is debatable if this is that much better as both should be small for modern FPGAs.  It removes the input muxes for the priority logic as portA now only writes to two DMEM and portB only writes to the others.  It also removes the output muxes as these are built into the DMEM.  The clock to out is higher than registers, but I'm not sure if it is higher than register + LUT6.  The complexity is that the OS either needs to clear the registers at start, or accept the bootup values could be random.  Also, the bits per slice is lower, but given the lack of extra muxes this is probably not a concern.

There are also benefits if the design can use either 32 registers or can use two sets of registers as the DMEM is a 32b config.  This can be used for fast interrupt context swaps or for barrel processors.

In terms of coherency, that is based on if the CPU can ensure it never has a write-write conflict and also can avoid read-before-write where write could now be from two ports.
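
A simplified behavioral sketch of the bank-plus-tag idea described above, in case it helps to see it written out. Everything here is an assumption made for illustration: the entity name `banked_regfile_sketch`, the signal names `bankA`, `bankB` and `tag`, and the single read port (a real four-read design would replicate the banks per read port). It is not guaranteed to infer distributed RAM on any particular tool.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical two-write register file built from one bank per write port.
entity banked_regfile_sketch is
    port
    (
        clk : in std_logic;
        writeEnableA, writeEnableB : in std_logic;
        writeSelA, writeSelB, readSel : in std_logic_vector(3 downto 0);
        data_inA, data_inB : in std_logic_vector(31 downto 0);
        data_out : out std_logic_vector(31 downto 0)
    );
end banked_regfile_sketch;

architecture behavioral of banked_regfile_sketch is

    type regArray is array (0 to 15) of std_logic_vector(31 downto 0);

    -- Each bank has exactly one write port, so no write-side muxing.
    -- Note there is no reset: RAM contents cannot be cleared in one cycle.
    signal bankA : regArray := (others => (others => '0'));
    signal bankB : regArray := (others => (others => '0'));

    -- One tag bit per register: '0' = newest value is in bankA,
    -- '1' = newest value is in bankB. Small enough to live in flip-flops.
    signal tag : std_logic_vector(15 downto 0) := (others => '0');

begin

    -- Read both banks at the same address and let the tag pick one.
    data_out <= bankB(to_integer(unsigned(readSel)))
                    when tag(to_integer(unsigned(readSel))) = '1'
                else bankA(to_integer(unsigned(readSel)));

    process(clk)
    begin
        if rising_edge(clk) then
            if writeEnableA = '1' then
                bankA(to_integer(unsigned(writeSelA))) <= data_inA;
                tag(to_integer(unsigned(writeSelA))) <= '0';
            end if;
            if writeEnableB = '1' then
                bankB(to_integer(unsigned(writeSelB))) <= data_inB;
                tag(to_integer(unsigned(writeSelB))) <= '1';
            end if;
        end if;
    end process;

end behavioral;
```

In this sketch a write-write conflict still resolves in favour of port B, but only through the one-bit tag update rather than through a 32-bit data mux in front of every register.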

3 minutes ago, D@n said:

Fascinating comment.

Care to elaborate?

In one design, there was a custom-written 32b adder that was instantiated in the same file as a 40+ bit adder that ran at the same clock. (Neither adder was in the top 100 nets and the design met timing.) One of these took hours to write, the other took seconds. I also notice some people add lots of pipeline stages for simple calculations. This can be fine, but each pipeline stage increases the chance that a future modification will have a pipeline error. Because this might only show up in rare cases, I take steps to ensure the pipeline naming scheme and intent are clear with respect to cycle vs. sample delays. This is especially important when simplified assumptions about the pipeline no longer hold once the module is ported to a new application. This commentary is probably best suited for another thread, as it does not relate to the topic of a CPU register file.

  • 0

All of the comments have been interesting and intriguing, for those with a good base of experience. I've been avoiding commenting on specific concepts, as details like this might be more confusing noise than help to the (even well prepared) beginner. There's plenty of time to hone one's expertise for those venturing into more complicated and sophisticated projects with high reliability requirements. As one becomes more knowledgeable about failure mechanisms, even seasoned professionals can lose sleep.... I say this from experience. It seems to me... and I admit that my mind works in unusual ways... that newbies have enough to focus on just to get reasonably competent (there's the concepts, the tools, the device architecture, etc.) without having their focus shifted to more esoteric concepts. Having a glimpse of the complexities is not bad... but...

It's easy to attempt to convey a valid idea and end up just peddling confusion to the mind of the intended audience. Perhaps I'm being a bit too self-reflective this week.
