JonK
Posted August 25 (edited)

My modest background is in VHDL, but it dates back to the early 2000s, and I've been away from any HDL for more than 15 years. My last project was a few self-tutorials on a Xilinx 4000 series part, to date myself a bit. (I enjoyed the language but hated the crappy floor-planning tool, so I floor-planned everything by hand instead, with far better success.) I have also only just joined this forum and have read only a small number of posts here. I hope the following is sufficiently clear. If not, I'll try to improve it based on the comments I receive.

I've just downloaded 2017.4 (with update 1) and installed it successfully. (I had unending trouble using the Xilinx installer to get 2020.1, so I gave up, though just today I downloaded it using a link their forum person provided plus wget. So I hope to be installing that soon.)

I currently intend to modify some existing Verilog code (the Abacus project at Digilent; thanks for that), in particular the code that converts binary to BCD. It's based on the double dabble algorithm (easily looked up on Wikipedia).

Let me first give a few concrete examples to help get across the larger question I have about Verilog. I've included diagrams illustrating the following cases:

- Two images, one for "6-bit binary to 7-bit BCD" and another for "7-bit binary to 9-bit BCD", which illustrate one transition in the tree structure.
- Two images, one for "9-bit binary to 11-bit BCD" and another for "10-bit binary to 13-bit BCD", which illustrate yet another transition in the tree structure.
- One image: a phase 1 Verilog module to implement a "24-bit binary to 31-bit BCD" case. (I want to know how to write this in Verilog as a generic, parameterized module.)
- One image: a phase 2 Verilog module, with backward pruning, to implement a "24-bit binary to 29-bit BCD" case. (I want to know how to write this in Verilog as a generic, parameterized module.)
- One image: what I already know how to do for "24-bit binary to BCD", which sets up a 49-bit array (input and output) to achieve it. I do not need any help with this. I am curious how the compiler optimizes it, though, and would like to compare those results with the ones I hope to get from the tree form, once I learn how to write that in Verilog. (I do not yet know how to implement generic tree structures in Verilog. That is the point of this question, in fact.)
- One image: a 7400-series implementation of the double-dabble block, showing that it is combinatorial (as is everything here). This is implemented with a Verilog "if/else" mapped through LUTs, which I already know how to do and need no help with.

Note that for N up to 3 there is no need for a "dabble" module block. Once 4-bit binary is reached, the first dabble module is required, and this works well up to 6-bit binary. At that point, in the transition to 7 bits, a new layer is required in the tree. (The reason has to do with the number of "S3" outputs, which cannot exceed 3 in the higher-order bits. Words are difficult here, but the pictures illustrate it better.) Then, in going from 9-bit to 10-bit binary, another transition takes place. This happens at every boundary where N goes from (N mod 3 = 0) to (N mod 3 = 1).

I can easily code up anything I want when I know the value of N a priori. And I may be able to adjust the code shown in the Abacus project by changing the "repeat(13)" to "repeat(N-3)" and then also varying the bit positions. But I think I will also need to modify the contents of the begin/end block, depending on the depth of the tree. So it starts getting complicated to work out.
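As a quick sanity check on the layer rule above, here is a minimal Python sketch (Python rather than Verilog, purely to check the arithmetic in software). The names `dabble` and `layers` are made up for this sketch, and it assumes a "layer" means one row of dabble cells in the tree.

```python
def dabble(nibble):
    """Model of one combinatorial dabble (add-3) cell.

    Inputs 0..4 pass through unchanged; inputs 5..9 have 3 added.
    Inputs above 9 never occur in the double dabble scheme, so this
    sketch rejects them instead of defining a value for them.
    """
    if not 0 <= nibble <= 9:
        raise ValueError("dabble cell input must be in 0..9")
    return nibble + 3 if nibble >= 5 else nibble


def layers(n_bits):
    """Rows of dabble cells for an n_bits-wide binary input.

    Assumption for this sketch: no rows up to N = 3, and one more row
    each time N moves from (N mod 3 = 0) to (N mod 3 = 1).
    """
    return max(0, (n_bits - 1) // 3)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, layers(n))
```

With this rule, N = 4 through 6 give one row, N = 7 through 9 give two, and N = 10 gives three, which matches the transitions at 6 to 7 bits and 9 to 10 bits described above.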
I'm wondering if anyone knows of a good example of a generalized approach to producing arbitrarily complex tree structures in Verilog, where the number of bits, N, may have a very significant impact on the number of wires and modules, but where it can be parameterized just the same. I'm after an example where the number of input bits, N, can be specified arbitrarily, the number of needed (and varying) output bits is produced automatically, the necessary tree structures are instanced, and a top-level module can then ignore those wires or map them to whatever is available as output pins.

For example, on the Basys 3 I've got four 7-segment display digits. This means only four BCD decimal digits (avoiding hex displays), which limits the maximum N to 13 (14 would require another display digit). But this doesn't mean I can't handle N=16. It just means that if I handle N=16, I will only be able to display 16 output bits, or 4 decimal digits. Still, I'd like a generic module where I could specify N=16, recognize that this means M=19 output bits (the high bit of the highest-order digit isn't needed), and then ignore the upper three and map just the lower 16 to what's available (in another, physically oriented module that knows about the display).

My Verilog knowledge is just beginning. (I've never coded in Verilog, only VHDL, and back then always in a much more physically oriented way.) Is there an approach within Verilog that allows for this kind of generality with respect to tree structures? (It's trivial if all I'm dealing with is arrays, and I don't need help there.) It's the tree-structure part, where the nesting depth varies, that is giving me fits right now. I'm honestly not entirely sure how to proceed (other than with some long code block that I'll need to keep modifying to support larger and larger N whenever I feel the need for it), and my Google skills appear to be coming up a little short tonight.
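The relationship between N and M above can be checked with a small Python sketch (again a software model, not Verilog). All function names here are hypothetical. `min_bcd_bits` gives the smallest width that can hold the largest N-bit value in BCD. `tree_bcd_bits` is an assumption inferred from the widths quoted in this post (7, 9, 11, 13 and 31 bits): the unpruned tree seems to produce one extra output bit per row of cells.

```python
def bcd_digit_count(n_bits):
    """Decimal digits needed for the largest n_bits-wide value."""
    return len(str((1 << n_bits) - 1))


def min_bcd_bits(n_bits):
    """Smallest BCD width: 4 bits per lower digit plus just enough
    bits for the leading digit of the largest value."""
    top, *rest = str((1 << n_bits) - 1)
    return 4 * len(rest) + int(top).bit_length()


def tree_bcd_bits(n_bits):
    """Assumed width of the unpruned tree output: N plus one bit per
    row of cells, with ceil(N / 3) - 1 rows."""
    rows = max(0, (n_bits + 2) // 3 - 1)
    return n_bits + rows


if __name__ == "__main__":
    for n in (6, 7, 9, 10, 13, 16, 24):
        print(n, bcd_digit_count(n), min_bcd_bits(n), tree_bcd_bits(n))
```

For N=16 the minimal width comes out as 19 bits, and for N=24 the two functions give 29 and 31 bits, which lines up with the pruned and unpruned 24-bit cases listed earlier. The largest N that still fits four decimal digits is 13.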
The subtle construction of wires is bothering me, but I have a hunch that there's something simple I'm missing in writing the code and that, once someone shows me, I'll immediately feel dumb for not having seen it right away. I won't mind a kick in the head, if that's needed.

My goal is a module that uses:

- 0 dabble module instances for N <= 3,
- N-3 of them for 4 <= N <= 6,
- (N-6)*2+3 of them for 7 <= N <= 9,
- (N-9)*3+9 of them for 10 <= N <= 12, etc.

Now, I already know how to do this without these optimizations. The left-to-right number of device columns is N-3. (Feel free to check me using the diagrams below.) If I already know (and I can compute it) the number of instantiations required for the rightmost column, then I can just make every column carry that many instances. They don't all need them, but that doesn't matter much. It still works okay. Then I just use repeat(N-3) to construct each column in sequence, with a module that generates the final column's needed number of instances (7, in the case of the 24-bit binary input). This will, in fact, work every time. I have to create a much larger bit array to start with (on the left side), with a lot more 0's in the upper bits. But the process "just works."

I'm looking to work out how to make a tree that already does its own optimization for two reasons:

1. To learn whether and how it can be done, as I may need to know how to do this someday and I'd like to get it out of the way now, educating myself with this highly simplified conceptual idea; and
2. To learn how well the compiler optimizations work, by comparing the brute-force method I already know how to implement against an algorithm that self-optimizes and doesn't depend on later analysis by the compiler. I suspect I'll learn something there that is also important to know.
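The instance counts and the column scheme described above can be checked in software before any Verilog is written. The sketch below is a Python model, not the Abacus code, and every name in it is hypothetical. It assumes that column j (counting from 1 on the left) holds ceil(j/3) dabble cells, which reproduces the counts listed above, and it runs a shift-and-add-3 loop that only adjusts that many digits per column.

```python
def cells_in_column(col):
    """Dabble cells assumed in column col (1-based): ceil(col / 3)."""
    return (col + 2) // 3


def tree_cells(n_bits):
    """Total cells in the tree: sum over the N-3 columns."""
    return sum(cells_in_column(c) for c in range(1, n_bits - 2))


def closed_form_cells(n_bits):
    """The piecewise counts from the post folded into one expression,
    with k = 1 for N in 4..6, k = 2 for N in 7..9, and so on."""
    k = max(0, (n_bits - 1) // 3)
    return (n_bits - 3 * k) * k + 3 * k * (k - 1) // 2


def brute_force_cells(n_bits):
    """Every column carries as many cells as the rightmost column."""
    cols = max(0, n_bits - 3)
    return cols * cells_in_column(cols)


def double_dabble(value, n_bits):
    """Shift-and-add-3 model that touches only the digits the tree
    would have a cell for. Returns packed BCD, one nibble per digit."""
    bcd = 0
    for i in range(n_bits):          # bits enter MSB first
        if i >= 3:                   # columns start before the 4th shift
            for d in range(cells_in_column(i - 2)):
                nibble = (bcd >> (4 * d)) & 0xF
                if nibble >= 5:
                    bcd += 3 << (4 * d)   # cannot carry: 9 + 3 < 16
        bcd = (bcd << 1) | ((value >> (n_bits - 1 - i)) & 1)
    return bcd


if __name__ == "__main__":
    print(tree_cells(24), brute_force_cells(24))
    print(hex(double_dabble(16777215, 24)))
```

Under these assumptions the 24-bit case needs 84 cells in tree form against 147 for the brute-force version (21 columns of 7), which gives a concrete figure to compare against whatever the synthesis tool manages to trim from the brute-force design.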
Again, this is for the future, so that when I write Verilog, I'll know why I'm writing it the way I am and will be better able to defend my choices.

Thanks so much for taking the time to read this, regardless. It is very much appreciated.

Jon

EDIT: I added a few 24-bit binary to 32-bit BCD schematics to illustrate where this is all headed. I'd like to be able to generate any and all of these from a single module, parameterized by N, where N >= 1. I wouldn't mind parameterizing both N and M (where M is the number of output bits), as I can compute M from N in the top level that instantiates the generic module. I'm just curious how I might implement such a generic module, given N and M. I've also added yet another 24-bit binary to 32-bit BCD schematic to illustrate the addition of a backward-pruning method, which recognizes output bits that must be 0 and can therefore generate a further optimized final version. (This final version would be my end goal, after first figuring out how to generate the not-so-optimized tree-structured version that precedes it.)

Edited August 25 by JonK: added the implementation I know how to do with repeat(N-3)