Inverse Transform Sampling method on FPGA

Bobby29999 · April 6, 2020

Hi,

I am working on pseudo random number generation topic. To be precise, Inverse Transform Sampling method , i.e., a method for generating sample numbers at random from any probability distribution given its cumulative distribution function.

The problem that the inverse transform sampling method solves is as follows:

- Let X be a random variable whose distribution can be described by the cumulative distributive function Fx.
- I want to generate values of X which are distributed according to this distribution.

The exact reference of what I would like to implement on FPGA is : https://en.wikipedia.org/wiki/Inverse_transform_sampling#Examples

Questions:

- What would be the mathematical challenges? I mean challenges in context of natural log and then data types like integer, fixed and floating points.

- What should be the right approach? I mean the starting point.

I am looking for some example codes (if any) and helping literature that would help me to understand the granularity of implementation on FPGA.

Looking forward to suggestions. Thanks a lot !

D@n · April 6, 2020

@Bobby29999,

Yea, I saw your post on Xilinx's forums as well.

Some of your question depends upon your timing requirements. How often do you need a new random number? That might determine whether you use a state machine or a deep pipeline.

You'll also need a nearly uniform psudorandom number to start your processing off. That's the easy part. If you need anything more complicated, AES isn't that hard to write in Verilog. I think there's even a core on OpenCores that handles AES, so that should at least get you started with some nice randomness.

For calculating the logarithm, I might recommend using either a lookup table or a lookup table with a linear interpolator between points. I've written about linear interpolators before. I thought about writing out how to build one that used a table for lookup, but the design ended up being so simple there wasn't much to it. Probably not anything worth blogging about. If you wanted, you could count the number of leading zeros in a prior step and calculate an exponent--that would help the log table be more exact, but I'm not sure what you'd then do with the exponent in the next step.

Your biggest challenge will probably be the fixed point representation you will be forced to work within, so that's probably something you want to work from the beginning. What are your *requirements*. Those will likely be determined by your ultimate ADC (if using one) or floating point representation (if you are just building a co-processor). The ultimate bit width you need and scale factor (if any), together with the speed that you need to produce these results, will then drive the rest of your design choices. Don't forget to be aware of overflow along the way.

Once you have a stream of results, you might find generating a histogram and checking it to be a valuable part of knowing if you've done your job correctly.

Good luck!

Dan

P.S. If your goal is just to generate Gaussians, and you only need a small bit width, there are much easier ways to do it

zygot · April 8, 2020

40 minutes ago, Bobby29999 said:

And trust me, these were not homework questions

So, out of curiosity, what is your experience... supposing that you actually need to implement your assignment in and FPGA? You might get responses pointing to literature that is better suited to your needs.

Bobby29999 · April 8, 2020

Hi,

thanks for your valuable feedback. As a newbie I will never take this valuable and nice forum for granted. I respect it. Please excuse my haphazard questions. Most of my questions got answered and I say thanks. And trust me, these were not homework questions. Having said that, you will see , the quality of my questions will improve next time. I can assure you.

zygot · April 8, 2020

17 minutes ago, D@n said:

You might be missing it now

No, late last night I finally reset my reaction to what I thought the question was by carefully reading the original post. While I admit that, to a degree, I share your feeling that "We've been had", that reaction has morphed into a more sympathetic, or at least broader perspective. As I'm prone to being philosophic, and I see a lot to consider here, I choose to leave this post behind, as you do, but for different reasons.

D@n · April 8, 2020

@zygot,

22 hours ago, Bobby29999 said:

Can you please tell me what should be the main design points that have to be considered w.r.t. FPGA?

I missed the question at first. You might be missing it now. @Bobby29999 has no intention of building this design. This is just an academic exercise. Notice his question, "what should be the main design points that have to be considered w.r.t. FPGA"? I think by this point we've already given him his answer too: data rate, precision, data flow, etc. Those are several of the "design points that have to be considered." He's not asking for the solution, but is rather just looking to answer a homework problem regarding the tradespace.

7 minutes ago, zygot said:

This reference about using an FPGA to essentially run C code on a soft-processor in an FPGA kind of demonstrates my point about being careful about where you ask questions.

Be careful about where you ask question ... yep! Someone on a Xilinx forum asked what the easiest way was to dynamically control (either a PLL or an ADC) via its AXI port. One prolific poster replied that the "easiest way" would be to use a MicroBlaze CPU. Not a small AXI-capable state machine, but a full up CPU. A CPU that would then need flash (ROM), RAM or SDRAM memory, software, chances are a serial port and most definitely an AXI interconnect if not also some internal AXI upsizers/downsizers and an AXI to AXI-lite converter. Yeah. That's the easiest way to control a hardware element with an AXI interface? I mean, maybe if you already have the CPU on board then adding another peripheral isn't all that hard, but if you are doing straight RTL design then an FSM would've been the easier way to meet his requirements.

Yeah, be careful about where you ask your questions is definitely good advice.

Dan

zygot · April 8, 2020

4 hours ago, xc6lx45 said:

This leads to a very interesting question is, "why does it not make any sense"? Because it does not use the FPGA efficiently.

This reference about using an FPGA to essentially run C code on a soft-processor in an FPGA kind of demonstrates my point about being careful about where you ask questions. It's not that it's likely that anyone habituating this forum will disagree with what @xc6lx45just posted but our approach to your questions are different. Basically, engineers try to design the simplest, most elegant solution to a problem in the context of cost, time and a few other criteria. But to quote a line from a classic Joe Jackson song, "You can't get what you want until you know what you want". In our case knowing what you want requires starting off with a set of specifications which clearly and completely defines the problem to be solved along with the constraints of cost, time etc..

We don't have that information nor the context of your exercise. There's a hint in what you've provided about the lack of parallelism in common computer architectures. Asking for something faster or more efficient is fine but it isn't a reasonable specification.

While we don't want to do your assignment for you, we also don't want to confuse or introduce specious complexity to your endeavor.

xc6lx45 · April 8, 2020

Playing the devil's advocate, you could put a softcore processor on an FPGA, then recompile the standard C code. Work done in a day (plus a week if you're bringing all the tools up for the first time) but does not make any sense.

This leads to a very interesting question is, "why does it not make any sense"? Because it does not use the FPGA efficiently. This is unfortunately a common problem to educational projects, "just do anything" even if it makes no sense. Which is unfair because providing answers would require more expertise than can be expected from a student. Instructors are lazy.

A possible answer to my own question, "yes but we need 1000x higher data rate than the softcore CPU delivers" or "it may use no more than 500 LUTs and only a single BRAM". An FPGA can do many things but the design effort for what starts to look like a "real" design task will skyrocket, e.g. a pipelined implementation with multiple independent operations in flight at the same time.

A hint, fight laziness with laziness and try to negotiate away any auxiliary tasks like starting value generation (e.g. agree on a hardcoded seed, or have it sent via UART as hex number).
For example, one strategy (that might even make sense in some scenarios e.g. when the CPU is there anyway) would be softcore CPU based with a few FPGA-optimized hardware accelerators for critical functions.

zygot · April 8, 2020

On 4/6/2020 at 5:52 AM, Bobby29999 said:

I am looking for some example codes (if any) and helping literature that would help me to understand the granularity of implementation on FPGA.

OK, I looked at your Hash reference which made me realize that I needed to re-read your original post again at bit more carefully My instinct is to approach questions on this forum as if I were wanting to solve the same problem as posted. You seem to be the exception.

So, addressing the above quote in your original post I'd say that you need to start by understanding what an FPGA does and what resources it has for solving a particular problem. You can find this information by FPGA vendor literature. There's a lot of literature to read. I have no point of reference as to your knowledge and experience in this area. Depending on your experience reading user guides and device overviews might not be very useful as the device resources are just one aspect of understanding how they might be used to solve a problem. You also need to understand, to a degree, design flows, design tools, and in your case the mechanics of expressing logic in an HDL like Verilog or VHDL. I suspect that you are looking in the wrong places for information in order to perform your assignment.

Bobby29999 · April 7, 2020

@zygot

If I may ask you the same question....If my input is the string of any size and I have to use this following Hash function (https://burtleburtle.net/bob/c/lookup2.c ) in order to get a integer seed as an output. Can you please tell me what should be the main design points that have to be considered w.r.t. FPGA? If I have to implement this complete Hash Function on FPGA.

zygot · April 7, 2020

2 hours ago, Bobby29999 said:

Regarding your enlightening point, do you mean why use a pseudo-random generator (Jenkins hash) to generate a different pseudo-random number generator (Inverse Transform Sampling) ?

Yes, that is why I subconsciously misread your original post, and because I've spent too many weeks confined to quarters.

Bobby29999 · April 7, 2020

@D@n

I understand what you mean and I agree.

In order to make my life easier ......If I say, the starting point has to be the 32-bits Jenkins Hash (https://burtleburtle.net/bob/c/lookup2.c) to get the seed for the pseudo-random generator, what should be my design considerations on FPGA for the first part. The first part being 32-but Jenkins Hash to get the seed. The input to the 32-bits Jenkins can be being any arbitrary size string.

@zygot

No you have not hijacked. You had good points. Regarding your enlightening point, do you mean why use a pseudo-random generator (Jenkins hash) to generate a different pseudo-random number generator (Inverse Transform Sampling) ?

zygot · April 6, 2020

@xc6lx45,

I wasn't intending to nit-pick your comment, it just reminded me of interesting corners of tools, like computers, that we make assumptions about every time we use them that aren't always warranted. And, I'm not talking about bugs and badly designed hardware and software components. But still... looking into it seems worth considering as an unscheduled adventure.

Apologies if anyone feels that I've high-jacked the topic. I admit that I mis-read the original post as attempting to generate true random numbers. But perhaps I could be enlightened as to why anyone would use a pseudo-random number generator to generate a different pseudo-random number generator.

xc6lx45 · April 6, 2020

4 hours ago, zygot said:

It's been too long ago but I do remember taking the scenic side journey into investigating performance of floating point on Intel processors. Mostly what I remember is that it was interesting, informing, had unexpected surprises and was a valuable exercise. Just recommending the excursion to anyone interested in 'bit exactness'.

Well OK, what I wrote was maybe not accurate to the final digit ...

What I meant is this

"The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations."

Intel got quite some publicity on their division bug so I'd assume those should work properly (note, not a word about logarithms) but yea, this is not my specialty area

D@n · April 6, 2020

@Bobby29999,

22 minutes ago, Bobby29999 said:

The physical reality is, the "random" value is pseudorandom, and the hash function is based on Richard Jenkin's 32-bit mix function. It takes integer inputs and the output is a random distribution.
https://burtleburtle.net/bob/c/lookup2.c

Ahm, no, that's not a "physical reality". Pseudorandom was an understood input to the algorithm, if you had wanted true randomness that would've been a topic in and of itself. By "physical reality" I'm referencing something physical--something that you can see, touch, and either measure or interact with in the real world--not in the computer algorithm world. The "physical reality" should define your data rate requirement and the number of bits you need both in terms of inputs and outputs. The "physical reality" would also explain why you are using an FPGA in the first place. Is the equivalent software function too slow, for example? Is this part of a larger data processing or simulation algorithm? Is it part of a data co-processing application, where you are processing externally generated files absent of any real time requirement? All of this you haven't shared. So be it, your call.

You also haven't mentioned how you intend to get data into and out of this routine. You might find that to be as challenging as the algorithm itself--especially if you are this new to FPGA processing.

Dan

Bobby29999 · April 6, 2020

@D@n

Thanks for your really helpful tips.

The physical reality is, the "random" value is pseudorandom, and the hash function is based on Richard Jenkin's 32-bit mix function. It takes integer inputs and the output is a random distribution.
https://burtleburtle.net/bob/c/lookup2.c

D@n · April 6, 2020

@Bobby29999,

3 minutes ago, Bobby29999 said:

"Some of your question depends upon your timing requirements. How often do you need a new random number? That might determine whether you use a state machine or a deep pipeline."

Very interesting point. TBH, I did not think in this dimension (timing..should have thought). Can't answer it exactly but this I know that the random value portion of the code is a slow part. The random function is based on hashing. It takes integer inputs and the output is a random distribution....but is repeatable....Thus pseudo random.
If its slow, should I go for a deep pipeline?

It depends on your algorithm. Which algorithm you choose depends upon your timing requirements, and the bit-level precision you require. Usually some physical reality is driving these things, but you haven't (yet) shared any such reality with us. Perhaps this is a homework problem (that's okay too ... I'm not writing your solution for you). But the "right" answer is really dependent upon these external realities.

6 minutes ago, Bobby29999 said:

"You'll also need a nearly uniform psudorandom number to start your processing off. "

Do I still need Uniform Pseudorandom number when in my case the mapper uses Exponential Distribution? I mean the algorithm I am working on says it computes exponential random variable using inversion method.

Of course! Otherwise what input are you going to use for your mapper? Or have I misinterpreted your problem statement?

Dan

zygot · April 6, 2020

16 minutes ago, xc6lx45 said:

For example, IEEE 754 guarantees bit-exactness in many cases

It's been too long ago but I do remember taking the scenic side journey into investigating performance of floating point on Intel processors. Mostly what I remember is that it was interesting, informing, had unexpected surprises and was a valuable exercise. Just recommending the excursion to anyone interested in 'bit exactness'.

Bobby29999 · April 6, 2020

@D@n

"Some of your question depends upon your timing requirements. How often do you need a new random number? That might determine whether you use a state machine or a deep pipeline."

Very interesting point. TBH, I did not think in this dimension (timing..should have thought). Can't answer it exactly but this I know that the random value portion of the code is a slow part. The random function is based on hashing. It takes integer inputs and the output is a random distribution....but is repeatable....Thus pseudo random.
If its slow, should I go for a deep pipeline?

"You'll also need a nearly uniform psudorandom number to start your processing off. "

Do I still need Uniform Pseudorandom number when in my case the mapper uses Exponential Distribution? I mean the algorithm I am working on says it computes exponential random variable using inversion method.

xc6lx45 · April 6, 2020

I think you need to think carefully about the quality of your math functions. For example, IEEE 754 guarantees bit-exactness in many cases but on an FPGA you usually cut corners where you can since you're stuck on reprogrammable logic technology instead of ASIC / hard macros. The low bits rarely matter with one-size-fits-all floats (or "doubles" if one-size-fits-all-floats is too small size ? ) but from a cryptographic point of view (where also the statistics of an error matter), this could be a killer.

D@n · April 6, 2020

@zygot,

The table method actually works quite well across a wide range of numbers--if you start with a floating point representation of 1.xxxxx * 2^e. The xxxxx makes a great ttable lookup index that then works across a wide range of values. For better precision, use a linear table. For better precision, use a quadratic. For better ... did you notice I was already walking down the Taylor series here? The cool part is that you can easily bound the error in the lookup--provided the number starts with the right form.

@Bobby29999,

A bit count of some number of random binary bits will be approximately Gaussian by the binomial distribution thing. One of my mentors suggested 12 bits was a good choice. Be aware of scale issues, so that you can accurately match the mean and standard distribution you are attempting to achieve.

An even simpler solution, for 16-bits (or less) of pseudorandomness would be a lookup table: place the inverse transform itself into the lookup table, and just use your uniform pseudorandom value as the index into such a table. This would spare you the pain of trying to calculate a logarithm--you could just go straight to the solution in one clock cycle. Using this approach, though, you won't hit every value of 16.16 with an appropriate probability. You'll hit a subset of those values with a uniform probability, only the subset won't be evenly distributed across its range. How big your table would need to be is a measure of both your requirements and how much hardware you can afford. Again, linearly interpolating the table might give you more capability--but I wouldn't jump for it unless the problem required it.

Now, coming back to your bit width, 64-bit math is overkill unless you are trying to do something like count the number of atoms in the known universe, or measure the circumference of the known universe to within one hydrogen atom. Let's be honest and real here, what bit-width do you really need? Be aware in your answer that the more width you want, and the faster you want it, the more you will pay for it. Pay? Yes. Your hardware will only support so much logic, and I'm guessing you want to do other things with it as well. At some level, you'd need to build your own ASIC, but at another level you could do the problem in a cheap PIC microcontroller. Somewhere in between your PC would do a better job than the FPGA. Your FPGA is likely to be faster than the PC, but you will also struggle to get that 64-bit double-precision floating point math on the FPGA. It's doable, but you'll pay for it. Are you sure that's what you want?

Dan

Bobby29999 · April 6, 2020

51 minutes ago, D@n said:

@Bobby29999,

Thanks Dan for the comprehensive reply. This is the kind of information I needed....very helpful ! ?

You wrote: ""P.S. If your goal is just to generate Gaussians, and you only need a small bit width, there are much easier ways to do it"

- In my case the problem statement says that binomial distribution is approximately Gaussian. Can you please tell me the reason why generating Gaussians would be easy?
- The bit-width is: div64_s64(s64 dividend, s64 divisor)...dividend being the natural log.....and the divisor is 16.16 fixed-point weight.

zygot · April 6, 2020

Getting decent dynamic range for converting numbers to a natural log using lookup tables is problematic. I've done pipelined log generators for 32-bit integers using a combination of lookup and Taylor Series having about 15+ bits of fractional precision. The Taylor series is nice in that it's trivial to get a log to any base like 2, e, or 10 from any other base. Fixed point binary math is pretty much a matter of bookkeeping. If you can constrain your log values things are easier. As to verifying the 'randomness' of your end results I'll leave that to you and whoever reviews your work. Generating truly random numbers digitally has been tried many times by many people. I suppose someone can figure it out.

Inverse Transform Sampling method on FPGA

Question

Link to comment

Share on other sites

22 answers to this question

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Archived