I peek at the code pcg32_random_bounded_divisionless(). Why variable 'leftover' ab

Why variable 'leftover' able to reject unbiased randoms ? </blockquot

Ok. So you get a biased return... <div class="snippet-clipboard-content notranslat

How pcg32_random_bounded_divisionless() work ? about fastrand HOT 23 CLOSED

lemire commented on September 2, 2024

How pcg32_random_bounded_divisionless() work ?

from fastrand.

Comments (23)

achan001 commented on September 2, 2024

Why variable 'leftover' able to reject unbiased randoms ?

Uh, I meant to reject biased randoms

random32bit =  pcg32_random(); 
multiresult = random32bit * range; 
leftover = (uint32_t) multiresult; 
return multiresult >> 32;             // [0, range)

Returned result look just like your modulo-reduction trick.

As you mentioned in the blog, above code work very well, especially for small range.
So, the question is how much the added sample rejection code contributed.

variable 'leftover' seems like a random number (MCG) to me ...

from fastrand.

lemire commented on September 2, 2024

So, the question is how much the added sample rejection code contributed.

The question we ask is "given that we have a perfect source of random numbers, can you prove mathematically that the ranged result is unbiased". There is no "how much" question in this context. Results are either biased or unbiased.

A separate question is whether the bias matters. Well. That depends, doesn't it? Some applications might be super sensitive to biases, others might not care.

from fastrand.

achan001 commented on September 2, 2024

So, the question is how much the added sample rejection code contributed

I phrase it that way because the code will generate bias, even with true random source.
Even your comment for pcg32_random_bounded_divisionless() say so. (shuffle.c, line 55)

// map random value to [0,range) with slight bias, redraws to avoid bias if needed
static inline uint32_t pcg32_random_bounded_divisionless(uint32_t range) {
...

So, the code will return random numbers with slight bias.
Sample rejection code contributed to the benefit of redrawing.

I don't follow the criteria of redrawing, and how much it contribute.
The code implied that is a line between acceptable and unacceptable bias
It is unclear where the line is ...

from fastrand.

achan001 commented on September 2, 2024

I made two programs to show that the code seems to work.
This just fills the slots using modulo-reduction trick

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>       

int main(int argc, char **argv) 
{
  unsigned range = strtoul(argv[1], NULL, 10);
  int slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  unsigned rand = 0;
  unsigned long long rand64 = 0;    
  do {
    slot[(int)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

This showed what got rejected

#include <stdio.h>              // ranged_reject.c
#include <stdlib.h>       
#define SCALED(x, n)   (unsigned) (((unsigned long long) x * n) >> 32)

int main(int argc, char **argv)
{
  unsigned range = strtoul(argv[1], NULL, 10);
  unsigned threshold = -range % range;
  unsigned rand = 0, leftover = 0;
  printf("leftover < (threshold = %u):\n", threshold);  
  do {
    if (leftover < threshold) 
      printf("0x%08x --> %u\n", rand, SCALED(rand, range));
    leftover += range;          // leftover = (rand * range) & bit32  
  } while (++rand);             // all 32-bits randoms
  return 0;
}

For range = 6, all slots (after sample rejection) are equally likely

C:\tmp>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

C:\tmps> ranged_reject 6
leftover < (threshold = 4):
0x00000000 --> 0
0x2aaaaaab --> 1
0x80000000 --> 3
0xaaaaaaab --> 4

from fastrand.

lemire commented on September 2, 2024

Given your last comment, I think I now understand your point.

from fastrand.

lemire commented on September 2, 2024

Ok. So you get a biased return...

>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

Some values are more likely than others. As I point out in my blog post, the difference is at most one between frequencies, so for small ranges, it is a "slight" bias.

Try this program (with rejection)...

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>

int main(int argc, char **argv)
{
  uint32_t range = strtoul(argv[1], NULL, 10);
  uint32_t slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  uint32_t rand = 0;
  uint64_t rand64 = 0;
  uint32_t threshold = -range % range;
  do {
    if((rand64 & 0xFFFFFFFF) >= threshold) slot[(uint32_t)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

We get an unbiased result...

$ ./ranged_slot 6
slot[0] = 715827882
slot[1] = 715827882
slot[2] = 715827882
slot[3] = 715827882
slot[4] = 715827882
slot[5] = 715827882

You seem to want me to tell you when a bias is ok. I don't have an answer to this question.

I'll close this, reopen if I have not answered to your satisfaction.

from fastrand.

achan001 commented on September 2, 2024

With your revised slots + rejection program, 6 slots are all equaly likely.

Is it always true ? (all 32-bits range)

Brute force to prove it always true probably take too long.
Besides, it will not explain WHY and HOW it work.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

But, why the rejected cases happened to be the slots with excess counts ?
It probably is not an accident.

Explanation is still welcome.

from fastrand.

lemire commented on September 2, 2024

With your revised slots + rejection program, 6 slots are all equaly likely.

Yes.

Is it always true ? (all 32-bits range)

Yes.

Brute force to prove it always true probably take too long.

Definitively.

Besides, it will not explain WHY and HOW it work.

Indeed. I am working on a short paper right now. It is not yet ready. I will publish it soon.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

Yes. All these algorithms rely on the same math.

But, why the rejected cases happened to be the slots with excess counts ? It probably is not an accident.

Indeed, that is no accident. It is the desired result.

Explanation is still welcome.

Email me at [email protected] and I will share the draft of my paper.

from fastrand.

achan001 commented on September 2, 2024

// map random value to [0,range) with slight bias, redraws to avoid bias if needed

You might consider changing above comment (split.c, line 55)
I had mis-read it to imply the code accept slight bias, redraw if bias unacceptable

This might be better:

// map random value to [0,range) without bias (by redraws if needed)

from fastrand.

lemire commented on September 2, 2024

You are correct. Done.

from fastrand.

achan001 commented on September 2, 2024

I think I can visualize why the code removed all bias. (Note: not a proof)

After threshold values are rejected, remaining values are divisible by range.
So, if reject correctly picked threshold values, the result have no bias.

The reason it reject the slot with excess count is because all rejected cases
are all very very close to edge of 2 slots. It almost seems as if it could fall to
the wrong slots. Edge cases just barely able to squeeze into that particular slot.

So, if we have to reject anything, pick the edge cases.

from fastrand.

lemire commented on September 2, 2024

After threshold values are rejected, remaining values are divisible by range.

That's the gist of the proof.

from fastrand.

achan001 commented on September 2, 2024

Thanks.

Since possible values are evenly spaced, the slot with the edge case
allow most room for other values.

Uh, I think I just proved it ...

from fastrand.

lemire commented on September 2, 2024

I think you did, indeed.

from fastrand.

achan001 commented on September 2, 2024

To confirm my edge case idea valid, I tried top edge instead of bottom

uint32_t unbiased = ~(-range % range);    // unbiased = ~threshold
while (leftover > unbiased) ...           // rejected same slots as leftover < threshold

For unknown reason, top edge rejection sometimes run faster

range_reject.c (post 5) , range = 6:
speedup = 5.393 sec / 3.595 sec = 1.501X, or 50% faster

from fastrand.

lemire commented on September 2, 2024

It seems unlikely that you'd see a 50% speed difference. I will start with the usual warning: never benchmark on a laptop... always benchmark on a server configured for testing.

I cannot reproduce your speed difference. Try this gist:

https://gist.github.com/lemire/0ead15045a4c174799338b231bacf199

$ gcc -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.799 cycles per operation (best) 	4.875 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.790 cycles per operation (best) 	4.799 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.798 cycles per operation (best) 	4.874 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.793 cycles per operation (best) 	4.799 cycles per operation (avg)

$ clang -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.051 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)

from fastrand.

achan001 commented on September 2, 2024

ranged_reject.c (post 5) does not have a PRNG (not even slots!)

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

My goal is to confirm top edge rejected the same slots ... it does !
The speedup is just a curious observation (w/ strawberryperl.com supplied gcc 7.1.0)

from fastrand.

achan001 commented on September 2, 2024

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Pre-screened cases must include all the top edge cases.
It should be like this:

if (leftover > ~range) ...     // pre-screening

If range > 0, above can be optimized a bit more:

if (leftover > -range) ...     // pre-screening, range > 0

from fastrand.

lemire commented on September 2, 2024

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Right. It still does not affect the running speed.

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

Makes sense!

from fastrand.

achan001 commented on September 2, 2024

You might consider adding this top edge case to your unfinished draft.
It showed that the threshold rejection test is just picking the edge cases.

Which edge cases to pick does not matter (as long as it is the same side)

from fastrand.

lemire commented on September 2, 2024

@achan001 Mathematically, this needs to be pointed out, but there is no reason to have two distinct implementations unless one has benefits, somehow.

from fastrand.

lemire commented on September 2, 2024

Sorry for the delay, I posted the reference tonight:

Daniel Lemire, Fast Random Integer Generation in an Interval, ACM Transactions on Modeling and Computer Simulation (to appear)

from fastrand.

achan001 commented on September 2, 2024

Thank you.

from fastrand.

How pcg32_random_bounded_divisionless() work ? about fastrand HOT 23 CLOSED

Comments (23)

Related Issues (17)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent