Git Product home page Git Product logo

Comments (23)

achan001 avatar achan001 commented on September 2, 2024

Why variable 'leftover' able to reject unbiased randoms ?

Uh, I meant to reject biased randoms

random32bit =  pcg32_random(); 
multiresult = random32bit * range; 
leftover = (uint32_t) multiresult; 
return multiresult >> 32;             // [0, range)

Returned result look just like your modulo-reduction trick.

As you mentioned in the blog, above code work very well, especially for small range.
So, the question is how much the added sample rejection code contributed.

variable 'leftover' seems like a random number (MCG) to me ...

from fastrand.

lemire avatar lemire commented on September 2, 2024

So, the question is how much the added sample rejection code contributed.

The question we ask is "given that we have a perfect source of random numbers, can you prove mathematically that the ranged result is unbiased". There is no "how much" question in this context. Results are either biased or unbiased.

A separate question is whether the bias matters. Well. That depends, doesn't it? Some applications might be super sensitive to biases, others might not care.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

So, the question is how much the added sample rejection code contributed

I phrase it that way because the code will generate bias, even with true random source.
Even your comment for pcg32_random_bounded_divisionless() say so. (shuffle.c, line 55)

// map random value to [0,range) with slight bias, redraws to avoid bias if needed
static inline uint32_t pcg32_random_bounded_divisionless(uint32_t range) {
...

So, the code will return random numbers with slight bias.
Sample rejection code contributed to the benefit of redrawing.

I don't follow the criteria of redrawing, and how much it contribute.
The code implied that is a line between acceptable and unacceptable bias
It is unclear where the line is ...

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

I made two programs to show that the code seems to work.
This just fills the slots using modulo-reduction trick

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>       

int main(int argc, char **argv) 
{
  unsigned range = strtoul(argv[1], NULL, 10);
  int slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  unsigned rand = 0;
  unsigned long long rand64 = 0;    
  do {
    slot[(int)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

This showed what got rejected

#include <stdio.h>              // ranged_reject.c
#include <stdlib.h>       
#define SCALED(x, n)   (unsigned) (((unsigned long long) x * n) >> 32)

int main(int argc, char **argv)
{
  unsigned range = strtoul(argv[1], NULL, 10);
  unsigned threshold = -range % range;
  unsigned rand = 0, leftover = 0;
  printf("leftover < (threshold = %u):\n", threshold);  
  do {
    if (leftover < threshold) 
      printf("0x%08x --> %u\n", rand, SCALED(rand, range));
    leftover += range;          // leftover = (rand * range) & bit32  
  } while (++rand);             // all 32-bits randoms
  return 0;
}

For range = 6, all slots (after sample rejection) are equally likely

C:\tmp>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

C:\tmps> ranged_reject 6
leftover < (threshold = 4):
0x00000000 --> 0
0x2aaaaaab --> 1
0x80000000 --> 3
0xaaaaaaab --> 4

from fastrand.

lemire avatar lemire commented on September 2, 2024

Given your last comment, I think I now understand your point.

from fastrand.

lemire avatar lemire commented on September 2, 2024

Ok. So you get a biased return...

>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

Some values are more likely than others. As I point out in my blog post, the difference is at most one between frequencies, so for small ranges, it is a "slight" bias.

Try this program (with rejection)...

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>

int main(int argc, char **argv)
{
  uint32_t range = strtoul(argv[1], NULL, 10);
  uint32_t slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  uint32_t rand = 0;
  uint64_t rand64 = 0;
  uint32_t threshold = -range % range;
  do {
    if((rand64 & 0xFFFFFFFF) >= threshold) slot[(uint32_t)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

We get an unbiased result...

$ ./ranged_slot 6
slot[0] = 715827882
slot[1] = 715827882
slot[2] = 715827882
slot[3] = 715827882
slot[4] = 715827882
slot[5] = 715827882

You seem to want me to tell you when a bias is ok. I don't have an answer to this question.

I'll close this, reopen if I have not answered to your satisfaction.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

With your revised slots + rejection program, 6 slots are all equaly likely.

Is it always true ? (all 32-bits range)

Brute force to prove it always true probably take too long.
Besides, it will not explain WHY and HOW it work.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

But, why the rejected cases happened to be the slots with excess counts ?
It probably is not an accident.

Explanation is still welcome.

from fastrand.

lemire avatar lemire commented on September 2, 2024

With your revised slots + rejection program, 6 slots are all equaly likely.

Yes.

Is it always true ? (all 32-bits range)

Yes.

Brute force to prove it always true probably take too long.

Definitively.

Besides, it will not explain WHY and HOW it work.

Indeed. I am working on a short paper right now. It is not yet ready. I will publish it soon.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

Yes. All these algorithms rely on the same math.

But, why the rejected cases happened to be the slots with excess counts ? It probably is not an accident.

Indeed, that is no accident. It is the desired result.

Explanation is still welcome.

Email me at [email protected] and I will share the draft of my paper.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

// map random value to [0,range) with slight bias, redraws to avoid bias if needed

You might consider changing above comment (split.c, line 55)
I had mis-read it to imply the code accept slight bias, redraw if bias unacceptable

This might be better:

// map random value to [0,range) without bias (by redraws if needed)

from fastrand.

lemire avatar lemire commented on September 2, 2024

You are correct. Done.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

I think I can visualize why the code removed all bias. (Note: not a proof)

After threshold values are rejected, remaining values are divisible by range.
So, if reject correctly picked threshold values, the result have no bias.

The reason it reject the slot with excess count is because all rejected cases
are all very very close to edge of 2 slots. It almost seems as if it could fall to
the wrong slots. Edge cases just barely able to squeeze into that particular slot.

So, if we have to reject anything, pick the edge cases.

from fastrand.

lemire avatar lemire commented on September 2, 2024

After threshold values are rejected, remaining values are divisible by range.

That's the gist of the proof.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

Thanks.

Since possible values are evenly spaced, the slot with the edge case
allow most room for other values.

Uh, I think I just proved it ...

from fastrand.

lemire avatar lemire commented on September 2, 2024

I think you did, indeed.

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

To confirm my edge case idea valid, I tried top edge instead of bottom

uint32_t unbiased = ~(-range % range);    // unbiased = ~threshold
while (leftover > unbiased) ...           // rejected same slots as leftover < threshold

For unknown reason, top edge rejection sometimes run faster

range_reject.c (post 5) , range = 6:
speedup = 5.393 sec / 3.595 sec = 1.501X, or 50% faster

from fastrand.

lemire avatar lemire commented on September 2, 2024

It seems unlikely that you'd see a 50% speed difference. I will start with the usual warning: never benchmark on a laptop... always benchmark on a server configured for testing.

I cannot reproduce your speed difference. Try this gist:

https://gist.github.com/lemire/0ead15045a4c174799338b231bacf199

$ gcc -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.799 cycles per operation (best) 	4.875 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.790 cycles per operation (best) 	4.799 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.798 cycles per operation (best) 	4.874 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.793 cycles per operation (best) 	4.799 cycles per operation (avg)
$ clang -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.051 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

ranged_reject.c (post 5) does not have a PRNG (not even slots!)

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

My goal is to confirm top edge rejected the same slots ... it does !
The speedup is just a curious observation (w/ strawberryperl.com supplied gcc 7.1.0)

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Pre-screened cases must include all the top edge cases.
It should be like this:

if (leftover > ~range) ...     // pre-screening

If range > 0, above can be optimized a bit more:

if (leftover > -range) ...     // pre-screening, range > 0

from fastrand.

lemire avatar lemire commented on September 2, 2024

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Right. It still does not affect the running speed.

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

Makes sense!

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

You might consider adding this top edge case to your unfinished draft.
It showed that the threshold rejection test is just picking the edge cases.

Which edge cases to pick does not matter (as long as it is the same side)

from fastrand.

lemire avatar lemire commented on September 2, 2024

@achan001 Mathematically, this needs to be pointed out, but there is no reason to have two distinct implementations unless one has benefits, somehow.

from fastrand.

lemire avatar lemire commented on September 2, 2024

Sorry for the delay, I posted the reference tonight:

from fastrand.

achan001 avatar achan001 commented on September 2, 2024

Thank you.

from fastrand.

Related Issues (17)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.