lemire / fastrange Goto Github PK

A fast alternative to the modulo reduction

License: Apache License 2.0

C 73.99% Makefile 5.55% C++ 20.46%

modulo-reduction range interval algorithm

fastrange's Introduction

fastrange: A fast alternative to the modulo reduction

A common operation in software is to take a machine word and map it to an integer value in a range [0,p) as fairly as possible. That is, you want that if all values of the machine word are equally likely then, as much as possible, all integer values in [0,p) are (almost) equally likely. This is common in hashing and probabilistic algorithms.

In computing, the modulo operation finds the remainder after division of one number by another (called the modulus of the operation). Given two positive numbers, a and n, a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n, where a is the dividend and n is the divisor.

The standard approach to this problem is the modulo reduction (x % p). Though a modulo reduction works fine and has several nice properties, it can be slow even on recent processors because it relies on an integer division.

Thankfully, there is a faster way: we can replace the modulo by a multiplication followed by a shift.

It has accelerated some operations in Google's Tensorflow by 10% to 20%.

Further reading : http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

Pre-conditions

For this code to give the desired result, the provided words should span the whole range (e.g., all 32-bit integers). The C rand function does not meet this requirement. If you must use the rand function, wrap it like so:

uint32_t get32rand() {
    return (rand() ^ (rand() << 15) ^ (rand() << 30));
}

uint64_t get64rand() {
    return (((uint64_t)get32rand()) << 32) | get32rand();
}

However, the underlying idea is general: we only require that the word values span an interval of the form [0,1<<L). It suffices then to do ( x * range ) >> L to get a fair map from [0,1<<L) to [0,range). For example, if your word values span the interval [0,65536), then you could simply do ( x * range ) >> 16.

Unbiased range functions

To generate an unbiased random number in a range, you can use an extension of this approach:

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit =  random32(); //32-bit random number 
  multiresult = random32bit * range;
  leftover = (uint32_t) multiresult;
  if(leftover < range ) {
      threshold = -range % range ;
      while (leftover < threshold) {
            random32bit =  random32();
            multiresult = random32bit * range;
            leftover = (uint32_t) multiresult;
      }
   }
  return multiresult >> 32;
}

See http://lemire.me/blog/2016/06/30/fast-random-shuffling/

fastrange's People

Contributors

Stargazers

Watchers

Forkers

tchen0123 trianglespct longjohncoder shadercoder dnbaker evilflora dendisuhubdy tigersean kramelec moneytech 5l1v3r1 qrabbani clayne hwill-ship sugarbliss gerhobbelt

fastrange's Issues

fastrangesize/fastrange64 fails to randomly select elements from half of range

Hi!

This is very exciting work.

I was using fastrangesize to randomly select a minibatch of datapoints for a stochastic gradient descent algorithm, but I found that when the minibatch size was close to the number of possible samples, it would get stuck in an infinite loop.

Removed from that code in a simpler, reproducible test case:

#include "fastrange/fastrange.h"
#include <unordered_set>
#include <iostream>

int main(void) {
    std::unordered_set<int> set;
    const size_t max(400);
    size_t nsteps(0);
    while(set.size() < 256) {
        set.insert((((uint64_t)rand() << 32) | rand()) % max);
        ++nsteps;
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
    set.clear();
    while(set.size() < 256) {
        set.insert(fastrangesize((((uint64_t)rand() << 32) | rand()), max));
        if(++nsteps % 1000 == 0)
            std::cerr << "Number of steps so far: " << nsteps
                           << ". Size of set: " << set.size() << '\n';
        
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
}

My output is:

Number of steps: 408
Number of steps so far: 1000. Size of set: 191
Number of steps so far: 2000. Size of set: 200
Number of steps so far: 3000. Size of set: 200
Number of steps so far: 4000. Size of set: 200
Number of steps so far: 5000. Size of set: 200
Number of steps so far: 6000. Size of set: 200
Number of steps so far: 7000. Size of set: 200
Number of steps so far: 8000. Size of set: 200
Number of steps so far: 9000. Size of set: 200
Number of steps so far: 10000. Size of set: 200
Number of steps so far: 11000. Size of set: 200
Number of steps so far: 12000. Size of set: 200
Number of steps so far: 13000. Size of set: 200
Number of steps so far: 14000. Size of set: 200
Number of steps so far: 15000. Size of set: 200
Number of steps so far: 16000. Size of set: 200
Number of steps so far: 17000. Size of set: 200
Number of steps so far: 18000. Size of set: 200
Number of steps so far: 19000. Size of set: 200
<and so on for millions of steps>

When I add a short-circuit (as below)

#include "fastrange/fastrange.h"
#include <unordered_set>
#include <iostream>

int main(void) {
    std::unordered_set<int> set;
    const size_t max(400);
    size_t nsteps(0);
    while(set.size() < 256) {
        set.insert((((uint64_t)rand() << 32) | rand()) % max);
        ++nsteps;
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
    set.clear();
    while(set.size() < 256) {
        set.insert(fastrangesize((((uint64_t)rand() << 32) | rand()), max));
        if(++nsteps % 1000 == 0)
            std::cerr << "Number of steps so far: " << nsteps
                      << ". Size of set: " << set.size() << '\n';
        if(nsteps > 1000000) break;
    }
    if(set.size() == 256) std::cerr << "Number of steps: " << nsteps << '\n';
    else {
        for(const auto el: set) std::cerr << "Element: " << el << '\n';
    }
}

When I check with Python vals = set(int(line.split()[1]) for line in open("output.txt") if "Element" in line), the values in the set are precisely [0, 199].

I haven't tested with a 32-bit version, but it seems that something is off.

"-range % range" calculation is nonsensical

A number modulo itself is always zero; under a floating point calculation this always produces negative zero.

In the Go implementation:

thresh := uint32(-n) % uint32(n)

Again, this will always be zero, right?

Have you considered reversing the bit-sequence of the input value 'word'?

Have you considered reversing the bit-sequence of the input value 'word'?
In Java there is a nice function Integer.reverse(int):int which reverses the bit-sequence of an int.
This gives a better distribution for small integer values, which are more common.

public class FastRange {
private static int reverseFastRange(int word, int p) {
return (int) (((0xffffffffL & Integer.reverse(word)) * p) >>> 32);
}
}

Maybe simply right shift multiresult that's it

why not simply

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit =  random32(); //32-bit random number 
  multiresult = random32bit * range;
  return multiresult >> 32;
}

The result will always be [0,range) while the likely hood for every integer is equal

Doing fastrange64 with 2x fastrange32?

Couldn't you compute fast range in 64 bits by doing the 32-bit computation twice in parallel? Do one fastrange32 for the least significant 32 bits and one for the most significant 32 bits. Seems like it would exploit parallelism (or could even use vector operations on some chips) and be faster than the 64-bit code particularly if there is no native int128 support.

Am I right?

Edit: the result would be different but I can't think of why it wouldn't be fair.

lemire / fastrange Goto Github PK

fastrange's Introduction

fastrange: A fast alternative to the modulo reduction

Pre-conditions

Unbiased range functions

fastrange's People

Contributors

Stargazers

Watchers

Forkers

fastrange's Issues

fastrangesize/fastrange64 fails to randomly select elements from half of range

"-range % range" calculation is nonsensical

Have you considered reversing the bit-sequence of the input value 'word'?

Maybe simply right shift multiresult that's it

Doing fastrange64 with 2x fastrange32?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent