Git Product home page Git Product logo

fastrange's Introduction

fastrange: A fast alternative to the modulo reduction

A common operation in software is to take a machine word and map it to an integer value in a range [0,p) as fairly as possible. That is, you want that if all values of the machine word are equally likely then, as much as possible, all integer values in [0,p) are (almost) equally likely. This is common in hashing and probabilistic algorithms.

In computing, the modulo operation finds the remainder after division of one number by another (called the modulus of the operation). Given two positive numbers, a and n, a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n, where a is the dividend and n is the divisor.

The standard approach to this problem is the modulo reduction (x % p). Though a modulo reduction works fine and has several nice properties, it can be slow even on recent processors because it relies on an integer division.

Thankfully, there is a faster way: we can replace the modulo by a multiplication followed by a shift.

It has accelerated some operations in Google's Tensorflow by 10% to 20%.

Further reading : http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

See also:

This library provides a single portable header file that you should be able to just drop in your C/C++ projects. The API is simple:

#include "fastrange"

// given a value word, produces an integer in [0,p) without division
uint32_t fastrange32(uint32_t word, uint32_t p);
uint64_t fastrange64(uint64_t word, uint64_t p);
size_t fastrangesize(size_t word, size_t p);
int fastrangeint(int word, int p);

Pre-conditions

For this code to give the desired result, the provided words should span the whole range (e.g., all 32-bit integers). The C rand function does not meet this requirement. If you must use the rand function, wrap it like so:

uint32_t get32rand() {
    return (rand() ^ (rand() << 15) ^ (rand() << 30));
}

uint64_t get64rand() {
    return (((uint64_t)get32rand()) << 32) | get32rand();
}

However, the underlying idea is general: we only require that the word values span an interval of the form [0,1<<L). It suffices then to do ( x * range ) >> L to get a fair map from [0,1<<L) to [0,range). For example, if your word values span the interval [0,65536), then you could simply do ( x * range ) >> 16.

Unbiased range functions

To generate an unbiased random number in a range, you can use an extension of this approach:

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit =  random32(); //32-bit random number 
  multiresult = random32bit * range;
  leftover = (uint32_t) multiresult;
  if(leftover < range ) {
      threshold = -range % range ;
      while (leftover < threshold) {
            random32bit =  random32();
            multiresult = random32bit * range;
            leftover = (uint32_t) multiresult;
      }
   }
  return multiresult >> 32;
}

See http://lemire.me/blog/2016/06/30/fast-random-shuffling/

fastrange's People

Contributors

aditya11raj avatar lemire avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fastrange's Issues

fastrangesize/fastrange64 fails to randomly select elements from half of range

Hi!

This is very exciting work.

I was using fastrangesize to randomly select a minibatch of datapoints for a stochastic gradient descent algorithm, but I found that when the minibatch size was close to the number of possible samples, it would get stuck in an infinite loop.

Removed from that code in a simpler, reproducible test case:

#include "fastrange/fastrange.h"
#include <unordered_set>
#include <iostream>

int main(void) {
    std::unordered_set<int> set;
    const size_t max(400);
    size_t nsteps(0);
    while(set.size() < 256) {
        set.insert((((uint64_t)rand() << 32) | rand()) % max);
        ++nsteps;
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
    set.clear();
    while(set.size() < 256) {
        set.insert(fastrangesize((((uint64_t)rand() << 32) | rand()), max));
        if(++nsteps % 1000 == 0)
            std::cerr << "Number of steps so far: " << nsteps
                           << ". Size of set: " << set.size() << '\n';
        
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
}

My output is:

Number of steps: 408
Number of steps so far: 1000. Size of set: 191
Number of steps so far: 2000. Size of set: 200
Number of steps so far: 3000. Size of set: 200
Number of steps so far: 4000. Size of set: 200
Number of steps so far: 5000. Size of set: 200
Number of steps so far: 6000. Size of set: 200
Number of steps so far: 7000. Size of set: 200
Number of steps so far: 8000. Size of set: 200
Number of steps so far: 9000. Size of set: 200
Number of steps so far: 10000. Size of set: 200
Number of steps so far: 11000. Size of set: 200
Number of steps so far: 12000. Size of set: 200
Number of steps so far: 13000. Size of set: 200
Number of steps so far: 14000. Size of set: 200
Number of steps so far: 15000. Size of set: 200
Number of steps so far: 16000. Size of set: 200
Number of steps so far: 17000. Size of set: 200
Number of steps so far: 18000. Size of set: 200
Number of steps so far: 19000. Size of set: 200
<and so on for millions of steps>

When I add a short-circuit (as below)

#include "fastrange/fastrange.h"
#include <unordered_set>
#include <iostream>

int main(void) {
    std::unordered_set<int> set;
    const size_t max(400);
    size_t nsteps(0);
    while(set.size() < 256) {
        set.insert((((uint64_t)rand() << 32) | rand()) % max);
        ++nsteps;
    }
    std::cerr << "Number of steps: " << nsteps << '\n';
    set.clear();
    while(set.size() < 256) {
        set.insert(fastrangesize((((uint64_t)rand() << 32) | rand()), max));
        if(++nsteps % 1000 == 0)
            std::cerr << "Number of steps so far: " << nsteps
                      << ". Size of set: " << set.size() << '\n';
        if(nsteps > 1000000) break;
    }
    if(set.size() == 256) std::cerr << "Number of steps: " << nsteps << '\n';
    else {
        for(const auto el: set) std::cerr << "Element: " << el << '\n';
    }
}

When I check with Python vals = set(int(line.split()[1]) for line in open("output.txt") if "Element" in line), the values in the set are precisely [0, 199].

I haven't tested with a 32-bit version, but it seems that something is off.

"-range % range" calculation is nonsensical

A number modulo itself is always zero; under a floating point calculation this always produces negative zero.

In the Go implementation:

thresh := uint32(-n) % uint32(n)

Again, this will always be zero, right?

Have you considered reversing the bit-sequence of the input value 'word'?

Have you considered reversing the bit-sequence of the input value 'word'?
In Java there is a nice function Integer.reverse(int):int which reverses the bit-sequence of an int.
This gives a better distribution for small integer values, which are more common.

public class FastRange {
private static int reverseFastRange(int word, int p) {
return (int) (((0xffffffffL & Integer.reverse(word)) * p) >>> 32);
}
}

Maybe simply right shift multiresult that's it

why not simply

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit =  random32(); //32-bit random number 
  multiresult = random32bit * range;
  return multiresult >> 32;
}

The result will always be [0,range) while the likely hood for every integer is equal

Doing fastrange64 with 2x fastrange32?

Couldn't you compute fast range in 64 bits by doing the 32-bit computation twice in parallel? Do one fastrange32 for the least significant 32 bits and one for the most significant 32 bits. Seems like it would exploit parallelism (or could even use vector operations on some chips) and be faster than the 64-bit code particularly if there is no native int128 support.

Am I right?

Edit: the result would be different but I can't think of why it wouldn't be fair.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.