Git Product home page Git Product logo

Comments (15)

sunoru avatar sunoru commented on August 11, 2024 1

Looks great! Thank you very much for suggesting this method to sample from all floats in [0,1).

I made some comments in your PR, and can you also try to make the CI tests pass?

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

@eapax, I finally managed to write an algorithm for this problem in essentially 3 lines :)

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

Okay, rand(UInt128) introduces a bit of overhead, doing rand(UInt64) instead shaves off another 1.5ns

ui = rand(Xoroshiro128P,UInt64) >> 1    # first bit always 0
# count leading zeros of UInt128 in two steps: 2. only if 
lz = ui == 0 ?  # 2 leading zeros is at 50%, 3 at 25%, etc.
    64+leading_zeros(rand(Xoroshiro128P,UInt64) | 0x1) : leading_zeros(ui) 
e = ((127 - lz) % UInt32) << 23     # convert lz to exponent bits of float32

then yields

julia> @btime randfloat($Float32)
  3.060 ns (0 allocations: 0 bytes)

So 2x faster than MersenneTwister and only 2x slower than rand(Xoroshiro128Plus,Float32). The "I don't care about number smaller than 5e-20"-version which avoids the ?-branching is also with 2.75ns probably only irrelevantly faster.

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

Turned this into a PR https://github.com/sunoru/RandomNumbers.jl/pull/73

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

Can you comment on whether it's possible with, say, Xoroshiro128Plus (but also other RNGs) to hit UInt64(0)? I'm not sure whether that's the reason why it's not allowed as seed.

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

After ui = rand(RNG,UInt64) >> 1 the ui==0 case can also be hit with ui=0x1, so for Float32 that should be fine. For Float64, however, the idea is to sample several ui = rand(RNG,UInt64) and only sample the next when the previous one is ui==0 but would that ever happen?

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

That's also a nice quality test

julia> using RandomNumbers
julia> x = rand(Float32,100_000_000);
julia> y = randfloat(Float32,100_000_000);
julia> length(Set(x))
8388552

julia> length(Set(y))
32781328

2^23=8388608 is the size of the set that rand(Float32) draws from, so on when you draw 1e8 float32s then you hit almost all of them and on average each of them 10 times. This problem is clearly mitigated with randfloat and probably as good as possible

julia> z = Float32.(rand(100_000_000));

julia> length(Set(z))
32784392

from randomnumbers.jl.

sunoru avatar sunoru commented on August 11, 2024

Can you comment on whether it's possible with, say, Xoroshiro128Plus (but also other RNGs) to hit UInt64(0)? I'm not sure whether that's the reason why it's not allowed as seed.

All RNGs in xorshift family cannot hit UInt64(0), since the rotation and xor operations on zero values do not have any effects. This is also the reason why it's not allowed as seed, since there is not much difference between a seed and a state in xorshift.

After ui = rand(RNG,UInt64) >> 1 the ui==0 case can also be hit with ui=0x1, so for Float32 that should be fine. For Float64, however, the idea is to sample several ui = rand(RNG,UInt64) and only sample the next when the previous one is ui==0 but would that ever happen?

I am not sure if we really need to resample a random number if we hit ui==0. The probability is just too small. As a result, a large number of numbers between 0-2^-64, around 2^23 * 63 = 528482304 states (for Float32) have to be wasted but I think it is OK?

from randomnumbers.jl.

sunoru avatar sunoru commented on August 11, 2024

And the quality test looks good!

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

I am not sure if we really need to resample a random number if we hit ui==0. The probability is just too small. As a result, a large number of numbers between 0-2^-64, around 2^23 * 63 = 528482304 states (for Float32) have to be wasted but I think it is OK?

Yes, but as I said earlier

The "I don't care about number smaller than 5e-20"-version which avoids the ?-branching is also with 2.75ns probably only irrelevantly faster.

However, we could also use the last 23 bits of ui for the significand, if we already say that's too unlikely to be important, which might speed up things further?

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024
function randfloatx(rng::AbstractRNG,::Type{Float32})
    ui = rand(rng,UInt64) >> 1    # first bit always 0
    e = ((127 - leading_zeros(ui)) % UInt32) << 23 
    return reinterpret(Float32,e | ((ui % UInt32) & 0x007f_ffff))
end

times as

julia> @btime randfloatx($Float32)
  1.904 ns (0 allocations: 0 bytes)

That's with Xoroshiro128Plus and reusing the last 23bits for the significand. It still samples from 5e-20 to prevfloat(1f0) and only 5e-20 to 5e-13 will be affected with the reusing of the last bits for the significand.

from randomnumbers.jl.

sunoru avatar sunoru commented on August 11, 2024

It looks correct. I am not sure if this influence could be important in some cases, but I guess it's not. And the speed is nice, maybe we can rerun the TestU01 big crush tests on it to see the statistical performance.

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

I guess that if rand(Float32)/rand(Float64) already passes TestU01 then randfloat should so too. Unless, currently there's 0 chance that randfloat produces 0f0/0. Is that important?

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

I think a better way to do things is

function randfloat(rng::AbstractRNG,::Type{Float32})
    ui = rand(rng,UInt64)
    lz = leading_zeros(ui)
    e = ((126 - lz) % UInt32) << 23
    ui = lz > 40 ? rand(rng,UInt64) : ui
    return reinterpret(Float32,e | ((ui % UInt32) & 0x007f_ffff))
end

The ?-branch doesn't introduce any overhead for me and avoids the bits to be double used. Then randfloat really samples all 64*2^23=536870912 floats between 2.7105054f-20 and prevfloat(1f0). That's 50% of all float32s in [0,1). rand(Float32), instead, only samples from 2^32=8388608 (<1%).

Similar for Float64: randfloat samples from 64x as many float64s in [0,1) than rand. That's an improvement from 0.1% to 6% of all floa64s in [0,1). Timings are now

julia> @btime rand($Xoroshiro128P,$Float64)
  1.849 ns (0 allocations: 0 bytes)

julia> @btime rand($Xoroshiro128P,$Float32)
  1.544 ns (0 allocations: 0 bytes)

with rand, vs

julia> @btime randfloat($Xoroshiro128P,$Float64)
  2.149 ns (0 allocations: 0 bytes)

julia> @btime randfloat($Xoroshiro128P,$Float32)
  1.763 ns (0 allocations: 0 bytes)

with randfloat.

from randomnumbers.jl.

milankl avatar milankl commented on August 11, 2024

I'll close this as it got merged in https://github.com/sunoru/RandomNumbers.jl/pull/73. The descriptions here can be used for the docs, if there's need to update those.

from randomnumbers.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.