sciml / QuasiMonteCarlo.jl

Lightweight and easy generation of quasi-Monte Carlo sequences with a ton of different methods on one API for easy parameter exploration in scientific machine learning (SciML)

Home Page: https://docs.sciml.ai/QuasiMonteCarlo/stable/

License: MIT License

Julia 100.00%
sciml scientific-machine-learning physics-informed-learning quasi-monte-carlo parameter-estimation julia

quasimontecarlo.jl's Introduction

QuasiMonteCarlo.jl


This is a lightweight package for generating Quasi-Monte Carlo (QMC) samples using a variety of methods.

Tutorials and Documentation

For information on using the package, see the stable documentation. Use the in-development documentation for the current development version, which contains the unreleased features.

Example

using QuasiMonteCarlo, Distributions
lb = [0.1, -0.5]
ub = [1.0, 20.0]
n = 5
d = 2

s = QuasiMonteCarlo.sample(n, lb, ub, GridSample())
s = QuasiMonteCarlo.sample(n, lb, ub, Uniform())
s = QuasiMonteCarlo.sample(n, lb, ub, SobolSample())
s = QuasiMonteCarlo.sample(n, lb, ub, LatinHypercubeSample())
s = QuasiMonteCarlo.sample(n, lb, ub, LatticeRuleSample())
s = QuasiMonteCarlo.sample(n, lb, ub, HaltonSample())

The output s is a matrix, so one can use things like @uview from UnsafeArrays.jl for a stack-allocated view of the ith point:

using UnsafeArrays
@uview s[:, i]

API

Everything has the same interface:

A = QuasiMonteCarlo.sample(n, lb, ub, sample_method, output_type = Float64)

or to generate points directly in the unit box $[0,1]^d$

A = QuasiMonteCarlo.sample(n, d, sample_method, output_type = Float64) # = QuasiMonteCarlo.sample(n,zeros(d),ones(d),sample_method)

where:

  • n is the number of points to sample.
  • lb is the lower bound for each variable. The length determines the dimensionality.
  • ub is the upper bound.
  • d is the dimension of the unit box.
  • sample_method is the quasi-Monte Carlo sampling strategy.
  • output_type controls the output type: Float64, Float32, Rational (for exact digital-net representation), etc. This feature does not yet work with every QMC sequence.

Additionally, there is a helper function for generating design matrices:

k = 2
As = QuasiMonteCarlo.generate_design_matrices(n,
    lb,
    ub,
    sample_method,
    k,
    output_type = Float64)

which returns As, an array of k design matrices A[i] that are all sampled from the same low-discrepancy sequence.

Available Sampling Methods

Sampling methods (subtypes of SamplingAlgorithm) are divided into two groups:

  • DeterministicSamplingAlgorithm

    • GridSample for samples on a regular grid.
    • SobolSample for the Sobol sequence.
    • FaureSample for the Faure sequence.
    • LatticeRuleSample for a randomly-shifted rank-1 lattice rule.
    • HaltonSample for the Halton sequence.
    • GoldenSample for a Golden Ratio sequence.
    • KroneckerSample(alpha, s0) for a Kronecker sequence, where alpha is a length-d vector of irrational numbers (often sqrt(d)) and s0 is a length-d seed vector (often 0).
  • RandomSamplingAlgorithm

    • UniformSample for uniformly distributed random numbers.
    • LatinHypercubeSample for a Latin Hypercube.
    • Additionally, any Distribution can be used, and it will be sampled from.
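To illustrate the Kronecker construction listed above, here is a minimal standalone sketch (not the package implementation; kronecker_points is a hypothetical helper name):

```julia
# Minimal Kronecker-sequence sketch: x_i = (s0 + i*alpha) mod 1, where the
# components of alpha are irrational for good equidistribution.
function kronecker_points(alpha::AbstractVector, s0::AbstractVector, n::Integer)
    d = length(alpha)
    X = Matrix{Float64}(undef, d, n)
    for i in 1:n, j in 1:d
        X[j, i] = mod(s0[j] + i * alpha[j], 1.0)
    end
    return X
end

X = kronecker_points([sqrt(2), sqrt(3)], zeros(2), 5)  # 2 × 5 matrix in [0, 1)
```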

Adding a new sampling method

Adding a new sampling method is a two-step process:

  1. Add a new SamplingAlgorithm type.
  2. Overload the sample function with the new type.

All sampling methods are expected to return a matrix with dimension d by n, where d is the dimension of the sample space and n is the number of samples.

Example

struct NewAmazingSamplingAlgorithm{OPTIONAL} <: SamplingAlgorithm end

function sample(n, lb, ub, ::NewAmazingSamplingAlgorithm)
    if lb isa Number
        ...
        return x
    else
        ...
        return reduce(hcat, x)
    end
end
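For a runnable illustration of the d-by-n contract, here is a hypothetical standalone sampler (midpoint_sample is an illustrative name, not one of the package's algorithms; inside the package you would additionally subtype SamplingAlgorithm as shown above):

```julia
# Hypothetical standalone sampler illustrating the d-by-n output contract:
# evenly spaced interval midpoints per dimension (for illustration only).
function midpoint_sample(n, lb::AbstractVector, ub::AbstractVector)
    x = [lb .+ (ub .- lb) .* ((i - 0.5) / n) for i in 1:n]  # n points of length d
    return reduce(hcat, x)                                  # d × n matrix
end

s = midpoint_sample(4, [0.0, 0.0], [1.0, 2.0])  # 2 × 4 matrix
```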

Randomization of QMC sequences

Most of the previous methods are deterministic, i.e. sample(n, d, Sampler()::DeterministicSamplingAlgorithm) always produces the same sequence $X = (X_1, \dots, X_n)$. There are two ways to obtain a randomized sequence:

  • Either directly use QuasiMonteCarlo.sample(n, d, DeterministicSamplingAlgorithm(R = RandomizationMethod())) or sample(n, lb, ub, DeterministicSamplingAlgorithm(R = RandomizationMethod())).
  • Or, given $n$ $d$-dimensional points in $[0,1]^d$, one can call randomize(X, ::RandomizationMethod) where $X$ is a $d\times n$ matrix.

The currently available randomization methods are:

  • Scrambling methods ScramblingMethods(b, pad, rng), where b is the base used to scramble and pad is the number of digits in the b-ary decomposition; pad is generally chosen as $\gtrsim \log_b(n)$. The implemented ScramblingMethods are

    • DigitalShift
    • MatousekScramble a.k.a. Linear Matrix Scramble.
    • OwenScramble a.k.a. Nested Uniform Scramble, which is the best understood theoretically but the most costly to apply.
  • Shift(rng) a.k.a. Cranley-Patterson Rotation.
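As a sketch of what a base-b digital shift does (an illustrative digital_shift helper under stated assumptions, not the package's DigitalShift), one can decompose each coordinate into b-ary digits and add a fixed random digit vector modulo b:

```julia
using Random

# Sketch of a base-b digital shift: decompose each coordinate into `pad` b-ary
# digits, add one fixed random digit per (dimension, digit) modulo b, reassemble.
function digital_shift(X::AbstractMatrix, b::Integer, pad::Integer;
                       rng = Random.default_rng())
    shift = rand(rng, 0:(b - 1), size(X, 1), pad)
    Y = similar(X, Float64)
    for j in axes(X, 2), i in axes(X, 1)
        y = 0.0
        for k in 1:pad
            digit = floor(Int, X[i, j] * b^k) % b   # k-th b-ary digit
            y += mod(digit + shift[i, k], b) / b^k
        end
        Y[i, j] = y
    end
    return Y
end

Y = digital_shift([0.25 0.5; 0.75 0.125], 2, 8; rng = MersenneTwister(1))
```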

For multiple independent randomizations, use generate_design_matrices(n, d, ::DeterministicSamplingAlgorithm, ::RandomizationMethod, num_mats), where num_mats is the number of independent realizations of X generated.

Randomization Example

Randomization of a Faure sequence with various methods.

using QuasiMonteCarlo
m = 4
d = 3
b = QuasiMonteCarlo.nextprime(d)
N = b^m # Number of points
pad = m # Length of the b-ary decomposition: number = sum(y[k]/b^k for k in 1:pad)

# Unrandomized (deterministic) low discrepancy sequence
x_faure = QuasiMonteCarlo.sample(N, d, FaureSample())

# Randomized version
x_nus = randomize(x_faure, OwenScramble(base = b, pad = pad)) # equivalent to sample(N, d, FaureSample(R = OwenScramble(base = b, pad = pad)))
x_lms = randomize(x_faure, MatousekScramble(base = b, pad = pad))
x_digital_shift = randomize(x_faure, DigitalShift(base = b, pad = pad))
x_shift = randomize(x_faure, Shift())
x_uniform = rand(d, N) # plain i.i.d. uniform
using Plots
# Plot settings I like
default(fontfamily = "Computer Modern",
    linewidth = 1,
    label = nothing,
    grid = true,
    framestyle = :default)

Plot the resulting sequences along dimensions 1 and 3.

d1 = 1
d2 = 3 # you can try every combination of dimensions (d1, d2)
sequences = [x_uniform, x_faure, x_shift, x_digital_shift, x_lms, x_nus]
names = [
    "Uniform",
    "Faure (deterministic)",
    "Shift",
    "Digital Shift",
    "Matousek Scramble",
    "Owen Scramble"
]
p = [plot(thickness_scaling = 1.5, aspect_ratio = :equal) for i in sequences]
for (i, x) in enumerate(sequences)
    scatter!(p[i], x[d1, :], x[d2, :], ms = 0.9, c = 1, grid = false)
    title!(names[i])
    xlims!(p[i], (0, 1))
    ylims!(p[i], (0, 1))
    yticks!(p[i], [0, 1])
    xticks!(p[i], [0, 1])
    hline!(p[i], range(0, 1, step = 1 / 4), c = :gray, alpha = 0.2)
    vline!(p[i], range(0, 1, step = 1 / 4), c = :gray, alpha = 0.2)
    hline!(p[i], range(0, 1, step = 1 / 2), c = :gray, alpha = 0.8)
    vline!(p[i], range(0, 1, step = 1 / 2), c = :gray, alpha = 0.8)
end
plot(p..., size = (1500, 900))

Different randomization methods of the same initial set of points

quasimontecarlo.jl's People

Contributors

00krishna, anasabdelr, archermarx, arnostrouwen, ashutosh-b-b, chrisrackauckas, christopher-dg, dependabot[bot], devmotion, dmetivie, friesischscott, github-actions[bot], juliatagbot, kirillzubov, leticia-maria, luap-pik, mkg33, paradacarleton, ranjanan, ranocha, sathvikbhagavan, thazhemadam, thompsonmj


quasimontecarlo.jl's Issues

Explanation/Tutorial on effective dimension in documentation

Effective dimension is a very important concept in QMC; it's necessary to understand what points you should use and how to use them. We should probably create a tutorial explaining it. (Otherwise people are likely to assume QMC points are a drop-in replacement for Monte Carlo points that can be used without additional thought. I know I did before learning more about them 😅 )

respect the element type of the bounds

When the bounds are of Float32, the sampled data should have element type Float32 as well.

julia> QuasiMonteCarlo.sample(10,[-1f0,-1f10],[1f0,1f0],UniformSample())
2×10 Matrix{Float64}:
  0.467086    0.37927    -0.781625    0.36063     …   0.502589   -0.836102   -0.362692
 -9.25426e7  -3.14634e9  -9.59386e9  -6.39744e9       -1.94932e9  -5.11313e9  -4.61389e9
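One way the element type could be derived from the bounds, sketched with a hypothetical uniform_sample helper (not the package's implementation):

```julia
# Sketch: derive the sample eltype from the bounds instead of hard-coding
# Float64 (hypothetical uniform_sample, for illustration only).
function uniform_sample(n, lb::AbstractVector, ub::AbstractVector)
    T = float(promote_type(eltype(lb), eltype(ub)))  # Float32 bounds -> Float32
    return lb .+ (ub .- lb) .* rand(T, length(lb), n)
end

s = uniform_sample(10, [-1.0f0, -1.0f10], [1.0f0, 1.0f0])  # eltype Float32
```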

Caching in sampling to have `sample!`

Is your feature request related to a problem? Please describe.

Implement caching in the sample function using preallocated memory.

Describe the solution you’d like

Define a sample! API that takes a preallocated sample matrix as an argument, plus the same arguments as sample.
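A sketch of what such an in-place API might look like (hypothetical uniform_sample! name, not an existing package function):

```julia
using Random

# Sketch of a possible in-place API: fill a preallocated d-by-n matrix
# instead of allocating on every call (hypothetical uniform_sample!).
function uniform_sample!(out::AbstractMatrix, lb::AbstractVector, ub::AbstractVector;
                         rng = Random.default_rng())
    rand!(rng, out)                # U(0, 1) draws, written in place
    @. out = lb + (ub - lb) * out  # rescale to the bounds, broadcast per row
    return out
end

out = Matrix{Float64}(undef, 2, 5)
uniform_sample!(out, [0.0, -1.0], [1.0, 1.0])
```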

Incorrect Halton Sequence

In v0.3 the numbers returned by the HaltonSample() changed and I believe are now incorrect.

The first 4 points in d = 2 used to return (in v0.2)

julia> s = QuasiMonteCarlo.sample(4, zeros(2), ones(2), LowDiscrepancySample([2, 3], false))
2×4 Matrix{Float64}:
 0.5       0.25      0.75      0.125
 0.333333  0.666667  0.111111  0.444444

which are the correct four points for the Halton Sequences in d=2 with bases [2, 3].

In the new version, the bases are internally selected with bases = nextprimes(one(n), d) which returns [2, 3] for two dimensions.

However, the points returned are entirely different and the sequence changes based on the number of samples requested like

julia> s = QuasiMonteCarlo.sample(4, 2, HaltonSample())
2×4 Matrix{Float64}:
 0.125  0.625     0.375     0.875
 0.125  0.458333  0.791667  0.236111

and

julia> s = QuasiMonteCarlo.sample(5, 2, HaltonSample())
2×5 Matrix{Float64}:
 0.1  0.6       0.35      0.85      0.225
 0.1  0.433333  0.766667  0.211111  0.544444
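For reference, the textbook Halton construction via the radical inverse reproduces the v0.2 values quoted above; a minimal sketch:

```julia
# Radical inverse of n in base b: mirror the b-ary digits of n across the point.
function radical_inverse(b::Integer, n::Integer)
    r, f = 0.0, 1.0 / b
    while n > 0
        r += f * (n % b)
        n ÷= b
        f /= b
    end
    return r
end

# Halton sequence: one radical inverse per dimension, with distinct prime bases.
halton(bases, n) = [radical_inverse(b, i) for b in bases, i in 1:n]

halton([2, 3], 4)  # reproduces the v0.2 output quoted above
</imports>
```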

Error in sample() usage

Documentation says this method signature should work.

julia> using QuasiMonteCarlo
julia> d, n = 4, 2^15
(4, 32768)
julia> Z = QuasiMonteCarlo.sample(n, d, SobolSample());
ERROR: MethodError: no method matching sample(::Int64, ::Int64, ::SobolSample)

Closest candidates are:
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::FaureSample)
  ...

Better Sampling Interface

The sampling interface is a bit of a mess at the moment. The way users are asked to specify sample sizes is clunky and unnatural, demanding that they perform unusual calculations to avoid shooting themselves in the foot. Sobol nets, for example, only exist for powers of two, but users can request any sample size. At the moment we just give users these sample sizes by truncating the sequence, even though these truncated sequences are not guaranteed to converge to the correct answer, and will typically perform worse than Monte Carlo point sets. For Faure nets, the situation is safer (we error if the sample size is inappropriate) but even more annoying for users (they have to calculate multiples of powers of prime numbers).

For digital nets, a better approach is outlined here, and used by Art Owen in his QMC packages. Rather than request a sample size, users can provide parameters for a point set's stratification properties (m), the number of independent replications they'd like (λ), and a base (b), and then receive λ*b^m points. I'd also suggest a feature where users can set an approximate sample size via a keyword, then receive the smallest possible net at least as large as the requested sample size.
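The "round up to the nearest valid net size" idea could be sketched as follows (hypothetical net_size helper; log-based rounding can have floating-point edge cases near exact powers):

```julia
# Sketch: round a requested sample size up to the nearest valid net size λ*b^m.
function net_size(n_requested, b, λ = 1)
    m = max(0, ceil(Int, log(b, n_requested / λ)))
    return λ * b^m, m
end

net_size(1000, 2)  # -> (1024, 10): smallest power-of-two net with ≥ 1000 points
```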

This would be a breaking change for sure.

Citation

I am finishing revising a paper where I use QuasiMonteCarlo.jl.
I read about how to cite Julia packages (here).
What is the common practice for SciML packages (I guess the answer is here)?
As there is no paper yet, would something like this work?

@misc{2019_QMC_jl,
title = {\texttt{QuasiMonteCarlo.jl}},
author = {{Various contributors}},
year = {2019},
howpublished = {\url{https://github.com/SciML/QuasiMonteCarlo.jl}}
}

BTW, any paper plans? Since about 1/4 of the code comes from various packages (Sobol, Lattice, Latin), I am not sure if this is a legitimate idea or not.

Error when the number of samples is zero

MWE:

julia> QuasiMonteCarlo.sample(0, [0.0], [1.0], SobolSample())
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
Stacktrace:
  [1] reduce_empty(op::Base.MappingRF{typeof(eltype), typeof(promote_type)}, #unused#::Type{Vector{Float64}})
    @ Base ./reduce.jl:356
  [2] reduce_empty_iter
    @ ./reduce.jl:379 [inlined]
  [3] mapreduce_empty_iter(f::Function, op::Function, itr::Vector{Vector{Float64}}, ItrEltype::Base.HasEltype)
    @ Base ./reduce.jl:375
  [4] _mapreduce
    @ ./reduce.jl:427 [inlined]
  [5] _mapreduce_dim
    @ ./reducedim.jl:365 [inlined]
  [6] #mapreduce#765
    @ ./reducedim.jl:357 [inlined]
  [7] mapreduce
    @ ./reducedim.jl:357 [inlined]
  [8] reduce(#unused#::typeof(hcat), A::Vector{Vector{Float64}})
    @ Base ./abstractarray.jl:1653
  [9] sample(n::Int64, lb::Vector{Float64}, ub::Vector{Float64}, #unused#::SobolSample)
    @ QuasiMonteCarlo ~/.julia/packages/QuasiMonteCarlo/L0JHj/src/QuasiMonteCarlo.jl:186
 [10] top-level scope
    @ REPL[13]:1

Shouldn't this error be a proper exception @ChrisRackauckas?

cc: @AnasAbdelR

use SafeTestsets

The tests for this package are a bit unwieldy.
Best to split them up using SafeTestsets.

Halton Sample not defined

Hi,

Thanks for this great package.

It looks like I can't use HaltonSample. If I run your example code

using QuasiMonteCarlo, Distributions
lb = [0.1,-0.5]
ub = [1.0,20.0]
n = 5
d = 2
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample([10,3], false))

I get the following error: UndefVarError: HaltonSample not defined. Is this a known problem or am I missing something?

Thanks

Evenly distributed samples?

Please try the following case:

s1 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s1[1,:])'
s2 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s2[1,:])'
s3 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s3[1,:])'
s4 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s4[1,:])'

Output:
(sort(s1[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s2[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s3[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s4[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]

All samples are located at the centers of 25 evenly divided subregions, such as [0, 0.04], [0.04, 0.08], ..., [0.96, 1.0].
Meanwhile, the results of the four samplings are identical except for their order before sorting.
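For comparison, a standard Latin hypercube draws one point uniformly inside each stratum, with an independent random permutation per dimension; the fixed centers above suggest the within-stratum jitter was dropped. A minimal sketch (hypothetical latin_hypercube helper, not the package code):

```julia
using Random

# Standard Latin hypercube sketch: one uniform point inside each of n strata
# per dimension, with an independent random permutation per dimension.
function latin_hypercube(n, d; rng = Random.default_rng())
    X = Matrix{Float64}(undef, d, n)
    for i in 1:d
        X[i, :] = (randperm(rng, n) .- rand(rng, n)) ./ n
    end
    return X
end

s = latin_hypercube(25, 1)  # sorted, the k-th value lies in ((k-1)/25, k/25)
```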

Remove `Distributions` from dependencies

OK, so I originally thought Distributions was here so you could use QMC sequences to sample from a distribution, but apparently that just doesn't work.

julia> import QuasiMonteCarlo as QMC

julia> using Distributions

julia> QMC.sample(1024, zeros(10), ones(10), SobolSample(), Normal())
ERROR: MethodError: no method matching sample(::Int64, ::Vector{Float64}, ::Vector{Float64}, ::SobolSample, ::Normal{Float64})
Closest candidates are:
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::SobolSample) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/QuasiMonteCarlo.jl:205
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::QuasiMonteCarlo.SamplingAlgorithm) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/QuasiMonteCarlo.jl:146
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::FaureSample) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/Faure.jl:45
  ...

Is there any reason to keep Distributions hanging around, or should we just remove the dependency?

`LatticeRuleSample` documentation vague

The documentation for LatticeRuleSample provides no information or references as to how the generating vector is chosen or why it should be expected to perform well, which is especially weird given users can't use their own.

`KroneckerSample` problems

I can't work out how KroneckerSample is supposed to work. I'm not sure whether this is because the docstring is extremely unclear, or if the code is actually broken.

Base `b` Gray code

Having a Gray code implementation for arbitrary bases should speed up point set generation.

typo in README

instead of
As = QuasiMonteCarlo.generate_design_matrices(n,lb,ub,sample_method,k=2)

it works with

k=2
As = QuasiMonteCarlo.generate_design_matrices(n,lb,ub,sample_method,k)

skip=n argument for parallelization?

It would be nice to support parallelization, simply by exposing the skip feature of generators like Sobol.jl. That is, if you want to evaluate on 1000 processors with 2^15 points each, you could pass skip=2^15 * i on processor i, and then average the results.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Lazy iteration over points

It's generally a lot better for performance to iterate lazily over QMC points, especially when the number of points is quite large. When users need the whole set, there's nothing stopping them from calling collect.

Sobol' Sequence Missing First 4 Points

Running the following code

using QuasiMonteCarlo
QuasiMonteCarlo.sample(4,0,1,SobolSample())

yields

4-element Vector{Float64}:
 0.375
 0.875
 0.625
 0.125

However, the first 4 points of a 1-dimensional Sobol' sequence should be [0, .5, .25, .75]. The output consists of the points at indices 5 to 8.
I am looping in my collaborator @fjhickernell, an expert in QMC methods who will be valuable to this discussion.

QMC vs Randomized QMC

Hello,
I was wondering if I was misunderstanding some things about this package:

  1. It gathers different option to fill d-dimensional intervals with sequences.
  2. Quasi Monte Carlo uses deterministic sequences.
  3. To me, the term sample suggests some kind of randomness; hence, it must refer to Randomized Quasi Monte Carlo.
    Indeed, for some sequences, e.g. GridSample and LatticeRuleSample, this is what happens:
    for example, the LatticeRuleSample points are generated and then shifted randomly.
    However, other sequences, e.g. SobolSample and LowDiscrepancySample, are generated deterministically by the same sample function.

Am I missing something, or is the sample function used both for deterministic quasi-Monte Carlo sequences and for randomized quasi-Monte Carlo sequences?

GridSample appears to sample on a line

It looks like GridSample is generating points in a line. I would expect the points to form a grid.

> using QuasiMonteCarlo
> QuasiMonteCarlo.sample(5, [-6.0, -6.0], [6.0, 6.0], GridSample())
2×5 Matrix{Float64}:
 -4.8  -2.4  0.0  2.4  4.8
 -4.8  -2.4  0.0  2.4  4.8
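For comparison, a full tensor-product grid with n points per dimension contains n^d points; a hypothetical full_grid sketch (not the package's GridSample):

```julia
# Sketch: a full tensor-product grid has n^d points built from per-dimension
# ranges, rather than n points along the diagonal (hypothetical full_grid).
function full_grid(n, lb::AbstractVector, ub::AbstractVector)
    axes1d = [range(l, u; length = n) for (l, u) in zip(lb, ub)]
    return reduce(hcat, [collect(p) for p in Iterators.product(axes1d...)])
end

g = full_grid(5, [-6.0, -6.0], [6.0, 6.0])  # 2 × 25 matrix covering the square
```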

Randomized Halton sequence?

Question❓

The documentation states that the base for scrambling should match the base of the sequence.

In principle, the base b used for scrambling methods ScramblingMethods(b, pad, rng) can be an arbitrary integer. However, to preserve good Quasi Monte Carlo properties, it must match the base of the sequence to scramble.

How does this relate to the Halton sequence, where the next d primes are used as bases? Are the randomizations applicable here? If so, which base would yield the best results?

What is the base of the LatticeRule? 2?

`GoldenSample` Incorrect

I compared the results here to those given by GoldenSequences.jl, and the implementation here looks wrong. It looks like the mistake is that each coordinate is based on φ[i]^i for i in 1:d, rather than on powers of the same ratio, φ[d]^i (as it should be).
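Assuming the intended construction is the generalized golden-ratio (R_d) sequence, where φ_d is the unique positive root of x^(d+1) = x + 1 and every component is a power of the same φ_d, a sketch:

```julia
# Sketch assuming the R_d construction: φ_d is the unique positive root of
# x^(d+1) = x + 1, and point i is mod.(i .* alpha, 1) with alpha[j] = 1/φ_d^j,
# i.e. powers of the *same* ratio for every coordinate.
function golden_points(n, d)
    φ = 2.0
    for _ in 1:64              # fixed-point iteration x ← (1 + x)^(1/(d+1))
        φ = (1 + φ)^(1 / (d + 1))
    end
    alpha = [mod(1 / φ^j, 1) for j in 1:d]
    return [mod(i * alpha[j], 1) for j in 1:d, i in 1:n]
end

p = golden_points(4, 1)  # d = 1: multiples of 1/φ ≈ 0.618 (mod 1)
```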
