
sciml / quasimontecarlo.jl


Lightweight and easy generation of quasi-Monte Carlo sequences with a ton of different methods on one API for easy parameter exploration in scientific machine learning (SciML)

Home Page: https://docs.sciml.ai/QuasiMonteCarlo/stable/

License: MIT License

Julia 100.00%
sciml scientific-machine-learning physics-informed-learning quasi-monte-carlo parameter-estimation julia

quasimontecarlo.jl's Issues

Randomized Halton sequence?

Question❓

The documentation states that the base for scrambling should match the base of the sequence.

In principle, the base b used for scrambling methods ScramblingMethods(b, pad, rng) can be an arbitrary integer. However, to preserve good Quasi Monte Carlo properties, it must match the base of the sequence to scramble.

How does this relate to the Halton sequence, where the next d primes are used as bases? Are the randomizations applicable here? If so, which base would yield the best results?

What is the base of the LatticeRule? 2?

Remove `Distributions` from dependencies

OK, so I originally thought Distributions was here so you could use QMC sequences to sample from a distribution, but apparently that just doesn't work.

julia> import QuasiMonteCarlo as QMC

julia> using Distributions

julia> QMC.sample(1024, zeros(10), ones(10), SobolSample(), Normal())
ERROR: MethodError: no method matching sample(::Int64, ::Vector{Float64}, ::Vector{Float64}, ::SobolSample, ::Normal{Float64})
Closest candidates are:
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::SobolSample) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/QuasiMonteCarlo.jl:205
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::QuasiMonteCarlo.SamplingAlgorithm) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/QuasiMonteCarlo.jl:146
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::FaureSample) at ~/.julia/packages/QuasiMonteCarlo/iBmQ6/src/Faure.jl:45
  ...

Is there any reason to keep Distributions hanging around, or should we just remove the dependency?

Explanation/Tutorial on effective dimension in documentation

Effective dimension is a very important concept in QMC; it's necessary for understanding which points you should use and how to use them. We should probably create a tutorial explaining it. (Otherwise people are likely to assume QMC points are a drop-in replacement for Monte Carlo points that can be used without additional thought. I know I did before learning more about them 😅 )

Base `b` Gray code

Having a Gray code implementation for arbitrary bases should speed up point set generation.

GridSample appears to sample on a line

It looks like GridSample is generating points in a line. I would expect the points to form a grid.

> using QuasiMonteCarlo
> QuasiMonteCarlo.sample(5, [-6.0, -6.0], [6.0, 6.0], GridSample())
2×5 Matrix{Float64}:
 -4.8  -2.4  0.0  2.4  4.8
 -4.8  -2.4  0.0  2.4  4.8
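
For comparison, here is a sketch (not the package API; `tensor_grid` is a hypothetical helper) of what a true tensor-product grid over the same bounds would look like, built with `Iterators.product`. With n points per dimension it yields n^d points rather than the n diagonal points shown above:

```julia
# Sketch: a tensor-product grid of stratum centers over [lb, ub].
# With n points per dimension this yields n^d points, whereas the
# output above shows only n points along a diagonal line.
function tensor_grid(n, lb, ub)
    axes1d = [range(lb[i] + (ub[i] - lb[i]) / (2n), ub[i] - (ub[i] - lb[i]) / (2n), length = n)
              for i in eachindex(lb)]
    pts = vec(collect(Iterators.product(axes1d...)))
    reduce(hcat, [collect(p) for p in pts])   # d × n^d matrix
end

tensor_grid(5, [-6.0, -6.0], [6.0, 6.0])      # 2×25 matrix covering the square
```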

`GoldenSample` Incorrect

I compared the results here to those given by GoldenSequences.jl, and the implementation here looks wrong. The mistake appears to be that each dimension is based on φ[i]^i for i in 1:d, rather than on powers of the same ratio, φ[d]^i (as it should be).
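
For reference, a self-contained sketch of the generalized golden-ratio (R_d) construction as popularized by Martin Roberts, where every dimension uses powers of a single constant φ_d, the positive root of x^(d+1) = x + 1 (the function names here are illustrative, not the package's):

```julia
# φ_d is the unique positive root of x^(d+1) = x + 1 (φ_1 is the golden
# ratio). A simple fixed-point iteration converges quickly.
function phi_d(d)
    x = 2.0
    for _ in 1:50
        x = (1 + x)^(1 / (d + 1))
    end
    x
end

# Every dimension j uses the SAME ratio φ_d, raised to successive powers.
function r_sequence(n, d)
    φ = phi_d(d)
    α = [mod(1 / φ^j, 1) for j in 1:d]
    [mod(i * α[j], 1) for j in 1:d, i in 1:n]   # d × n matrix
end
```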

Sobol' Sequence Missing First 4 Points

Running the following code

using QuasiMonteCarlo
QuasiMonteCarlo.sample(4,0,1,SobolSample())

yields

4-element Vector{Float64}:
 0.375
 0.875
 0.625
 0.125

However, the first 4 points of a 1-dimensional Sobol' sample should be [0, .5, .25, .75]. The output consists of the points at indices 5 to 8.
I am looping in my collaborator @fjhickernell, an expert in QMC methods who will be valuable to this discussion.
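
For reference, a self-contained base-2 radical inverse (van der Corput) implementation, which is what the 1-D Sobol' sequence reduces to in natural order. Starting the index at 0 gives the expected first block [0, 0.5, 0.25, 0.75]; Gray-code orderings permute points within each block of 2^m but should still begin with the same first block:

```julia
# Base-b radical inverse: reflect the base-b digits of i about the
# radix point. With b = 2 this is the van der Corput sequence.
function vdc(i, b = 2)
    x, f = 0.0, 1.0 / b
    while i > 0
        i, r = divrem(i, b)
        x += r * f
        f /= b
    end
    x
end

[vdc(i) for i in 0:3]  # [0.0, 0.5, 0.25, 0.75]
```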

typo in README

Instead of
As = QuasiMonteCarlo.generate_design_matrices(n,lb,ub,sample_method,k=2)

it works with

k=2
As = QuasiMonteCarlo.generate_design_matrices(n,lb,ub,sample_method,k)

Incorrect Halton Sequence

In v0.3 the numbers returned by HaltonSample() changed, and I believe they are now incorrect.

The first 4 points in d = 2 used to return (in v0.2)

julia> s = QuasiMonteCarlo.sample(4, zeros(2), ones(2), LowDiscrepancySample([2, 3], false))
2×4 Matrix{Float64}:
 0.5       0.25      0.75      0.125
 0.333333  0.666667  0.111111  0.444444

which are the correct four points for the Halton Sequences in d=2 with bases [2, 3].

In the new version, the bases are internally selected with bases = nextprimes(one(n), d) which returns [2, 3] for two dimensions.

However, the points returned are entirely different, and the sequence changes based on the number of samples requested, e.g.

julia> s = QuasiMonteCarlo.sample(4, 2, HaltonSample())
2×4 Matrix{Float64}:
 0.125  0.625     0.375     0.875
 0.125  0.458333  0.791667  0.236111

and

julia> s = QuasiMonteCarlo.sample(5, 2, HaltonSample())
2×5 Matrix{Float64}:
 0.1  0.6       0.35      0.85      0.225
 0.1  0.433333  0.766667  0.211111  0.544444
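
A self-contained reference implementation (a sketch, not the package's code) reproduces the v0.2 values above for bases [2, 3], which supports the claim that the old output was correct:

```julia
# Reference Halton points: dimension j is the base-b_j radical inverse
# of the index. Indices start at 1, matching the v0.2 output above.
function radical_inverse(i, b)
    x, f = 0.0, 1.0 / b
    while i > 0
        i, r = divrem(i, b)
        x += r * f
        f /= b
    end
    x
end

halton(n, bases) = [radical_inverse(i, b) for b in bases, i in 1:n]

halton(4, [2, 3])
# row 1 (base 2): 0.5      0.25     0.75     0.125
# row 2 (base 3): 0.333333 0.666667 0.111111 0.444444
```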

use SafeTestsets

The tests for this package are a bit unwieldy.
Best to split them up using SafeTestsets.

Error in sample() usage

Documentation says this method signature should work.

julia> using QuasiMonteCarlo
julia> d, n = 4, 2^15
(4, 32768)
julia> Z = QuasiMonteCarlo.sample(n, d, SobolSample());
ERROR: MethodError: no method matching sample(::Int64, ::Int64, ::SobolSample)

Closest candidates are:
  sample(::Integer, ::Union{Number, Tuple, AbstractVector}, ::Union{Number, Tuple, AbstractVector}, ::FaureSample)
  ...

Halton Sample not defined

Hi,

Thanks for this great package.

It looks like I can't use HaltonSample. If I run your example code

using QuasiMonteCarlo, Distributions
lb = [0.1,-0.5]
ub = [1.0,20.0]
n = 5
d = 2
s = QuasiMonteCarlo.sample(n,lb,ub,HaltonSample([10,3], false))

I get the following error: UndefVarError: HaltonSample not defined. Is this a known problem or am I missing something?

Thanks

Lazy iteration over points

It's generally a lot better for performance to iterate lazily over QMC points, especially when the number of points is quite large. When users need the whole set, there's nothing stopping them from calling collect.
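
A minimal sketch of the idea, using a Julia generator expression and a base-2 radical inverse as a stand-in for a real QMC generator (names are illustrative):

```julia
# Lazy iteration: points are computed on demand, and `collect`
# materializes them only when the user asks.
function vdc2(i)                 # base-2 radical inverse stand-in
    x, f = 0.0, 0.5
    while i > 0
        i, r = divrem(i, 2)
        x += r * f
        f /= 2
    end
    x
end

lazy_points(n) = (vdc2(i) for i in 0:n-1)   # no allocation until collected

first(lazy_points(10^9))   # cheap: only one point is ever computed
collect(lazy_points(4))    # [0.0, 0.5, 0.25, 0.75]
```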

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

skip=n argument for parallelization?

It would be nice to support parallelization, simply by exposing the skip feature of generators like Sobol.jl. That is, if you want to evaluate on 1000 processors with 2^15 points each, you could pass skip=2^15 * i on processor i, and then average the results.
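
The `skip` keyword is only a proposal here, but the partitioning idea can be sketched with a simple index-offset generator standing in for a real Sobol' generator (everything below is illustrative, not existing API):

```julia
# Worker w handles indices w*chunk .. (w+1)*chunk - 1 of one global
# sequence, i.e. it "skips" w*chunk points before generating its own.
function vdc3(i)                 # base-2 radical inverse stand-in
    x, f = 0.0, 0.5
    while i > 0
        i, r = divrem(i, 2)
        x += r * f
        f /= 2
    end
    x
end

chunk_points(w, chunk) = [vdc3(i) for i in (w * chunk):((w + 1) * chunk - 1)]

chunk = 2^4
parts = [chunk_points(w, chunk) for w in 0:3]   # 4 "workers"
# Concatenating the chunks recovers the global sequence exactly:
vcat(parts...) == [vdc3(i) for i in 0:(4chunk - 1)]
```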

Caching in sampling to have `sample!`

Is your feature request related to a problem? Please describe.

Implement caching in the sample function using preallocated memory.

Describe the solution you’d like

Define a sample! API that takes the sample matrix as an argument, along with the same arguments as sample.
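
A hypothetical sketch of what the signature could look like; the fill logic below is a plain uniform stand-in for whichever QMC rule is being cached:

```julia
# Proposed in-place API: the caller owns the output matrix, so repeated
# sampling reuses memory instead of allocating on every call.
function sample!(out::AbstractMatrix, lb, ub)
    d, n = size(out)
    for j in 1:n, i in 1:d
        out[i, j] = lb[i] + (ub[i] - lb[i]) * rand()  # stand-in for a QMC rule
    end
    out
end

out = Matrix{Float64}(undef, 2, 100)   # preallocate once
sample!(out, [0.0, 0.0], [1.0, 1.0])   # fill in place; reuse across calls
```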

`LatticeRuleSample` documentation vague

The documentation for LatticeRuleSample provides no information or references on how the generating vector is chosen or why it should be expected to perform well, which is especially odd given that users can't supply their own.

Evenly distributed samples?

Please try the following case:

s1 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s1[1,:])'
s2 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s2[1,:])'
s3 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s3[1,:])'
s4 = QuasiMonteCarlo.sample(25, [0.0], [1.0], LatinHypercubeSample())
sort(s4[1,:])'

Output:
(sort(s1[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s2[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s3[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]
(sort(s4[1, :]))' = [0.02 0.06 0.1 0.13999999999999999 0.18 0.22 0.26 0.30000000000000004 0.34 0.38 0.42000000000000004 0.46 0.5 0.54 0.5800000000000001 0.62 0.66 0.7000000000000001 0.74 0.78 0.8200000000000001 0.86 0.9 0.9400000000000001 0.98]

All samples are located at the centers of 25 evenly divided subregions, such as [0, 0.04], [0.04, 0.08], ..., [0.96, 1.0].
Meanwhile, the results of the four sampling calls are identical except for their order before sorting.
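
For contrast, a sketch of a properly randomized Latin hypercube in one dimension: each of the n strata gets exactly one point at a random position, rather than the deterministic stratum centers shown above (`lhs1d` is an illustrative helper, not the package API):

```julia
using Random

# Randomized LHS in 1-D: stratum k = perm[k] receives one point at a
# uniformly random position inside [(perm[k]-1)/n, perm[k]/n).
lhs1d(n) = (randperm(n) .- rand(n)) ./ n

x = lhs1d(25)   # 25 points, one per stratum, in random positions
```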

`KroneckerSample` problems

I can't work out how KroneckerSample is supposed to work. I'm not sure whether this is because the docstring is extremely unclear, or if the code is actually broken.

Better Sampling Interface

The sampling interface is a bit of a mess at the moment. The way users are asked to specify sample sizes is clunky and unnatural, demanding that they perform unusual calculations to avoid shooting themselves in the foot. Sobol nets, for example, only exist for powers of two, but users can request any sample size. At the moment we just give users these sample sizes by truncating the sequence, even though these truncated sequences are not guaranteed to converge to the correct answer, and will typically perform worse than Monte Carlo point sets. For Faure nets, the situation is safer (we error if the sample size is inappropriate) but even more annoying for users (they have to calculate multiples of powers of prime numbers).

For digital nets, a better approach is outlined here, and used by Art Owen in his QMC packages. Rather than requesting a sample size, users provide parameters for a point set's stratification properties (m), the number of independent replications they'd like (λ), and a base (b), and then receive λ*b^m points. I'd also suggest a feature where users can set an approximate sample size with a keyword and receive the smallest valid net at least as large as the requested sample size.

This would be a breaking change for sure.
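
The proposed keyword behaviour amounts to a small calculation, sketched here with hypothetical names:

```julia
# Given a requested sample size n, find the smallest valid net size
# λ*b^m with λ*b^m ≥ n.
function net_size(n; b = 2, λ = 1)
    m = 0
    while λ * b^m < n
        m += 1
    end
    (m = m, size = λ * b^m)
end

net_size(1000)          # (m = 10, size = 1024)
net_size(1000; b = 3)   # (m = 7, size = 2187)
```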

QMC vs Randomized QMC

Hello,
I was wondering if I was misunderstanding some things about this package:

  1. It gathers different options to fill d-dimensional intervals with sequences.
  2. Quasi-Monte Carlo uses deterministic sequences.
  3. To me, the term sample suggests some kind of randomness, so it should refer to randomized quasi-Monte Carlo.
    Indeed, for some sequences, e.g. GridSample and LatticeRuleSample, this is what happens:
    the LatticeRuleSample points, for example, are generated and then shifted randomly.
    However, other sequences, e.g. SobolSample and LowDiscrepancySample, are generated deterministically by the same sample function.

Am I missing something, or is the sample function used both for deterministic quasi-Monte Carlo sequences and randomized quasi-Monte Carlo sequences?
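
The distinction in miniature: a Cranley–Patterson random shift turns a deterministic point set into a randomized QMC point set while preserving its structure (a sketch in 1-D with a trivial lattice):

```julia
deterministic = [i / 8 for i in 0:7]            # a fixed 1-D lattice
shift = rand()                                   # one uniform random shift
randomized = mod.(deterministic .+ shift, 1)     # RQMC: random, still evenly spread
```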

Citation

I am finishing revisions on a paper where I use QuasiMonteCarlo.jl.
I read about how to cite Julia packages (here).
What is the common practice for SciML packages (I guess the answer is here)?
As there is no paper yet, would something like the following work?

@misc{2019_QMC_jl,
title = {\texttt{QuasiMonteCarlo.jl}},
author = {{Various contributors}},
year = {2019},
howpublished = {\url{https://github.com/SciML/QuasiMonteCarlo.jl}}
}

BTW, any paper plans? Since about 1/4 of the code comes from various packages (Sobol, Lattice, Latin), I am not sure if this is a legitimate idea or not.

Error when the number of samples is zero

MWE:

julia> QuasiMonteCarlo.sample(0, [0.0], [1.0], SobolSample())
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
Stacktrace:
  [1] reduce_empty(op::Base.MappingRF{typeof(eltype), typeof(promote_type)}, #unused#::Type{Vector{Float64}})
    @ Base ./reduce.jl:356
  [2] reduce_empty_iter
    @ ./reduce.jl:379 [inlined]
  [3] mapreduce_empty_iter(f::Function, op::Function, itr::Vector{Vector{Float64}}, ItrEltype::Base.HasEltype)
    @ Base ./reduce.jl:375
  [4] _mapreduce
    @ ./reduce.jl:427 [inlined]
  [5] _mapreduce_dim
    @ ./reducedim.jl:365 [inlined]
  [6] #mapreduce#765
    @ ./reducedim.jl:357 [inlined]
  [7] mapreduce
    @ ./reducedim.jl:357 [inlined]
  [8] reduce(#unused#::typeof(hcat), A::Vector{Vector{Float64}})
    @ Base ./abstractarray.jl:1653
  [9] sample(n::Int64, lb::Vector{Float64}, ub::Vector{Float64}, #unused#::SobolSample)
    @ QuasiMonteCarlo ~/.julia/packages/QuasiMonteCarlo/L0JHj/src/QuasiMonteCarlo.jl:186
 [10] top-level scope
    @ REPL[13]:1

Shouldn't this error be a proper exception, @ChrisRackauckas?

cc: @AnasAbdelR

respect the element type of the bounds

When the bounds are of Float32, the sampled data should have element type Float32 as well.

julia> QuasiMonteCarlo.sample(10,[-1f0,-1f10],[1f0,1f0],UniformSample())
2×10 Matrix{Float64}:
  0.467086    0.37927    -0.781625    0.36063       0.502589   -0.836102   -0.362692
 -9.25426e7  -3.14634e9  -9.59386e9  -6.39744e9     -1.94932e9  -5.11313e9  -4.61389e9
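
A sketch of the fix: derive the output element type from the bounds with promote_type instead of hard-coding Float64 (`scaled_rand` is an illustrative helper, not the package API):

```julia
# Respect the element type of the bounds: a Float32 box yields a
# Float32 sample matrix.
function scaled_rand(n, lb::AbstractVector, ub::AbstractVector)
    T = promote_type(eltype(lb), eltype(ub))
    out = rand(T, length(lb), n)
    @. out = lb + (ub - lb) * out   # scale columns into [lb, ub)
    out
end

scaled_rand(10, [-1f0, -1f10], [1f0, 1f0]) isa Matrix{Float32}  # true
```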
