The sampling interface is a bit of a mess at the moment. The way users are asked to sp

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Better Sampling Interface about quasimontecarlo.jl HOT 15 OPEN

sciml commented on June 12, 2024

Better Sampling Interface

from quasimontecarlo.jl.

Comments (15)

ChrisRackauckas commented on June 12, 2024

This makes a lot of sense.

from quasimontecarlo.jl.

dmetivie commented on June 12, 2024

About making the API evolve, I had few of questions/comments:

Could we change the name LowDiscrepancySample for HaltonSample?
I find it very confusing, as Low Discrepency is a generic term used to define all QMC sequences.
Could we allow the method sample(n, d::Integer, SamplingAlgorithm()) for [0,1]^d sampling without arbitrary bounds.
This is for people only working in [0,1]^d make more sense. Plus, randomization methods are defined in this interval.
I wondered if some sequence like Faure, Halton, Sobol could have an option to be produced as Rational numbers (since they are often exactly defined with fraction //). On performance, I am not sure of the result, but it would for example allow exact binary expansion in base b).
Following this post (and ref there), could we allow alternative Sobol sample? Like S0b0lSample (with zero instead of o to highlight hat the zero is in the sequence, contrary to the SobolSample). The name is probably a bad idea!
I have a piece of code from Owen website doing that.

from quasimontecarlo.jl.

ParadaCarleton commented on June 12, 2024

About making the API evolve, I had few of questions/comments:

Could we change the name LowDiscrepancySample for HaltonSample?
I find it very confusing, as Low Discrepency is a generic term used to define all QMC sequences.

Could we allow the method sample(n, d::Integer, SamplingAlgorithm()) for [0,1]^d sampling without arbitrary bounds.
This is for people only working in [0,1]^d make more sense. Plus, randomization methods are defined in this interval.

I wondered if some sequence like Faure, Halton, Sobol could have an option to be produced as Rational numbers (since they are often exactly defined with fraction //). On performance, I am not sure of the result, but it would for example allow exact binary expansion in base b).

Following this post (and ref there), could we allow alternative Sobol sample? Like S0b0lSample (with zero instead of o to highlight hat the zero is in the sequence, contrary to the SobolSample). The name is probably a bad idea!
I have a piece of code from Owen website doing that.

I agree with all of these, and I think they'd be great additions! I was planning on making PRs for most of these myself. I think the Sobol Sample should just be centered, though--every point should be placed in the middle of the appropriate box, rather than at the start. So the first two points would be 1/4 and 3/4 instead of 0 and 1/2, for example. This is what FaureSample currently does.

from quasimontecarlo.jl.

ChrisRackauckas commented on June 12, 2024

Could we change the name LowDiscrepancySample for HaltonSample?
I find it very confusing, as Low Discrepency is a generic term used to define all QMC sequences.

100% agreed. It's something one of the students did that I never ended up correcting. It's one of the reasons I wouldn't make a v1.0 on this library yet 😅.

Could we allow the method sample(n, d::Integer, SamplingAlgorithm()) for [0,1]^d sampling without arbitrary bounds. This is for people only working in [0,1]^d make more sense. Plus, randomization methods are defined in this interval.

Yes, though the sample(n, lb, ub, sampler) is still going to be the core algorithm that most people use. We can document sample(n, d::Integer, SamplingAlgorithm()) though, and it would serve as a nice abstraction to implement sample(n, lb, ub, sampler) generically just by rescaling the outputs of sample(n, d::Integer, SamplingAlgorithm()).

I wondered if some sequence like Faure, Halton, Sobol could have an option to be produced as Rational numbers (since they are often exactly defined with fraction //). On performance, I am not sure of the result, but it would for example allow exact binary expansion in base b).

The interface should be type-matching. If you use Rational for your lb and ub, then you should get Rational values for the points. This is one reason why a shorthand for sample(n, d::Integer, SamplingAlgorithm()) is scary: it's necessarily information dropping.

Following this JuliaMath/Sobol.jl#31 (comment) (and ref there), could we allow alternative Sobol sample? Like S0b0lSample (with zero instead of o to highlight hat the zero is in the sequence, contrary to the SobolSample). The name is probably a bad idea!

Or just make it an option?

from quasimontecarlo.jl.

ParadaCarleton commented on June 12, 2024

Could we change the name LowDiscrepancySample for HaltonSample?
I find it very confusing, as Low Discrepency is a generic term used to define all QMC sequences.

100% agreed. It's something one of the students did that I never ended up correcting. It's one of the reasons I wouldn't make a v1.0 on this library yet sweat_smile.

With regards to names, I actually think we should drop Sample from all the types; it's 6 extra letters but contains 0 new information.

Could we allow the method sample(n, d::Integer, SamplingAlgorithm()) for [0,1]^d sampling without arbitrary bounds. This is for people only working in [0,1]^d make more sense. Plus, randomization methods are defined in this interval.

Yes, though the sample(n, lb, ub, sampler) is still going to be the core algorithm that most people use. We can document sample(n, d::Integer, SamplingAlgorithm()) though, and it would serve as a nice abstraction to implement sample(n, lb, ub, sampler) generically just by rescaling the outputs of sample(n, d::Integer, SamplingAlgorithm()).

Yep, that makes sense! Although actually, it makes me think of a possibly-better interface. We might want users to specify what kind of region they want (e.g. Box(lb, ub), or Box(d, type=Float64) for the unit box), then map to said region if it's not a box; by default we can have a method that assumes the unit box.

I wondered if some sequence like Faure, Halton, Sobol could have an option to be produced as Rational numbers (since they are often exactly defined with fraction //). On performance, I am not sure of the result, but it would for example allow exact binary expansion in base b).

The interface should be type-matching. If you use Rational for your lb and ub, then you should get Rational values for the points. This is one reason why a shorthand for sample(n, d::Integer, SamplingAlgorithm()) is scary: it's necessarily information dropping.

I don't think that's a huge deal; we can add an optional type argument that defaults to Float64.

Following this stevengj/Sobol.jl#31 (comment) (and ref there), could we allow alternative Sobol sample? Like S0b0lSample (with zero instead of o to highlight hat the zero is in the sequence, contrary to the SobolSample). The name is probably a bad idea!

Or just make it an option?

I'm not sure we should include an option--there's really no reason to drop the initial 0 when we can use the centered Sobol samples instead, which have better discrepancy and don't have the problem of an initial 0. Including an option would just be a "shoot self in foot" argument.

from quasimontecarlo.jl.

ChrisRackauckas commented on June 12, 2024

With regards to names, I actually think we should drop Sample from all the types; it's 6 extra letters but contains 0 new information.

It can cause namespacing issues when exported. And Sobol is already the module name of Sobol.jl

Box(d, type=Float64)

Passing DataTypes is almost always a bad idea because of how that plays with specialization heuristics. I mean, you can do it, but any function that isn't careful will lose inference.

Yep, that makes sense! Although actually, it makes me think of a possibly-better interface. We might want users to specify what kind of region they want (e.g. Box(lb, ub), or Box(d, type=Float64) for the unit box), then map to said region if it's not a box; by default we can have a method that assumes the unit box.

sample(n,region,sampler)?

from quasimontecarlo.jl.

ParadaCarleton commented on June 12, 2024

With regards to names, I actually think we should drop Sample from all the types; it's 6 extra letters but contains 0 new information.

It can cause namespacing issues when exported. And Sobol is already the module name of Sobol.jl

Hmm, maybe, but would that come up all that often? The majority of users probably don't have 2 separate packages for doing QMC. And using import QuasiMonteCarlo as QMC instead of using QuasiMonteCarlo isn't too much of a problem--QMC.Faure is still shorter than FaureSample.

As for Sobol, that shouldn't be a problem because we don't reexport it, right?

Box(d, type=Float64)

Passing DataTypes is almost always a bad idea because of how that plays with specialization heuristics. I mean, you can do it, but any function that isn't careful will lose inference.

Hmm, that's strange; I haven't had any problems with it before. Is Box{Float64}(d) any better?

Yep, that makes sense! Although actually, it makes me think of a possibly-better interface. We might want users to specify what kind of region they want (e.g. Box(lb, ub), or Box(d, type=Float64) for the unit box), then map to said region if it's not a box; by default we can have a method that assumes the unit box.

sample(n,region,sampler)?

Yep!

from quasimontecarlo.jl.

ParadaCarleton commented on June 12, 2024

Another suggestion--maybe it's better to use pointset or a similar name instead of sample? Partly this is so it doesn't conflict with sample, since not exporting sample is annoying for users. But mostly this is because I think sample gives the wrong impression that QMC is a drop-in replacement for Monte Carlo. Latin hypercube and Faure samples work like that, but not much else; Sobol samples in particular assume only the first few dimensions are relevant to your problem, so you have to choose your parametrization carefully or you'll get bad results (much worse than LHS).

from quasimontecarlo.jl.

ChrisRackauckas commented on June 12, 2024

I agree pointset is a good name. I think what we want to do is update the interface and then claim it to be the v1.0 release, so now would be the time to break things if they need to break.

from quasimontecarlo.jl.

ParadaCarleton commented on June 12, 2024

@dcarrera Just wanted to ask about this because you opened an issue about it in Sobol.jl. Would you be interested in making a pull request implementing your suggested interface here in QMC.jl?

from quasimontecarlo.jl.

dcarrera commented on June 12, 2024

@ParadaCarleton Hey! I'm not sure I feel qualified to implement this. As I noted in that other issue on Sobol.jl, I looked at the implementations scipy.stats.qmc.Sobol and Art Owen's R package. Superficially it didn't LOOK complicated, but I could not figure out what the code was doing in either case. I think I just don't understand the Sobol algorithm ell enough. I mentioned in that other issue that the R package is BSD licensed. So someone that understands it could translate it to Julia. I just had another look at the R code:

https://artowen.su.domains/code/rsobol.R

I don't know R but I can see that there are two ways to scrambe the points: A "nested uniform" ("nestu") and "Matousek's linear scramble" ("mato"). A minimal Julia version only needs one way to scramble the points. I had a look, but I don't really understand how they work.

from quasimontecarlo.jl.

Better Sampling Interface about quasimontecarlo.jl HOT 15 OPEN

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent