
Comments (26)

dww100 commented on June 18, 2024

@raar1 What are the reasons @djgroen gives for YAML? My feeling is that we are pro-YAML as we code a lot in Python. For a wider use it's possible JSON is more 'transferable'.
Flip side is obviously readability and comments. But no one likes whitespace errors which are definitely more of a YAML thing.
Also I suppose in terms of more complex inputs YAML may be better. I think we need to have a good reason to change though, rather than just Derek and I having a personal preference.

from easyvvuq.

dww100 commented on June 18, 2024

I like the proposal in point 3. Could it be made such that a default or fallback variation could be optionally supplied for a given parameter?

djgroen commented on June 18, 2024

@dww100 Main reason for YAML is indeed that almost all tools in VECMA are or will be Python-based (RADICAL, QCG, MUSCLE3, FabSim). In addition, I believe MUSCLE3 will switch to YAML (from cxa) as well, and FabSim already uses it. YAML is also more human-readable than JSON, which, given the average bugfesty nature of EU project software, is a big boon.

Lastly, any JSON can be converted to YAML (see http://yaml.org/spec/1.2/spec.html#id2759572), so even using JSON as storage with a converter to YAML can be an option.

Whitespace errors are an issue, but we may be able to create or adopt checking routines for that (and use those with CI).

Re #3, I think the variation probability distribution could be provided if present, but I think it should be up to the UQPs themselves to apply algorithms for efficiently exploring that parameter space..?

raar1 commented on June 18, 2024

As far as I can tell, JSON is a subset of YAML, so presumably YAML can handle more complex data structures? It's not clear if we would want our input to be more complicated than what JSON can handle, though.

I suppose there are already user-friendly YAML validation routines available in the python lib we could just use, for checking whitespace issues etc. We would obviously need to write more checks on top of that, for vecma specific requirements.

Re #3, I agree with the above. However: the UQPs need to take some kind of arguments. If those arguments are not given in the current input file, they will need to be specified somewhere else. Having too many input files also becomes confusing to new users.

I think a better way to do it might be to specify the prior distribution of each parameter (which is not a matter for the UQP to decide). So we specify static, uniform, normal or file, where file is a histogram (encoded perhaps as a list of options) or a list of strings (such as forcefield names). The UQP then decides how to pick from these distributions in accordance with its algorithm. It can, of course, override this info about the prior distribution, but I think that starts becoming a bit problematic.
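To make the idea concrete, here is a minimal sketch of how a UQP might draw a value from a per-parameter prior description. The dictionary keys (`kind`, `value`, `low`, `high`, `mean`, `stdev`, `options`), the parameter names and the function `draw_from_prior` are all hypothetical and chosen only for this illustration; nothing here is an agreed format.

```python
import random


def draw_from_prior(prior, rng):
    """Draw one value from a prior description (hypothetical format)."""
    kind = prior["kind"]
    if kind == "static":
        return prior["value"]
    if kind == "uniform":
        return rng.uniform(prior["low"], prior["high"])
    if kind == "normal":
        return rng.gauss(prior["mean"], prior["stdev"])
    if kind == "options":
        # Stands in for the "file" case when the file holds a list of strings.
        return rng.choice(prior["options"])
    raise ValueError(f"unknown prior kind: {kind!r}")


priors = {
    "time_step": {"kind": "static", "value": 0.01},
    "air_resistance": {"kind": "uniform", "low": 0.1, "high": 0.3},
    "lj_cutoff": {"kind": "normal", "mean": 10.0, "stdev": 1.0},
    "forcefield": {"kind": "options", "options": ["ff_a", "ff_b"]},
}

rng = random.Random(0)  # seeded so the example is repeatable
sample = {name: draw_from_prior(prior, rng) for name, prior in priors.items()}
print(sample)
```

A real UQP would presumably replace the simple draws with whatever sampling scheme its algorithm calls for; the prior only says what it is allowed to pick from.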

You can of course have, say, a UQP_Sweep (which just does parameter sweeps) or a UQP_Normal (which picks from normal distributions for input params) but then how does the UQP know which params to do that for? Does it just assume every non-static parameter is e.g. "normal"? What if you want some parameters to be systematically incremented, while others to be stochastic? For example, let's say you want to vary the LJ cut-off for an MD sim according to a normal distribution (stochastic), but do that equally for each type of forcefield available (non-stochastic). If UQPs are overly specialised then they can't do this. But if they are general, then the same info we currently have still needs to be provided.
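The mixed case in that example (a stochastic cut-off repeated for every forcefield) could be sketched roughly as follows. The function `plan_runs`, its arguments and the forcefield names are hypothetical; the point is only that one parameter is enumerated systematically while the other is drawn at random.

```python
import random


def plan_runs(forcefields, mean, stdev, draws_per_forcefield, seed=0):
    """Cross a systematically varied parameter with a stochastic one."""
    rng = random.Random(seed)
    runs = []
    for forcefield in forcefields:             # non-stochastic: every option is visited
        for _ in range(draws_per_forcefield):  # stochastic: a fresh draw each time
            runs.append({
                "forcefield": forcefield,
                "lj_cutoff": rng.gauss(mean, stdev),
            })
    return runs


runs = plan_runs(["ff_a", "ff_b", "ff_c"], mean=10.0, stdev=1.0,
                 draws_per_forcefield=2)
for run in runs:
    print(run)
```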

I fundamentally like the idea of only specifying a few general statements about each parameter (e.g. float, positive, < 1000.0) but in practice I think you still end up having to specify all the rest of it anyway.

Any ideas?

dww100 commented on June 18, 2024

I guess we could go for something like this as a full specification for each parameter:

parameter_name:
    type:
    range:
    variation:

You could make one of range/variation optional in some cases, as one or the other may not make sense for a particular UQP (which one is optional would depend on the UQP).

I would favour an approach where we provide functions to create template JSON/YAML for different applications where everything is static and then the user modifies the things to be varied. So you would expect for most cases everything to be variation: static and range either a set value or default if we have a mechanism to do such a thing (i.e. if we find most applications assume defaults).
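A template-generating function along these lines might look like the sketch below. It assumes the application's defaults are available as a plain dictionary; the function name `make_static_template` and the example defaults are hypothetical, while the `type`, `range` and `variation` fields follow the specification above.

```python
import json


def make_static_template(defaults):
    """Build a parameter block in which every parameter is static."""
    template = {}
    for name, value in defaults.items():
        template[name] = {
            "type": type(value).__name__,  # e.g. "float" or "str"
            "range": value,                # a set value, taken from the defaults
            "variation": "static",
        }
    return template


defaults = {"time_step": 0.01, "gravity": 9.8, "forcefield": "ff_a"}
template = make_static_template(defaults)
print(json.dumps(template, indent=4))
```

The user would then edit only the entries they want varied, leaving the rest as generated.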

raar1 commented on June 18, 2024

OK, so we solve it by making some information optional. The UQP could start by checking whether it has been given enough information on each parameter. Simpler UQPs maybe don't need to know so much.

I would prefer to call it "limits" rather than "range" though (assuming that's the sense in which you meant it) to avoid confusion with python range function.

"Variation" may also be better called "prior" or "distribution" since I reckon the WP2 guys will want to be able to specify the initial distributions for their input params (distributions which may also have been generated by a different part of vecma)

dww100 commented on June 18, 2024

limits is much better than range

I don't like prior as I think it is non-intuitive for a lot of users. I can see the case for distribution but was thinking along the lines of a binary flag for some UQPs where the user just wants to say what to vary and what to keep static. I admit that variation isn't a great name either. Maybe sampling or a derivative thereof?

djgroen commented on June 18, 2024

I like limits better than range as well, as it sounds a bit more generic.

One idea here could be to keep the name distribution. For static values I would omit the distribution and limits field, and instead have a value field, to make it easy to read.

dww100 commented on June 18, 2024

@djgroen I don't like that as it adds an extra field which is effectively a flag. However, maybe the values would work as a generic tag for the field - which would then be a list of two objects - a flag static/distribution/normal or whatever and a second value that could be a single value, filename or list of parameters for a selected distribution as appropriate.

djgroen commented on June 18, 2024

@dww100 I'm not sure I understand what you mean. My idea was to have the value field store the actual static value of the parameter, not some boolean value.

For distributions, you would not have that field, but a distribution field containing a distribution type (e.g. uniform or gaussian). Then, depending on the distribution type, other fields could be added (such as limits, stdev, or average, depending on the distribution type specified).
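As a rough sketch of how a reader of such a file could tell the two cases apart: the function `classify_parameter` and the `REQUIRED_FIELDS` table below are hypothetical, and which extra fields each distribution type needs is an assumption made only for this example.

```python
# Hypothetical mapping from distribution type to the extra fields it needs.
REQUIRED_FIELDS = {
    "uniform": ("limits",),
    "gaussian": ("average", "stdev"),
}


def classify_parameter(spec):
    """Return "static" or the distribution type, checking the fields present."""
    if "value" in spec:
        if "distribution" in spec:
            raise ValueError("give either 'value' or 'distribution', not both")
        return "static"
    if "distribution" not in spec:
        raise ValueError("parameter needs a 'value' or a 'distribution'")
    kind = spec["distribution"]
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown distribution type: {kind!r}")
    missing = [field for field in REQUIRED_FIELDS[kind] if field not in spec]
    if missing:
        raise ValueError(f"{kind} distribution is missing: {', '.join(missing)}")
    return kind


print(classify_parameter({"type": "float", "value": 5.0}))
print(classify_parameter({"type": "float", "distribution": "gaussian",
                          "average": 10.0, "stdev": 5.0}))
```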

dww100 commented on June 18, 2024

I mean that you would have the following for a set value:

sausage_length:
    type: float
    values: [static, 5.0]

but for a variable to be picked from a distribution:

sausage_length:
    type: float
    limits: [">0", "<100"]
    values: [normal, [10.0, 5.0]]

Obviously, the limits would be optional as they may not be relevant given the distribution (here the two values are mean and standard deviation).
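Reading the two-element `values` field could be sketched as below, assuming the parameter entry has already been loaded into a Python dictionary. The function `parse_values` and the keys of the dictionaries it returns are hypothetical.

```python
def parse_values(entry):
    """Turn a [flag, payload] pair into a dictionary with named fields."""
    flag, payload = entry["values"]
    if flag == "static":
        return {"kind": "static", "value": payload}
    if flag == "normal":
        mean, stdev = payload  # the two values are mean and standard deviation
        return {"kind": "normal", "mean": mean, "stdev": stdev}
    raise ValueError(f"unsupported flag: {flag!r}")


fixed = parse_values({"type": "float", "values": ["static", 5.0]})
varied = parse_values({"type": "float", "values": ["normal", [10.0, 5.0]]})
print(fixed)
print(varied)
```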

dww100 commented on June 18, 2024

Okay so after a real world discussion with @djgroen I'm thinking that a viable compromise in terms of having everything or only minimal inputs in the user JSON/YAML is this:

We have two classes of wrapper - one generic (based on input file style - i.e. key-value pairs, simple JSON) and the other application specific

  • Generic - all required values must be specified by the user and no tests possible on whether they are sensible
  • Application specific - only the interesting values specified, there is a defaults JSON/YAML read by the wrapper. This allows checks on the validity of inputs.

raar1 commented on June 18, 2024

Could a user not specify the "sensible" limits along with the required values (when using the generic wrapper)? Or do you have even more application specific checks on the values in mind?

dww100 commented on June 18, 2024

What I meant was that there can be no check of whether 7 is a sensible option for forcefield or whatever in the application. I didn't mean that you couldn't put ranges in but that it wouldn't be possible to say whether those could be interpreted by the target application.

raar1 commented on June 18, 2024

OK, I understand. Would such a check be encoded in the "defaults" file then? Or enforced through actual code in the application specific wrapper?

I'm thinking along the lines of having a verification primitive (VVP). A simple such primitive would be one that simply checks that inputs (and later outputs) are within tolerated, physical limits.
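A very simple primitive of that kind might look like this sketch. The function name `check_within_limits` and the way limits are passed (a dictionary of `(low, high)` pairs) are assumptions for illustration only.

```python
import math


def check_within_limits(values, limits):
    """Return a list of messages, one per value outside its tolerated limits."""
    problems = []
    for name, value in values.items():
        # Parameters without stated limits are treated as unbounded.
        low, high = limits.get(name, (-math.inf, math.inf))
        if not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


limits = {"angle": (0.0, 6.28), "height": (0.0, math.inf)}
print(check_within_limits({"angle": 1.5, "height": 10.0}, limits))
print(check_within_limits({"angle": 7.0, "height": 10.0}, limits))
```

The same function could be run on outputs later, given limits for the output quantities.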

dww100 commented on June 18, 2024

I was thinking that we might have some simple things done via the "defaults" file (type checks and things which have defined lists of accepted values being the obvious things) but you could add more complex logic to the checking in the wrapper if it makes sense (for example if optionA=True conflicts with optionB=True).
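A minimal sketch of that split, assuming the "defaults" file has been loaded into a dictionary. The layout of `DEFAULTS`, the parameter names and the function `resolve_inputs` are all hypothetical; the simple checks are driven by the defaults data, and the conflict check is ordinary wrapper code.

```python
# Hypothetical contents of an application's "defaults" file, once loaded.
DEFAULTS = {
    "forcefield": {"default": "ff_a", "type": str, "allowed": ["ff_a", "ff_b"]},
    "cutoff": {"default": 10.0, "type": float},
    "optionA": {"default": False, "type": bool},
    "optionB": {"default": False, "type": bool},
}


def resolve_inputs(user_values, defaults=DEFAULTS):
    """Merge user values over the defaults, checking them along the way."""
    resolved = {name: spec["default"] for name, spec in defaults.items()}
    for name, value in user_values.items():
        if name not in defaults:
            raise KeyError(f"unknown parameter: {name}")
        spec = defaults[name]
        # Simple checks driven entirely by the defaults file.
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{name} should be of type {spec['type'].__name__}")
        if "allowed" in spec and value not in spec["allowed"]:
            raise ValueError(f"{name}={value!r} is not an accepted value")
        resolved[name] = value
    # More complex, application-specific logic lives in the wrapper code.
    if resolved["optionA"] and resolved["optionB"]:
        raise ValueError("optionA=True conflicts with optionB=True")
    return resolved


print(resolve_inputs({"cutoff": 12.0}))
```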

dww100 commented on June 18, 2024

I like the idea of a VVP along the lines you suggest but it could get complex. Would be well worth thinking about how it could be done though.

raar1 commented on June 18, 2024

In response to some of the things we discussed, I have made changes to the EasyVVUQ prototype.

  1. The input file now has an "app" block and a "params" block. The "params" block, as before, specifies what parameters you want varied, but the "app" block contains information about what wrapper to use for the application.
  2. I've added a "generic wrapper" that uses a template of an input file and performs simple substitution on it, FabSim style (e.g. if the params block specifies a parameter called "density", the generic wrapper will look for anything called "$density" in the template). Obviously this wrapper requires more information than the application specific wrapper. You provide this info in the "app" block (see 1.) with the labels "template", "run-cmd" and "inputfilename".

At present you can use either the application specific wrapper (test using "bash run_test1.sh") or the generic wrapper (test using "bash run_test2.sh").
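For readers unfamiliar with this style of substitution, the sketch below shows the general idea using Python's standard `string.Template`, which happens to use the same `$name` placeholder syntax. It is not the prototype's actual code; the function `encode_input` and the template text are hypothetical.

```python
from string import Template


def encode_input(template_text, params):
    """Fill every $name placeholder in the template from the params mapping."""
    # substitute() raises KeyError if the template names a parameter
    # that was not supplied, so missing values are caught early.
    return Template(template_text).substitute(params)


template_text = "density = $density\ntemperature = $temperature\n"
rendered = encode_input(template_text, {"density": 1.2, "temperature": 300})
print(rendered)
```

Failing on a missing placeholder value seems the safer default here, since a half-filled input file would otherwise reach the application.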

dww100 commented on June 18, 2024

@raar1 I need a refresher on what we decided and what we simply haven't updated here before we move to making more specific/complex encoders and UQPs. Especially after we replaced wrappers with encoders.

Firstly and most importantly, is the "params" section in the current test JSON files in the format we want (see discussion above about limits etc.)?

raar1 commented on June 18, 2024

We didn't reach any concrete agreements as far as I remember. We'll need to design the new params input format.

I'll set the ball rolling:

For cannonsim testapp we currently have e.g.:

"params": {
    "angle":            ["range", [1.0, 2.0, 0.25]],
    "air_resistance":   ["range", [0.1, 0.3, 0.1]],
    "height":           ["normal", [10.0, 1.0, 3]],
    "time_step":        ["static", 0.01],
    "gravity":          ["static", 9.8],
    "mass":             ["static", 1.0],
    "velocity":         ["static", 10.0]
    }

How about something more like:

"params": {
    "angle": {
        "distribution": {
            "type": "range",
            "min": 0.1,
            "max": 2.0,
            "increment": 0.25
        },
        "type": "real",
        "min": 0,
        "max": 6.28
    },
    "height": {
        "distribution": {
            "type": "normal",
            "mean": 10.0,
            "stdev": 1.0,
            "num_samples": 3
        },
        "type": "real",
        "min": 0,
        "max": "Inf"
    },
    "time_step": {
        "distribution": {
            "type": "static",
            "value": 0.01
        },
        "type": "real"
    },
    "velocity": {
        "distribution": {
            "type": "file",
            "filename": "/home/alovelace/histos/velocity_dist.histo",
            "format": "histogram",
            "num_samples": 10
        },
        "type": "real",
        "min": "-Inf",
        "max": "Inf"
    },
    "gravity": {
        "type": "real",
        "min": 0.0,
        "max": 100.0
    },
    "forcefield": {
        "distribution": {
            "type": "list",
            "options": ["graff", "reaxff", "CCSPVC"]
        },
        "type": "string"
    }
}

(No distribution is given for "gravity", so easyvvuq will expect the UQP to provide it.)

It's very verbose, but you get the idea...

  • You can specify (or not) the distribution of each parameter as either "range", "normal", "file" (containing e.g. histogram or something like that), "list" or "static".
  • Not specifying a distribution for a parameter will cause easyvvuq to expect something else to set it later e.g. a UQP. If not it will complain.
  • parameter types ("int", "real", "string") and limits ("min" and "max") are mostly for verification, and for easyvvuq to fail early and hard if crazy parameter values are somehow generated.
  • parameter types may be optional when the distribution is specified, but if it is not, then they probably need to have been given. Some UQPs will need to know, for example, what the limits were in order to build the distribution.

My idea is that maybe we make a Distribution base class that is then extended to make the Range, Normal, HistoFile and Static built-in distribution classes (similar to the Encoder setup). Users can of course write more distribution classes as needed. Or we simply class it all as UQPs to be applied later? (but that would make it harder to see what was going on from the JSON input file - need to think about what our design principles are here).
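A minimal sketch of what such a class hierarchy could look like is given below. The method name `values`, the constructor arguments and the choice to treat the upper bound of a range as inclusive are all assumptions for illustration; HistoFile is left out to keep the example short.

```python
import random
from abc import ABC, abstractmethod


class Distribution(ABC):
    """Base class: a distribution yields the values to try for one parameter."""

    @abstractmethod
    def values(self, rng):
        """Return a list of values, using rng for any random draws."""


class Static(Distribution):
    def __init__(self, value):
        self.value = value

    def values(self, rng):
        return [self.value]


class Range(Distribution):
    def __init__(self, low, high, increment):
        self.low, self.high, self.increment = low, high, increment

    def values(self, rng):
        # Number of steps that fit in [low, high]; the small epsilon
        # guards against floating point round-off in the division.
        steps = int((self.high - self.low) / self.increment + 1e-9)
        return [self.low + i * self.increment for i in range(steps + 1)]


class Normal(Distribution):
    def __init__(self, mean, stdev, num_samples):
        self.mean, self.stdev, self.num_samples = mean, stdev, num_samples

    def values(self, rng):
        return [rng.gauss(self.mean, self.stdev) for _ in range(self.num_samples)]


rng = random.Random(0)  # seeded so the example is repeatable
print(Range(1.0, 2.0, 0.25).values(rng))
print(Normal(10.0, 1.0, 3).values(rng))
print(Static(0.01).values(rng))
```

A user-written distribution would only need to subclass `Distribution` and implement `values`, much like the Encoder setup.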

dww100 commented on June 18, 2024

Is HistoFile a real format or just an illustrative placeholder? (I know this is a tangent.)

dww100 commented on June 18, 2024

I quite like that format.

Comments:

  1. Should we have a flag for logarithmically scaled things? Otherwise a UQP may sample totally irrelevant space if given a large range.
  2. Is it sensible to allow combination of human and UQP specified variations?
  3. Future extension - units or eval (jumping off point for this was for specifying angles in N * pi radians)
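To illustrate point 1: a logarithmic flag could amount to sampling uniformly in log space rather than in the raw value, as in the sketch below. The function name `sample_log_uniform` and the example limits are hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed on [log(low), log(high)]."""
    if low <= 0 or high <= low:
        raise ValueError("need 0 < low < high for logarithmic sampling")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # seeded so the example is repeatable
samples = [sample_log_uniform(1e-6, 1e2, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

With plain uniform sampling over the same limits, almost every draw would land in the top decade or two, which is the "irrelevant space" problem.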

raar1 commented on June 18, 2024

Unfortunately it's just an illustrative placeholder.

But I was thinking pretty much any histogram saved as a csv should look the same (or could be made to). For example:

angle, prob
0.0, 0.0025
0.2, 0.1211
0.4, 0.2305
.
.
.

etc.

Wouldn't be hard to pick from this distribution.
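For instance, a minimal sketch using only the standard library, assuming the two-column layout above with one header row. The function names are hypothetical, and each row is treated as a discrete value rather than a bin to interpolate within.

```python
import csv
import io
import random


def load_histogram(text):
    """Parse 'value, probability' rows (after one header row) into two lists."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]  # ignore blank lines
    data = rows[1:]                        # skip the header row
    values = [float(value) for value, _ in data]
    weights = [float(prob) for _, prob in data]
    return values, weights


def sample_histogram(values, weights, num_samples, rng):
    """Pick values with probability proportional to their weights."""
    return rng.choices(values, weights=weights, k=num_samples)


histogram_text = "angle, prob\n0.0, 0.0025\n0.2, 0.1211\n0.4, 0.2305\n"
values, weights = load_histogram(histogram_text)
rng = random.Random(0)  # seeded so the example is repeatable
print(sample_histogram(values, weights, 10, rng))
```

The weights do not need to sum to one, since `random.choices` uses them as relative weights.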

We could certainly have a flag for logarithmic things, if you think that's needed.

Point 2 is something I'm wondering about as well. The alternative is that we only specify types and limits in this file (and, possibly, any static values). Then all distributions are applied in the same place - in the python script. Then they'd all be some kind of UQP object.

dww100 commented on June 18, 2024

Following on from the histogram point: what other way would we specify a distribution in a file? If only as a histogram, we should make that implicit.

raar1 commented on June 18, 2024

I'm not sure of other ways at present. I was just thinking of using that format until someone has an application for which that somehow doesn't work.

raar1 commented on June 18, 2024

Based on our (ongoing) discussion about what params should look like, here is how a sensitivity analysis (with convergence checking) followed by a UQ might look in the user's script:

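import easyvvuq as uq  # the calls below sketch a proposed API; none of them is final

my_campaign = uq.Campaign(state_filename="test_input/test_cannonsim.json")

# Repeat the sampling until the convergence check passes
while True:
    latinID = uq.samplers.LatinHyperCube(my_campaign, varies=["angle", "height", "gravity", "resistance"])
    my_campaign.execute()
    # Filter is the yet-to-be-designed filter class mentioned in the points below
    success = uq.analysis.Convergence(my_campaign, cfilter=Filter("cut-off", "<", "7.0"))
    if success:
        break

# Work out which parameters the output is sensitive to
sensitive_params = uq.analysis.Sensitivity(my_campaign, id=latinID)

# Run the UQ itself, varying only the sensitive parameters
uq.uqp.basicUQP(my_campaign, varies=sensitive_params)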

Other points raised:

  • Campaigns should be assigned a random unique directory (acting also as their unique ID) as soon as they are created, and not wait until populate_runs_dir() to do so.
  • User would provide a list of variable params and static params to start with
  • There would be a defaults file for each app, specifying default values for each parameter
  • Logging: Campaign should log every operation carried out on it
  • Some sort of domain specific language/filter class/thing that will allow us to filter for all runs with, say, cut-off less than 7.0
  • UQPs state their "requirements" in some fashion. The campaign object is automatically queried to check that it meets these requirements before the UQP will apply itself. For example, an analysis UQP will require that all runs have completed before attempting to analyse the results
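As a starting point for the filter idea, a minimal sketch might look like this. The class layout, the set of supported operators and the representation of runs as plain dictionaries are all assumptions; unlike the script above, the threshold is passed as a number rather than a string.

```python
import operator

# Comparison symbols the hypothetical Filter understands.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


class Filter:
    """Select runs whose named parameter satisfies a comparison."""

    def __init__(self, param, op, threshold):
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        self.param = param
        self.compare = OPERATORS[op]
        self.threshold = threshold

    def matches(self, run):
        return self.compare(run[self.param], self.threshold)

    def apply(self, runs):
        return [run for run in runs if self.matches(run)]


runs = [{"cut-off": 5.0}, {"cut-off": 7.0}, {"cut-off": 9.5}]
selected = Filter("cut-off", "<", 7.0).apply(runs)
print(selected)
```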

Any points I missed, please add them below.
