
alea's People

Contributors

dachengx, hammannr, hoetzsch, kdund, pre-commit-ci[bot], zihaoxu98

Forkers

faroutylq

alea's Issues

Optimize Parameters call for performance

Currently, the code implemented in Parameters __call__ contains two for-loops. Time this and try to find a more optimal (and still human-readable) alternative.
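A minimal sketch of the kind of single-pass alternative meant here, assuming Parameters maps names to nominal values; the class layout is illustrative, not alea's actual implementation:

```python
class Parameters:
    def __init__(self, defaults):
        self.defaults = dict(defaults)  # parameter name -> nominal value

    def __call__(self, **overrides):
        # Single pass: start from the defaults, apply overrides, reject unknowns.
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise ValueError(f"Unknown parameters: {sorted(unknown)}")
        return {name: overrides.get(name, default)
                for name, default in self.defaults.items()}

pars = Parameters({"mu": 0.0, "sigma": 1.0})
assert pars(mu=5.0) == {"mu": 5.0, "sigma": 1.0}
```

A dict comprehension like this replaces the two explicit loops with one pass over the defaults, which is usually both faster and easier to read.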

Allow the composition of several conf files

It would be really nice for future run combinations etc. if we could build a statistical model from several config files, each defining one likelihood term!
You would then have one file each for SR0, SR1, etc. (and fix each of them once).
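A sketch of what such a composition could look like, assuming each config is a dict carrying a likelihood_terms list (the key names are hypothetical):

```python
def compose_configs(*configs):
    """Merge several per-science-run configs into one model config."""
    combined = {"likelihood_terms": []}
    for cfg in configs:
        # Each config contributes its likelihood terms to the combined model.
        combined["likelihood_terms"].extend(cfg.get("likelihood_terms", []))
    return combined

sr0 = {"likelihood_terms": [{"name": "SR0", "livetime": 0.8}]}
sr1 = {"likelihood_terms": [{"name": "SR1", "livetime": 1.1}]}
model_config = compose_configs(sr0, sr1)
```

In practice the per-SR configs would be loaded from YAML files, but the merge step stays this simple as long as each file defines a self-contained likelihood term.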

Dealing with static shape parameters

Some parameters, like the WIMP mass, are typically fixed in our analysis. However, currently, it is possible to change the nominal value of the wimp_mass parameter or fix it to a different value in the fit, which just doesn't do anything. I think we should either enable this or raise an error in case the user wants to modify such a parameter value.

Probably the easiest is to introduce a property static of a parameter that is true if

  • the parameter has no blueice_anchors,
  • is not a livetime, and
  • is not a rate.

If you try to change the value of a static parameter, we will throw an error.

I guess in case we want to allow this, we would need to reload the likelihood so you might as well just make a new instance of the model with the new value.
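A sketch of the proposed static property under these rules; the Parameter attributes used here (ptype, blueice_anchors) are assumptions based on the discussion above:

```python
class Parameter:
    def __init__(self, name, ptype=None, blueice_anchors=None, nominal_value=None):
        self.name = name
        self.ptype = ptype                    # e.g. "rate", "livetime", "shape"
        self.blueice_anchors = blueice_anchors
        self._nominal_value = nominal_value

    @property
    def static(self):
        # Static: no blueice anchors, and neither a livetime nor a rate.
        return (not self.blueice_anchors
                and self.ptype not in ("livetime", "rate"))

    @property
    def nominal_value(self):
        return self._nominal_value

    @nominal_value.setter
    def nominal_value(self, value):
        # Changing a static parameter would silently do nothing in the fit,
        # so raise instead.
        if self.static:
            raise ValueError(f"Cannot change static parameter {self.name!r}")
        self._nominal_value = value
```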

Descriptive names

I'd suggest introducing a few more descriptive names for some of the important args:

  • parameter_of_interest: str as a parameter necessary for any construction (this would e.g. be 'signal_rate_multiplier' for the standard nT analysis and 'mu' for the Gaussian model)
  • Instead of extra_args use hypotheses in the fits (this just makes things clearer imo)

Steering script for multiple toyMC submission

We want to be able to define a range of toyMC runs (with different true generate_args, different signals tested, etc) with a steering script, saying for example
"these wimp masses and true signal expectations should be looped over", with N toyMCs per run and M parallel runs for each.
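A sketch of how such a steering script could expand the scanned values into job specifications (all names are illustrative):

```python
import itertools

def expand_runs(wimp_masses, signal_expectations, n_toys_per_run, n_parallel):
    """Expand scanned values into one job spec per (mass, signal, batch)."""
    jobs = []
    # Cross product of all scanned values defines the runs to loop over.
    for mass, n_signal in itertools.product(wimp_masses, signal_expectations):
        for batch in range(n_parallel):
            jobs.append({
                "generate_args": {"wimp_mass": mass,
                                  "signal_expectation": n_signal},
                "n_toys": n_toys_per_run,
                "batch": batch,
            })
    return jobs

jobs = expand_runs([10, 50, 200], [0.0, 3.0], n_toys_per_run=100, n_parallel=4)
# 3 masses x 2 signal expectations x 4 parallel batches = 24 job specs
```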

Add Runner class

It should

  1. have a method toy_simulation
  2. have a method to call fitting of StatisticalModel
  3. have a method inference_interface
  4. have a method run_toy_fit(generate_parameters, fit_parameters, data) which calls toy_simulation
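A skeleton of such a Runner; the StatisticalModel interface used here (generate_data, fit, data) is an assumption, not alea's final API:

```python
class Runner:
    def __init__(self, model):
        self.model = model  # a StatisticalModel-like object

    def toy_simulation(self, generate_parameters):
        # Generate one toy dataset from the model.
        return self.model.generate_data(**generate_parameters)

    def fit_model(self, fit_parameters):
        # Delegate fitting to the StatisticalModel.
        return self.model.fit(**fit_parameters)

    def inference_interface(self):
        ...  # left open: hook into the inference interface

    def run_toy_fit(self, generate_parameters, fit_parameters, data=None):
        # Use supplied data if given, otherwise simulate a toy first.
        if data is None:
            data = self.toy_simulation(generate_parameters)
        self.model.data = data
        return self.fit_model(fit_parameters)
```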

Implement seeds in alea

Science is based on reproducibility. We should set the seed of each iteration of toy data generation. And maybe store them in toydata file or in other forms.
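A stdlib-only sketch of per-toy seeding with stored seeds; sequential seeds from a base seed are shown for simplicity (numpy's SeedSequence.spawn would be a more robust choice for independent streams):

```python
import random

base_seed = 2023
toy_records = []
for i in range(3):
    seed = base_seed + i               # one deterministic seed per toy
    rng = random.Random(seed)
    data = [rng.gauss(0, 1) for _ in range(5)]
    # Storing the seed alongside the toy data makes every toy reproducible.
    toy_records.append({"toy_index": i, "seed": seed, "data": data})

# Any stored toy can be regenerated exactly from its seed:
replay = random.Random(toy_records[1]["seed"])
assert [replay.gauss(0, 1) for _ in range(5)] == toy_records[1]["data"]
```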

Infer analysis_space from templates

We could infer the analysis_space from the templates instead of also defining it explicitly in the config. We would then only need to assert that all sources have the same analysis space.
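A sketch of the consistency check, with templates mocked as dicts carrying a hypothetical analysis_space key (in practice this would be read from the template files):

```python
def infer_analysis_space(templates):
    """Return the shared analysis space, or raise if sources disagree."""
    spaces = [t["analysis_space"] for t in templates]
    first = spaces[0]
    if any(s != first for s in spaces[1:]):
        raise ValueError(f"Templates disagree on analysis_space: {spaces}")
    return first

templates = [
    {"name": "er", "analysis_space": [("cs1", (0, 100)), ("cs2", (100, 10000))]},
    {"name": "wimp", "analysis_space": [("cs1", (0, 100)), ("cs2", (100, 10000))]},
]
space = infer_analysis_space(templates)
```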

ToyMC submission to Midway

The XENON inference is often run on the Midway computing cluster.
Software should be made to:

  • submit sbatch runs to midway using the utilix code
  • resubmit only if output is missing
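A sketch of the resubmit-only-if-missing logic; submit is a placeholder standing in for the actual utilix sbatch submission call:

```python
from pathlib import Path

def resubmit_missing(expected_outputs, submit):
    """Call submit(path) for every expected output file that is absent."""
    resubmitted = []
    for path in map(Path, expected_outputs):
        if not path.exists():
            # Output missing: this job needs to be (re)submitted.
            submit(path)
            resubmitted.append(str(path))
    return resubmitted
```

The same loop works for a first submission (nothing exists yet) and for a resubmission pass after a partially failed batch.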

Implement data generator for ancillary likelihood term

We could implement a data generator equivalent to the 'science data generator' in BlueiceDataGenerator for the CustomAncillaryLikelihood class. The .simulate() method should essentially do what we currently do in ._generate_ancillary_measurements().
This way we can unify the definition and simplify the data generation by just calling [gen.simulate(**generate_values) for gen in self.data_generators] for all likelihood terms.
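Once every likelihood term owns such a generator, the unified call pattern could look like this (FakeGenerator is a stand-in for the real generator classes):

```python
class FakeGenerator:
    """Stand-in for BlueiceDataGenerator / an ancillary-term generator."""
    def __init__(self, name):
        self.name = name

    def simulate(self, **generate_values):
        # A real generator would draw toy data here.
        return {"source": self.name, **generate_values}

data_generators = [FakeGenerator("science"), FakeGenerator("ancillary")]
# One uniform call for all likelihood terms:
toy_data = [gen.simulate(er_rate_multiplier=1.0) for gen in data_generators]
```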

Unset default parameters do not give readable warnings

Describe the bug
If you initialise the Gaussian example likelihood but do not set default values for the parameters, the warnings do not clearly point out the issue.

To Reproduce
Steps to reproduce the behavior:

from alea.examples import gaussian_model
simple_model = gaussian_model.GaussianModel()
simple_model.data = simple_model.generate_data(mu=0,sigma=2)
simple_model.fit()

Fails with

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [14], line 4
      2 simple_model = gaussian_model.GaussianModel()
      3 simple_model.data = simple_model.generate_data(mu=0,sigma=2)
----> 4 simple_model.fit()

File ~/Desktop/alea_tests/alea/statistical_model.py:243, in StatisticalModel.fit(self, verbose, **kwargs)
    241 # Make the Minuit object
    242 cost.errordef = Minuit.LIKELIHOOD
--> 243 m = Minuit(
    244     MinuitWrap(cost, s_args=self.parameters.names),
    245     **defaults)
    246 fixed_params = [] if fixed_parameters is None else fixed_parameters
    247 fixed_params += self.parameters.not_fittable

File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/minuit.py:625, in Minuit.__init__(***failed resolving arguments***)
    617 self._strategy = MnStrategy(1)
    618 self._fcn = FCN(
    619     fcn,
    620     getattr(fcn, "grad", grad),
    621     array_call,
    622     getattr(fcn, "errordef", 1.0),
    623 )
--> 625 self._init_state = _make_init_state(self._pos2var, start, kwds)
    626 self._values = mutil.ValueView(self)
    627 self._errors = mutil.ErrorView(self)

File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/minuit.py:2483, in _make_init_state(pos2var, args, kwds)
   2481 for i, x in enumerate(pos2var):
   2482     val = kwds[x] if kwds else args[i]
-> 2483     err = mutil._guess_initial_step(val)
   2484     state.add(x, val, err)
   2485 return state

File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/util.py:1308, in _guess_initial_step(val)
   1307 def _guess_initial_step(val: float) -> float:
-> 1308     return 1e-2 * abs(val) if val != 0 else 1e-1

TypeError: bad operand type for abs(): 'NoneType'

Expected behavior
A short error message should inform the user that Minuit needs initial guesses (via nominal values), or that values must be set for all parameters.
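A sketch of a pre-fit check that could produce the short message asked for here; defaults mimics the dict of starting values handed to Minuit:

```python
def check_fit_guesses(defaults):
    """Raise a readable error if any fit parameter has no starting value."""
    missing = [name for name, value in defaults.items() if value is None]
    if missing:
        raise ValueError(
            "Minuit needs an initial guess for every parameter. "
            f"Set nominal values (or pass values) for: {missing}")

check_fit_guesses({"mu": 0.0, "sigma": 2.0})  # passes silently
```

Running such a check inside StatisticalModel.fit() before constructing the Minuit object would replace the opaque `bad operand type for abs(): 'NoneType'` with an actionable message.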

Improve "static" functionality

After a discussion with @kdund and @dachengx we decided to improve the static functionality introduced in #100 in the following way:

  • Instead of having a property static, we make it a ptype, which we'll call needs_reinit.
  • This ptype will be the fallback option in the blueice extended model in case nothing else is specified.
  • For wimp_mass, e.g., it would make sense to use this ptype because for most applications you won't use it as a shape parameter, but then you can't be allowed to change its nominal value without re-initializing the entire model.

Better examples

Prepare some basic templates and example config file, which can be used for

  • unittests
  • examples
  • debugging

This could e.g. be generic NR, ER and WIMP templates in cS1-cS2 from default NEST.

Implement str-type uncertainty for the BlueiceExtendedModel

Initially, I thought we could simply get away with evaluating the str if it starts with numpy. or scipy., but for the variation of the ancillary measurements, we'd need to know which of the parameters corresponds to the measured value.
One option to solve this would be to enable a wildcard $MEASUREMENT$ in the str (e.g. uncertainty: "scipy.stats.uniform(loc=-1+$MEASUREMENT$, scale=2)" or "scipy.stats.skewnorm(a=4, loc=$MEASUREMENT$, scale=2)"). But not specifying it might also be OK, e.g. if there are physical bounds and the measurement doesn't affect the constraints; maybe in that case we just want to raise a warning at the beginning or something like this.

But I don't know if this is the most user-friendly approach, so if anyone has a better suggestion just let me know! 😊
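A minimal sketch of the wildcard substitution step; evaluating the resulting scipy expression would happen downstream in the model:

```python
def resolve_measurement(uncertainty_str, measurement):
    """Substitute the proposed $MEASUREMENT$ wildcard with a concrete value."""
    return uncertainty_str.replace("$MEASUREMENT$", repr(measurement))

expr = resolve_measurement(
    "scipy.stats.uniform(loc=-1+$MEASUREMENT$, scale=2)", 0.5)
# expr is now an evaluable scipy.stats expression string
```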

Template hash

In binference, I ran into an issue that two servers had (slightly) different versions of templates.
To enable tracking the exact versions we could simply store the hashes of all used templates in the toy results.

This is not at all critical, but I think it's a nice-to-have feature 😊
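A sketch of computing such hashes with stdlib hashlib; the file name and demo file are illustrative:

```python
import hashlib
import tempfile
from pathlib import Path

def template_hashes(paths):
    """Map template file name -> sha256 hex digest of its contents."""
    return {Path(p).name: hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths}

# Demo on a throw-away file standing in for a real template:
tmp = Path(tempfile.mkdtemp()) / "er_template.h5"
tmp.write_bytes(b"abc")
hashes = template_hashes([tmp])
```

Storing this dict in the toy results would make it trivial to detect that two servers used (slightly) different template versions.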

Expectation value bounds

We should re-introduce the option to provide fit bounds and parameter interval bounds in units of expectation values.
After internal discussion, we came to the conclusion that this is probably best done via a boolean expectation_value_bounds (preliminary name), which lives in parameters.py.
We'll add a function to call with an expectation value, which does the conversion.
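A sketch of the conversion function, assuming the nominal expectation at rate multiplier 1 is known (the names are preliminary, as noted above):

```python
def bounds_from_expectations(bounds, nominal_expectation):
    """Convert (low, high) in expected events to rate-multiplier units."""
    low, high = bounds
    # A rate multiplier of 1 corresponds to nominal_expectation events.
    return (low / nominal_expectation, high / nominal_expectation)

# With 20 expected signal events at multiplier 1, an interval of
# 0-50 events corresponds to multipliers 0-2.5:
converted = bounds_from_expectations((0.0, 50.0), nominal_expectation=20.0)
```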

Add degree of freedom to `asymptotic_critical_value`

For higher-dimensional confidence intervals, the chi2 distribution in this function should not have 1 degree of freedom.

We also need to add a way to pass the new argument from the configuration file.

alea/alea/utils.py

Lines 286 to 310 in c889587

def asymptotic_critical_value(confidence_interval_kind: str, confidence_level: float):
    """Return the critical value for the confidence interval.

    Args:
        confidence_interval_kind (str): confidence interval kind, either 'lower', 'upper' or
            'central'
        confidence_level (float): confidence level

    Returns:
        float: critical value

    Caution:
        The critical value is calculated from chi2 distribution with 1 degree of freedom.
    """
    if confidence_interval_kind in {"lower", "upper"}:
        critical_value = chi2(1).isf(2 * (1.0 - confidence_level))
    elif confidence_interval_kind == "central":
        critical_value = chi2(1).isf(1.0 - confidence_level)
    else:
        raise ValueError(
            f"confidence_interval_kind must be either 'lower', 'upper' or 'central', "
            f"not {confidence_interval_kind}"
        )
    return critical_value
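A hypothetical generalization of the function above with a degree_of_freedom argument; the default of 1 preserves the current behavior, and this is not alea's final signature:

```python
from scipy.stats import chi2

def asymptotic_critical_value(confidence_interval_kind, confidence_level,
                              degree_of_freedom=1):
    """Critical value from a chi2 with a configurable degree of freedom."""
    if confidence_interval_kind in {"lower", "upper"}:
        return chi2(degree_of_freedom).isf(2 * (1.0 - confidence_level))
    elif confidence_interval_kind == "central":
        return chi2(degree_of_freedom).isf(1.0 - confidence_level)
    raise ValueError(
        f"confidence_interval_kind must be either 'lower', 'upper' or "
        f"'central', not {confidence_interval_kind}")
```

The extra argument could then be plumbed through from the configuration file wherever the kind and level are already read.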

Define a one-dimensional (spectral fit) statistical model suitable for low/high energy

Several searches use 1D spectral fits, binned or unbinned.
They use sometimes functional forms for background/signals, and sometimes templates.
It would be great to make a general class for these searches!

Such a class should:

  • Allow the definition of binned or unbinned likelihoods
  • Allow the definition of sources as:
    • Functional forms (in particular ease of putting in flat+gaussians)
    • Templates (using inference interface)
    • With rate uncertainties and ancillary constraints as in the blueice class
    • With or without applying an efficiency curve + convolution either in true energy (standard convolution) or in reconstructed energy (as in the lowER search)
  • Allow combined fits of several 1D fits
  • Allow useful goodness of fit tests?

This class could be defined inside alea, or perhaps in another package depending on alea?
The most important point will be that it is of the same form as the other statistical model classes, allowing you to move between simple fitting and full Neyman constructions relatively easily.

Improve naming of toydata_mode

Right now we have the following options for toydata_mode:

  • write: generate toys and write them to toyfile
  • read: read toys from toyfile
  • none: generate toys but don't write them to file
  • pass: neither generate nor read toys. This is to be used with actual data.

I think the names could be a bit more descriptive and I'd suggest:

  • write → generate_and_write
  • read → read
  • none → generate (would match the naming of generate_args)
  • pass → no_toydata

Of course, one would then properly deprecate the old names.
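A sketch of a deprecation shim for the renaming (function and mapping names are illustrative):

```python
import warnings

_DEPRECATED_MODES = {
    "write": "generate_and_write",
    "none": "generate",
    "pass": "no_toydata",
}

def normalize_toydata_mode(mode):
    """Map deprecated toydata_mode names to the new ones with a warning."""
    if mode in _DEPRECATED_MODES:
        new = _DEPRECATED_MODES[mode]
        warnings.warn(f"toydata_mode={mode!r} is deprecated, use {new!r}",
                      DeprecationWarning)
        return new
    return mode
```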

Allow arbitrary computation names

It can be useful to just add some throw-away computations in addition to the main named ones ("discovery_power", "threshold", "sensitivity") currently hard-coded in choices=[...]. Opinions? It should just be a question of throwing a warning rather than an error if another name is used, as long as it exists in the run config file.

Clean up toydata mode "pass"

It was discussed that it is a bit misleading to call a function toy_simulation when no toys are involved.
We should move the computing part out (call it run_inference() or something like this?), which loops through extra_args, runs the fits and, if specified, the intervals. This could then be called directly if you want to use it with data instead of toys (toydata_mode="pass"), and toy_simulation can call it after deciding whether to generate or read toy data.

Example Notebooks

Once things settle we should add nice example notebooks and initialize binder to make it easy for people to get started! 😊

Parameter and ParameterList class

Introduce classes that handle all parameters of the likelihood.
The Parameter class could have the following attributes:

  • name (str)
  • nominal_value (float)
  • fittable (boolean)
  • type (str, e.g. shape, rate, livetime, etc.)
  • uncertainty (float or str, if str we could allow eval of np and scipy functions to facilitate the definition of non-gaussian constraints)
  • relative_uncertainty (bool)
  • ...

Of course, not all of them have to be set, only the name will be mandatory. But some functionality is only accessible if certain values are set.

We will add a ParameterList class which will collect all parameters of a model and can have convenient getters like

  • model.parameter_list.names → returns a list of all parameters (probably also something like model.parameter_list.names(fittable=True) to return all fittable parameters)
  • Something like model.parameter_list(mu=5) should return a dict of all the defaults but mu = 5
  • model.parameter_list.sigma.nominal_value = 2 to set the nominal value etc.
  • model.parameter_list.sigma.nominal_value to get the nominal value of sigma etc.

I'm not yet entirely sure how to implement this in the best way, but I'll give it a try, and @dachengx suggested taking Yossi's rframe package as a source of inspiration.
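A sketch of the two classes using dataclasses; the collection API shown is one possible reading of the bullets above, not a settled design:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Parameter:
    name: str                                  # mandatory
    nominal_value: Optional[float] = None
    fittable: bool = True
    type: Optional[str] = None                 # "shape", "rate", "livetime", ...
    uncertainty: Union[float, str, None] = None
    relative_uncertainty: Optional[bool] = None

class ParameterList:
    def __init__(self, *parameters):
        self._params = {p.name: p for p in parameters}
        for p in parameters:
            # Enables attribute access like parameter_list.sigma.nominal_value
            setattr(self, p.name, p)

    @property
    def names(self):
        return list(self._params)

    def __call__(self, **overrides):
        # All defaults with selected overrides, e.g. parameter_list(mu=5).
        values = {n: p.nominal_value for n, p in self._params.items()}
        values.update(overrides)
        return values
```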

Access to templates

In binference we had a method get_pdfs(), which would return a list of templates for given generate values. I think it would be nice to add this functionality to alea. It should be possible to access the templates in the blueice likelihood objects, I think, which would ensure that it has all the slicing, etc. already applied to it.

Split Configs

Split the configs of toyMC running and likelihood definition. This way we can have the same likelihood config for all kinds of different studies and don't get mixed up.

Both configs should then be always copied to the output to retrace steps.

Add documentation based on readthedocs

Examples, tutorials, usage of:

  1. Parameters
  2. BlueiceDataGenerator
  3. StatisticalModel, particularly the flexibility it allows in required attributes and functions, like the type of data
  4. BlueiceExtendedModel

and required format of statistical model and runner configuration:

  1. what keys are needed
  2. rate and shape parameters
  3. parameter of interest

We now have BlueiceExtendedModel, which is designed based on blueice. It will be beneficial to clearly describe the structure of a BlueiceExtendedModel instance, like its attributes and the attributes of its attributes. I believe some of them should be in blueice's documentation, but it is also good to first implement it here.

Make config files more readable

In binference, the definition of a parameter in the config was distributed over the entire file and it was easy to lose the overview.
I'd suggest using a structure like the following to increase the readability of parameter initialization in alea:

parameter_definition = {
    wimp_mass: {
        nominal_value: 50,
        fixed: True,  # or use floating instead?
    },
    livetime_0: {
        nominal_value: 1.,
        fixed: True,
    },
    livetime_1: {
        nominal_value: 1.,
        fixed: True,
    },
    signal_rate_multiplier: {
        nominal_value: 1.,
        type: "rate",  # or fix all of this internally in the likelihood i.e. inspect from parameter name what the parameter does.
        fixed: False,
    },
    er_rate_multiplier: {
        nominal_value: 1.,
        type: "rate",
        uncertainty: .2,
        relative_uncertainty: True,
        fixed: False,
    },
    par1: {
        nominal_value: 0,
        type: "shape",
        uncertainty: $some_expression_to_define_a_constraint,  # we could allow any scipy and np expression e.g.
        relative_uncertainty: False,
        fixed: False,
    },
}

Results collation and processing

Once a set of toyMC runs have been stored, a script or class should be defined to take these (possibly incomplete) outputs, as well as the running script defining the desired computations and store several condensed outputs when applicable:

  • Neyman thresholds for different signal hypotheses, stored in a common format.
  • Sensitivity
  • Collated toyMC results (if requested)
  • Sensitivity (bands)
  • Discovery test statistic null distribution for discovery significance
  • Discovery power curves
