xenonnt / alea
A tool to perform toyMC-based inference constructions
Home Page: https://alea.readthedocs.io/en/latest
License: BSD 3-Clause "New" or "Revised" License
Currently, the code implemented in `Parameters.__call__` contains two for-loops. Time this and try to find a more optimal (and still human-readable) alternative.
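As a starting point, here is a minimal timing harness; the loop bodies are hypothetical stand-ins, since the actual `Parameters.__call__` internals are not shown in this issue. It compares a two-loop merge of defaults and overrides against a single dict-merge expression:

```python
import timeit

# Hypothetical stand-in data; the real parameter handling may differ.
params = {f"p{i}": float(i) for i in range(100)}
overrides = {"p3": 30.0, "p7": 70.0}

def two_loops():
    # stand-in for the current version: one loop over defaults,
    # then one loop over overrides
    out = {}
    for name, value in params.items():
        out[name] = value
    for name, value in overrides.items():
        out[name] = value
    return out

def merged():
    # single-expression alternative using dict merging
    return {**params, **overrides}

assert two_loops() == merged()
t_loops = timeit.timeit(two_loops, number=10_000)
t_merge = timeit.timeit(merged, number=10_000)
```

Both versions are O(n), so any gain is constant-factor only; the harness mainly shows how to compare candidates fairly.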
It would be really nice for future run combinations etc. if we could build a statistical model from several configs, each coding for one likelihood term! So you would have one file each for SR0, SR1, etc. (once you fix each of them).
Some parameters, like the WIMP mass, are typically fixed in our analysis. However, it is currently possible to change the nominal value of the `wimp_mass` parameter, or to fix it to a different value in the fit, which just doesn't do anything. I think we should either enable this or raise an error when the user tries to modify such a parameter value.
Probably the easiest is to introduce a property `static` of a parameter that is true if the parameter may not be modified.
I guess in case we want to allow this, we would need to reload the likelihood, so you might as well just make a new instance of the model with the new value.
When @hammannr presented alea in Paris, we saw an example where a known bad fit just returned nominal values. Inspecting the Minuit output showed the expected fit-failure error. Should we store a fit flag in the toyMC output?
Lines 42 to 45 in e4b29cd
We should clarify in the documentation how to implement binned and unbinned templates.
Assign the `extra_dont_hash_settings` based on `named_parameters`. This would probably cover 99% of the cases and saves some space in the config.
Lines 268 to 273 in 6e7efd4
This is causing errors at https://github.com/XENONnT/alea/actions/runs/5710944865/job/15471836050?pr=69
A `Minuit` instance is initialized every time we call `StatisticalModel.fit`; in some cases, when there are a lot of hypotheses, this makes the computation inefficient.
alea/alea/statistical_model.py
Lines 243 to 245 in 80f6b16
I'd suggest introducing a few more descriptive names for some of the important args:
- `parameter_of_interest` (str) as a parameter necessary for any construction (this would e.g. be `'signal_rate_multiplier'` for the standard nT analysis and `'mu'` for the Gaussian model)
- `extra_args` → use `hypotheses` in the fits (this just makes things clearer imo)
Right now, this is not implemented, but all the parts are there.
We want to be able to define a range of toyMC runs (with different true generate_args, different signals tested, etc.) with a steering script, saying for example "these wimp masses and true signal expectations should be looped over", with N toyMCs per run and M parallel runs for each.
It should involve:
- `toy_simulation`
- fitting of `StatisticalModel`
- `inference_interface`: `run_toy_fit(generate_parameters, fit_parameters, , data)`, which calls `toy_simulation`
The option already exists in the `BlueiceExtendedModel` (at least in the example config), but it should be checked thoroughly that it is applied correctly and works as expected.
Science is based on reproducibility. We should set the seed for each iteration of toy data generation, and perhaps store the seeds in the toydata file or in some other form.
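A minimal sketch of what this could look like (not alea's actual implementation): derive one integer seed per toy from a base seed, so each toy dataset can be regenerated exactly if the per-toy seed is stored with the toydata.

```python
import numpy as np

# Derive one deterministic integer seed per toy from a single base seed.
base_seed = 42
n_toys = 3
toy_seeds = np.random.SeedSequence(base_seed).generate_state(n_toys)

# Each toy gets its own generator, so toys are independent but reproducible.
toy_data = [np.random.default_rng(int(s)).normal(size=5) for s in toy_seeds]

# Replaying a stored seed reproduces the corresponding toy exactly.
replay = np.random.default_rng(int(toy_seeds[0])).normal(size=5)
```

Storing `base_seed` plus the toy index (or the per-toy seed itself) in the toydata file would then be enough to regenerate any toy.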
We could infer the `analysis_space` from the templates instead of also defining it explicitly in the config. We would then only need to assert that all sources have the same analysis space.
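A sketch of the consistency check this would need (hypothetical helper and data layout, not alea's API): adopt the analysis space of the first template and assert that all other sources match it.

```python
# Adopt the first template's analysis space; reject inconsistent sources.
def infer_analysis_space(template_spaces):
    first = template_spaces[0]
    for space in template_spaces[1:]:
        if space != first:
            raise ValueError("sources have inconsistent analysis spaces")
    return first

# Illustrative (dimension name, range) pairs, e.g. cS1 and cS2.
space = infer_analysis_space([
    [("cs1", (0, 100)), ("cs2", (100, 10000))],
    [("cs1", (0, 100)), ("cs2", (100, 10000))],
])
```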
The XENON inference is often run on the Midway computing cluster.
The software should be made to:
We could implement a data generator, equivalent to the 'science data generator' in `BlueiceDataGenerator`, for the `CustomAncillaryLikelihood` class. Its `.simulate()` method should essentially do what we currently do in `._generate_ancillary_measurements()`.
This way we can unify the definition and simplify the data generation by just calling `[gen.simulate(**generate_values) for gen in self.data_generators]` for all likelihood terms.
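To illustrate the unified interface, here is a hypothetical stand-in generator; the real `CustomAncillaryLikelihood` and `BlueiceDataGenerator` APIs may differ, but any object exposing `.simulate(**generate_values)` would fit the one-loop pattern from the issue.

```python
import numpy as np

class AncillaryGenerator:
    """Hypothetical stand-in for the proposed ancillary data generator."""

    def __init__(self, name, seed=0):
        self.name = name
        self.rng = np.random.default_rng(seed)

    def simulate(self, **generate_values):
        # Draw the ancillary measurement around its generate value.
        center = generate_values.get(self.name, 0.0)
        return {self.name: self.rng.normal(center, 0.1)}

data_generators = [AncillaryGenerator("er_rate"), AncillaryGenerator("livetime")]
generate_values = {"er_rate": 1.0, "livetime": 2.0}

# One unified loop over all likelihood terms, as proposed in the issue:
toy = [gen.simulate(**generate_values) for gen in data_generators]
```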
Describe the bug
If you initialise the Gaussian example likelihood but do not set default values for the parameters, the warnings do not clearly point out the issue.
To Reproduce
Steps to reproduce the behavior:
from alea.examples import gaussian_model
simple_model = gaussian_model.GaussianModel()
simple_model.data = simple_model.generate_data(mu=0,sigma=2)
simple_model.fit()
Fails with
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [14], line 4
2 simple_model = gaussian_model.GaussianModel()
3 simple_model.data = simple_model.generate_data(mu=0,sigma=2)
----> 4 simple_model.fit()
File ~/Desktop/alea_tests/alea/statistical_model.py:243, in StatisticalModel.fit(self, verbose, **kwargs)
241 # Make the Minuit object
242 cost.errordef = Minuit.LIKELIHOOD
--> 243 m = Minuit(
244 MinuitWrap(cost, s_args=self.parameters.names),
245 **defaults)
246 fixed_params = [] if fixed_parameters is None else fixed_parameters
247 fixed_params += self.parameters.not_fittable
File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/minuit.py:625, in Minuit.__init__(***failed resolving arguments***)
617 self._strategy = MnStrategy(1)
618 self._fcn = FCN(
619 fcn,
620 getattr(fcn, "grad", grad),
621 array_call,
622 getattr(fcn, "errordef", 1.0),
623 )
--> 625 self._init_state = _make_init_state(self._pos2var, start, kwds)
626 self._values = mutil.ValueView(self)
627 self._errors = mutil.ErrorView(self)
File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/minuit.py:2483, in _make_init_state(pos2var, args, kwds)
2481 for i, x in enumerate(pos2var):
2482 val = kwds[x] if kwds else args[i]
-> 2483 err = mutil._guess_initial_step(val)
2484 state.add(x, val, err)
2485 return state
File ~/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/iminuit/util.py:1308, in _guess_initial_step(val)
1307 def _guess_initial_step(val: float) -> float:
-> 1308 return 1e-2 * abs(val) if val != 0 else 1e-1
TypeError: bad operand type for abs(): 'NoneType'
Expected behavior
A short error message should inform the user that Minuit needs initial guesses (via nominal values), or that values must be set for all parameters.
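A minimal sketch of the suggested early check (hypothetical helper, not part of alea) that would fail with a clear message before the values ever reach Minuit:

```python
# Fail early with an actionable message instead of Minuit's opaque
# "bad operand type for abs(): 'NoneType'".
def check_fit_defaults(defaults):
    missing = [name for name, value in defaults.items() if value is None]
    if missing:
        raise ValueError(
            "Minuit needs an initial guess for every parameter; "
            f"set nominal values (or pass fit values) for: {missing}"
        )

check_fit_defaults({"mu": 0.0, "sigma": 2.0})  # passes silently
```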
After a discussion with @kdund and @dachengx, we decided to improve the static functionality introduced in #100 in the following way: instead of `static`, we make it a `ptype`, which we'll call `needs_reinit`. For `wimp_mass`, e.g., it would make sense to use this, because for most applications you won't use it as a shape parameter, but then you can't be allowed to change its nominal value without re-initializing the entire model.

For multiple reasons, a fraction of jobs cannot finish successfully. Maybe we should not submit a job when its output is already there.
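A sketch of the resubmission guard (hypothetical helper and file names, not alea's submission script): check whether a job's output file already exists and skip the submission if so.

```python
from pathlib import Path

# Skip a job whose output already exists, so reruns only fill the gaps
# left by failed jobs.
def should_submit(output_path):
    return not Path(output_path).exists()

jobs = ["toymc_run_0.h5", "toymc_run_1.h5"]  # illustrative output names
to_submit = [job for job in jobs if should_submit(job)]
```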
Line 358 in c889587
Prepare some basic templates and an example config file, which can be used for:
This could e.g. be generic NR, ER, and WIMP templates in cS1-cS2 from default NEST.
Initially, I thought we could simply get away with evaluating the str if it starts with numpy. or scipy., but for the variation of the ancillary measurements, we'd need to know which of the parameters corresponds to the measured value.
One option to solve this would be to enable a wildcard `$MEASUREMENT$` in the str (e.g. `uncertainty: "scipy.stats.uniform(loc=-1+$MEASUREMENT$, scale=2)"` or `"scipy.stats.skewnorm(a=4, loc=$MEASUREMENT$, scale=2)"`). But not specifying it might also be ok, e.g. if there are physical bounds and the measurement doesn't affect the constraints; maybe in that case we just want to raise a warning at the beginning or something like this.
But I don't know if this is the most user-friendly approach, so if anyone has a better suggestion just let me know!
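A sketch of how the wildcard substitution could work (hypothetical helper, not alea's actual implementation); the eval is restricted to the `scipy` namespace so only distribution expressions can run:

```python
import scipy.stats

# Substitute the measured value into the constraint string, then evaluate
# it with access to scipy only.
def build_constraint(expr, measurement):
    expr = expr.replace("$MEASUREMENT$", repr(float(measurement)))
    return eval(expr, {"__builtins__": {}}, {"scipy": scipy})

dist = build_constraint("scipy.stats.uniform(loc=-1+$MEASUREMENT$, scale=2)", 0.5)
# dist is a frozen uniform distribution on [-0.5, 1.5]
```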
In binference, I ran into an issue where two servers had (slightly) different versions of the templates.
To enable tracking the exact versions, we could simply store the hashes of all used templates in the toy results.
This is not at all critical, but I think it's a nice-to-have feature.
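A sketch of the hashing (hypothetical helper): store a short content hash per template file in the toy results so a version mismatch between servers becomes detectable.

```python
import hashlib
import tempfile
from pathlib import Path

# Short content hash; identical files hash identically on any server.
def template_hash(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:16]

# Demo on a throw-away file standing in for a template.
with tempfile.NamedTemporaryFile(delete=False, suffix=".h5") as f:
    f.write(b"template contents")
h = template_hash(f.name)
```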
We should re-introduce the option to provide fit bounds and parameter interval bounds in units of expectation values.
After internal discussion, we came to the conclusion that this is probably best done via a boolean `expectation_value_bounds` (preliminary name), which lives in `parameters.py`.
We'll add a function that is called with an expectation value and does the conversion.
For higher-dimensional confidence intervals, the chi2 distribution in this function should not have 1 degree of freedom.
We also need a way to pass the added argument from the configuration file.
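To make the degrees-of-freedom point concrete: for a profile-likelihood interval at a given confidence level, the chi2 threshold grows with the number of parameters of interest.

```python
from scipy.stats import chi2

# 90% CL likelihood-ratio thresholds for 1 and 2 parameters of interest.
crit_1d = chi2.ppf(0.9, df=1)  # ~2.71
crit_2d = chi2.ppf(0.9, df=2)  # ~4.61
```

So hard-coding `df=1` would undercover any 2D confidence region.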
Lines 286 to 310 in c889587
Several searches use 1D spectral fits, binned or unbinned. They sometimes use functional forms for backgrounds/signals, and sometimes templates. It would be great to make a general class for these searches!
Such a class should:
This class could be defined inside alea, or perhaps in another package depending on alea?
The most important point is that it has the same form as the other statistical model classes, allowing you to move between just fitting and full Neyman constructions relatively easily.
All parameters should also be either "settable" (e.g. likelihood) or "fittable" (e.g. signal rate); ideally, both kinds should be settable in toyMC generation and likelihood evaluation.
Right now we have the following options for toydata_mode:
- `write`: generate toys and write them to the toyfile
- `read`: read toys from the toyfile
- `none`: generate toys but don't write them to file
- `pass`: neither generate nor read toys; this is to be used with actual data
I think the names could be a bit more descriptive and I'd suggest:
- `write` → `generate_and_write`
- `read` → `read`
- `none` → `generate` (would match the naming of generate_args)
- `pass` → `no_toydata`
Of course, one would then properly deprecate the old names.
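The deprecation could be sketched as a simple mapping plus a warning (hypothetical helper, not alea's implementation):

```python
import warnings

# Old names mapped onto the proposed new ones; `read` keeps its name.
_RENAMED_MODES = {
    "write": "generate_and_write",
    "none": "generate",
    "pass": "no_toydata",
}

def normalize_toydata_mode(mode):
    # Accept old names for now, but warn so users migrate.
    if mode in _RENAMED_MODES:
        new = _RENAMED_MODES[mode]
        warnings.warn(
            f"toydata_mode={mode!r} is deprecated; use {new!r} instead",
            DeprecationWarning,
        )
        return new
    return mode
```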
Based on ll_nt_from_config, implement the standard nT-likelihood shape.
Line 13 in bcad6fd
It can be useful to just add some throw-away computations in addition to the main named ones ("discovery_power", "threshold", "sensitivity"; opinions?). It should just be a question of throwing a warning rather than an error if another name is used, as long as it exists in the run config file.
It was discussed that it is a bit misleading to call a function toy_simulation when no toys are involved.
We should take the computing part outside (call it run_inference() or something like this?), which loops through extra_args, runs the fits and, if specified, the intervals. This could then be called directly if you want to use it with data instead of toys (toydata_mode="pass"), and toy_simulation can call it after deciding whether to generate or read toy data.
Once things settle, we should add nice example notebooks and initialize Binder to make it easy for people to get started!
Introduce classes that handle all parameters of the likelihood.
The `Parameter` class could have the following attributes:
- `name` (str)
- `nominal_value` (float)
- `fittable` (boolean)
- `type` (str, e.g. shape, rate, livetime, etc.)
- `uncertainty` (float or str; if str, we could allow eval of np and scipy functions to facilitate the definition of non-Gaussian constraints)
- `relative_uncertainty` (bool)
Of course, not all of them have to be set; only the name will be mandatory. But some functionality is only accessible if certain values are set.
We will add a `ParameterList` class, which will collect all parameters of a model and can have convenient getters like:
- `model.parameter_list.names` → returns a list of all parameter names (probably also something like `model.parameter_list.names(fittable=True)` to return all fittable parameters)
- `model.parameter_list(mu=5)` → should return a dict of all the defaults, but with mu = 5
- `model.parameter_list.sigma.nominal_value = 2` to set the nominal value of sigma, etc.
- `model.parameter_list.sigma.nominal_value` to get the nominal value of sigma, etc.
I'm not yet entirely sure how to implement this in the best way, but I'll give it a try, and @dachengx suggested taking Yossi's rframe package as a source of inspiration.
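A minimal sketch of the proposed API; the names follow this issue, not necessarily alea's final implementation, and the `names(fittable=True)` variant is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Parameter:
    name: str
    nominal_value: Optional[float] = None  # only `name` is mandatory
    fittable: bool = True
    type: Optional[str] = None
    uncertainty: Optional[float] = None
    relative_uncertainty: bool = False

class ParameterList:
    def __init__(self, *parameters):
        self._params = {p.name: p for p in parameters}
        for p in parameters:
            # enables parameter_list.sigma.nominal_value access
            setattr(self, p.name, p)

    @property
    def names(self):
        return list(self._params)

    def __call__(self, **overrides):
        # all defaults, overridden by the keyword arguments
        values = {n: p.nominal_value for n, p in self._params.items()}
        values.update(overrides)
        return values

pl = ParameterList(Parameter("mu", nominal_value=0.0),
                   Parameter("sigma", nominal_value=1.0))
pl.sigma.nominal_value = 2
```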
For each toy, make sure to also store the `generate_parameters` together with the toy data.
In binference we had a method `get_pdfs()`, which would return a list of templates for given generate values. I think it would be nice to add this functionality to alea. It should be possible to access the templates in the blueice likelihood objects, I think, which would ensure that all the slicing etc. is already applied.
Split the configs for toyMC running and likelihood definition. This way we can use the same likelihood config for all kinds of different studies without getting mixed up.
Both configs should then always be copied to the output so the steps can be retraced.
Examples, tutorials, and usage of:
- `Parameters`
- `BlueiceDataGenerator`
- `StatisticalModel`, particularly the degree of freedom of required attributes and functions, like the type of `data`
- `BlueiceExtendedModel` and the required format of the statistical model and runner configuration
We now have `BlueiceExtendedModel`, which is designed based on blueice. It will be beneficial to clearly describe the structure of a `BlueiceExtendedModel` instance, like its attributes and the attributes of its attributes. I believe some of them should be in blueice's documentation, but it is also good to first implement this here.
This would be very useful for e.g. implementing efficiency parameters or histogram multipliers: instead of multiple layers of multiplication, they would all be on the surface.
You might want to be able to combine some power computation or similar with computing confidence-interval distributions, but the latter should only be done for the free hypothesis, for example.
Enhancement after alea 1.0.
In binference, the definition of a parameter in the config was distributed over the entire file, and it was easy to lose the overview.
I'd suggest using a structure like the following to increase the readability of parameter initialization in alea:
parameter_definition = {
    "wimp_mass": {
        "nominal_value": 50,
        "fixed": True,  # or use "floating" instead?
    },
    "livetime_0": {
        "nominal_value": 1.,
        "fixed": True,
    },
    "livetime_1": {
        "nominal_value": 1.,
        "fixed": True,
    },
    "signal_rate_multiplier": {
        "nominal_value": 1.,
        "type": "rate",  # or fix all of this internally in the likelihood, i.e. infer from the parameter name what the parameter does
        "fixed": False,
    },
    "er_rate_multiplier": {
        "nominal_value": 1.,
        "type": "rate",
        "uncertainty": .2,
        "relative_uncertainty": True,
        "fixed": False,
    },
    "par1": {
        "nominal_value": 0,
        "type": "shape",
        "uncertainty": $some_expression_to_define_a_constraint,  # we could e.g. allow any scipy and np expression
        "relative_uncertainty": False,
        "fixed": False,
    },
}
Once a set of toyMC runs has been stored, a script or class should be defined that takes these (possibly incomplete) outputs, as well as the running script defining the desired computations, and stores several condensed outputs where applicable: