nuclear-verification-and-disarmament / bicyclus
Bicyclus, a Bayesian Inference module for Cyclus
License: BSD 3-Clause "New" or "Revised" License
Slice sampler.Slice is not part of Bicyclus.
bicyclus?
write_to_log_file(...) in log.py. One simple way to do this could be to use a with open() context manager in log_print?
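A minimal sketch of the with open() idea, not the actual Bicyclus code: the signature of log_print and the log_file argument are assumptions made for illustration. The timestamp prefix mimics the "timestamp :: message" lines that appear in the log output quoted further down.

```python
from datetime import datetime


def log_print(msg, log_file=None):
    """Print a timestamped message and optionally append it to a log file.

    Hypothetical signature; the real function in log.py may differ.
    """
    line = f"{datetime.now()} :: {msg}"
    print(line)
    if log_file is not None:
        # The context manager closes the file again, even if the write fails.
        with open(log_file, "a", encoding="utf-8") as f:
            f.write(line + "\n")
    return line
```

Opening the file in append mode on every call keeps no handle open between messages, at the cost of one open/close per message.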
In this issue, I post problems, bugs etc. that are caused by PyMC as such and not by Bicyclus. To investigate these problems, I need help from outside or need to get in touch with the PyMC community.
pm.iter_sample: How to correctly select initvals?
pm.sample: Why does callback receive a NoneType instead of a trace?
Found in bicyclus/blackbox/blackbox.py and bicyclus/blackbox/likelihood.py, respectively.
Do not use pymc's eval function; instead, calculate the log likelihood of, e.g., a Normal distribution either by hand or using scipy. Both are much faster because PyMC does lengthy work in the background.
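As a sketch of the two alternatives, the Normal log likelihood can be written out directly or taken from scipy.stats.norm.logpdf. The function name normal_logpdf and the numbers are made up for illustration; they are not taken from blackbox.py or likelihood.py.

```python
import math

import numpy as np
from scipy import stats


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x, written out by hand."""
    x = np.asarray(x, dtype=float)
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((x - mu) / sigma) ** 2
    )


# Illustrative values only.
observed = np.array([0.9, 1.1, 1.3])
by_hand = normal_logpdf(observed, mu=1.0, sigma=0.2).sum()
with_scipy = stats.norm.logpdf(observed, loc=1.0, scale=0.2).sum()
```

Both variants work on plain NumPy arrays and never build a PyMC graph.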
Using pm.iter_sample in examples/run.py raises the following error:
Traceback (most recent call last):
  File "/home/bicyclus/examples/run.py", line 316, in <module>
    main()
  File "/home/bicyclus/examples/run.py", line 312, in main
    sample(args, pymc_model, initvals)
  File "/home/bicyclus/examples/run.py", line 290, in sample
    for trace in sampler:
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 981, in iter_sample
    for i, (strace, _) in enumerate(sampling):
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 1062, in _iter_sample
    point, stats = step.step(point)
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in step
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in <dictcomp>
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
KeyError: 'feed_assay_interval__'
I am currently investigating the error. Both pm.sample and pm.iter_sample eventually call pymc/sampling.py:1025 (function _iter_sample) to perform the sampling. I suspect that pm.sample does something to the initial values that transforms them into the correct format; compare the following pdb output:
pm.sample:
(Pdb) w
/home/bicyclus/toymodel/toymodel.py(452)<module>()
-> main()
/home/bicyclus/toymodel/toymodel.py(447)main()
-> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
/home/bicyclus/toymodel/toymodel.py(291)run_simulations()
-> trace = pm.sample(
/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(655)sample()
-> mtrace = _sample_many(**sample_args)
/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(769)_sample_many()
-> trace = _sample(
/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(915)_sample()
-> for it, (strace, diverging) in enumerate(sampling):
/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/fastprogress/fastprogress.py(41)__iter__()
-> for i,o in enumerate(self.gen):
> /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
-> model = modelcontext(model)
(Pdb) start
{'large_norm_interval__': array(1.44301298), 'small_unif_interval__': array(-0.17056391)}
pm.iter_sample:
(Pdb) w
/home/bicyclus/toymodel/toymodel.py(452)<module>()
-> main()
/home/bicyclus/toymodel/toymodel.py(447)main()
-> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
/home/bicyclus/toymodel/toymodel.py(283)run_simulations()
-> for trace in sampler:
/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(981)iter_sample()
-> for i, (strace, _) in enumerate(sampling):
> /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
-> model = modelcontext(model)
(Pdb) start
{'large_norm': 7559.40626711426, 'small_unif': 0.2070833560593321}
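The two start dictionaries differ in both key names (the _interval__ suffix) and value scale, which is consistent with a transform to an unconstrained space. The following sketch illustrates such a mapping under loud assumptions: it assumes each variable is bounded on both sides, that the suffix denotes the log-odds mapping log((x - lower) / (upper - x)), and the bounds used are invented. The helper names are hypothetical and not part of PyMC or Bicyclus, and whether passing such a dictionary to pm.iter_sample resolves the KeyError is untested.

```python
import numpy as np


def to_interval_space(point, bounds):
    """Map constrained start values to unconstrained ones (hypothetical helper).

    point:  {name: value} on the original, bounded scale
    bounds: {name: (lower, upper)}
    """
    transformed = {}
    for name, value in point.items():
        lower, upper = bounds[name]
        if not lower < value < upper:
            raise ValueError(f"{name}={value} lies outside ({lower}, {upper})")
        # Log-odds of the position of `value` inside the interval.
        transformed[f"{name}_interval__"] = np.array(
            np.log(value - lower) - np.log(upper - value)
        )
    return transformed


def from_interval_space(value, lower, upper):
    """Inverse mapping, used here only to check the round trip."""
    return lower + (upper - lower) / (1.0 + np.exp(-value))


# Start values as printed by pdb for pm.iter_sample; the bounds are assumptions.
start = {"large_norm": 7559.40626711426, "small_unif": 0.2070833560593321}
bounds = {"large_norm": (0.0, 20000.0), "small_unif": (0.0, 1.0)}
transformed_start = to_interval_space(start, bounds)
```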
Which function is this? I assume this will not work. To be tested and fixed!
bicyclus/bicyclus/util/util.py
Line 68 in 7fc6532
Add setup.py: used setup.cfg, which is preferred for whatever reason.
Add list of dependencies ➡️ see README
How to deal with PyMC and Theano cherry-picked dependencies? See comment below.
According to the setuptools documentation,
When your project is installed (e.g. using pip), all of the dependencies not already installed will be located (via PyPI), downloaded, built (if necessary), and installed [...].
Thus, I cannot simply put PyMC3 and arviz into the requirements list. Maybe I could point pip to the specific cherry-picked versions? I think for now, I will not put them in the requirements list and simply mention them in the README.
See warning below:
python3 ../bicyclus/visualize/merge.py data/cyclus_trace_BicyclusExample_BicyclusExample_29979821_000
2022-09-23 11:37:03.507647 :: running with Namespace(infiles=['data/cyclus_trace_BicyclusExample_BicyclusExample_29980243_000_0000.cdf'], trace_dir='/max/data/', jobid=-1, json=False, trace_plot_file='plot_merge_trace.{format}', density_plot_file='plot_merge_density.{format}', histogram_plot_file='plot_merge_histogram.{format}', hist_kind='kde', format='png', combined=False, outcdf='merged.cdf', hist_vars=None, dim='chain')
2022-09-23 11:37:03.620978 :: <xarray.Dataset>
Dimensions: (chain: 4, draw: 200)
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
feed_assay (chain, draw) float64 ...
Attributes:
created_at: 2022-09-23T09:36:00.598989
arviz_version: 0.12.1
inference_library: pymc
inference_library_version: 4.2.0
sampling_time: 2255.8700861930847
tuning_steps: 100
2022-09-23 11:37:03.622291 :: Available posterior distributions: ['feed_assay']
<xarray.Dataset>
Dimensions: (chain: 4, draw: 200)
Coordinates:
* chain (chain) int64 0 1 2 3
* draw (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
feed_assay (chain, draw) float64 ...
Attributes:
created_at: 2022-09-23T09:36:00.598989
arviz_version: 0.12.1
inference_library: pymc
inference_library_version: 4.2.0
sampling_time: 2255.8700861930847
tuning_steps: 100
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
feed_assay 0.007 0.0 0.007 0.007 0.0 0.0 705.0 513.0 1.01
2022-09-23 11:37:03.674212 :: Plotting trace
2022-09-23 11:37:04.374128 :: Plotting density
2022-09-23 11:37:04.556923 :: Plotting all histograms in array for job -1
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
result = np.asarray(values, dtype=dtype)
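For orientation, the condition named in the warning text can be shown in isolation. This is only an illustration of what "ragged nested sequences" means; the data are made up, and the sketch does not identify which call in visualize/merge.py hands such data to pandas.

```python
import numpy as np

# Rows of different lengths: a "ragged nested sequence" in the warning's words.
ragged = [[0.1, 0.2, 0.3], [0.4, 0.5]]

# Requesting dtype=object explicitly, as the warning message suggests,
# stores each row as one Python object instead of building a 2-D array.
arr = np.asarray(ragged, dtype=object)
print(arr.shape, arr.dtype)
```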
Notably:
blackbox.LikelihoodFunction an abstract base class
bicyclus/bicyclus/blackbox/likelihood.py
Lines 8 to 13 in 1118977
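If this item is about turning LikelihoodFunction into an abstract base class, a sketch with Python's abc module could look as follows. The method name log_likelihood and the NormalLikelihood subclass are hypothetical and not taken from likelihood.py.

```python
import math
from abc import ABC, abstractmethod


class LikelihoodFunction(ABC):
    """Interface that every likelihood passed to the blackbox must implement."""

    @abstractmethod
    def log_likelihood(self, output):
        """Return the log likelihood of a simulation output (hypothetical name)."""


class NormalLikelihood(LikelihoodFunction):
    """Example subclass: a single Normal-distributed observable."""

    def __init__(self, observed, sigma):
        self.observed = observed
        self.sigma = sigma

    def log_likelihood(self, output):
        z = (output - self.observed) / self.sigma
        return -0.5 * math.log(2.0 * math.pi) - math.log(self.sigma) - 0.5 * z**2
```

With this layout, instantiating LikelihoodFunction directly, or a subclass that forgets log_likelihood, raises a TypeError at construction time.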
simplegrad)?
Should be relatively easy to implement. Before doing this, I should investigate if this interferes with Sobol's efficient coverage of the parameter space. If so, a warning should be emitted.
Relevant code snippet:
bicyclus/bicyclus/blackbox/blackbox.py
Lines 160 to 166 in 1118977
setup.cfg, README.
examples/
toymodel/
blackbox/
util/ except for above-mentioned function!
cyclus_db/
visualize/
bicyclus.util.save_trace, taking into account #9.
This would be part of the util subpackage and could look something like this:
import numpy as np
import pymc3 as pm  # the .random() call below is PyMC3 API


def generate_start_value(sample_parameters, n=1):
    """Generate random start values.

    Ensure that the random generator has been deterministically seeded!
    """
    def genstart():
        d = {}
        for (k, v) in sample_parameters.items():
            if isinstance(v, list):
                # [lower, upper]: draw uniformly from the interval
                d[k] = np.random.random() * (v[1] - v[0]) + v[0]
            elif isinstance(v, dict):
                if v['type'] == 'Normal':
                    d[k] = (np.random.randn() * v['sigma']) + v['mu']
                elif v['type'] == 'TruncatedNormal':
                    # Pass every entry except 'type' on to the distribution
                    kwargs = {p: pv for (p, pv) in v.items() if p != 'type'}
                    d[k] = pm.TruncatedNormal.dist(**kwargs).random()
                else:
                    raise ValueError(f'Unknown parameter type {k} => {v}')
            else:
                raise ValueError(f'Unknown parameter type {k} => {v}')
        return d

    if n == 1:
        return genstart()
    return [genstart() for _ in range(n)]
Taken from Lewin Bormann's Bayesian Cycle. Note that pm.XY.dist(...).random() is PyMC3 and must be replaced by pm.draw(pm.XY.dist(...)), see PyMC docs.
If this is added, it is crucial that reproducibility is tested! AFAIK there's the possibility to pass a seed keyword to pm.draw. I should check if this is needed, or if I can fix the seeds before.
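One way to make the reproducibility requirement testable is to pass the seed explicitly instead of relying on global state. The sketch below is a swapped-in variant, not the function above: it uses a NumPy Generator and scipy.stats.truncnorm in place of PyMC, and it assumes that a TruncatedNormal entry carries the keys mu, sigma, lower and upper. The parameter names in the usage part are invented.

```python
import numpy as np
from scipy import stats


def generate_start_values(sample_parameters, n=1, seed=None):
    """Draw n dictionaries of start values from an explicitly seeded generator."""
    rng = np.random.default_rng(seed)

    def genstart():
        start = {}
        for name, spec in sample_parameters.items():
            if isinstance(spec, (list, tuple)):
                lower, upper = spec
                start[name] = float(rng.uniform(lower, upper))
            elif isinstance(spec, dict) and spec.get("type") == "Normal":
                start[name] = float(rng.normal(spec["mu"], spec["sigma"]))
            elif isinstance(spec, dict) and spec.get("type") == "TruncatedNormal":
                mu, sigma = spec["mu"], spec["sigma"]
                # scipy expects the bounds in units of sigma around mu.
                a = (spec["lower"] - mu) / sigma
                b = (spec["upper"] - mu) / sigma
                start[name] = float(
                    stats.truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)
                )
            else:
                raise ValueError(f"Unknown parameter type {name} => {spec}")
        return start

    if n == 1:
        return genstart()
    return [genstart() for _ in range(n)]


# Hypothetical parameter specification, for illustration only.
params = {
    "uniform_param": [0.0, 1.0],
    "normal_param": {"type": "Normal", "mu": 7500.0, "sigma": 500.0},
    "truncated_param": {
        "type": "TruncatedNormal", "mu": 0.5, "sigma": 0.2, "lower": 0.0, "upper": 1.0,
    },
}
first = generate_start_values(params, n=2, seed=42)
second = generate_start_values(params, n=2, seed=42)
```

A reproducibility test then reduces to checking that two calls with the same seed return identical values.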
Is the deepcopy line needed?
Lines 61 to 65 in 8317d3e
In an actual use case, I would probably outsource all of this to an external file writer. Is there a possibility to showcase this in the example without making it unnecessarily complex? Would there be an advantage?