nuclear-verification-and-disarmament / bicyclus Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 1.0 866 KB

Bicyclus, a Bayesian Inference module for Cyclus

License: BSD 3-Clause "New" or "Revised" License

Python 99.73% Shell 0.27%

bayesian-inference cyclus nuclear-fuel-cycle pymc

bicyclus's People

Contributors

Stargazers

Watchers

Forkers

maxschalz

bicyclus's Issues

Possible future features

Add maximum number of iterations to Slice sampler.
I think this is a good idea, however what should happen if that limit gets exceeded?
Simply raise an error and kill the process or restart with another initial value?
This question is irrelevant inasmuch as the maximum number of iterations of Slice is not part of Bicyclus.
This must be implemented in the user's driver script.
Improve class structure somehow?
Copy useful functions from Pakistan/Elbonia case study into bicyclus?
Add examples and minimum working example. Remember to take into account #3
Should we redesign the logging function? AFAIK, it currently opens a file stream but does not close it properly. See write_to_log_file(...) in log.py. One simple way to do this could be to use a with open() context manager in log_print?
Allow to use either PyMC4 or PyMC5? Related to #27

Issues caused through PyMC

In this issue, I post problems, bugs etc. that are caused by PyMC as such and not by Bicyclus. To investigate these problems, I need help from outside or get into touch with the PyMC community.

Disclaimer PyMC version: I think these issues popped up PyMC v4.x, but I am unsure. I could check if they persist in PyMC5.
pm.iter_sample: How to correctly select initvals?
pm.sample: Why does callback receive a NoneType instead of a trace?

Keep CyclusModel and UniformLikelihood and NormalLikelihood classes?

Found in bicyclus/blackbox/blackbox.py and bicyclus/blackbox/likelihood.py, respectively.

Use updated likelihood calcuations, e.g., in examples

Do not pymc's eval function but rather calculate the log likelihood of, e.g., a Normal distribution either by hand or using scipy, both of which are much faster because PyMC does some lengthy stuff in the background.

`iter_sample` does not work

Using pm.iter_sample in examples/run.py raises the following error:

Traceback (most recent call last):
  File "/home/bicyclus/examples/run.py", line 316, in <module>
    main()
  File "/home/bicyclus/examples/run.py", line 312, in main
    sample(args, pymc_model, initvals)
  File "/home/bicyclus/examples/run.py", line 290, in sample
    for trace in sampler:
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 981, in iter_sample
    for i, (strace, _) in enumerate(sampling):
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py", line 1062, in _iter_sample
    point, stats = step.step(point)
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in step
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
  File "/home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/step_methods/arraystep.py", line 155, in <dictcomp>
    apoint = DictToArrayBijection.map({v.name: point[v.name] for v in self.vars})
KeyError: 'feed_assay_interval__'

I am currently investigating the error. Both pm.sample and pm.iter_sample eventually call pymc/sampling.py:1025 (function _iter_sample) to perform the sampling. I suspect that pm.sample does something to the initial values that transforms them into the correct format, compare the following pdb output:

Using pm.sample:

(Pdb) w
  /home/bicyclus/toymodel/toymodel.py(452)<module>()
-> main()
  /home/bicyclus/toymodel/toymodel.py(447)main()
-> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
  /home/bicyclus/toymodel/toymodel.py(291)run_simulations()
-> trace = pm.sample(
  /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(655)sample()
-> mtrace = _sample_many(**sample_args)
  /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(769)_sample_many()
-> trace = _sample(
  /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(915)_sample()
-> for it, (strace, diverging) in enumerate(sampling):
  /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/fastprogress/fastprogress.py(41)__iter__()
-> for i,o in enumerate(self.gen):
> /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
-> model = modelcontext(model)
(Pdb) start
{'large_norm_interval__': array(1.44301298), 'small_unif_interval__': array(-0.17056391)}

Using pm.iter_sample:

(Pdb) w
  /home/bicyclus/toymodel/toymodel.py(452)<module>()
-> main()
  /home/bicyclus/toymodel/toymodel.py(447)main()
-> arviz_summary, trace, emu, solver_str = run_simulations(model, **kwargs)
  /home/bicyclus/toymodel/toymodel.py(283)run_simulations()
-> for trace in sampler:
  /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(981)iter_sample()
-> for i, (strace, _) in enumerate(sampling):
> /home/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pymc/sampling.py(1025)_iter_sample()
-> model = modelcontext(model)
(Pdb) start
{'large_norm': 7559.40626711426, 'small_unif': 0.2070833560593321}

Test and fix bicyclus.util.save_trace

Which function is this? I assume this will not work. To be tested and fixed!

bicyclus/bicyclus/util/util.py

Line 68 in 7fc6532

output_path = samples_output_path(i, args.run, default=args.output_path)

Make installable package

~~Add setup.py~~ Used setup.cfg which is preferred for whatever reason.
Add list of dependencies ➡️ see README
🚨 How to deal with PyMC and Theano cherry-picked dependencies? 🚨 See comment below.
According to the setuptools documentation,

When your project is installed (e.g. using pip), all of the dependencies not already installed will be located (via PyPI), downloaded, built (if necessary), and installed [...].

Thus, I cannot simply put PyMC3 and arviz into the requirements list. Maybe I could point pip to the specific cherry-picked versions? I think for now, I will not put it in the requirements list and simply mention it in the README.

visualize/merge.py: Investigate Pandas VisibleDeprecationWarning

See warning below:

python3 ../bicyclus/visualize/merge.py  data/cyclus_trace_BicyclusExample_BicyclusExample_29979821_000

2022-09-23 11:37:03.507647 :: running with Namespace(infiles=['data/cyclus_trace_BicyclusExample_BicyclusExample_29980243_000_0000.cdf'], trace_dir='/max/data/', jobid=-1, json=False, trace_plot_file='plot_merge_trace.{format}', density_plot_file='plot_merge_density.{format}', histogram_plot_file='plot_merge_histogram.{format}', hist_kind='kde', format='png', combined=False, outcdf='merged.cdf', hist_vars=None, dim='chain')
2022-09-23 11:37:03.620978 :: <xarray.Dataset>
Dimensions:     (chain: 4, draw: 200)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
    feed_assay  (chain, draw) float64 ...
Attributes:
    created_at:                 2022-09-23T09:36:00.598989
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.0
    sampling_time:              2255.8700861930847
    tuning_steps:               100
2022-09-23 11:37:03.622291 :: Available posterior distributions: ['feed_assay']
<xarray.Dataset>
Dimensions:     (chain: 4, draw: 200)
Coordinates:
  * chain       (chain) int64 0 1 2 3
  * draw        (draw) int64 0 1 2 3 4 5 6 7 ... 192 193 194 195 196 197 198 199
Data variables:
    feed_assay  (chain, draw) float64 ...
Attributes:
    created_at:                 2022-09-23T09:36:00.598989
    arviz_version:              0.12.1
    inference_library:          pymc
    inference_library_version:  4.2.0
    sampling_time:              2255.8700861930847
    tuning_steps:               100
             mean   sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
feed_assay  0.007  0.0   0.007    0.007        0.0      0.0     705.0     513.0   1.01
2022-09-23 11:37:03.674212 :: Plotting trace
2022-09-23 11:37:04.374128 :: Plotting density
2022-09-23 11:37:04.556923 :: Plotting all histograms in array for job -1
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)
/home/max/anaconda3/envs/pymc-env/lib/python3.10/site-packages/pandas/core/common.py:245: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  result = np.asarray(values, dtype=dtype)

Update likelihood file

Notably:

Make blackbox.LikelihoodFunction an abstract base class

bicyclus/bicyclus/blackbox/likelihood.py

Lines 8 to 13 in 1118977

class LikelihoodFunction:

def __init__(self):

pass

def log_likelihood(self, X):

assert False, "LikelihoodFunction.log_likelihood needs to be overridden"
Write docstrings
Are there unused/unnecessary classes that can be removed (e.g., simplegrad)?

Allow non-uniform distributions in forward mode

Should be relatively easy to implement. Before doing this, I should investigate if this interferes with Sobol's efficient coverage of the parameter space. If so, a warning should be emitted.

Relevant code snippet:

bicyclus/bicyclus/blackbox/blackbox.py

Lines 160 to 166 in 1118977

 for param in parameters.values(): 

 if param["type"] != "Uniform": 

 msg = ( 

 "At the moment, parameter distributions must be Uniform" 

 " distributions." 

 ) 

 raise ValueError(msg)

Upgrade from PyMC3 to PyMC

Add deterministic random start value generator

This would be part of the util subpackage and could look something like this:

def generate_start_value(sample_parameters, n=1):
    """Generate random start values. Ensure that the random generator has been deterministically seeded!"""
    def genstart():
        d = {}
        for (k, v) in sample_parameters.items():
            if type(v) is list:
                d[k] = np.random.random() * (v[1] - v[0]) + v[0]
            elif type(v) is dict:
                if v['type'] == 'Normal':
                    d[k] = (np.random.randn()*v['sigma']) + v['mu']
                elif v['type'] == 'TruncatedNormal':
                    d[k] = pm.TruncatedNormal.dist(**{p: pv for (p, pv) in v.items() if p != 'type'}).random()
                else:
                    assert False, f'Unknown parameter type {k} => {v}'
            else:
                assert False, f'Unknown parameter type {k} => {v}'
        return d

    if n == 1:
        return genstart()
    return [genstart() for i in range(0, n)]

Taken from Lewin Bormann's Bayesian Cycle. Note that pm.XY.dist.random() is PyMC3 and must be replaced by pm.draw(pm.XY.dist(...)), see PyMC docs.

If this is added, it is crucial that reproducibility is tested! AFAIK there's the possibility to pass a seed keyword to pm.draw. I should check if this is needed, or if I can fix the seeds before.

Unnecessary use of deepcopy in example?

Is the deepcopy line needed?

bicyclus/examples/run.py

Lines 61 to 65 in 8317d3e

 self.mut_model = deepcopy(self.model) 

 self.mut_model["simulation"]["recipe"][0]["nuclide"][0].update( 

 comp=parameters["feed_assay"]) # U235 

 self.mut_model["simulation"]["recipe"][0]["nuclide"][1].update( 

 comp=1.-parameters["feed_assay"]) # U238

In an actual use case, I would probably outsource all of this to an external file writer. Is there a possibility to showcase this in the example without making it unnecessary complex? Would there be an advantage?

Upgrade to PyMC5

Change from Aesara to Pytensor
Run checks

	class LikelihoodFunction:
	def __init__(self):
	pass

	def log_likelihood(self, X):
	assert False, "LikelihoodFunction.log_likelihood needs to be overridden"

	for param in parameters.values():
	if param["type"] != "Uniform":
	msg = (
	"At the moment, parameter distributions must be Uniform"
	" distributions."
	)
	raise ValueError(msg)

	self.mut_model = deepcopy(self.model)
	self.mut_model["simulation"]["recipe"][0]["nuclide"][0].update(
	comp=parameters["feed_assay"]) # U235
	self.mut_model["simulation"]["recipe"][0]["nuclide"][1].update(
	comp=1.-parameters["feed_assay"]) # U238