
pyPESTO - Parameter EStimation TOolbox for python

pyPESTO logo

pyPESTO is a widely applicable and highly customizable toolbox for parameter estimation.


Feature overview

pyPESTO features include:

  • Multi-start local optimization
  • Profile computation
  • Result visualization
  • Interface to AMICI for efficient simulation and sensitivity analysis of ordinary differential equation (ODE) models (example)
  • Parameter estimation pipeline for systems biology problems specified in SBML and PEtab (example)
  • Parameter estimation with relative (scaled and offset) data as described in Schmiester et al. (2020). (example)
  • Parameter estimation with ordinal data as described in Schmiester et al. (2020) and Schmiester et al. (2021). (example)
  • Parameter estimation with censored data. (example)
  • Parameter estimation with nonlinear-monotone data. (example)

Quick install

The simplest way to install pyPESTO is via pip:

pip3 install pypesto

More information is available here: https://pypesto.readthedocs.io/en/latest/install.html

Documentation

The documentation is hosted on readthedocs.io: https://pypesto.readthedocs.io

Examples

Multiple use cases are discussed in the documentation. In particular, there are Jupyter notebooks in the doc/example directory.
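To convey the multi-start idea that pyPESTO automates, here is a minimal, illustrative sketch using only scipy (deliberately not pyPESTO's own API): draw random startpoints within box bounds, run a local optimizer from each, and sort the runs by final objective value, as in a waterfall plot.

```python
import numpy as np
from scipy.optimize import minimize, rosen

rng = np.random.default_rng(0)
lb, ub, dim, n_starts = -5.0, 5.0, 2, 20

# draw uniform startpoints and run a local optimizer from each
starts = rng.uniform(lb, ub, size=(n_starts, dim))
runs = [minimize(rosen, x0, method="L-BFGS-B", bounds=[(lb, ub)] * dim)
        for x0 in starts]

# sort runs by final objective value, best first (waterfall ordering)
runs.sort(key=lambda r: r.fun)
best = runs[0]  # the Rosenbrock optimum is x = (1, 1) with f = 0
```

With pyPESTO itself, the equivalent is handled by the Problem, Objective, and optimize module described in the documentation linked above.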

Contributing

We welcome any contributions. For more information on how to contribute to pyPESTO, check out https://pypesto.readthedocs.io/en/latest/contribute.html

How to Cite

A citeable DOI is available for the latest pyPESTO release.

When using pyPESTO in your project, please cite

  • Schälte, Y., Fröhlich, F., Jost, P. J., Vanhoefer, J., Pathirana, D., Stapor, P., Lakrisenko, P., Wang, D., Raimúndez, E., Merkt, S., Schmiester, L., Städter, P., Grein, S., Dudkin, E., Doresic, D., Weindl, D., & Hasenauer, J. (2023). pyPESTO: A modular and scalable tool for parameter estimation for dynamic models. Bioinformatics, btad711. doi:10.1093/bioinformatics/btad711

When presenting work that employs pyPESTO, feel free to use one of the icons in doc/logo/:

pyPESTO logo

There is a list of publications using pyPESTO. If you used pyPESTO in your work, we are happy to include your project; please let us know via a GitHub issue.

References

pyPESTO supersedes PESTO, a parameter estimation toolbox for MATLAB whose development has been discontinued.

pypesto's People

Contributors

arrjon, atheorell, c-peiter, dantongwang, dilpath, doresic, dweindl, erikadudki, ffroehlich, giacomofabrini, jvanhoefer, kristianmeyerr, lcontento, leaseep, leonardschmiester, m-philipps, merktsimon, pauljonasjost, paulstapor, philippstaedter, plakrisenko, shoepfl, sleepy-owl, stephanmg, vwiela, yannikschaelte


pypesto's Issues

Parameter plots y axis

When there are many parameters, the y axis of the parameter plot becomes hard to read. I suggest allowing the user to pass par_labels=None|'automatic' in the parameters plot arguments: when 'automatic' is passed, we simply use matplotlib's default axis labeling, and otherwise use result.problem.x_labels (or an explicit list such as par_labels=['a', 'b', 'c']).

Or some other implementation.
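One possible shape of that logic, as a sketch; the helper name resolve_par_labels and the 'automatic' convention come from this proposal and are not existing pyPESTO API:

```python
def resolve_par_labels(par_labels, x_labels, n_par):
    """Hypothetical helper: decide the y-axis labels for the parameter plot."""
    if par_labels == "automatic":
        # None tells matplotlib to keep its default numeric axis labeling
        return None
    if isinstance(par_labels, (list, tuple)):
        # an explicit list wins over everything else
        return list(par_labels)
    # otherwise fall back to the problem's x_labels, or generic names
    return x_labels if x_labels is not None else [f"x_{i}" for i in range(n_par)]
```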


Reduce test_sbml_conversion

test_sbml_conversion takes a long time. It may be better to test all scipy and dlib optimizers on a toy function like Rosenbrock, and only test one or two function- and residual-based optimizers with the SBML model.

Approximate grad and hess

Approximate the gradient using finite differences, as in the MATLAB version in https://github.com/ICB-DCM/PESTO/blob/master/private/getFiniteDifferences.m,

and approximate the Hessian using any of BFGS, DFP, SR1, as in https://github.com/ICB-DCM/NOODLES/blob/feature_arc/%2Bnoodles/NoodleProblem.m.

The natural location for this is the pesto.Objective class, which will need to interpret the input for the grad and hess arguments appropriately (e.g. the string 'SR1' -> use that method), and will need to memorize some values for the adaptive finite-difference scheme and for the Hessian approximations.
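The two ingredients can be sketched as follows; fd_grad and bfgs_update are illustrative helpers written for this issue, not existing pyPESTO functions:

```python
import numpy as np

def fd_grad(fun, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B,
    given step s = x_new - x_old and gradient difference y = g_new - g_old."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```

The update satisfies the secant condition B_new @ s == y, which is what makes the approximation improve along the visited directions.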

Implement Hessian vector product

Currently, the objective does not yet support or use its hessp argument for efficient Hessian-vector product computation. Supporting it requires only small changes to a few functions.
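One way to serve hessp without ever forming the full Hessian is a finite difference of the gradient along the direction p; a sketch (make_hessp is a hypothetical helper, not pyPESTO API):

```python
import numpy as np

def make_hessp(grad, eps=1e-6):
    """Build a Hessian-vector product from a gradient function:
    H(x) @ p ~= (grad(x + eps*p) - grad(x - eps*p)) / (2*eps)."""
    def hessp(x, p):
        x = np.asarray(x, dtype=float)
        p = np.asarray(p, dtype=float)
        return (grad(x + eps * p) - grad(x - eps * p)) / (2 * eps)
    return hessp
```

This costs two gradient evaluations per product, which is typically much cheaper than a full Hessian.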

Parameter Mappings

When using vectors of ExpData in AmiciObjective that correspond to different cell lines, inhibitors, etc., these may often use only a subset of the full parameter vector. It would be nice to have functionality that automatically extends pesto.Problem and constructs index vectors to be used in pesto.AmiciObjective.

Proposed format:
{'modelParameter':'newExperimentSpecificParameter',...}

Output Mapping

The current implementation of the output mapping is overly complicated. The empirical result of this complexity is that every time I check the code, it is not doing what it is supposed to, yet still miraculously passes the test cases.

We either need better tests with better coverage, or we need to substantially reduce the complexity of the output mapping.

Sorting initial parameter by objective function value

Should we sort startpoints according to their initial objective function value and run the estimations in that order, beginning at the startpoint with the lowest value?

This could improve results when working with a fixed computational budget.
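The proposed sorting is little more than an argsort over the initial objective values; a sketch with an assumed helper name:

```python
import numpy as np

def sort_startpoints(fun, startpoints):
    """Evaluate the objective at every startpoint and return startpoints
    (and their values) sorted ascending, so estimation begins at the best."""
    startpoints = np.asarray(startpoints, dtype=float)
    fvals = np.array([fun(x) for x in startpoints])
    order = np.argsort(fvals)
    return startpoints[order], fvals[order]
```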

conversion_reaction.ipynb example fails with 'Result' object has no attribute 'get_optimizer_results_for_key'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-82815144216f> in <module>()
      3 import matplotlib.pyplot as plt
      4 
----> 5 plt.plot(result.get_optimizer_results_for_key('fval'), '-r+',)
      6 
      7 #pesto.plot_waterfall(problem, result)

AttributeError: 'Result' object has no attribute 'get_optimizer_results_for_key'

Logging

For debugging, but also to provide information to the user, some sort of logging on different levels (info, warning, ...) might be useful, in particular to replace occasional print() statements. I suggest using the logging module (https://docs.python.org/3/howto/logging-cookbook.html) for this. At runtime one could then easily direct output to screen, file, or elsewhere. Any comments?

parameter plot does not work if fixed_parameters are used

pypesto.visualize.parameters fails because the dimensions of ub and lb do not agree with the number of parameters:

~/Documents/GitHub/src/pypesto/pypesto/visualize/parameters.py in parameters(result, ax)
     31 
     32     return parameters_lowlevel(xs=xs, fvals=fvals, lb=lb, ub=ub,
---> 33                                x_labels=None, ax=ax)
     34 
     35 

~/Documents/GitHub/src/pypesto/pypesto/visualize/parameters.py in parameters_lowlevel(xs, fvals, lb, ub, x_labels, ax)
     88     parameters_ind = np.array(parameters_ind).flatten()
     89     if lb is not None:
---> 90         ax.plot(lb.flatten(), parameters_ind, 'k--', marker='+')
     91     if ub is not None:
     92         ax.plot(ub.flatten(), parameters_ind, 'k--', marker='+')

/usr/local/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1865                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1866                         RuntimeWarning, stacklevel=2)
-> 1867             return func(ax, *args, **kwargs)
   1868 
   1869         inner.__doc__ = _add_data_doc(inner.__doc__,

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
   1526         kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
   1527 
-> 1528         for line in self._get_lines(*args, **kwargs):
   1529             self.add_line(line)
   1530             lines.append(line)

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
    404                 this += args[0],
    405                 args = args[1:]
--> 406             for seg in self._plot_args(this, kwargs):
    407                 yield seg
    408 

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
    381             x, y = index_of(tup[-1])
    382 
--> 383         x, y = self._xy_from_xy(x, y)
    384 
    385         if self.command == 'plot':

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    240         if x.shape[0] != y.shape[0]:
    241             raise ValueError("x and y must have same first dimension, but "
--> 242                              "have shapes {} and {}".format(x.shape, y.shape))
    243         if x.ndim > 2 or y.ndim > 2:
    244             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (61,) and (66,)

Documentation

Implement a framework that makes it easy to document the code. I suggest using sphinx, and then hosting the documentation on readthedocs.io.

Crashing of Optimization routines

I think I have found two conditions under which optimization routines will unexpectedly stop execution (this was only tested with ls-dogbox, but I would hope for similar logic with other optimizers):

  1. The objective function fails to evaluate at the initial point.

Not much we can do here. I think the best solution is to enable resampling of the initial point via an option.

  2. Function evaluation does not fail, but gradient evaluation does.

I think it is difficult to have the optimizer continue here, but it would be great if we could at least save results. We probably want routines to save intermediate results anyway (will open a new issue for that matter).

Update README.md

Let's make the landing page more informative.

EDIT: okay, CONTRIBUTING.md already exists

Pickling of results

For distributed computing it would be great to have a way to pickle result files. Currently, the attached Problem instance cannot be pickled if AmiciObjective is used as SwigPy objects cannot be pickled.

Current workaround is to only pickle the OptimizerResult data.
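The workaround can be sketched as keeping only plain data before pickling; the attribute names below are illustrative, not the actual OptimizerResult fields:

```python
import pickle

def strip_for_pickle(entry):
    """Keep only plain optimizer-result data, dropping anything that holds
    unpicklable handles (e.g. SwigPy objects); field names are illustrative."""
    keep = ("x", "fval", "n_fval", "time", "exitflag")
    return {k: entry[k] for k in keep if k in entry}

# object() stands in for an unpicklable AMICI/SwigPy handle
entry = {"x": [1.0, 2.0], "fval": 0.5, "objective": object()}
restored = pickle.loads(pickle.dumps(strip_for_pickle(entry)))
```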

parameter list in the data format (benchmark models)

The new common data format will need a full list of all parameters (with bounds etc.) that should be optimized. Therefore, some fill-in tool will be needed for models that specify parameters elsewhere than in the SBML file (e.g., as conditions), in order to generate the parameter table.

Basic optimization

Implement a basic optimization routine, i.e. specify the objective / problem, the optimizer (scipy, dlib, ...), the multistart, the result object.

Codecov more efficiently

Currently, tests apparently need to be run again for codecov, doubling the test time. This should be doable differently.

Releases

Can we please have releases that tag the versions that were uploaded to PyPI? Without this it's pretty difficult to identify which commit pip install pypesto will yield.

option to store chi2 and schi2 instead of res, sres

Storing the full sres might not be such a smart idea, as traces can become quite large: sres has n_datasets * n_observables * n_timepoints * n_parameters entries (compared to n_parameters for schi2). We should probably introduce an option similar to self.options.trace_record_hess.

Note that computing chi2 and schi2 requires one additional vector-vector/vector-matrix multiplication, which, however, should be cheap.
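Those extra multiplications are indeed cheap; a sketch of deriving chi2 and its gradient schi2 from res and sres (helper name assumed):

```python
import numpy as np

def chi2_from_res(res, sres):
    """chi2 = sum of squared residuals; schi2 = its gradient 2 * sres^T res.
    Stores O(n_parameters) numbers instead of the full n_res x n_par sres."""
    res = np.asarray(res, dtype=float)
    sres = np.asarray(sres, dtype=float)
    return float(res @ res), 2.0 * sres.T @ res
```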

Deprecation warning Regular expression

This should be easy to fix; it is just annoying:

/.../pypesto/pypesto/optimize/optimizer.py:349: DeprecationWarning: Flags not at the start of the expression '^(?i)(ls_)'
  return re.match(r'^(?i)(ls_)', self.method)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
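The fix is to move the inline flag to the very start of the pattern, or to pass flags=re.IGNORECASE instead; a sketch of the corrected check:

```python
import re

def is_least_squares(method):
    """Match the 'ls_' prefix case-insensitively; the inline (?i) flag must
    lead the pattern (or be passed as flags=re.IGNORECASE) to avoid the
    DeprecationWarning."""
    return re.match(r"(?i)^ls_", method) is not None
```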

Plotting of many forward simulations in one window

If I understood correctly, it is an aim for AMICI to have plotting routines for the model simulation. The question is whether we want to use those in pyPESTO or build separate routines in pyPESTO for plotting the forward simulation.

No matter how we decide on this: we should keep it flexible enough to visualize many trajectories for many parameter samples in one plot, since we may want to visualize MCMC results later.

Basic framework

Create the basic structure of the pesto objects, e.g. what does a result contain and look like? What format? What does the problem contain? This can then be used when implementing the specific optimization, sampling etc. routines.

Fixed parameters

Implement fixed parameters, i.e. only a subset of the parameters is to be optimized. This can be handled in AMICI, but we should also enable it in pesto, probably in the problem or the objective. One could have fields fixed_par_indices and fixed_par_values. The optimizer should not be aware of this.
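The index bookkeeping could look roughly like this; the function names are illustrative, not pyPESTO API:

```python
import numpy as np

def reduce_x(x_full, free_idx):
    """Full parameter vector -> the free (optimized) subvector."""
    return np.asarray(x_full, dtype=float)[free_idx]

def expand_x(x_free, free_idx, fixed_idx, fixed_vals, dim):
    """Free subvector -> full vector, re-inserting the fixed values.
    The optimizer only ever sees x_free."""
    x_full = np.empty(dim)
    x_full[free_idx] = x_free
    x_full[fixed_idx] = fixed_vals
    return x_full
```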

Benchmark models

Implement/integrate the 20 benchmark models as examples in pypesto.

optimization seems to fail

Visualization fails when running the 'rosenbrock' example because the optimized cost function values are all 'Inf'. Might be something wrong in the optimize function.

Plotting

Plotting of results.

Implement basic functions for e.g. waterfall plots, parameter diagrams, and combinations thereof.
These would take a pypesto.Result as input, or maybe only single lists / pd.DataFrames.

Addition in parameter estimation

Hello!

We recently had some research done on parameter estimation in non-gaussian processes, targeted for aerodynamics models of fixed-wing UAVs.
(c.f. https://www.researchgate.net/publication/325284594_On-line_Aerodynamic_Model_Identification_on_Small_Fixed-Wing_UAVs_with_Uncertain_Flight_Data)
We used a variation of the Generalized Total Least Squares algorithm.

I intend to write a Python library, but first I took a look at pip for similar packages.
Yours was one of the results.
My question is whether you believe that this project could be a good starting point for implementing our algorithms, or whether it's too foreign to what I want to achieve and I'd be better off starting fresh.

Thank you for your time,
George

AmiciObjective: Return full simulation results

Enable AMICI to return, besides nllh and snllh, also the full rdata(s), i.e. the simulation results. This is useful for verification and plotting.
How to do it: e.g. we could finally switch to having our objectives return a dict instead of a tuple, and then simply return the rdatas as an additional entry. This should not take more time, since the results will only be stored marginally longer, I guess.
Alternative: create an additional function which works like _call_amici (and can be used by it).

Random seeds

For reproducible results, the pseudo-random number generators used must be seeded beforehand (note there might also be special issues with parallelization, but let's postpone that part until later). In Python, there exist multiple generators, in particular in the standard random module and in numpy. Each seed needs to be set separately (https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/).
So far we only use the numpy one in pypesto, but for transparency I nonetheless recommend a function pypesto.seed(n: int = 0) which handles this and can easily be called by the user without further knowledge.
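A minimal sketch of such a helper; seed_all is a placeholder name for the proposed pypesto.seed:

```python
import random
import numpy as np

def seed_all(n: int = 0):
    """Sketch of the proposed pypesto.seed: seed every PRNG in use."""
    random.seed(n)
    np.random.seed(n)

seed_all(0)
a = np.random.rand(3)
seed_all(0)
b = np.random.rand(3)  # identical to a, since the seed was reset
```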

Increase max columns to 79

Can we increase the max line width in flake8 to something a bit more reasonable, like 120 columns?
Max 79 really makes me feel claustrophobic.

Startpoint methods

Implement the basic startpoint methods like uniform and latin hypercube sampling, taking a dimension and a number of start points (and somewhere, maybe here, handle user-guessed parameters by reducing the number of points sampled).
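Both methods fit in a few lines of numpy; a sketch with assumed function names (latin hypercube places exactly one sample in each of n_starts strata per dimension):

```python
import numpy as np

def uniform_startpoints(n_starts, lb, ub, rng=None):
    """Uniform random startpoints within the box [lb, ub]."""
    rng = np.random.default_rng(rng)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    return lb + rng.uniform(size=(n_starts, lb.size)) * (ub - lb)

def lhs_startpoints(n_starts, lb, ub, rng=None):
    """Latin hypercube startpoints: each dimension's strata each hit once."""
    rng = np.random.default_rng(rng)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    # per dimension: a random permutation of stratum indices, jittered within each stratum
    strata = rng.permuted(np.tile(np.arange(n_starts), (dim, 1)), axis=1).T
    u = (strata + rng.uniform(size=(n_starts, dim))) / n_starts
    return lb + u * (ub - lb)
```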

Allow assert

Currently codacy complains about assert statements. Would it be possible to allow assert statements at least in the test folder? Or is there a way to change those assert statements without much ado? See also the discussion in fossasia/query-server#332.

Optimizer Run Labeling in Waterfall plot

Could we start optimizer runs at index 0? Then the x axis would directly correspond to the indices in result.optimize_result.list that you would want to subselect for further analysis.

Saving of Intermediate Results

I think the best way to achieve this is to store information such as n_feval, time, and parameters every time a lower objective function value is achieved.

This would mean that we have to track information such as n_feval, n_jeval, etc. manually, but that is probably what we want anyway to make the computation of these values more consistent across optimization routines. An alternative would be a callback function, but this may not be consistently possible for all optimizers.

This may not work well in a multi-threaded setting, but I think embarrassingly parallel execution makes more sense for our applications anyway.
