
pyPESTO - Parameter EStimation TOolbox for python

pyPESTO logo

pyPESTO is a widely applicable and highly customizable toolbox for parameter estimation.


Feature overview

pyPESTO features include:

  • Multi-start local optimization
  • Profile computation
  • Result visualization
  • Interface to AMICI for efficient simulation and sensitivity analysis of ordinary differential equation (ODE) models (example)
  • Parameter estimation pipeline for systems biology problems specified in SBML and PEtab (example)
  • Parameter estimation with relative (scaled and offset) data as described in Schmiester et al. (2020). (example)
  • Parameter estimation with ordinal data as described in Schmiester et al. (2020) and Schmiester et al. (2021). (example)
  • Parameter estimation with censored data. (example)
  • Parameter estimation with nonlinear-monotone data. (example)

Quick install

The simplest way to install pyPESTO is via pip:

pip3 install pypesto

More information is available here: https://pypesto.readthedocs.io/en/latest/install.html

Documentation

The documentation is hosted on readthedocs.io: https://pypesto.readthedocs.io

Examples

Multiple use cases are discussed in the documentation. In particular, there are Jupyter notebooks in the doc/example directory.
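To convey the multi-start idea that pyPESTO automates, here is a minimal, illustrative sketch using only scipy (deliberately not pyPESTO's own API): draw random startpoints within box bounds, run a local optimizer from each, and sort the runs by final objective value, as in a waterfall plot.

```python
import numpy as np
from scipy.optimize import minimize, rosen

rng = np.random.default_rng(0)
lb, ub, dim, n_starts = -5.0, 5.0, 2, 20

# draw uniform startpoints and run a local optimizer from each
starts = rng.uniform(lb, ub, size=(n_starts, dim))
runs = [minimize(rosen, x0, method="L-BFGS-B", bounds=[(lb, ub)] * dim)
        for x0 in starts]

# sort runs by final objective value, best first (waterfall ordering)
runs.sort(key=lambda r: r.fun)
best = runs[0]  # the Rosenbrock optimum is x = (1, 1) with f = 0
```

With pyPESTO itself, the equivalent is handled by the Problem, Objective, and optimize module described in the documentation linked above.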

Contributing

We welcome any contributions. For more information on how to contribute to pyPESTO, check out https://pypesto.readthedocs.io/en/latest/contribute.html

How to Cite

A citeable DOI is available for the latest pyPESTO release.

When using pyPESTO in your project, please cite

  • Schälte, Y., Fröhlich, F., Jost, P. J., Vanhoefer, J., Pathirana, D., Stapor, P., Lakrisenko, P., Wang, D., Raimúndez, E., Merkt, S., Schmiester, L., Städter, P., Grein, S., Dudkin, E., Doresic, D., Weindl, D., & Hasenauer, J. (2023). pyPESTO: A modular and scalable tool for parameter estimation for dynamic models. Bioinformatics, btad711. doi:10.1093/bioinformatics/btad711

When presenting work that employs pyPESTO, feel free to use one of the icons in doc/logo/:

pyPESTO logo

There is a list of publications using pyPESTO. If you used pyPESTO in your work, we are happy to include your project; please let us know via a GitHub issue.

References

pyPESTO supersedes PESTO, a parameter estimation toolbox for MATLAB whose development has been discontinued.

pypesto's People

Contributors

arrjon, atheorell, c-peiter, dantongwang, dilpath, doresic, dweindl, erikadudki, ffroehlich, giacomofabrini, jvanhoefer, kristianmeyerr, lcontento, leaseep, leonardschmiester, m-philipps, merktsimon, pauljonasjost, paulstapor, philippstaedter, plakrisenko, shoepfl, sleepy-owl, stephanmg, vwiela, yannikschaelte


pypesto's Issues

Parameter plots y axis

When there are many parameters, the y axis of the parameter plot becomes hard to read. I suggest allowing the user to pass par_labels=None|'automatic' in the parameters plot arguments: when 'automatic' is passed, we simply use matplotlib's default axis labeling, and otherwise use result.problem.x_labels (or an explicit list such as par_labels=['a', 'b', 'c']).

Or some other implementation.
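One possible shape of that logic, as a sketch; the helper name resolve_par_labels and the 'automatic' convention come from this proposal and are not existing pyPESTO API:

```python
def resolve_par_labels(par_labels, x_labels, n_par):
    """Hypothetical helper: decide the y-axis labels for the parameter plot."""
    if par_labels == "automatic":
        # None tells matplotlib to keep its default numeric axis labeling
        return None
    if isinstance(par_labels, (list, tuple)):
        # an explicit list wins over everything else
        return list(par_labels)
    # otherwise fall back to the problem's x_labels, or generic names
    return x_labels if x_labels is not None else [f"x_{i}" for i in range(n_par)]
```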


Reduce test_sbml_conversion

test_sbml_conversion takes a long time. It may be better to test all scipy and dlib optimizers on a toy function like Rosenbrock, and only test one or two function- and residual-based optimizers with the SBML model.

Approximate grad and hess

Approximate the gradient using finite differences, as in the MATLAB version in https://github.com/ICB-DCM/PESTO/blob/master/private/getFiniteDifferences.m,

and approximate the Hessian using any of BFGS, DFP, SR1, as in https://github.com/ICB-DCM/NOODLES/blob/feature_arc/%2Bnoodles/NoodleProblem.m.

The natural location for this is the pesto.Objective class, which will need to interpret the input for the grad and hess arguments appropriately (e.g. the string 'SR1' -> use that method), and will need to memorize some values for the adaptive finite-difference scheme and for the Hessian approximations.
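The two ingredients can be sketched as follows; fd_grad and bfgs_update are illustrative helpers written for this issue, not existing pyPESTO functions:

```python
import numpy as np

def fd_grad(fun, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B,
    given step s = x_new - x_old and gradient difference y = g_new - g_old."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```

The update satisfies the secant condition B_new @ s == y, which is what makes the approximation improve along the visited directions.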

Implement Hessian vector product

Currently, the objective does not yet support or use its hessp argument for efficient Hessian-vector product computation. Supporting it requires only small changes to a few functions.
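One way to serve hessp without ever forming the full Hessian is a finite difference of the gradient along the direction p; a sketch (make_hessp is a hypothetical helper, not pyPESTO API):

```python
import numpy as np

def make_hessp(grad, eps=1e-6):
    """Build a Hessian-vector product from a gradient function:
    H(x) @ p ~= (grad(x + eps*p) - grad(x - eps*p)) / (2*eps)."""
    def hessp(x, p):
        x = np.asarray(x, dtype=float)
        p = np.asarray(p, dtype=float)
        return (grad(x + eps * p) - grad(x - eps * p)) / (2 * eps)
    return hessp
```

This costs two gradient evaluations per product, which is typically much cheaper than a full Hessian.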

Parameter Mappings

When using vectors of ExpData in AmiciObjective that correspond to different cell lines, inhibitors, etc., these may often use only a subset of the full parameter vector. It would be nice to have functionality that automatically extends pesto.Problem and constructs index vectors to be used in pesto.AmiciObjective.

Proposed format:
{'modelParameter':'newExperimentSpecificParameter',...}

Output Mapping

The current implementation of the output mapping is overly complicated. The empirical result of this complexity is that every time I check the code, it is not doing what it is supposed to, yet still miraculously passes the test cases.

We either need better tests with better coverage, or we need to substantially reduce the complexity of the output mapping.

Sorting initial parameter by objective function value

Should we sort startpoints according to their initial objective function value and run the estimations in that order, beginning at the startpoint with the lowest value?

This could improve results when working with a fixed computational budget.
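The proposed sorting is little more than an argsort over the initial objective values; a sketch with an assumed helper name:

```python
import numpy as np

def sort_startpoints(fun, startpoints):
    """Evaluate the objective at every startpoint and return startpoints
    (and their values) sorted ascending, so estimation begins at the best."""
    startpoints = np.asarray(startpoints, dtype=float)
    fvals = np.array([fun(x) for x in startpoints])
    order = np.argsort(fvals)
    return startpoints[order], fvals[order]
```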

conversion_reaction.ipynb example fails with 'Result' object has no attribute 'get_optimizer_results_for_key'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-82815144216f> in <module>()
      3 import matplotlib.pyplot as plt
      4 
----> 5 plt.plot(result.get_optimizer_results_for_key('fval'), '-r+',)
      6 
      7 #pesto.plot_waterfall(problem, result)

AttributeError: 'Result' object has no attribute 'get_optimizer_results_for_key'

Logging

For debugging, but also to provide information to the user, some sort of logging on different levels (info, warning, ...) might be useful, in particular to replace occasional print() statements. I suggest using the logging module (https://docs.python.org/3/howto/logging-cookbook.html) for this. At runtime one could then easily direct output to screen, file, or elsewhere. Any comments?

parameter plot does not work if fixed_parameters are used

pypesto.visualize.parameters fails because the dimensions of ub and lb do not agree with the number of parameters:

~/Documents/GitHub/src/pypesto/pypesto/visualize/parameters.py in parameters(result, ax)
     31 
     32     return parameters_lowlevel(xs=xs, fvals=fvals, lb=lb, ub=ub,
---> 33                                x_labels=None, ax=ax)
     34 
     35 

~/Documents/GitHub/src/pypesto/pypesto/visualize/parameters.py in parameters_lowlevel(xs, fvals, lb, ub, x_labels, ax)
     88     parameters_ind = np.array(parameters_ind).flatten()
     89     if lb is not None:
---> 90         ax.plot(lb.flatten(), parameters_ind, 'k--', marker='+')
     91     if ub is not None:
     92         ax.plot(ub.flatten(), parameters_ind, 'k--', marker='+')

/usr/local/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1865                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1866                         RuntimeWarning, stacklevel=2)
-> 1867             return func(ax, *args, **kwargs)
   1868 
   1869         inner.__doc__ = _add_data_doc(inner.__doc__,

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
   1526         kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
   1527 
-> 1528         for line in self._get_lines(*args, **kwargs):
   1529             self.add_line(line)
   1530             lines.append(line)

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
    404                 this += args[0],
    405                 args = args[1:]
--> 406             for seg in self._plot_args(this, kwargs):
    407                 yield seg
    408 

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
    381             x, y = index_of(tup[-1])
    382 
--> 383         x, y = self._xy_from_xy(x, y)
    384 
    385         if self.command == 'plot':

/usr/local/lib/python3.7/site-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    240         if x.shape[0] != y.shape[0]:
    241             raise ValueError("x and y must have same first dimension, but "
--> 242                              "have shapes {} and {}".format(x.shape, y.shape))
    243         if x.ndim > 2 or y.ndim > 2:
    244             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (61,) and (66,)

Documentation

Implement a framework that makes it easy to document the code. I suggest using sphinx, and then hosting the documentation on readthedocs.io.

Crashing of Optimization routines

I think I have found two conditions under which optimization routines will unexpectedly stop execution (this was only tested with ls-dogbox, but I would hope for similar logic with other optimizers):

  1. The objective function fails to evaluate at the initial point.

Not much we can do here. I think the best solution is to enable resampling of the initial point via an option.

  2. Function evaluation does not fail, but gradient evaluation does.

I think it is difficult to have the optimizer continue here, but it would be great if we could at least save results. We probably want routines to save intermediate results anyway (will open a new issue for that matter).

Update README.md

Let's make the landing page more informative.

EDIT: okay, CONTRIBUTING.md already exists

Pickling of results

For distributed computing it would be great to have a way to pickle result files. Currently, the attached Problem instance cannot be pickled if AmiciObjective is used as SwigPy objects cannot be pickled.

Current workaround is to only pickle the OptimizerResult data.
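The workaround can be sketched as keeping only plain data before pickling; the attribute names below are illustrative, not the actual OptimizerResult fields:

```python
import pickle

def strip_for_pickle(entry):
    """Keep only plain optimizer-result data, dropping anything that holds
    unpicklable handles (e.g. SwigPy objects); field names are illustrative."""
    keep = ("x", "fval", "n_fval", "time", "exitflag")
    return {k: entry[k] for k in keep if k in entry}

# object() stands in for an unpicklable AMICI/SwigPy handle
entry = {"x": [1.0, 2.0], "fval": 0.5, "objective": object()}
restored = pickle.loads(pickle.dumps(strip_for_pickle(entry)))
```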

parameter list in the data format (benchmark models)

The new common data format will need a full list of all parameters (with bounds etc.) that should be optimized. Therefore, some fill-in tool will be needed for models that specify parameters elsewhere than in the SBML file (e.g., as conditions), in order to generate the parameter table.

Basic optimization

Implement a basic optimization routine, i.e. specify the objective / problem, the optimizer (scipy, dlib, ...), the multistart, the result object.

Codecov more efficiently

Currently, tests apparently need to be run again for codecov, doubling the test time. This should be doable differently.

Releases

Can we please have releases that tag the versions that were uploaded to PyPI? Without this it's pretty difficult to identify which commit pip install pypesto will yield.

option to store chi2 and schi2 instead of res, sres

Storing the full sres might not be such a smart idea, as traces can become quite large: sres has n_datasets * n_observables * n_timepoints * n_parameters entries (compared to n_parameters for schi2). We should probably introduce an option similar to self.options.trace_record_hess.

Note that computing chi2 and schi2 requires one additional vector-vector/vector-matrix multiplication, which, however, should be cheap.
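Those extra multiplications are indeed cheap; a sketch of deriving chi2 and its gradient schi2 from res and sres (helper name assumed):

```python
import numpy as np

def chi2_from_res(res, sres):
    """chi2 = sum of squared residuals; schi2 = its gradient 2 * sres^T res.
    Stores O(n_parameters) numbers instead of the full n_res x n_par sres."""
    res = np.asarray(res, dtype=float)
    sres = np.asarray(sres, dtype=float)
    return float(res @ res), 2.0 * sres.T @ res
```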

Deprecation warning Regular expression

This should be easy to fix; it is just annoying:

/.../pypesto/pypesto/optimize/optimizer.py:349: DeprecationWarning: Flags not at the start of the expression '^(?i)(ls_)'
  return re.match(r'^(?i)(ls_)', self.method)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
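The fix is to move the inline flag to the very start of the pattern, or to pass flags=re.IGNORECASE instead; a sketch of the corrected check:

```python
import re

def is_least_squares(method):
    """Match the 'ls_' prefix case-insensitively; the inline (?i) flag must
    lead the pattern (or be passed as flags=re.IGNORECASE) to avoid the
    DeprecationWarning."""
    return re.match(r"(?i)^ls_", method) is not None
```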

Plotting of many forward simulations in one window

If I understood correctly, it is an aim for AMICI to have plotting routines for the model simulation. The question is whether we want to use those in pyPESTO or build separate routines in pyPESTO for plotting the forward simulation.

No matter how we decide on this: we should keep it flexible enough to visualize many trajectories for many parameter samples in one plot, since we may want to visualize MCMC results later.

Basic framework

Create the basic structure of the pesto objects, e.g. what does a result contain and look like? What format? What does the problem contain? This can then be used when implementing the specific optimization, sampling etc. routines.

Fixed parameters

Implement fixed parameters, i.e. only a subset of the parameters is to be optimized. This can be handled in AMICI, but we should also enable it in pesto, probably in the problem or the objective. One could have fields fixed_par_indices and fixed_par_values. The optimizer should not be aware of this.
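The index bookkeeping could look roughly like this; the function names are illustrative, not pyPESTO API:

```python
import numpy as np

def reduce_x(x_full, free_idx):
    """Full parameter vector -> the free (optimized) subvector."""
    return np.asarray(x_full, dtype=float)[free_idx]

def expand_x(x_free, free_idx, fixed_idx, fixed_vals, dim):
    """Free subvector -> full vector, re-inserting the fixed values.
    The optimizer only ever sees x_free."""
    x_full = np.empty(dim)
    x_full[free_idx] = x_free
    x_full[fixed_idx] = fixed_vals
    return x_full
```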

Benchmark models

Implement/integrate the 20 benchmark models as examples in pypesto.

optimization seems to fail

Visualization fails when running the 'rosenbrock' example because the optimized cost function values are all 'Inf'. Might be something wrong in the optimize function.

Plotting

Plotting of results.

Implement basic functions for e.g. waterfall plots, parameter diagrams, and combinations thereof.
These would take a pypesto.Result as input, or maybe only single lists / pd.DataFrames.

Addition in parameter estimation

Hello!

We recently had some research done on parameter estimation in non-gaussian processes, targeted for aerodynamics models of fixed-wing UAVs.
(c.f. https://www.researchgate.net/publication/325284594_On-line_Aerodynamic_Model_Identification_on_Small_Fixed-Wing_UAVs_with_Uncertain_Flight_Data)
We used a variation of the Generalized Total Least Squares algorithm.

I intend to write a Python library, but first I took a look at pip for similar packages.
Yours was one of the results.
My question is whether you believe that this project could be a good starting point for implementing our algorithms, or whether it's too foreign to what I want to achieve and I'd be better off starting fresh.

Thank you for your time,
George

AmiciObjective: Return full simulation results

Enable AMICI to return, besides nllh and snllh, also the full rdata(s), i.e. the simulation results. This is useful for verification and plotting.
How to do it: e.g. we could finally switch to having our objectives return a dict instead of a tuple, and then simply return the rdatas as an additional entry. This should not take more time, since the results will only be stored marginally longer, I guess.
Alternative: create an additional function which works like _call_amici (and can be used by it).

Random seeds

For reproducible results, the pseudo-random number generators used must be seeded beforehand (note there might also be special issues with parallelization, but let's postpone that part until later). In Python, there exist multiple generators, in particular in the standard random module and in numpy. Each seed needs to be set separately (https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/).
So far we only use the numpy one in pypesto, but for transparency I nonetheless recommend a function pypesto.seed(n: int = 0) which handles this and can easily be called by the user without further knowledge.
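A minimal sketch of such a helper; seed_all is a placeholder name for the proposed pypesto.seed:

```python
import random
import numpy as np

def seed_all(n: int = 0):
    """Sketch of the proposed pypesto.seed: seed every PRNG in use."""
    random.seed(n)
    np.random.seed(n)

seed_all(0)
a = np.random.rand(3)
seed_all(0)
b = np.random.rand(3)  # identical to a, since the seed was reset
```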

Increase max columns to 79

Can we increase the max line width in flake8 to something a bit more reasonable, like 120 columns?
Max 79 really makes me feel claustrophobic.

Startpoint methods

Implement the basic startpoint methods like uniform and latin hypercube sampling, taking a dimension and a number of start points (and somewhere, maybe here, handle user-guessed parameters by reducing the number of points sampled).
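Both methods fit in a few lines of numpy; a sketch with assumed function names (latin hypercube places exactly one sample in each of n_starts strata per dimension):

```python
import numpy as np

def uniform_startpoints(n_starts, lb, ub, rng=None):
    """Uniform random startpoints within the box [lb, ub]."""
    rng = np.random.default_rng(rng)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    return lb + rng.uniform(size=(n_starts, lb.size)) * (ub - lb)

def lhs_startpoints(n_starts, lb, ub, rng=None):
    """Latin hypercube startpoints: each dimension's strata each hit once."""
    rng = np.random.default_rng(rng)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    # per dimension: a random permutation of stratum indices, jittered within each stratum
    strata = rng.permuted(np.tile(np.arange(n_starts), (dim, 1)), axis=1).T
    u = (strata + rng.uniform(size=(n_starts, dim))) / n_starts
    return lb + u * (ub - lb)
```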

Allow assert

Currently codacy complains about assert statements. Would it be possible to allow assert statements at least in the test folder? Or is there a way to change those assert statements without much ado? See also the discussion in fossasia/query-server#332.

Optimizer Run Labeling in Waterfall plot

Could we start optimizer runs at index 0? Then the x axis would directly correspond to the indices in result.optimize_result.list that you would want to subselect for further analysis.

Saving of Intermediate Results

I think the best way to achieve this is to store information such as n_feval, time, and parameters every time a lower objective function value is achieved.

This would mean that we have to track information such as n_feval, n_jeval, etc. manually, but that is probably what we want anyway to make the computation of these values more consistent across optimization routines. An alternative would be a callback function, but this may not be consistently possible for all optimizers.

This may not work well in a multi-threaded setting, but I think embarrassingly parallel execution makes more sense for our applications anyway.
