GPy

The Gaussian processes framework in Python.

Status

Branch                       travis-ci.org    ci.appveyor.com    coveralls.io        codecov.io
Default branch (devel)       travis-devel     appveyor-devel     coveralls-devel     codecov-devel
Deployment branch (deploy)   travis-deploy    appveyor-deploy    coveralls-deploy    codecov-deploy

What's new:

From now on we keep track of changes in CHANGELOG.md. If you want your changes to show up there, follow the guidelines below; in particular, tag your commits using the gitchangelog commit message format.

Contributing to GPy

We welcome any contributions to GPy; after all, it is an open source project. We use GitHub pull requests for contributions.

For an in-depth description of pull requests, please visit https://help.github.com/articles/using-pull-requests/ .

Steps to a successful contribution:

  1. Fork GPy: https://help.github.com/articles/fork-a-repo/
  2. Make your changes to the source in your fork.
  3. Make sure the guidelines are met.
  4. Set up tests for your code. We use unit tests in the testing subfolder of GPy. There is a good chance that a framework is already set up to test a new model in model_tests.py or a new kernel in kernel_tests.py; have a look at the source and you may be able to add your model (or kernel, or other component) as an additional test in the appropriate file. There are more frameworks for testing the other bits and pieces; head over to the testing folder and have a look. A minimal example test is sketched after this list.
  5. Create a pull request to the devel branch in GPy, see above.
  6. The tests will run on your pull request. In the comments section we can discuss the changes and help you with any problems; just let us know about them there.
  7. Once the pull request is accepted, your awesome new feature will be in the next GPy release :)
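
The minimal example test mentioned in step 4 might look like this (a hypothetical sketch, not an existing test in model_tests.py or kernel_tests.py; it only assumes the public GPy.models.GPRegression, GPy.kern.RBF and checkgrad API):

import numpy as np
import GPy

def test_my_new_model_gradients():
    np.random.seed(0)
    X = np.random.rand(20, 2)
    Y = np.sin(X[:, :1]) + 0.05 * np.random.randn(20, 1)
    m = GPy.models.GPRegression(X, Y, GPy.kern.RBF(2))
    # checkgrad() compares analytic and numerical gradients and returns True on success
    assert m.checkgrad()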

For any further questions/suggestions head over to the issues section in GPy.

Pull Request Guidelines

  • Check your code with PEP8 or pylint, and try to keep lines at most 80 columns wide.
  • Separate commits per smallest concern.
  • Each functionality/bugfix commit should contain code, tests, and doc.
  • We are using gitchangelog to keep track of changes and log new features, so if you want your changes to show up in the changelog, make sure you follow the gitchangelog commit message format (see the example messages after this list).
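
For reference, some example commit messages in the gitchangelog format (the subjects are made up; the pattern is ACTION: [AUDIENCE:] subject, with ACTION one of new/chg/fix and AUDIENCE one of dev/usr/pkg/test/doc):

new: usr: added an example for the periodic kernel
fix: dev: corrected the lengthscale gradient in my new kernel
chg: pkg: CHANGELOG update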

Support and questions to the community

Ask questions using the issues section.

Updated Structure

We have pulled the core parameterization out of GPy into a separate package called paramz, which contains the pure gradient-based model optimization.

If you installed GPy with pip, just upgrade the package using:

$ pip install --upgrade GPy

If you have the development version of GPy (installed using the develop or -e option), just install the dependencies by running

$ python setup.py develop

again, in the GPy installation folder.

A warning: this usually works, but sometimes distutils/setuptools opens a whole can of worms here, especially when compiled extensions are involved. If that is the case, it is best to clean the repo and reinstall.

Supported Platforms:

Python 3.9 and higher

Citation

@Misc{gpy2014,
  author =   {{GPy}},
  title =    {{GPy}: A Gaussian process framework in python},
  howpublished = {\url{http://github.com/SheffieldML/GPy}},
  year = {since 2012}
}

Pronunciation:

We like to pronounce it 'g-pie'.

Getting started: installing with pip

GPy requires a recent version (1.3.0 or later) of scipy, so we strongly recommend using the Anaconda Python distribution. With Anaconda you can install GPy as follows:

conda update scipy

Then, if needed, try:

sudo apt-get update
sudo apt-get install python3-dev
sudo apt-get install build-essential   
conda update anaconda

And finally,

pip install gpy

We've also had luck with Enthought. Install scipy 1.3.0 (or later) and then pip install GPy:

pip install gpy

If you'd like to install from source, or want to contribute to the project (e.g. by sending pull requests via GitHub), read on.

Troubleshooting installation problems

If you're having trouble installing GPy via pip install GPy, here is a probable solution:

git clone https://github.com/SheffieldML/GPy.git
cd GPy
git checkout devel
python setup.py build_ext --inplace
pytest .

Direct downloads

Download badges: PyPI version, source, Windows, MacOSX.

Saving models in a consistent way across versions:

Pickle is heavily dependent on class structure, so it behaves inconsistently across Python (and GPy) versions. Pickling is meant to serialize models within the same environment, not to store models on disk to be used later on.

To save a model, it is best to save its m.param_array to disk (using numpy's np.save). Additionally, save the script that creates the model. In that script you create the model with initialize=False as a keyword argument, with the data loaded as normal. You then set the model parameters via m.param_array[:] = loaded_params, using the previously saved parameters, and initialize the model with m.initialize_parameter(), which makes the model usable. Be aware that up to this point the model is in an inconsistent state and cannot be used to produce any results.

import numpy as np
import GPy

# let X, Y be data loaded above
# Model creation:
m = GPy.models.GPRegression(X, Y)
m.optimize()
# 1: Saving a model:
np.save('model_save.npy', m.param_array)
# 2: loading a model
# Model creation, without initialization:
m_load = GPy.models.GPRegression(X, Y, initialize=False)
m_load.update_model(False) # do not call the underlying expensive algebra on load
m_load.initialize_parameter() # Initialize the parameters (connect the parameters up)
m_load[:] = np.load('model_save.npy') # Load the parameters
m_load.update_model(True) # Call the algebra only once
print(m_load)

For Admins and Developers:

Running unit tests:

The new way of running the tests uses pytest and coverage:

Ensure pytest and coverage are installed:

pip install pytest coverage

Run the tests under coverage from the root directory of the repository:

coverage run travis_tests.py

Create coverage report in htmlcov/

coverage html

The coverage report is located in htmlcov/index.html

Legacy: using nosetests

Ensure nose is installed via pip:

pip install nose

Run nosetests from the root directory of the repository:

nosetests -v GPy/testing

or from within IPython

import GPy; GPy.tests()

or using setuptools

python setup.py test

Compiling documentation:

The documentation is stored in doc/, is written in the reStructuredText format, and is compiled with the Sphinx Python documentation generator.

The Sphinx documentation is available here: http://sphinx-doc.org/latest/contents.html

Installing dependencies:

To compile the documentation, first ensure that Sphinx is installed. On Debian-based systems, this can be achieved as follows:

sudo apt-get install python-pip
sudo pip install sphinx

Compiling documentation:

The documentation can be compiled as follows:

cd doc
sphinx-apidoc -o source/ ../GPy/
make html

alternatively:

cd doc
sphinx-build -b html -d build/doctrees -D graphviz_dot='<path to dot>' source build/html

The HTML files are then stored in doc/build/html

Commit new patch to devel

If you want to merge a branch into devel, make sure the following steps are carried out:

  • Create a local branch from the pull request and merge the current devel in.
  • Look through the changes on the pull request.
  • Check that tests are there and are checking code where applicable.
  • [optional] Make changes if necessary and commit and push to run tests.
  • [optional] Repeat the above until tests pass.
  • [optional] bump up the version of GPy using bumpversion. The configuration is done, so all you need is bumpversion [major|minor|patch].
  • Update the changelog using gitchangelog: gitchangelog > CHANGELOG.md
  • Commit the changelog changes as a silent update: git commit -m "chg: pkg: CHANGELOG update" CHANGELOG.md
  • Push the changes into devel.

A usual workflow should look like this:

$ git fetch origin
$ git checkout -b <pull-origin>-devel origin/<pull-origin>-devel
$ git merge devel
$ coverage run travis_tests.py

Make changes so that the tests cover corner cases (if statements, None arguments, etc.). Then we are ready to make the last changes for the changelog and versioning:

$ git commit -am "fix: Fixed tests for <pull-origin>"
$ bumpversion patch # [optional]
$ gitchangelog > CHANGELOG.md
$ git commit -m "chg: pkg: CHANGELOG update" CHANGELOG.md

Now we can merge the pull request into devel:

$ git checkout devel
$ git merge --no-ff <pull-origin>-devel
$ git push origin devel

This will update the devel branch of GPy.

Deploying GPy

Deployment is fully automated. All you need to do is create a pull request from devel to deploy, wait for the tests to finish (successfully!), and merge the pull request. This will update the package on PyPI for all platforms fully automatically.

Funding Acknowledgements

Current support for the GPy software comes through the following projects.

Previous support for the GPy software came from the following projects:

  • BBSRC Project No BB/K011197/1 "Linking recombinant gene sequence to protein product manufacturability using CHO cell genomic resources"
  • EU FP7-KBBE Project Ref 289434 "From Data to Models: New Bioinformatics Methods and Tools for Data-Driven Predictive Dynamic Modelling in Biotechnological Applications"
  • BBSRC Project No BB/H018123/2 "An iterative pipeline of computational modelling and experimental design for uncovering gene regulatory networks in vertebrates"
  • Erasysbio "SYNERGY: Systems approach to gene regulation biology through nuclear receptors"

gpy's People

Contributors

adamian, adhaka, ajgpitch, alansaul, alessandratosi, alexgrig, beckdaniel, bobturneruk, bwengals, cdguarnizo, ebilionis, ekalosak, esiivola, frb-yousefi, jameshensman, jamesmcm, javdrher, jayanthkoushik, jbect, kolanich, lawrennd, lionfish0, martinbubel, mikecroucher, msbauer, mzwiessele, nfusi, ric70x7, thangbui, zhenwendai


gpy's Issues

Memory blows up when running optimizer

Greetings,

I am trying to use GP_regression on a relatively small dataset (1832 instances, 17 features) but every time I run the optimize function on a model the memory blows up to the point it starts to swap (I am using an Intel i5, 4GB RAM, Ubuntu 11.10 32bit). This happens with all optimizers. The only constraint I am using is the "constrain_positive" on all parameters.

I managed to replicate this issue using this code: https://gist.github.com/beckdaniel/5489270

I tried to track down the point where the memory starts to increase. I believe this is happening in the "_set_params_transformed" method of the "parameterised" class, which is called by both the optimizer objective function and its derivative. If I comment out both calls to "_set_params_transformed" in the "objective_function" and "objective_function_gradients" methods of the "model" class, the memory stops increasing.

I will continue to investigate this, but I believe I should open this issue so that maybe you can give more insight into why this is happening.

checkgrad output is ugly

Integrate the checkgrad output with the model printing.

It's not necessary to print the words 'ratio', 'numerical' etc on each row.

testing module missing from setup.py

Teo had a problem: when he was trying to import GPy, it couldn't find the testing module. It appears this is missing from the list of modules in setup.py. I thought it would discover anything with an __init__.py as a module, but apparently not. I believe this would break a new install on a new machine, so it should probably be fixed in master?

Alan

How to include examples?

Should we include the examples in the module? Then we can do
import GPy
GPy.examples.foo()

This isn't done currently.

ImportError: No module named SGD

When I use GPy-0.2 on my laptop I can't optimize a GP regression model (ImportError: No module named SGD ; arising from get_optimizer line 216). I have checked and it turns out that the .pyc file is not generated. I have tried to generate it manually but it does not help.

Is that a problem of my machine or an issue in GPy?

Thanks for your tips...

Gaussian.py, scaling and offset variables

It looks to me like these are currently called _mean and _std, which is misleading, because they aren't necessarily the _mean and the _std. Can we rename these _scale and _offset?

Need to Discuss Examples Provision

We need to discuss how we provide examples. Importantly, I think they shouldn't be just a brain dump or a test of a new feature. They should be there for end users to understand the code. But we need to decide whether to include them as a module or whatever ...

"Cross terms" for psi2 statistics

Adding together kernel functions (kernparts) brings up some extra interaction terms when computing the psi2 matrix. We're not computing these right now.

Note that they're not needed for a single covariance function combined with white noise, just when you're combining, say, rbf with linear (see the sketch below).
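
A sketch of the missing cross term in the usual notation, taking the per-point definition \psi_2 = \mathbb{E}_{q(x)}[k(x,Z)^\top k(x,Z)] (an illustration, not GPy's exact implementation):

\psi_2^{(k_1 + k_2)}
  = \mathbb{E}_{q(x)}\!\big[(k_1(x,Z) + k_2(x,Z))^\top (k_1(x,Z) + k_2(x,Z))\big]
  = \psi_2^{(1)} + \psi_2^{(2)}
    + \mathbb{E}_{q(x)}\!\big[k_1(x,Z)^\top k_2(x,Z) + k_2(x,Z)^\top k_1(x,Z)\big]

The last expectation is the cross term that is currently not computed. For a white-noise part it vanishes, since white noise is zero except exactly at the inducing inputs, which has measure zero under a continuous q(x); that is why a single covariance function plus white noise does not need it.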

TNC max_iters keyword

Need consistency in the optimizer interface. For example, max_iters can be passed but is ignored by (for example) tnc. At the very least we should throw a warning that it's being ignored (and perhaps translate it to a sensible number of function evaluations).

BGPLVM clang++

BGPLVM clang++ inline code not working/compiling on Mac OS.

Report:

  • Installed newest versions of clang, gcc and scipy.
  • Reinstalled GPy (rm `find GPy/GPy -name '*.pyc' -type f` and reimported GPy)

Nothing has helped so far; is this a known bug with weave?

Extend mdot for diagonal matrices

It would be cool if mdot could handle diagonal matrices well, i.e. computing

diag_A[:, None] * B   # diag_A holds the diagonal entries of A

instead of

np.dot(A, B)

Not sure if it's worth the overhead of checking whether a matrix is diagonal if this case is not hit frequently, though (it is in my code!). A rough sketch follows.
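
A self-contained numpy illustration of the proposed shortcut (purely illustrative, not mdot's actual implementation):

import numpy as np

diag_A = np.random.rand(4)           # diagonal entries of A
A = np.diag(diag_A)                  # the full diagonal matrix
B = np.random.rand(4, 3)

full = np.dot(A, B)                  # builds and multiplies the full matrix
fast = diag_A[:, None] * B           # scales each row of B by the matching diagonal entry
assert np.allclose(full, fast)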

sympykern fails randomly

I have an idea for fixing this:

Weave accepts a bunch of arguments. One of them is the code you'd like to run, the other is "support code", where you can define functions and stuff.

In our sympykern, the covariance function and its gradients are passed as support code.

Weave first hashes the code to see if it's already compiled. If it's only hashing the "code" and not the "support code", there's our bug.

To fix, define the covariance functions in the "code", by concatenating the code and support code.

Oh look Alan is at the top of the assignees list :)

James.

GP model predict

The predict function returns variances that have dimension model.D even if (due to the covariance definition) the output variances are identical in all output dimensions. This will lead to memory issues when GPLVM variances are required for models fitted to very high-dimensional outputs. Can we return (as the MATLAB code does) a num_data*1 vector of variances for this case (see the sketch below)?
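
A tiny numpy illustration of the requested reduction (hypothetical; var_full stands in for the (num_data, D) array predict currently returns, with all columns identical):

import numpy as np

var_full = np.tile(np.random.rand(5, 1), (1, 10))   # (num_data, D), identical columns
var_compact = var_full[:, :1]                       # the proposed (num_data, 1) return value
assert np.allclose(var_full, var_compact)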

Some Confusion in variable names in Likelihoods

It looks to me like the likelihoods are using Y as the variable that comes out of the GP, and data as the data as provided by the user. This clashes with the way we do this in mathematical notation, where Y is the data as provided and F is the intermediate variable that the GP models. I think we need to think about what the right naming is (I've been looking at Gaussian.py, so apologies if this is a special case; although even if it is, we need to make it consistent).

I'd like to see the following. Y is the data as presented by the user and F is the data as modelled by the GP internally. Would there be a problem with this?

Unit Tests for Kernels

When someone completes a new kernel, we need a set of unit tests that ensure all parts of it are functional (gradients, psi2 statistics [if implemented], etc.). The psi2 statistics could be checked approximately by sampling (see the sketch below), and then we could do gradient checks (see the old matlab code for the tests I did there).
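
A rough, self-contained sketch of checking a psi2-style statistic by sampling (pure numpy, unit-variance and unit-lengthscale RBF kernel; purely illustrative, not an existing GPy test):

import numpy as np

def rbf(X, Z):
    # RBF kernel with variance 1 and lengthscale 1
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 2))                    # inducing inputs
mu, var = np.zeros(2), 0.1 * np.ones(2)        # q(x) = N(mu, diag(var))

# Monte Carlo estimate of psi2 = E_q(x)[ k(x, Z)^T k(x, Z) ]
samples = mu + np.sqrt(var) * rng.normal(size=(100000, 2))
K = rbf(samples, Z)                            # (S, M)
psi2_mc = K.T @ K / len(samples)               # (M, M)
# Compare psi2_mc against the kernel's analytic psi2 (up to Monte Carlo error).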

latent plots for GPLVM

GPLVM currently raises NotImplementedError when we call plot_latent.

Nice features, please:

  • passing some labels to plot different classes with different markers.
  • shading the background to represent uncertainty in the projected output.

_log_likelihood_gradients_transformed

Previously, extract_gradients read as if its main role was to combine prior and likelihood gradients. In the new naming scheme this is _log_likelihood_gradients_transformed, which is really a different thing (getting the transformed gradients rather than the real ones). We need to put some thought into how to deal with this, perhaps with two functions (one for combining prior and likelihood gradients and another for doing the transformations?).

build fails (on Travis) due to plot_ARD

@nfusi wrote some lovely code to plot the significance of each input. Unfortunately, in Travis, there's no $DISPLAY set, so pylab things don't work.

The usual fix for no display is to set matplotlib in pdf mode, but that would be annoying for users.

Thoughts?

prod_orthogonal is inefficient

prod_orthogonal repeatedly computes the kernel matrices for each part: once in K(), once in dK_dtheta, etc. A caching scheme would make this much faster (and admittedly more complex).

New GP model

We should integrate EP_GP and GP_regression models into a single one. That way it will be easier to keep them both up to date.

Since the log marginal likelihood for an EP model can be written as the log likelihood of a regression model for a new variable Y* = v_tilde/tau_tilde, with a covariance matrix K* = K + diag(1./tau_tilde), plus a normalization term (see the sketch below), we can use most of the GP_regression code and just add the functions needed to call the EP algorithm.
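
In LaTeX, a sketch of the identity being referred to, in standard EP site-parameter notation (an illustration, not code from GPy):

\log Z_{\mathrm{EP}}
  = \log \mathcal{N}\!\left(\tilde{\mu} \,\middle|\, 0,\; K + \tilde{\Sigma}\right)
    + \text{normalization terms},
\qquad
\tilde{\mu} = \tilde{v}/\tilde{\tau},
\qquad
\tilde{\Sigma} = \operatorname{diag}(1/\tilde{\tau})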

Then we can also implement sparse _GP_regression and sparse_EP_GP into the same model.

For consistency between GP_regression and sparse_GP_regression, and also to make the differences from EP clearer, beta should be explicit in the non-sparse regression rather than being part of the kernel.

I'll open a branch called newGP for this.

objects are unpickleable

This is due to model.optimization_runs, which contains instances of GPy.inference.optimization.optimizer.

We need to find a smart way of dealing with that.

tied and fixed params at kernel level

The computation of gradients does not work when the kernel's parameters are tied or fixed.

import numpy as np
import GPy

K = GPy.kern.rbf(5, ARD=True)
K.tie_param('[01]')
K.constrain_fixed('2')

X = np.random.rand(5,5)
Y = np.ones((5,1))

m = GPy.models.GP_regression(X,Y,K)
m.checkgrad()

move to PHP

I think GPy is now stable enough to consider a move to PHP as the main language in GPy. Yes, it will be nearly impossible to move the entire project to the same language. Parts of model.py will need to be in Objective-C, and probably some parts of the inference package will have to be written in perl 3. But, yeah, pretty much everything else can be done with PHP.
Writing models in a text editor and running them using ipython is a bit of a pain, so I would suggest we move to a web-based enterprise-class form with clickable elements. All the plots of the posterior distribution can then be generated server-side and sent via email to the user who requested them.

I think these changes will significantly reduce the time to market of our modelling work, and will help us to evolve intuitive platforms that drive compelling convergence.

in linalg, we should make use of scipy's C/F ordered choices

scipy provides get_lapack_funcs (and get_blas_funcs). We can use these to automagically pick the correct (f|c)lapack routine: dpotri, dpotrf, etc. At the moment, it's all a bit voodoo.

A quick %timeit makes me think we can gain some performance too.
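
A small sketch of how this could look (illustrative only, not GPy's actual linalg code):

import numpy as np
from scipy.linalg import get_lapack_funcs

A = np.random.rand(5, 5)
A = A @ A.T + 5 * np.eye(5)                    # make it positive definite

# picks dpotrf for float64 input, spotrf for float32, etc.
(potrf,) = get_lapack_funcs(('potrf',), (A,))
L, info = potrf(A, lower=True)
assert info == 0                               # 0 means the factorization succeeded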

PCA initialization in GPLVM.py affects model.likelihood.Y

Initialization of GPLVM via PCA currently affects the actual data matrix (both model.likelihood.Y and model.likelihood.data). I believe this is because the mean is being subtracted from the matrix that is passed in (if it isn't zero-mean in the base case), but this is passed by reference?? I think Python can be pretty sneaky in generating these types of bugs (see the sketch below). I'm guessing that's the problem, but since I'm not 100% sure I haven't edited it.
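
A generic numpy illustration of the suspected aliasing problem (not GPy's actual GPLVM code):

import numpy as np

Y = np.random.rand(10, 3)
Y_ref = Y                          # a model keeping a reference, not a copy
Y_ref -= Y_ref.mean(axis=0)        # in-place centring silently changes the caller's Y

Y2 = np.random.rand(10, 3)
Y2_centred = Y2 - Y2.mean(axis=0)  # non-destructive: new array, Y2 is untouched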

gradcheck by param name

It's a bit of a pain gradchecking individual gradients in models with a lot of parameters (usually an interesting setting in which many models become unstable). We should be able to only gradcheck parameters matching a string.

The interface should be something like m.gradcheck('rbf', verbose=True)
