
gpy's People

Contributors

adamian, adhaka, ajgpitch, alansaul, alessandratosi, alexgrig, beckdaniel, bobturneruk, bwengals, cdguarnizo, ebilionis, ekalosak, esiivola, frb-yousefi, jameshensman, jamesmcm, javdrher, jayanthkoushik, jbect, kolanich, lawrennd, lionfish0, martinbubel, mikecroucher, msbauer, mzwiessele, nfusi, ric70x7, thangbui, zhenwendai


gpy's Issues

objects are unpickleable

This is due to model.optimization_runs, which contains instances of GPy.inference.optimization.optimizer.

We need to find a smart way of dealing with that.
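
One possible workaround, sketched here with a toy stand-in for the model class (the `__getstate__` hook and the lambda placeholder are illustrative assumptions, not GPy's actual API), is to drop the optimizer records from the pickled state:

```python
import pickle

class Model:
    """Toy stand-in for a GPy model whose optimization_runs holds unpicklable objects."""
    def __init__(self):
        self.params = [1.0, 2.0]
        self.optimization_runs = [lambda: None]  # lambdas are unpicklable, like the optimizer instances

    def __getstate__(self):
        state = self.__dict__.copy()
        state['optimization_runs'] = []  # drop the offending instances before pickling
        return state

m = Model()
m2 = pickle.loads(pickle.dumps(m))  # round-trips; the history is simply lost
```

The obvious cost is that the optimization history does not survive a pickle round-trip, so a smarter fix would make the optimizer instances themselves picklable.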

build fails (on Travis) due to plot_ARD

@nfusi wrote some lovely code to plot the significance of each input. Unfortunately, on Travis there's no $DISPLAY set, so pylab-based plotting doesn't work.

The usual fix for a missing display is to switch matplotlib to a non-interactive backend (e.g. pdf), but forcing that globally would be annoying for users.

Thoughts?
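
One option, sketched below, is to force the non-interactive Agg backend in the test suite only (the bar-chart data here is made up, standing in for a plot_ARD-style relevance plot):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: renders off-screen, no $DISPLAY required
import matplotlib.pyplot as plt

# illustrative stand-in for plot_ARD: a bar per input dimension
fig, ax = plt.subplots()
ax.bar(range(3), [0.8, 0.1, 0.1])
fig.savefig('ard.png')  # succeeds on Travis because nothing is shown on screen
```

Because the backend is selected in the test harness rather than in GPy itself, interactive users are untouched.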

gradcheck by param name

It's a bit of a pain gradchecking individual gradients in models with a lot of parameters (usually an interesting setting, in which many models become unstable). We should be able to gradcheck only the parameters matching a string.

The interface should be something like m.gradcheck('rbf', verbose=True).
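
A minimal sketch of the name-matching part (the parameter names below are made up; the real implementation would hook into the model's flattened parameter-name list):

```python
import re

def matching_indices(names, pattern):
    """Indices of parameter names matching a regex, to restrict a gradient check to."""
    return [i for i, name in enumerate(names) if re.search(pattern, name)]

# hypothetical flattened parameter names of a model
names = ['rbf_variance', 'rbf_lengthscale', 'white_variance']
idx = matching_indices(names, 'rbf')  # gradcheck only these entries
```

The numerical/analytic comparison then loops over `idx` instead of every parameter.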

prod_orthogonal is inefficient

prod_orthogonal repeatedly computes the kernel matrices for each part: once in K(), once in dK_dtheta(), etc. A caching scheme would make this much faster (and admittedly more complex).
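
A rough sketch of the caching idea, using a counting stand-in for a kernpart (the class names and cache key are assumptions; a real scheme would also need to invalidate the cache when kernel parameters change):

```python
import numpy as np

class CountingPart:
    """Stand-in kernel part that counts how often K() is actually recomputed."""
    def __init__(self):
        self.calls = 0

    def K(self, X):
        self.calls += 1
        return np.dot(X, X.T)  # placeholder covariance

class CachedPart:
    """Wrap a part so repeated K(X) calls with the same X reuse the stored matrix."""
    def __init__(self, part):
        self.part = part
        self._key = None
        self._K = None

    def K(self, X):
        key = (id(X), X.shape)  # crude key; real code must also track parameter state
        if key != self._key:
            self._K = self.part.K(X)
            self._key = key
        return self._K
```

With this wrapper, K() and dK_dtheta() can both ask for the kernel matrix and only the first request pays for the computation.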

_log_likelihood_gradients_transformed

Previously, extract_gradients read as though its main role was to combine prior and likelihood gradients. In the new naming scheme this is _log_likelihood_gradients_transformed, which is really a different thing (getting the transformed gradients rather than the real ones). We need to put some thought into how to deal with this; perhaps two functions (one for combining prior and likelihood gradients, another for applying the transformations)?

Memory blows up when running optimizer

Greetings,

I am trying to use GP_regression on a relatively small dataset (1832 instances, 17 features) but every time I run the optimize function on a model the memory blows up to the point it starts to swap (I am using an Intel i5, 4GB RAM, Ubuntu 11.10 32bit). This happens with all optimizers. The only constraint I am using is the "constrain_positive" on all parameters.

I managed to replicate this issue using this code: https://gist.github.com/beckdaniel/5489270

I tried to track down the point where the memory starts to increase. I believe this is happening in the "_set_params_transformed" method, in the "parameterised" class, which is called by both the optimizer objective function and its derivative. If I comment out both calls to "_set_params_transformed" in the "objective_function" and "objective_function_gradients" methods of the "model" class, the memory stops increasing.

I will continue to investigate this but I believe I should open this issue so maybe you could give more insight on why this is happening.

in linalg, we should make use of scipy's C/F ordered choices

SciPy provides get_lapack_funcs (and get_blas_funcs). We can use these to automagically pick the correct (f|c)lapack routine: dpotri, dpotrf, etc. At the moment, it's all a bit voodoo.

A quick %timeit makes me think we can gain some performance too.
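
A sketch of what this looks like for the Cholesky-based inverse (potrf then potri); the routine names are SciPy's, though the exact wrapper signatures should be double-checked against the installed version:

```python
import numpy as np
from scipy.linalg import get_lapack_funcs

A = np.asfortranarray(np.eye(3) * 2.0)  # SPD matrix, Fortran-ordered for the f-routines
potrf, potri = get_lapack_funcs(('potrf', 'potri'), (A,))  # picks d/s/z/c variant from dtype

L, info = potrf(A, lower=True)         # Cholesky factor, info == 0 on success
inv_tri, info = potri(L, lower=True)   # inverse of A, only the lower triangle is filled
A_inv = np.tril(inv_tri) + np.tril(inv_tri, -1).T  # symmetrise the result
```

Because get_lapack_funcs dispatches on the array's dtype and ordering, the same code path handles float32/float64 and C/F-ordered inputs without hand-written branching.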

PCA initialization in GPLVM.py affects model.likelihood.Y

Initialization of GPLVM via PCA currently modifies the actual data matrix (both model.likelihood.Y and model.likelihood.data). I believe this is because the mean is subtracted in place from the matrix that is passed in (when it isn't zero-mean to begin with), and the matrix is passed by reference. Python can be pretty sneaky in generating these types of bugs. I'm guessing that's the problem, but since I'm not 100% sure I haven't edited it.
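
The suspected mechanism, sketched in isolation (the `pca_init` function here is a made-up stand-in, not GPy's code): NumPy arrays are passed by reference, so an in-place mean subtraction inside the initialiser mutates the caller's data.

```python
import numpy as np

def pca_init(X):
    X -= X.mean(axis=0)  # in-place subtraction: mutates the caller's array
    return X

Y = np.arange(6, dtype=float).reshape(3, 2)
pca_init(Y)
# Y is now zero-mean: the original data matrix has been changed.
# The fix is to copy first: X = X - X.mean(axis=0) allocates a new array.
```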

Extend mdot for diagonal matrices

Would be cool if mdot could handle diagonal matrices well, i.e. doing:

np.diagonal(A)[:, None] * B

instead of

np.dot(A, B)

Not sure if it's worth the overhead of checking whether a matrix is diagonal if that case is not hit frequently (it is in my code!).
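
For reference, the broadcasting form agrees with the dense product while avoiding both the dense matrix and the O(n^2 m) multiply:

```python
import numpy as np

diag_A = np.random.rand(100)        # diagonal entries of a diagonal matrix A
B = np.random.rand(100, 5)

dense = np.dot(np.diag(diag_A), B)  # materialises the full 100x100 matrix
fast = diag_A[:, None] * B          # broadcasting: scales each row of B directly
assert np.allclose(dense, fast)
```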

move to PHP

I think GPy is now stable enough to consider a move to PHP as the main language in GPy. Yes, it will be nearly impossible to move the entire project to the same language. Parts of model.py will need to be in Objective-C, and probably some parts of the inference package will have to be written in perl 3. But, yeah, pretty much everything else can be done with PHP.
Writing models in a text editor and running them using ipython is a bit of a pain, so I would suggest we move to a web-based enterprise-class form with clickable elements. All the plots of the posterior distribution can then be generated server-side and sent via email to the user who requested them.

I think these changes will significantly reduce the time to market of our modelling work, and will help us to evolve intuitive platforms that drive compelling convergence.

Unit Tests for Kernels

When someone completes a new kernel we need a set of unit tests that ensures all parts of it are functional (gradients, psi2 statistics [if implemented] etc.). The psi2 statistics could be checked approximately by sampling, then we could do gradient checks (see the old matlab code for the tests I did there).
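
The core of such a test could be a central finite-difference check against the analytic gradients; a generic sketch follows (the hookup to an actual kernel's parameters is left out):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

# example: f(x) = sum(x^2) has gradient 2x
x = np.array([1.0, -2.0, 0.5])
assert np.allclose(numeric_grad(lambda v: (v ** 2).sum(), x), 2 * x)
```

A kernel unit test would set f to the (summed) kernel matrix as a function of the hyperparameters and compare against dK_dtheta.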

BGPLVM clang++

BGPLVM clang++ inline code not working/compiling on Mac OS.

Report:

  • Installed newest versions of clang, gcc and scipy.
  • Reinstalled GPy (rm `find GPy/GPy -name '*.pyc' -type f` and reimported GPy)

Nothing has helped so far. Is this a known bug with weave?

How to include examples?

Should we include the examples in the module? Then we can do
import GPy
GPy.examples.foo()

This isn't done currently.

Gaussian.py, scaling and offset variables

It looks to me like these are currently called _mean and _std, which is misleading, because they aren't necessarily the _mean and the _std. Can we rename these _scale and _offset?

latent plots for GPLVM

GPLVM currently raises NotImplementedError when we call plot_latent.

Nice features, please:

  • passing some labels to plot different classes with different markers.
  • shade the background to represent uncertainty in the projected output

ImportError: No module named SGD

When I use GPy-0.2 on my laptop I can't optimize a GP regression model (ImportError: No module named SGD; arising from get_optimizer, line 216). I have checked and it turns out that the .pyc file is not generated. I have tried to generate it manually but it does not help.

Is that a problem of my machine or an issue in GPy?

Thanks for your tips...

GP model predict

The predict function returns variances that have dimension model.D even if (due to covariance definition) the output variances are identical in all output dimensions. This will lead to memory issues when GPLVM variances are required for models fitted to very high dimensional outputs. Can we return (as the MATLAB code does) a num_data*1 vector of variances for this case?

"Cross terms" for psi2 statistics

Adding together kernel functions (kernparts) brings up some extra interaction terms when computing the psi2 matrix. We're not computing these right now.

Note that they're not needed for a single covariance function combined with white noise, just when you're combining say rbf with linear.

tied and fixed params at kernel level

The computation of gradients does not work when the kernel's parameters are tied or fixed.

import numpy as np
import GPy

K = GPy.kern.rbf(5, ARD=True)
K.tie_param('[01]')
K.constrain_fixed('2')

X = np.random.rand(5,5)
Y = np.ones((5,1))

m = GPy.models.GP_regression(X,Y,K)
m.checkgrad()

testing module missing from setup.py

Teo had a problem where, when he tried to import GPy, it couldn't find the testing module; it appears this is missing from the list of modules in setup.py. I thought anything with an __init__.py would be discovered as a module, but apparently not. I believe this would break a fresh install on a new machine, so it should probably be fixed in master?

Alan

checkgrad output is ugly

Integrate the checkgrad output with the model printing.

It's not necessary to print the words 'ratio', 'numerical' etc on each row.

sympykern fails randomly

I have an idea for fixing this:

Weave accepts a bunch of arguments. One of them is the code you'd like to run, the other is "support code", where you can define functions and stuff.

In our sympykern, the covariance function and its gradients are passed as support code.

Weave first hashes the code to see if it's already compiled. If it's only hashing the "code" and not the "support code", there's our bug.

To fix, define the covariance functions in the "code", by concatenating the code and support code.

Oh look Alan is at the top of the assignees list :)

James.

Need to Discuss Examples Provision

We need to discuss how we provide examples. Importantly, I think they shouldn't be just a brain dump or a test of a new feature. They should be there for end users to understand the code. But we need to decide whether to include them as a module or whatever ...

TNC max_iters keyword

Need consistency in the optimizer interface. For example, max_iters can be passed but is ignored by (for example) tnc. At the least we should throw a warning that it's being ignored (and perhaps translate it to a sensible number of function evaluations).
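
A sketch of what the translation could look like (the helper name and the tnc/maxfun mapping are illustrative assumptions, not existing GPy code):

```python
import warnings

def normalize_optimizer_args(optimizer, max_iters=None, **kwargs):
    """Hypothetical helper: map a common max_iters keyword onto each backend's own limit."""
    if max_iters is not None:
        if optimizer == 'tnc':
            warnings.warn("tnc has no max_iters; translating to maxfun (function evaluations)")
            kwargs['maxfun'] = max_iters
        else:
            kwargs['max_iters'] = max_iters
    return kwargs
```

The point is that the caller always writes max_iters and either gets a sensible translation or an explicit warning, never silent ignoring.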

New GP model

We should integrate EP_GP and GP_regression models into a single one. That way it will be easier to keep them both up to date.

Since the log marginal likelihood for an EP model can be written as the log likelihood of a regression model for a new variable Y* = v_tilde/tau_tilde, with a covariance matrix K* = K + diag(1./tau_tilde) plus a normalization term, we can use most of the GP_regression code and just add other functions to call the EP algorithm.
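
In code, the reduction described above is just the following (the site parameters tau_tilde and v_tilde are assumed to be produced by the EP loop; the function name is illustrative):

```python
import numpy as np

def ep_to_regression(K, tau_tilde, v_tilde):
    """Effective regression targets and covariance from EP site parameters."""
    Y_star = v_tilde / tau_tilde                 # pseudo-observations
    K_star = K + np.diag(1.0 / tau_tilde)        # kernel plus per-site noise
    return Y_star, K_star

# toy example with hand-picked site parameters
K = np.eye(2)
tau_tilde = np.array([2.0, 4.0])
v_tilde = np.array([1.0, 2.0])
Y_star, K_star = ep_to_regression(K, tau_tilde, v_tilde)
```

After this transformation the regression machinery (Cholesky of K_star, solves against Y_star) applies unchanged, with only the normalization term handled separately.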

Then we can also implement sparse_GP_regression and sparse_EP_GP in the same model.

For consistency between GP_regression and sparse_GP_regression, and also to make the differences with EP clearer, beta should be explicit in the non-sparse regression rather than part of the kernel.

I'll open a branch called newGP for this.

Some Confusion in variable names in Likelihoods

It looks to me like the likelihoods are using Y as the variable that comes out of the GP, and data as the data in the form the user provided it. This clashes with our mathematical notation, where Y is the data as provided and F is the intermediate variable that the GP models. I think we need to think about what the right naming is. (I've been looking at Gaussian.py, so apologies if this is a special case; even if it is, we need to make it consistent.)

I'd like to see the following. Y is the data as presented by the user and F is the data as modelled by the GP internally. Would there be a problem with this?
