sheffieldml / pydeepgp

Deep Gaussian Processes in Python

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

pydeepgp's Introduction

Deep GP

The Python Implementation of Deep Gaussian Processes

Currently implemented models are

Deep GPs
Variational Auto-encoded Deep GPs

pydeepgp's Issues

[BUG!!] global name 'linalg_cython' is not defined

I tried the example from:

PyDeepGP/deepgp/testing/model_tests_basic.py

Line 62 in 530a691

 m = deepgp.DeepGP([Y.shape[1],5,X.shape[1]],Y, X=X,kernels=[GPy.kern.RBF(5,ARD=True), GPy.kern.RBF(X.shape[1],ARD=True)], num_inducing=2, back_constraint=False) 

Reproducible code:
import deepgp
import GPy  # needed for the kernels below
import numpy as np
Y=np.random.randn(100,1).astype('float')
X=np.random.randn(100, 100).astype('float')
m = deepgp.DeepGP([Y.shape[1],5,X.shape[1]],Y,X=X,kernels=[GPy.kern.RBF(5,ARD=True), GPy.kern.RBF(X.shape[1],ARD=True)],
num_inducing=2, back_constraint=False)

The above raises the following error:
++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++
In [8]: m = deepgp.DeepGP([Y.shape[1],5,X.shape[1]],Y,X=X,kernels=[GPy.kern.RBF(5,ARD=True), GPy.kern.RBF(X.shape[1],ARD=True)], num_inducing=2, back_constraint=False)
Traceback (most recent call last):

File "", line 2, in
num_inducing=2, back_constraint=False)

File "build/bdist.linux-x86_64/egg/paramz/parameterized.py", line 49, in __call__
self = super(ParametersChangedMeta, self).__call__(*args, **kw)

File "/home/lemma/PyDeepGP/deepgp/models/model.py", line 93, in __init__
self.layers.append(ObservedLayer(nDims[0],nDims[1], Y, X=Xs[i], likelihood=likelihood, num_inducing=num_inducing[i], init=inits[i], kernel=kernels[i] if kernels is not None else None, back_constraint=back_constraint, inference_method=inference_method, mpi_comm=mpi_comm, mpi_root=mpi_root, auto_update=auto_update))

File "build/bdist.linux-x86_64/egg/paramz/parameterized.py", line 54, in __call__
self.initialize_parameter()

File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 331, in initialize_parameter
self.trigger_update()

File "build/bdist.linux-x86_64/egg/paramz/core/updateable.py", line 79, in trigger_update
self._trigger_params_changed(trigger_parent)

File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 128, in _trigger_params_changed
self.notify_observers(None, None if trigger_parent else -np.inf)

File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
[callble(self, which=which) for _, _, callble in self.observers]

File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 498, in _parameters_changed_notification
self.parameters_changed()

File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 55, in parameters_changed
if self.auto_update: self.update_layer()

File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 245, in update_layer
super(Layer,self).update_layer()

File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 58, in update_layer
self._inference_vardtc()

File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 73, in _inference_vardtc
self.posterior, self._log_marginal_likelihood, self.grad_dict = self.inference_method.inference(self.kern, self.X, self.Z, self.likelihood, self.Y, self.Y_metadata, Kuu_sigma=self.Kuu_sigma)

File "/home/lemma/PyDeepGP/deepgp/inference/vardtc.py", line 111, in inference
Kmm = kern.K(Z).copy()

File "/home/lemma/GPy/GPy/kern/src/kernel_slice_operations.py", line 86, in wrap
ret = f(self, s.X, s.X2, *a, **kw)

File "", line 2, in K

File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 283, in g
return cacher(*args, **kw)

File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 179, in __call__
new_output = self.operation(*args, **kw)

File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 113, in K
r = self._scaled_dist(X, X2)

File "", line 2, in _scaled_dist

File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 283, in g
return cacher(*args, **kw)

File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 179, in __call__
new_output = self.operation(*args, **kw)

File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 165, in _scaled_dist
return self._unscaled_dist(X/self.lengthscale, X2)

File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 137, in _unscaled_dist
r2 = -2.*tdot(X) + (Xsq[:,None] + Xsq[None,:])

File "/home/lemma/GPy/GPy/util/linalg.py", line 320, in tdot
return tdot_blas(*args, **kwargs)

File "/home/lemma/GPy/GPy/util/linalg.py", line 316, in tdot_blas
symmetrify(out, upper=True)

File "/home/lemma/GPy/GPy/util/linalg.py", line 362, in symmetrify
_symmetrify_cython(A, upper)

File "/home/lemma/GPy/GPy/util/linalg.py", line 368, in _symmetrify_cython
return linalg_cython.symmetrify(A, upper)

NameError: global name 'linalg_cython' is not defined
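
A NameError like this usually means an optional compiled module was never imported, so the name was never bound. The sketch below is one way to check whether such a module can be located at all. The module path `GPy.util.linalg_cython` is an assumption inferred from the traceback above, and `module_available` is a hypothetical helper, not part of GPy or deepgp.

```python
import importlib.util

def module_available(dotted_name):
    """Return True if the module can be located.

    Parent packages are imported during the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except Exception:
        # A missing or broken parent package ends up here.
        return False

# Assumed module path, inferred from the traceback (GPy/util/linalg.py calls
# linalg_cython.symmetrify); adjust it to match your installation.
print(module_available("GPy.util.linalg_cython"))
```

If this prints False while GPy itself imports, the extension was probably not built for the current interpreter. That reading is a guess and would need to be confirmed against the GPy installation.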

question for the references

Dr. Damianou, to make this code easier to understand, could you point me to the primary references that explain the core ideas behind the package? Is the original Deep Gaussian Processes paper enough, or are there other helpful papers?

Definition of psi0 in vardtc

Hello,
First, thank you for sharing this implementation online.
I have a question about how the statistics are defined in vardtc for uncertain inputs.
The code reads:

        if uncertain_inputs:
            psi0 = kern.psi0(Z, X)
            psi1 = kern.psi1(Z, X)*beta
            psi2 = kern.psi2(Z, X)*beta

Then in the computation of the likelihood:

logL = -(output_dim*(num_data*log_2_pi+logL_R+psi0-np.trace(LmInvPsi2LmInvT))+YRY- bbt)/2.-output_dim*logdet_L/2.
My question: shouldn't it be psi0 = kern.psi0(Z, X)*beta, or else psi0*beta inside logL? As I understand it, psi0 should be scaled by beta in logL.
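
For reference, this is the collapsed variational bound for a sparse GP with uncertain inputs as it is usually written in the Bayesian GP-LVM literature, for one output dimension. The symbols N (number of data points), K_MM (inducing-point covariance) and W are notation introduced here, not names from the code. In this form the ψ0 term does carry β:

```latex
\mathcal{F} = \log\left[
  \frac{\beta^{N/2}\,|K_{MM}|^{1/2}}
       {(2\pi)^{N/2}\,|\beta\Psi_2 + K_{MM}|^{1/2}}
  \exp\left(-\tfrac{1}{2}\, y^{\top} W y\right)\right]
  - \frac{\beta\,\psi_0}{2}
  + \frac{\beta}{2}\,\operatorname{tr}\left(K_{MM}^{-1}\Psi_2\right),
\qquad
W = \beta I - \beta^{2}\,\Psi_1\left(\beta\Psi_2 + K_{MM}\right)^{-1}\Psi_1^{\top}
```

Whether the quoted code applies β to psi0 somewhere outside the two snippets shown cannot be determined from this page.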

HELP

After training, will prediction be slower than with a neural network?

'ObservedMRDLayer' is not defined

As in title, at this point

PyDeepGP/deepgp/models/model.py

Line 85 in aefff21

 self.layers.append(ObservedMRDLayer(nDims[0],nDims[1], Y, X=Xs[i], likelihood=likelihood, num_inducing=num_inducing[i], init=inits[i], kernel=kernels[i] if kernels is not None else None, back_constraint=back_constraint, mpi_comm=mpi_comm, mpi_root=mpi_root)) 

is raised

NameError: global name 'ObservedMRDLayer' is not defined

Trying to build deepgp classification model

Hello,

Is it possible to build classification models using deepgp ?

Thanks.

Regarding Uncertain Inputs and Setting Inducing Variables For ObservedLayer

Note: The context here is supervised regression
Without changing the source code, can the user turn off uncertain_input manually? Or is it a feature that is intentionally hidden from the user?

Also, regarding the inducing variables Z: how would I set them for the first ObservedLayer and fix them, so that the subsequent layers follow suit and update themselves with respect to the first layer?

Receive "overflow encountered in expm1" when optimizing the model

When I try to optimize my model, the warning "overflow encountered in expm1" appears and all predicted outputs become a constant. Sometimes the warning disappears when I change the training outputs while keeping the same inputs, and the predictions are not constant when there is no such warning.

(PS: As a rookie in machine learning, I find it really confusing that a GP performed very well in modeling this data, but DeepGP did not give such a good result. I would really appreciate your reply. Thank you!)
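
The warning text comes from NumPy, which emits it when `np.expm1` receives an argument large enough to overflow float64 (a little above 709). The sketch below only reproduces that behaviour; `expm1_with_flag` is a hypothetical helper. Which quantity inside the model grows that large is not something the report shows.

```python
import warnings
import numpy as np

def expm1_with_flag(x):
    """Return (np.expm1(x), overflowed) for a scalar or array x."""
    with warnings.catch_warnings(record=True) as caught, np.errstate(over="warn"):
        warnings.simplefilter("always")
        value = np.expm1(np.asarray(x, dtype=np.float64))
    # NumPy reports float64 overflow as a RuntimeWarning with this message.
    overflowed = any("overflow encountered in expm1" in str(w.message) for w in caught)
    return value, overflowed

print(expm1_with_flag(1.0))     # small argument: finite result, no warning
print(expm1_with_flag(1000.0))  # large argument: overflows to inf
```

A constant prediction together with this warning could mean that a parameter has run off to an extreme value during optimization. That is a hypothesis to check by printing the model parameters, not a confirmed cause.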

Some clarification on supervised regression

The demo in 'example_supervised_learning.py' uses a single hidden layer. Say X has shape (100, 2) and y has shape (100, 1). If I were to use two layers, what would Q1 and Q2 (the numbers of latent dimensions) be? Would they all be X.shape[1], meaning that the output of the first layer has the same dimensionality as X?
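
Judging only from the calls quoted on this page, for example `[Y.shape[1],5,X.shape[1]]` paired with `RBF(5)` and `RBF(X.shape[1])`, the dimension list seems to run from the observed outputs down to the inputs, with one kernel per layer. Under that assumption the hidden widths are a free choice and do not have to equal X.shape[1]. The helpers below are hypothetical and only do the bookkeeping. They do not call deepgp.

```python
def deepgp_dims(output_dim, hidden_dims, input_dim):
    """Dimension list ordered from observed outputs down to inputs."""
    return [output_dim] + list(hidden_dims) + [input_dim]

def kernel_input_dims(dims):
    """One kernel per layer; kernel i takes dims[i + 1] inputs (assumed convention)."""
    return dims[1:]

# X has shape (100, 2), y has shape (100, 1); two hidden layers, widths chosen freely.
dims = deepgp_dims(output_dim=1, hidden_dims=[3, 2], input_dim=2)
print(dims)
print(kernel_input_dims(dims))
```

The widths 3 and 2 are arbitrary illustrative values. The assumed convention also ignores options such as repeatX, which another report on this page uses with larger kernel input dimensions.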

Adding Multi-output (LVMOGP)

Is it possible to add LVMOGP to deep GP (it's already in GPy)?
The reason MRD wouldn't do it for me in the fully independent MRD mode (FI-MRD) is that I need to model unknown conditions (i.e., domain generalization).
Thx

Question about Number of latent dimensions

I'm a little confused about the "number of latent dimensions" set in the kernel functions, as in the example code below. Does it mean that the output of the hidden layer is Q-dimensional, or that the number of hidden units is Q? Or are these two the same? Thank you very much!

Q = 5
kern1 = GPy.kern.RBF(Q,ARD=True) + GPy.kern.Bias(Q)
kern2 = GPy.kern.RBF(Q,ARD=False) + GPy.kern.Bias(X_tr.shape[1])
num_inducing = 40
back_constraint = False

m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],Y_tr, X=X_tr,kernels=[kern1, kern2], num_inducing=num_inducing, back_constraint=back_constraint)

The dimension of input data or output data

Two questions:
How do I change the input data's dimensions for my use case?
The output is a tuple; which element is the prediction?

In tutorial.ipynb, I tried to change the shape of the input data X_tr from (100L,55L) to (1L,55L), with Y_tr changed accordingly, going from:

m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
                  Y_tr, X_tr=X_tr,kernels=[kern1, kern2], 
                  num_inducing=1, back_constraint=back_constraint)

To:

m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
                  Y_tr[0], X_tr=X_tr[0],kernels=[kern1, kern2], 
                  num_inducing=1, back_constraint=back_constraint)

This gives the error [IndexError: tuple index out of range].

Then I changed X_tr from (1L,55L) to (2L,55L), with Y_tr changed accordingly:

m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
                  Y_tr[0:2], X_tr=X_tr[0:2],kernels=[kern1, kern2], 
                  num_inducing=1, back_constraint=back_constraint)

This gives another error: [LinAlgError: not positive definite, even with jitter.]
It seems the shapes of X_tr and Y_tr are still invalid.
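
An error of this kind typically comes from a Cholesky factorization that keeps failing even after a small value ("jitter") has been added to the diagonal of a covariance matrix. The sketch below shows the general retry pattern with NumPy only. `cholesky_with_jitter` and its constants are assumptions made for illustration, not the routine used by GPy or deepgp.

```python
import numpy as np

def cholesky_with_jitter(K, max_tries=5):
    """Cholesky factor of K, adding growing diagonal jitter if needed."""
    K = np.asarray(K, dtype=float)
    base = 1e-6 * np.mean(np.diag(K))
    for attempt in range(max_tries + 1):
        # First attempt uses no jitter; later attempts use base * 10**(attempt - 1).
        jitter = 0.0 if attempt == 0 else base * 10.0 ** (attempt - 1)
        try:
            return np.linalg.cholesky(K + jitter * np.eye(K.shape[0])), jitter
        except np.linalg.LinAlgError:
            continue
    raise np.linalg.LinAlgError("not positive definite, even with jitter.")

# Two identical inputs give a singular kernel matrix that a little jitter can repair.
K_duplicate = np.array([[1.0, 1.0], [1.0, 1.0]])
L, used = cholesky_with_jitter(K_duplicate)
print(used)

# An indefinite matrix cannot be repaired by small jitter.
K_indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])
try:
    cholesky_with_jitter(K_indefinite)
except np.linalg.LinAlgError as err:
    print(err)
```

With only two data points, a badly conditioned matrix somewhere in the model is plausible, but which matrix failed cannot be told from the report.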

Finally, I restored the default input shape and ran the prediction code, but the output is a tuple containing two arrays.

This confuses me: which one is the prediction?

I am actually using a DGP to forecast a time series, but I can't find a time-series example in tutorial.ipynb. Is there an example or other material that would help me use a DGP for this case and explain how to tune its parameters?

THANKS !

Error while using repeatX

Hi,

I am trying to use repeatX feature in the deep gp and getting the following error.

AttributeError: 'NoneType' object has no attribute 'shape'

I have defined my layers and kernels in the following way

layers = [y_train.shape[1],1,1,X_PCA.shape[1]]
kernels = [GPy.kern.RBF(1+X_PCA.shape[1]), GPy.kern.RBF(1+X_PCA.shape[1], ARD=False), GPy.kern.RBF(X_PCA.shape[1], ARD=True)+GPy.kern.Bias(X_PCA.shape[1])]

My understanding is that the repeatX feature allows the inputs (features) to be included at chosen hidden layers, similar to a ResNet. Please correct me if this assumption is wrong.

Help -- we can't get the DeepGP to replicate your experiments using simpler data

Hey Dr. Damianou,

We tried with only minor modifications to the supervised learning example you posted to get the DeepGP working on a toy example.

import numpy as np
import GPy
from pylab import *
from sys import path
import matplotlib.pyplot as plt
np.random.seed(42)

import deepgp

# Utility to load sample data. It can be installed with pip. Otherwise just load some other data.
import pods

#Data Prep#
# Load some mocap data.
#data = pods.datasets.cmu_mocap_35_walk_jog()

Ntr = 100
Nts = 500

# All data represented in Y_all, which is the angles of the movement of the subject
#Y_all = data['Y']
X_tr = np.random.uniform(0,10, (Ntr,1))
Y_tr = np.sin(X_tr)

X_ts = np.random.uniform(0,10, (Nts,1))

#Model Construction#

# Number of latent dimensions (single hidden layer, since the top layer is observed)
Q = 1
# Define what kernels to use per layer
kern1 = GPy.kern.RBF(Q,ARD=False)
kern2 = GPy.kern.RBF(X_tr.shape[1],ARD=False)
# Number of inducing points to use
num_inducing = 3
# Whether to use back-constraint for variational posterior
back_constraint = False
# Dimensions of the MLP back-constraint if set to true
encoder_dims=[[300],[150]]

m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],Y_tr, X_tr=X_tr,kernels=[kern1, kern2], num_inducing=num_inducing, back_constraint=back_constraint)



# Optimization #
# Make sure initial noise variance gives a reasonable signal to noise ratio.
# Fix to that value for a few iterations to avoid early local minima
for i in range(len(m.layers)):
    output_var = m.layers[i].Y.var() if i==0 else m.layers[i].Y.mean.var()
    m.layers[i].Gaussian_noise.variance = output_var*0.01
    m.layers[i].Gaussian_noise.variance.fix()

m.optimize(max_iters=800, messages=True)
# Unfix noise variance now that we have initialized the model
for i in range(len(m.layers)):
    m.layers[i].Gaussian_noise.variance.unfix()

m.optimize(max_iters=1500, messages=True)

#Inspection #
# Compare with GP
m_GP = GPy.models.SparseGPRegression(X=X_tr, Y=Y_tr, kernel=GPy.kern.RBF(X_tr.shape[1])+GPy.kern.Bias(X_tr.shape[1]), num_inducing=num_inducing)
m_GP.Gaussian_noise.variance = m_GP.Y.var()*0.01
m_GP.Gaussian_noise.variance.fix()
m_GP.optimize(max_iters=100, messages=True)
m_GP.Gaussian_noise.variance.unfix()
m_GP.optimize(max_iters=400, messages=True)

def rmse(predictions, targets):
    return np.sqrt(((predictions.flatten() - targets.flatten()) ** 2).mean())

Y_pred = m.predict(X_ts)[0]
Y_pred_s = m.predict_withSamples(X_ts, nSamples=500)[0]
Y_pred_GP = m_GP.predict(X_ts)[0]

plt.plot(X_tr, Y_tr, 'go')
plt.plot(X_ts, Y_pred, 'ro')
plt.show()

The output is... somewhat nonsensical. Can you please clue us in to what is happening?

(green is training, red is output test data.)
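
One detail worth checking, offered as a guess and not as a diagnosis: other snippets on this page pass the inputs as `X=X`, while the script above passes `X_tr=X_tr`. If the constructor accepts arbitrary extra keywords (an assumption that is not verified here), a misnamed keyword would be silently ignored and the inputs never used. The sketch below shows how such a mismatch can be detected. `build_model` is a hypothetical stand-in and not the real deepgp.DeepGP signature.

```python
import inspect

def unmatched_keywords(func, **kwargs):
    """Names in kwargs that are not declared parameters of func."""
    declared = inspect.signature(func).parameters
    return sorted(name for name in kwargs if name not in declared)

# Hypothetical stand-in for a constructor that swallows extra keywords;
# it is NOT the real deepgp.DeepGP signature.
def build_model(dims, Y, X=None, **kwargs):
    return "supervised" if X is not None else "unsupervised"

print(unmatched_keywords(build_model, X_tr=[[0.0]]))
print(build_model([1, 1, 1], [[0.0]], X_tr=[[0.0]]))
```

Running the same check against the actual constructor would show whether `X_tr` is a declared parameter.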

Some questions for classification

I have run this code on a classification task, and the results do not seem good. There may be something wrong in how I use it for classification. Could you give me a classification example to refer to?

Hello, it is really confusing that a GP performed very well in modeling this data, but DeepGP did not give such a good result. I'm looking forward to your reply, thanks!

why deepgp not running?

Basic Tutorial

Would be great to have a basic tutorial outlining the general workflow in using this library together with some comments on what functionality is available to the user.

Example of Deep GP with early stopping

Hi,

I am interested in a model in which only the last layer is a GP and the previous layers are regular CNN layers for feature extraction. I would also like to train this network end to end with early stopping. Is there an example of this, or can you guide me through writing one?

Kernel dimensionality bug in examples.

PyDeepGP/examples/example_supervised_learning.py

Line 45 in dd4e7b4

kern2 = GPy.kern.RBF(Q,ARD=False) + GPy.kern.Bias(X_tr.shape[1])

Fixing this bug by using the correct input dimension makes the DeepGP model almost as good as the previous GP regression:

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i     f              |g|        
    00s16  003   1.931458e+04   4.393298e+06 
    02s17  066   1.454812e+03   1.019922e+03 
    07s25  205   1.422389e+03   3.225139e+02 
    12s28  336   1.414768e+03   2.612186e+01 
    29s27  802   1.411086e+03   3.311174e+00 
Runtime:     29s27
Optimization status: Maximum number of f evaluations reached

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i      f              |g|        
    00s17  0004   1.359542e+03   2.307453e+04 
    03s26  0085   1.318150e+03   3.130295e+02 
    12s51  0302   1.315215e+03   1.869511e+01 
    33s93  0893   1.313376e+03   8.171695e+00 
    54s77  1502   1.312655e+03   2.320556e+00 
Runtime:     54s77
Optimization status: Maximum number of f evaluations reached

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i     f              |g|        
    00s07  004   2.550754e+03   4.948787e+03 
    00s19  011   2.342103e+03   4.584468e+00 
    00s42  029   2.318162e+03   1.822445e-07 
Runtime:     00s42
Optimization status: Converged

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i     f              |g|        
    00s16  008   1.774224e+03   2.085415e+00 
    00s17  009   1.773712e+03   8.342545e-01 
    00s50  029   1.772853e+03   1.115593e-09 
Runtime:     00s50
Optimization status: Converged

# RMSE DGP               : 3.029233376952055
# RMSE DGP (with samples): 3.0334975870275342
# RMSE sparse GP         : 3.033962956251842

Please let me know if I am missing something.

Some questions about the gradient

I have some questions about the gradient of the objective function with respect to the parameters of all layers. Could you provide the gradient formulas for these parameters?

'ObservedMRDLayer' predict and plot_latent

It seems that predict method is missing from the ObservedMRDLayer object.
When m.predict() is called and the loop reaches the MRD layer it results in

AttributeError: 'ObservedMRDLayer' object has no attribute 'predict'

When trying to plot the latent space, instead, it tries to import a missing module

PyDeepGP/deepgp/layers/mrd.py

Line 168 in 4ec944f

from GPy.plotting.matplot_dep import dim_reduction_plots

😕

Attribute 'layer_lower' of class HiddenLayer used before initialization

Hello,

It seems using more than one hidden layer raises the following error on line 198 of layers.py:
AttributeError: 'HiddenLayer' object has no attribute 'layer_lower'

This is caused by line 315 of layers.py, where the property self.Y is accessed, which requires self.layer_lower to be initialized beforehand. A quick and dirty fix is to add the following code (inspired by the Y property of the class Layer) at the beginning of the HiddenLayer constructor:

        if hasattr(layer_lower, 'repeatX') and layer_lower.repeatX:
            Y = layer_lower.X[:,:layer_lower.repeatXsplit]
        else:
            Y = layer_lower.X

Supervised Learning Kernel Bias Questions

Hi Andreas, I have a few questions regarding the kernels and the results used in the tutorials.

a) To begin with, when I run the unsupervised oil data experiment I get the following ARD weights, which I assume relate to the added bias kernel.

b) Secondly, in the supervised learning tutorial, you initialise kernel 2 (the top layer kernel) in the following way:

kern2 = GPy.kern.RBF(Q,ARD=False) + GPy.kern.Bias(X_tr.shape[1])

What would be the reason for not setting these two kernels with the same dimension as done in the unsupervised learning case? ie:
kern2 = GPy.kern.RBF(X_tr.shape[1],ARD=False) + GPy.kern.Bias(X_tr.shape[1])

c) Regarding the example given in the supervised learning case, can I confirm that this is how multi-task supervised learning is implemented in this framework?

d) Finally, regarding section 4.2 (Modeling Human Motion) in your paper Deep Gaussian Processes, is there an implementation of that model available? It would be useful for my current research.

Thanks a lot,
Pavlos

RuntimeError: maximum recursion depth exceeded

When creating a model that includes an ObservedMRDLayer and back_constraint enabled as mlp, it seems to get stuck in a recursive loop

File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 125, in __setslice__
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 119, in __setitem__
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 510, in _pass_through_notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 510, in _pass_through_notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 508, in _parameters_changed_notification
File "build/bdist.linux-x86_64/egg/deepgp/layers/mrd.py", line 193, in parameters_changed
File "build/bdist.linux-x86_64/egg/deepgp/layers/mrd.py", line 143, in _aggregate_qX
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 125, in __setslice__
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 118, in __setitem__
File "build/bdist.linux-x86_64/egg/paramz/param.py", line 158, in __getitem__
RuntimeError: maximum recursion depth exceeded in __instancecheck__

Full log here

PredLayer and BinaryPredLayer

I can not find the implementation for BinaryPredLayer and PredLayer.
Could you help me find these two?
