The Python Implementation of Deep Gaussian Processes
Currently implemented models are:
- Deep GPs
- Variational Auto-encoded Deep GPs
Deep Gaussian Processes in Python
License: BSD 3-Clause "New" or "Revised" License
I tried the example from:
Reproducible code:
import deepgp
import GPy
import numpy as np
Y=np.random.randn(100,1).astype('float')
X=np.random.randn(100, 100).astype('float')
m = deepgp.DeepGP([Y.shape[1],5,X.shape[1]],Y,X=X,kernels=[GPy.kern.RBF(5,ARD=True), GPy.kern.RBF(X.shape[1],ARD=True)],
num_inducing=2, back_constraint=False)
The above returns the following error:
++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++
In [8]: m = deepgp.DeepGP([Y.shape[1],5,X.shape[1]],Y,X=X,kernels=[GPy.kern.RBF(5,ARD=True), GPy.kern.RBF(X.shape[1],ARD=True)], num_inducing=2, back_constraint=False)
Traceback (most recent call last):
File "", line 2, in
num_inducing=2, back_constraint=False)
File "build/bdist.linux-x86_64/egg/paramz/parameterized.py", line 49, in __call__
self = super(ParametersChangedMeta, self).__call__(*args, **kw)
File "/home/lemma/PyDeepGP/deepgp/models/model.py", line 93, in __init__
self.layers.append(ObservedLayer(nDims[0],nDims[1], Y, X=Xs[i], likelihood=likelihood, num_inducing=num_inducing[i], init=inits[i], kernel=kernels[i] if kernels is not None else None, back_constraint=back_constraint, inference_method=inference_method, mpi_comm=mpi_comm, mpi_root=mpi_root, auto_update=auto_update))
File "build/bdist.linux-x86_64/egg/paramz/parameterized.py", line 54, in __call__
self.initialize_parameter()
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 331, in initialize_parameter
self.trigger_update()
File "build/bdist.linux-x86_64/egg/paramz/core/updateable.py", line 79, in trigger_update
self._trigger_params_changed(trigger_parent)
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 128, in _trigger_params_changed
self.notify_observers(None, None if trigger_parent else -np.inf)
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
[callble(self, which=which) for _, _, callble in self.observers]
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 498, in _parameters_changed_notification
self.parameters_changed()
File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 55, in parameters_changed
if self.auto_update: self.update_layer()
File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 245, in update_layer
super(Layer,self).update_layer()
File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 58, in update_layer
self._inference_vardtc()
File "/home/lemma/PyDeepGP/deepgp/layers/layers.py", line 73, in _inference_vardtc
self.posterior, self._log_marginal_likelihood, self.grad_dict = self.inference_method.inference(self.kern, self.X, self.Z, self.likelihood, self.Y, self.Y_metadata, Kuu_sigma=self.Kuu_sigma)
File "/home/lemma/PyDeepGP/deepgp/inference/vardtc.py", line 111, in inference
Kmm = kern.K(Z).copy()
File "/home/lemma/GPy/GPy/kern/src/kernel_slice_operations.py", line 86, in wrap
ret = f(self, s.X, s.X2, *a, **kw)
File "", line 2, in K
File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 283, in g
return cacher(*args, **kw)
File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 179, in __call__
new_output = self.operation(*args, **kw)
File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 113, in K
r = self._scaled_dist(X, X2)
File "", line 2, in _scaled_dist
File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 283, in g
return cacher(*args, **kw)
File "build/bdist.linux-x86_64/egg/paramz/caching.py", line 179, in __call__
new_output = self.operation(*args, **kw)
File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 165, in _scaled_dist
return self._unscaled_dist(X/self.lengthscale, X2)
File "/home/lemma/GPy/GPy/kern/src/stationary.py", line 137, in _unscaled_dist
r2 = -2.*tdot(X) + (Xsq[:,None] + Xsq[None,:])
File "/home/lemma/GPy/GPy/util/linalg.py", line 320, in tdot
return tdot_blas(*args, **kwargs)
File "/home/lemma/GPy/GPy/util/linalg.py", line 316, in tdot_blas
symmetrify(out, upper=True)
File "/home/lemma/GPy/GPy/util/linalg.py", line 362, in symmetrify
_symmetrify_cython(A, upper)
File "/home/lemma/GPy/GPy/util/linalg.py", line 368, in _symmetrify_cython
return linalg_cython.symmetrify(A, upper)
NameError: global name 'linalg_cython' is not defined
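For context on where the traceback ends: the failing call, symmetrify, mirrors one triangle of a square matrix onto the other, in place. Below is a minimal pure-NumPy sketch of that operation. It is not GPy's own code: the name symmetrify_numpy is hypothetical, and reading upper=True as "copy the upper triangle onto the lower" is an assumption based on the argument name seen in the traceback.

```python
import numpy as np

def symmetrify_numpy(A, upper=False):
    """Make the square matrix A symmetric in place (hypothetical stand-in).

    Assumption: upper=True copies the upper triangle onto the lower one,
    upper=False copies the lower triangle onto the upper one.
    """
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("A must be a square 2-D array")
    # Index pairs (rows, cols) of the strictly upper triangle.
    rows, cols = np.triu_indices_from(A, k=1)
    if upper:
        A[cols, rows] = A[rows, cols]
    else:
        A[rows, cols] = A[cols, rows]
    return A

# Only the upper triangle is filled, as after a BLAS rank-k update.
A = np.triu(np.arange(1.0, 10.0).reshape(3, 3))
symmetrify_numpy(A, upper=True)
```

A fallback of this kind only illustrates what the missing compiled routine computes; whether GPy can be switched to a NumPy path on a given install is not something this sketch establishes.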
Dr. Damianou, to help me understand this code more easily, could you point me to the primary references that explain the core idea of the package? Is the original deep Gaussian process paper enough, or are there other helpful papers?
Hello,
First, I would like to thank you for sharing this implementation online.
I have one question about the definition of the statistics in vardtc for uncertain inputs.
In the code it is written:
if uncertain_inputs:
    psi0 = kern.psi0(Z, X)
    psi1 = kern.psi1(Z, X)*beta
    psi2 = kern.psi2(Z, X)*beta
Then in the computation of the likelihood:
logL = -(output_dim*(num_data*log_2_pi+logL_R+psi0-np.trace(LmInvPsi2LmInvT))+YRY- bbt)/2.-output_dim*logdet_L/2.
My question is: shouldn't it be psi0 = kern.psi0(Z, X)*beta, or else psi0*beta inside the logL? In the logL, psi0 should be scaled by beta.
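To make the question concrete, here is a small NumPy sketch of the standard collapsed sparse-GP bound for a single output and deterministic inputs, where the psi statistics reduce to kernel matrices (psi0 = tr(Kff), psi1 = Kfu, psi2 = Kuf Kfu). In this textbook form the bound is log N(y | 0, Qff + I/beta) - (beta/2) * (psi0 - tr(Kuu^-1 psi2)), so psi0 does enter multiplied by beta. The helper names rbf, log_gauss and collapsed_bound are hypothetical, and this is not the code from vardtc.py; whether that file folds beta in somewhere outside the quoted excerpt cannot be told from the excerpt alone.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def log_gauss(y, C):
    """log N(y | 0, C) for a 1-D vector y and covariance C."""
    L = np.linalg.cholesky(C)
    a = np.linalg.solve(L, y)
    return -0.5 * (len(y) * np.log(2 * np.pi)
                   + 2 * np.log(np.diag(L)).sum() + (a ** 2).sum())

def collapsed_bound(X, y, Z, beta):
    """Collapsed variational bound with beta the noise precision."""
    Kuu = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for stability
    psi0 = np.trace(rbf(X, X))                # scalar
    psi1 = rbf(X, Z)                          # N x M
    psi2 = psi1.T @ psi1                      # M x M
    Qff = psi1 @ np.linalg.solve(Kuu, psi1.T)
    fit = log_gauss(y, Qff + np.eye(len(X)) / beta)
    # psi0 is scaled by beta here, together with the psi2 trace term.
    trace_term = -0.5 * beta * (psi0 - np.trace(np.linalg.solve(Kuu, psi2)))
    return fit + trace_term

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
Z = np.linspace(0, 5, 6)[:, None]
beta = 10.0

bound = collapsed_bound(X, y, Z, beta)
exact = log_gauss(y, rbf(X, X) + np.eye(len(X)) / beta)
```

With beta applied to psi0 the value stays below the exact log marginal likelihood, as a lower bound should.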
Will the prediction speed be slower than the neural network after training?
As in the title: at this point
PyDeepGP/deepgp/models/model.py
Line 85 in aefff21
the following error is raised:
NameError: global name 'ObservedMRDLayer' is not defined
Hello,
Is it possible to build classification models using deepgp?
Thanks.
Note: The context here is supervised regression
Is uncertain_input a feature that the user can turn off without changing the source code, or is it intentionally hidden from the user?
Also, regarding the inducing variables Z, how would I set them for the first ObservedLayer and fix them so that the subsequent layers follow suit and update themselves w.r.t. the first layer?
When I try to optimize my model, the warning "overflow encountered in expm1" appears and all predicted outputs become a constant. Sometimes the warning disappears when I change the training output while keeping the same input, and the predicted output is not constant when there is no such warning.
(PS: As a newcomer to machine learning, I find it confusing that a GP performed very well in modeling this data but DeepGP did not give such a good result. I would really appreciate your reply. Thank you!)
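The warning names np.expm1, which overflows to inf once its argument exceeds roughly 709 in double precision. One place such a call commonly appears is the inverse of a softplus-style positivity transform on parameters such as variances; that this is the source of the warning here is an assumption, not something the report confirms. The sketch below shows an overflow-free way to write the transform and its inverse in NumPy. The function names are hypothetical.

```python
import numpy as np

def softplus(x):
    """log(1 + exp(x)), computed without overflow for large x."""
    return np.logaddexp(0.0, x)

def softplus_inverse(y):
    """Inverse of softplus for y > 0.

    Uses y + log(1 - exp(-y)) instead of log(expm1(y)), so the
    exponential is only ever taken of a negative number.
    """
    y = np.asarray(y, dtype=float)
    return y + np.log(-np.expm1(-y))

x = np.array([-20.0, 0.0, 5.0, 800.0])
# Raise instead of warn, to show that no overflow occurs on this path.
with np.errstate(over="raise"):
    roundtrip = softplus_inverse(softplus(x))
```

If a variance or lengthscale grows this large during optimization, the underlying problem is usually the model fit rather than the transform, so a sketch like this explains the warning more than it fixes the constant predictions.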
The demo given in 'example_supervised_learning.py' uses a single hidden layer. Say my data dimensions are (100, 2) for X and (100, 1) for y. If I were to apply 2 layers, what would my Q1 and Q2 variables be (Q being the number of latent dimensions)? Would they all be X.shape[1], meaning that the output of the first layer has the same dimension as X?
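Going only by the constructor calls quoted on this page, the dimension list runs from the observed output up to the input, for example [Y.shape[1], 5, X.shape[1]], and the kernel of each layer takes as many inputs as the next entry in that list (RBF(5), then RBF(X.shape[1])). Under the assumption that the same pattern extends to more hidden layers, the bookkeeping can be sketched in plain Python. The helper kernel_input_dims is hypothetical and not part of deepgp, and the values Q1 = 3 and Q2 = 2 are arbitrary illustrations.

```python
def kernel_input_dims(n_dims):
    """Input dimension for each layer's kernel, observed layer first.

    n_dims is [output_dim, Q1, ..., Qk, input_dim]; the kernel of layer i
    is assumed to take n_dims[i + 1] inputs, as in the calls quoted above.
    """
    if len(n_dims) < 2:
        raise ValueError("need at least an output and an input dimension")
    return list(n_dims[1:])

# X has shape (100, 2), y has shape (100, 1), two hidden layers.
Q1, Q2 = 3, 2
n_dims = [1, Q1, Q2, 2]
dims = kernel_input_dims(n_dims)   # one entry per kernel
```

Nothing in the quoted examples ties the hidden dimensions to X.shape[1]: the first example on this page uses a 5-dimensional hidden layer with a 100-dimensional input.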
Is it possible to add LVMOGP to deep GP (it's already in GPy)?
The reason MRD wouldn't do it for me in the fully independent MRD mode (FI-MRD) is that I need to model unknown conditions (i.e., domain generalization).
Thx
I'm a little confused about the "Number of latent dimensions" set in the kernel functions, as in the example code below. Does it mean that the output of the hidden layer is Q-dimensional, or that the number of hidden units is Q? Or are these two the same? Thank you very much!
Q = 5
kern1 = GPy.kern.RBF(Q,ARD=True) + GPy.kern.Bias(Q)
kern2 = GPy.kern.RBF(Q,ARD=False) + GPy.kern.Bias(X_tr.shape[1])
num_inducing = 40
back_constraint = False
m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],Y_tr, X=X_tr,kernels=[kern1, kern2], num_inducing=num_inducing, back_constraint=back_constraint)
Two questions:
- How do I change the input data's dimensions for my use case?
- The output is a tuple; which element of it is the prediction?
In tutorial.ipynb, I tried changing the input data X_tr from shape (100L,55L) to (1L,55L), with Y_tr changed to match. I changed the model construction from:
m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
Y_tr, X_tr=X_tr,kernels=[kern1, kern2],
num_inducing=1, back_constraint=back_constraint)
To:
m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
Y_tr[0], X_tr=X_tr[0],kernels=[kern1, kern2],
num_inducing=1, back_constraint=back_constraint)
And I got this error: [IndexError: tuple index out of range]
Then I changed X_tr from (1L,55L) to (2L,55L), with Y_tr changed to match:
m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],
Y_tr[0:2], X_tr=X_tr[0:2],kernels=[kern1, kern2],
num_inducing=1, back_constraint=back_constraint)
I got another error: [LinAlgError: not positive definite, even with jitter.]
It seems the shapes of X_tr and Y_tr are still not valid.
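One possible cause of the IndexError, offered as a guess from the shapes alone: in NumPy, indexing with an integer (X_tr[0]) drops a dimension and returns a 1-D array, whereas slicing (X_tr[0:1]) keeps the array 2-D. Any code that then reads .shape[1] on the 1-D array fails with exactly this message. The array below is a stand-in with the shape mentioned in the report, not the tutorial data.

```python
import numpy as np

X_tr = np.zeros((100, 55))   # stand-in with the reported shape

row = X_tr[0]          # integer index: 1-D, shape (55,)
one_row = X_tr[0:1]    # slice: still 2-D, shape (1, 55)
two_rows = X_tr[0:2]   # slice: 2-D, shape (2, 55)

# Reading the second axis of a 1-D array reproduces the reported message.
try:
    row.shape[1]
    message = None
except IndexError as err:
    message = str(err)
```

Keeping the arrays 2-D only addresses the shape error; a model fitted to one or two data points can still fail numerically, as the LinAlgError above suggests.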
Finally, I changed the input shape back to the default and ran the prediction code.
The output seems to be a tuple containing two arrays.
This confuses me: which one is the prediction?
In fact, I am using DGP to forecast a time series, but I cannot find any time-series example in tutorial.ipynb. Is there an example or other material that would help me use DGP in this case, and explain how to tune its parameters?
THANKS !
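On the time-series part of the question: a common way to turn forecasting into the supervised regression setting used on this page is to predict each value from a window of the values before it. The sketch below builds such (X, Y) arrays with NumPy, both 2-D with one row per training example. It is a generic autoregressive framing rather than anything taken from the tutorial, and the helper name make_lagged and the window length are hypothetical choices.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build inputs of n_lags past values and the next value as target.

    Returns X with shape (N - n_lags, n_lags) and Y with shape
    (N - n_lags, 1), where N is the length of the series.
    """
    series = np.asarray(series, dtype=float).ravel()
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X = np.stack([series[i:i + n_lags]
                  for i in range(len(series) - n_lags)])
    Y = series[n_lags:, None]
    return X, Y

t = np.arange(10.0)                 # toy series 0, 1, ..., 9
X, Y = make_lagged(t, n_lags=3)     # each row of X precedes its row of Y
```

The resulting X and Y can be passed wherever X_tr and Y_tr are used above, with X.shape[1] equal to the window length.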
Hi,
I am trying to use the repeatX feature in deepgp and am getting the following error:
AttributeError: 'NoneType' object has no attribute 'shape'
I have defined my layers and kernels in the following way:
layers = [y_train.shape[1],1,1,X_PCA.shape[1]]
kernels = [GPy.kern.RBF(1+X_PCA.shape[1]), GPy.kern.RBF(1+X_PCA.shape[1], ARD=False),
           GPy.kern.RBF(X_PCA.shape[1], ARD=True)+GPy.kern.Bias(X_PCA.shape[1])]
My understanding is that the repeatX feature allows the input features to be included at chosen hidden layers, similar to a ResNet neural network. Please correct me if I am wrong in assuming this.
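The kernel sizes above (1 + X_PCA.shape[1]) are consistent with that reading: each hidden representation would be concatenated with the original inputs before being passed to the next GP. A NumPy sketch of the bookkeeping, under that assumption, follows. The column order (hidden dimensions first, inputs after) and all array names are assumptions for illustration, not taken from the deepgp source.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # stand-in for X_PCA with 4 features
H = rng.normal(size=(50, 1))   # stand-in for a 1-D hidden layer

# Assumed repeatX behaviour: the layer sees the hidden values and the inputs.
layer_input = np.hstack([H, X])
kernel_input_dim = layer_input.shape[1]   # 1 + X.shape[1]

# The hidden part can be recovered by slicing off the first columns.
hidden_part = layer_input[:, :H.shape[1]]
```

This only checks that the kernel dimensions in the report add up; it does not explain the AttributeError itself.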
Hey Dr. Damianou,
We tried to get DeepGP working on a toy example, making only minor modifications to the supervised learning example you posted.
import numpy as np
import GPy
from pylab import *
from sys import path
import matplotlib.pyplot as plt
np.random.seed(42)
import deepgp
# Utility to load sample data. It can be installed with pip. Otherwise just load some other data.
import pods
#Data Prep#
# Load some mocap data.
#data = pods.datasets.cmu_mocap_35_walk_jog()
Ntr = 100
Nts = 500
# All data represented in Y_all, which is the angles of the movement of the subject
#Y_all = data['Y']
X_tr = np.random.uniform(0,10, (Ntr,1))
Y_tr = np.sin(X_tr)
X_ts = np.random.uniform(0,10, (Nts,1))
#Model Construction#
# Number of latent dimensions (single hidden layer, since the top layer is observed)
Q = 1
# Define what kernels to use per layer
kern1 = GPy.kern.RBF(Q,ARD=False)
kern2 = GPy.kern.RBF(X_tr.shape[1],ARD=False)
# Number of inducing points to use
num_inducing = 3
# Whether to use back-constraint for variational posterior
back_constraint = False
# Dimensions of the MLP back-constraint if set to true
encoder_dims=[[300],[150]]
m = deepgp.DeepGP([Y_tr.shape[1],Q,X_tr.shape[1]],Y_tr, X_tr=X_tr,kernels=[kern1, kern2], num_inducing=num_inducing, back_constraint=back_constraint)
# Optimization #
# Make sure initial noise variance gives a reasonable signal to noise ratio.
# Fix to that value for a few iterations to avoid early local minima
for i in range(len(m.layers)):
    output_var = m.layers[i].Y.var() if i==0 else m.layers[i].Y.mean.var()
    m.layers[i].Gaussian_noise.variance = output_var*0.01
    m.layers[i].Gaussian_noise.variance.fix()
m.optimize(max_iters=800, messages=True)
# Unfix noise variance now that we have initialized the model
for i in range(len(m.layers)):
    m.layers[i].Gaussian_noise.variance.unfix()
m.optimize(max_iters=1500, messages=True)
#Inspection #
# Compare with GP
m_GP = GPy.models.SparseGPRegression(X=X_tr, Y=Y_tr, kernel=GPy.kern.RBF(X_tr.shape[1])+GPy.kern.Bias(X_tr.shape[1]), num_inducing=num_inducing)
m_GP.Gaussian_noise.variance = m_GP.Y.var()*0.01
m_GP.Gaussian_noise.variance.fix()
m_GP.optimize(max_iters=100, messages=True)
m_GP.Gaussian_noise.variance.unfix()
m_GP.optimize(max_iters=400, messages=True)
def rmse(predictions, targets):
    return np.sqrt(((predictions.flatten() - targets.flatten()) ** 2).mean())
Y_pred = m.predict(X_ts)[0]
Y_pred_s = m.predict_withSamples(X_ts, nSamples=500)[0]
Y_pred_GP = m_GP.predict(X_ts)[0]
plt.plot(X_tr, Y_tr, 'go')
plt.plot(X_ts, Y_pred, 'ro')
plt.show()
The output is... somewhat nonsensical. Can you please clue us in to what is happening?
(green is training, red is output test data.)
I have run this code for a classification task, and the results do not seem good. There may be something wrong in how I am using it for classification. Could you give me a classification example to refer to?
It would be great to have a basic tutorial outlining the general workflow for using this library, together with some comments on what functionality is available to the user.
Hi,
I am interested in a model in which only the last layer is a GP and the previous layers are regular CNN layers for feature extraction. I would also like to train this network end-to-end with early stopping. Is there an example of this, or can you guide me through writing one?
Fixing this bug by using the correct number of input dimensions makes the DeepGP model almost as good as the previous GP regression:
Running L-BFGS-B (Scipy implementation) Code:
runtime i f |g|
00s16 003 1.931458e+04 4.393298e+06
02s17 066 1.454812e+03 1.019922e+03
07s25 205 1.422389e+03 3.225139e+02
12s28 336 1.414768e+03 2.612186e+01
29s27 802 1.411086e+03 3.311174e+00
Runtime: 29s27
Optimization status: Maximum number of f evaluations reached
Running L-BFGS-B (Scipy implementation) Code:
runtime i f |g|
00s17 0004 1.359542e+03 2.307453e+04
03s26 0085 1.318150e+03 3.130295e+02
12s51 0302 1.315215e+03 1.869511e+01
33s93 0893 1.313376e+03 8.171695e+00
54s77 1502 1.312655e+03 2.320556e+00
Runtime: 54s77
Optimization status: Maximum number of f evaluations reached
Running L-BFGS-B (Scipy implementation) Code:
runtime i f |g|
00s07 004 2.550754e+03 4.948787e+03
00s19 011 2.342103e+03 4.584468e+00
00s42 029 2.318162e+03 1.822445e-07
Runtime: 00s42
Optimization status: Converged
Running L-BFGS-B (Scipy implementation) Code:
runtime i f |g|
00s16 008 1.774224e+03 2.085415e+00
00s17 009 1.773712e+03 8.342545e-01
00s50 029 1.772853e+03 1.115593e-09
Runtime: 00s50
Optimization status: Converged
# RMSE DGP : 3.029233376952055
# RMSE DGP (with samples): 3.0334975870275342
# RMSE sparse GP : 3.033962956251842
Please let me know if I am missing something.
I have some questions about the gradient of the objective function w.r.t. the parameters of all layers. Could you provide the gradient formulas for these parameters?
The predict method is missing from the ObservedMRDLayer object. When m.predict() is called and the loop reaches the MRD layer, it results in:
AttributeError: 'ObservedMRDLayer' object has no attribute 'predict'
Line 168 in 4ec944f
😕
Hello,
It seems that using more than one hidden layer raises the following error on line 198 of layers.py:
AttributeError: 'HiddenLayer' object has no attribute 'layer_lower'
This is caused by line 315 of layers.py, where the property self.Y is called, which requires self.layer_lower to be initialized beforehand. A quick and dirty fix is to add the following code (inspired by the Y property of the class Layer) at the beginning of the HiddenLayer constructor:
if hasattr(layer_lower, 'repeatX') and layer_lower.repeatX:
    Y = layer_lower.X[:,:layer_lower.repeatXsplit]
else:
    Y = layer_lower.X
Hi Andreas, I have a few questions regarding the kernels and the results used in the tutorials.
a) To begin with, when I run the unsupervised oil data experiment I get the following ARD weights, which I assume relate to the added bias kernel.
b) Secondly, in the supervised learning tutorial, you initialise kernel 2 (the top layer kernel) in the following way:
kern2 = GPy.kern.RBF(Q,ARD=False) + GPy.kern.Bias(X_tr.shape[1])
What would be the reason for not setting these two kernels to the same dimension, as is done in the unsupervised learning case? I.e.:
kern2 = GPy.kern.RBF(X_tr.shape[1],ARD=False) + GPy.kern.Bias(X_tr.shape[1])
c) Regarding the example given in the supervised learning case, can I confirm that this is how multi-task supervised learning is implemented in this framework?
d) Finally, regarding section 4.2 (Modeling Human Motion) in your paper Deep Gaussian Processes: is there an implementation of that model available? It would be useful for my current research.
Thanks a lot,
Pavlos
When creating a model that includes an ObservedMRDLayer with back_constraint enabled as mlp, it seems to get stuck in a recursive loop:
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 125, in __setslice__
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 119, in __setitem__
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 510, in _pass_through_notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 510, in _pass_through_notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/observable.py", line 91, in notify_observers
File "build/bdist.linux-x86_64/egg/paramz/core/parameter_core.py", line 508, in _parameters_changed_notification
File "build/bdist.linux-x86_64/egg/deepgp/layers/mrd.py", line 193, in parameters_changed
File "build/bdist.linux-x86_64/egg/deepgp/layers/mrd.py", line 143, in _aggregate_qX
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 125, in __setslice__
File "build/bdist.linux-x86_64/egg/paramz/core/observable_array.py", line 118, in __setitem__
File "build/bdist.linux-x86_64/egg/paramz/param.py", line 158, in __getitem__
RuntimeError: maximum recursion depth exceeded in __instancecheck__
Full log here
I cannot find the implementations of BinaryPredLayer and PredLayer.
Could you help me find these two?