juliagaussianprocesses / abstractgps.jl Goto Github PK

View Code? Open in Web Editor NEW

208.0 9.0 20.0 77.59 MB

Abstract types and methods for Gaussian Processes.

Home Page: https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev

License: Other

Julia 100.00%

gaussian-processes julia gps probabilistic-programming

abstractgps.jl's Introduction

AbstractGPs

AbstractGPs.jl is a package that defines a low-level API for working with Gaussian processes (GPs), and basic functionality for working with them in the simplest cases. As such it is aimed more at developers and researchers who are interested in using it as a building block than end-users of GPs. You may want to go through the main API design documentation.

Installation

AbstractGPs is an officially registered Julia package, so the following will install AbstractGPs using the Julia's package manager:

] add AbstractGPs

Example

# Import packages.
using AbstractGPs, Plots

# Generate toy synthetic data.
x = rand(10)
y = sin.(x)

# Define GP prior with Matern-3/2 kernel
f = GP(Matern32Kernel())

# Finite projection of `f` at inputs `x`.
# Added Gaussian noise with variance 0.001.
fx = f(x, 0.001)

# Log marginal probability of `y` under `f` at `x`.
# Quantity typically maximised to train hyperparameters.
logpdf(fx, y)

# Exact posterior given `y`. This is another GP.
p_fx = posterior(fx, y)

# Log marginal posterior predictive probability.
logpdf(p_fx(x), y)

# Plot posterior.
scatter(x, y; label="Data")
plot!(-0.5:0.001:1.5, p_fx; label="Posterior")

Related Julia packages

AbstractGPsMakie.jl - Plotting GPs with Makie.jl.
ApproximateGPs.jl - Approximate inference for GPs, both for sparse approximations and non-Gaussian likelihoods. Built on types which implement this package's APIs.
BayesianLinearRegressors.jl - Accelerated inference in GPs with a linear kernel. Built on types which implement this package's APIs.
GPLikelihoods.jl - Non-Gaussian likelihood functions to use with GPs.
KernelFunctions.jl - Kernel functions for machine learning.
Stheno.jl - Building probabilistic programmes involving GPs. Built on types which implement this package's APIs.
TemporalGPs.jl - Accelerated inference in GPs involving time. Built on types which implement this package's APIs.

Issues/Contributing

If you notice a problem or would like to contribute by adding more kernel functions or features please submit an issue.

abstractgps.jl's People

Contributors

Stargazers

Watchers

abstractgps.jl's Issues

Option to also plot samples

We currently have plot, which shows the marginals, and sampleplot which gives you samples.

It's commonly the case that I want both the marginals and samples when I'm plotting, so I would really like it if I could say

plot(f(x); samples=10)

to get a plot containing both the marginals and 10 samples. By default, samples=0 would do nothing, so this would be non-breaking.

Is this something that others would like to have @devmotion @theogf @st-- , or am I on my own here?

Reducing Allocations

Unless I'm missing something in the documentation it seems that the default implementations (at least for PosteriorGP) are quite allocation heavy, e.g. when computing the mean/covariance of a GP at a single location as well as a vector of locations. It would be quite useful to have non-allocating equivalents, especially when querying the GP posterior at a single point where on the surface no vectors are involved.

Array of Array input

Though array of array input, or any inputs other than pure vector was not documented. I think following kernel functions behavior imply that array of array is the nature way to represent multivariate GP inputs:

kern = Matern32Kernel()
kern([1., 2.], [3., 4.])  # 0.04397209203797647
kern([1. 2.; 3. 4.], [3. 4.; 5. 6.])  # 0.007767733942101923

And array of array have partly worked:

using AbstractGPs

x = reshape([[0.1*i, 0.1*j] for i=1:10, j=1:10], 1, :)[1,:]
kern = SqExponentialKernel()
f = GP(kern)
fx = f(x, 1e-3)
y = zeros(100)
fxy = posterior(fx, y)
fxyx = fxy(x, 1e-3)

rand(fx)  # 100-element Array{Float64,1}

But posterior process fxyx failed:

rand(fxyx)
#=
MethodError: no method matching dim(::Array{Array{Float64,1},1})
Closest candidates are:
  dim(!Matched::ColVecs) at /home/yiyuezhuo/.julia/packages/KernelFunctions/xDtN0/src/utils.jl:45
  dim(!Matched::RowVecs) at /home/yiyuezhuo/.julia/packages/KernelFunctions/xDtN0/src/utils.jl:75
  dim(!Matched::AbstractArray{var"#s16",1} where var"#s16"<:Real) at /home/yiyuezhuo/.julia/packages/KernelFunctions/xDtN0/src/utils.jl:23
  ...

Stacktrace:
 [1] validate_dims(::Array{Array{Float64,1},1}, ::Array{Array{Float64,1},1}) at /home/yiyuezhuo/.julia/packages/KernelFunctions/xDtN0/src/utils.jl:122
 [2] kernelmatrix(::SqExponentialKernel, ::Array{Array{Float64,1},1}, ::Array{Array{Float64,1},1}) at /home/yiyuezhuo/.julia/packages/KernelFunctions/xDtN0/src/matrix/kernelmatrix.jl:92
 [3] cov(::GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel}, ::Array{Array{Float64,1},1}, ::Array{Array{Float64,1},1}) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/gp/gp.jl:75
 [4] mean_and_cov(::AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Array{Float64,1},1},Array{Float64,1}}}}, ::Array{Array{Float64,1},1}) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/posterior_gp/posterior_gp.jl:71
 [5] mean_and_cov(::AbstractGPs.FiniteGP{AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Array{Float64,1},1},Array{Float64,1}}}},Array{Array{Float64,1},1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/abstract_gp/finite_gp.jl:109
 [6] rand(::Random._GLOBAL_RNG, ::AbstractGPs.FiniteGP{AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Array{Float64,1},1},Array{Float64,1}}}},Array{Array{Float64,1},1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Int64) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/abstract_gp/finite_gp.jl:184
 [7] rand(::AbstractGPs.FiniteGP{AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Array{Float64,1},1},Array{Float64,1}}}},Array{Array{Float64,1},1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}, ::Int64) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/abstract_gp/finite_gp.jl:188
 [8] rand(::AbstractGPs.FiniteGP{AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64},SqExponentialKernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Array{Float64,1},1},Array{Float64,1}}}},Array{Array{Float64,1},1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}) at /home/yiyuezhuo/.julia/packages/AbstractGPs/9Xym2/src/abstract_gp/finite_gp.jl:190
 [9] top-level scope at In[5]:1
 [10] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
=#

This problem have been fixed in JuliaGaussianProcesses/KernelFunctions.jl#147

But current compat setting lock KernelFunctions to "0.4, 0.5", so the above fix, which is introduced in 0.6, can not be applied. So I think it would be useful to bump KernelFunctions to 0.6 and add some test for multivariate input.

Building a front-end

As discussed elsewhere, there is some consensus that it would be very helpful to provide some kind of user-facing front-end which sits on top of the low-level researcher-focused functionality provided in this packages, and others in JuliaGaussianProcesses.

There are a few options:

Implement a new / repurpose an existing package which provides this functionality, completely independently of the existing infrastructure availlable in Julia.
Implement the API of some existing package, such as MLJ.jl, Models.jl, or ScikitLearn.jl.

My personal feeling is that we should at least try 2 before thinking about 1.

MLJ

MLJ provides extensive documentation on how to add a model. The only interaction that we would have to have with MLJ is to register the package here once -- after that, I think we're free to update it as we see fit.

I made a first pass at this a while ago here, but ran out of steam because I only gave myself a few hours to make it work.

Models.jl

This is an interface that Invenia uses internally quite a bit, but hasn't received a lot of attention more generally. That might change though. The interface is really quite straightforward to implement, so we could straightforwardly implement it in addition to any other API we implement (there's nothing wrong with having a couple of front-ends that do almost entirely the same thing, since presumably different users will prefer different things)

ScikitLearn.jl

My impression is that this is a fore-runner to MLJ, so probably we should prefer MLJ.

Abstract vs concrete?

Hi, I came across the POMDPs.jl Software paper and it reminded me of a point I had thought of before - what should "AbstractGP.jl" be? From the name, I had imagined it as a way of defining the "interface" of what GPs should look like in Julia - similar to what POMDPs.jl does, with concrete implementations in other packages (see https://github.com/JuliaPOMDP/). However, despite the name, there are lots of concrete implementations in here. What's the motivation for having them in here (rather than, for example, providing a "base GP" package implementing the AbstractGP interface)? Splitting it up that way might help encourage both more exploration by the community (trying out new ways of implementing the same interface) as well as increasing cohesion/interoperability by those. Right now, even though it's called "AbstractGP", it doesn't seem clear to someone coming from the outside that they really ought to fit their GP work in this interface. What are your thoughts on this?

Rename `cov_diag` to `var`?

Distributions.jl uses var for the diagonal of the covariance matrix and, e.g., defines a fallback for AbstractMvNormal: https://github.com/JuliaStats/Distributions.jl/blob/35125579a77b938f8d4a8bd2be23cbf1f2ddf225/src/multivariate/mvnormal.jl#L258

So to avoid that users end up with surprisingly inefficient code we could

define Statistics.var(gp::FiniteGP) = cov_diag(gp)
or rename cov_diag to var

What would you prefer?

Arithmetic between GPs

I have two AbstractGPs.FiniteGP instances fit with Optim. I want to know the difference between their predictions. One way to do this is to make predictions from each and take the difference. I think it is also possible to take the difference between the two FiniteGPs to get a new FiniteGP, and use that to make a prediction of the difference at an input point?

If so, does it make sense to add that feature to AbstractGPs.jl ?

Stuff that needs doing

@4aHxKzD was asking about ways to contribute to this package in a different issue, which made me realise that we don't currently have a TODO list for this package, or any kind of list of things that potential contributors could do.

My feeling is that this package is in pretty good shape now, and that most of the interesting work is now going to happen in peripheral packages.

@devmotion @sharanry @theogf what are your thoughts? Is there anything that you think that we're missing from this particular package?

I propose that we brainstorm a bit in this issue, and spin off specific issues for things that we arrive at.

Specialised Plotting Inputs for FiniteGP

Our plotting functionality currently assumes that when plotting a FiniteGP f(x), we wish to visualise it at the inputs x. While this assumption generally holds, there's currently no escape hatch (beyond circumventing the plotting functionality entirely!) if this doesn't hold.

My particular use case is that I have a FiniteGP f(x) whose dimensions I wish to plot on the x-axis at some other set of points z.

I'm not super familiar with plotting recipes, @theogf or @devmotion , do you know how I might we might extend our existing functionality to make this work?

Plotting PosteriorGPs seems to be broken in latest release

The plot at the end of the documentation example here errors in the current release. These docs are from v0.2, but if v0.3 docs are not available, there should be a note added if the functionality is different (unless this is just a bug, in which case it should be fixed). I have confirmed that it works in v0.2.25 and is broken in v0.3.4 but have not check the versions in between.

I have mentioned this to @willtebbutt .

Here is the stacktrace and the call that produces it (context of call can be found in hyperlink above as it is directly copied)

plot!(plt, p_fx, 0:0.001:1; label="Posterior")

Cannot convert AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64}, Matern52Kernel{Distances.Euclidean}}, NamedTuple{(:α, :C, :x, :δ), Tuple{Vector{Float64}, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, Vector{Float64}, Vector{Float64}}}} to series data for plotting

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] _prepare_series_data(x::AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64}, Matern52Kernel{Distances.Euclidean}}, NamedTuple{(:α, :C, :x, :δ), Tuple{Vector{Float64}, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, Vector{Float64}, Vector{Float64}}}})
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/series.jl:8
  [3] _series_data_vector(x::AbstractGPs.PosteriorGP{GP{AbstractGPs.ZeroMean{Float64}, Matern52Kernel{Distances.Euclidean}}, NamedTuple{(:α, :C, :x, :δ), Tuple{Vector{Float64}, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, Vector{Float64}, Vector{Float64}}}}, plotattributes::Dict{Symbol, Any})
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/series.jl:27
  [4] macro expansion
    @ ~/.julia/packages/RecipesPipeline/CirY4/src/series.jl:143 [inlined]
  [5] apply_recipe(plotattributes::AbstractDict{Symbol, Any}, #unused#::Type{RecipesPipeline.SliceIt}, x::Any, y::Any, z::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesBase/92zOw/src/RecipesBase.jl:282
  [6] _process_userrecipes!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/user_recipe.jl:36
  [7] recipe_pipeline!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/RecipesPipeline.jl:70
  [8] _plot!(plt::Plots.Plot, plotattributes::Any, args::Any)
    @ Plots ~/.julia/packages/Plots/iYDwd/src/plot.jl:172
  [9] plot!(::Plots.Plot, ::Any, ::Vararg{Any, N} where N; kw::Any)
    @ Plots ~/.julia/packages/Plots/iYDwd/src/plot.jl:162
 [10] top-level scope
    @ In[9]:3
 [11] eval
    @ ./boot.jl:360 [inlined]
 [12] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1094

Celerite methods?

Anyone here familiar with the methods used here? It gives a linear-time exact solution for special case of one-dimensional data with covariance given by mixtures of exponentials.

The main implementation is celerite. There a full-Julia version by @ericagol here, and an alternate Julia implementation @dfm wrote to walk me through the basic ideas here.

Just thought I'd mention this, since it seems relevant if AbstractGPs is intended as a best-of-everything GP package (which... is that the case?)

`size` inconsistency

The Soss tests have this model:

k = SqExponentialKernel()
y = randn(3)
X = randn(3, 1)
x = [rand(1) for _ in 1:3]

gp_regression = Soss.@model X begin
    # Priors.
    α ~ LogNormal(0.0, 0.1)
    ρ ~ LogNormal(0.0, 1.0)
    σ² ~ LogNormal(0.0, 1.0)

    # Realized covariance function
    kernel = α * transform(SqExponentialKernel(), 1 / ρ)
    f = GP(kernel)

    # Sampling Distribution.
    y ~ f(X, σ²)
end

running this "by hand" gives

julia> k = SqExponentialKernel()
Squared Exponential Kernel

julia> y = randn(3)
3-element Vector{Float64}:
  0.4104263995264387
  1.4282768487843682
 -0.9190053618980447

julia> X = randn(3, 1)
3×1 Matrix{Float64}:
  0.8403190938843137
  0.8672163043894272
 -0.09545241492760646

julia> x = [rand(1) for _ in 1:3]
3-element Vector{Vector{Float64}}:
 [0.7607104031520515]
 [0.27095691264133337]
 [0.11087012586721845]

julia> α = rand(LogNormal(0.0, 0.1))
0.9566297929437682

julia> ρ = rand(LogNormal(0.0, 1.0))
5.9575826296063665

julia> σ² = rand(LogNormal(0.0, 1.0))
1.7373379697361795

julia> kernel = α * transform(SqExponentialKernel(), 1 / ρ)
Squared Exponential Kernel
        - Scale Transform (s = 0.16785331604642345)
        - σ² = 0.9566297929437682

julia> f = GP(kernel)
GP{ZeroMean{Float64}, ScaledKernel{TransformedKernel{SqExponentialKernel, ScaleTransform{Float64}}, Float64}}(ZeroMean{Float64}(), Squared Exponential Kernel
        - Scale Transform (s = 0.16785331604642345)
        - σ² = 0.9566297929437682)

julia> ydist = f(X, σ²)
FiniteGP{GP{ZeroMean{Float64}, ScaledKernel{TransformedKernel{SqExponentialKernel, ScaleTransform{Float64}}, Float64}}, ColVecs{Float64, Matrix{Float64}, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}, Diagonal{Float64, Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}}}(
f: GP{ZeroMean{Float64}, ScaledKernel{TransformedKernel{SqExponentialKernel, ScaleTransform{Float64}}, Float64}}(ZeroMean{Float64}(), Squared Exponential Kernel
        - Scale Transform (s = 0.16785331604642345)
        - σ² = 0.9566297929437682)
x: SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}[[0.8403190938843137, 0.8672163043894272, -0.09545241492760646]]
Σy: [1.7373379697361795]
)

I'd expect this to be a distribution over R^3, corresponding to a y for each row of X. But

julia> size(ydist)
(1,)

julia> rand(ydist)
1-element Vector{Float64}:
 -1.6493519278768176

Avoid re-computing distance for stationary models

Hi,

First, thanks for this great package! Some of the features here are really slick.

My questions is whether it is possible to avoid re-computing distance matrices in the following example.

Basically, I'm interested in getting posterior predictives of the function f at new locations xgrid. Say I have posterior samples of the inverse range parameter (rho) of a squared exponential covariance function and I want to predict at new locations using each of those samples of rho. It seems the only option using the semantics below involves re-computing a distance matrix (which in theory needs to be computed only once). Is it possible to avoid re-computing the distance matrix?

using AbstractGPs, KernelFunctions, PyPlot, Distributions
import Random

# Make some data
Random.seed!(4)
num_obs = 10
x = randn(num_obs)
f(x) =  sin(3 * x) * sin(x) * (-1)^(x > 0)
y = f.(x);

# Make a grid of prediction locations.
num_new_locations = 100
xgrid = range(-3, 3, length=num_new_locations)

# Random inverse range parameters.
n_posterior_samples = 50
rhos = rand(Uniform(1, 2), n_posterior_samples)

# Predict at new locations (xgrid).
@time fs = [let
    kernel = transform(SqExponentialKernel(), rho)
    kernel += 1e-8 * WhiteKernel()  # needed to ensure covariance diagonal is positive.
    f = GP(kernel)
   
    # FIXME? This line involve re-computing a distance matrix from (a static) x.
    pfx = posterior(f(x, 1e-8), y)
    
    rand(pfx(xgrid))
end for rho in rhos];

# Plot results.
plt.scatter(x, y, c="black", zorder=3, s=50)
plt.xlabel("x")
plt.ylabel("y")
for f in fs
    plt.plot(xgrid, f, alpha=0.2, c="red")
end

Examples / Docs TODO

@theogf pointed out here that we've got some missing examples, and @devmotion pointed out that some consolidation / extension of our existing examples would be helpful. Moreover, we've known for a while that there's stuff that needs to be better documented (e.g. how to handle multi-dimensional inputs). It makes sense to tackle these in the same issue so that we have a vaguely joined up approach to our examples + docs.

Below is a very incomplete list, please feel free to extend below, and I'll add them to this list (or add them directly yourself if GH allows that... I have no idea whether it does or not).

plot() recipe returns unexpected prediction intervals

# Load dataset.
using HTTP, CSV, DataFrames, Dates
link = "https://bit.ly/3vtTaW5";
r = HTTP.get(link);
df = CSV.read(r.body, DataFrame; missingstring="NA");

using AbstractGPs, Plots
X = 0:1.0:(length(df.date)-1) 
Y = df.value;
XH = 0:1.0:(length(df.date)-1 +5) 

f = GP(Matern32Kernel()) # Define GP prior w Matern32 kernel
fx = f(X, 0.001)         # Finite projection at the inputs `X` 
logpdf(fx, Y)            # -512 # Data's log-likelihood w.r.t prior GP `f`. 
p_fx = posterior(fx, Y)  # Exact posterior given `Y`.
logpdf(p_fx(X), Y)       # 184 # Data's LL w.r.t prior posterior GP `p_fx`.

px = XH[end-4:end]
pm = mean.(marginals(p_fx(XH))[end-4:end])
ps = std.(marginals(p_fx(XH))[end-4:end])

# Plot posterior.
scatter(X, Y; label="Data")
plot!(XH, f(XH, 0.001); label="Prior")
plot!(p_fx(XH), label="Posterior" )
scatter!(px, pm, lab = "oos prediction μ")
scatter!(px, pm - 1.0*ps, lab="oos prediction μ - σ") 
scatter!(px, pm + 1.0*ps, lab="oos prediction μ + σ")

It looks like the plot() recipe returns μ ± 1.0σ instead of (what I'd expect) μ ± 1.96σ.
Is this intended? Can this be customized?

Implementing Advanced Variational GaussianProcess Models - GSoC Project

This is a meta-issue for information/discussion about my GSoC project

The original plan (as in the proposal) is roughly as follows:

An initial implementation of [1] and possibly [2] (will be done in a separate repo)
Design/implement a clean API for the above (+ tests)
Benchmark vs. GPFlow
Documentation

[1] Hensman, James, Nicolo Fusi, and Neil D. Lawrence. "Gaussian processes for big data."
[2] Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." Artificial Intelligence and Statistics. PMLR, 2015.

Adding test for Turing.jl models.

Since one of our objectives is to ensure compatibility with the Turing.jl we can create a small model to ensure constant compatibility with it. Something like the one written in #49

Add posterior consistency test to standard tests

The public API tests don't currently check the correctness of whatever is produced by posterior, we just check that it's some kind of AbstractGP.

We could, however, check that e.g. samples from it made under low observation noise yield samples close to the observations sampled from the etc. Might prevent some bugs 🤷

Status check "Julia 1.3 - macOS-latest - x64 - pull_request" breaks

The macOS-based CI using Julia 1.3 seems to segfault.
I've currently disabled it as suggested by @willtebbutt so we don't hold up PRs over it.

rrule for MeanFunction types

Without rrule AD will not pass for MeanFunction types. #16 resolves #14 for ZeroMean but it should be done for all types.

Logistic for inducing points locations

In the docs, the VI example the inducing points locations are passed through the logistic function. Although I suppose it's for keeping them between 0 and 1, it's not explained anywhere and I don't think it's necessary either?

GP only takes AbstractVector as an argument

Is there any reason why one cannot pass a matrix to a GP object ?

f = GP(SqExponentialKernel())
fx = f(randn(3, 100)) # fails

Updating Dependencies

We're currently going through a ChainRulesCore breaking change. AbstractGPs has to wait for both Turing.jl and Soss.jl to update as well before it's possible for us to update -- they're fantastic packages of course, but they're dependency-heavy so tend to take a little while to update everything.

I definitely want to ensure that we're testing against them, but I wonder whether it would make sense to have (at least) two separate CI runs -- one for the core functionality of AbstractGPs, and another to run Turing and Soss (or possibly separate runners for Turing and Soss?).

This would mean that we'd be able to plough ahead with upgrades for AbstractGPs provided that our "core" tests pass, but we would be kept apprised of whether or not AbstractGPs interacts properly with Soss and Turing.

Thoughts?

JuliaGP landing webpage

As discussed in the JuliaGP meeting, we need a better unification of the whole ecosystem which could happen through a general website

Right now we have https://juliagaussianprocesses.github.io/, which contains only a landing page, links to the different repos, and the team section.

A proposition is to add more content to describe the use of the different repos and to have a "gallery" of fancy "end products" (plots, animations, etc) linking directly to the related tutorial.

Binder setup incorrect

Currently, binder does not recognize that the notebooks use Julia and therefore does not install the correct kernels. It seems Project.toml and Manifest.toml files have to be added in the gh-pages branch (see https://mybinder.readthedocs.io/en/latest/examples/sample_repos.html#julia-binder-demo and fredrikekre/Literate.jl#46 (comment)). If the examples (currently there's only one) require different dependencies, probably it would be helpful to add separate Project.toml + Manifest.toml as suggested in JuliaGaussianProcesses/KernelFunctions.jl#234 (comment).

Approximate Inference

Currently, we have

approx_posterior(approximation, fx, y, u)

As pointed out by @st-- and @rossviljoen in SparseGPs.jl, we should consider reducing this to a 3-arg function in which approximation contains u, since u is really a component of the approximation. This kind of thing would generalise more elegantly to what @rossviljoen is doing in SparseGPs.jl, as it will be natural in that case to put the variational parameters associated with q(u) inside approximation as well. More generally, there are approximate inference algorithms which don't involve pseudo-points, and it would be nice to generalise to them also.

So the new approx_posterior function for the saturated VFE approximation would be something like

approx_posterior(VFE(u), fx, y)

and for the unsaturated would be something like

approx_posterior(VFE(u, qu), fx, y)

(or something a bit like that).

Test code examples in documentation

There are some example code snippets in the docs (particularly docs/src/concrete_features.md) which aren't run as doctests.

As noted here, some of these examples didn't even run, so it's probably worth converting them to doctests of some kind (perhaps hiding the output?).

[Question] is `sampleplot()` doing what it is supposed to?

Is this the intended behavior of

k = SqExponentialKernel()
f = GP(k)
sampleplot(f(rand(10)), 2; markersize=5)

Type stability for GPs

It seems like AbstractGPs is not always type stable in the way we'd expect when calling posterior:

using AbstractGPs

gp = GP(Matern32Kernel())

X1 = ColVecs(rand(1, 10))
Y1 = rand(10) 

gp_post = posterior(gp(X1, 0.01), Y1)

X2 = ColVecs(rand(1, 10))
Y2 = rand(10)

typeof(posterior(gp_post(X2, 0.01), Y2)) == typeof(gp_post) # returns false

Add API for (log) marginal likelihood approximation

AbstractGPs currently defines elbo, but other approximations (e.g. Laplace, EP) also provide approximations to the (log) marginal likelihood that can be used for hyperparameter optimisation. Would be great to have a unified API function for that (for VFE, SVGP, etc it could simply forward to elbo). What do you think? Finding a decent name seems the most complicated aspect of this:

approx_lml (short but a bit cryptic)?
approx_log_marginal_likelihood (explicit but so long)?
approx_log_marglik (the compromise that makes everyone unhappy)?
approx_log_evidence?

Can't use AbstractGPs with Turing

Hi,

Not sure if I'm using AbstractGPs the way it was intended.

I am able to extract the logpdf from AbstractGPs and use it to increment the log target density in Turing. But I am wondering if there is a simpler way to use AbstractGPs with Turing. For example, I feel inclined to do the following:

using Turing, AbstractGPs, KernelFunctions
import Random

# Define a kernel.
sqexpkernel(alpha::Real, rho::Real) = alpha^2 * transform(SqExponentialKernel(), 1/(rho*sqrt(2)))

@model GPRegression(y, X) = begin
    # Priors.
    alpha ~ LogNormal(0.0, 0.1)
    rho ~ LogNormal(0.0, 1.0)
    sigma ~ LogNormal(0.0, 1.0)
    
    # Realized covariance function
    kernel = sqexpkernel(alpha, rho)
    f ~ GP(kernel)
    
    # Sampling Distribution.
    y ~ f(X, sigma)
end;

Random.seed!(0)
y = randn(3)
X = randn(3, 1)
m = GPRegression(y, X)
@time chain = sample(m, NUTS(1000, 0.8), 1000)

But this raises the error:

ArgumentError: Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions.

Any comments/help would be appreciated.

Better error messages for AbstractGP

Rereading #89 it makes me think that the error messages for stuff like mean(f::AbstractGP) are awful (cannot iterate over crazy type, etc).
I would be in favor in returning an error message giving maybe a quick explanation and referring to read the docs!

Implement sequential conditioning

Plotting

Stheno has functionality to make plotting GPs in 1D easy. It would be great if someone could port this over.

@sharanry this might be a good thing to do to make plotting stuff for the readme / examples easier :)

Multi-output GPs

There are a couple of options here.

Multi-output GPs are just GPs

A multioutput GP can just be viewed as a single output GP on an extended input domain. Specifically, suppose we're interested in modelling functions $\mathcal{X} \to \mathbb{R}^N$. You can always rephrase this as a function $\mathcal{X} \times {1, ..., N} \to \mathbb{R}$. Consequently you can always re-write a multioutput GP as a single-output GP on an extended input domain.

Multi-output GPs are there own special thing

The other approach is to introduce a custom object here that really treats a multi-output GP as a separate thing, and introduce new API features / conventions to handle them.

Pros and Cons

There are a couple of advantages to the first approach: namely that there's no separate notion of a multi-output GP floating around the code base, just extra abstractions (probably just an extra AbstractVector type and some specifically "multi-output" kernels). This also means that things like pseudo-point approximations will just work with the multi-output models and be non-mean-field by default.

That being said, the second approach makes the multi-output structure really explicit, and it's often the case that you want to be able to get at this for computational reasons. However, as I mentioned we can just introduce some new abstractions for multi-output GPs if we go with the everything-is-just-a-GP approach, making this look slightly less like an advantage.

Overall I definitely favour the everything-is-a-GP approach because it fits nicely with our current way of doing things, and makes multi-output GPs trivial from an API perspective -- if you're comfortable with things like the ColVecs and RowVecs abstractions, multi-output GPs will be completely natural.

Combining sampleplot and plot

It's great that we have these plotting recipes separated for situations in which you only want one or the other, but quite often you really do want both together. I wonder whether it wouldn't make sense to add a third method that just does both at the same time?

Small error in the ESS example: Incorrect use of train/test data in the last block

This is really just a small error, but it might could actually be confusing to readers.

In the last block of code and plot in the example on Approximate Inference with ESS, there are two mistakes:

both scatterplots are called on x_train and y_train, so that the final plot does not actually show the test data, and
the posterior is actually computed with the whole dataset (x, y), which I assume should be (x_train, y_train).

I will open a very small pull request with the corrections to resolve this issue, so I guess this should be resolved very quickly.

Some usability suggestions

Taken from this issue in AugmentedGaussianProcesses.jl:

There seems to be no simple way of accessing observations in a PosteriorGP except via data.x and data.δ

As there is a standard API for creating a PosteriorGP it would make sense to also provide one for retrieving its contents. How about getX and gety?

rand(gp, x) as an alternative to rand(gp(x)) similar to mean, cov etc.
convert(Normal, gp, x) or convert(MvNormal, gp, X) would be convenient

As FiniteGP isa AbstractMvNormal the last point is probably not relevant.

Move test utilities inside src

I'm currently implementing the AbstractGPs API in Stheno and TemporalGPs. Unfortunately I can't presently use the test suite in this package to see whether I've implemented the interface correctly because the code lives in test. If we move it to its own module in src, I will be able to use it.

Relates to #104 in that it will help formalise the AbstractGPs API nicely.

ZeroMean{Float64} is not callable

So diligently using ColVecs 😄 I faced the following bug when trying to optimize kernel parameters. Here is the MWE

using AbstractGPs, KernelFunctions, Flux
X = rand(2, 10)
y = rand(10)
l = rand(2)
k = transform(SqExponentialKernel(), l)
Flux.gradient(Flux.params(k)) do
    m = GP(k)(KernelFunctions.ColVecs(X), 0.01)
    logpdf(m, y)
end

And get the error LoadError: MethodError: objects of type AbstractGPs.ZeroMean{Float64} are not callable

One additional note : KernelFunctions and ColVecs/RowVecs should be exported.

FiniteGP not exported

Is it expected that FiniteGP is not always exported? It seems like a pretty important type to have outside of the module.

AdvancedHMC example broken

For some time the AdvancedHMC example has been broken, see e.g. https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/runs/1639170780?check_suite_focus=true#step:5:56:

┌ Warning: failed to run `@example` block in src/examples/HMC.md:149-161
│ ```@example HMC
│ samples, stats = sample(
│     hamiltonian,
│     proposal,
│     initial_params,
│     n_samples,
│     adaptor,
│     n_adapts;
│     progress=false
│ )
│ samples_mat = reduce(hcat, samples)';
│ nothing #hide
│ ```
│   c.value = PosDefException: matrix is not positive definite; Cholesky factorization failed.
└ @ Documenter.Expanders ~/.julia/packages/Documenter/6vUwN/src/Expanders.jl:563

Gradients

It's not currently possible to reliably use eg. Zygote to propagate gradient information through this package. This is a consequence of KernelFunctions being somewhat up in the air at the minute. This issue should be addressed once KernelFunctions has figured out what it's doing about gradient info.

Mean function

Hi, I just played around with this.. it's awesome!

Some remarks / bugs:

When calling mean(f) on a GP object, I get an error message in the lines of:

MethodError: no method matching iterate(::AbstractGPs.PosteriorGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64},KernelFunctions.Matern32Kernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},KernelFunctions.ColVecs{Float64,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},SubArray{Float64,1,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false}},Array{Float64,1}}}})
mean(::typeof(identity), ::AbstractGPs.PosteriorGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64},KernelFunctions.Matern32Kernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},KernelFunctions.ColVecs{Float64,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},SubArray{Float64,1,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false}},Array{Float64,1}}}})@Statistics.jl:62
mean(::AbstractGPs.PosteriorGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64},KernelFunctions.Matern32Kernel},NamedTuple{(:α, :C, :x, :δ),Tuple{Array{Float64,1},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},KernelFunctions.ColVecs{Float64,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},SubArray{Float64,1,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false}},Array{Float64,1}}}})@Statistics.jl:44
top-level scope@Local: 1[inlined]

When calling mean(f, x) on a GP object for some x, then x has to be a vector. If the covariate space is multidimensional this has to be a vector of vectors. This should be written down somewhere as I didn't know how to use mean(f, x) on a GP on a multidimensional covariate space. Maybe it is, and I didn't see it? Anyways, maybe this function could also support Arrays?

Documentation link for stable is broken

It seems the following link in README is currently broken

https://juliagaussianprocesses.github.io/AbstractGPs.jl/stable

However, the link for the dev version works fine. Any idea why?

MethodError: no method matching bijector(::AbstractGPs.FiniteGP)

Hey all,

first of all, great package! I work a lot with Turing and now try to add GPs in order to estimate latent functions. That said, the simple example below breaks

using AbstractGPs, KernelFunctions
using Turing

X = 1:100
f = GP( Matern32Kernel() )
u = rand( f(X) )
λ = exp.(u)
y = rand.(Poisson.(λ))

@model function latent_gp_model(X, y)
    f  = GP(Matern32Kernel())
    u ~ f(X)
    λ = exp.(u)
    y .~ Poisson.(λ)
end
c = sample( latent_gp_model(X, y), NUTS(), 10)

MethodError: no method matching bijector(::AbstractGPs.FiniteGP)

As a workaround, I can define a struct that inherits from AbstractMvNormal and everything works fine:

struct MvNormalGP{T<:AbstractGPs.FiniteGP} <: AbstractMvNormal
    f::T
end
AbstractGPs.rand(rng::Random.AbstractRNG, f::MvNormalGP) = rand(rng, f.f)
AbstractGPs.logpdf(f::MvNormalGP, y::AbstractVector{<:Real}) = logpdf(f.f, y)

@model function latent_gp_model(X, y)
    f  = GP(Matern32Kernel())
    u ~ MvNormalGP( f(X) )
    λ = exp.(u)
    y .~ Poisson.(λ)
end
c = sample( latent_gp_model(X, y), NUTS(), 10)

Thats not a serious issue but, as a beginner to both Turing and AbstractGPs, it took me a while to figure out the problem. It is also somewhat inconsistent in the case of Stheno: With the recent PR, sampling of sparse latent GPs with Stheno works out of the box but, sampling from a non-sparse GP has the same issue as above since the implementation is based on AbstractGPs.

Do you think, it would be possible to derive GPs from AbstractMvNormal instead of ContinuousMultivariateDistribution (at least in the case of Gaussian likelihoods) ?

LatentGP API: support for input-specific likelihoods

#201 made me realise that we can't currently handle input-specific likelihoods in our LatentGP.

In particular, consider this line:

AbstractGPs.jl/src/latent_gp.jl

Line 29 in 83dc67d

(lgp::LatentGP)(x) = LatentFiniteGP(lgp.f(x, lgp.Σy), lgp.lik)

The likelihood passes straight through to the LatentFiniteGP, meaning that it never gets to know about x.
Were we to insist that the lik field of the LatentGP be possible to evaluate on a vector of inputs, and change the above line to something along the lines of

(lgp::LatentGP)(x) = LatentFiniteGP(lgp.f(x, lgp.Σy), lgp.lik(x))

this would not be a problem.

using AbstractGPs, KernelFunctions
x = rand(10)
y = rand(10)
k = SqExponentialKernel()
m = posterior(GP(k)(x, 0.01), y)
mean_and_cov_diag(m(x))