Comments (14)
There are many other ways to compute prediction intervals, but I think we should limit our focus here to CP. I also don't think I want to change the name of this package. One reason is that it also implements conformal classification, and those predictions are set-valued. That being said, PredictionIntervals.jl sounds like a good name for an umbrella-type package that is designed to generate prediction intervals (different approaches beyond CP) for any supervised model.
from conformalprediction.jl.
Yes, that is correct. Of course, if you're seeking guaranteed 90% coverage, so \alpha = 0.1, then you could always use the JK+ with \alpha = 0.05.
However, I wouldn't say the interpretation is that the JK-minimax is better. The JK+ at level \alpha often has close to level 1-\alpha coverage in practice, and provably so under stability conditions (as the paper shows). In practice, the JK-minimax is often overly conservative. So practically, I would favor the JK+ in most scenarios.
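For intuition: given the same set of LOO predictions and residuals, the JK-minimax interval always contains the JK+ interval at the same α, which is exactly why it tends to be conservative. A quick sketch with synthetic arrays (not fitted to any data):

```julia
using Statistics, Random

# With identical LOO predictions (ŷ_loo) and residuals (r_loo), the
# JK-minimax interval provably contains the JK+ interval, since
# ŷᵢ - rᵢ ≥ min(ŷ) - rᵢ and ŷᵢ + rᵢ ≤ max(ŷ) + rᵢ for every i.
Random.seed!(123)
ŷ_loo = randn(100); r_loo = abs.(randn(100)); α = 0.1

lb_jp  = quantile(ŷ_loo .- r_loo, α)        # JK+ lower
ub_jp  = quantile(ŷ_loo .+ r_loo, 1 - α)    # JK+ upper
err    = quantile(r_loo, 1 - α)
lb_jmm = minimum(ŷ_loo) - err               # JK-minmax lower
ub_jmm = maximum(ŷ_loo) + err               # JK-minmax upper

(lb_jmm <= lb_jp) && (ub_jmm >= ub_jp)      # always true
```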
Sure thing! I'll probably turn back to this later next week - happy to have a chat then. Obviously do also feel free to create a PR in the meantime
Here is some code for the Jackknife+ (and friends) Prediction intervals:
using Distributions, Plots, Random, Tables, GLMNet, MLJ;
n = 1_000; σ = 1.0;
X = [ones(n) sin.(1:n) cos.(n:-1:1) exp.(sin.(1:n))]
θ = [9.0; 7.0; 3.0; 21]
Noise = randn(MersenneTwister(49), n) # n rows
cef(x;θ=θ) = x*θ; # Conditional Expectation Function
skedastic(x;σ=σ) = σ; # Skedastic Function. Homoskedastic.
y = cef(X;θ=θ) + skedastic(X;σ=σ) .* Noise
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
#train, calibration, test = partition(eachindex(y[:,1]), 0.4, 0.4, shuffle=true, rng=444)
i_new = test[1] # index of new data point.
y_train = y[train]; X_train = X[train,:];
scfit(x,y) = GLMNet.glmnetcv(x, y, nlambda=1_000, alpha = 1.0) # Ridge/Lasso alpha = 0/1
scpr(m,x) = GLMNet.predict(m, x);
α = 0.05 # miscoverage rate α∈[0,1]
# compute in-sample residuals in the training data
m = scfit(X_train, y_train);
ŷ = scpr(m, X);
ŷ_new_is = ŷ[i_new];
res_is = y_train .- ŷ[train] # in-sample residuals.
# compute LOO residuals in the training data
res_LOO, ŷ_new_LOO = Float64[], Float64[];
for tt in train
println(tt)
#
ty = y_train[train .!= tt] # y_train leave out tt
tX = X_train[train .!= tt,:] # X_train leave out tt
#
tm = scfit(tX, ty) # fit on training data minus row==tt
tŷ = scpr(tm, X) # predict on dataset
#
push!(res_LOO, y[tt] - tŷ[tt]) # LOO residual
push!(ŷ_new_LOO, tŷ[i_new]) # LOO prediction
end
res_LOO
ŷ_new_LOO
"Naive PI: use in-sample residuals to estimate oos residuals"
"problem: in-sample residuals usually smaller than oos residuals"
err_Naive = quantile(abs.(res_is), 1 - α)
LB_Naive = ŷ_new_is - err_Naive
UB_Naive = ŷ_new_is + err_Naive
"Jackknife PI: use training LOO residuals to estimate oos residuals"
"problem: sensitive to instability"
err_LOO = quantile(abs.(res_LOO), 1 - α)
LB_J = ŷ_new_is - err_LOO
UB_J = ŷ_new_is + err_LOO
"Jackknife+ PI: use sample quantile of each ŷ_new_LOO[tt] adjusted by its own res_LOO[tt]"
LB_Jp = quantile(ŷ_new_LOO .- abs.(res_LOO), α)
UB_Jp = quantile(ŷ_new_LOO .+ abs.(res_LOO), 1 - α)
"Jackknife-minmax PI"
LB_Jmm = minimum(ŷ_new_LOO) - err_LOO
UB_Jmm = maximum(ŷ_new_LOO) + err_LOO
#
#v = ŷ_new_LOO - abs.(res_LOO);
#quantile(v,α) == -quantile(-v,1-α) # SANITY CHECK
LB_Naive, LB_J, LB_Jp, LB_Jmm
UB_Naive, UB_J, UB_Jp, UB_Jmm
#TODO:
"Split Conformal/Full Conformal/K-Fold CV+/K-fold cross-conformal"
@ablaom (I had trouble implementing in MLJ so I used GLMNet.jl)
The idea is pretty simple (as explained in the paper)
Naive prediction intervals (ignores overfitting):
@ryantibs this is prob a dumb question (I prob misunderstood the paper), but suppose I want a prediction interval that contains Y_{n+1} 90% of the time (alpha = .10).
Does Theorem 1 imply the 90% Jackknife+ PI will contain it at least 80% of the time?
Does Theorem 2 imply the 90% Jackknife-minmax PI will contain it at least 90% of the time?
Then would that mean Jackknife-minmax PIs are "better" than Jackknife+?
Thanks @ryantibs! I re-read the paper.
If we assume exchangeability, coverage for JK+ is >= 1 - 2α.
If we assume exchangeability & oos-stability, coverage for JK+ is >= 1 - α {+ a term that vanishes as n gets big}.
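Those two regimes are easy to probe in a small simulation. A sketch, substituting plain OLS (via `\`) for the lasso fit used in the code earlier in the thread, just for self-containment:

```julia
using LinearAlgebra, Statistics, Random

# JK+ interval at level α from leave-one-out fits (OLS; a sketch, not the pkg API)
function jkplus_interval(X, y, x_new, α)
    n = length(y)
    lo, hi = Float64[], Float64[]
    for i in 1:n
        idx = setdiff(1:n, i)
        β = X[idx, :] \ y[idx]              # leave-one-out fit
        r = abs(y[i] - dot(X[i, :], β))     # LOO residual
        ŷ = dot(x_new, β)                   # LOO prediction at x_new
        push!(lo, ŷ - r); push!(hi, ŷ + r)
    end
    quantile(lo, α), quantile(hi, 1 - α)
end

# Monte Carlo coverage: the assumption-free bound is ≥ 1 - 2α,
# but for a stable fitter like OLS, coverage is typically ≈ 1 - α.
Random.seed!(1)
α, n, d, reps = 0.1, 60, 3, 200
hits = count(1:reps) do _
    X, β = randn(n, d), randn(d)
    y = X * β + randn(n)
    x_new = randn(d)
    y_new = dot(x_new, β) + randn()
    L, U = jkplus_interval(X, y, x_new, α)
    L <= y_new <= U
end
hits / reps   # well above 1 - 2α = 0.8 in practice
```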
-
Are there analogous theoretical results that bound maximum coverage for these prediction intervals?
I've seen "valid PIs" defined -
IIUC what the paper calls "assumption-free theory" relies on the assumption of exchangeability. If we are predicting time-series data, this can be a very strong assumption...
Nice one @azev77 👍🏽 should be straight-forward to add this here. Happy to do that myself, but perhaps you'd prefer creating a PR so you'll show up as a contributor? (I'd like to turn to #5 first anyway and have some work to do this week on some other packages)
Edit: I've invited you as a collaborator and created a separate branch for this.
As a general comment, I would say that validity just means that the coverage is at least 1-\alpha. This just means you don't undercover. The upper bound is of course nice, and says that you don't overcover by "too much". Traditional conformal methods have this property.
-
We don't have an overcoverage bound for JK+.
-
"Assumption-free" is used to refer to the fact that there are no distributional assumptions on the joint distribution of (X,Y). But yes to your broader point, certainly I agree that i.i.d. (or exchangeable) sampling is key. None of the traditional conformal methods, JK+, or CV+ really apply outside of this setting. For time series you will have to look at extensions or adaptations of these techniques---there are by now several; take a look for example at our "beyond exchangeability" paper and references therein.
@pat-alt thanks for being so open to collaborating.
I have some other work I gotta finish before I look into PRs.
In the meantime, here is a table I made to try to summarize some of the PI methods in @ryantibs's JK+ paper:
(to do: implement heteroskedasticity-robust (sorta) PIs in the PNAS paper by @kwuthrich et al)
Notation:
I just tried to understand what the package is doing to compute Naive PIs for regression (NaiveConformalRegressor).
I'll point out a small warning regarding terminology.
The JK+ paper (linked above) calls naive those methods where the score is computed using in-sample residuals.
- This is naive bc we expect residuals outside the training data to be bigger...
The method this pkg calls naive uses oos residuals w/ outcomes in the calibration data.
- Probably different papers in the literature have different definitions for naive...
using MLJ
import Statistics: quantile
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
#Make Data
X, y = MLJ.table(randn(1_000, 2)), randn(1_000) # table input for the MLJ machine
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
Xcal = selectrows(X, calibration); ycal = y[calibration];
Xnew = selectrows(X, test); ynew = y[test];
#Fit training data
model = EvoTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=train)
#Get PIs manually w/o a pkg (NaiveConformalRegressor)
## 2: non-conformity scores w/ calibration data ##
ŷcal = MLJ.predict(mach, Xcal) # MLJ.predict(mach, rows=calibration)
scores_cal = @.(abs(ŷcal - ycal))
scores_cal = sort(scores_cal, rev=true) # sorted non-conformity scores
## 3: get PIs ##
α=0.05; # miscoverage α ∈ [0,1]
n = length(scores_cal)
p̂ = ceil(((n+1) * (1-α))) / n
p̂ = clamp(p̂, 0.0, 1.0)
q̂ = quantile(scores_cal, p̂)
ŷnew = MLJ.predict(mach, Xnew) # MLJ.predict(mach, rows=test)
PInew = map(ŷᵢ -> ("lower" => ŷᵢ - q̂, "upper" => ŷᵢ + q̂), ŷnew)
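The adjusted quantile step above is what delivers the marginal guarantee. A self-contained sketch (synthetic scores, not the package code) checking that the (n+1)-corrected quantile achieves at least 1 - α coverage when calibration scores and the new score are exchangeable:

```julia
using Statistics, Random

# Split-conformal guarantee: with exchangeable scores,
# P(score_new ≤ q̂) ≥ 1 - α, where q̂ uses the (n+1)(1-α)/n adjustment.
Random.seed!(2)
α, n, reps = 0.05, 400, 2_000
hits = count(1:reps) do _
    cal = abs.(randn(n))                         # calibration scores
    p̂ = clamp(ceil((n + 1) * (1 - α)) / n, 0, 1)
    q̂ = quantile(cal, p̂)                         # adjusted quantile
    abs(randn()) <= q̂                            # is the new score covered?
end
hits / reps   # ≳ 0.95
```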
Some comments (to consider maybe at some point):
- maybe useful to allow the user to specify their own score() function as input?
- maybe allow alternative cross-validation techniques (like in my JK+ code above)?
PS: before I let "scope creep" get me too carried away, my goals are to implement:
-Jackknife/JK+ methods
-the DCP methods in this PNAS paper
-possibly compare w/ prob prediction methods such as NGBoost.py
-hopefully the pkg will become easy to use so other authors can add their own method...
Thanks @azev77
You're right, that was a misnomer. What I had originally referred to as NaiveConformalRegressor is in fact a simple Inductive Conformal Regressor (also referred to as Split Conformal Regressor, because it relies on splitting the training data). Jackknife and related methods do not use data splitting and therefore fall into the category of what this tutorial calls transductive CP.
I've changed the code base (see #10) to incorporate that broad distinction: models of type InductiveConformalModel <: ConformalModel
rely on a separate calibration step involving a calibration set, while models of type TransductiveConformalModel <: ConformalModel
only need to be fitted on training data. I've made that distinction explicit for now.
I've also implemented Jackknife now (see #15). Adding other approaches is very straightforward from here. Authors/contributors just need to subtype the relevant ConformalModel (either inductive or transductive) and dispatch the scoring logic. If you want to have a look at the way I've now implemented Jackknife [here], you should see what I have in mind in terms of extensibility.
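For anyone following along, the dispatch pattern described here can be sketched roughly as follows (the abstract type names match the comment above; the concrete regressor and score function are illustrative placeholders, not the package's actual API):

```julia
# Broad distinction: inductive models need a calibration step,
# transductive models are fitted on training data only.
abstract type ConformalModel end
abstract type InductiveConformalModel    <: ConformalModel end
abstract type TransductiveConformalModel <: ConformalModel end

# A contributor adds a method by subtyping and dispatching the scoring logic
# (hypothetical example type and function):
struct MyJackknifeRegressor <: TransductiveConformalModel end

score(::MyJackknifeRegressor, y, ŷ) = abs.(y .- ŷ)  # absolute residuals
```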
I'll close this issue and related branch now, but feel free to continue discussion here.
Added multiple other approaches in #18 @azev77:
Regression:
- Inductive
- Naive Transductive
- Jackknife
- Jackknife+
- Jackknife-minmax
- CV+
- CV-minmax
Classification:
- Inductive (LABEL)
- Adaptive Inductive
Hi @pat-alt, I was actually hoping to add JK+ etc. I'm traveling (in Vienna rn), but wanna look into the code when back...
Great - will be good to have a second pair of eyes glance over it 🙏 safe travels!
I have a weird question about prediction interval terminology.
Are there other approaches to PIs besides conformal & JK+? @ryantibs @pat-alt @ablaom
What about packages such as ngboost.py which return an entire predicted distribution? What if we compute PIs implied by that predicted distribution, what are those PIs called?
I prob should've asked this question sooner, but is there any advantage to naming this pkg PredictionIntervals.jl & making it flexible enough to incorporate a variety of methods (including methods that haven't been thought of yet...)?