Comments (14)
There are many other ways to compute prediction intervals, but I think we should limit our focus here to CP. I also don't think I want to change the name of this package. One reason is that it also implements conformal classification, and those predictions are set-valued. That being said, PredictionIntervals.jl sounds like a good name for an umbrella-type package that is designed to generate prediction intervals (different approaches beyond CP) for any supervised model.
from conformalprediction.jl.
Yes, that is correct. Of course, if you're seeking guaranteed 90% coverage, so \alpha = 0.1, then you could always use the JK+ with \alpha = 0.05.
However, I wouldn't say the interpretation is that the JK-minimax is better. The JK+ at level \alpha often has close to level 1-\alpha coverage in practice, and provably so under stability conditions (as the paper shows). In practice, the JK-minimax is often overly conservative. So practically, I would favor the JK+ in most scenarios.
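For intuition: given the same set of LOO predictions and residuals, the JK-minimax interval always contains the JK+ interval at the same α, which is exactly why it tends to be conservative. A quick sketch with synthetic arrays (not fitted to any data):

```julia
using Statistics, Random

# With identical LOO predictions (ŷ_loo) and residuals (r_loo), the
# JK-minimax interval provably contains the JK+ interval, since
# ŷᵢ - rᵢ ≥ min(ŷ) - rᵢ and ŷᵢ + rᵢ ≤ max(ŷ) + rᵢ for every i.
Random.seed!(123)
ŷ_loo = randn(100); r_loo = abs.(randn(100)); α = 0.1

lb_jp  = quantile(ŷ_loo .- r_loo, α)        # JK+ lower
ub_jp  = quantile(ŷ_loo .+ r_loo, 1 - α)    # JK+ upper
err    = quantile(r_loo, 1 - α)
lb_jmm = minimum(ŷ_loo) - err               # JK-minmax lower
ub_jmm = maximum(ŷ_loo) + err               # JK-minmax upper

(lb_jmm <= lb_jp) && (ub_jmm >= ub_jp)      # always true
```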
Sure thing! I'll probably turn back to this later next week - happy to have a chat then. Obviously do also feel free to create a PR in the meantime
Here is some code for the Jackknife+ (and friends) Prediction intervals:
using Distributions, Plots, Random, Tables, GLMNet, MLJ;
n = 1_000; σ = 1.0;
X = [ones(n) sin.(1:n) cos.(n:-1:1) exp.(sin.(1:n))]
θ = [9.0; 7.0; 3.0; 21]
Noise = randn(MersenneTwister(49), n) # n rows
cef(x;θ=θ) = x*θ; # Conditional Expectation Function
skedastic(x;σ=σ) = σ; # Skedastic Function. Homoskedastic.
y = cef(X;θ=θ) + skedastic(X;σ=σ) .* Noise
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
#train, calibration, test = partition(eachindex(y[:,1]), 0.4, 0.4, shuffle=true, rng=444)
i_new = test[1] # index of new data point.
y_train = y[train]; X_train = X[train,:];
scfit(x,y) = GLMNet.glmnetcv(x, y, nlambda=1_000, alpha = 1.0) # Ridge/Lasso alpha = 0/1
scpr(m,x) = GLMNet.predict(m, x);
α = 0.05 # miscoverage rate α∈[0,1]
# compute in-sample residuals in the training data
m = scfit(X_train, y_train);
ŷ = scpr(m, X);
ŷ_new_is = ŷ[i_new];
res_is = y_train .- ŷ[train] # in-sample residuals.
# compute LOO residuals in the training data
res_LOO, ŷ_new_LOO = Float64[], Float64[];
for tt in train
println(tt)
#
ty = y_train[train .!= tt] # y_train leave out tt
tX = X_train[train .!= tt,:] # X_train leave out tt
#
tm = scfit(tX, ty) # fit on training data minus row==tt
tŷ = scpr(tm, X) # predict on dataset
#
push!(res_LOO, y[tt] - tŷ[tt]) # LOO residual
push!(ŷ_new_LOO, tŷ[i_new]) # LOO prediction
end
res_LOO
ŷ_new_LOO
"Naive PI: use in-sample residuals to estimate oos residuals"
"problem: in-sample residuals usually smaller than oos residuals"
err_Naive = quantile(abs.(res_is), 1 - α)
LB_Naive = ŷ_new_is - err_Naive
UB_Naive = ŷ_new_is + err_Naive
"Jackknife PI: use training LOO residuals to estimate oos residuals"
"problem: sensitive to instability"
err_LOO = quantile(abs.(res_LOO), 1 - α)
LB_J = ŷ_new_is - err_LOO
UB_J = ŷ_new_is + err_LOO
"Jackknife+ PI: use sample quantile of each ŷ_new_LOO[tt] adjusted by its own res_LOO[tt]"
LB_Jp = quantile(ŷ_new_LOO .- abs.(res_LOO), α)
UB_Jp = quantile(ŷ_new_LOO .+ abs.(res_LOO), 1 - α)
"Jackknife-minmax PI"
LB_Jmm = minimum(ŷ_new_LOO) - err_LOO
UB_Jmm = maximum(ŷ_new_LOO) + err_LOO
#
#v = ŷ_new_LOO - abs.(res_LOO);
#quantile(v,α) == -quantile(-v,1-α) # SANITY CHECK
LB_Naive, LB_J, LB_Jp, LB_Jmm
UB_Naive, UB_J, UB_Jp, UB_Jmm
#TODO:
"Split Conformal/Full Conformal/K-Fold CV+/K-fold cross-conformal"
@ablaom (I had trouble implementing in MLJ so I used GLMNet.jl)
The idea is pretty simple (as explained in the paper)
Naive prediction intervals (ignores overfitting):
@ryantibs this is prob a dumb question (I prob misunderstood the paper), but suppose I want a prediction interval that contains Y_{n+1} 90% of the time (alpha = .10).
Does Theorem 1 imply the 90% Jackknife+ PI will contain it at least 80% of the time?
Does Theorem 2 imply the 90% Jackknife-minmax PI will contain it at least 90% of the time?
Then would that mean Jackknife-minmax PIs are "better" than Jackknife+?
Thanks @ryantibs! I re-read the paper.
If we assume exchangeability, coverage for JK+ is >= 1 - 2α.
If we assume exchangeability & oos-stability, coverage for JK+ is >= 1 - α {+ a term that vanishes as n gets big}.
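Those two regimes are easy to probe in a small simulation. A sketch, substituting plain OLS (via `\`) for the lasso fit used in the code earlier in the thread, just for self-containment:

```julia
using LinearAlgebra, Statistics, Random

# JK+ interval at level α from leave-one-out fits (OLS; a sketch, not the pkg API)
function jkplus_interval(X, y, x_new, α)
    n = length(y)
    lo, hi = Float64[], Float64[]
    for i in 1:n
        idx = setdiff(1:n, i)
        β = X[idx, :] \ y[idx]              # leave-one-out fit
        r = abs(y[i] - dot(X[i, :], β))     # LOO residual
        ŷ = dot(x_new, β)                   # LOO prediction at x_new
        push!(lo, ŷ - r); push!(hi, ŷ + r)
    end
    quantile(lo, α), quantile(hi, 1 - α)
end

# Monte Carlo coverage: the assumption-free bound is ≥ 1 - 2α,
# but for a stable fitter like OLS, coverage is typically ≈ 1 - α.
Random.seed!(1)
α, n, d, reps = 0.1, 60, 3, 200
hits = count(1:reps) do _
    X, β = randn(n, d), randn(d)
    y = X * β + randn(n)
    x_new = randn(d)
    y_new = dot(x_new, β) + randn()
    L, U = jkplus_interval(X, y, x_new, α)
    L <= y_new <= U
end
hits / reps   # well above 1 - 2α = 0.8 in practice
```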
-
Are there analogous theoretical results that bound maximum coverage for these prediction intervals?
I've seen "valid PIs" defined -
IIUC what the paper calls "assumption-free theory" relies on the assumption of exchangeability. If we are predicting time-series data, this can be a very strong assumption...
Nice one @azev77 👍🏽 should be straight-forward to add this here. Happy to do that myself, but perhaps you'd prefer creating a PR so you'll show up as a contributor? (I'd like to turn to #5 first anyway and have some work to do this week on some other packages)
Edit: I've invited you as a collaborator and created a separate branch for this.
As a general comment, I would say that validity just means that the coverage is at least 1-\alpha. This just means you don't undercover. The upper bound is of course nice, and says that you don't overcover by "too much". Traditional conformal methods have this property.
-
We don't have an overcoverage bound for JK+.
-
"Assumption-free" is used to refer to the fact that there are no distributional assumptions on the joint distribution of (X,Y). But yes to your broader point, certainly I agree that i.i.d. (or exchangeable) sampling is key. None of the traditional conformal methods, JK+, or CV+ really apply outside of this setting. For time series you will have to look at extensions or adaptations of these techniques---there are by now several; take a look for example at our "beyond exchangeability" paper and references therein.
@pat-alt thanks for being so open to collaborating.
I have some other work I gotta finish before I look into PRs.
In the meantime, here is a table I made to try to summarize some of the PI methods in @ryantibs's JK+ paper:
(to do: implement heteroskedasticity-robust (sorta) PIs in the PNAS paper by @kwuthrich et al)
Notation:
I just tried to understand what the package is doing to compute Naive PIs for regression (NaiveConformalRegressor).
I'll point out a small warning regarding terminology.
The JK+ paper (linked above) calls naive those methods where the score is computed using in-sample residuals.
- This is naive bc we expect residuals outside the training data to be bigger...
The method this pkg calls naive uses oos residuals w/ outcomes in the calibration data.
- Probably different papers in the literature have different definitions for naive...
using MLJ
import Statistics: quantile
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
#Make Data
X, y = MLJ.table(randn(1_000, 2)), randn(1_000) # table input for the MLJ machine
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
Xcal = selectrows(X, calibration); ycal = y[calibration];
Xnew = selectrows(X, test); ynew = y[test];
#Fit training data
model = EvoTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=train)
#Get PIs manually w/o a pkg (NaiveConformalRegressor)
## 2: non-conformity scores w/ calibration data ##
ŷcal = MLJ.predict(mach, Xcal) # MLJ.predict(mach, rows=calibration)
scores_cal = @.(abs(ŷcal - ycal))
scores_cal = sort(scores_cal, rev=true) # sorted non-conformity scores
## 3: get PIs ##
α=0.05; # miscoverage α ∈ [0,1]
n = length(scores_cal)
p̂ = ceil(((n+1) * (1-α))) / n
p̂ = clamp(p̂, 0.0, 1.0)
q̂ = quantile(scores_cal, p̂)
ŷnew = MLJ.predict(mach, Xnew) # MLJ.predict(mach, rows=test)
PInew = map(ŷᵢ -> ("lower" => ŷᵢ - q̂, "upper" => ŷᵢ + q̂), ŷnew)
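The adjusted quantile step above is what delivers the marginal guarantee. A self-contained sketch (synthetic scores, not the package code) checking that the (n+1)-corrected quantile achieves at least 1 - α coverage when calibration scores and the new score are exchangeable:

```julia
using Statistics, Random

# Split-conformal guarantee: with exchangeable scores,
# P(score_new ≤ q̂) ≥ 1 - α, where q̂ uses the (n+1)(1-α)/n adjustment.
Random.seed!(2)
α, n, reps = 0.05, 400, 2_000
hits = count(1:reps) do _
    cal = abs.(randn(n))                         # calibration scores
    p̂ = clamp(ceil((n + 1) * (1 - α)) / n, 0, 1)
    q̂ = quantile(cal, p̂)                         # adjusted quantile
    abs(randn()) <= q̂                            # is the new score covered?
end
hits / reps   # ≳ 0.95
```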
Some comments (to consider maybe at some point):
- maybe useful to allow the user to specify their own score() function as input?
- maybe allow alternative cross-validation techniques (like in my JK+ code above)?
PS: before I let "scope creep" get me too carried away, my goals are to implement:
-Jackknife/JK+ methods
-the DCP methods in this PNAS paper
-possibly compare w/ prob prediction methods such as NGBoost.py
-hopefully the pkg will become easy to use so other authors can add their own method...
Thanks @azev77
You're right, that was a misnomer. What I had originally referred to as NaiveConformalRegressor is in fact a simple Inductive Conformal Regressor (also referred to as Split Conformal Regressor, because it relies on splitting the training data). Jackknife and related methods do not use data splitting and therefore fall into the category of what this tutorial calls transductive CP.
I've changed the code base (see #10) to incorporate that broad distinction: models of type InductiveConformalModel <: ConformalModel
rely on a separate calibration step involving a calibration set, while models of type TransductiveConformalModel <: ConformalModel
only need to be fitted on training data. I've made that distinction explicit for now.
I've also implemented Jackknife now (see #15). Adding other approaches is very straightforward from here. Authors/contributors just need to subtype the relevant ConformalModel (either inductive or transductive) and dispatch the scoring logic. If you want to have a look at the way I've now implemented Jackknife [here], you should see what I have in mind in terms of extensibility.
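For anyone following along, the dispatch pattern described here can be sketched roughly as follows (the abstract type names match the comment above; the concrete regressor and score function are illustrative placeholders, not the package's actual API):

```julia
# Broad distinction: inductive models need a calibration step,
# transductive models are fitted on training data only.
abstract type ConformalModel end
abstract type InductiveConformalModel    <: ConformalModel end
abstract type TransductiveConformalModel <: ConformalModel end

# A contributor adds a method by subtyping and dispatching the scoring logic
# (hypothetical example type and function):
struct MyJackknifeRegressor <: TransductiveConformalModel end

score(::MyJackknifeRegressor, y, ŷ) = abs.(y .- ŷ)  # absolute residuals
```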
I'll close this issue and related branch now, but feel free to continue discussion here.
Added multiple other approaches in #18 @azev77:
Regression:
- Inductive
- Naive Transductive
- Jackknife
- Jackknife+
- Jackknife-minmax
- CV+
- CV-minmax
Classification:
- Inductive (LABEL)
- Adaptive Inductive
Hi @pat-alt, I was actually hoping to add JK+ etc. I'm traveling (in Vienna rn), but wanna look into the code when back...
Great - will be good to have a second pair of eyes glance over it 🙏 safe travels!
I have a weird question about prediction interval terminology.
Are there other approaches to PIs besides conformal & JK+? @ryantibs @pat-alt @ablaom
What about packages such as ngboost.py which return an entire predicted distribution? What if we compute PIs implied by that predicted distribution, what are those PIs called?
I prob should've asked this question sooner, but is there any advantage to naming this pkg PredictionIntervals.jl & making it flexible enough to incorporate a variety of methods (including methods that haven't been thought of yet...)?