Add support for ConfTr (juliatrustworthyai/conformalprediction.jl)

Comments (3)

davidstutz commented on July 2, 2024

Re:

1. You can find a reference implementation for Equation (5) here.
2. The reference implementation for that is here, but it does indeed depend on the scale. That's what the temperature term $T$ is for: $\sigma((E_\theta(x, k) - \tau)/T)$. Also, you can use the log-probabilities $E_\theta(x, k) = \log \pi_{\theta,k}(x)$, which work a bit better in practice.
3. Empty sets can be penalized, but this is generally not necessary. Basically, as long as there is one true label for each example and $\alpha$ is reasonably low, the majority of prediction sets will contain at least the true label (and so will not be empty). This is mainly a result of the simple conformity score (for other conformal predictors this can be different). Beyond that, you are of course free to penalize empty sets, but it is generally not required to learn good classifiers.
4. Gradients with respect to what is the question. Generally, the gradient is not a problem as long as the sorting is fixed. The key is getting gradients through the sorting, and this is what the smooth sorter is for.
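The temperature-scaled soft assignment from the second point can be sketched in a few lines. This is only an illustration in plain Python, not the reference implementation linked above: the function names, the example probabilities, the threshold `tau`, the temperature `0.1`, and the hinge form `max(0, sum_k C_k - kappa)` with `kappa = 1` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_set(scores, tau, temperature):
    # Soft membership sigma((E - tau) / T) for each class k.
    return [sigmoid((e - tau) / temperature) for e in scores]

def smooth_size_loss(scores, tau, temperature, kappa=1.0):
    # Hinge on the soft set size: max(0, sum_k C_k - kappa) (assumed form).
    size = sum(soft_set(scores, tau, temperature))
    return max(0.0, size - kappa)

# Hypothetical softmax output for one example with three classes.
probs = [0.7, 0.2, 0.1]
# Log-probabilities as conformity scores E(x, k) = log pi_k(x).
log_probs = [math.log(p) for p in probs]
# Hypothetical threshold, expressed in log-probability space.
tau = math.log(0.15)

soft = soft_set(log_probs, tau, temperature=0.1)
loss = smooth_size_loss(log_probs, tau, temperature=0.1)
print([round(c, 3) for c in soft], round(loss, 3))
```

With a small temperature the soft memberships move close to 0 or 1, so the soft set size approaches the hard set size; with `temperature=1.0` the same scores give much less decisive memberships.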

Hope that helps. If I am slow to respond on here, feel free to send me an email to follow up. I am always curious to see what people do with conformal training, especially as I had some follow-up ideas but couldn't really pursue them.

from conformalprediction.jl.

pat-alt commented on July 2, 2024

Some questions that have come up so far:

1. Is the Dirac delta really supposed to be an indicator function? See Equation (5) on page 5. Maybe I'm just not familiar with this notation.
2. Doesn't the smooth size loss depend a lot on the scale of the (non-)conformity scores? For $E_{\theta}(x,k)=\pi_{\theta,k}(x)\in[0,1]$, for example, we have that $\sigma(E_{\theta}(x,k) - \tau) \in [0.27,0.73]$. We can use temperature scaling, but can we really speak of 'probabilities' that labels are assigned to the set?
3. More on the smooth size loss: what about empty sets? Shouldn't they be penalised at least as heavily as complete sets?
   a. We could just penalise these cases as $K - \kappa$, that is, the maximum set size minus the target set size (1).
   b. Perhaps even better: penalise $\sum(1-C) - \kappa$, that is, the total sum of probabilities that labels are not assigned to $C$.
4. As for the smooth quantile computation, it seems that Zygote.jl's AD actually lets me compute gradients as long as I sort the values beforehand (see this answer on SO). Is this surprising?
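The bounds quoted in the second question can be checked numerically. The sketch below assumes that both the score $E_{\theta}(x,k)$ and the threshold $\tau$ lie in $[0,1]$, so that their difference lies in $[-1,1]$; the helper `soft_range` and the temperature `0.1` are hypothetical and only serve the illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_range(temperature):
    # E and tau are both assumed to lie in [0, 1], so E - tau lies in [-1, 1].
    return sigmoid(-1.0 / temperature), sigmoid(1.0 / temperature)

lo_1, hi_1 = soft_range(1.0)    # no temperature scaling
lo_01, hi_01 = soft_range(0.1)  # hypothetical temperature T = 0.1
print(round(lo_1, 2), round(hi_1, 2))
print(lo_01, hi_01)
```

Without scaling, the soft memberships are confined to roughly $[0.27, 0.73]$, whereas dividing by a small temperature stretches the attainable range to almost the full unit interval.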

@davidstutz would much appreciate your thoughts, if you get the chance. This is still early stages here, so there's absolutely no rush. Amazing paper by the way!!

pat-alt commented on July 2, 2024

Wow this was quick, thanks a lot 🙏

That all makes sense. Regarding the quantile computation, thanks for the clarification. For my current use case, I just need to differentiate with respect to a conformal model that has already been calibrated, but I see now why you need information about the sorting itself for training.
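To make the point about fixed sorting concrete, here is a small finite-difference sketch in plain Python. It is not code from ConformalPrediction.jl or from the paper: the calibration scores, the value of `alpha`, and the index rule inside `threshold` are assumptions chosen for illustration.

```python
import math

def threshold(scores, alpha):
    # Hard empirical quantile: sort, then pick one element.
    # The index rule below is an assumed convention for this example.
    ordered = sorted(scores)
    idx = max(0, math.floor(alpha * (len(scores) + 1)) - 1)
    return ordered[idx]

def finite_diff_grad(scores, alpha, eps=1e-6):
    # Approximate d tau / d score_i by bumping one score at a time.
    base = threshold(scores, alpha)
    grad = []
    for i in range(len(scores)):
        bumped = list(scores)
        bumped[i] += eps
        grad.append((threshold(bumped, alpha) - base) / eps)
    return grad

# Hypothetical calibration scores.
scores = [0.9, 0.3, 0.6, 0.1, 0.8]
tau = threshold(scores, alpha=0.4)
grad = finite_diff_grad(scores, alpha=0.4)
print(tau, [round(g, 3) for g in grad])
```

As long as the perturbation does not change the ordering, the threshold is simply one of the scores, so its gradient is one-hot: only the selected score receives a signal. That is consistent with AD working once the values are sorted, and with the reply's point that information about the sorting itself has to come from something like a smooth sorter.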

Thanks again for being responsive!
