
Comments (3)

davidstutz commented on June 20, 2024

Re:

  1. You can find a reference implementation for Equation (5) here.
  2. A reference implementation for that is here, but the loss does indeed depend on the scale of the scores. That's what the temperature term $T$ is for: $\sigma((E_\theta(x, k) - \tau)/T)$. You can also use the log-probabilities $E_\theta(x, k) = \log \pi_{\theta,k}(x)$, which work a bit better in practice.
  3. They can be penalized, but this is generally not necessary. As long as there is one true label for each example and $\alpha$ is reasonably low, the majority of prediction sets will contain at least the true label (and so will not be empty). This is mainly a result of the simple conformity score (for other conformal predictors this can be different). You are of course free to penalize empty sets; my point is only that doing so is generally not required to learn good classifiers.
  4. The question is: gradients with respect to what? Generally, the gradient is not a problem as long as the sorting is fixed. The key is getting gradients through the sorting, and this is what the smooth sorter is for.
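
To make the temperature point in item 2 concrete, here is a minimal sketch in plain Python. It is not the reference implementation linked above. The function names (`soft_set`, `smooth_size_loss`), the example probabilities, the threshold, and the hinge form $\max(0, \sum_k C_k - \kappa)$ of the size loss are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_set(scores, tau, temperature=1.0):
    """Soft membership sigma((E - tau) / T) for each class score E."""
    return [sigmoid((e - tau) / temperature) for e in scores]

def smooth_size_loss(scores, tau, temperature=1.0, kappa=1.0):
    """Assumed hinge form: max(0, sum of soft memberships - kappa)."""
    return max(0.0, sum(soft_set(scores, tau, temperature)) - kappa)

# Hypothetical softmax output for one example and a hypothetical threshold.
probs = [0.70, 0.20, 0.10]
tau = 0.5

# With scores and threshold both in [0, 1] and T = 1, memberships cannot
# leave [sigmoid(-1), sigmoid(1)], so no class is clearly in or out.
bounds = (sigmoid(-1.0), sigmoid(1.0))
loose = soft_set(probs, tau, temperature=1.0)

# A smaller temperature stretches the same differences towards 0 and 1.
sharp = soft_set(probs, tau, temperature=0.1)

# Log-probabilities spread the scores over a wider range to begin with.
log_sharp = soft_set([math.log(p) for p in probs], math.log(tau), temperature=0.1)

print(bounds, loose, sharp, log_sharp)
print(smooth_size_loss(probs, tau, 1.0), smooth_size_loss(probs, tau, 0.1))
```

In this sketch, lowering `temperature` is what moves the soft memberships close to hard 0/1 assignments; the value of $T$ that works in practice depends on the score scale and has to be tuned.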

Hope that helps. If I am slow to respond on here, feel free to follow up by email. I am always curious to see what people do with conformal training, especially as I had some follow-up ideas but couldn't really pursue them.

from conformalprediction.jl.

pat-alt commented on June 20, 2024

Some questions that have come up so far:

  1. Is the Dirac delta really supposed to be an indicator function? See Equation (5) on page 5. Maybe I'm just not familiar with this notation.
  2. Doesn't the smooth size loss depend a lot on the scale of the (non-)conformity scores? For $E_{\theta}(x,k)=\pi_{\theta,k}(x)\in[0,1]$, for example, we have that $\sigma(E_{\theta}(x,k) - \tau) \in [0.27,0.73]$. We can use temperature scaling, but can we really speak of 'probabilities' that labels are assigned to the set?
  3. More on smooth size loss: What about empty sets? Shouldn't they be penalised at least as heavily as complete sets?
    a. Could just penalise these cases as $K - \kappa$, that is the maximum set size minus the target set size (1).
    b. Perhaps even better: penalise $\sum(1-C) - \kappa$, that is the total sum of probabilities that labels are not assigned to $C$.
  4. As for the smooth quantile computation, it seems that Zygote.jl's AD actually lets me compute gradients as long as I sort the values beforehand (see this answer on Stack Overflow). Is this surprising?
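
A small sketch of what question 4 is getting at, written in plain Python rather than Julia/Zygote. Once the ordering of the scores is held fixed, the empirical quantile is just a selection of one entry, so its gradient with respect to the scores is one-hot. The quantile convention (the $\lfloor \alpha (n+1) \rfloor$-th smallest score), the helper names, and the example scores are assumptions for illustration only.

```python
import math

def quantile_index(n, alpha):
    # Assumed convention: the floor(alpha * (n + 1))-th smallest score,
    # clamped to a valid position and converted to a 0-based index.
    k = math.floor(alpha * (n + 1))
    return min(max(k, 1), n) - 1

def hard_quantile(scores, alpha):
    """Return the quantile value and the original index it was picked from."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    picked = order[quantile_index(len(scores), alpha)]
    return scores[picked], picked

def finite_difference_grad(scores, alpha, eps=1e-6):
    """Numerical gradient of the quantile with respect to each score."""
    base, _ = hard_quantile(scores, alpha)
    grad = []
    for i in range(len(scores)):
        bumped = list(scores)
        bumped[i] += eps  # small enough that the ordering does not change
        value, _ = hard_quantile(bumped, alpha)
        grad.append((value - base) / eps)
    return grad

# Hypothetical calibration scores.
scores = [0.62, 0.15, 0.91, 0.40, 0.77]
tau, picked = hard_quantile(scores, alpha=0.5)
grad = finite_difference_grad(scores, alpha=0.5)

print(tau, picked)
print(grad)  # non-zero only at the position of the selected score
```

The gradient exists, but it only flows to the single selected score and says nothing about how the ordering itself would change, which is the gap a smooth sorting operation is meant to fill.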

@davidstutz I would much appreciate your thoughts, if you get the chance. This is still in the early stages, so there's absolutely no rush. Amazing paper by the way!!


pat-alt commented on June 20, 2024

Wow this was quick, thanks a lot 🙏

That all makes sense. Regarding the quantile computation, thanks for the clarification. For my current use case, I just need to differentiate with respect to a conformal model that has already been calibrated, but I see now why you need information about the sorting itself for training.

Thanks again for being responsive!

