Comments (7)

eamid commented on May 6, 2024

The loss you define by removing the last two terms might work. However, it won't be a proper loss function. That is, for a given example, your predicted probabilities are going to be proportional to the label probabilities to the power t1. This is called the escort distribution, and we discuss it more thoroughly in another paper (see Section 5): http://proceedings.mlr.press/v89/amid19a/amid19a.pdf
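
To see the improperness concretely, here is a minimal numpy sketch (the helper names log_t, truncated_loss, and escort are ad hoc, not from the repo). It randomly searches the probability simplex for the minimizer of the truncated loss at a fixed soft-label vector, and prints it next to the labels and the two escort-style candidates y^t1 and y^(1/t1), both renormalized, so the power convention can be read off:

    import numpy as np

    def log_t(u, t):
        # Tempered logarithm: log_t(u) = (u^(1-t) - 1) / (1 - t); log_1 = log.
        return np.log(u) if t == 1.0 else (u ** (1.0 - t) - 1.0) / (1.0 - t)

    def truncated_loss(y, p, t):
        # The loss with the last two (Bregman) terms removed.
        return np.sum(y * (log_t(y, t) - log_t(p, t)))

    def escort(y, a):
        # Power-transformed ("escort") distribution y^a, renormalized.
        q = y ** a
        return q / q.sum()

    rng = np.random.default_rng(0)
    t1 = 0.5
    y = np.array([0.7, 0.2, 0.1])      # soft labels for one example

    # Crude random search over the probability simplex.
    best = np.full(3, 1.0 / 3.0)
    best_loss = truncated_loss(y, best, t1)
    for _ in range(50_000):
        p = rng.dirichlet(np.ones(3))
        lp = truncated_loss(y, p, t1)
        if lp < best_loss:
            best, best_loss = p, lp

    print("minimizer    ~", best)       # != y, so the loss is not proper
    print("labels        ", y)
    print("escort y^t1   ", escort(y, t1))
    print("escort y^(1/t1)", escort(y, 1.0 / t1))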

Basically, our motivation for using Bregman divergences is that they ensure the loss is proper: not only do you predict the true class, but also the correct probabilities in expectation.

eamid commented on May 6, 2024

Yes, this is the same as the KL divergence KL(y, \hat{y}) between y and \hat{y} = softmax(\hat{a}). You will need to plug the softmax term into the definition of the KL divergence. Also log_t = log and exp_t = exp when t = 1.
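
As a quick numerical check of that substitution (a minimal numpy sketch; variable names are ad hoc), plugging \hat{y} = softmax(\hat{a}) into KL(y, \hat{y}) and using \sum_i y_i = 1 gives \sum_i y_i (log y_i - \hat{a}_i) + log \sum_j exp(\hat{a}_j):

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max())            # stabilized softmax
        return e / e.sum()

    y = np.array([0.1, 0.7, 0.2])          # label probabilities
    a_hat = np.array([0.5, 1.5, -1.0])     # activations (logits)
    y_hat = softmax(a_hat)

    # KL(y, y_hat) computed directly.
    kl = np.sum(y * np.log(y / y_hat))

    # Same quantity with the softmax plugged in and expanded.
    expanded = np.sum(y * (np.log(y) - a_hat)) + np.log(np.exp(a_hat).sum())

    print(kl, expanded)   # equal up to floating point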

hihunjin commented on May 6, 2024

My sentence above is about the KL divergence.
But my question is:
why did you use the Bregman divergence?
The two are somewhat different, even though the Bregman divergence is called the "generalized KL divergence".
[equation image from the paper: the generalized (tempered) KL divergence]

[equation image from the paper: the KL divergence written in terms of labels y and activations \hat{a}]

Note: the two expressions above are from page 5 of your paper.

eamid commented on May 6, 2024

Bregman divergence is a broader class of distance measures that includes KL divergence as a special case: https://en.wikipedia.org/wiki/Bregman_divergence
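
For reference, the standard definition from that link: the Bregman divergence generated by a strictly convex function F is

    \Delta_F(y, \hat{y}) = F(y) - F(\hat{y}) - <\nabla F(\hat{y}), y - \hat{y}>,

and the KL divergence is the special case F(y) = \sum_i (y_i log y_i - y_i).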

Our "generalized KL divergence" belongs to the class of Bregman divergence and also includes KL divergence (when t=1). The first expression is the definition of our generalized KL. If you set t=1, you recover KL (since log_t = log when t=1). The second expression is KL divergence in terms of labels y and logits (i.e. activations) \hat{a} (plug in \hat{y} = softmax(\hat{a}).

hihunjin commented on May 6, 2024

I see. Yes, thanks for the clarification.
Then my question is:
Why did you NOT use the KL divergence with exp_t and log_t?
[equation image: the KL divergence written with log_t, i.e. \sum_i y_i (log_t y_i - log_t \hat{y}_i)]
Sorry if this bothers you.

I think you did:
define log_t -> derive F_t -> use the Bregman divergence for the loss.
Why did you not do:
define log_t -> use the KL divergence (with log replaced by log_t) for the loss?

eamid commented on May 6, 2024

The KL divergence loss is a special case of the bi-tempered loss when t1 = 1. If you also set t2 = 1, you recover the vanilla softmax function. So basically t1 = t2 = 1 gives the softmax cross entropy loss: softmax_cross_entropy_with_logits
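
As a sanity check of this reduction, a minimal numpy sketch (helper names are ad hoc; only the t2 = 1 case is covered, where the tempered softmax reduces to the ordinary softmax):

    import numpy as np

    def log_t(u, t):
        # Tempered logarithm; log_1 = log.
        return np.log(u) if t == 1.0 else (u ** (1.0 - t) - 1.0) / (1.0 - t)

    def softmax(a):
        e = np.exp(a - a.max())
        return e / e.sum()

    def bi_tempered(y, a_hat, t1):
        # Bi-tempered loss with t2 fixed to 1 (ordinary softmax);
        # for t2 != 1 the repo computes the normalizer iteratively.
        p = softmax(a_hat)
        return np.sum(y * (log_t(y + 1e-10, t1) - log_t(p, t1))
                      - (y ** (2.0 - t1) - p ** (2.0 - t1)) / (2.0 - t1))

    y = np.array([0.0, 1.0, 0.0])            # one-hot labels
    a = np.array([0.5, 1.5, -1.0])           # activations

    print(bi_tempered(y, a, 1.0))            # t1 = t2 = 1
    print(-np.sum(y * np.log(softmax(a))))   # softmax cross entropy: same value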

hihunjin commented on May 6, 2024

The code shows:

        # Tempered softmax of the activations, with temperature t2.
        probabilities = tempered_softmax(activations, t2, num_iters)
        # Tempered Bregman divergence between labels and probabilities:
        # y * (log_t y - log_t p) - (y^(2-t1) - p^(2-t1)) / (2 - t1).
        loss_values = tf.multiply(
            labels,
            log_t(labels + 1e-10, t1) -
            log_t(probabilities, t1)) - 1.0 / (2.0 - t1) * (
                tf.pow(labels, 2.0 - t1) - tf.pow(probabilities, 2.0 - t1))

How about the following loss:

# hihunjin1 loss
        probabilities = tempered_softmax(activations, t2, num_iters)
        loss_values = tf.multiply(
            labels,
            log_t(labels + 1e-10, t1) -
            log_t(probabilities, t1))

Will this hihunjin1 loss diverge? I think this one is not the softmax cross entropy loss with logits.

# hihunjin2 loss
        probabilities = tempered_softmax(activations, t2, num_iters)
        loss_values = tf.multiply(
            labels,
            log_t(labels + 1e-10, t1) - probabilities
            - compute_normalization(activations, t2, num_iters))

I also want to ask whether this hihunjin2 loss is fine.
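
A definitional note that may help in comparing these snippets: exp_t and log_t are inverse functions, and the tempered softmax is exp_t(activations - compute_normalization(activations, t2, num_iters), t2), so log_t(probabilities, t2) = activations - compute_normalization(activations, t2, num_iters). That substitution is exact only when the same temperature is used in both places, i.e. when t1 = t2.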
