Comments (7)
The loss you define by removing the last two terms might work. However, it won't be a proper loss function. That is, for a given example, your predicted probabilities are going to be proportional to the label probabilities raised to the power t1. This is called the escort distribution, and we discuss it more thoroughly in another paper (see Section 5): http://proceedings.mlr.press/v89/amid19a/amid19a.pdf
Basically, our motivation for using Bregman divergences is that it ensures the loss is proper: you not only predict the true class, but also recover the correct probabilities in expectation.
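For reference, the escort distribution of a probability vector y with parameter t is the standard construction

escort_t(y)_i = y_i^t / \sum_j y_j^t,

i.e., each probability is raised to the power t and then renormalized; this is the "power t1" statement above.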
Yes, this is the same as the KL divergence KL(y, \hat{y}) between y and \hat{y} = softmax(\hat{a}). You will need to plug the softmax term into the definition of the KL divergence. Also log_t = log and exp_t = exp when t = 1.
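Spelling that out with the standard definitions: with \hat{y}_i = exp(\hat{a}_i) / \sum_j exp(\hat{a}_j),

KL(y, \hat{y}) = \sum_i y_i (log y_i - log \hat{y}_i)
             = \sum_i y_i log y_i - \sum_i y_i \hat{a}_i + log \sum_j exp(\hat{a}_j),

using \sum_i y_i = 1 in the last step.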
The sentence I quoted above is about the KL divergence.
But my question is:
why did you use the Bregman divergence?
They are somewhat different, even though we call the Bregman divergence a "generalized KL divergence".
Note that the two sentences are from your paper, page 5.
Bregman divergence is a broader class of distance measures that includes KL divergence as a special case: https://en.wikipedia.org/wiki/Bregman_divergence
Our "generalized KL divergence" belongs to the class of Bregman divergence and also includes KL divergence (when t=1). The first expression is the definition of our generalized KL. If you set t=1, you recover KL (since log_t = log when t=1). The second expression is KL divergence in terms of labels y and logits (i.e. activations) \hat{a} (plug in \hat{y} = softmax(\hat{a}).
I see, yes. Thanks for the clarification.
Then my question is:
Why did you NOT use the KL divergence with exp_t and log_t?
Sorry if this bothers you.
I think you did:
define log_t -> derive F_t -> use the Bregman divergence for the loss.
Why did you not do:
define log_t -> use the KL divergence for the loss?
The KL divergence loss is a special case of the bi-tempered loss when t1 = 1. If you also set t2 = 1, you recover the vanilla softmax function. So basically t1 = t2 = 1 gives the softmax cross-entropy loss: softmax_cross_entropy_with_logits
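A quick numerical check of this reduction: the self-contained NumPy sketch below re-implements the loss formula quoted later in this thread and verifies that it matches the usual softmax cross entropy at t1 = t2 = 1 (the helper names are mine, not the repo's API).

import numpy as np

def log_t(x, t):
    # Tempered logarithm; reduces to the natural log at t = 1.
    if t == 1.0:
        return np.log(x)
    return (x**(1.0 - t) - 1.0) / (1.0 - t)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 3))
labels = np.eye(3)[rng.integers(0, 3, size=4)]  # one-hot labels

t1 = 1.0  # with t2 = 1 the tempered softmax is the ordinary softmax
probabilities = softmax(activations)
loss_values = (labels * (log_t(labels + 1e-10, t1) - log_t(probabilities, t1))
               - (labels**(2.0 - t1) - probabilities**(2.0 - t1)) / (2.0 - t1))
bi_tempered = loss_values.sum(axis=-1)

cross_entropy = -(labels * np.log(probabilities)).sum(axis=-1)
print(np.allclose(bi_tempered, cross_entropy))  # True (up to the 1e-10 shift)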
The code shows:

probabilities = tempered_softmax(activations, t2, num_iters)
# Per-class values: the tempered log-ratio term minus the two correction
# terms that make the loss proper.
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) -
    log_t(probabilities, t1)) - 1.0 / (2.0 - t1) * (
        tf.pow(labels, 2.0 - t1) - tf.pow(probabilities, 2.0 - t1))
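If I am reading the repo's loss.py correctly (an assumption on my part, not a quote), these per-class values are then summed over the class axis:

loss = tf.reduce_sum(loss_values, -1)  # reduce over classes; assumed repo behavior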
How about the following loss, which removes the last two terms:

# hihunjin1 loss
probabilities = tempered_softmax(activations, t2, num_iters)
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) -
    log_t(probabilities, t1))

Will this hihunjin1 loss diverge? I think this one is not the softmax cross-entropy loss with logits (see the numeric sketch after this comment).
# hihunjin2 loss
probabilities = tempered_softmax(activations, t2, num_iters)
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) - probabilities
    - compute_normalization(activations, t2, num_iters))

I also want to ask whether this hihunjin2 loss is fine.
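One way to probe the hihunjin1 question numerically: the dropped term sums to zero over the classes at t1 = 1 but not at other t1, so hihunjin1 coincides with the full bi-tempered loss only at t1 = 1. A self-contained NumPy sketch (the Dirichlet draw merely stands in for a tempered-softmax output; nothing here is the repo's API):

import numpy as np

rng = np.random.default_rng(0)
labels = np.eye(3)[rng.integers(0, 3, size=4)]     # one-hot labels
probabilities = rng.dirichlet(np.ones(3), size=4)  # stand-in for tempered_softmax output

for t1 in (1.0, 0.7):
    # The term that the hihunjin1 loss drops from the full bi-tempered loss:
    dropped = ((labels**(2.0 - t1) - probabilities**(2.0 - t1)) / (2.0 - t1)).sum(axis=-1)
    print(t1, np.abs(dropped).max())  # ~0 at t1 = 1.0, clearly nonzero at t1 = 0.7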
Related Issues (15)
- training is too slow
- How to calculate "simple integration" in Chapter 3 HOT 1
- Use sigmoid or tempered_sigmoid for prediction? HOT 4
- Nan loss during training HOT 10
- noisy instances HOT 2
- How do I implement Tempered_softmax in C? HOT 1
- loss_test.py fails in test_gradient_error HOT 1
- Accuracy results on MNIST HOT 3
- Accuracy results on cifar100 HOT 4
- How are the labels corrupted? HOT 2
- Output activation and bi-tempered loss HOT 1
- TF 2.0 Version HOT 2
- why 5 is the default num_iters? HOT 3
- ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2) HOT 2