Comments (7)
The loss you define by removing the last two terms might work. However, it won't be a proper loss function. That is, for a given example, your predicted probabilities are going to be proportional to the label probabilities raised to the power t1. This is called the escort distribution, and we discuss it more thoroughly in another paper (see Section 5): http://proceedings.mlr.press/v89/amid19a/amid19a.pdf
Basically, our motivation for using Bregman divergences is that it ensures the loss is proper: you not only predict the true class, but also recover the correct probabilities in expectation.
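For reference, the escort distribution of a probability vector y with parameter t is the standard construction

escort_t(y)_i = y_i^t / \sum_j y_j^t,

i.e., each probability is raised to the power t and then renormalized; this is the "power t1" statement above.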
Yes, this is the same as the KL divergence KL(y, \hat{y}) between y and \hat{y} = softmax(\hat{a}). You will need to plug the softmax term into the definition of the KL divergence. Also log_t = log and exp_t = exp when t = 1.
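Spelling that out with the standard definitions: with \hat{y}_i = exp(\hat{a}_i) / \sum_j exp(\hat{a}_j),

KL(y, \hat{y}) = \sum_i y_i (log y_i - log \hat{y}_i)
             = \sum_i y_i log y_i - \sum_i y_i \hat{a}_i + log \sum_j exp(\hat{a}_j),

using \sum_i y_i = 1 in the last step.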
The sentence I quoted above is about the KL divergence.
But my question is:
why did you use the Bregman divergence?
They are somewhat different, even though we call the Bregman divergence a "generalized KL divergence".
Note that the two sentences are from your paper, page 5.
Bregman divergence is a broader class of distance measures that includes KL divergence as a special case: https://en.wikipedia.org/wiki/Bregman_divergence
Our "generalized KL divergence" belongs to the class of Bregman divergence and also includes KL divergence (when t=1). The first expression is the definition of our generalized KL. If you set t=1, you recover KL (since log_t = log when t=1). The second expression is KL divergence in terms of labels y and logits (i.e. activations) \hat{a} (plug in \hat{y} = softmax(\hat{a}).
I see, yes. Thanks for the clarification.
Then my question is:
Why did you NOT use the KL divergence with exp_t and log_t?
Sorry if this bothers you.
I think you did:
define log_t -> derive F_t -> use the Bregman divergence for the loss.
Why did you not do:
define log_t -> use the KL divergence for the loss?
The KL divergence loss is a special case of the bi-tempered loss when t1 = 1. If you also set t2 = 1, you recover the vanilla softmax function. So basically t1 = t2 = 1 gives the softmax cross-entropy loss: softmax_cross_entropy_with_logits
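A quick numerical check of this reduction: the self-contained NumPy sketch below re-implements the loss formula quoted later in this thread and verifies that it matches the usual softmax cross entropy at t1 = t2 = 1 (the helper names are mine, not the repo's API).

import numpy as np

def log_t(x, t):
    # Tempered logarithm; reduces to the natural log at t = 1.
    if t == 1.0:
        return np.log(x)
    return (x**(1.0 - t) - 1.0) / (1.0 - t)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 3))
labels = np.eye(3)[rng.integers(0, 3, size=4)]  # one-hot labels

t1 = 1.0  # with t2 = 1 the tempered softmax is the ordinary softmax
probabilities = softmax(activations)
loss_values = (labels * (log_t(labels + 1e-10, t1) - log_t(probabilities, t1))
               - (labels**(2.0 - t1) - probabilities**(2.0 - t1)) / (2.0 - t1))
bi_tempered = loss_values.sum(axis=-1)

cross_entropy = -(labels * np.log(probabilities)).sum(axis=-1)
print(np.allclose(bi_tempered, cross_entropy))  # True (up to the 1e-10 shift)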
The code shows:

probabilities = tempered_softmax(activations, t2, num_iters)
# Per-class values: the tempered log-ratio term minus the two correction
# terms that make the loss proper.
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) -
    log_t(probabilities, t1)) - 1.0 / (2.0 - t1) * (
        tf.pow(labels, 2.0 - t1) - tf.pow(probabilities, 2.0 - t1))
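If I am reading the repo's loss.py correctly (an assumption on my part, not a quote), these per-class values are then summed over the class axis:

loss = tf.reduce_sum(loss_values, -1)  # reduce over classes; assumed repo behavior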
How about the following loss, which removes the last two terms:

# hihunjin1 loss
probabilities = tempered_softmax(activations, t2, num_iters)
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) -
    log_t(probabilities, t1))

Will this hihunjin1 loss diverge? I think this one is not the softmax cross-entropy loss with logits (see the numeric sketch after this comment).
# hihunjin2 loss
probabilities = tempered_softmax(activations, t2, num_iters)
loss_values = tf.multiply(
    labels,
    log_t(labels + 1e-10, t1) - probabilities
    - compute_normalization(activations, t2, num_iters))

I also want to ask whether this hihunjin2 loss is fine.
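One way to probe the hihunjin1 question numerically: the dropped term sums to zero over the classes at t1 = 1 but not at other t1, so hihunjin1 coincides with the full bi-tempered loss only at t1 = 1. A self-contained NumPy sketch (the Dirichlet draw merely stands in for a tempered-softmax output; nothing here is the repo's API):

import numpy as np

rng = np.random.default_rng(0)
labels = np.eye(3)[rng.integers(0, 3, size=4)]     # one-hot labels
probabilities = rng.dirichlet(np.ones(3), size=4)  # stand-in for tempered_softmax output

for t1 in (1.0, 0.7):
    # The term that the hihunjin1 loss drops from the full bi-tempered loss:
    dropped = ((labels**(2.0 - t1) - probabilities**(2.0 - t1)) / (2.0 - t1)).sum(axis=-1)
    print(t1, np.abs(dropped).max())  # ~0 at t1 = 1.0, clearly nonzero at t1 = 0.7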
Related Issues (15)
- training is too slow
- How to calculate "simple integration" in Chapter 3 HOT 1
- Use sigmoid or tempered_sigmoid for prediction? HOT 4
- Nan loss during training HOT 10
- noisy instances HOT 2
- How do I implement Tempered_softmax in C? HOT 1
- loss_test.py fails in test_gradient_error HOT 1
- Accuracy results on MNIST HOT 3
- Accuracy results on cifar100 HOT 4
- How are the labels corrupted? HOT 2
- Output activation and bi-tempered loss HOT 1
- TF 2.0 Version HOT 2
- why 5 is the default num_iters? HOT 3
- ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2) HOT 2