Comments (3)
Hi Jerome,
Thanks for noticing this bug. We have also been aware of it for a while and have been working on a fix.
About your first question: we fixed a couple of issues, such as the initialization (switching from random.normal to xavier_initializer for the weights) and the learning rates, and were able to improve the results in the noise-free case.
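For reference, the difference between an unscaled normal draw and the Xavier/Glorot scheme is just the variance scaling. A minimal NumPy sketch (the helper name and layer shapes are illustrative, not the exact TensorFlow call we use):

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=0):
    """Glorot/Xavier uniform initialization: samples from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    which keeps activation variance roughly constant across layers,
    unlike an unscaled normal draw."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# weights for a hypothetical 784 -> 256 dense layer
W = xavier_init(784, 256)
```

The resulting standard deviation is about sqrt(2 / (fan_in + fan_out)), so deeper stacks neither saturate nor vanish as quickly as with a fixed-scale normal initializer.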
The results reported on CIFAR-100 and ImageNet 2012 use early stopping, whereas the MNIST results are the final test accuracy after 500 epochs. We have noticed that early stopping helps a lot for the logistic loss, and we can confirm your results after 10 or so epochs. However, as training with the logistic loss continues, the model quickly overfits the noise in the data and the test accuracy drops. The bi-tempered loss, on the other hand, seems largely unaffected by longer training and successfully avoids the noisy examples. We have therefore decided to report the test accuracy at the checkpoint with the best accuracy on a noisy validation set (which corresponds to early stopping). We have also created figures showing how the logistic loss overfits the noise during training and how the bi-tempered loss avoids this. We will update the pre-print soon and let you know.
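The model-selection rule described above (report the checkpoint that scores best on the noisy validation set) can be sketched as a plain loop. `train_epoch` and `evaluate` below are hypothetical helpers standing in for the actual training code, not part of the bi-tempered-loss repo:

```python
def select_by_validation(train_epoch, evaluate, num_epochs=500):
    """Train for num_epochs but return the checkpoint with the best
    validation accuracy, i.e. early stopping by model selection."""
    best_val_acc, best_state = -1.0, None
    for epoch in range(num_epochs):
        state = train_epoch(epoch)              # returns model parameters
        val_acc = evaluate(state, split="val")  # accuracy on (noisy) val set
        if val_acc > best_val_acc:              # keep only the best checkpoint
            best_val_acc, best_state = val_acc, state
    return best_state, best_val_acc
```

Because the validation set shares the training set's label noise, the selected checkpoint is the one made before the model starts memorizing the corrupted labels.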
Thank you again for trying our model and noticing the bug. We deeply appreciate your feedback and would like to acknowledge it in the next version of our paper.
from bi-tempered-loss.
Hey, sorry for the late reply.
You should probably give a bit more detail about the validation setup and the optimizer hyperparameters, such as the learning rate schedule, weight decay, momentum, etc.
However, I am still skeptical about a few things. When I train with 50% label corruption (in both the train and validation sets), I can still get ~98% accuracy on the test set. You mentioned that the model overfits the training set. If that were true, and you based your model selection on the corrupted validation set, the validation loss would (and indeed does) increase dramatically after a few epochs. When I try this, I observe that the training accuracy and the validation loss both increase, so the selected model is actually one from the first few epochs, and it reaches 98% on the test set, far above the 15.82% claimed. And this is without any regularization other than the model's dropout (no weight decay, no momentum, a constant learning rate of 0.1).
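For anyone trying to reproduce this: the corruption I applied flips a fraction of labels uniformly to a *different* class, which I believe is the standard setup, though whether it matches the paper's exact scheme is an assumption on my part. A minimal sketch:

```python
import numpy as np

def corrupt_labels(labels, frac=0.5, num_classes=10, seed=0):
    """Flip frac of the labels uniformly to a different class."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    # pick which examples to corrupt, without replacement
    idx = rng.choice(len(labels), size=int(frac * len(labels)), replace=False)
    # add a nonzero offset mod num_classes so the new label always differs
    offsets = rng.integers(1, num_classes, size=len(idx))
    labels[idx] = (labels[idx] + offsets) % num_classes
    return labels
```

With 50% corruption and 10 classes, a model that memorizes nothing but the clean half can still score well on an uncorrupted test set, which is consistent with the ~98% I observe.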
Jerome,
We have fixed the MNIST experiment in the final version. Thank you again for your comments.
Related Issues (15)
- training is too slow
- How to calculate "simple integration" in Chapter 3 HOT 1
- Why did you use Bergman divergence instead of KL divergence? HOT 7
- Use sigmod or tempered_sigmoid for prediction? HOT 4
- Nan loss during training HOT 10
- noisy instances HOT 2
- How do I implement Tempered_softmax in C? HOT 1
- loss_test.py fails in test_gradient_error HOT 1
- Accuracy results on cifar100 HOT 4
- How are the labels corrupted? HOT 2
- Output activation and bi-tempered loss HOT 1
- TF 2.0 Version HOT 2
- why 5 is the default num_iters? HOT 3
- ValueError: Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2) HOT 2