
Comments (4)

rohan-anil commented on May 6, 2024

Hi Shuikehuo,

We used ResNet-56 without batch norm from [1], which explains the accuracy difference (and the weaker baseline). It was trained with the SGD optimizer for 50k steps at a batch size of 128.

The experiment shows the effect of noisy labels on test accuracy when training with the logistic loss versus the bi-tempered logistic loss. We expect the accuracy delta to remain similar when training with ResNet-50 (with batch norm) or other models of similar capacity. We will soon release the code for the ResNet-56 model without batch norm from [1] so the results can be reproduced.

Thanks,

[1] Identity Matters in Deep Learning, Moritz Hardt, Tengyu Ma, https://arxiv.org/pdf/1611.04231.pdf

from bi-tempered-loss.

Charles-Xie commented on May 6, 2024


@rohan-anil Is there any reason to use ResNet-56 without batch normalization? This network does not seem to be used much in experiments.

When I use ResNet-110 with BN (as introduced in the ResNet v1 paper), the accuracy delta (improvement) does not seem to be very noticeable, for either clean or noisy labels.

[attached: results figure]


eamid commented on May 6, 2024

Hi Chi,

Thank you for your interest in our method.

We used the ResNet-56 model because we had the baseline readily available (Moritz was at Google, and we used his codebase). I noticed that the bi-tempered loss still gives some improvement in your case. You might achieve a larger improvement by tuning t1 and t2 (I would suggest trying a larger t2 value).

Ehsan


Charles-Xie commented on May 6, 2024

@eamid
Thanks a lot!


