
Comments (8)

liuzhuang13 commented on August 20, 2024

Do you apply weight_decay to all layers (input, inside the dense block, in the transitions and the fully connected output)?

Yes, even including the batch norm layers' scale parameters.

Do you apply bias decay as well?

Yes, Torch applies weight decay to the bias, too. Note, however, that in the convolutional layers we didn't use a bias, since batch norm undoes whatever a bias would do.
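A plain-Python sketch of what that setup implies (a vanilla SGD step; the function name is mine): the same L2 penalty is applied to every parameter, including batch norm scale parameters, so even a parameter with zero gradient is shrunk each step.

```python
def sgd_step(param, grad, lr=0.1, weight_decay=1e-4):
    # Decay is applied uniformly: conv kernels, batch-norm gamma/beta,
    # and the dense layer's weights and bias all get the same 1e-4 penalty.
    return param - lr * (grad + weight_decay * param)

gamma = 1.0                   # a batch-norm scale parameter
gamma = sgd_step(gamma, 0.0)  # even with zero gradient, decay shrinks it
```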

Good luck!

from deeplearningimplementations.

tdeboissiere commented on August 20, 2024

Thanks for pointing that out!
I had trouble reproducing your results with L=40, k=12 on CIFAR10 (no data augmentation), as my test accuracy saturated around 90%.

I'm going to run tests with both options to find out which works best. If you find any other mistake, please let me know.

Edit:

Changed the model and figure to average pooling
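For context, the change above concerns DenseNet's transition layers, which downsample with 2x2 average pooling (not max pooling). A minimal NumPy sketch of that operation on a single feature map:

```python
import numpy as np

def avg_pool_2x2(x):
    # 2x2 average pooling with stride 2 on a single (H, W) feature map,
    # as used in DenseNet transition layers (H and W assumed even).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = avg_pool_2x2(x)  # shape (2, 2)
```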


liuzhuang13 commented on August 20, 2024

Hi, I saw your description and results in the README, and noticed one more difference: we didn't use any continuous lr decay. The learning rate is only divided twice; the rest of the time it is constant. The 1e-4 in our setting is the weight decay, not an lr decay.

In our curve (in the paper) you can see a big error drop at epochs 150 and 225, but in yours there isn't. That's probably due to your lr decay.
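That schedule can be written as a plain step function (the epochs 150/225 and the divide-by-10 factor are taken from the paper's 300-epoch CIFAR runs; the function name is mine):

```python
def densenet_lr(epoch, base_lr=0.1):
    # Piecewise-constant schedule: divide by 10 at epochs 150 and 225,
    # otherwise keep the learning rate fixed (no continuous decay).
    if epoch >= 225:
        return base_lr / 100.0
    if epoch >= 150:
        return base_lr / 10.0
    return base_lr
```

In Keras, a function like this can be plugged into a `LearningRateScheduler` callback.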

Cheers,


tdeboissiere commented on August 20, 2024

Thanks!

Two more questions, if you can spare the time:

  • Do you apply weight_decay to all layers (input, inside the dense block, in the transitions and the fully connected output)?
  • Do you apply bias decay as well?


tdeboissiere commented on August 20, 2024

All right, so now I have:

  • no bias on Conv2D layers
  • l2 regularization on all Conv2D weights, on the gamma and beta parameters of all batch norm layers, and on the weights and bias of the Dense layer.

Sounds good?

Edit:

I applied the above and reproduced your CIFAR10 (no augmentation) results: sweet!


ruudvlutters commented on August 20, 2024

Hi,

I'm also trying to replicate DenseNet in Keras, and I found that Keras uses a momentum default of 0.99 in the batch norm layers, whereas Torch uses 0.1 (corresponding to 0.9 in Keras, since Keras uses the opposite definition). Maybe another reason for the difference?
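A quick numeric check of the two conventions (the update formulas below are my reading of each library's batch norm running-statistics update):

```python
# Running-mean update under the two conventions (single scalar statistic).
running, batch_mean = 0.0, 1.0

# Torch: new = (1 - m) * running + m * batch, with default m = 0.1
torch_update = (1 - 0.1) * running + 0.1 * batch_mean

# Keras: new = m * running + (1 - m) * batch, so m = 0.9 matches
# Torch's 0.1, while Keras's default of 0.99 adapts roughly 10x more slowly.
keras_update = 0.9 * running + (1 - 0.9) * batch_mean
```

With matched settings both give the same result; with Keras's 0.99 default, the running statistics lag much further behind the batch statistics.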


liuzhuang13 commented on August 20, 2024

@ruudvlutters In the Torch code we apply a momentum of 0.9 to all weights and biases, including the batch norm layers. Yes, that could be a reason for the difference. Thanks for pointing it out.

And I just found that the initialization is different, too: the initialization used in our Torch code (which is copied from fb.resnet.torch) is indeed different from the commonly used "he" or "msra" scheme.
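My reading of the fb.resnet.torch init code (an assumption, not confirmed in this thread) is that it draws conv weights from a normal with std sqrt(2/n) where n = kH*kW*out_channels, i.e. a fan-out variant of the usual He init, which computes the same quantity over fan-in. A sketch of the resulting stds:

```python
import math

# Common "he"/"msra" init: std = sqrt(2 / fan_in), fan_in = kH*kW*in_channels.
def he_fan_in_std(k_h, k_w, in_ch):
    return math.sqrt(2.0 / (k_h * k_w * in_ch))

# fb.resnet.torch (my reading of its init code, an assumption):
# n = kH*kW*out_channels, i.e. a fan-out flavour of the same idea.
def resnet_torch_std(k_h, k_w, out_ch):
    return math.sqrt(2.0 / (k_h * k_w * out_ch))

# A 3x3 conv from 16 to 32 channels gets different stds under the two schemes:
std_a = he_fan_in_std(3, 3, 16)     # sqrt(2/144)
std_b = resnet_torch_std(3, 3, 32)  # sqrt(2/288)
```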


tdeboissiere commented on August 20, 2024

Given that I was able to reproduce your results (see the update) without the momentum/initialization changes, it seems these differences don't matter that much.

