
Comments (8)

liuzhuang13 commented on August 20, 2024

Do you apply weight_decay to all layers (input, inside the dense block, in the transitions and the fully connected output)?

Yes, even including the batch norm layers' scale parameters.

Do you apply bias decay as well?

Yes, Torch applies weight decay to the bias, too. Note, however, that in the convolutional layers we didn't use a bias, since batch norm undoes whatever a bias would do.
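A plain-Python sketch of what that setup implies (a vanilla SGD step; the function name is mine): the same L2 penalty is applied to every parameter, including batch norm scale parameters, so even a parameter with zero gradient is shrunk each step.

```python
def sgd_step(param, grad, lr=0.1, weight_decay=1e-4):
    # Decay is applied uniformly: conv kernels, batch-norm gamma/beta,
    # and the dense layer's weights and bias all get the same 1e-4 penalty.
    return param - lr * (grad + weight_decay * param)

gamma = 1.0                   # a batch-norm scale parameter
gamma = sgd_step(gamma, 0.0)  # even with zero gradient, decay shrinks it
```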

Good luck!

from deeplearningimplementations.

tdeboissiere commented on August 20, 2024

Thanks for pointing that out!
I had trouble reproducing your results with L=40, k=12 on CIFAR10 (no data augmentation), as my test accuracy saturated around 90%.

I'm going to run tests with both options to find out which works best. If you find any other mistake, please let me know.

Edit:

Changed the model and figure to average pooling
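For context, the change above concerns DenseNet's transition layers, which downsample with 2x2 average pooling (not max pooling). A minimal NumPy sketch of that operation on a single feature map:

```python
import numpy as np

def avg_pool_2x2(x):
    # 2x2 average pooling with stride 2 on a single (H, W) feature map,
    # as used in DenseNet transition layers (H and W assumed even).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = avg_pool_2x2(x)  # shape (2, 2)
```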


liuzhuang13 commented on August 20, 2024

Hi, I saw your description and results in the README, and noticed one more difference: we didn't use any continuous lr decay. The learning rate is only divided twice; the rest of the time it is constant. The 1e-4 in our setting is the weight decay, not an lr decay.

In our curve (in the paper) you can see a big error drop at epochs 150 and 225, but in yours there isn't. That's probably due to your lr decay.
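That schedule can be written as a plain step function (the epochs 150/225 and the divide-by-10 factor are taken from the paper's 300-epoch CIFAR runs; the function name is mine):

```python
def densenet_lr(epoch, base_lr=0.1):
    # Piecewise-constant schedule: divide by 10 at epochs 150 and 225,
    # otherwise keep the learning rate fixed (no continuous decay).
    if epoch >= 225:
        return base_lr / 100.0
    if epoch >= 150:
        return base_lr / 10.0
    return base_lr
```

In Keras, a function like this can be plugged into a `LearningRateScheduler` callback.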

Cheers,


tdeboissiere commented on August 20, 2024

Thanks!

Two more questions, if you can spare the time:

  • Do you apply weight_decay to all layers (input, inside the dense block, in the transitions and the fully connected output)?
  • Do you apply bias decay as well?


tdeboissiere commented on August 20, 2024

All right, so now I have:

  • no bias on Conv2D layers
  • l2 regularization on all Conv2D weights, on the gamma and beta parameters of all batch norm layers, and on the weights and bias of the Dense layer.

Sounds good?

Edit:

I applied the above and reproduced your CIFAR10 (no augmentation) results: sweet!


ruudvlutters commented on August 20, 2024

Hi,

I'm also trying to replicate DenseNet in Keras, and I found that Keras uses a momentum default of 0.99 in the batch norm layers, whereas Torch uses 0.1 (corresponding to 0.9 in Keras, since Keras uses the opposite definition). Maybe another reason for the difference?
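A quick numeric check of the two conventions (the update formulas below are my reading of each library's batch norm running-statistics update):

```python
# Running-mean update under the two conventions (single scalar statistic).
running, batch_mean = 0.0, 1.0

# Torch: new = (1 - m) * running + m * batch, with default m = 0.1
torch_update = (1 - 0.1) * running + 0.1 * batch_mean

# Keras: new = m * running + (1 - m) * batch, so m = 0.9 matches
# Torch's 0.1, while Keras's default of 0.99 adapts roughly 10x more slowly.
keras_update = 0.9 * running + (1 - 0.9) * batch_mean
```

With matched settings both give the same result; with Keras's 0.99 default, the running statistics lag much further behind the batch statistics.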


liuzhuang13 commented on August 20, 2024

@ruudvlutters In the Torch code we apply a momentum of 0.9 to all weights and biases, including the batch norm layers. Yes, that could be a reason for the difference. Thanks for pointing it out.

And I just found that the initialization is different, too: the initialization used in our Torch code (which is copied from fb.resnet.torch) is indeed different from the commonly used "he" or "msra" scheme.
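My reading of the fb.resnet.torch init code (an assumption, not confirmed in this thread) is that it draws conv weights from a normal with std sqrt(2/n) where n = kH*kW*out_channels, i.e. a fan-out variant of the usual He init, which computes the same quantity over fan-in. A sketch of the resulting stds:

```python
import math

# Common "he"/"msra" init: std = sqrt(2 / fan_in), fan_in = kH*kW*in_channels.
def he_fan_in_std(k_h, k_w, in_ch):
    return math.sqrt(2.0 / (k_h * k_w * in_ch))

# fb.resnet.torch (my reading of its init code, an assumption):
# n = kH*kW*out_channels, i.e. a fan-out flavour of the same idea.
def resnet_torch_std(k_h, k_w, out_ch):
    return math.sqrt(2.0 / (k_h * k_w * out_ch))

# A 3x3 conv from 16 to 32 channels gets different stds under the two schemes:
std_a = he_fan_in_std(3, 3, 16)     # sqrt(2/144)
std_b = resnet_torch_std(3, 3, 32)  # sqrt(2/288)
```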


tdeboissiere commented on August 20, 2024

Given that I was able to reproduce your results (see the update) without the momentum/initialization changes, it seems these differences don't matter that much.

