Comments (11)

shicai commented on May 25, 2024

It's easy to write a train_val.prototxt from deploy.prototxt; you can do it yourself.
Here are my solver settings, with a batch size of 256. They are quite simple too.

base_lr: 0.1
lr_policy: "poly"
power: 1.0
max_iter: 500000
momentum: 0.9
weight_decay: 0.0001
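
For reference, Caffe's "poly" policy computes the learning rate at iteration t from base_lr, max_iter and power as sketched below. Plugging in the solver settings above is an editorial worked example, not something stated in the thread.

```latex
\mathrm{lr}(t) = \mathrm{base\_lr} \cdot \left(1 - \frac{t}{\mathrm{max\_iter}}\right)^{\mathrm{power}}
\quad\Longrightarrow\quad
\mathrm{lr}(t) = 0.1 \cdot \left(1 - \frac{t}{500000}\right)
```

With power: 1.0 the decay is linear: the rate is 0.1 at iteration 0, 0.05 at iteration 250000, and reaches 0 at iteration 500000.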

from senet-caffe.

kli-casia commented on May 25, 2024

same question

zimenglan-sysu-512 commented on May 25, 2024

Hi @shicai,
have you tried any other learning rate policy?
Thanks.

shicai commented on May 25, 2024

I only trained it once and did not try any other lr policies.

wlw208dzy commented on May 25, 2024

Thanks for your wonderful work. I am not sure about the hyperparameters in your train.proto. My BN layer is as follows:
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    eps: 1e-4
  }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true
  }
}
Is this right?

shicai commented on May 25, 2024

It is OK for the test stage when using pretrained models. For training, though, you should add param blocks to control the weight decay and learning rate multipliers.

wlw208dzy commented on May 25, 2024

Thanks for your reply. Should the BatchNorm and Scale layers keep the default hyperparameters (lr_mult = 1.0 and decay_mult = 1.0)? @shicai

shicai commented on May 25, 2024

For BatchNorm layers, lr and wd should be set to 0, since you don't need to learn the mean/var params.
For Scale layers, lr and wd should be set the same way as for conv layers.
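
A minimal sketch of what this advice could look like in a train_val.prototxt, reusing the conv1 layer names and eps value from the question above. The Scale multipliers (1/1 for the scale term, 2/0 for the bias) are an assumption based on the common Caffe convention for conv weights and biases; the thread does not state the values actually used for SENet.

```prototxt
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # BatchNorm holds three blobs: running mean, running variance and the
  # moving-average factor. They are accumulated statistics, not learned
  # weights, so the solver is told to leave them alone.
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    eps: 1e-4
  }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  # Assumed values, mirroring a typical conv layer.
  param { lr_mult: 1 decay_mult: 1 }  # scale term
  param { lr_mult: 2 decay_mult: 0 }  # bias term
  scale_param {
    bias_term: true
  }
}
```

The learnable part of batch normalization lives entirely in the Scale layer here, which is why only that layer gets non-zero multipliers.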

wlw208dzy commented on May 25, 2024

Thanks. I would like to train from scratch on the ImageNet dataset, so I think the params in the BatchNorm layer (mean/var/factor etc.) need to be learned. Should lr_mult and decay_mult be set as follows?
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  batch_norm_param {
    eps: 1e-4
  }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  scale_param {
    bias_term: true
  }
}

shicai commented on May 25, 2024

Note that the params in a BatchNorm layer don't need to be learned; they are calculated. They are just the computed mean/var values, not really params, so please don't set lr or wd for them.

wlw208dzy commented on May 25, 2024

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    eps: 1e-4
  }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  scale_param {
    bias_term: true
  }
}
Is it right?
