ducha-aiki / caffenet-benchmark

Evaluation of the performance of CNN design choices on ImageNet-2012.

Jupyter Notebook 100.00%

convolutional-neural-networks convolutional-networks batch-size caffenet lr-policy architecture relu caffe benchmark activations

caffenet-benchmark's People


phecy dereyly janemyleng kli-casia wait1988 fuxicv gouxiayibu tybxiaobao shaoli-huang qingsong99 wangxianliang blgene baiyancheng20 nianfudong weilamchung yxliang tianboguangding gujunli wshenx ibmua chenglongchen laurae2 ashiqrh wanjinchang chagge snazz2001 wang4959520 zhang365947064 aminm2a wendalegood huohuai pddpradip caomw sunxingxingtf scatterbrain333 john-annual yiweichen04 simonzeus hiroshiba coderx7 dafeifei2016 fanshiqing kapilkoundinya rollingstone walkoncross soledad89 samson-wang znat xqpinitial alexander-rakhlin viewsky ilibx inachencyr panxjia trantorrepository zhl10154 githubfragments yuckfu tobechao wujixiu iij0 johnson-yue 5059 xychen9459 tinyloop dcarlyle ffun ll36771 ashoksundaresan leo-zhou suzhenghang yao-ying xiaoxinyi weiliangxiao kmkolasinski mjiansun zhengzhugithub pierrehao imokawa wujiahongpku jiamery vivienfu fulquan chlapec changeerhao unixnme feng257 inkimage ieee820 redsuncmx jiangyy5318 shichaosuper mm2012mm zyysny daijuting qfdong skyjiao zhang405744522 gavince dreadlord1984

caffenet-benchmark's Issues

EltAffine Functionality

Hi, @ducha-aiki

I am trying to understand the BN implementation from the PR you tested, and it has no scale and shift (bias) implemented.

I also noticed from your experiments that BN + Affine does not seem to improve performance much, even from the initial training stages.

In your Caffe fork, https://github.com/ducha-aiki/caffe, there is another BN implementation, from Caffe PR 1965, which does implement the scale and shift (bias).

So why were these two operations dropped in the upstream Caffe version you tested? Do they hurt performance? Which version should I use?

Thanks a lot.
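For reference on the question above: upstream Caffe's `BatchNorm` layer only normalizes each channel with batch (or running) statistics; the learned per-channel scale and shift are usually added by a separate `Scale` layer with `bias_term: true`. The NumPy sketch below illustrates training-mode normalization with optional affine terms over an (N, C) batch. It is not Caffe's code, and `batch_norm` is a hypothetical helper.

```python
import numpy as np

def batch_norm(x, eps=1e-5, gamma=None, beta=None):
    """Training-mode batch norm over axis 0 of an (N, C) array.

    Without gamma/beta this is normalization only (like upstream Caffe's
    BatchNorm layer); with them it adds a per-channel scale and shift.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    if gamma is None and beta is None:
        return x_hat
    channels = x.shape[1]
    gamma = np.ones(channels) if gamma is None else gamma
    beta = np.zeros(channels) if beta is None else beta
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
y = batch_norm(x)  # per-channel mean ~0, variance ~1
# Choosing gamma = sqrt(var + eps) and beta = mean undoes the normalization.
z = batch_norm(x, gamma=np.sqrt(x.var(axis=0) + 1e-5), beta=x.mean(axis=0))
print(np.allclose(z, x))
```

The last lines show why the affine step is part of the original BN formulation: it lets the network recover the unnormalized activations if that turns out to be better, so dropping it removes capacity rather than changing what the normalization itself does.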

What about group in Caffenet and Alexnet

There is a 'group' parameter in CaffeNet and AlexNet. I've googled a bit, and it seems there is no direct comparison between one group and two. Is it worth trying? Hope this isn't too stupid a question. ;-)
Thanks.
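For context on the question above: with `group: g`, Caffe splits a convolution's input and output channels into g groups and connects each output group only to the matching input group, so each filter sees `c_in / g` channels and the weight count drops by a factor of g. The sketch below only counts weights; `conv_weight_count` is a hypothetical helper, and the conv2 sizes (96 input channels, 256 outputs, 5x5 kernels) are assumed from the reference CaffeNet prototxt.

```python
def conv_weight_count(c_in, c_out, k, group=1):
    """Number of weights (biases excluded) in a Caffe-style k x k convolution."""
    if c_in % group or c_out % group:
        raise ValueError("c_in and c_out must both be divisible by group")
    # Each of the c_out filters only sees c_in // group input channels.
    return c_out * (c_in // group) * k * k

# Assumed CaffeNet conv2 shape: 96 -> 256 channels, 5x5 kernels.
for g in (1, 2):
    print(g, conv_weight_count(96, 256, 5, group=g))
```

So comparing one group against two is not a like-for-like swap: the single-group layer has twice as many weights and a full connection between the channel halves.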

Pre-trained GoogLeNet-128?

Could you please release the pre-trained GoogLeNet-128 model?
Thanks a lot!

Question about your log loss benchmarks

Just a clarification question.
For ILSVRC-2012 there were three datasets for the classification task: training, validation and test.

Did you train only on the training data and report results on the validation set?
Or did you train on training+validation and report results on the test set?

Thanks!

What I will test next

Continue the random walk on ResNets to understand how to train them properly. There is definitely a problem somewhere that I cannot see :(
~~Pooling: AVG-pooling CaffeNet; Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree http://arxiv.org/abs/1509.08985 (all three from the paper, thanks to the authors for the code).~~
Regularization: ~~weight decay values, L1/L2 weight decay, dropout rates~~.
~~Freeze the conv structure and experiment with the fc6-fc8 classifier. Maxout? More layers? Convolution? Inspired by http://arxiv.org/abs/1504.06066, but in end-to-end style.~~
~~Solvers: default CaffeNet + ADAM/RMSProp/Nesterov/"poly" policy~~
~~BatchNorm for blocks of layers, not each.~~
~~For fully convolutional nets, which is better: avg-pooling the features and then applying the classifier, or the other way round?~~
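The mixed and gated pooling variants from the paper in the pooling item above (arXiv:1509.08985) blend max and average pooling. Mixed pooling uses a learned scalar a, giving a * max(x) + (1 - a) * avg(x) per window; gated pooling replaces a with sigmoid(w . x) computed from each pooling window. Below is a minimal NumPy sketch of both on one 2-D feature map, assuming non-overlapping k x k windows and passing the learned parameters in as fixed values; the helper names are hypothetical and tree pooling is omitted.

```python
import numpy as np

def _windows(x, k):
    """Split an (H, W) map into non-overlapping k x k windows -> (H//k, W//k, k*k)."""
    h, w = x.shape[0] // k, x.shape[1] // k
    x = x[: h * k, : w * k].reshape(h, k, w, k).transpose(0, 2, 1, 3)
    return x.reshape(h, w, k * k)

def mixed_pool(x, a, k=2):
    """Mixed pooling: a * max + (1 - a) * avg, with one scalar a for the layer."""
    win = _windows(x, k)
    return a * win.max(axis=-1) + (1.0 - a) * win.mean(axis=-1)

def gated_pool(x, w, k=2):
    """Gated pooling: the mixing weight is sigmoid(w . window), computed per window."""
    win = _windows(x, k)
    gate = 1.0 / (1.0 + np.exp(-(win @ w)))  # shape (H//k, W//k)
    return gate * win.max(axis=-1) + (1.0 - gate) * win.mean(axis=-1)

x = np.arange(16, dtype=float).reshape(4, 4)
print(mixed_pool(x, a=1.0))  # a = 1 reduces to plain max pooling
print(mixed_pool(x, a=0.0))  # a = 0 reduces to plain average pooling
print(gated_pool(x, w=np.zeros(4)))  # zero gate weights give an even 50/50 blend
```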

~~SqueezeNet https://github.com/DeepScale/SqueezeNet~~

(Far future) How do the best choices stack? I.e. BN + 20% dropout + best activation + best solver + ...

~~Suggestions and training logs from the community are welcome.~~
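For the "poly" policy in the solver item of the list above, here is a plain-Python sketch of two Caffe learning-rate policies, following the formulas documented in Caffe's `caffe.proto`: `step` (used by the reference CaffeNet solver) and `poly`. The base_lr, gamma, stepsize, max_iter and power values in the example are illustrative assumptions, not this benchmark's solver settings.

```python
def step_lr(base_lr, it, gamma, stepsize):
    """Caffe "step" policy: base_lr * gamma ^ floor(iter / stepsize)."""
    return base_lr * gamma ** (it // stepsize)

def poly_lr(base_lr, it, max_iter, power):
    """Caffe "poly" policy: base_lr * (1 - iter / max_iter) ^ power, zero at max_iter."""
    return base_lr * (1.0 - it / max_iter) ** power

# Illustrative settings only.
for it in (0, 100000, 200000, 300000):
    print(it,
          step_lr(0.01, it, gamma=0.1, stepsize=100000),
          poly_lr(0.01, it, max_iter=400000, power=1.0))
```

The practical difference is that `step` drops the rate abruptly at fixed iterations, while `poly` decays it smoothly and reaches zero exactly at `max_iter`, so the schedule length is tied to the total training budget.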

LSUV/Ortho with PReLU

I've copied the lsuv.py script into my current Caffe installation (not the one from this repo), and when I try to use it with PReLU units I get errors.

For LSUV:

Loading
conv_1 (128, 1, 5, 5)
conv_1 var =  0.0857792 mean =  0.19733
conv_1 var =  1.16208 mean =  0.189732
conv_1 var =  0.733266 mean =  0.193065
conv_1 var =  1.08708 mean =  0.190994
conv_1 var =  0.938934 mean =  0.191936
conv_1 var =  0.968886 mean =  0.191879
conv_1 var =  0.914341 mean =  0.192624
conv_1 var =  1.04143 mean =  0.191765
conv_1 var =  1.00926 mean =  0.191769
conv_1_rectifier (128,)
Traceback (most recent call last):
  File "/home/sharpy/caffe/tools/extra/lsuv.py", line 73, in <module>
    solver.net.forward(end=k)
  File "/home/sharpy/caffe/tools/extra/../../python/caffe/pycaffe.py", line 124, in _Net_forward
    return {out: self.blobs[out].data for out in outputs}
  File "/home/sharpy/caffe/tools/extra/../../python/caffe/pycaffe.py", line 124, in <dictcomp>
    return {out: self.blobs[out].data for out in outputs}
KeyError: 'conv_1_rectifier'

Line 73 in my copy is:

    if 'LSUV' in init_mode:
        if var_before_relu_if_inplace:
            solver.net.forward(end=k)  # this one

For Ortho:


Loading
conv_1 (128, 1, 5, 5)
conv_1_rectifier (128,)
Traceback (most recent call last):
  File "/home/sharpy/caffe/tools/extra/lsuv.py", line 66, in <module>
    weights=svd_orthonormal(v[0].data[:].shape)
  File "/home/sharpy/caffe/tools/extra/lsuv.py", line 17, in svd_orthonormal
    raise RuntimeError("Only shapes of length 2 or more are supported.")
RuntimeError: Only shapes of length 2 or more are supported.

Line 66 is:

    if 'Orthonormal' in init_mode:
        weights = svd_orthonormal(v[0].data[:].shape)  # it's this one

Meanwhile, I get no errors if I replace the PReLU activation with ELU or ReLU.
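Both tracebacks above seem to share a cause: PReLU has its own parameter blob, a 1-D vector of per-channel slopes (shape `(128,)` in the log), so the script treats `conv_1_rectifier` as a layer to initialize. `svd_orthonormal` rejects 1-D shapes, and since the PReLU layer is presumably in place, the pycaffe version in the traceback appears to look up an output blob named after the layer (`conv_1_rectifier`), which does not exist. One possible workaround, sketched below in NumPy only and not taken from any released lsuv.py, is to skip layers whose first parameter blob has fewer than two dimensions. The `svd_orthonormal` body follows the common SVD-based orthonormal initializer and may differ in detail from the script; `layers_to_init` and the `params` dictionary are hypothetical stand-ins for the loop over `net.params`.

```python
import numpy as np

def svd_orthonormal(shape):
    """Random matrix with orthonormal rows or columns, reshaped to a weight layout."""
    if len(shape) < 2:
        raise RuntimeError("Only shapes of length 2 or more are supported.")
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    a = np.random.standard_normal(flat_shape)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    # Whichever SVD factor has the flattened shape is orthonormal along one axis.
    q = u if u.shape == flat_shape else vt
    return q.reshape(shape)

def layers_to_init(param_shapes):
    """Hypothetical filter: keep layers whose first param blob is at least 2-D.

    PReLU's 1-D slope vector fails this test, so it is neither passed to
    svd_orthonormal nor used as a forward(end=...) target.
    """
    return [name for name, shapes in param_shapes.items() if len(shapes[0]) >= 2]

# Hypothetical parameter shapes mirroring the log above.
params = {"conv_1": [(128, 1, 5, 5), (128,)], "conv_1_rectifier": [(128,)]}
print(layers_to_init(params))
w = svd_orthonormal(params["conv_1"][0])
print(w.shape)
```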

Prototxt python code

questions about BatchNorm usage

Release pre-trained models?
