
SENet's Introduction

Squeeze-and-Excitation Networks (paper)

By Jie Hu[1], Li Shen[2], Gang Sun[1].

Momenta[1] and University of Oxford[2].

Approach

Figure 1: Diagram of a Squeeze-and-Excitation building block.

 

Figure 2: Schema of SE-Inception and SE-ResNet modules. We set r=16 in all our models.

Implementation

In this repository, Squeeze-and-Excitation Networks are implemented in Caffe.

Augmentation

Method             Settings
Random Mirror      True
Random Crop        8% ~ 100%
Aspect Ratio       3/4 ~ 4/3
Random Rotation    -10° ~ 10°
Pixel Jitter       -20 ~ 20
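
A hedged Python sketch of the Inception-style random crop the table implies (sampling 8% ~ 100% of the image area at an aspect ratio in [3/4, 4/3]); the repository's actual Caffe augmentation code is not shown here, so the function name and the fallback policy are assumptions:

import math
import random

def random_crop_params(height, width, area_range=(0.08, 1.0),
                       ratio_range=(0.75, 4.0 / 3.0), max_tries=10):
    """Return (y, x, h, w) of a random crop; fall back to a center crop."""
    for _ in range(max_tries):
        target_area = random.uniform(*area_range) * height * width
        log_lo, log_hi = math.log(ratio_range[0]), math.log(ratio_range[1])
        aspect = math.exp(random.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if w <= width and h <= height:
            return (random.randint(0, height - h),
                    random.randint(0, width - w), h, w)
    side = min(height, width)  # fallback: central square crop
    return ((height - side) // 2, (width - side) // 2, side, side)

y, x, h, w = random_crop_params(480, 640)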

Note:

  • To achieve efficient training and testing, we fuse the consecutive channel-wise scale and element-wise summation operations into a single "Axpy" layer in the architectures with skip connections, which considerably reduces memory cost and computational burden (see the sketch after these notes).

  • In addition, we found the GPU implementation of global average pooling provided by cuDNN and BVLC/caffe to be inefficient, so we re-implemented the operation and obtained a significant speed-up.
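
For reference, a minimal numpy sketch of what the fused Axpy layer computes (the name reads as a*x + y; the shapes below are illustrative assumptions):

import numpy as np

def axpy(a, x, y):
    """Channel-wise scale of x by a, then element-wise summation with y."""
    return a * x + y  # a broadcasts over the spatial dimensions

a = np.random.rand(2, 64, 1, 1)    # per-channel SE excitation weights
x = np.random.rand(2, 64, 56, 56)  # residual branch output
y = np.random.rand(2, 64, 56, 56)  # identity / skip branch
out = axpy(a, x, y)                # one layer instead of Scale + Eltwise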

Trained Models

Table 1. Single-crop validation error on ImageNet-1k (center 224x224 crop from the resized image with shorter side = 256). SENet-154 is one of our strongest models and was part of our winning entry (1st place, team name: WMW) in the ILSVRC 2017 Image Classification Challenge.

Model                       Top-1 err. (%)   Top-5 err. (%)   Size     Download
SE-BN-Inception             23.62            7.04             46 MB    GoogleDrive | BaiduYun
SE-ResNet-50                22.37            6.36             107 MB   GoogleDrive | BaiduYun
SE-ResNet-101               21.75            5.72             189 MB   GoogleDrive | BaiduYun
SE-ResNet-152               21.34            5.54             256 MB   GoogleDrive | BaiduYun
SE-ResNeXt-50 (32 x 4d)     20.97            5.54             105 MB   GoogleDrive | BaiduYun
SE-ResNeXt-101 (32 x 4d)    19.81            4.96             187 MB   GoogleDrive | BaiduYun
SENet-154                   18.68            4.47             440 MB   GoogleDrive | BaiduYun

Here we obtain better performance than reported in the paper. We re-trained the SENets described in the paper on a single server with 8 NVIDIA Titan X GPUs, using a mini-batch size of 256 and an initial learning rate of 0.1, for more epochs. In contrast, the results reported in the paper were obtained by training the networks with a larger batch size (1024) and learning rate (0.6) across 4 servers.

Third-party re-implementations

  1. Caffe. SE-modules are integrated with a modified ResNet-50 that uses stride 2 in the 3x3 convolution instead of the first 1x1 convolution, which obtains better performance: Repository.
  2. TensorFlow. SE-modules are integrated with a pre-activation ResNet-50 that follows the setup of fb.resnet.torch: Repository.
  3. TensorFlow. A simple TensorFlow implementation of SENets on CIFAR-10: Repository.
  4. MatConvNet. All the released SENets are imported into MatConvNet: Repository.
  5. MXNet. SE-modules are integrated with ResNeXt, and more architectures are coming soon: Repository.
  6. PyTorch. A PyTorch implementation of SENets: Repository.
  7. Chainer. A Chainer implementation of SENets: Repository.
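
For readers porting the block to one of the frameworks above, a minimal PyTorch sketch of an SE block with reduction ratio r=16 (an illustrative reading of the figures, not the authors' released Caffe code):

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # bottleneck of ratio r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # restore dimensionality
            nn.Sigmoid(),                        # per-channel gate in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling
        w = self.fc(w).view(n, c, 1, 1)  # excitation: channel weights
        return x * w                     # recalibrate the feature map

out = SEBlock(64)(torch.randn(2, 64, 32, 32))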

Citation

If you use Squeeze-and-Excitation Networks in your research, please cite the paper:

@inproceedings{hu2018senet,
  title={Squeeze-and-Excitation Networks},
  author={Jie Hu and Li Shen and Gang Sun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

SENet's People

Contributors

gangsunlion, hujie-frank, kambarakun, lishen-shirley


SENet's Issues

What is your weight_decay parameter? ^_^

Hi Hujie:

First of all, thanks for your excellent work! I can't find the weight_decay parameter in your paper; would you please tell me? ^_^

Best regards,
hungsing

Preprocessing Details

Hi Jie,

Thanks a lot for sharing the models! Could you also share some details on preprocessing / augmentation so it's easier for others to reproduce the results?

The loss is always 6.9 when I train the default SE-BN-Inception

Hi,
Something strange happens when I train the default SE-BN-Inception: without the pretrained model, the loss stays at 6.9 on ImageNet-2012 (even at iteration 0 the loss is 6.9). If I use the pretrained model, the loss starts at 9.6 and begins to fall.
And if I add

weight_filler {
  type: "msra"
}

to each convolution layer (still without the pretrained model), the loss can be reduced from 6.9 (it is still 6.9 at iteration 0).
May I ask why this happens?
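
For context: 6.9 is almost exactly ln(1000), the cross-entropy of a uniform prediction over the 1000 ImageNet classes, so a loss pinned at 6.9 typically means the network is outputting near-uniform probabilities. A quick check:

import math
# -log(1/1000): the softmax cross-entropy of a uniform 1000-way guess
print(math.log(1000))  # 6.9077..., matching the stuck loss value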

Obtaining SENet model

Sir, would you please upload your SENet model to Baidu Cloud, since we cannot download it from GoogleDrive? Thank you very much in advance.

Augmentation

Thanks for the wonderful work! @hujie-frank
Could you please share the code that implements the augmentation, such as Aspect Ratio, Random Rotation, and Pixel Jitter?
Thank you in advance!

Differences from the third-party Caffe implementation

You mentioned that the convolutions used are slightly different (3x3 vs. 1x1). However, by comparing your model's prototxt with shicai's, I could not find the difference. Could you please point it out more specifically, e.g. the name of the differing layer? Thanks very much!

How to set up caffe.proto?

I have added all the layers and the build succeeds, but I get:
Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ......WindowData)
*** Check failure stack trace: ***
So how should caffe.proto be set up?

What is the "Scale" ?

I'm implementing your network in TensorFlow, but I do not know exactly what the Scale layer does.

Can you explain? Thank you.
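
For reference (an illustration, not the authors' reply): Caffe's Scale layer, when given two bottoms as in the SE block, performs a broadcast multiplication that rescales each channel of the trunk feature map by its excitation weight. A minimal numpy sketch with assumed shapes:

import numpy as np

feature = np.random.rand(1, 256, 28, 28)   # trunk feature map (N, C, H, W)
gate = np.random.rand(1, 256)              # SE excitation, one weight per channel
scaled = feature * gate[:, :, None, None]  # broadcast multiply over H and W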

Additional data augmentation needed for best result?

Hi,

Thanks for the good work and open source code!

In this repo you mention the augmentation methods you use, which include aspect ratio / rotation / jittering; these are not usually used when benchmarking models (e.g. ResNet / ResNeXt). Did you use these additional augmentation methods to get the results reported in the repo (and the SENet entry in Table 3 of the paper)? I am confused because you don't seem to mention them in the implementation section of the paper where you describe the data augmentation.

Thanks!

Best,
Hongyi

mean value and use_global_stats

Hi Frank,

Good job! I have two questions to check with you. :)

  1. The BatchNorm layers have "use_global_stats: true", which suggests this prototxt is the deploy version rather than the train version? If my understanding is correct, use_global_stats is set to false during BN training.

  2. You commented out the mean values in the prototxt. Does that mean we need to subtract the mean ourselves, since this is presumably the deploy.prototxt?

Thank you!

Best,
Jordan

About pooling_layer.cu

Hi, thanks for your shared code.
I have a question about pooling_layer.cu; part of the code is as follows:

......
case PoolingParameter_PoolMethod_AVE:
  if (this->layer_param_.pooling_param().global_pooling()) {
    // NOLINT_NEXT_LINE(whitespace/operators)
    GlobalAvePoolForward<<<bottom[0]->count(0, 2), CAFFE_CUDA_NUM_THREADS>>>(
        bottom[0]->count(2), bottom_data, top_data);
  } else {
    // NOLINT_NEXT_LINE(whitespace/operators)
    AvePoolForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
        count, bottom_data, bottom[0]->num(), channels_,
        height_, width_, pooled_height_, pooled_width_, kernel_h_,
        kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data);
  }
  break;
......
Sorry, is it necessary to use the separate GlobalAvePoolForward kernel here?
And what difference does choosing GlobalAvePoolForward over AvePoolForward make to the result?
Thank you!
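
For context (an illustration, not the author's answer): GlobalAvePoolForward is this repository's specialized kernel that averages each channel's entire H x W plane in one pass; numerically it should match ordinary average pooling whose kernel covers the whole input. A numpy sketch of that equivalence, with assumed shapes:

import numpy as np

x = np.random.rand(2, 8, 7, 7)    # (N, C, H, W)
global_avg = x.mean(axis=(2, 3))  # what global average pooling computes
# Same as AVE pooling with kernel_h = H, kernel_w = W (one output per channel):
windowed = x.reshape(2, 8, -1).mean(axis=2)
assert np.allclose(global_avg, windowed)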

Wrong result in the Python wrapper?

Hi, I downloaded the code and compiled it successfully, but when I test SENet the results are wrong.

I use the Python wrapper; I haven't tried the command-line interface.


import matplotlib; matplotlib.use('agg')
%matplotlib inline
import sys
import os
cafferoot = '/home/liaofangzhou/caffes/caffe'
sys.path.append(os.path.join(cafferoot,'python'))

import caffe
import numpy as np
from matplotlib import pyplot as plt
import pandas

net = caffe.Net('/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/model_def/SENet.prototxt',
                '/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/pretrained_model/SENet.caffemodel',
                caffe.TEST)


# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.array([104, 117, 123])
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

image = caffe.io.load_image(os.path.join(cafferoot,'examples/images/cat.jpg'))
im_input = transformer.preprocess('data', image)
net.blobs['data'].data[:] = im_input
output = net.forward()
output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
print(np.argmax(output_prob))

and the result turns out to be 7 (cock), but the ground truth should be 281 (cat). The output probability also looks quite confident. Why is that?

Thank you in advance!

CUDNN_STATUS_INTERNAL_ERROR

When I train SE-ResNeXt-101 (32 x 4d), everything is OK. However, when I train SENet, the log says "status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR". If I disable cuDNN in the Makefile and recompile Caffe, the error disappears, but training SENet without cuDNN is slower. Could you please tell me how to debug this error so that I can use cuDNN to accelerate training?

How do you train your scene model?

I found that you use the pretrained scene model from https://github.com/lishen-shirley/Places2-CNNs.git. I also used this pretrained model with the SE block, but the accuracy of my model cannot exceed that of the original model without the SE block. Can you share more details about training with the Places365 data? Do you add an auxiliary loss as mentioned in "Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks"? @hujie-frank

Difference between SE-ResNeXt-101 and SENet

Hi Hujie,

I wonder what the architectural difference between SE-ResNeXt-101 and SENet is. I have an OOM issue with SENet: it requires almost 7 GB just to initialize the network, while SE-ResNeXt-101 needs less than 3 GB.
Why is there such a huge difference in memory usage?

Please help.

Thanks,
Ruxiao

Hello, I want to ask a question

Thank you for releasing your code.
How can I convert SE-ResNet-152.prototxt into train_SE-ResNet-152.prototxt (the training version)?
I'm looking forward to your answer, thank you.

Have you tried SE-DenseNet?

Thanks for the great network. I found that the SE module is generic and can be applied to any kind of network. You have tried it with state-of-the-art networks, but I did not find it applied to DenseNet. Could you try SE-DenseNet, and how does it perform?

Something wrong with the test of SE-ResNet-50?

Thanks for sharing such excellent work.

But when I test the SE-ResNet-50 caffemodel, I encounter some problems.

I added a data layer at the bottom and accuracy layers at the top of the prototxt:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 103.939
    mean_value: 116.779
    mean_value: 123.68
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "accuracy/top1"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "accuracy/top1"
  include {
    phase: TEST
  }
}

layer {
  name: "accuracy/top5"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "accuracy/top5"
  include {
    phase: TEST
  }
  accuracy_param {
    top_k: 5
  }
}

and ran the test program on the ILSVRC12 ImageNet validation set:
./build/tools/caffe test --model=models/SE-resnet-50/SE-ResNet-50.prototxt --weights=models/SE-resnet-50/SE-ResNet-50.caffemodel --iterations 5000 -gpu=0
(test batch_size = 10)

but I get the following result:
I0830 16:03:59.629042 37849 caffe.cpp:313] Batch 0, accuracy/top1 = 0
I0830 16:03:59.629132 37849 caffe.cpp:313] Batch 0, accuracy/top5 = 0
I0830 16:03:59.629142 37849 caffe.cpp:313] Batch 0, loss = 8.86858
Almost all of the accuracies are 0.

I would be very grateful for any advice on this issue. Thanks very much.

cannot build caffe with axpy_layer (NVIDIA/caffe)

Hi,

First of all, congratulations on the great ImageNet result!

I want to try your architecture in my Master's thesis, where I try to distinguish action forces from regular pedestrians based on their appearance in my own dataset.

Here is what I did:

I added your provided files to my NVIDIA Caffe flavour from https://github.com/NVIDIA/caffe, under src/caffe/layers and include/caffe/layers respectively.

Then I ran "make clean". But when I build Caffe with "make all -j16" I get the following build error:

In file included from src/caffe/layers/axpy_layer.cpp:8:0:
./include/caffe/layers/axpy_layer.hpp:25:36: error: wrong number of template arguments (1, should be 2)

What else do I need to change for a successful build?

SENet out of GPU memory when trying to fine-tuning.

Hi,

I modified the deploy prototxt to fine-tune SENet. However, even with a batch size of 1, I still get an out-of-memory error. Please give me some help.
Here are the steps I used to generate train_val.prototxt:

  1. add an ImageData layer: mean_value 104, 117, 123, crop_size 224
  2. add lr_mult and decay_mult to all conv and scale layers
  3. remove use_global_stats from all BatchNorm layers
  4. add a solver.prototxt

Then I train with "caffe train --solver=solver.prototxt --gpu=all --weights=SENet.caffemodel", but no matter how I modify the prototxt, I still get the memory error.

Caffe: master branch
CuDNN: v7.0

Training reproduce issue

I tried to reproduce the training from scratch, but the accuracy is 5 points lower on the Inception network. Can you share the solver file, e.g. the number of iterations and the learning-rate policy, or any more training details?

SE-VGG

Hi, how can I apply squeeze-and-excitation to VGG networks? Any suggestions?

About group convolution in SENet

Thanks for sharing your great work, it's amazing. By the way, do you have any optimization for the group convolutions in Caffe? SENet may otherwise suffer from memory or speed problems.

Any pretrained models for MXNet?

I tried to convert the Caffe model to an MXNet model, but it contains a new layer type. Do you have any pretrained MXNet models?

Does SE-Net work in other network architectures or in detection tasks?

Does SE-Net work in other network architectures like VGG, AlexNet, or other networks besides GoogLeNet Inception and ResNet?

Besides, does the SE module work in detection frameworks or models like SSD, Faster R-CNN, etc.? I have added the squeeze-and-excitation module to SSD (aiming to strengthen the 6 feature maps), but it doesn't seem to help.

Looking forward to your suggestions, thanks so much.

train

Solving...
F0616 13:59:44.988385 16022 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address

How can I solve this problem?

top-1 accuracy is very low

Hi
I tested SE-BN-Inception with the pretrained caffemodel, but the accuracy is 0.00018 on ImageNet-2012.
May I ask why this could happen? Is it a problem with my dataset?
Thanks.

Train error

Hi, when I run the training code, I have put the corresponding include and cpp files into the Caffe tree, but I still encounter the following error and I don't know how to solve it. Can you help me? Thanks.
] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, Python, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***

Some doubts about baseline accuracy

According to your paper, these are single-crop error rates (%) on the ImageNet validation set:

[screenshot of the paper's table]

However, I found ResNet accuracies that differ from yours:

[screenshot of the reference accuracies]

I am also confused about the meaning of the "original" and "re-implementation" columns.

Thanks.

GTX 1080ti x 1 , memory shortage

Hello. Thank you for sharing the fantastic SENet model.

I tried to train with my single 1080 Ti (11 GB of RAM).

I succeeded in training SE-ResNet-101 (train batch size 5), but I failed to train SENet or SE-ResNet-152 even with a train batch size of 1.

I used BVLC Caffe patched with this repo.

Is 11 GB of GPU memory not enough to train SENet?

Thank you.

Finetuning from ../model/SE-ResNeXt-101.caffemodel

I0929 12:02:35.547410 5970 caffe.cpp:155] Finetuning from ../model/SE-ResNeXt-101.caffemodel
I0929 12:02:35.856974 5970 net.cpp:761] Ignoring source layer label_data_1_split
I0929 12:02:35.857090 5970 net.cpp:761] Ignoring source layer conv2_1_1x1_increase/bn_conv2_1_1x1_increase/bn_0_split
I0929 12:02:35.857216 5970 net.cpp:761] Ignoring source layer conv2_2_1x1_increase/bn_conv2_2_1x1_increase/bn_0_split
I0929 12:02:35.857317 5970 net.cpp:761] Ignoring source layer conv2_3_1x1_increase/bn_conv2_3_1x1_increase/bn_0_split
I0929 12:02:35.857544 5970 net.cpp:761] Ignoring source layer conv3_1_1x1_increase/bn_conv3_1_1x1_increase/bn_0_split
I0929 12:02:35.857978 5970 net.cpp:761] Ignoring source layer conv3_2_1x1_increase/bn_conv3_2_1x1_increase/bn_0_split
I0929 12:02:35.858289 5970 net.cpp:761] Ignoring source layer conv3_3_1x1_increase/bn_conv3_3_1x1_increase/bn_0_split
I0929 12:02:35.858595 5970 net.cpp:761] Ignoring source layer conv3_4_1x1_increase/bn_conv3_4_1x1_increase/bn_0_split

What is the "bn_0_split" layer?

Can not find fc-layer in SE blocks

Hi Hujie,

First of all, thank you for opening up your code. In your introduction you use two FC layers in the SE block, but in the prototxt I find two conv layers with 1x1 kernels instead. Is this an improvement? If so, does it perform better than the FC layers?

Please help.

Thanks,
Totoro
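
For context (an illustration, not the authors' stated rationale): on a 1x1 spatial map, a 1x1 convolution computes exactly the same linear map as a fully connected layer, so the two forms are interchangeable in the SE block. A PyTorch sketch of the equivalence with assumed sizes:

import torch
import torch.nn as nn

fc = nn.Linear(64, 4)                   # FC form of the squeeze bottleneck
conv = nn.Conv2d(64, 4, kernel_size=1)  # 1x1-conv form of the same map
conv.weight.data = fc.weight.data.view(4, 64, 1, 1).clone()
conv.bias.data = fc.bias.data.clone()

x = torch.randn(2, 64)                           # pooled SE descriptor
out_fc = fc(x)
out_conv = conv(x.view(2, 64, 1, 1)).view(2, 4)  # add, then drop, 1x1 dims
assert torch.allclose(out_fc, out_conv, atol=1e-6)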

About axpy_layer.cu

Hi, thanks for your shared code.

48 for (int i = blockDim.x / 2; i > 0; i >>= 1) {
49 if (tid < i) {
50 buffer[threadIdx.x] += buffer[threadIdx.x + i];
51 }
52 __syncthreads();
53 }

Sorry, can you explain the logic behind lines 48 to 53 of axpy_layer.cu? In my opinion, this piece of code should be commented out.
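
For orientation (an illustration, not the author's reply): lines 48-53 are a standard shared-memory tree reduction that sums the per-thread partial products held in buffer; each pass halves the number of active threads until thread 0 holds the block's total. The same halving pattern in numpy:

import numpy as np

buffer = np.arange(8, dtype=np.float64)  # stand-in for per-thread partial sums
i = buffer.size // 2
while i > 0:                      # mirrors: for (i = blockDim.x / 2; i > 0; i >>= 1)
    buffer[:i] += buffer[i:2 * i]  # each "thread" tid < i adds buffer[tid + i]
    i //= 2
assert buffer[0] == np.arange(8).sum()  # "thread 0" ends up with the total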
