
SENet's Introduction

Squeeze-and-Excitation Networks (paper)

By Jie Hu[1], Li Shen[2], Gang Sun[1].

Momenta[1] and University of Oxford[2].

Approach

Figure 1: Diagram of a Squeeze-and-Excitation building block.

 

Figure 2: Schema of SE-Inception and SE-ResNet modules. We set r=16 in all our models.

Implementation

In this repository, Squeeze-and-Excitation Networks are implemented in Caffe.

Augmentation

Method             Settings
Random Mirror      True
Random Crop        8% ~ 100%
Aspect Ratio       3/4 ~ 4/3
Random Rotation    -10° ~ 10°
Pixel Jitter       -20 ~ 20
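
A hedged Python sketch of the Inception-style random crop the table implies (sampling 8% ~ 100% of the image area at an aspect ratio in [3/4, 4/3]); the repository's actual Caffe augmentation code is not shown here, so the function name and the fallback policy are assumptions:

import math
import random

def random_crop_params(height, width, area_range=(0.08, 1.0),
                       ratio_range=(0.75, 4.0 / 3.0), max_tries=10):
    """Return (y, x, h, w) of a random crop; fall back to a center crop."""
    for _ in range(max_tries):
        target_area = random.uniform(*area_range) * height * width
        log_lo, log_hi = math.log(ratio_range[0]), math.log(ratio_range[1])
        aspect = math.exp(random.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if w <= width and h <= height:
            return (random.randint(0, height - h),
                    random.randint(0, width - w), h, w)
    side = min(height, width)  # fallback: central square crop
    return ((height - side) // 2, (width - side) // 2, side, side)

y, x, h, w = random_crop_params(480, 640)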

Note:

  • To achieve efficient training and testing, we fuse the consecutive channel-wise scale and element-wise summation operations into a single "Axpy" layer in the architectures with skip connections, which considerably reduces memory cost and computational burden (see the sketch after these notes).

  • In addition, we found the GPU implementation of global average pooling provided by cuDNN and BVLC/caffe to be inefficient, so we re-implemented the operation and obtained a significant speed-up.
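
For reference, a minimal numpy sketch of what the fused Axpy layer computes (the name reads as a*x + y; the shapes below are illustrative assumptions):

import numpy as np

def axpy(a, x, y):
    """Channel-wise scale of x by a, then element-wise summation with y."""
    return a * x + y  # a broadcasts over the spatial dimensions

a = np.random.rand(2, 64, 1, 1)    # per-channel SE excitation weights
x = np.random.rand(2, 64, 56, 56)  # residual branch output
y = np.random.rand(2, 64, 56, 56)  # identity / skip branch
out = axpy(a, x, y)                # one layer instead of Scale + Eltwise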

Trained Models

Table 1. Single-crop validation error on ImageNet-1k (center 224x224 crop from the resized image with shorter side = 256). SENet-154 is one of our strongest models and was part of our winning entry (1st place, team name: WMW) in the ILSVRC 2017 Image Classification Challenge.

Model                       Top-1 err. (%)   Top-5 err. (%)   Size     Download
SE-BN-Inception             23.62            7.04             46 MB    GoogleDrive | BaiduYun
SE-ResNet-50                22.37            6.36             107 MB   GoogleDrive | BaiduYun
SE-ResNet-101               21.75            5.72             189 MB   GoogleDrive | BaiduYun
SE-ResNet-152               21.34            5.54             256 MB   GoogleDrive | BaiduYun
SE-ResNeXt-50 (32 x 4d)     20.97            5.54             105 MB   GoogleDrive | BaiduYun
SE-ResNeXt-101 (32 x 4d)    19.81            4.96             187 MB   GoogleDrive | BaiduYun
SENet-154                   18.68            4.47             440 MB   GoogleDrive | BaiduYun

Here we obtain better performance than reported in the paper. We re-trained the SENets described in the paper on a single server with 8 NVIDIA Titan X GPUs, using a mini-batch size of 256 and an initial learning rate of 0.1, for more epochs. In contrast, the results reported in the paper were obtained by training the networks with a larger batch size (1024) and learning rate (0.6) across 4 servers.

Third-party re-implementations

  1. Caffe. SE-modules are integrated with a modified ResNet-50 that uses stride 2 in the 3x3 convolution instead of the first 1x1 convolution, which obtains better performance: Repository.
  2. TensorFlow. SE-modules are integrated with a pre-activation ResNet-50 that follows the setup of fb.resnet.torch: Repository.
  3. TensorFlow. A simple TensorFlow implementation of SENets on CIFAR-10: Repository.
  4. MatConvNet. All the released SENets are imported into MatConvNet: Repository.
  5. MXNet. SE-modules are integrated with ResNeXt, and more architectures are coming soon: Repository.
  6. PyTorch. A PyTorch implementation of SENets: Repository.
  7. Chainer. A Chainer implementation of SENets: Repository.
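
For readers porting the block to one of the frameworks above, a minimal PyTorch sketch of an SE block with reduction ratio r=16 (an illustrative reading of the figures, not the authors' released Caffe code):

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),  # bottleneck of ratio r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),  # restore dimensionality
            nn.Sigmoid(),                        # per-channel gate in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pooling
        w = self.fc(w).view(n, c, 1, 1)  # excitation: channel weights
        return x * w                     # recalibrate the feature map

out = SEBlock(64)(torch.randn(2, 64, 32, 32))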

Citation

If you use Squeeze-and-Excitation Networks in your research, please cite the paper:

@inproceedings{hu2018senet,
  title={Squeeze-and-Excitation Networks},
  author={Jie Hu and Li Shen and Gang Sun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

SENet's People

Contributors

gangsunlion, hujie-frank, kambarakun, lishen-shirley


SENet's Issues

What is your weight_decay parameter? ^_^

Hi Hujie:

First of all, thanks for your excellent work! I can't find the weight_decay parameter in your paper; would you please tell me? ^_^

Best regards,
hungsing

Preprocessing Details

Hi Jie,

Thanks a lot for sharing the models! Could you also share some details on preprocessing / augmentation so it's easier for others to reproduce the results?

The loss is always 6.9 when I train the default SE-BN-Inception

Hi,
Something strange happens when I train the default SE-BN-Inception: without the pretrained model, the loss stays at 6.9 on ImageNet-2012 (even at iteration 0 the loss is 6.9). If I use the pretrained model, the loss starts at 9.6 and begins to fall.
And if I add

weight_filler {
  type: "msra"
}

to each convolution layer (still without the pretrained model), the loss can be reduced from 6.9 (it is still 6.9 at iteration 0).
May I ask why this happens?
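
For context: 6.9 is almost exactly ln(1000), the cross-entropy of a uniform prediction over the 1000 ImageNet classes, so a loss pinned at 6.9 typically means the network is outputting near-uniform probabilities. A quick check:

import math
# -log(1/1000): the softmax cross-entropy of a uniform 1000-way guess
print(math.log(1000))  # 6.9077..., matching the stuck loss value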

Obtaining SENet model

Sir, would you please upload your SENet model to Baidu Cloud, since we cannot download it from GoogleDrive? Thank you very much in advance.

Augmentation

Thanks for the wonderful work! @hujie-frank
Could you please share the code that implements the augmentation, such as Aspect Ratio, Random Rotation, and Pixel Jitter?
Thank you in advance!

Differences from the third-party Caffe implementation

You mentioned that the convolutions used are slightly different (3x3 vs. 1x1). However, by comparing your model's prototxt with shicai's, I could not find the difference. Could you please point it out more specifically, e.g. the name of the differing layer? Thanks very much!

How to set up caffe.proto?

I have added all the layers and the build succeeds, but I get:
Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ......WindowData)
*** Check failure stack trace: ***
So how should caffe.proto be set up?

What is the "Scale" ?

I'm implementing your network in TensorFlow, but I do not know exactly what the Scale layer does.

Can you explain? Thank you.
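
For reference (an illustration, not the authors' reply): Caffe's Scale layer, when given two bottoms as in the SE block, performs a broadcast multiplication that rescales each channel of the trunk feature map by its excitation weight. A minimal numpy sketch with assumed shapes:

import numpy as np

feature = np.random.rand(1, 256, 28, 28)   # trunk feature map (N, C, H, W)
gate = np.random.rand(1, 256)              # SE excitation, one weight per channel
scaled = feature * gate[:, :, None, None]  # broadcast multiply over H and W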

Additional data augmentation needed for best result?

Hi,

Thanks for the good work and open source code!

In this repo you mention the augmentation methods you use, which include aspect ratio / rotation / jittering; these are not usually used when benchmarking models (e.g. ResNet / ResNeXt). Did you use these additional augmentation methods to get the results reported in the repo (and the SENet entry in Table 3 of the paper)? I am confused because you don't seem to mention them in the implementation section of the paper where you describe the data augmentation.

Thanks!

Best,
Hongyi

mean value and use_global_stats

Hi Frank,

Good job! I have two questions to check with you. :)

  1. The BatchNorm layers have "use_global_stats: true", which suggests this prototxt is the deploy version rather than the train version? If my understanding is correct, use_global_stats is set to false during BN training.

  2. You commented out the mean values in the prototxt. Does that mean we need to subtract the mean ourselves, since this is presumably the deploy.prototxt?

Thank you!

Best,
Jordan

About pooling_layer.cu

Hi, thanks for your shared code.
I have a question about pooling_layer.cu; part of the code is as follows:

......
case PoolingParameter_PoolMethod_AVE:
  if (this->layer_param_.pooling_param().global_pooling()) {
    // NOLINT_NEXT_LINE(whitespace/operators)
    GlobalAvePoolForward<<<bottom[0]->count(0, 2), CAFFE_CUDA_NUM_THREADS>>>(
        bottom[0]->count(2), bottom_data, top_data);
  } else {
    // NOLINT_NEXT_LINE(whitespace/operators)
    AvePoolForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
        count, bottom_data, bottom[0]->num(), channels_,
        height_, width_, pooled_height_, pooled_width_, kernel_h_,
        kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data);
  }
  break;
......
Sorry, is it necessary to use the separate GlobalAvePoolForward kernel here?
And what difference does choosing GlobalAvePoolForward over AvePoolForward make to the result?
Thank you!
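
For context (an illustration, not the author's answer): GlobalAvePoolForward is this repository's specialized kernel that averages each channel's entire H x W plane in one pass; numerically it should match ordinary average pooling whose kernel covers the whole input. A numpy sketch of that equivalence, with assumed shapes:

import numpy as np

x = np.random.rand(2, 8, 7, 7)    # (N, C, H, W)
global_avg = x.mean(axis=(2, 3))  # what global average pooling computes
# Same as AVE pooling with kernel_h = H, kernel_w = W (one output per channel):
windowed = x.reshape(2, 8, -1).mean(axis=2)
assert np.allclose(global_avg, windowed)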

Wrong result in the Python wrapper?

Hi, I downloaded the code and compiled it successfully, but when I test SENet the results are wrong.

I use the Python wrapper; I haven't tried the command-line interface.


import matplotlib; matplotlib.use('agg')
%matplotlib inline
import sys
import os
cafferoot = '/home/liaofangzhou/caffes/caffe'
sys.path.append(os.path.join(cafferoot,'python'))

import caffe
import numpy as np
from matplotlib import pyplot as plt
import pandas

net = caffe.Net('/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/model_def/SENet.prototxt',
                '/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/pretrained_model/SENet.caffemodel',
                caffe.TEST)


# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.array([104, 117, 123])
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

image = caffe.io.load_image(os.path.join(cafferoot,'examples/images/cat.jpg'))
im_input = transformer.preprocess('data', image)
net.blobs['data'].data[:] = im_input
output = net.forward()
output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
print(np.argmax(output_prob))

and the result turns out to be 7 (cock), but the ground truth should be 281 (cat). The output probability also looks quite confident. Why is that?

Thank you in advance!

CUDNN_STATUS_INTERNAL_ERROR

When I train SE-ResNeXt-101 (32 x 4d), everything is OK. However, when I train SENet, the log says "status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR". If I disable cuDNN in the Makefile and recompile Caffe, the error disappears, but training SENet without cuDNN is slower. Could you please tell me how to debug this error so that I can use cuDNN to accelerate training?

How do you train your scene model?

I found that you use the pretrained scene model from https://github.com/lishen-shirley/Places2-CNNs.git. I also used this pretrained model with the SE block, but the accuracy of my model cannot exceed that of the original model without the SE block. Can you share more details about training with the Places365 data? Do you add an auxiliary loss as mentioned in "Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks"? @hujie-frank

Difference between SE-ResNeXt-101 and SENet

Hi Hujie,

I wonder what the architectural difference between SE-ResNeXt-101 and SENet is. I have an OOM issue with SENet: it requires almost 7 GB just to initialize the network, while SE-ResNeXt-101 needs less than 3 GB.
Why is there such a huge difference in memory usage?

Please help.

Thanks,
Ruxiao

Hello, I want to ask a question

Thank you for releasing your code.
How can I convert SE-ResNet-152.prototxt into train_SE-ResNet-152.prototxt (the training version)?
I'm looking forward to your answer, thank you.

Have you tried SE-DenseNet?

Thanks for the great network. I found that the SE module is generic and can be applied to any kind of network. You have tried it with state-of-the-art networks, but I did not find it applied to DenseNet. Could you try SE-DenseNet, and how does it perform?

Something wrong with the test of SE-ResNet-50?

Thanks for sharing such excellent work.

But when I test the SE-ResNet-50 caffemodel, I encounter some problems.

I added a data layer at the bottom and accuracy layers at the top of the prototxt:
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 103.939
    mean_value: 116.779
    mean_value: 123.68
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 10
    backend: LMDB
  }
}

layer {
  name: "accuracy/top1"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "accuracy/top1"
  include {
    phase: TEST
  }
}

layer {
  name: "accuracy/top5"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "accuracy/top5"
  include {
    phase: TEST
  }
  accuracy_param {
    top_k: 5
  }
}

and ran the test program on the ILSVRC12 ImageNet validation set:
./build/tools/caffe test --model=models/SE-resnet-50/SE-ResNet-50.prototxt --weights=models/SE-resnet-50/SE-ResNet-50.caffemodel --iterations 5000 -gpu=0
(test batch_size = 10)

but I get the following result:
I0830 16:03:59.629042 37849 caffe.cpp:313] Batch 0, accuracy/top1 = 0
I0830 16:03:59.629132 37849 caffe.cpp:313] Batch 0, accuracy/top5 = 0
I0830 16:03:59.629142 37849 caffe.cpp:313] Batch 0, loss = 8.86858
Almost all of the accuracies are 0.

I would be very grateful for any advice on this issue. Thanks very much.

cannot build caffe with axpy_layer (NVIDIA/caffe)

Hi,

First of all, congratulations on the great ImageNet result!

I want to try your architecture in my Master's thesis, where I try to distinguish action forces from regular pedestrians based on their appearance in my own dataset.

Here is what I did:

I added your provided files to my NVIDIA Caffe flavour from https://github.com/NVIDIA/caffe, under src/caffe/layers and include/caffe/layers respectively.

Then I ran "make clean". But when I build Caffe with "make all -j16" I get the following build error:

In file included from src/caffe/layers/axpy_layer.cpp:8:0:
./include/caffe/layers/axpy_layer.hpp:25:36: error: wrong number of template arguments (1, should be 2)

What else do I need to change for a successful build?

SENet out of GPU memory when trying to fine-tuning.

Hi,

I modified the deploy prototxt to fine-tune SENet. However, even with a batch size of 1, I still get an out-of-memory error. Please give me some help.
Here are the steps I used to generate train_val.prototxt:

  1. add an ImageData layer: mean_value 104, 117, 123, crop_size 224
  2. add lr_mult and decay_mult to all conv and scale layers
  3. remove use_global_stats from all BatchNorm layers
  4. add a solver.prototxt

Then I train with "caffe train --solver=solver.prototxt --gpu=all --weights=SENet.caffemodel", but no matter how I modify the prototxt, I still get the memory error.

Caffe: master branch
CuDNN: v7.0

Training reproduce issue

I tried to reproduce the training from scratch, but the accuracy is 5 points lower on the Inception network. Can you share the solver file, e.g. the number of iterations and the learning-rate policy, or any more training details?

SE-VGG

Hi, how can I apply squeeze-and-excitation to VGG networks? Any suggestions?

About group convolution in SENet

Thanks for sharing your great work, it's amazing. By the way, do you have any optimization for the group convolutions in Caffe? SENet may otherwise suffer from memory or speed problems.

Any pretrained models for MXNet?

I tried to convert the Caffe model to an MXNet model, but it contains a new layer type. Do you have any pretrained MXNet models?

Does SE-Net work in other network architectures or in detection tasks?

Does SE-Net work in other network architectures like VGG, AlexNet, or other networks besides GoogLeNet Inception and ResNet?

Besides, does the SE module work in detection frameworks or models like SSD, Faster R-CNN, etc.? I have added the squeeze-and-excitation module to SSD (aiming to strengthen the 6 feature maps), but it doesn't seem to help.

Looking forward to your suggestions, thanks so much.

train

Solving...
F0616 13:59:44.988385 16022 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address

How can I solve this problem?

top-1 accuracy is very low

Hi
I tested SE-BN-Inception with the pretrained caffemodel, but the accuracy is 0.00018 on ImageNet-2012.
May I ask why this could happen? Is it a problem with my dataset?
Thanks.

Train error

Hi, when I run the training code, I have put the corresponding include and cpp files into the Caffe tree, but I still encounter the following error and I don't know how to solve it. Can you help me? Thanks.
] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, Python, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***

Some doubts about baseline accuracy

According to your paper, these are single-crop error rates (%) on the ImageNet validation set:

[screenshot of the paper's table]

However, I found ResNet accuracies that differ from yours:

[screenshot of the reference accuracies]

I am also confused about the meaning of the "original" and "re-implementation" columns.

Thanks.

GTX 1080ti x 1 , memory shortage

Hello. Thank you for sharing the fantastic SENet model.

I tried to train with my single 1080 Ti (11 GB of RAM).

I succeeded in training SE-ResNet-101 (train batch size 5), but I failed to train SENet or SE-ResNet-152 even with a train batch size of 1.

I used BVLC Caffe patched with this repo.

Is 11 GB of GPU memory not enough to train SENet?

Thank you.

Finetuning from ../model/SE-ResNeXt-101.caffemodel

I0929 12:02:35.547410 5970 caffe.cpp:155] Finetuning from ../model/SE-ResNeXt-101.caffemodel
I0929 12:02:35.856974 5970 net.cpp:761] Ignoring source layer label_data_1_split
I0929 12:02:35.857090 5970 net.cpp:761] Ignoring source layer conv2_1_1x1_increase/bn_conv2_1_1x1_increase/bn_0_split
I0929 12:02:35.857216 5970 net.cpp:761] Ignoring source layer conv2_2_1x1_increase/bn_conv2_2_1x1_increase/bn_0_split
I0929 12:02:35.857317 5970 net.cpp:761] Ignoring source layer conv2_3_1x1_increase/bn_conv2_3_1x1_increase/bn_0_split
I0929 12:02:35.857544 5970 net.cpp:761] Ignoring source layer conv3_1_1x1_increase/bn_conv3_1_1x1_increase/bn_0_split
I0929 12:02:35.857978 5970 net.cpp:761] Ignoring source layer conv3_2_1x1_increase/bn_conv3_2_1x1_increase/bn_0_split
I0929 12:02:35.858289 5970 net.cpp:761] Ignoring source layer conv3_3_1x1_increase/bn_conv3_3_1x1_increase/bn_0_split
I0929 12:02:35.858595 5970 net.cpp:761] Ignoring source layer conv3_4_1x1_increase/bn_conv3_4_1x1_increase/bn_0_split

What is the "bn_0_split" layer?

Can not find fc-layer in SE blocks

Hi Hujie,

First of all, thank you for opening up your code. In your introduction you use two FC layers in the SE block, but in the prototxt I find two conv layers with 1x1 kernels instead. Is this an improvement? If so, does it perform better than the FC layers?

Please help.

Thanks,
Totoro
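
For context (an illustration, not the authors' stated rationale): on a 1x1 spatial map, a 1x1 convolution computes exactly the same linear map as a fully connected layer, so the two forms are interchangeable in the SE block. A PyTorch sketch of the equivalence with assumed sizes:

import torch
import torch.nn as nn

fc = nn.Linear(64, 4)                   # FC form of the squeeze bottleneck
conv = nn.Conv2d(64, 4, kernel_size=1)  # 1x1-conv form of the same map
conv.weight.data = fc.weight.data.view(4, 64, 1, 1).clone()
conv.bias.data = fc.bias.data.clone()

x = torch.randn(2, 64)                           # pooled SE descriptor
out_fc = fc(x)
out_conv = conv(x.view(2, 64, 1, 1)).view(2, 4)  # add, then drop, 1x1 dims
assert torch.allclose(out_fc, out_conv, atol=1e-6)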

About axpy_layer.cu

Hi, thanks for your shared code.

48 for (int i = blockDim.x / 2; i > 0; i >>= 1) {
49 if (tid < i) {
50 buffer[threadIdx.x] += buffer[threadIdx.x + i];
51 }
52 __syncthreads();
53 }

Sorry, can you explain the logic behind lines 48 to 53 of axpy_layer.cu? In my opinion, this piece of code should be commented out.
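
For orientation (an illustration, not the author's reply): lines 48-53 are a standard shared-memory tree reduction that sums the per-thread partial products held in buffer; each pass halves the number of active threads until thread 0 holds the block's total. The same halving pattern in numpy:

import numpy as np

buffer = np.arange(8, dtype=np.float64)  # stand-in for per-thread partial sums
i = buffer.size // 2
while i > 0:                      # mirrors: for (i = blockDim.x / 2; i > 0; i >>= 1)
    buffer[:i] += buffer[i:2 * i]  # each "thread" tid < i adds buffer[tid + i]
    i //= 2
assert buffer[0] == np.arange(8).sum()  # "thread 0" ends up with the total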
