megvii-model / shufflenet-series

License: MIT License

Python 100.00%

shufflenet-series's Issues

One shot accuracy

oneshot.txt
My training ends with an error rate of 26.3; I cannot reach 25.1.

Can you release your log files for training?

To reproduce the training details, I'd like to request the log file from your training.

Best

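At lines 116-119:

    if not self.has_proj:
        x = self.relu(x_proj + x)
    x = torch.cat((proj, x), dim=1)

When self.has_proj == True, stride=2, so the residual form is not used and proj is concatenated directly with x, the output of the SE block; otherwise the residual form is used. On my own dataset (the dataset is fine, and other networks converge normally) this approach never converged. However, when I changed the stride=2 case to concatenate proj and x directly without going through the SE block, it converged normally (that is, with stride=2 neither SE_Block nor the residual form is used, while stride=1 is unchanged and uses both the SE block and the residual form).
In your ShuffleNetV2+, however, the blocks that use SE_Block always stay in non-residual form, and training from scratch works fine for me. I also checked the official SENet code: when stride=2 or the input and output channels differ, it applies a convolution as a projection so that the feature map size and channel count match, and then forms the residual, so the whole network stays in residual form once SE_Block is added.
So when I train from scratch with your public code, it never converges. Is that because the form around SE_Block is inconsistent (residual and non-residual forms are mixed), which makes the network unstable and hurts convergence? Did you run into this when training this network? I would appreciate your advice.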

pretrained model fatal error

Hi, I downloaded the pretrained models provided on OneDrive. However, when extracting the tar files, I received the following error:
tar: This does not look like a tar archive

Would you be more specific about how to use those pretrained models?

Thank you!

Some pre-trained models may be corrupted

Hi, I downloaded the ShuffleNetV2.Large pre-trained model (snetv2_residual_se.pkl), but got the following message when loading the model in PyTorch:

RuntimeError: unexpected EOF, expected 1750141 more bytes. The file might be corrupted.

I have downloaded the model three times and hit the same error every time, so I think the problem may be caused by the original file. The md5sum of the model I downloaded is:
be052a3ef97de2c64bc4236380972d38

Can you provide the md5sum of the original correct model? Many thanks.
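
For anyone comparing checksums in this thread, here is a minimal sketch of computing the same digest in Python instead of with the md5sum command. The helper name `file_md5` and the temporary demo file are illustrative assumptions; in practice you would point it at the downloaded .pkl file.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file (a stand-in for the downloaded model)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as fh:
        fh.write(b"hello")
    demo_digest = file_md5(demo_path)

print(demo_digest)  # 5d41402abc4b2a76b9719d911017c592
```

A digest that is identical across repeated downloads, as reported above, suggests the file is being transferred consistently, so a truncated upload at the source would be one possible explanation.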

ShuffleNetV2+ does not converge when train_loader is created with shuffle=False

Parameters:
model-size=Large, auto_continue=False, batch-size=128, num_workers=8, and default values for the other parameters.
Environment:
Ubuntu 16.04, PyTorch 1.2, single RTX 2080 Ti GPU

When I train ShuffleNetV2+ on the ImageNet-1K dataset, it does not converge when train_loader is created with shuffle=False. Can you give me some advice?

[30 02:15:34] TRAIN Iter 20: lr = 0.499978,	loss = 2.621804,	Top-1 err = 0.162891,	Top-5 err = 0.145703,	data_time = 0.006651,	train_time = 2.031535

[30 02:16:09] TRAIN Iter 40: lr = 0.499956,	loss = 4.062627,	Top-1 err = 0.154688,	Top-5 err = 0.096875,	data_time = 0.006521,	train_time = 1.751516

[30 02:16:41] TRAIN Iter 60: lr = 0.499933,	loss = 48.428082,	Top-1 err = 0.465625,	Top-5 err = 0.323047,	data_time = 0.006557,	train_time = 1.589813

[30 02:16:47] TRAIN Iter 80: lr = 0.499911,	loss = nan,	Top-1 err = 0.628516,	Top-5 err = 0.564063,	data_time = 0.006574,	train_time = 0.323313

[30 02:16:52] TRAIN Iter 100: lr = 0.499889,	loss = nan,	Top-1 err = 1.000000,	Top-5 err = 1.000000,	data_time = 0.006572,	train_time = 0.255579
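
One plausible reading of this log, offered as a sketch rather than a confirmed diagnosis: torchvision's ImageFolder lists samples class by class, so with shuffle=False each batch of 128 contains images from only one or two classes. The toy simulation below uses only the standard library; the class count and the helper name `classes_per_batch` are illustrative assumptions.

```python
import random


def classes_per_batch(labels, batch_size):
    """Count the distinct class labels in each consecutive batch."""
    return [
        len(set(labels[i:i + batch_size]))
        for i in range(0, len(labels), batch_size)
    ]


# Toy stand-in for a class-sorted dataset: 10 classes x 1300 images each
labels = [c for c in range(10) for _ in range(1300)]

sequential = classes_per_batch(labels, 128)

shuffled_labels = labels[:]
random.Random(0).shuffle(shuffled_labels)
shuffled = classes_per_batch(shuffled_labels, 128)

print("max classes per batch without shuffling:", max(sequential))
print("min classes per batch with shuffling:", min(shuffled))
```

With near single-class batches and a learning rate of about 0.5, both the gradients and the BatchNorm statistics are strongly biased, which would be consistent with the very low early error followed by a loss of nan.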

About BN operations recalculation

Hello, first and foremost, thank you for sharing your work.

In your Single Path One-Shot paper you say:

Before the inference of an architecture, the statistics of all the BN operations are recalculated on a random subset of training data.

I have some questions about the recalculation:

How much does this recalculation affect the inference and the evolutionary search?
How do you achieve this? Is it something like first turning off training for all parameters, then turning it back on for the BN layers, and finally training (BP) the model on the subset of data?
Is the subset of training data re-sampled every time a new architecture is given, or is it sampled only once and kept unchanged during the evolutionary search?

Thank you so much if you can give me some idea.
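
For readers of this thread, a minimal sketch of one common way to re-estimate BN statistics in PyTorch without any backpropagation. It is not the authors' confirmed procedure: the helper name `recalibrate_bn`, the toy network and the choice of `momentum = None` (a cumulative average over the calibration batches) are all assumptions.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_bn(model, batches):
    """Re-estimate BatchNorm running statistics using forward passes only."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.reset_running_stats()
            module.momentum = None  # cumulative average over all batches seen
    model.train()  # running statistics are only updated in train mode
    for x in batches:
        model(x)
    model.eval()
    return model


# Toy network and random data standing in for a sampled architecture
# and a subset of the training set
torch.manual_seed(0)
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
weights_before = [p.clone() for p in net.parameters()]
calibration_batches = [torch.randn(16, 3, 8, 8) + 2.0 for _ in range(4)]
recalibrate_bn(net, calibration_batches)
```

Because everything runs under `torch.no_grad()`, the learned weights stay untouched and only the running mean and variance buffers change.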

Pretrained model

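Hello. The ShuffleNetV2 2.0x file under Trained Models may be the wrong one, because its accuracy does not look right. Could you please provide the pretrained model for it again? Thanks.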

Low accuracy on multi-class problem

When I used ShuffleNet to train models on my own dataset, I found that it reaches high accuracy on a two-class classification problem but low accuracy on a multi-class problem.
I fine-tuned from the officially released pretrained model. Is it inappropriate to use the pretrained model when training on a multi-class dataset?

The link to the pretrained models is broken

The link to the pretrained models is broken. Can you fix it? Thanks.

Will you release the model-searching code in the future?

Thanks for your amazing work! I was wondering if you will release the one-shot searching code in the future.

Training slows down after some iterations

Thanks for your code. When I train ShuffleNetV2 1.0x, training slows down after some iterations. As you can see in the picture, during the first 1000 iterations I need 4 s to train 20 iterations, but after that I need about 16 s to train 20 iterations. I train on 3 Titan Xp GPUs, use NVIDIA DALI to accelerate the dataloader, and set the batch size to 512. Has anyone run into the same situation?

DetNAS (COCO-FPN) release

I'm integrating DetNAS into mmdetection library and eagerly waiting for your detector optimized models, especially DetNAS (COCO-FPN) ones.
The provided ClsNASNet_medium already shows improvement over ResNet50 as a backbone in a Cascade R-CNN detector.

When do you plan to release the detector optimized models?
Thank you!

Compare the speed of mobilenetv3-large, shufflenetv2 1.5x, shufflenetv2 + medium?

I compared the speed of mobilenetv3-large, shufflenetv2 1.5x and shufflenetv2+ medium using ncnn on an Android CPU. Shufflenetv2+ (154 ms) = mobilenetv3-large (157 ms) > shufflenetv2 1.5x (166 ms). Why is v2+ faster than v2 1.5x? Is this reasonable? Thank you!

Compare time cost between mobilenetV1 and shufflenetV1, V2

Thank you for the nice work on the ShuffleNet series.
I measured the time cost while fine-tuning MobileNet and ShuffleNet on my task under the following conditions:
One V100 GPU
Batch size = 8
Input size W,H = 64,128
Two-label classification

My timing code is:

    start = time.time()
    output = model(img)
    torch.cuda.synchronize()
    end = time.time()

Here is the time cost:
Mobilenetv1_0.25: ~0.0035s
MobilenetV1_0.5: ~0.0036s
Shufflenetv1_0.5(g=3): ~0.0095s
Shufflenetv2_0.5: ~0.009s

According to the ShuffleNetV2 paper, the GPU speed ranking should be: Mobilenetv1_0.25 > Shufflenetv2_0.5 > MobilenetV1_0.5 > Shufflenetv1_0.5. So what is the problem?
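
A hedged sketch of a steadier way to time a forward pass, for comparison with the snippet in this post. That snippet starts the clock without synchronizing first and times a single call, so warm-up cost and asynchronous work queued earlier can leak into the measurement. The helper name `time_forward`, the warm-up and iteration counts, and the toy convolution are assumptions, and the sketch times inference rather than fine-tuning.

```python
import time

import torch
import torch.nn as nn


def time_forward(model, x, warmup=10, iters=50):
    """Average forward latency in seconds, with warm-up and explicit syncs."""
    use_cuda = x.is_cuda
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):  # exclude one-time setup cost
            model(x)
        if use_cuda:
            torch.cuda.synchronize()  # drain queued work before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for all timed kernels to finish
        return (time.perf_counter() - start) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
toy_model = nn.Conv2d(3, 8, 3, padding=1).to(device)  # stand-in for the real network
batch = torch.randn(8, 3, 128, 64, device=device)     # batch 8, H=128, W=64
latency = time_forward(toy_model, batch)
print(f"{latency * 1000:.3f} ms per forward pass on {device}")
```

At a batch size of 8 and such a small input, per-layer launch overhead may dominate over arithmetic, so a network with more layers can be slower even with fewer FLOPs; that is a possible explanation, not a measured one.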

The ShuffleNet structure is not consistent with the structure listed in the paper

In the original paper, the first pointwise group conv is followed by the channel shuffle operation. But in the ShuffleNetV1 block, it is followed by a DWConv.

How can I run the saved model on CPU?

Hi there, I am using your ShuffleNetV1 code, which is very useful. Thanks a lot! But when I try to run the saved model on CPU, I get some errors. Do you have any advice? Thank you again!
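
Since the error itself is not shown, this is only a sketch of the usual first step: loading the checkpoint with `map_location="cpu"` so that tensors saved from a GPU are placed on the CPU. The helper name `load_checkpoint_on_cpu`, the toy model and the "state_dict" wrapper key are assumptions. Note that a `.pth.tar` file written by `torch.save` is read with `torch.load`, not extracted with tar.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_checkpoint_on_cpu(path):
    """Load a torch.save checkpoint so that every tensor lives on the CPU."""
    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: the weights may be nested under a "state_dict" key
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


# Round trip with a toy model standing in for the saved network
toy = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "toy.pth.tar")
    torch.save({"state_dict": toy.state_dict()}, ckpt_path)
    state = load_checkpoint_on_cpu(ckpt_path)

toy.load_state_dict(state)
```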

Trained models on BaiduYun cannot be extracted correctly

I downloaded ShuffleNetV2.0.5x.pth.tar from BaiduYun and got an error while extracting it.

I also tried OneDrive, but the page redirects to OneDrive's error page.

No image normalization needed?

Hi! Congratulations on the awesome work here!

I was trying to use ShuffleNetV2 on my own dataset and I didn't find any normalization transform in train.py. Isn't there one?

Best regards,
Paulo

What algorithms did the authors use to search the architecture of ShuffleNetV2+?

The architecture of ShuffleNetV2+ is quite different from ShuffleNetV2! There are 4 types of blocks inside ShuffleNetV2+ and the topology of the network is irregular. I guess some NAS algorithms were used to search the architecture. I am curious about the search algorithms and hope to get more details about them. Thanks a lot!

About the trained model file

Why can't the trained model file be decompressed?

A question about ShuffleNetV1

I am not sure whether I have misunderstood something, but there is one point I do not follow:

    branch_main_1 = [
        # pw
        nn.Conv2d(inp, mid_channels, 1, 1, 0, groups=1 if first_group else group, bias=False),
        nn.BatchNorm2d(mid_channels),
        nn.ReLU(inplace=True),
        # dw
        nn.Conv2d(mid_channels, mid_channels, ksize, stride, pad, groups=mid_channels, bias=False),
        nn.BatchNorm2d(mid_channels),
    ]
    branch_main_2 = [
        # pw-linear
        nn.Conv2d(mid_channels, outputs, 1, 1, 0, groups=group, bias=False),
        nn.BatchNorm2d(outputs),
    ]
    self.branch_main_1 = nn.Sequential(*branch_main_1)
    self.branch_main_2 = nn.Sequential(*branch_main_2)

    if stride == 2:
        self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, old_x):
        x = old_x
        x_proj = old_x
        # the next four lines are the part in question
        x = self.branch_main_1(x)
        if self.group > 1:
            x = self.channel_shuffle(x)
        x = self.branch_main_2(x)
        if self.stride == 1:
            return F.relu(x + x_proj)
        elif self.stride == 2:
            return torch.cat((self.branch_proj(x_proj), F.relu(x)), 1)

Here branch_main_1 amounts to a 1*1 group convolution followed by a 3*3 DW convolution.
In forward, branch_main_1 runs first and channel shuffle is applied only afterwards, which differs from the paper, where channel shuffle comes directly after the 1*1 convolution.
Looking forward to your answer, thanks!
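
This is not the maintainers' stated answer, only a sketch of one relevant observation: a depthwise convolution processes every channel independently, so shuffling before it or after it gives the same result once the depthwise filters are permuted accordingly. The `channel_shuffle` helper below is an illustrative reimplementation, not the repository's code, and the sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn


def channel_shuffle(x, groups):
    """Reshape channels to (groups, n), swap the two axes, flatten back."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)


torch.manual_seed(0)
channels, groups = 12, 3
x = torch.randn(2, channels, 8, 8)
dw = nn.Conv2d(channels, channels, 3, 1, 1, groups=channels, bias=False)

# Order used in the code quoted above: depthwise conv first, then shuffle
out_dw_then_shuffle = channel_shuffle(dw(x), groups)

# Order described in the paper: shuffle first, then depthwise conv.
# perm[i] is the source channel that ends up at position i after the shuffle.
perm = channel_shuffle(torch.arange(channels).view(1, channels, 1, 1), groups).flatten()
dw_permuted = nn.Conv2d(channels, channels, 3, 1, 1, groups=channels, bias=False)
with torch.no_grad():
    dw_permuted.weight.copy_(dw.weight[perm])
out_shuffle_then_dw = dw_permuted(channel_shuffle(x, groups))

same = torch.allclose(out_dw_then_shuffle, out_shuffle_then_dw, atol=1e-5)
print(same)
```

The same argument applies to the BatchNorm after the depthwise convolution, since it is also per-channel. Under that reading the two orderings describe the same family of functions, and trained weights would differ only by a channel permutation.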

All training log files

Hi, Thanks for your excellent work! Could you please release the log files for shufflenetv2_1.0x and shufflenetv2_0.5x? Thank you so much!

Traceback and AssertionError in train

Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main()
  File "train.py", line 107, in main
    assert os.path.exists(args.train_dir)
AssertionError

$ python3 train.py --model-size 1.5x --train-dir=pessoa/train/.jpg --val-dir=pessoa/test/.jpg --save=./models/
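
The assertion in the traceback only checks that the path given as --train-dir exists. A small sketch of a more talkative check is shown below; the helper name `check_data_dir` and the temporary folder layout are illustrative assumptions and are not part of train.py.

```python
import os
import tempfile


def check_data_dir(path):
    """Return a reason why `path` cannot serve as a dataset root, or None if it can."""
    if not os.path.exists(path):
        return f"{path!r} does not exist"
    if not os.path.isdir(path):
        return f"{path!r} is a file, not a directory"
    return None


# Demo: a real class-per-folder directory versus a path ending in ".jpg"
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "train", "class_a"))
    good = check_data_dir(os.path.join(root, "train"))
    bad = check_data_dir(os.path.join(root, "train", ".jpg"))

print(good)
print(bad)
```

In the command above, --train-dir ends in `/.jpg`, so the path probably needs to be the folder that contains the class subdirectories rather than a file pattern.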

ShuffleNetV1 architecture differs from the paper. Some pretrained weights are corrupted

According to the paper, the output channels for group=8 are [-1, 24, 384, 768, 1536], but you are using [-1, 24, 240, 480, 960] for both group=3 and group=8 here.

Some state dicts are corrupted (containing mainly -inf and nan values) :

group3 0.5x
group8 {0.5x, 1.0x, 1.5x, 2.0x}

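Why are the channel counts not powers of 2?

1. Why are the channel counts of ShuffleNet V2 not designed as powers of 2, as they are in backbones such as ResNet and MobileNet? Were these channel counts found by NAS? I did not see the paper mention where the channel design comes from.
2. If ShuffleNet V2 is used as the backbone of an object detection network, but with channel counts of the 2^n type such as 32, 64, 128, will accuracy drop?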

channel shuffle in ShuffleNetV2

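For ShuffleNet, I drew a flow diagram of the feature maps and found something odd that I would like to ask you about.

As shown in the figure:
Before channel shuffle, we have a 2x4xhxw feature map, and I number its channels.
After channel shuffle, the feature map is split into two, which become the inputs of the two branches.
After the blocks of the two branches and the concat, as shown in the third row of the figure, the channel numbering still follows the order obtained before channel shuffle.
The feature map then enters the next block with channel shuffle, which first applies channel shuffle and yields the feature map in the last row. At this point the feature maps from each original batch are no longer arranged as in the second step, but are spaced one channel apart. The feature maps within each batch are not adjacent. Did I draw something wrong?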

Shuffle_Xception stride

Is the stride 2, 1, 1 respectively when downsampling, rather than 2, 2, 2?
See lines 73, 80 and 87 in blocks.py:

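During training, is the data tensor not normalized to 0~1?

Hello! First of all, thank you for your work!
I read your training code and could not find any normalization to 0~1 after ImageFolder loads the image data. Does this affect the training results? Thanks.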

Will you release a pretrained model based on RGB and ToTensor() inputs?

Great work! The pretrained model you released is based on BGR input (0-255). Will you release one for RGB input (transformed by PyTorch)?

Question on why 4 fragments in parallel run slower than 4 fragments in series

Dear @nmaac ,

May I ask a question about ShuffleNet v2?
Why do 4 fragments in parallel run slower than 4 fragments in series?
Are the 4 parallel fragments not actually computed in parallel? Is that due to an implementation issue? Does that mean the inception module actually does not increase speed but decreases it?

Thank you!

Hyperparameters for Training ShuffleNetV2+ Medium

I have run the code of ShuffleNetV2+ Medium with default hyperparameters, and obtained the result: Top-1 err = 0.248780, Top-5 err = 0.077260.

The top-1 err 24.9 is a little bit different from the reported 24.3. Could you provide the Hyperparameters for Training ShuffleNetV2+ Medium?

Why is ShuffleNetV1 different from the description in the paper?

Difference

Your code in ShuffleNetV1/blocks.py:

    x = self.branch_main_1(x)
    if self.group > 1:
        x = self.channel_shuffle(x)
    x = self.branch_main_2(x)

Here the channel_shuffle operation comes after the conv3x3 operation, but in the paper it occurs before the conv3x3 operation.

Questions about Tab.2 in the ShuffleNetV2 paper

Hi, thanks for your interesting work. I'm curious about the details of Tab.2 in the ShuffleNetV2 paper.

What do x1, x2, x4 mean? Do they denote the channel scale?
I tried to reproduce the model with g=8 and c=180, but it raises ValueError('in_channels must be divisible by groups'). Is this a correct setting?
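
The error comes from a constraint in `nn.Conv2d`: both in_channels and out_channels must be divisible by groups, and 180 is not a multiple of 8. A small sketch of the check follows. The helper names are illustrative assumptions, and the sketch does not say which channel count the paper actually used for this row.

```python
def valid_group_conv(in_channels, out_channels, groups):
    """True if a grouped convolution with these sizes can be constructed."""
    return in_channels % groups == 0 and out_channels % groups == 0


def nearest_valid_channels(channels, groups):
    """Closest multiples of `groups` at or below, and at or above, `channels`."""
    lower = (channels // groups) * groups
    upper = lower if lower == channels else lower + groups
    return lower, upper


print(180 % 8)                          # 4
print(valid_group_conv(180, 180, 8))    # False
print(nearest_valid_channels(180, 8))   # (176, 184)
```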

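Paper citation

Hello, I would like to build on shufflenetv2plus by adding a head and using it in a detection algorithm. How should I cite the paper for this model?
I hope to hear back from you. Thank you very much!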

ShuffleNetV2+ cannot run with batchsize=1

  File "F:\PycharmProjects\ShuffleNet-Series\ShuffleNetV2+\blocks.py", line 29, in forward
    atten = self.SE_opr(x)
  File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\functional.py", line 1619, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 42, 1, 1])
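
The traceback shows a BatchNorm layer inside the SE branch receiving a 1x42x1x1 tensor in training mode, where batch statistics cannot be computed from a single value per channel. The sketch below reproduces this with a standalone layer; the 42 channels are taken from the error message and everything else is an illustrative assumption. For inference, calling `model.eval()` first avoids the problem. For training, a batch size of at least 2 is needed for this layer.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(42)
x = torch.randn(1, 42, 1, 1)  # same shape as in the error message

# Training mode: one value per channel, so batch statistics are undefined
bn.train()
try:
    bn(x)
    train_error = None
except ValueError as exc:
    train_error = str(exc)

# Evaluation mode: the stored running statistics are used instead
bn.eval()
with torch.no_grad():
    y = bn(x)

print(train_error)
print(tuple(y.shape))
```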

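Why does MAC grow with the group number g in Guideline 2?

Formula (2) of Guideline 2 says that, given the input feature map shape and the computational cost B, MAC grows with the group number g. I don't quite understand this: B itself depends on g, and after rearranging the formula, isn't MAC inversely proportional to g?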
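
A short worked derivation may help here. It uses the paper's notation for a 1x1 group convolution (c_1 input channels, c_2 output channels, feature map size h x w, g groups) and is a reading of formula (2), not an official reply.

```latex
\begin{aligned}
\mathrm{MAC} &= hw(c_1 + c_2) + \frac{c_1 c_2}{g},
\qquad B = \frac{h w\, c_1 c_2}{g} \\[4pt]
% hold the input shape (c_1, h, w) and the cost B fixed and eliminate c_2
c_2 &= \frac{B g}{h w\, c_1}
\;\Longrightarrow\;
hw\,c_2 = \frac{B g}{c_1},
\qquad \frac{c_1 c_2}{g} = \frac{B}{hw} \\[4pt]
\mathrm{MAC} &= hw\,c_1 + \frac{B g}{c_1} + \frac{B}{hw}
\end{aligned}
```

With c_1, h, w and B held fixed, the only term that depends on g is Bg/c_1, so MAC increases with g. The inverse relation in the question appears when c_1 and c_2 are held fixed instead: then the weight term c_1 c_2 / g shrinks as g grows, but B shrinks with it, so the two settings are not comparable at equal cost. Keeping B fixed while raising g means widening the output to c_2 = Bg/(hw c_1), and that extra width is what drives the memory access up.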

Channel shuffle layer's location

In the paper "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices",
the residual branch in the ShuffleNet unit seems to be pw->channel shuffle->dw->pw. But in this repo, the residual branch is pw->dw->channel shuffle->pw.
Why is the order of those layers different?

Training accuracy is 7 percentage points below the accuracy in the paper

Thank you so much for such an outstanding job! But my ShuffleNet V1's accuracy on the ImageNet dataset is seven percentage points below your results. The parameters I used are basically the same as yours. Only the batch size is 384 and the step count is 7.8e5, which is due to the limitations of my equipment. How can I improve the accuracy of the network?

One-shot NAS supernet training strategy

Hi, I've implemented an MXNet version, with both a fixed-structure model and a block and channel selection supernet model, based on this official release.

But others and I have all had a hard time training the supernet with both block and channel selection from scratch. Currently, my workaround is to train with block selection only for the first 60 epochs and then train with both block and channel selection for the remaining 60.

Could you please let me know what your training strategy for the supernet is? Did you train block and channel selection from scratch?

Thanks

One Shot NAS training speed

Thanks for sharing your code. I am wondering about the training speed of One Shot NAS on your side. I am using two Tesla P100 GPUs. 20 iterations take about 80 seconds, which means 300,000 iterations will take about two weeks. Can you tell me your training speed for both the search and the final training?
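
The two-week estimate follows directly from the numbers in the post. A tiny sketch of the arithmetic is given below; the function name `estimated_days` is illustrative.

```python
def estimated_days(total_iters, iters_per_chunk, seconds_per_chunk):
    """Convert a measured 'N iterations take S seconds' rate into total days."""
    total_seconds = total_iters / iters_per_chunk * seconds_per_chunk
    return total_seconds / 86400  # seconds per day


days = estimated_days(300_000, 20, 80)
print(f"{days:.1f} days")  # 13.9 days
```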

network

Do you think the experiment is reasonable if only the SE layer similar to Mobilenetv3 is added into the Shufflenetv2 during the experiment? I tried to test it on some of the data sets and found an improvement, but not a significant one. Looking forward to your reply and comments. Thank you

Pretrained model import

    address = "E:/ShuffleNetV2.1.0x.pth.tar"
    pretrained_state_dict = torch.load(address)
    self.load_state_dict(pretrained_state_dict, strict=False)

When I load the weights this way, the training loss decreases slowly, almost exactly as it does when the pretrained model is not loaded.
Do you know the reason? Thank you very much.
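
One possible cause, offered as a sketch: with `strict=False`, keys that do not match are skipped silently, so nothing may have been loaded at all. Another snippet on this page reads the same kind of file as `torch.load(...)["state_dict"]` and strips a "module." prefix. The code below assumes that layout, and the helper name `load_pretrained` and the toy model are illustrative.

```python
import torch
import torch.nn as nn


def load_pretrained(model, checkpoint, prefix="module."):
    """Load weights and report which keys did not match."""
    state = checkpoint.get("state_dict", checkpoint)
    # Strip the prefix that nn.DataParallel adds to every parameter name
    state = {k[len(prefix):] if k.startswith(prefix) else k: v for k, v in state.items()}
    result = model.load_state_dict(state, strict=False)
    return result.missing_keys, result.unexpected_keys


model = nn.Sequential(nn.Linear(4, 2))

# Simulated checkpoint: weights nested under "state_dict" with "module." prefixes
ckpt = {"state_dict": {"module." + k: v.clone() for k, v in model.state_dict().items()}}

# Loading the prefixed keys directly: every key is reported as missing, none is loaded
naive = model.load_state_dict(ckpt["state_dict"], strict=False)
print(naive.missing_keys)

missing, unexpected = load_pretrained(model, ckpt)
print(missing, unexpected)
```

Printing the returned missing and unexpected keys, or temporarily using `strict=True`, shows immediately whether the weights actually went in.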

Inference issue

When running inference with ShuffleNetV2, the class score is low and the class label is wrong. I use the following code:

    import torch
    from PIL import Image
    from torchvision import transforms

    # OpencvResize, ToBGRTensor, ShuffleNetV2 and remove_prefix are defined elsewhere (not shown in this post)
    transform = transforms.Compose([
        OpencvResize(256),
        transforms.CenterCrop(224),
        ToBGRTensor(),
    ])

    model = ShuffleNetV2(model_size='2.0x')
    model.load_state_dict(remove_prefix(torch.load("ShuffleNetV2.2.0x.pth.tar")["state_dict"], "module."))
    model.eval()
    image = Image.open("../Image/cat.jpg")
    img = transform(image).unsqueeze(0)
    output = model(img)
    output = torch.softmax(output, dim=1)
    vals, idxs = torch.max(output, dim=1)
    print(vals[0].item(), idxs[0].item())

Output:
0.31983616948127747 287
287 means 'lynx, catamount'

Am I preprocessing the input image incorrectly?

About releasing the counting tool for FLOPs

I have tried tools such as pytorch-OpCounter to calculate the FLOPs and params of the released ShuffleNetV2, but they give numbers that differ from your paper.

model               calculated   reported in paper
shufflenetv2_0.5    45.6M        41M
shufflenetv2_1.0    154.65M      146M
shufflenetv2_1.5    311.07M      299M

I wonder whether you will release the tool used to obtain the reported FLOPs?

Channel shuffle method in ShuffleNetV1 is not the same as in the paper

In the paper, channel shuffle first reshapes the output channel dimension into (g, n), then transposes it and flattens it back as the input of the next layer. But in the code, it is reshaped into (n, g), then transposed and flattened. This differs from the original paper.
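
To make the difference concrete, here is a small index-only sketch of the two reshape orders for 6 channels with g=3 and n=2. It uses plain Python lists, and the helper name `shuffle_indices` is illustrative; it does not reproduce the repository's tensor code.

```python
def shuffle_indices(num_channels, rows):
    """Channel order after reshaping to (rows, num_channels // rows),
    transposing, and flattening."""
    cols = num_channels // rows
    grid = [[r * cols + c for c in range(cols)] for r in range(rows)]
    return [grid[r][c] for c in range(cols) for r in range(rows)]


g, n = 3, 2
paper_order = shuffle_indices(g * n, g)  # reshape to (g, n) as in the paper
code_order = shuffle_indices(g * n, n)   # reshape to (n, g) as described for the code

print(paper_order)  # [0, 2, 4, 1, 3, 5]
print(code_order)   # [0, 3, 1, 4, 2, 5]

# Applying one permutation after the other restores the original order
print([paper_order[i] for i in code_order])  # [0, 1, 2, 3, 4, 5]
```

Both variants are permutations of the channels and, in this example, each is the inverse of the other, so the orders really do differ. Whether that matters for accuracy is a separate question that this sketch does not answer.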

Comparison between shufflenet+ and oneshot

Hello,

Thanks for sharing this great project.
I am trying to design an efficient network.

As in the title: between shufflenet+ and the oneshot network, which one is more efficient?

Thanks,

What is the training setting for reproducing ShuffleNetV2+?

I am curious about the training setting for reproducing ShuffleNetV2+:

Number of iterations? According to the train.py code, the total is 450000 iterations.
Data augmentation? Did you use less aggressive scale augmentation? The ShuffleNetV1 paper says: "we use slightly less aggressive scale augmentation for data preprocessing. Similar modifications are also referenced in MobileNet because such small networks usually suffer from underfitting rather than overfitting." If you used the less aggressive scale augmentation, what is the scale range (default = (0.08, 1.00))?

img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)

  File "train.py", line 30, in __call__
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
error: OpenCV(4.0.1) /io/opencv/modules/imgproc/src/resize.cpp:3787: error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'