shufflenet-series's People

Contributors

nmaac

shufflenet-series's Issues

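SE_block problem in ShuffleNet_ExLarge

At lines 116-119:

    if not self.has_proj:
        x = self.relu(x_proj + x)
    x = torch.cat((proj, x), dim=1)

If self.has_proj == True, then stride=2, so the residual form is not used and proj is concatenated directly with the SE block output x; otherwise the residual form is used. With this approach, training on my own dataset (the dataset is fine, and other networks converge normally) never converged. But when I changed the stride=2 case to concatenate proj and x directly without going through the SE block, it converged normally (that is, with stride=2 neither the SE_Block nor the residual form is used, and stride=1 is unchanged, using the SENet block and the residual form).
However, in your ShuffleNetV2+ the non-residual form is kept throughout after the SE_Block, and training from scratch on my side works fine. I checked the official SENet code: when stride=2 or the input and output channels differ, it uses a corresponding convolution as a projection so that the feature map size and channel count match, and then forms the residual, so the whole network keeps the residual form once SE_Block is used.
So when I train from scratch with your released code, it never converges. Is that because the forms around the SE_Block are inconsistent (residual and non-residual forms mixed), which makes the network unstable and hurts convergence? Did you run into this when training this network? I would appreciate your advice.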

pretrained model fatal error

Hi, I downloaded the pretrained models provided on OneDrive. However, when extracting the tar files, I received the following error:
tar: This does not look like a tar archive

Would you be more specific about how to use those pretrained models?

Thank you!
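A quick way to see what kind of file was actually downloaded, as a minimal sketch. It assumes (not confirmed by the maintainers in this thread) that the download may be a PyTorch checkpoint that only carries a `.tar` suffix in its name, in which case `tar` cannot extract it. The helper name `describe_archive` is hypothetical.

```python
import tarfile
import zipfile


def describe_archive(path):
    """Return a rough guess of the container format of the file at `path`."""
    if tarfile.is_tarfile(path):
        return "tar"      # a real tar archive: extracting it should work
    if zipfile.is_zipfile(path):
        return "zip"      # zip container (newer torch.save files are zip-based, to my knowledge)
    return "unknown"      # e.g. a legacy pickle checkpoint or a truncated download
```

If the result is not "tar", trying to load the file directly with `torch.load` instead of extracting it may be worth a try; a result of "unknown" can also mean the download is incomplete.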

Some pre-trained models may be corrupted

Hi, I downloaded the ShuffleNetV2.Large pre-trained model (snetv2_residual_se.pkl), but got the following message when loading the model in PyTorch:

RuntimeError: unexpected EOF, expected 1750141 more bytes. The file might be corrupted.

I have tried downloading the model three times and got the same error each time, so I think the problem may be caused by the original source. The md5sum of the model I downloaded is:
be052a3ef97de2c64bc4236380972d38

Can you provide the md5sum of the original correct model? Many thanks.
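For anyone who wants to compare checksums without the `md5sum` command, here is a minimal sketch using only the standard library. The function name `md5sum` and the chunk size are illustrative choices, not part of the repository.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5sum("snetv2_residual_se.pkl")` on the downloaded file gives a value that can be compared against a reference digest once one is published.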

ShuffleNetV2+ does not converge when train_loader uses shuffle=False

Parameters:
model-size=Large, auto_continue=False, batch-size=128, num_workers=8, and other default params.
Environment:
Ubuntu 16.04, PyTorch 1.2, single RTX 2080 Ti GPU

When I train ShuffleNetV2+ on the ImageNet-1K dataset, it does not converge when shuffle=False is set for train_loader. Could you give some advice?

[30 02:15:34] TRAIN Iter 20: lr = 0.499978,	loss = 2.621804,	Top-1 err = 0.162891,	Top-5 err = 0.145703,	data_time = 0.006651,	train_time = 2.031535

[30 02:16:09] TRAIN Iter 40: lr = 0.499956,	loss = 4.062627,	Top-1 err = 0.154688,	Top-5 err = 0.096875,	data_time = 0.006521,	train_time = 1.751516

[30 02:16:41] TRAIN Iter 60: lr = 0.499933,	loss = 48.428082,	Top-1 err = 0.465625,	Top-5 err = 0.323047,	data_time = 0.006557,	train_time = 1.589813

[30 02:16:47] TRAIN Iter 80: lr = 0.499911,	loss = nan,	Top-1 err = 0.628516,	Top-5 err = 0.564063,	data_time = 0.006574,	train_time = 0.323313

[30 02:16:52] TRAIN Iter 100: lr = 0.499889,	loss = nan,	Top-1 err = 1.000000,	Top-5 err = 1.000000,	data_time = 0.006572,	train_time = 0.255579
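One plausible factor, offered as an assumption rather than a diagnosis: if the training samples are stored in class order (as with a folder-per-class file list), reading them with shuffle=False makes every batch contain a single class. The sketch below uses a made-up label list (10 classes, 256 samples each) to show the difference; none of these numbers come from the report above.

```python
import random


def classes_per_batch(labels, batch_size):
    """Number of distinct classes seen in each consecutive batch."""
    return [
        len(set(labels[i:i + batch_size]))
        for i in range(0, len(labels), batch_size)
    ]


# Hypothetical class-sorted dataset: 10 classes with 256 samples each.
labels = [c for c in range(10) for _ in range(256)]

# Sequential reading (shuffle=False): every batch holds one class only.
sequential = classes_per_batch(labels, 128)

# Shuffled reading: batches mix classes.
rng = random.Random(0)
shuffled_labels = labels[:]
rng.shuffle(shuffled_labels)
shuffled = classes_per_batch(shuffled_labels, 128)
```

With single-class batches the gradient and the batch-norm statistics of each step reflect only one class, which, combined with a learning rate near 0.5, could make training unstable. Whether that is what happened here would need to be checked against the actual data ordering.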

About BN operations recalculation

Hello, first and foremost, thank you for sharing your work.

In your Single Path One-Shot paper you said:

Before the inference of an architecture, the statistics of all the BN operations are recalculated on a random subset of training data.

I have some questions about the recalculation:

  1. How much does this recalculation affect inference and the evolutionary search?
  2. How do you achieve this? Is it something like first freezing all parameters, then unfreezing only the BN layers, and finally training (BP) the model on the subset of data?
  3. Is the subset of training data re-sampled every time a new architecture is given, or is it sampled only once and kept fixed during the evolutionary search?

Thank you so much if you can give me some ideas.

Pretrained model

Hello, the ShuffleNetV2 2.0x file under Trained Models may be the wrong one; its accuracy does not look right. Could you please provide the pretrained model for it again? Thanks.

Low accuracy on multi-class problem

When I used ShuffleNet to train models on my own dataset, I found that it has high accuracy on a two-class classification problem but low accuracy on a multi-class problem.
I fine-tuned from the officially released pretrained model. Is it inappropriate to use the pretrained model when training on a multi-class dataset?

Training speed slows down after some iterations

Thanks for your code. When I train ShuffleNetV2 1.0, training slows down after some iterations. As you can see in the pictures, during the first 1000 iterations I need 4s to train 20 iterations, but after that I need about 16s per 20 iterations. I use 3 Titan Xp GPUs and NVIDIA DALI to accelerate the dataloader, with batch size 512. Has anyone run into the same situation?
111
222

DetNAS (COCO-FPN) release

I'm integrating DetNAS into mmdetection library and eagerly waiting for your detector optimized models, especially DetNAS (COCO-FPN) ones.
The provided ClsNASNet_medium already shows improvement over ResNet50 as a backbone in a Cascade R-CNN detector.

When do you plan to release the detector optimized models?
Thank you!

Comparing time cost between MobileNetV1 and ShuffleNetV1/V2

Thank you for the nice work on the ShuffleNet series.
I measured the time cost while fine-tuning MobileNet and ShuffleNet on my task under the following conditions:
One GPU V100
Batchsize=8
Input size of W,H=64,128
Two labels classification

My timing code is:

    start = time.time()
    output = model(img)
    torch.cuda.synchronize()
    end = time.time()

Here is the time cost:
Mobilenetv1_0.25: ~0.0035s
MobilenetV1_0.5: ~0.0036s
Shufflenetv1_0.5(g=3): ~0.0095s
Shufflenetv2_0.5: ~0.009s

According to the ShuffleNetV2 paper, the GPU speed ranking should be: Mobilenetv1_0.25 > Shufflenetv2_0.5 > MobilenetV1_0.5 > Shufflenetv1_0.5. So what is the problem?
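A note on the measurement itself, as a hedged sketch rather than an explanation of the numbers above: CUDA kernels are launched asynchronously, so a single timed call with no warm-up and no synchronization before the clock starts can include one-time setup costs and leftover queued work. The helper below is hypothetical (`time_forward`, `sync`, `warmup` and `iters` are names chosen here); it averages over many calls and synchronizes on both sides of the timed region.

```python
import time


def time_forward(fn, sync=lambda: None, warmup=10, iters=100):
    """Average seconds per call of fn(), with warm-up and explicit synchronization."""
    for _ in range(warmup):
        fn()              # warm-up calls are not timed
    sync()                # drain work queued during warm-up before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()                # wait for queued work to finish before stopping the clock
    return (time.perf_counter() - start) / iters
```

With PyTorch this would be called as `time_forward(lambda: model(img), sync=torch.cuda.synchronize)`, ideally with the model in eval mode and inside `torch.no_grad()`. At batch size 8 and a 64x128 input, the result may still be dominated by per-layer launch overhead rather than FLOPs, so the ranking need not match the one reported in the paper for its own setting.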

How can I run the saved model on CPU?

Hi there, I am using your ShuffleNetV1 code, which is very useful. Thanks a lot! But when I try to run the saved model on CPU, I get some errors. Do you have any advice? Thank you again!

No image normalization needed?

Hi! Congratulations on the awesome work here!

I was trying to use ShuffleNetV2 on my own dataset and I didn't find any normalization transform in train.py. Isn't there any?

Best regards,
Paulo

What algorithm did the authors use to search the architecture of ShuffleNetV2+?

The architecture of ShuffleNetV2+ is quite different from ShuffleNetV2! There are 4 types of blocks inside ShuffleNetV2+ and the topology of the network is irregular. I guess some NAS algorithm was used to search the architecture. I am curious about the search algorithm and hope to get more details about it. Thanks a lot!

A question about ShuffleNetV1

1
I'm not sure whether I'm misunderstanding something, but there is one point I don't quite get:

        branch_main_1 = [
            # pw
            nn.Conv2d(inp, mid_channels, 1, 1, 0, groups=1 if first_group else group, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            # dw
            nn.Conv2d(mid_channels, mid_channels, ksize, stride, pad, groups=mid_channels, bias=False),
            nn.BatchNorm2d(mid_channels),
        ]
        branch_main_2 = [
            # pw-linear
            nn.Conv2d(mid_channels, outputs, 1, 1, 0, groups=group, bias=False),
            nn.BatchNorm2d(outputs),
        ]
        self.branch_main_1 = nn.Sequential(*branch_main_1)
        self.branch_main_2 = nn.Sequential(*branch_main_2)

        if stride == 2:
            self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, old_x):
        x = old_x
        x_proj = old_x
        # the next four lines are the ones highlighted in the original post
        x = self.branch_main_1(x)
        if self.group > 1:
            x = self.channel_shuffle(x)
        x = self.branch_main_2(x)
        if self.stride == 1:
            return F.relu(x + x_proj)
        elif self.stride == 2:
            return torch.cat((self.branch_proj(x_proj), F.relu(x)), 1)

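Here branch_main_1 should correspond to the 1x1 group (pointwise) convolution followed by the 3x3 depthwise convolution.
In this forward, branch_main_1 runs first and channel shuffle is applied afterwards, which differs from the paper, where channel shuffle comes directly after the 1x1 convolution.
Looking forward to your answer, thanks!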

All training log files

Hi, Thanks for your excellent work! Could you please release the log files for shufflenetv2_1.0x and shufflenetv2_0.5x? Thank you so much!

Traceback and AssertionError in train

    Traceback (most recent call last):
      File "train.py", line 286, in <module>
        main()
      File "train.py", line 107, in main
        assert os.path.exists(args.train_dir)
    AssertionError

$ python3 train.py --model-size 1.5x --train-dir=pessoa/train/.jpg --val-dir=pessoa/test/.jpg --save=./models/
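The assertion at line 107 only checks that the path passed to --train-dir exists, so it fails without saying which path was tested. Below is a minimal sketch of a more descriptive check. It assumes (this is not stated in the thread) that the argument is meant to be a directory rather than a file pattern; `check_dir` is a hypothetical helper, not part of train.py.

```python
import os


def check_dir(path, flag):
    """Return `path` if it is an existing directory, otherwise raise a descriptive error."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{flag} must point to an existing directory, "
            f"got {path!r} (resolved to {os.path.abspath(path)!r})"
        )
    return path
```

Used as `check_dir(args.train_dir, "--train-dir")`, this would report the resolved path, which makes it easier to spot an argument that points at a file name or pattern inside the data folder instead of the folder itself.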

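Why are the channel counts not powers of 2?

1. Why are the channel counts of ShuffleNet V2 not designed as powers of 2, as in backbones such as ResNet and MobileNet? Were these channel counts found by NAS? The paper does not seem to mention where the channel design comes from.
2. If the backbone of an object detection network is replaced with ShuffleNet V2, but the channel counts are set to powers of 2 such as 32, 64, 128, will accuracy drop?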

channel shuffle in ShuffleNetV2

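In ShuffleNet, I drew a flow diagram of the feature maps and noticed something odd that I would like to ask you about.
微信图片_20191106170906
As shown in the figure:
Before channel shuffle, we have a 2x4xhxw feature map, and the channels are numbered.
After channel shuffle, the feature map is split into two parts, which are the inputs of the two branches.
After passing through the two branches of the block and being concatenated, as shown in the third row of the figure, the channel indices are still in the order they had before channel shuffle.
The feature map then enters the next block with channel shuffle, which first applies channel shuffle and produces the feature map in the last row. At this point, the feature maps of each original batch are not arranged as in the second step; instead they are interleaved with a gap of one channel, so the feature maps within each batch are not adjacent. Did I draw something wrong?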

Shuffle_Xception stride

Should the strides be 2, 1, 1 respectively when downsampling, rather than 2, 2, 2?
See lines 73, 80 and 87 in blocks.py:

Hyperparameters for Training ShuffleNetV2+ Medium

I have run the code of ShuffleNetV2+ Medium with default hyperparameters, and obtained the result: Top-1 err = 0.248780, Top-5 err = 0.077260.

The top-1 err of 24.9 is slightly different from the reported 24.3. Could you provide the hyperparameters for training ShuffleNetV2+ Medium?

Why is ShuffleNetV1 different from description in paper ?

Difference

Your code in ShuffleNetV1/blocks.py:

    x = self.branch_main_1(x)
    if self.group > 1:
        x = self.channel_shuffle(x)
    x = self.branch_main_2(x)

Here the channel_shuffle operation comes after the conv3x3 operation, but in the paper it occurs before the conv3x3 operation.
image

Questions of Tab.2 in ShuffleNetV2 paper

Hi, thanks for your interesting work. I'm curious about the details of Tab. 2 in the ShuffleNetV2 paper.

  1. What do x1, x2, x4 mean? Do they denote the channel scale?
  2. I tried to reproduce the model with g=8 and c=180, but it raises ValueError('in_channels must be divisible by groups'). Is this a correct setting?
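The error in question 2 comes from a constraint of grouped convolution in PyTorch: both the input and output channel counts must be divisible by the number of groups. A small sketch of that check follows; the candidate group counts are illustrative and `valid_group_counts` is a hypothetical helper. It does not settle how the paper's table should be read.

```python
def valid_group_counts(channels, candidates=(1, 2, 3, 4, 8)):
    """Group counts from `candidates` that divide `channels` evenly."""
    return [g for g in candidates if channels % g == 0]


# 180 is divisible by 1, 2, 3 and 4, but 180 % 8 == 4, hence the ValueError for g=8.
print(valid_group_counts(180))
```

So a layer with exactly 180 input channels cannot use g=8; the width would have to be a multiple of 8 for that group count.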

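Citing the paper

Hello, I would like to build on shufflenetv2plus by adding a head and using it in a detection algorithm. How should I cite the paper for this model?
I hope to hear back from you. Thank you very much!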

ShuffleNetV2+ fails to run with batchsize=1

      File "F:\PycharmProjects\ShuffleNet-Series\ShuffleNetV2+\blocks.py", line 29, in forward
        atten = self.SE_opr(x)
      File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
        input = module(input)
      File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\batchnorm.py", line 76, in forward
        exponential_average_factor, self.eps)
      File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\functional.py", line 1619, in batch_norm
        raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
    ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 42, 1, 1])

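Why is MAC proportional to the number of groups g in Guideline 2?

image
In formula (2) of Guideline 2, given the input feature map shape and the computational cost B, MAC is said to be proportional to the number of groups g. I don't quite understand this: B itself depends on g, and after rearranging the formula, isn't MAC inversely proportional to g?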
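For reference, the relation can be written out as follows. This is a sketch using the paper's notation as I understand it (a 1x1 group convolution with c_1 input channels, c_2 output channels, an h x w feature map, g groups, and B the FLOPs of the layer); it is not a reply from the authors.

```latex
B = \frac{h w c_1 c_2}{g}, \qquad
\mathrm{MAC} = hw\,(c_1 + c_2) + \frac{c_1 c_2}{g}
             = hw\,c_1 + \frac{B g}{c_1} + \frac{B}{hw}.
```

The second form follows by substituting c_2 = B g / (h w c_1). The point of holding B fixed is that c_2 is no longer free: when g increases with h, w, c_1 and B unchanged, c_2 must grow in proportion to g. Of the three terms only Bg/c_1 depends on g, and it grows linearly, so MAC increases with g under these constraints. Strictly speaking the relation is affine in g rather than proportional. If c_2 is held fixed instead of B, the term c_1 c_2 / g shrinks as g grows, which is probably the reading behind the "inversely proportional" impression.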

One-Shot NAS supernet training strategy

Hi, I've implemented an MXNet version, with both fixed structure model and block & channel selection supernet model, based on this official release.

But others and I have all had a hard time training the supernet with both block and channel selection from scratch. Currently, my workaround is to train with block selection only for the first 60 epochs and then train with both block and channel selection for the remaining 60.

Could you please let me know what's your training strategy for the supernet? Did you train block and channel selection from scratch?

Thanks

One Shot NAS training speed

Thanks for sharing your code. I am wondering about the training speed of One Shot NAS on your side. I am using two Tesla P100s. 20 iterations take about 80 seconds, which means the 300,000 iterations will take about two weeks. Could you tell me your training speed for both searching and final training?
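The two-week estimate can be checked with a few lines of arithmetic, using only the numbers given in the post. The function name `estimated_days` is illustrative.

```python
def estimated_days(seconds_per_chunk, iters_per_chunk, total_iters):
    """Extrapolate total wall-clock days from a timed chunk of iterations."""
    seconds = seconds_per_chunk / iters_per_chunk * total_iters
    return seconds / 86400  # seconds per day


# 80 s per 20 iterations is 4 s per iteration; 300,000 iterations is 1,200,000 s.
days = estimated_days(80, 20, 300_000)
```

That works out to roughly 13.9 days, which matches the figure in the post, assuming the per-iteration time stays constant over the whole run.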

network

Do you think it is a reasonable experiment to add only an SE layer, similar to the one in MobileNetV3, to ShuffleNetV2? I tested it on some datasets and found an improvement, but not a significant one. Looking forward to your reply and comments. Thank you.

Pre-training model import

    address = "E:/ShuffleNetV2.1.0x.pth.tar"
    pretrained_state_dict = torch.load(address)
    self.load_state_dict(pretrained_state_dict, strict=False)

When I load the weights this way, the training loss changes slowly, almost the same as when the pre-trained model is not loaded.
Do you know the reason? Thank you very much.
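One thing worth checking, offered as a guess: with strict=False, load_state_dict silently skips every key that does not match, so a checkpoint whose weights sit under a "state_dict" entry or carry a "module." prefix (as the loading code in another issue on this page suggests) could load nothing at all. The sketch below uses plain dictionaries to show how to compare key sets; `remove_prefix` here is a hypothetical reimplementation and `key_report` is a made-up helper.

```python
def remove_prefix(state_dict, prefix):
    """Strip `prefix` from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def key_report(model_keys, checkpoint_keys):
    """Summarize which parameter names would actually be loaded."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "matched": sorted(model_keys & checkpoint_keys),
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }
```

In practice the two key sets would come from `model.state_dict().keys()` and from the loaded checkpoint. The return value of `load_state_dict` also lists `missing_keys` and `unexpected_keys`, so printing it is a quick way to confirm whether any weights were loaded.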

Inference issue

When running inference with ShuffleNetV2, the class score is low and the class label is wrong. I use the following code:

    import torch
    from PIL import Image
    from torchvision import transforms

    transform = transforms.Compose([
        OpencvResize(256),
        transforms.CenterCrop(224),
        ToBGRTensor(),
    ])

    model = ShuffleNetV2(model_size='2.0x')
    model.load_state_dict(remove_prefix(torch.load("ShuffleNetV2.2.0x.pth.tar")["state_dict"], "module."))
    model.eval()
    image = Image.open("../Image/cat.jpg")
    img = transform(image).unsqueeze(0)
    output = model(img)
    output = torch.softmax(output, dim=1)
    vals, idxs = torch.max(output, dim=1)
    print(vals[0].item(), idxs[0].item())

Output:
0.31983616948127747 287
287 means 'lynx, catamount'

Am I preprocessing the input image incorrectly?

About releasing the FLOPs counting tool

I tried a tool, pytorch-OpCounter, to calculate the FLOPs and params of the released ShuffleNetV2, but it gave numbers different from your paper.

    model              calculated   reported in paper
    shufflenetv2_0.5   45.6M        41M
    shufflenetv2_1.0   154.65M      146M
    shufflenetv2_1.5   311.07M      299M

Will you release the tool used to obtain the reported FLOPs?
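Counting tools can disagree simply because they count different things, for example whether batch norm, activations, pooling or the final classifier are included, and whether a multiply-add counts as one operation or two. That is a possible reason for the gap above, not a confirmed one. As a baseline, the multiply-accumulate count of a single convolution can be computed by hand; the function name and the layer shapes below are illustrative assumptions.

```python
def conv2d_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Multiply-accumulate count of one 2D convolution (bias not counted)."""
    # Each output element sums over kernel*kernel positions and c_in/groups input channels.
    return kernel * kernel * (c_in // groups) * c_out * out_h * out_w


# Hypothetical stem layer: 3 -> 24 channels, 3x3 kernel, 112x112 output.
stem = conv2d_macs(3, 24, 3, 112, 112)                   # 8,128,512 MACs
# Hypothetical depthwise layer: 24 channels, 3x3 kernel, 56x56 output.
depthwise = conv2d_macs(24, 24, 3, 56, 56, groups=24)    # 677,376 MACs
```

Summing such per-layer counts for convolutions and the fully connected layer only, and comparing that with the tool's total, shows how much of the difference comes from the other layer types.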

Channel shuffle in the ShuffleNetV1 code differs from the paper

In the paper, channel shuffle first reshapes the output channel dimension into (g, n), then transposes it and flattens it back as the input of the next layer. But in the code implementation, it is reshaped into (n, g), then transposed and flattened. This differs from the original paper.
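The two orderings described above can be compared on a toy example. This is a pure-Python sketch on a list of channel indices, not the repository's tensor code; the function name is made up, and the (n, g) variant follows the poster's description of the code rather than a check of the source.

```python
def reshape_transpose_flatten(channels, rows):
    """Reshape a flat channel list to (rows, cols), transpose, and flatten."""
    cols = len(channels) // rows
    assert rows * cols == len(channels), "channel count must be divisible by rows"
    grid = [channels[r * cols:(r + 1) * cols] for r in range(rows)]
    # Reading column by column is the same as transposing and then flattening.
    return [grid[r][c] for c in range(cols) for r in range(rows)]


channels = list(range(6))   # 6 channels, g = 2 groups of n = 3
paper_order = reshape_transpose_flatten(channels, rows=2)    # reshape to (g, n)
posted_order = reshape_transpose_flatten(channels, rows=3)   # reshape to (n, g)
print(paper_order)    # [0, 3, 1, 4, 2, 5]
print(posted_order)   # [0, 2, 4, 1, 3, 5]
```

Both results are permutations that interleave channels, but they are different permutations, so weights trained with one ordering would not be interchangeable with the other. Whether the difference affects accuracy is not something this sketch can show.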

Comparison between shufflenet+ and oneshot

Hello,

Thanks for sharing the great project.
I am trying to design an efficient network.

As in the title, between the shufflenet+ and oneshot networks, which one is more efficient?

Thanks,

What is the training setting for reproducing ShuffleNetV2+?

I am curious about the training setting for reproducing ShuffleNetV2+:

  1. Number of iterations? According to the train.py code, the total number of iterations is 450000.
  2. Data augmentation? Did you use less aggressive scale augmentation? The ShuffleNetV1 paper mentions: "we use slightly less aggressive scale augmentation for data preprocessing. Similar modifications are also referenced in MobileNet because such small networks usually suffer from underfitting rather than overfitting." If you used less aggressive scale augmentation, what is the scale range (default = (0.08, 1.00))?
