megvii-model / shufflenet-series
License: MIT License
oneshot.txt
The error rate from my training is 26.3; I can't reach 25.1.
To reproduce the training details, I'd like to request your training log file.
Best
At lines 116-119:
```python
if not self.has_proj:
    x = self.relu(x_proj + x)
x = torch.cat((proj, x), dim=1)
```
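When self.has_proj==True, stride=2, so the residual form is not used and proj is concatenated directly with the SE block's output x; otherwise the residual form is used. On my own dataset (the dataset is fine, and other networks converge normally), training with this scheme never converged. However, when I changed the stride=2 case to concatenate proj and x directly without going through the SE block, it converged normally (that is, for stride=2 neither the SE block nor the residual form is used, while stride=1 is unchanged and uses the SENet block with the residual form).
In your ShuffleNetV2+, however, the SE block is always used in the non-residual form, and training from scratch works fine on my side. I checked the official SENet code: when stride=2 or the channel counts before and after differ, a corresponding convolution projects the input so that the feature map size and channels match, and the residual form is then applied. This way the whole network keeps the residual form wherever SE blocks are used.
So training from scratch with your public code never converges for me. Could this be because the SE block is used in inconsistent forms (a mix of residual and non-residual), making the network unstable and hurting convergence? Did you run into this when training the network? I'd appreciate your advice.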
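For comparison with the SENet-style design described above, here is a minimal sketch of a plain two-convolution residual block with SE. It keeps the residual connection even when downsampling by projecting the shortcut with a strided 1×1 convolution plus BatchNorm whenever the stride or width changes. The layer layout and the names `ResidualSEBlock`, `SqueezeExcite` and `reduction=4` are illustrative assumptions, not the repository's ShuffleNetV2.Large code. Whether this fixes the convergence problem is untested here.

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Channel attention: global pool -> bottleneck 1x1 convs -> sigmoid gate."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


class ResidualSEBlock(nn.Module):
    """Hypothetical block that stays residual for every stride/width combination."""

    def __init__(self, inp: int, oup: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
            nn.BatchNorm2d(oup),
            nn.ReLU(inplace=True),
            nn.Conv2d(oup, oup, 3, 1, 1, bias=False),
            nn.BatchNorm2d(oup),
        )
        self.se = SqueezeExcite(oup)
        # SENet-style projection shortcut, used only when the output shape changes.
        if stride != 1 or inp != oup:
            self.shortcut = nn.Sequential(
                nn.Conv2d(inp, oup, 1, stride, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.se(self.body(x)) + self.shortcut(x))
```

For example, `ResidualSEBlock(24, 48, stride=2)` maps a `(2, 24, 32, 32)` input to `(2, 48, 16, 16)`, and the addition always sees tensors of matching shape.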
Hi, I downloaded the pretrained models provided on OneDrive. However, when extracting the tar files, I received the following error:
tar: This does not look like a tar archive
Could you be more specific about how to use those pretrained models?
Thank you!
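The `.pth.tar` extension is a naming convention for PyTorch checkpoints. Elsewhere on this page the files are opened with `torch.load` rather than extracted with `tar`. Here is a minimal stdlib sketch that reports what kind of container a downloaded file actually is. `describe_checkpoint` is a hypothetical helper. It relies on two conventions: PyTorch 1.6 and later write a zip archive by default, and older versions write a raw pickle stream. Neither format is a tar archive.

```python
import tarfile
import zipfile


def describe_checkpoint(path: str) -> str:
    """Guess the container format of a downloaded model file."""
    if zipfile.is_zipfile(path):
        return "zip (torch.save format since PyTorch 1.6; open with torch.load)"
    if tarfile.is_tarfile(path):
        return "tar archive (extract with tar)"
    with open(path, "rb") as f:
        head = f.read(2)
    if head[:1] == b"\x80":  # pickle protocol >= 2 starts with the PROTO opcode
        return "pickle stream (legacy torch.save format; open with torch.load)"
    return "unknown (possibly an incomplete download or an HTML error page)"
```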
Hi, I downloaded the ShuffleNetV2.Large pre-trained model (snetv2_residual_se.pkl), but got the following message when loading the model in PyTorch:
RuntimeError: unexpected EOF, expected 1750141 more bytes. The file might be corrupted.
I have tried downloading the model three times and always got the same issue, so I think the problem may be caused by the original source. The md5sum of the model I downloaded is:
be052a3ef97de2c64bc4236380972d38
Can you provide the md5sum of the original correct model? Many thanks.
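To compare checksums without relying on the `md5sum` command, here is a small stdlib sketch. It hashes the file in chunks, so a large checkpoint is never read into memory all at once. `file_md5` is a hypothetical helper name, and the reference checksum still has to come from the maintainers.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```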
Parameters:
model-size=Large, auto_continue=False, batch-size=128, num_workers=8, and other default params.
Environment:
Ubuntu 16.04, PyTorch 1.2, single RTX 2080 Ti GPU
When I train ShuffleNetV2+ on the ImageNet-1K dataset, it does not converge when I set shuffle=False in train_loader. Can you give some advice?
[30 02:15:34] TRAIN Iter 20: lr = 0.499978, loss = 2.621804, Top-1 err = 0.162891, Top-5 err = 0.145703, data_time = 0.006651, train_time = 2.031535
[30 02:16:09] TRAIN Iter 40: lr = 0.499956, loss = 4.062627, Top-1 err = 0.154688, Top-5 err = 0.096875, data_time = 0.006521, train_time = 1.751516
[30 02:16:41] TRAIN Iter 60: lr = 0.499933, loss = 48.428082, Top-1 err = 0.465625, Top-5 err = 0.323047, data_time = 0.006557, train_time = 1.589813
[30 02:16:47] TRAIN Iter 80: lr = 0.499911, loss = nan, Top-1 err = 0.628516, Top-5 err = 0.564063, data_time = 0.006574, train_time = 0.323313
[30 02:16:52] TRAIN Iter 100: lr = 0.499889, loss = nan, Top-1 err = 1.000000, Top-5 err = 1.000000, data_time = 0.006572, train_time = 0.255579
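One likely factor, offered as a hypothesis: `torchvision.datasets.ImageFolder` lists samples grouped by class folder. With `shuffle=False`, consecutive batches therefore come almost entirely from a single class, and BatchNorm statistics and gradients are computed on single-class batches. That may also explain the implausibly low Top-1 error at the first iterations of the log above. The stdlib sketch below counts the distinct labels per full batch (dropping the incomplete last batch, as `drop_last=True` would) for a class-sorted label list, with and without shuffling. The 1,000 classes × 1,300 images are a rough ImageNet-like assumption, not exact counts.

```python
import random


def distinct_labels_per_batch(labels, batch_size):
    """Number of distinct labels in each full, consecutive batch."""
    return [
        len(set(labels[i:i + batch_size]))
        for i in range(0, len(labels) - batch_size + 1, batch_size)
    ]


# Class-sorted labels, as ImageFolder produces (assumed sizes).
num_classes, per_class, batch_size = 1000, 1300, 128
labels = [c for c in range(num_classes) for _ in range(per_class)]

unshuffled = distinct_labels_per_batch(labels, batch_size)
shuffled_labels = labels[:]
random.Random(0).shuffle(shuffled_labels)
shuffled = distinct_labels_per_batch(shuffled_labels, batch_size)

print("max distinct labels per batch, shuffle=False:", max(unshuffled))
print("min distinct labels per batch, shuffle=True:", min(shuffled))
```

Without shuffling, each batch of 128 spans at most two classes. With shuffling, every batch contains on the order of a hundred different classes.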
Hello, first and foremost, thank you for sharing your work.
In your Single Path One-Shot paper, you said:
Before the inference of an architecture, the statistics of all the BN operations are recalculated on a random subset of training data.
I have some questions on the recalculation:
Thank you so much if you can give me some idea.
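As a concrete reference point for these questions, here is one common way to recalculate BatchNorm statistics for a fixed sub-network. The steps are:

1. Reset every BN layer's running statistics.
2. Switch to cumulative averaging (`momentum=None`).
3. Run forward passes in training mode, without gradients, over a random subset of training batches.

This is a sketch under those assumptions, not the SPOS authors' code. `recalibrate_bn` and `num_batches` are illustrative names.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


@torch.no_grad()
def recalibrate_bn(model: nn.Module, batches, num_batches: int = 20) -> nn.Module:
    """Recompute BN running mean/var from scratch on a few training batches."""
    for m in model.modules():
        if isinstance(m, BN_TYPES):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative average over all batches seen
    model.train()  # BN normalizes with batch stats and updates running buffers
    for i, images in enumerate(batches):
        if i >= num_batches:
            break
        model(images)
    model.eval()  # inference now uses the recalculated running statistics
    return model
```

Setting `momentum=None` makes the result an equal-weight average over the sampled batches rather than an exponential average dominated by the last few. Note that the change persists on the model after the call.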
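Hello, the ShuffleNetV2 2.0x model in Trained Models may have been uploaded by mistake, because its accuracy doesn't look right. Could you please provide the pretrained model for it again? Thanks.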
When I used ShuffleNet to train models on my own dataset, I found that it achieves high accuracy on a two-class classification problem but low accuracy on a multi-class problem.
I fine-tuned from the officially released pretrained model. Is it improper to use a pretrained model when training on a multi-class dataset?
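When fine-tuning a pretrained ImageNet classifier on a dataset with a different number of classes, the usual step is to replace only the final fully connected layer and keep the pretrained backbone. Below is a generic sketch. It does not assume the repository's attribute names; it simply swaps the last `nn.Linear` registered in the model. `replace_last_linear` is a hypothetical helper, and "last registered" is assumed to be the classifier, which holds for typical sequential classifiers.

```python
import torch.nn as nn


def replace_last_linear(model: nn.Module, num_classes: int) -> nn.Module:
    """Swap the last registered nn.Linear for a fresh one with num_classes outputs."""
    last_name = None
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            last_name = name
    if last_name is None:
        raise ValueError("model has no nn.Linear layer")
    parent_name, _, child_name = last_name.rpartition(".")
    parent = model.get_submodule(parent_name) if parent_name else model
    old = getattr(parent, child_name)
    # New head: same input width, new class count, freshly initialized weights.
    new = nn.Linear(old.in_features, num_classes, bias=old.bias is not None)
    setattr(parent, child_name, new)
    return model
```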
The link to the pretrained models is broken; can you fix it? Thanks.
Thanks for your amazing work! I was wondering if you will release the one shot searching code in future.
Thanks for your code. When I train ShuffleNetV2 1.0x, training slows down after some iterations. As you can see in the picture, during the first 1000 iterations it takes 4 s to train 20 iterations, but after that it takes about 16 s per 20 iterations. I train on 3 Titan Xp GPUs, use NVIDIA DALI to accelerate the data loader, and set the batch size to 512. Has anyone encountered the same situation?
I'm integrating DetNAS into the mmdetection library and eagerly waiting for your detector-optimized models, especially the DetNAS (COCO-FPN) ones.
The provided ClsNASNet_medium already shows improvement over ResNet50 as a backbone in a Cascade R-CNN detector.
When do you plan to release the detector-optimized models?
Thank you!
I compared the speed of mobilenetv3-large, shufflenetv2 1.5x and shufflenetv2+ medium with ncnn on an Android CPU. ShuffleNetV2+ (154 ms) is about as fast as MobileNetV3-Large (157 ms), and both are faster than ShuffleNetV2 1.5x (166 ms). Why is V2+ faster than V2 1.5x? Is this reasonable? Thank you!
Thank you for the nice work on the ShuffleNet series.
I measured the time cost while fine-tuning MobileNet and ShuffleNet on my task under the following conditions:
One GPU V100
Batchsize=8
Input size of W,H=64,128
Two labels classification
My timing code is:
```python
start = time.time()
output = model(img)
torch.cuda.synchronize()
end = time.time()
```
Here is the time cost:
Mobilenetv1_0.25: ~0.0035s
MobilenetV1_0.5: ~0.0036s
Shufflenetv1_0.5(g=3): ~0.0095s
Shufflenetv2_0.5: ~0.009s
According to the ShuffleNetV2 paper, the GPU speed ranking should be: Mobilenetv1_0.25 > Shufflenetv2_0.5 > MobilenetV1_0.5 > Shufflenetv1_0.5. So what is the problem?
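The timing code above has two problems. `start` is taken before previously queued GPU work has finished, and a single un-warmed call mostly measures kernel launch and cuDNN setup overhead. Here is a more careful sketch, assuming PyTorch. It warms up first, calls `torch.cuda.synchronize()` both before starting and before stopping the clock, and averages over many runs. `time_forward` and its defaults are illustrative, and the function falls back to CPU timing when the input is not on a CUDA device.

```python
import time

import torch
import torch.nn as nn


@torch.no_grad()
def time_forward(model: nn.Module, inputs: torch.Tensor, warmup: int = 10, runs: int = 100) -> float:
    """Average seconds per forward pass, with warm-up and CUDA synchronization."""
    device = inputs.device
    model = model.to(device).eval()

    def sync():
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    for _ in range(warmup):  # let cuDNN pick algorithms and allocators settle
        model(inputs)
    sync()  # nothing from the warm-up may still be running
    start = time.perf_counter()
    for _ in range(runs):
        model(inputs)
    sync()  # wait for all queued kernels before reading the clock
    return (time.perf_counter() - start) / runs
```

For the setting in the post (batch 8, W=64, H=128), the input would be `torch.randn(8, 3, 128, 64, device="cuda")`. At such small sizes, per-layer launch overhead can dominate, so models with many small layers may rank differently than in the paper's measurements.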
In the original paper, the first pointwise group conv is followed by a channel shuffle operation. But in the ShuffleNetV1 block, it is followed by a DWConv.
Hi there, I am using your ShuffleNetV1 code, which is very useful. Thanks a lot! But when I try to run the saved model on the CPU, I get some errors. Do you have any advice? Thank you again!
Hi! Congratulations on the awesome work here!
I was trying to use ShuffleNetV2 on my own dataset, and I didn't find any normalization transform in the train.py file. Isn't there one?
Best regards,
Paulo
The architecture of ShuffleNetV2+ is quite different from ShuffleNetV2! There are 4 types of blocks inside ShuffleNetV2+, and the topology of the network is irregular. I guess some NAS algorithm was used to search the architecture. I am curious about the search algorithm and hope to get more details about it! Thanks a lot!
Why can't the trained model file be decompressed?
I'm not sure whether I've misunderstood, but there is one point I don't quite get:
```python
branch_main_1 = [
    # pw
    nn.Conv2d(inp, mid_channels, 1, 1, 0, groups=1 if first_group else group, bias=False),
    nn.BatchNorm2d(mid_channels),
    nn.ReLU(inplace=True),
    # dw
    nn.Conv2d(mid_channels, mid_channels, ksize, stride, pad, groups=mid_channels, bias=False),
    nn.BatchNorm2d(mid_channels),
]
branch_main_2 = [
    # pw-linear
    nn.Conv2d(mid_channels, outputs, 1, 1, 0, groups=group, bias=False),
    nn.BatchNorm2d(outputs),
]
self.branch_main_1 = nn.Sequential(*branch_main_1)
self.branch_main_2 = nn.Sequential(*branch_main_2)
if stride == 2:
    self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

def forward(self, old_x):
    x = old_x
    x_proj = old_x
    # --- the lines in question ---
    x = self.branch_main_1(x)
    if self.group > 1:
        x = self.channel_shuffle(x)
    x = self.branch_main_2(x)
    # -----------------------------
    if self.stride == 1:
        return F.relu(x + x_proj)
    elif self.stride == 2:
        return torch.cat((self.branch_proj(x_proj), F.relu(x)), 1)
```
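Here branch_main_1 should amount to a 1×1 group pointwise (GP) convolution followed by a 3×3 depthwise (DW) convolution.
In this forward, branch_main_1 runs first and only then the channel shuffle, which differs from the paper, doesn't it? In the paper, the channel shuffle is applied directly after the 1×1 convolution.
Looking forward to your answer, thanks!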
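A channel shuffle is only a permutation of channels. A depthwise convolution (like BatchNorm and ReLU) acts on each channel independently. Shuffling before or after the DW convolution therefore gives the same result, up to a matching permutation of the DW filters, and since those filters are learned, both orders can represent the same functions. The sketch below checks this numerically. The helper `channel_shuffle` follows the paper's reshape-to-(g, n)/transpose/flatten description and was written for this example, not copied from the repository.

```python
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Paper-style shuffle: reshape channels to (g, n), transpose, flatten."""
    b, c, h, w = x.shape
    return x.reshape(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)


torch.manual_seed(0)
channels, groups = 12, 3
x = torch.randn(2, channels, 8, 8)
dw = nn.Conv2d(channels, channels, 3, 1, 1, groups=channels, bias=False)

# Permutation p such that channel_shuffle(x, groups) == x[:, p]
p = channel_shuffle(torch.arange(channels).view(1, channels, 1, 1), groups).flatten()

# Order A (paper): shuffle, then depthwise conv.
out_a = dw(channel_shuffle(x, groups))

# Order B (repo): depthwise conv with correspondingly permuted filters, then shuffle.
dw_perm = nn.Conv2d(channels, channels, 3, 1, 1, groups=channels, bias=False)
with torch.no_grad():
    dw_perm.weight.copy_(dw.weight[torch.argsort(p)])
out_b = channel_shuffle(dw_perm(x), groups)

print(torch.allclose(out_a, out_b, atol=1e-6))  # True
```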
Hi, Thanks for your excellent work! Could you please release the log files for shufflenetv2_1.0x and shufflenetv2_0.5x? Thank you so much!
Traceback (most recent call last):
File "train.py", line 286, in <module>
main()
File "train.py", line 107, in main
assert os.path.exists(args.train_dir)
AssertionError
$ python3 train.py --model-size 1.5x --train-dir=pessoa/train/.jpg --val-dir=pessoa/test/.jpg --save=./models/
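The assertion at line 107 fails because the value passed as `args.train_dir` does not exist as a path. It looks like a file pattern rather than a folder. Since `train.py` loads images with `ImageFolder` (as noted in another issue on this page), `--train-dir` should point at a directory that contains one sub-folder per class. Here is a stdlib sketch of that check. `check_imagefolder_root` is hypothetical, and the extension list only approximates torchvision's defaults.

```python
from pathlib import Path

IMG_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"}


def check_imagefolder_root(root: str) -> dict:
    """Return {class_name: image_count} for an ImageFolder-style directory."""
    path = Path(root)
    if not path.exists():
        raise FileNotFoundError(f"{root!r} does not exist")
    if not path.is_dir():
        raise NotADirectoryError(f"{root!r} is a file, not a folder of class sub-folders")
    counts = {}
    for class_dir in sorted(p for p in path.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.rglob("*") if f.suffix.lower() in IMG_EXTENSIONS
        )
    if not counts:
        raise ValueError(f"{root!r} has no class sub-folders (expected e.g. train/cat/001.jpg)")
    return counts
```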
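1. Why are ShuffleNet v2's channel counts not designed as powers of 2 (2^n), as in backbones such as ResNet and MobileNet? Were these channel counts found by NAS? I don't see the paper mention where the channel design comes from.
2. If the backbone of an object detection network is replaced with ShuffleNet v2 but the channel counts are set to powers of 2 such as 32, 64 and 128, will accuracy drop?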
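In ShuffleNet, I drew a flowchart of the feature maps and noticed something odd, so I'd like to ask you about it.
As shown in the figure:
Before the channel shuffle, we have a 2x4xhxw feature map, and we number its channels.
After the channel shuffle, the feature map is split into two parts, which are the inputs to the two branches.
After the two branches and the concat, the result is shown in the third row: each channel's number still follows the ordering from before the channel shuffle.
The feature map then enters the next block with a channel shuffle. The channel shuffle is applied first, giving the feature map in the last row. At this point, the feature maps from each original batch are not arranged in an order like the one in step two. Instead, they are interleaved one channel apart, so the feature maps within each batch are no longer adjacent. Did I draw something wrong?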
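To check such a drawing, it helps to trace channel indices through the shuffle numerically. The sketch below assumes a ShuffleNetV2-style split in which neighbouring channels are paired by a reshape, so the first branch receives the even-indexed channels and the second branch the odd-indexed ones. `shuffle_split` was written for this example, so compare it against the reshape in the repository's `blocks.py` before relying on it. Each value encodes `sample * 100 + channel`, which shows that samples in the batch are never mixed while the channel order is interleaved again at every block. Tracing is only meaningful for the projection branch; in practice the main branch's 1×1 convolutions mix channels anyway.

```python
import torch


def shuffle_split(x: torch.Tensor):
    """Split channels into (even, odd) halves per sample via reshape/permute."""
    b, c, h, w = x.shape
    assert c % 2 == 0, "need an even channel count"
    x = x.reshape(b * c // 2, 2, h * w).permute(1, 0, 2)  # pair up neighbouring channels
    x = x.reshape(2, -1, c // 2, h, w)                      # (half, batch, c/2, h, w)
    return x[0], x[1]


# Batch of 2 samples, 4 channels, 1x1 spatial; value = sample * 100 + channel.
labels = torch.tensor([[s * 100 + ch for ch in range(4)] for s in range(2)]).view(2, 4, 1, 1)

proj, main = shuffle_split(labels)
print(proj.flatten(1).tolist())  # [[0, 2], [100, 102]]
print(main.flatten(1).tolist())  # [[1, 3], [101, 103]]

# Next block: concat (proj first); main branch treated as identity for tracing only.
merged = torch.cat((proj, main), dim=1)
proj2, main2 = shuffle_split(merged)
print(proj2.flatten(1).tolist())  # [[0, 1], [100, 101]]
print(main2.flatten(1).tolist())  # [[2, 3], [102, 103]]
```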
if the stride is 2 1 1 respectively when downsampling, rather than 2 2 2?
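See lines 73, 80 and 87 in blocks.py.
Hello! First of all, thank you for your work!
I looked at your training code and found that after the images are loaded with ImageFolder, there is no normalization to the 0~1 range. Does this affect training? Thanks.
Great work! The pretrained models you released take BGR input (0-255). Will you release models for RGB input (transformed by PyTorch)?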
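Taking the statement above at face value (the released models expect BGR input in the 0-255 range, with no mean/std normalization), here is a sketch of a PIL-to-tensor conversion that follows that convention. `to_bgr_tensor` is a hypothetical helper and does not reproduce the repository's `ToBGRTensor` line for line. For contrast, torchvision's `transforms.ToTensor()` produces RGB scaled to [0, 1], which would likely degrade the predictions of a model trained on BGR 0-255 input.

```python
import numpy as np
import torch
from PIL import Image


def to_bgr_tensor(img: Image.Image) -> torch.Tensor:
    """PIL image -> float tensor (3, H, W), BGR channel order, values 0..255."""
    arr = np.asarray(img.convert("RGB"))                 # (H, W, 3), RGB, uint8
    arr = arr[:, :, ::-1]                                # RGB -> BGR
    arr = np.ascontiguousarray(arr.transpose(2, 0, 1))   # HWC -> CHW
    return torch.from_numpy(arr).float()                 # no /255, no mean/std
```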
Dear @nmaac ,
May I ask a question w.r.t ShuffleNet v2?
Why do 4 fragments in parallel run slower than 4 fragments in series?
Don't the 4 parallel fragments actually compute in parallel? Is that due to an implementation issue? Does that mean the Inception module actually does not increase speed but decreases it?
Thank you!
I have run the code of ShuffleNetV2+ Medium with default hyperparameters and obtained: Top-1 err = 0.248780, Top-5 err = 0.077260.
The top-1 error of 24.9 is a little different from the reported 24.3. Could you provide the hyperparameters for training ShuffleNetV2+ Medium?
Hi, thanks for your interesting work. I'm curious about the details of Tab.2 in ShuffleNetV2 paper.
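raise ValueError('in_channels must be divisible by groups')
Is it a correct setting?
Hello, I want to build on ShuffleNetV2+ by adding a head for use in a detection algorithm. How should I cite the paper that proposes this model?
I hope to hear from you. Thank you very much!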
File "F:\PycharmProjects\ShuffleNet-Series\ShuffleNetV2+\blocks.py", line 29, in forward
atten = self.SE_opr(x)
File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
input = module(input)
File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\modules\batchnorm.py", line 76, in forward
exponential_average_factor, self.eps)
File "J:\Anaconda3\envs\torch01\lib\site-packages\torch\nn\functional.py", line 1619, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 42, 1, 1])
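The SE branch pools the feature map to 1×1, so a batch containing a single image leaves the BatchNorm inside `SE_opr` with only one value per channel, which PyTorch rejects in training mode. Two common remedies:

- Call `model.eval()` for single-image inference.
- Pass `drop_last=True` to the training `DataLoader` so that a final batch of size 1 cannot occur.

Here is a minimal reproduction, assuming that BN layer behaves like a plain `nn.BatchNorm2d`:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(42)
x = torch.randn(1, 42, 1, 1)  # same shape as in the traceback

bn.train()
try:
    bn(x)
except ValueError as err:
    print("train mode:", err)

bn.eval()  # uses running statistics, so a batch of one is fine
print("eval mode:", tuple(bn(x).shape))  # (1, 42, 1, 1)
```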
In the paper ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices,
the residual branch in the ShuffleNet unit seems to be pw->channel shuffle->dw->pw. But in this repo, the residual branch is pw->dw->channel shuffle->pw.
Why is the order of these layers different?
Thank you so much for such an outstanding job! But my ShuffleNet V1's accuracy on the ImageNet dataset is seven percentage points below your results. The parameters I used are basically the same as yours! The only differences are a batch size of 384 and 7.8e5 steps, due to the limitations of my equipment. How can I improve the accuracy of the network?
Hi, I've implemented an MXNet version, with both fixed structure model and block & channel selection supernet model, based on this official release.
But others and I have all had a hard time training the supernet with both block and channel selection from scratch. Currently, my workaround is to train with block selection only for the first 60 epochs and then train with both block and channel selection for the remaining 60.
Could you please let me know what's your training strategy for the supernet? Did you train block and channel selection from scratch?
Thanks
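The two-stage workaround described above can be sketched as a per-iteration uniform sampler. During a warm-up phase it keeps every layer at its default channel width and samples only the block type; after warm-up it samples widths as well. The numbers used here are assumptions for illustration: 20 searchable layers, 4 block choices, 10 channel scales from 0.2x to 2.0x, and 60 warm-up epochs. The function is not taken from the official code.

```python
import random

NUM_LAYERS = 20                                       # assumed searchable layers
BLOCK_CHOICES = 4                                     # assumed candidate blocks per layer
CHANNEL_SCALES = [i / 10 for i in range(2, 21, 2)]    # assumed 0.2x .. 2.0x
DEFAULT_SCALE_IDX = CHANNEL_SCALES.index(1.0)         # 1.0x width during warm-up


def sample_architecture(epoch, rng, warmup_epochs=60):
    """Uniformly sample (block_idx, channel_scale_idx) for each layer.

    Before warmup_epochs only the block type is sampled; widths stay at 1.0x.
    """
    arch = []
    for _ in range(NUM_LAYERS):
        block = rng.randrange(BLOCK_CHOICES)
        if epoch < warmup_epochs:
            scale = DEFAULT_SCALE_IDX
        else:
            scale = rng.randrange(len(CHANNEL_SCALES))
        arch.append((block, scale))
    return arch
```

Usage: create one `rng = random.Random(seed)` and call `sample_architecture(epoch, rng)` once per training iteration, so each step trains a freshly sampled single path.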
Thanks for sharing your code. I am wondering about the training speed of One Shot NAS on your side. I am using two Tesla P100s, and 20 iterations take about 80 seconds, which means 300,000 iterations will take about two weeks. Could you tell me your training speed for both the search and the final training?
Do you think the experiment is reasonable if only an SE layer similar to MobileNetV3's is added to ShuffleNetV2? I tested it on some datasets and found an improvement, but not a significant one. Looking forward to your reply and comments. Thank you
```python
address = "E:/ShuffleNetV2.1.0x.pth.tar"
pretrained_state_dict = torch.load(address)
self.load_state_dict(pretrained_state_dict, strict=False)
```
When I loaded the weights this way, the training loss decreased slowly, almost as if the pretrained model had not been loaded.
Do you know the reason? Thank you very much
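Here is a likely cause to verify against your file. The checkpoints discussed on this page appear to be dictionaries with a `"state_dict"` entry whose keys carry a `module.` prefix (compare the `remove_prefix(torch.load(...)["state_dict"], "module.")` call in the inference snippet elsewhere on this page). If the outer dictionary is passed with `strict=False`, no keys match and the weights silently stay random. The sketch below unwraps the dictionary, strips the prefix, and loads with `strict=True`, so any remaining mismatch raises an error instead. `load_pretrained` is a hypothetical helper, and `map_location="cpu"` also makes it work on machines without a GPU.

```python
import torch
import torch.nn as nn


def load_pretrained(model: nn.Module, path: str, prefix: str = "module.") -> nn.Module:
    """Load a checkpoint saved from an nn.DataParallel-wrapped model."""
    checkpoint = torch.load(path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)  # unwrap if nested
    state_dict = {
        (k[len(prefix):] if k.startswith(prefix) else k): v for k, v in state_dict.items()
    }
    # strict=True raises on missing/unexpected keys instead of silently skipping them
    model.load_state_dict(state_dict, strict=True)
    return model
```

If your model's classifier has a different number of classes than the checkpoint, `strict=True` will report it. In that case, drop those specific keys explicitly rather than falling back to `strict=False`.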
When running inference with ShuffleNetV2, the class score is low and the predicted label is wrong. I use the following code:
```python
import torch
import torchvision.transforms as transforms
from PIL import Image

# OpencvResize, ToBGRTensor, ShuffleNetV2 and remove_prefix are defined elsewhere (not shown).
transform = transforms.Compose([
    OpencvResize(256),
    transforms.CenterCrop(224),
    ToBGRTensor(),
])
model = ShuffleNetV2(model_size='2.0x')
model.load_state_dict(remove_prefix(torch.load("ShuffleNetV2.2.0x.pth.tar")["state_dict"], "module."))
model.eval()
image = Image.open("../Image/cat.jpg")
img = transform(image).unsqueeze(0)
output = model(img)
output = torch.softmax(output, dim=1)
vals, idxs = torch.max(output, dim=1)
print(vals[0].item(), idxs[0].item())
```
Output:
0.31983616948127747 287
287 means 'lynx, catamount'
Am I preprocessing the input image incorrectly?
I have tried some tools, such as pytorch-OpCounter, to calculate the FLOPs and params of the released ShuffleNetV2, but they gave different numbers from your paper.

| Model | Calculated FLOPs | Reported in paper |
|---|---|---|
| shufflenetv2_0.5 | 45.6M | 41M |
| shufflenetv2_1.0 | 154.65M | 146M |
| shufflenetv2_1.5 | 311.07M | 299M |
I wonder whether you will release the tools to get the reported flops?
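Counts from different tools usually differ by convention. They may or may not include BatchNorm, ReLU, pooling and element-wise additions, and they may count one multiply-add as one FLOP or as two. As a point of comparison, here is a hook-based sketch that counts only the multiply-accumulates of `nn.Conv2d` and `nn.Linear` layers, with one multiply-add counted as one operation and biases ignored. This is an assumed convention, not the tool used for the paper's numbers, and `count_conv_linear_macs` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def count_conv_linear_macs(model: nn.Module, input_size=(1, 3, 224, 224)) -> int:
    """Multiply-accumulates of Conv2d and Linear layers only (biases ignored)."""
    total = 0

    def conv_hook(module, inputs, output):
        nonlocal total
        kh, kw = module.kernel_size
        per_output = (module.in_channels // module.groups) * kh * kw
        total += output.numel() * per_output

    def linear_hook(module, inputs, output):
        nonlocal total
        total += output.numel() * module.in_features

    handles = []
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            handles.append(m.register_forward_hook(conv_hook))
        elif isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(linear_hook))
    model.eval()
    with torch.no_grad():
        model(torch.zeros(input_size))
    for h in handles:
        h.remove()
    return total
```

Extending the hooks to BatchNorm or element-wise layers would increase the count, which is one plausible source of the gaps in the table above.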
In the paper, channel shuffle first reshapes the output channel dimension into (g, n), then transposes and flattens it back as the input of the next layer. In the code, however, it is reshaped into (n, g), then transposed and flattened, which differs from the original paper.
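The two reshape orders are not identical, but they are closely related. Each is the inverse permutation of the other. Moreover, whenever every group has at least as many channels as there are groups (n ≥ g), both send channels from every input group into every output group, which is the property the paper relies on. Here is a stdlib sketch over channel indices; the function names are illustrative.

```python
def paper_shuffle(channels, groups):
    """Reshape to (g, n), transpose to (n, g), flatten."""
    n = len(channels) // groups
    return [channels[k * n + j] for j in range(n) for k in range(groups)]


def code_shuffle(channels, groups):
    """Reshape to (n, g), transpose to (g, n), flatten."""
    n = len(channels) // groups
    return [channels[j * groups + k] for k in range(groups) for j in range(n)]


def mixes_all_groups(order, groups):
    """True if every output group receives a channel from every input group."""
    n = len(order) // groups
    return all(
        {ch // n for ch in order[out_group * n:(out_group + 1) * n]} == set(range(groups))
        for out_group in range(groups)
    )


channels = list(range(12))
print(paper_shuffle(channels, 3))  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(code_shuffle(channels, 3))   # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(code_shuffle(paper_shuffle(channels, 3), 3) == channels)  # True
print(mixes_all_groups(paper_shuffle(channels, 3), 3),
      mixes_all_groups(code_shuffle(channels, 3), 3))  # True True
```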
Hello,
Thanks for sharing the great project.
I am trying to design an efficient network.
As in the title: between shufflenet+ and the oneshot network, which one is more efficient?
Thanks,
I am curious about the training setting for reproducing ShuffleNetV2+:
File "train.py", line 30, in __call__
img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
error: OpenCV(4.0.1) /io/opencv/modules/imgproc/src/resize.cpp:3787: error: (-215:Assertion failed) inv_scale_x > 0 in function 'resize'