weiaicunzai / pytorch-cifar100
Practice on CIFAR-100 (ResNet, DenseNet, VGG, GoogLeNet, InceptionV3, InceptionV4, Inception-ResNetV2, Xception, ResNet in ResNet, ResNeXt, ShuffleNet, ShuffleNetV2, MobileNet, MobileNetV2, SqueezeNet, NASNet, Residual Attention Network, SENet, WideResNet)


pytorch-cifar100's Introduction

Pytorch-cifar100

Practice on the CIFAR-100 dataset with PyTorch.

Requirements

This is my experiment environment:

  • python3.6
  • pytorch1.6.0+cu101
  • tensorboard 2.2.2(optional)

Usage

1. Enter the directory

$ cd pytorch-cifar100

2. Dataset

I will use the CIFAR-100 dataset from torchvision since it's more convenient, but I also kept the sample code for writing your own dataset module in the dataset folder, as an example for people who don't know how to write one.
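
For illustration, here is a minimal sketch of such a dataset module, assuming the pickled files from the official CIFAR-100 download; the class name and details are illustrative, not necessarily identical to the code in the dataset folder.

import pickle

import numpy as np
from torch.utils.data import Dataset

class CIFAR100Train(Dataset):
    """Reads the pickled CIFAR-100 training file directly."""

    def __init__(self, path, transform=None):
        with open(path, 'rb') as f:
            self.data = pickle.load(f, encoding='bytes')
        self.transform = transform

    def __len__(self):
        return len(self.data[b'fine_labels'])

    def __getitem__(self, index):
        label = self.data[b'fine_labels'][index]
        # Each row stores 3072 bytes: 1024 per channel, in R, G, B order.
        r = self.data[b'data'][index, :1024].reshape(32, 32)
        g = self.data[b'data'][index, 1024:2048].reshape(32, 32)
        b = self.data[b'data'][index, 2048:].reshape(32, 32)
        image = np.dstack((r, g, b))        # (32, 32, 3) uint8 image
        if self.transform:
            image = self.transform(image)
        return image, label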

3. Run tensorboard (optional)

Install tensorboard

$ pip install tensorboard
$ mkdir runs
Run tensorboard
$ tensorboard --logdir='runs' --port=6006 --host='localhost'
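
Once TensorBoard is watching the runs directory, anything logged there with torch.utils.tensorboard appears in the browser. A toy sketch (the log directory and tag names are illustrative):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir='runs/demo')     # any subfolder of runs/ works
for step in range(100):
    # Real training code would log loss/accuracy here instead of a toy value.
    writer.add_scalar('Train/loss', 1.0 / (step + 1), step)
writer.close()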

4. Train the model

You need to specify the net you want to train using the -net argument

# use gpu to train vgg16
$ python train.py -net vgg16 -gpu

Sometimes you might want to use warmup training by setting -warm to 1 or 2, to prevent the network from diverging during the early training phase.
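
For illustration, a common way to implement such warmup is a scheduler that scales the learning rate linearly per iteration during the first epoch(s); this sketch shows the general technique and is not necessarily this repo's exact code.

import torch
from torch.optim.lr_scheduler import _LRScheduler

class WarmUpLR(_LRScheduler):
    """Linearly ramp the LR from ~0 to the base LR over total_iters batches."""

    def __init__(self, optimizer, total_iters, last_epoch=-1):
        self.total_iters = total_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # last_epoch counts scheduler.step() calls, i.e. finished iterations.
        return [base_lr * self.last_epoch / (self.total_iters + 1e-8)
                for base_lr in self.base_lrs]

optimizer = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
warmup = WarmUpLR(optimizer, total_iters=391)   # ~one epoch: 50000 / 128 batches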

The supported net args are:

squeezenet
mobilenet
mobilenetv2
shufflenet
shufflenetv2
vgg11
vgg13
vgg16
vgg19
densenet121
densenet161
densenet201
googlenet
inceptionv3
inceptionv4
inceptionresnetv2
xception
resnet18
resnet34
resnet50
resnet101
resnet152
preactresnet18
preactresnet34
preactresnet50
preactresnet101
preactresnet152
resnext50
resnext101
resnext152
attention56
attention92
seresnet18
seresnet34
seresnet50
seresnet101
seresnet152
nasnet
wideresnet
stochasticdepth18
stochasticdepth34
stochasticdepth50
stochasticdepth101

Normally, the weights file with the best accuracy is written to disk with the name suffix 'best' (in the checkpoint folder by default).

5. Test the model

Test the model using test.py

$ python test.py -net vgg16 -weights path_to_vgg16_weights_file
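
test.py reports top-1 and top-5 error. For reference, a self-contained sketch of how such counts are typically computed (the helper is illustrative, not the repo's exact code):

import torch

def topk_correct(outputs, labels, k=5):
    """Count top-1 and top-k correct predictions for a batch of logits."""
    _, pred = outputs.topk(k, dim=1)        # (batch, k), sorted by logit
    hits = pred.eq(labels.unsqueeze(1))     # True where a prediction matches
    return hits[:, :1].sum().item(), hits.sum().item()

logits = torch.randn(8, 100)                # stand-in CIFAR-100 logits
labels = torch.randint(0, 100, (8,))
top1, top5 = topk_correct(logits, labels)
print(1 - top1 / 8, 1 - top5 / 8)           # top-1 and top-5 error rates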

Implemented Networks

Training Details

I didn't use any training tricks to improve accuracy. If you want to learn more about training tricks, please refer to another repo of mine, which contains various common training tricks and their PyTorch implementations.

I follow the hyperparameter settings in the paper Improved Regularization of Convolutional Neural Networks with Cutout: initial lr = 0.1, divided by 5 at the 60th, 120th, and 160th epochs, trained for 200 epochs with batch size 128, weight decay 5e-4, and Nesterov momentum 0.9. You could also use the hyperparameters from the papers Regularizing Neural Networks by Penalizing Confident Output Distributions and Random Erasing Data Augmentation: initial lr = 0.1, divided by 10 at the 150th and 225th epochs, trained for 300 epochs with batch size 128; this setting is more commonly used. You could decrease the batch size to 64 or whatever suits you if you don't have enough GPU memory.
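
In PyTorch, the first schedule above amounts to SGD plus a MultiStepLR; a minimal sketch under those settings (the stand-in model is illustrative):

import torch.nn as nn
import torch.optim as optim

net = nn.Linear(3 * 32 * 32, 100)           # stand-in for the real network
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=5e-4, nesterov=True)
# Dividing the LR by 5 means gamma = 0.2 at epochs 60, 120, and 160.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[60, 120, 160], gamma=0.2)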

You can choose whether to use TensorBoard to visualize your training procedure

Results

These are the results I got from each model. Since I use the same hyperparameters to train all the networks, some networks might not achieve their best results with these settings; you could try fine-tuning the hyperparameters yourself to get better results.

| dataset | network | params | top-1 err | top-5 err | epochs (lr = 0.1) | epochs (lr = 0.02) | epochs (lr = 0.004) | epochs (lr = 0.0008) | total epochs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cifar100 | mobilenet | 3.3M | 34.02 | 10.56 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | mobilenetv2 | 2.36M | 31.92 | 9.02 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | squeezenet | 0.78M | 30.59 | 8.36 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | shufflenet | 1.0M | 29.94 | 8.35 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | shufflenetv2 | 1.3M | 30.49 | 8.49 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | vgg11_bn | 28.5M | 31.36 | 11.85 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | vgg13_bn | 28.7M | 28.00 | 9.71 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | vgg16_bn | 34.0M | 27.07 | 8.84 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | vgg19_bn | 39.0M | 27.77 | 8.84 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnet18 | 11.2M | 24.39 | 6.95 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnet34 | 21.3M | 23.24 | 6.63 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnet50 | 23.7M | 22.61 | 6.04 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnet101 | 42.7M | 22.22 | 5.61 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnet152 | 58.3M | 22.31 | 5.81 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | preactresnet18 | 11.3M | 27.08 | 8.53 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | preactresnet34 | 21.5M | 24.79 | 7.68 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | preactresnet50 | 23.9M | 25.73 | 8.15 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | preactresnet101 | 42.9M | 24.84 | 7.83 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | preactresnet152 | 58.6M | 22.71 | 6.62 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnext50 | 14.8M | 22.23 | 6.00 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnext101 | 25.3M | 22.22 | 5.99 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | resnext152 | 33.3M | 22.40 | 5.58 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | attention56 | 55.7M | 33.75 | 12.90 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | attention92 | 102.5M | 36.52 | 11.47 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | densenet121 | 7.0M | 22.99 | 6.45 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | densenet161 | 26M | 21.56 | 6.04 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | densenet201 | 18M | 21.46 | 5.9 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | googlenet | 6.2M | 21.97 | 5.94 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | inceptionv3 | 22.3M | 22.81 | 6.39 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | inceptionv4 | 41.3M | 24.14 | 6.90 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | inceptionresnetv2 | 65.4M | 27.51 | 9.11 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | xception | 21.0M | 25.07 | 7.32 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | seresnet18 | 11.4M | 23.56 | 6.68 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | seresnet34 | 21.6M | 22.07 | 6.12 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | seresnet50 | 26.5M | 21.42 | 5.58 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | seresnet101 | 47.7M | 20.98 | 5.41 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | seresnet152 | 66.2M | 20.66 | 5.19 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | nasnet | 5.2M | 22.71 | 5.91 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | wideresnet-40-10 | 55.9M | 21.25 | 5.77 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | stochasticdepth18 | 11.22M | 31.40 | 8.84 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | stochasticdepth34 | 21.36M | 27.72 | 7.32 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | stochasticdepth50 | 23.71M | 23.35 | 5.76 | 60 | 60 | 40 | 40 | 200 |
| cifar100 | stochasticdepth101 | 42.69M | 21.28 | 5.39 | 60 | 60 | 40 | 40 | 200 |

pytorch-cifar100's People

Contributors

developer0hye, weiaicunzai


pytorch-cifar100's Issues

resnet50 accuracy is a little bit worse than resnet18

With three runs of resnet18, I got an average accuracy of around 76.1. However, with resnet50, I got 76.0.
Does anyone have the same problem? By the way, resnet101 works fine, with an accuracy of 78.1. I think resnet50's accuracy is supposed to be around 77.

Is the data augmented once or every epoch?

I want to know whether the data augmentation function transform_train is executed once or on every epoch. If the augmentation is only executed once, it makes little difference.
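
For context (a general torchvision fact, not a quote from this thread): transforms run inside the dataset's __getitem__, so random augmentations are re-applied every time a sample is loaded, i.e. on every epoch. A quick way to check:

import torchvision
import torchvision.transforms as transforms

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR100(
    root='./data', train=True, download=True, transform=transform_train)

# Fetching the same index twice yields two differently augmented tensors,
# showing the random transforms are re-applied on every access.
img1, _ = train_set[0]
img2, _ = train_set[0]
print(bool((img1 != img2).any()))           # almost always True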

print(torch.cuda.memory_summary(), end='')

AttributeError: module torch.cuda has no attribute memory_summary

What is the meaning of torch.cuda.memory_summary()? What does the author want to express with the line print(torch.cuda.memory_summary(), end='') at train.py line 100? (torch.cuda.memory_summary prints a human-readable report of the CUDA caching allocator's statistics; the AttributeError suggests the installed PyTorch version predates that API.)

Is this a mistake in SENet for fair comparison?

I notice that stage 4 of the other ResNet-style models uses 512 channels while SENet uses 516.
Is this a bug?
SENet:

self.stage4 = self._make_stage(block, block_num[3], 516, 2)

ResNet:
self.conv5_x = self._make_layer(block, 512, num_block[3], 2)

ResNeXt:
self.conv5 = self._make_layer(block, num_blocks[3], 512, 2)

Boolean type cannot be accepted correctly as an argument in train.py

In train.py, the arguments gpu and s are parsed as booleans. However, if you run a command like the following:
python train.py -gpu False -s False

you will find that args.gpu and args.s are True. This means the arguments gpu and s do not accept the input correctly.
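
For reference, this is a general argparse pitfall: type=bool turns any non-empty string, including 'False', into True. A sketch of the usual fix (not the repo's exact code):

import argparse

parser = argparse.ArgumentParser()
# With action='store_true' the flag defaults to False, and passing the bare
# flag (no value) sets it to True, so 'False' can never be misparsed.
parser.add_argument('-gpu', action='store_true', help='use gpu or not')
args = parser.parse_args(['-gpu'])
print(args.gpu)                             # True; omit '-gpu' and it is False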

Pretrained Models

Are your pre-trained weights available to download for any of your experimental runs?

Is there perhaps a problem with the implementation of the models.resnet networks?

My pytorch==1.2, torchvision==0.4.0.
I tried running train.py with the ResNet implemented in torchvision.models, and the accuracy on the test set was only around 60%. When I train with the ResNet you implemented, the accuracy roughly matches expectations.
Another difference: with batch_size=128, training with the torchvision.models network uses only about 1.4 GB of GPU memory, while your implementation uses about 5 GB, as if a separate model were allocated for each image in the batch.
Is there a difference in how the networks are implemented, or is this caused by our different PyTorch versions?
Thank you.

Multiple GPUs

Many thanks for sharing the code!

One question: does the training support multiple GPUs? Kindly correct me if I am mistaken, but as far as I can tell, the code only supports a single GPU right now?

Error: for batch_index, (images, labels) in enumerate(cifar100_training_loader)

Hello, when I run your program I get the following error in train.py, in the train function, at line 33, for batch_index, (images, labels) in enumerate(cifar100_training_loader):
Exception occurred: TypeError
Traceback (most recent call last):
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in
return [default_collate(samples) for samples in transposed]
File "/home/zbs/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 140, in default_collate
raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'torchvision.transforms.Compose'>
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 33, in train
for batch_index, (images, labels) in enumerate(cifar100_training_loader):
File "/home/zbs/Desktop/pytorch-cifar100/train.py", line 160, in
train(epoch)
Could you help me solve this? Thank you very much, and nice work!

WinError123

[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'runs\vgg16\2019-03-16T19:09:02.288156'. (Windows does not allow ':' in file names, and the ISO-format timestamp used for the log directory name contains colons.)

vgg.py

nn.Linear(512, 4096)
nn.Linear(512 * 7 * 7, 4096)

No softmax layer

Is there no softmax layer in any of the models? Will this influence the test accuracy?
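
For context (a general PyTorch fact, not a statement about this repo's intent): nn.CrossEntropyLoss applies log-softmax internally, so training on raw logits is correct, and since softmax is monotonic the argmax used for accuracy is unchanged by omitting it. A quick check:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))

# CrossEntropyLoss on raw logits == NLLLoss on log-softmax of the logits.
loss_a = nn.CrossEntropyLoss()(logits, labels)
loss_b = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_a, loss_b))       # True

# Softmax never reorders scores, so predictions are identical without it.
print(torch.equal(logits.argmax(1), F.softmax(logits, dim=1).argmax(1)))  # True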

model better than torchvision model on cifar100

Hi,

Thank you for your repo.
I tried to compare the accuracy of your resnet18 with the torchvision one, and I do not understand why, without pretraining, your model gives 70+ accuracy but the torchvision one only 55. Have you implemented something specific for the CIFAR-100 dataset?
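
One likely explanation (an assumption, not confirmed in this thread): torchvision's resnet18 keeps the ImageNet stem, a 7x7 stride-2 conv plus a 3x3 max pool, which shrinks a 32x32 CIFAR image to 8x8 before the residual stages, while CIFAR-style ResNets use a 3x3 stride-1 conv and no pooling. Roughly:

import torch.nn as nn

# ImageNet stem: a 32x32 input is already 8x8 after conv (stride 2) + maxpool.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

# CIFAR stem: keeps the full 32x32 resolution for the residual stages.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True))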

accuracy of shufflenet v2

I tried to train ShuffleNetV2 with your script using the default hyperparameters you set.
It should yield 69.51% best test accuracy, but I got 61.68%. So which hyperparameters did you use?

Using my own training set

What should the file structure look like? I keep getting the error Permission denied: 'F:\rock_image\shierlei_kc\train'.

lower accuracy

Thanks for your great work! I used mobilenet to train the model, with warm set to 2 or 1, but the top-1 accuracy is only 0.5747, which is about 9% lower than yours. Do you know what is wrong? My PyTorch version is 1.2; could this matter? Looking forward to your reply!

Tensor sizes mismatches

Hi,

I am implementing a ResNet18 model using the CIFAR-100 dataset.

I checked all the dimensions but got this error:

RuntimeError: The size of tensor a (100) must match the size of tensor b (32) at non-singleton dimension 3

Can you please tell me how to fix it?

Thanks

CUDA out of memory problem

It seems some of the nets defined in models have a hidden bug.
For example, when I use senet I get a CUDA out of memory error, even though my batch_size is only 64 and my GPU has 11 GB of memory.

But when I use the model file here
https://github.com/moskomule/senet.pytorch/tree/master/senet
it only occupies 7 GB of memory with batch_size=90.

I find senet.py, resnext.py, and inceptionv4.py all have a similar problem; there may be more models affected.

tensorboardX path

Traceback (most recent call last):
File "D:\Anaconda\envs\PyTorch\lib\site-packages\tensorboardX\record_writer.py", line 47, in directory_check
factory = REGISTERED_FACTORIES[prefix]
KeyError: 'runs\vgg16\2020-03-08T16'

(This looks like the same Windows path problem as the WinError 123 issue above: the ':' in the timestamp makes tensorboardX treat 'runs\vgg16\2020-03-08T16' as a protocol prefix.)

Links at README.md

Thank you for the code.
Please note that all three links under the paragraph "Training Details" refer to the same paper.

Densenet: wrong structure of transition layer

According to the original DenseNet implementation, the transition layer should be BN-ReLU-Conv-Pool, but the code in this repository is BN-Conv-Pool. The ReLU is missing, which may hurt the accuracy of the model.

the densenet (from paper author):
https://github.com/liuzhuang13/DenseNet/blob/cf511e4add35a7d7a921901101ce7fa8f704aee2/models/densenet.lua#L37-L52

this repo:

class Transition(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        #"""The transition layers used in our experiments
        #consist of a batch normalization layer and an 1x1
        #convolutional layer followed by a 2x2 average pooling
        #layer""".
        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.AvgPool2d(2, stride=2)
        )

by the way, maybe the description in the paper is misleading:

The transition layers used in our experiments consist of a batch normalization layer and an 1×1 convolutional layer followed by a 2×2 average pooling layer.
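
For comparison, a sketch of the BN-ReLU-Conv-Pool ordering used by the original Lua implementation linked above (a suggested fix, not the repo's current code):

import torch.nn as nn

class Transition(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),          # the ReLU missing from this repo
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.AvgPool2d(2, stride=2))

    def forward(self, x):
        return self.down_sample(x)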

Is the learning rate increasing?

Training Epoch: 1 [3968/8000] Loss: 1.3380 LR: 0.004960
Training Epoch: 1 [3976/8000] Loss: 0.3088 LR: 0.004970
Training Epoch: 1 [3984/8000] Loss: 0.6474 LR: 0.004980
Training Epoch: 1 [3992/8000] Loss: 0.4500 LR: 0.004990
Training Epoch: 1 [4000/8000] Loss: 0.6452 LR: 0.005000
Training Epoch: 1 [4008/8000] Loss: 0.9984 LR: 0.005010
Training Epoch: 1 [4016/8000] Loss: 0.7139 LR: 0.005020
Training Epoch: 1 [4024/8000] Loss: 0.6220 LR: 0.005030
Training Epoch: 1 [4032/8000] Loss: 0.4329 LR: 0.005040
Training Epoch: 1 [4040/8000] Loss: 0.4127 LR: 0.005050
Training Epoch: 1 [4048/8000] Loss: 0.4696 LR: 0.005060
Training Epoch: 1 [4056/8000] Loss: 0.5181 LR: 0.005070
Training Epoch: 1 [4064/8000] Loss: 0.4105 LR: 0.005080
Training Epoch: 1 [4072/8000] Loss: 0.7041 LR: 0.005090
Training Epoch: 1 [4080/8000] Loss: 0.3864 LR: 0.005100
Training Epoch: 1 [4088/8000] Loss: 0.6991 LR: 0.005110
Training Epoch: 1 [4096/8000] Loss: 0.3007 LR: 0.005120
Training Epoch: 1 [4104/8000] Loss: 0.3111 LR: 0.005130
Training Epoch: 1 [4112/8000] Loss: 0.3763 LR: 0.005140
Training Epoch: 1 [4120/8000] Loss: 0.5825 LR: 0.005150
Training Epoch: 1 [4128/8000] Loss: 0.5528 LR: 0.005160
Training Epoch: 1 [4136/8000] Loss: 0.3553 LR: 0.005170
Training Epoch: 1 [4144/8000] Loss: 0.2654 LR: 0.005180
Training Epoch: 1 [4152/8000] Loss: 0.3935 LR: 0.005190
Training Epoch: 1 [4160/8000] Loss: 0.2935 LR: 0.005200
Training Epoch: 1 [4168/8000] Loss: 0.2382 LR: 0.005210
Training Epoch: 1 [4176/8000] Loss: 0.2893 LR: 0.005220

Maybe something wrong in Xception

Hi, thanks for your nice work!

While studying Xception, I read the paper and compared it with your implementation. In your models/xception SeperableConv2d, I am confused about the following:

In the paper, the author says Xception goes through the 1x1 conv first, but in your implementation you apply the depthwise conv first, which is not the 1x1 conv, right?

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)

        return x
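
For comparison, a sketch of the pointwise-first ordering the paper describes (illustrative; the Xception paper also remarks that, because these blocks are stacked, the ordering makes little difference in practice):

import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size,
                                   groups=out_channels, bias=False, **kwargs)

    def forward(self, x):
        return self.depthwise(self.pointwise(x))   # 1x1 conv first

out = SeparableConv2d(3, 64, 3, padding=1)(torch.randn(1, 3, 32, 32))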

Training stage

Do you train the model from scratch, or from weights pre-trained on ImageNet?

Mistakes of MobileNetV2

Dear sir, I found a mistake when I used the MobileNetV2 model. You can print the architecture and check it. The first layer has an output of [32, 226], where [32, 112] is expected. I'd like to fix this error.

Model selection criterion

I have noticed that you use the full training set to train the model and select the best model directly on the test set. Won't this practice underestimate the test error? I usually split off a validation set and choose the model based on its performance on the validation set. Thanks in advance.
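
For reference, a minimal sketch of carving a validation split out of the CIFAR-100 training set (the 45k/5k split and seed are illustrative choices):

import torch
import torchvision
from torch.utils.data import DataLoader, random_split

train_set = torchvision.datasets.CIFAR100(root='./data', train=True, download=True)
torch.manual_seed(42)                       # fixed seed so the split is reproducible
train_subset, val_subset = random_split(train_set, [45000, 5000])
train_loader = DataLoader(train_subset, batch_size=128, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=128, shuffle=False)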

Unable to execute code on test set

Hi,

I am trying to test the model on Google Colab using the command you have put up in your README.md file ( !python test.py -net vgg16 -weights ./checkpoint/vgg16/Tuesday_04_August_2020_15h_05m_17s/vgg16-171-best.pth). However, I seem to be running into this issue

(screenshot of the error omitted)

To my understanding, this issue only comes up when the model is on the GPU but data is on the CPU. I checked the code to see what the issue could be but found that not only is the model being loaded to the Colab GPU, but also the labels and images. I'm not sure if this problem arises due to it being executed on Google Colab (I do not have access to a GPU locally). Would appreciate your help regarding this.
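
For reference, the usual fix for a device-mismatch RuntimeError is to move both the model and every batch to the same device before the forward pass; a minimal self-contained sketch:

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = nn.Linear(3 * 32 * 32, 100).to(device)   # stand-in for the real model
images = torch.randn(8, 3 * 32 * 32)           # stand-in batch, created on CPU
outputs = net(images.to(device))               # move data to the model's device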

resnet50 results

Why are my results much better than those in the table? I didn't modify anything (resnet50, 183-best.pth). Did I test incorrectly?

GPU utilization

Hello, during training my GPU utilization stays at only about 10%, with batch_size 64 and num_workers 8. Is there any way to improve utilization?

Model Accuracy

I used your seresnet18 model to train on the CIFAR-100 dataset, following your training details ("200 epochs, init lr = 0.1 divided by 5 at the 60th, 120th, 160th epochs, batch size 128, weight decay 5e-4"), but I can't reproduce the 23.56 top-1 error rate; I only get 33.01.
Can you help me?

The histogram is empty

I am doing a classification task where I have changed the CIFAR dataset to my custom dataset. When I started the first epoch the loss value was very large, and after some iterations the loss became nan. After completing one epoch, the program crashed. (Presumably once the loss is nan the weights become nan too, so the histogram TensorBoard tries to build from them is empty.)

Traceback (most recent call last):
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 213, in <module>
    train(epoch)
  File "/media/khawar/HDD_Khawar1/CVPR/pytorch-cifar100/train.py", line 71, in train
    writer.add_histogram("{}/{}".format(layer, attr), param, epoch)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 425, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 226, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/home/khawar/.local/lib/python3.6/site-packages/torch/utils/tensorboard/summary.py", line 264, in make_histogram
    raise ValueError('The histogram is empty, please file a bug report.')
ValueError: The histogram is empty, please file a bug report.
