
wide-resnet.pytorch's Introduction

Best CIFAR-10, CIFAR-100 results with wide-residual networks using PyTorch

PyTorch implementation of Sergey Zagoruyko's Wide Residual Networks

For Torch implementations, see here.

Requirements

See the installation instructions for a step-by-step installation guide, and the server instructions for server setup.

pip install http://download.pytorch.org/whl/cu80/torch-0.1.12.post2-cp27-none-linux_x86_64.whl
pip install torchvision
git clone https://github.com/meliketoy/wide-resnet.pytorch

How to run

After you have cloned the repository, you can train on either the CIFAR-10 or CIFAR-100 dataset by running the script below.

python main.py --lr 0.1 --resume false --net_type [lenet/vggnet/resnet/wide-resnet] --depth 28 --widen_factor 10 --dropout_rate 0.3 --dataset [cifar10/cifar100]

Implementation Details

| epoch     | learning rate | weight decay | optimizer | momentum | nesterov |
|-----------|---------------|--------------|-----------|----------|----------|
| 0 ~ 60    | 0.1           | 0.0005       | Momentum  | 0.9      | true     |
| 61 ~ 120  | 0.02          | 0.0005       | Momentum  | 0.9      | true     |
| 121 ~ 160 | 0.004         | 0.0005       | Momentum  | 0.9      | true     |
| 161 ~ 200 | 0.0008        | 0.0005       | Momentum  | 0.9      | true     |
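
The schedule above is a simple step decay: the learning rate is multiplied by 0.2 at epochs 60, 120, and 160. A minimal sketch of such a helper (the repository's config exposes an equivalent cf.learning_rate(init, epoch); the exact body below is assumed, not copied from the code):

def learning_rate(init, epoch):
    # Step decay matching the table above: multiply by 0.2 at epochs 60, 120 and 160.
    if epoch <= 60:
        return init               # 0.1
    elif epoch <= 120:
        return init * 0.2         # 0.02
    elif epoch <= 160:
        return init * 0.2 ** 2    # 0.004
    else:
        return init * 0.2 ** 3    # 0.0008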

CIFAR-10 Results


Below are the test set accuracy results for training on the CIFAR-10 dataset.

Accuracy is the average of 5 runs

| network           | dropout | preprocess | GPU:0 | GPU:1 | per epoch    | accuracy (%) |
|-------------------|---------|------------|-------|-------|--------------|--------------|
| wide-resnet 28x10 | 0       | ZCA        | 5.90G | -     | 2 min 03 sec | 95.83        |
| wide-resnet 28x10 | 0       | meanstd    | 5.90G | -     | 2 min 03 sec | 96.21        |
| wide-resnet 28x10 | 0.3     | meanstd    | 5.90G | -     | 2 min 03 sec | 96.27        |
| wide-resnet 28x20 | 0.3     | meanstd    | 8.13G | 6.93G | 4 min 10 sec | 96.55        |
| wide-resnet 40x10 | 0.3     | meanstd    | 8.08G | -     | 3 min 13 sec | 96.31        |
| wide-resnet 40x14 | 0.3     | meanstd    | 7.37G | 6.46G | 3 min 23 sec | 96.34        |

CIFAR-100 Results


Below are the test set accuracy results for training on the CIFAR-100 dataset.

Accuracy is the average of 5 runs

| network           | dropout | preprocess | GPU:0 | GPU:1 | per epoch    | Top1 acc (%) | Top5 acc (%) |
|-------------------|---------|------------|-------|-------|--------------|--------------|--------------|
| wide-resnet 28x10 | 0       | ZCA        | 5.90G | -     | 2 min 03 sec | 80.07        | 95.02        |
| wide-resnet 28x10 | 0       | meanstd    | 5.90G | -     | 2 min 03 sec | 81.02        | 95.41        |
| wide-resnet 28x10 | 0.3     | meanstd    | 5.90G | -     | 2 min 03 sec | 81.49        | 95.62        |
| wide-resnet 28x20 | 0.3     | meanstd    | 8.13G | 6.93G | 4 min 05 sec | 82.45        | 96.11        |
| wide-resnet 40x10 | 0.3     | meanstd    | 8.93G | -     | 3 min 06 sec | 81.42        | 95.63        |
| wide-resnet 40x14 | 0.3     | meanstd    | 7.39G | 6.46G | 3 min 23 sec | 81.87        | 95.51        |

wide-resnet.pytorch's People

Contributors

meliketoy


wide-resnet.pytorch's Issues

Activation functions in WRN?

Hello, I am somehow missing your activation functions: https://github.com/meliketoy/wide-resnet.pytorch/blob/master/networks/wide_resnet.py
When I load your model and print it, there aren't any. See: https://gist.github.com/jS5t3r/797bc39c9706a687eb05b027ae711d4c

I think it ought to look like this: https://gist.github.com/jS5t3r/fde796a3154c39f961f0d3686b88b722

Why aren't there any activation functions? Is that on purpose?

P.S.: This is the original repository: https://github.com/szagoruyko/wide-residual-networks/blob/master/pytorch/resnet.py
I haven't tried it out.
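
A likely explanation (not verified against the repository) is that the activations are applied functionally inside forward() rather than registered as submodules, so print(model) simply does not list them. A minimal illustration of that behaviour:

import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm2d(16)
        self.conv = nn.Conv2d(16, 16, 3, padding=1)

    def forward(self, x):
        # F.relu is a function, not a submodule, so it never appears when the
        # module is printed, even though it is applied here.
        return self.conv(F.relu(self.bn(x)))

print(Block())   # lists bn and conv only; no ReLU shows up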

Stride is Wrong

Following from the model graph for the wide-resnet with depth 28 and widen_factor = 10, layer2.0.conv2 and layer3.0.conv2 have stride=(2, 2). It should be layer2.0.conv1 and layer3.0.conv1 that have stride=(2, 2), while layer2.0.conv2 and layer3.0.conv2 should have stride=(1, 1). (A sketch of this proposed placement follows the model graph below.)

Here is the model graph:

Resnet(
  (model): Wide_ResNet(
    (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (layer1): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(16, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(16, 160, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (layer2): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(160, 320, kernel_size=(1, 1), stride=(2, 2))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (layer3): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(320, 640, kernel_size=(1, 1), stride=(2, 2))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
    (relu1): ReLU()
    (linear): Linear(in_features=640, out_features=100, bias=True)
  )
)
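
As a sketch of the placement the issue proposes (the BN-ReLU-conv ordering and dropout position follow the printed graph above; this is an illustration, not the repository's code):

import torch.nn as nn
import torch.nn.functional as F

class wide_basic(nn.Module):
    # Hypothetical variant with the stride on conv1, as suggested in the issue.
    def __init__(self, in_planes, planes, dropout_rate, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1)        # stride moved here
        self.dropout = nn.Dropout(p=dropout_rate)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1)             # ... and removed here
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride))

    def forward(self, x):
        out = self.dropout(self.conv1(F.relu(self.bn1(x))))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)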

Wrong Implementation

@meliketoy

  1. BatchNorm2d
self.bn1 = nn.BatchNorm2d(nStages[3], momentum=0.9)

In the official PyTorch code, they use the default value of momentum, momentum=0.1.
https://github.com/szagoruyko/wide-residual-networks/blob/ae6d0d0561484172790c7a63c8ce6ade5a5a2914/pytorch/utils.py#L55
I think you have confused it with the momentum used in TensorFlow, which has a different meaning.
https://stackoverflow.com/questions/48345857/batchnorm-momentum-convention-pytorch

  2. AvgPool
out = F.avg_pool2d(out, 8)

In the official PyTorch code, they use F.avg_pool2d(out, 8, 1, 0), but you've used the default stride, which is the same as the kernel size (8).
https://github.com/szagoruyko/wide-residual-networks/blob/master/pytorch/resnet.py#L56
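
Two quick checks for the points above (verification sketches against stock PyTorch behaviour, not code from either repository): the first shows that PyTorch's momentum=0.1 moves the running statistics only 10% per batch (the TF-style "momentum=0.9" describes the same update), and the second compares the two pooling calls on the 8x8 feature maps produced for 32x32 CIFAR inputs.

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. BatchNorm momentum convention: PyTorch updates
#    running_stat = (1 - momentum) * running_stat + momentum * batch_stat,
#    so momentum=0.1 (the default) keeps 90% of the running statistic per step.
bn = nn.BatchNorm2d(640, momentum=0.1)
bn.train()
_ = bn(torch.randn(4, 640, 8, 8))
print(bn.running_mean[:3])   # has moved only 10% of the way toward the batch mean

# 2. Average pooling: compare the two argument forms on an 8x8 feature map.
out = torch.randn(2, 640, 8, 8)
a = F.avg_pool2d(out, 8)        # default stride = kernel size
b = F.avg_pool2d(out, 8, 1, 0)  # explicit stride=1, padding=0
print(a.shape, b.shape, torch.allclose(a, b))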

How to train WRN 34?

Do I need to set the widen factor to 1? or 0?

python main.py --lr 0.1 --resume false --net_type [lenet/vggnet/resnet/wide-resnet] --depth 34 --widen_factor 1 --dropout_rate 0.3 --dataset cifar10

IndexError

Hi,

After following the install instructions, when I try to run main.py on CIFAR-10, I get the following error on the first epoch:

Traceback (most recent call last):
  File "../main.py", line 220, in <module>
    train(epoch)
  File "../main.py", line 163, in train
    train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Any ideas?
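
A note in case it helps others: this error appears when running the 0.1.x-era code on PyTorch 0.4 or newer, where indexing a 0-dim tensor was removed. The usual fix is the one the error message suggests, sketched below with a stand-in loss value:

import torch

loss = torch.tensor(0.2337)   # stand-in for the criterion output
train_loss = 0.0

# PyTorch <= 0.3 style (as in the traceback): train_loss += loss.data[0]
# PyTorch >= 0.4: use .item() to extract the Python number from a 0-dim tensor.
train_loss += loss.item()
print(train_loss)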

Definition of Optimizer in train function

Hi,
First of all, thank you so much for sharing the code.

Maybe it is due to my lack of knowledge, but I wonder: is there any difference between declaring the optimizer every epoch and declaring it once at the start of training?
In the code, the optimizer is declared inside the train function!

Thank you!
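
For context, the repository builds a new optim.SGD inside train(), apparently so that it can pass the scheduled learning rate for that epoch; a side effect is that SGD's momentum buffers are reset every epoch. A common alternative keeps a single optimizer and only updates its learning rate, sketched below with assumed names (not the repository's code):

import torch.nn as nn
import torch.optim as optim

net = nn.Linear(10, 2)   # stand-in for the actual network
optimizer = optim.SGD(net.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=5e-4, nesterov=True)

def set_lr(optimizer, lr):
    # Update the learning rate in place so momentum buffers survive across epochs.
    for group in optimizer.param_groups:
        group['lr'] = lr

for epoch in range(1, 201):
    set_lr(optimizer, 0.1 * 0.2 ** sum(epoch > e for e in (60, 120, 160)))
    # ... run one training epoch with this optimizer ...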

Unable to reproduce the accuracy of WRN-28-10 on Cifar-100

I git-cloned the code and ran it with the command suggested by the README. However, the Top-1 accuracy stopped at 76% after 160 epochs. I've seen the learning curve in the paper and found that my model failed to reach 65% accuracy before 60 epochs; instead, it was around 6% lower. Could you please give some suggestions on debugging?

Wide-ResNet 28-10-0.3 for CIFAR-10 produced a low test accuracy

Dear author,

I downloaded your code and reproduced your experiment on CIFAR-10 according to your settings, but got 95.24% test accuracy with cross-entropy loss. I am not sure what I have missed; I look forward to your reply!

Experiment settings:
parser.add_argument('--lr', default=0.1, type=float, help='learning_rate')
parser.add_argument('--net_type', default='wide-resnet', type=str, help='model')
parser.add_argument('--depth', default=28, type=int, help='depth of model')
parser.add_argument('--widen_factor', default=10, type=int, help='width of model')
parser.add_argument('--dropout', default=0.3, type=float, help='dropout_rate')

start_epoch = 1
num_epochs = 200
batch_size = 128
optim_type = 'SGD'

mean = {
'cifar10': (0.4914, 0.4822, 0.4465),
'cifar100': (0.5071, 0.4867, 0.4408),
}

std = {
'cifar10': (0.2023, 0.1994, 0.2010),
'cifar100': (0.2675, 0.2565, 0.2761),
}

optimizer = optim.SGD(net.parameters(), lr=cf.learning_rate(args.lr, epoch), momentum=0.9, weight_decay=5e-4)

The result: (screenshot of the training output omitted)

Checkpoints

Could you please provide the checkpoints for the trained models? Thanks!

Var of CIFAR10 wrong

Mine:

import numpy as np
from torchvision import datasets, transforms

dataset = datasets.CIFAR10(root='./data', download=True, train=True, transform=transforms.ToTensor())
mean = dataset.data.astype(float).mean(axis=(0, 1, 2)) / 255
std = dataset.data.astype(float).std(axis=(0, 1, 2)) / 255
print("cifar10", np.round(mean, 4), np.round(std, 4))

--> cifar10 [0.4914 0.4822 0.4465] [0.247  0.2435 0.2616]

Yours:

mean = { 'cifar10': (0.4914, 0.4822, 0.4465) }
std = { 'cifar10': (0.2023, 0.1994, 0.2010) }
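
For reference, the recomputed statistics would typically be consumed in a transform like the one below (a sketch of standard torchvision usage; the repository hard-codes the tuples shown above in its config):

from torchvision import transforms

# Recomputed CIFAR-10 per-channel statistics from the snippet above.
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # typical CIFAR augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])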

These ResNets are not for CIFAR10

The ResNets from the authors of the original paper change their sizes when applied to CIFAR-10. They just don't match the dimensions you specified here:

ResNets for CIFAR-10 have 6n+2 layers, leading to the architectures
ResNet20, ResNet32, ResNet44, ResNet56, ResNet110, ResNet1202.
This is mixed up with the ResNets for the ImageNet dataset, but the models should not be the same.

Furthermore, in the paper they use neither BasicBlock nor BottleneckBlock. They just pad the input volume to match the dimensions before the summation every 2n layers, whenever downsampling is performed with a stride of 2. (A sketch of that padding shortcut follows below.)
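
The padding shortcut described above ("option A" in the ResNet paper) can be sketched roughly as follows (an illustration of the idea, not code from either repository):

import torch.nn as nn
import torch.nn.functional as F

class PadShortcut(nn.Module):
    # Parameter-free identity shortcut: subsample the identity with stride 2
    # and zero-pad the extra channels so the residual addition is shape-valid.
    def __init__(self, extra_channels):
        super().__init__()
        self.extra = extra_channels

    def forward(self, x):
        x = x[:, :, ::2, ::2]                          # stride-2 spatial subsampling
        return F.pad(x, (0, 0, 0, 0, 0, self.extra))   # zero-pad the channel dimension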

About dropout and shortcut in residual block

For dropout: I believe the dropout layer should come after the ReLU layer rather than after the Conv2d layer.
For shortcut: it seems a Conv2d layer is added when the stride is not 1, but why?
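
On the shortcut part of the question, the usual reason in ResNet-style code is shape matching for the residual addition; a minimal sketch with an assumed helper name (not the repository's exact code):

import torch.nn as nn

def make_shortcut(in_planes, planes, stride):
    # out + shortcut(x) requires matching shapes, so the identity path must be
    # projected whenever the block changes the channel count or downsamples.
    if stride != 1 or in_planes != planes:
        return nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride))
    return nn.Sequential()   # plain identity otherwise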

Support for imagenet?

Hi,
nice repo!
I noticed the code does not work on ImageNet. It is an input dimensionality problem. It would be very nice if you could fix it!
