
wide-resnet.pytorch's Introduction

Best CIFAR-10, CIFAR-100 results with wide-residual networks using PyTorch

PyTorch implementation of Sergey Zagoruyko's Wide Residual Networks

For Torch implementations, see here.

Requirements

See the installation instructions for a step-by-step installation guide, and the server instructions for server setup.

pip install http://download.pytorch.org/whl/cu80/torch-0.1.12.post2-cp27-none-linux_x86_64.whl
pip install torchvision
git clone https://github.com/meliketoy/wide-resnet.pytorch

How to run

After you have cloned the repository, you can train on either the CIFAR-10 or CIFAR-100 dataset by running the script below.

python main.py --lr 0.1 --resume false --net_type [lenet/vggnet/resnet/wide-resnet] --depth 28 --widen_factor 10 --dropout_rate 0.3 --dataset [cifar10/cifar100]

Implementation Details

| epoch     | learning rate | weight decay | optimizer | momentum | nesterov |
|-----------|---------------|--------------|-----------|----------|----------|
| 0 ~ 60    | 0.1           | 0.0005       | Momentum  | 0.9      | true     |
| 61 ~ 120  | 0.02          | 0.0005       | Momentum  | 0.9      | true     |
| 121 ~ 160 | 0.004         | 0.0005       | Momentum  | 0.9      | true     |
| 161 ~ 200 | 0.0008        | 0.0005       | Momentum  | 0.9      | true     |
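
The schedule above is a simple step decay: the learning rate is multiplied by 0.2 at epochs 60, 120, and 160. A minimal sketch of such a helper (the repository's config exposes an equivalent cf.learning_rate(init, epoch); the exact body below is assumed, not copied from the code):

def learning_rate(init, epoch):
    # Step decay matching the table above: multiply by 0.2 at epochs 60, 120 and 160.
    if epoch <= 60:
        return init               # 0.1
    elif epoch <= 120:
        return init * 0.2         # 0.02
    elif epoch <= 160:
        return init * 0.2 ** 2    # 0.004
    else:
        return init * 0.2 ** 3    # 0.0008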

CIFAR-10 Results


Below are the test set accuracy results for training on the CIFAR-10 dataset.

Accuracy is the average of 5 runs

| network           | dropout | preprocess | GPU:0 | GPU:1 | per epoch    | accuracy (%) |
|-------------------|---------|------------|-------|-------|--------------|--------------|
| wide-resnet 28x10 | 0       | ZCA        | 5.90G | -     | 2 min 03 sec | 95.83        |
| wide-resnet 28x10 | 0       | meanstd    | 5.90G | -     | 2 min 03 sec | 96.21        |
| wide-resnet 28x10 | 0.3     | meanstd    | 5.90G | -     | 2 min 03 sec | 96.27        |
| wide-resnet 28x20 | 0.3     | meanstd    | 8.13G | 6.93G | 4 min 10 sec | 96.55        |
| wide-resnet 40x10 | 0.3     | meanstd    | 8.08G | -     | 3 min 13 sec | 96.31        |
| wide-resnet 40x14 | 0.3     | meanstd    | 7.37G | 6.46G | 3 min 23 sec | 96.34        |

CIFAR-100 Results


Below are the test set accuracy results for training on the CIFAR-100 dataset.

Accuracy is the average of 5 runs

| network           | dropout | preprocess | GPU:0 | GPU:1 | per epoch    | Top1 acc (%) | Top5 acc (%) |
|-------------------|---------|------------|-------|-------|--------------|--------------|--------------|
| wide-resnet 28x10 | 0       | ZCA        | 5.90G | -     | 2 min 03 sec | 80.07        | 95.02        |
| wide-resnet 28x10 | 0       | meanstd    | 5.90G | -     | 2 min 03 sec | 81.02        | 95.41        |
| wide-resnet 28x10 | 0.3     | meanstd    | 5.90G | -     | 2 min 03 sec | 81.49        | 95.62        |
| wide-resnet 28x20 | 0.3     | meanstd    | 8.13G | 6.93G | 4 min 05 sec | 82.45        | 96.11        |
| wide-resnet 40x10 | 0.3     | meanstd    | 8.93G | -     | 3 min 06 sec | 81.42        | 95.63        |
| wide-resnet 40x14 | 0.3     | meanstd    | 7.39G | 6.46G | 3 min 23 sec | 81.87        | 95.51        |

wide-resnet.pytorch's People

Contributors

meliketoy


wide-resnet.pytorch's Issues

Activation functions in WRN?

Hello, I am somehow missing your activation functions: https://github.com/meliketoy/wide-resnet.pytorch/blob/master/networks/wide_resnet.py
When I load your model and print it, there aren't any. See: https://gist.github.com/jS5t3r/797bc39c9706a687eb05b027ae711d4c

I think it ought to look like this: https://gist.github.com/jS5t3r/fde796a3154c39f961f0d3686b88b722

Why aren't there any activation functions? Is that on purpose?

P.S.: This is the original repository: https://github.com/szagoruyko/wide-residual-networks/blob/master/pytorch/resnet.py
I haven't tried it out.
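
A likely explanation (not verified against the repository) is that the activations are applied functionally inside forward() rather than registered as submodules, so print(model) simply does not list them. A minimal illustration of that behaviour:

import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm2d(16)
        self.conv = nn.Conv2d(16, 16, 3, padding=1)

    def forward(self, x):
        # F.relu is a function, not a submodule, so it never appears when the
        # module is printed, even though it is applied here.
        return self.conv(F.relu(self.bn(x)))

print(Block())   # lists bn and conv only; no ReLU shows up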

Stride is Wrong

Following from the model graph for the wide-resnet with depth 28 and widen_factor = 10, layer2.0.conv2 and layer3.0.conv2 have stride=(2, 2). It should be layer2.0.conv1 and layer3.0.conv1 that have stride=(2, 2), while layer2.0.conv2 and layer3.0.conv2 should have stride=(1, 1). (A sketch of this proposed placement follows the model graph below.)

Here is the model graph:

Resnet(
  (model): Wide_ResNet(
    (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (layer1): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(16, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(16, 160, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (layer2): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(160, 320, kernel_size=(1, 1), stride=(2, 2))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (layer3): Sequential(
      (0): wide_basic(
        (bn1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (shortcut): Sequential(
          (0): Conv2d(320, 640, kernel_size=(1, 1), stride=(2, 2))
        )
      )
      (1): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (2): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
      (3): wide_basic(
        (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (dropout): Dropout(p=0, inplace=False)
        (bn2): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (shortcut): Sequential()
      )
    )
    (bn1): BatchNorm2d(640, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
    (relu1): ReLU()
    (linear): Linear(in_features=640, out_features=100, bias=True)
  )
)
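
As a sketch of the placement the issue proposes (the BN-ReLU-conv ordering and dropout position follow the printed graph above; this is an illustration, not the repository's code):

import torch.nn as nn
import torch.nn.functional as F

class wide_basic(nn.Module):
    # Hypothetical variant with the stride on conv1, as suggested in the issue.
    def __init__(self, in_planes, planes, dropout_rate, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1)        # stride moved here
        self.dropout = nn.Dropout(p=dropout_rate)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1)             # ... and removed here
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride))

    def forward(self, x):
        out = self.dropout(self.conv1(F.relu(self.bn1(x))))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)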

Wrong Implementation

@meliketoy

  1. BatchNorm2d
self.bn1 = nn.BatchNorm2d(nStages[3], momentum=0.9)

In the official PyTorch code, they use the default value of momentum, momentum=0.1.
https://github.com/szagoruyko/wide-residual-networks/blob/ae6d0d0561484172790c7a63c8ce6ade5a5a2914/pytorch/utils.py#L55
I think you have confused it with the momentum used in TensorFlow, which has a different meaning.
https://stackoverflow.com/questions/48345857/batchnorm-momentum-convention-pytorch

  2. AvgPool
out = F.avg_pool2d(out, 8)

In the official PyTorch code, they use F.avg_pool2d(out, 8, 1, 0), but you've used the default stride, which is the same as the kernel size (8).
https://github.com/szagoruyko/wide-residual-networks/blob/master/pytorch/resnet.py#L56
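
Two quick checks for the points above (verification sketches against stock PyTorch behaviour, not code from either repository): the first shows that PyTorch's momentum=0.1 moves the running statistics only 10% per batch (the TF-style "momentum=0.9" describes the same update), and the second compares the two pooling calls on the 8x8 feature maps produced for 32x32 CIFAR inputs.

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. BatchNorm momentum convention: PyTorch updates
#    running_stat = (1 - momentum) * running_stat + momentum * batch_stat,
#    so momentum=0.1 (the default) keeps 90% of the running statistic per step.
bn = nn.BatchNorm2d(640, momentum=0.1)
bn.train()
_ = bn(torch.randn(4, 640, 8, 8))
print(bn.running_mean[:3])   # has moved only 10% of the way toward the batch mean

# 2. Average pooling: compare the two argument forms on an 8x8 feature map.
out = torch.randn(2, 640, 8, 8)
a = F.avg_pool2d(out, 8)        # default stride = kernel size
b = F.avg_pool2d(out, 8, 1, 0)  # explicit stride=1, padding=0
print(a.shape, b.shape, torch.allclose(a, b))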

How to train WRN 34?

Do I need to set the widen factor to 1? or 0?

python main.py --lr 0.1 --resume false --net_type [lenet/vggnet/resnet/wide-resnet] --depth 34 --widen_factor 1 --dropout_rate 0.3 --dataset cifar10

IndexError

Hi,

After following the install instructions, when I try to run main.py on CIFAR-10, I get the following error on the first epoch:

Traceback (most recent call last):
  File "../main.py", line 220, in <module>
    train(epoch)
  File "../main.py", line 163, in train
    train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Any ideas?
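
A note in case it helps others: this error appears when running the 0.1.x-era code on PyTorch 0.4 or newer, where indexing a 0-dim tensor was removed. The usual fix is the one the error message suggests, sketched below with a stand-in loss value:

import torch

loss = torch.tensor(0.2337)   # stand-in for the criterion output
train_loss = 0.0

# PyTorch <= 0.3 style (as in the traceback): train_loss += loss.data[0]
# PyTorch >= 0.4: use .item() to extract the Python number from a 0-dim tensor.
train_loss += loss.item()
print(train_loss)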

Definition of Optimizer in train function

Hi,
First of all, thank you so much for sharing the code.

Maybe it is due to my lack of knowledge, but I wonder: is there any difference between declaring the optimizer every epoch and declaring it once at the start of training?
In the code, the optimizer is declared inside the train function!

Thank you!
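
For context, the repository builds a new optim.SGD inside train(), apparently so that it can pass the scheduled learning rate for that epoch; a side effect is that SGD's momentum buffers are reset every epoch. A common alternative keeps a single optimizer and only updates its learning rate, sketched below with assumed names (not the repository's code):

import torch.nn as nn
import torch.optim as optim

net = nn.Linear(10, 2)   # stand-in for the actual network
optimizer = optim.SGD(net.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=5e-4, nesterov=True)

def set_lr(optimizer, lr):
    # Update the learning rate in place so momentum buffers survive across epochs.
    for group in optimizer.param_groups:
        group['lr'] = lr

for epoch in range(1, 201):
    set_lr(optimizer, 0.1 * 0.2 ** sum(epoch > e for e in (60, 120, 160)))
    # ... run one training epoch with this optimizer ...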

Unable to reproduce the accuracy of WRN-28-10 on Cifar-100

I git-cloned the code and ran it with the command suggested by the README. However, the Top-1 accuracy stopped at 76% after 160 epochs. I've seen the learning curve in the paper and found that my model failed to reach 65% accuracy before 60 epochs; instead, it was around 6% lower. Could you please give some suggestions on debugging?

Wide-ResNet 28-10-0.3 for CIFAR-10 produced a low test accuracy

Dear author,

I downloaded your code and reproduced your experiment on CIFAR-10 according to your settings, but got 95.24% test accuracy with cross-entropy loss. I am not sure what I have missed; I look forward to your reply!

Experiment settings:
parser.add_argument('--lr', default=0.1, type=float, help='learning_rate')
parser.add_argument('--net_type', default='wide-resnet', type=str, help='model')
parser.add_argument('--depth', default=28, type=int, help='depth of model')
parser.add_argument('--widen_factor', default=10, type=int, help='width of model')
parser.add_argument('--dropout', default=0.3, type=float, help='dropout_rate')

start_epoch = 1
num_epochs = 200
batch_size = 128
optim_type = 'SGD'

mean = {
'cifar10': (0.4914, 0.4822, 0.4465),
'cifar100': (0.5071, 0.4867, 0.4408),
}

std = {
'cifar10': (0.2023, 0.1994, 0.2010),
'cifar100': (0.2675, 0.2565, 0.2761),
}

optimizer = optim.SGD(net.parameters(), lr=cf.learning_rate(args.lr, epoch), momentum=0.9, weight_decay=5e-4)

The result: (screenshot of the training output omitted)

Checkpoints

Could you please provide the checkpoints for the trained models? Thanks!

Var of CIFAR10 wrong

Mine:

import numpy as np
from torchvision import datasets, transforms

dataset = datasets.CIFAR10(root='./data', download=True, train=True, transform=transforms.ToTensor())
mean = dataset.data.astype(float).mean(axis=(0, 1, 2)) / 255
std = dataset.data.astype(float).std(axis=(0, 1, 2)) / 255
print("cifar10", np.round(mean, 4), np.round(std, 4))

--> cifar10 [0.4914 0.4822 0.4465] [0.247  0.2435 0.2616]

Yours:

mean = { 'cifar10': (0.4914, 0.4822, 0.4465) }
std = { 'cifar10': (0.2023, 0.1994, 0.2010) }
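
For reference, the recomputed statistics would typically be consumed in a transform like the one below (a sketch of standard torchvision usage; the repository hard-codes the tuples shown above in its config):

from torchvision import transforms

# Recomputed CIFAR-10 per-channel statistics from the snippet above.
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # typical CIFAR augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])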

These ResNets are not for CIFAR10

The ResNets from the authors of the original paper change their sizes when applied to CIFAR-10. They just don't match the dimensions you specified here:

ResNets for CIFAR-10 have 6n+2 layers, leading to the architectures
ResNet20, ResNet32, ResNet44, ResNet56, ResNet110, ResNet1202.
This is mixed up with the ResNets for the ImageNet dataset, but the models should not be the same.

Furthermore, in the paper they use neither BasicBlock nor BottleneckBlock. They just pad the input volume to match the dimensions before the summation every 2n layers, whenever downsampling is performed with a stride of 2. (A sketch of that padding shortcut follows below.)
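
The padding shortcut described above ("option A" in the ResNet paper) can be sketched roughly as follows (an illustration of the idea, not code from either repository):

import torch.nn as nn
import torch.nn.functional as F

class PadShortcut(nn.Module):
    # Parameter-free identity shortcut: subsample the identity with stride 2
    # and zero-pad the extra channels so the residual addition is shape-valid.
    def __init__(self, extra_channels):
        super().__init__()
        self.extra = extra_channels

    def forward(self, x):
        x = x[:, :, ::2, ::2]                          # stride-2 spatial subsampling
        return F.pad(x, (0, 0, 0, 0, 0, self.extra))   # zero-pad the channel dimension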

About dropout and shortcut in residual block

For dropout: I believe the dropout layer should come after the ReLU layer rather than after the Conv2d layer.
For shortcut: it seems a Conv2d layer is added when the stride is not 1, but why?
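
On the shortcut part of the question, the usual reason in ResNet-style code is shape matching for the residual addition; a minimal sketch with an assumed helper name (not the repository's exact code):

import torch.nn as nn

def make_shortcut(in_planes, planes, stride):
    # out + shortcut(x) requires matching shapes, so the identity path must be
    # projected whenever the block changes the channel count or downsamples.
    if stride != 1 or in_planes != planes:
        return nn.Sequential(nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride))
    return nn.Sequential()   # plain identity otherwise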

Support for imagenet?

Hi,
nice repo!
I noticed the code does not work on ImageNet. It is an input dimensionality problem. It would be very nice if you could fix it!
