
wide-residual-networks's Introduction

Wide Residual Networks in Keras

Keras implementation of the architecture from the paper Wide Residual Networks by Zagoruyko and Komodakis.

Usage

It can be used by importing the wide_residual_network script and calling the create_wide_residual_network() function. Several parameters can be changed to increase the depth or width of the network.

Note that the number of layers is given by the formula: nb_layers = 4 + 6 * N
Therefore, N can be computed as: N = (nb_layers - 4) / 6

import wide_residual_network as wrn
from keras.layers import Input
from keras.models import Model

ip = Input(shape=(3, 32, 32))  # For CIFAR-10 (channels-first ordering)

# N = 4, k = 10 gives WRN-28-10 (depth = 6 * 4 + 4 = 28)
wrn_28_10 = wrn.create_wide_residual_network(ip, nb_classes=10, N=4, k=10, dropout=0.0, verbose=1)

model = Model(ip, wrn_28_10)

Testing

WRN-16-8

The WRN-16-8 model has been tested on the CIFAR-10 dataset. It achieves an accuracy of 93.68% after 100 epochs. This is not as high as the accuracy reported in the paper (95.19%), but the score may improve with further training.

Training used the Adam optimizer instead of SGD with momentum for faster convergence. The training/validation accuracy and loss history for the first 30 epochs is unavailable because the log files were overwritten; the history of the last 70 epochs is shown in the figure below. The script and weights for this model are also provided.

WRN-28-8

The WRN-28-10 model could not be used due to GPU memory constraints, so a WRN-28-8 model was used instead with a batch size of 64. Each epoch takes roughly 886 seconds, so the model was run for only 100 epochs. It achieves an accuracy of 95.08%, below the best score of 95.83% reported for the WRN-28-10 network.

The Adadelta optimizer was used instead of SGD with momentum for faster convergence. The training/validation accuracy and loss history is shown in the figure below. The script and weights for this model are also provided.

Models

The model shown below is the WRN-28-8 model.

wide-residual-networks's People

Contributors

ashish-saha, tayden, titu1994, yu4u


wide-residual-networks's Issues

wrn_28_8_tf_kernels_tf_dim_ordering_no_top.h5 requires object "flatten_2" when used without including top

When I try to initialize a network using the CIFAR-10 weights wrn_28_8_tf_kernels_tf_dim_ordering_no_top.h5 on a model that does not include the top, the load_weights method fails with the error:

File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/home/ilan/minonda/conda-bld/h5py_1490028461908/work/h5py/h5o.c:3740) KeyError: "Unable to open object (Object 'flatten_2' doesn't exist)"

Since the only Flatten layer in the model appears when you include the top (i.e. when including the following two layers), this seems like an error in the .h5 file.

x = Flatten()(x)
x = Dense(nb_classes, activation='softmax')(x)
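A possible workaround, assuming the remaining layer names in the file match the model, is to load the weights by name so that save-file entries without a counterpart are skipped (a sketch, not a confirmed fix):

# Hedged workaround: by_name=True matches layers by name and skips save-file
# entries (such as 'flatten_2') that have no counterpart in the model.
model.load_weights("wrn_28_8_tf_kernels_tf_dim_ordering_no_top.h5", by_name=True)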

Missing weight decay in model

The model is missing the weight decay of 0.0005 described in the paper. This could be the cause of the lower-than-expected model accuracy.
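For reference, a minimal sketch of how the paper's 0.0005 weight decay could be added in Keras 2 (the layer shown is illustrative, not the repository's exact code):

from keras.layers import Input, Conv2D
from keras.regularizers import l2

ip = Input(shape=(32, 32, 3))
# L2 kernel regularization of 5e-4 on a convolution; the same argument would
# be added to every Conv2D (and the final Dense layer) in the network.
x = Conv2D(16, (3, 3), padding='same', kernel_regularizer=l2(5e-4))(ip)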

Incorrect order: conv-BN-ReLU instead of BN-ReLU-conv

As described in the original paper and as can be seen in the original implementation, the authors suggest that the internal structure of the blocks should be BatchNormalization-ReLU-Convolution instead of Convolution-BatchNormalization-ReLU.

Is there any particular reason why the order was changed? If not, I would propose changing it to match the architecture proposed by the original authors.
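For comparison, a minimal sketch of a basic block in the BN-ReLU-conv order the paper describes (layer names and widths here are illustrative, not the repository's code):

from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add

def preact_block(x, filters):
    # BN-ReLU-conv ordering, per the paper and the authors' Torch code.
    # Assumes x already has `filters` channels so the identity shortcut works.
    y = BatchNormalization()(x)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    return Add()([x, y])

ip = Input(shape=(32, 32, 16))
out = preact_block(ip, 16)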

"Install" Models and Transfer Learning

  1. The models cannot be installed, since there is no installer, and cannot be properly imported, since they are not packaged as a class.
    How then does the example say import wide_residual_net as wrn and ...wrn.create_wide_residual_network...? This always produces an (expected) error, even after I save the file to the working directory. The same goes for DenseNet and the other models. I really want to use these models simply through import statements, but it's not possible, presumably due to my misunderstanding.

  2. Is there a way, once the model is created, to remove the classification layers so that transfer learning is easier? Or, more generally, to get intermediate output from hidden layers with model.predict (or a different method)? (See the sketch below.)
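On point 2, the usual Keras pattern is to build a second Model that ends at an intermediate layer. A sketch, where 'flatten_1' is a hypothetical layer name to be read off model.summary(), and x_batch is any preprocessed input batch:

from keras.models import Model

# Feature extractor that stops at an intermediate layer; replace 'flatten_1'
# with the actual name of the layer whose output you want (see model.summary()).
feature_extractor = Model(inputs=model.input, outputs=model.get_layer('flatten_1').output)
features = feature_extractor.predict(x_batch)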

How to achieve the desired validation accuracy?

I was using the default settings (with the TensorFlow backend), but my test accuracies are worse:

For WRN-16-8, set N = 2, k = 8:
Epoch 82/100 - 184s - loss: 0.0711 - acc: 0.9768 - val_loss: 0.4480 - val_acc: 0.9187

For WRN-28-10, set N = 4, k = 10:
Epoch 119/200 - 378s - loss: 0.0260 - acc: 0.9915 - val_loss: 0.4396 - val_acc: 0.9213

There seems to be severe overfitting (each model contains more than 10 million trainable parameters).
What can I do to achieve the desired accuracy?
Do I need to change the preprocessing settings, adjust the learning rate, or reduce the number of filters per layer?
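For reference, a sketch of the training recipe described in the WRN paper (SGD with Nesterov momentum, step-wise learning-rate drops, 4-pixel shifts and horizontal flips); the model and the preprocessed CIFAR-10 arrays (x_train, y_train, x_test, y_test) are assumed to exist:

from keras.callbacks import LearningRateScheduler
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator

def schedule(epoch):
    # Paper schedule: start at 0.1, multiply by 0.2 at epochs 60, 120, 160.
    lr = 0.1
    for milestone in (60, 120, 160):
        if epoch >= milestone:
            lr *= 0.2
    return lr

model.compile(optimizer=SGD(lr=0.1, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy', metrics=['accuracy'])

# Standard CIFAR augmentation: 4-pixel shifts and horizontal flips.
datagen = ImageDataGenerator(width_shift_range=4. / 32,
                             height_shift_range=4. / 32,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128, epochs=200,
                    validation_data=(x_test, y_test),
                    callbacks=[LearningRateScheduler(schedule)])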

issue with loading weights for WRN-28-8

This is a very useful repository - thank you for contributing.

I am trying to run the code for WRN-28-8 on Python 3 using Keras 2.1.1 and TensorFlow backend in a jupyter notebook.

When I get to loading weights:

model.load_weights("weights/WRN-28-8 Weights.h5")
print("Model loaded.")

It throws an error (the full error message is below my signature):

ValueError: Layer #0 (named "conv2d_1" in the current model) was found to correspond to layer convolution2d_1 in the save file. However the new layer conv2d_1 expects 1 weights, but the saved weights have 2 elements.

Do you know what's going on here? To me this looks like a version issue, but your weights seem to be meant for Keras 2. What about the backend and Python version?

Thanks,
Astha

Finished compiling
Allocating GPU memory
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-1a29b71a7c3c> in <module>()
      3 print("Allocating GPU memory")
      4 
----> 5 model.load_weights("weights/WRN-28-8 Weights.h5")
      6 print("Model loaded.")

~\Anaconda2\envs\tensorflow\lib\site-packages\keras\engine\topology.py in load_weights(self, filepath, by_name)
   2620             load_weights_from_hdf5_group_by_name(f, self.layers)
   2621         else:
-> 2622             load_weights_from_hdf5_group(f, self.layers)
   2623 
   2624         if hasattr(f, 'close'):

~\Anaconda2\envs\tensorflow\lib\site-packages\keras\engine\topology.py in load_weights_from_hdf5_group(f, layers)
   3138                              ' weights, but the saved weights have ' +
   3139                              str(len(weight_values)) +
-> 3140                              ' elements.')
   3141         weight_value_tuples += zip(symbolic_weights, weight_values)
   3142     K.batch_set_value(weight_value_tuples)

ValueError: Layer #0 (named "conv2d_1" in the current model) was found to correspond to layer convolution2d_1 in the save file. However the new layer conv2d_1 expects 1 weights, but the saved weights have 2 elements.
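One way to diagnose such mismatches is to list the layer names stored in the weight file and compare them with the model's layer names (a sketch relying on the layer_names attribute Keras writes into its HDF5 weight files):

import h5py

# Print the layer names recorded in the save file so they can be compared
# against the names in the freshly built model (e.g. convolution2d_1 vs
# conv2d_1, which differ between Keras 1 and Keras 2).
with h5py.File("weights/WRN-28-8 Weights.h5", "r") as f:
    for name in f.attrs["layer_names"]:
        print(name.decode("utf8") if isinstance(name, bytes) else name)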

Pretrained model with ImageNet dataset

It would be great if a model pretrained on the ImageNet dataset could be provided. Wide residual networks can be quite good at feature extraction, and the pretrained weights would help with transfer learning. :)

Exception while running the WRN_28_8 pretrained model.

Hi @titu1994
I got the following exception while trying to run the cifar10_wrn_28_8.py file:

Traceback (most recent call last):

  File "<ipython-input-4-a18c4043abdc>", line 1, in <module>
    runfile('C:/Users/engadmin/.spyder-py3/test_WRN28.py', wdir='C:/Users/engadmin/.spyder-py3')

  File "c:\users\engadmin\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "c:\users\engadmin\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/engadmin/.spyder-py3/test_WRN28.py", line 49, in <module>
    model.load_weights("WRN-28-8 Weights.h5")

  File "c:\users\engadmin\anaconda3\lib\site-packages\keras\engine\topology.py", line 2619, in load_weights
    load_weights_from_hdf5_group(f, self.layers)

  File "c:\users\engadmin\anaconda3\lib\site-packages\keras\engine\topology.py", line 3093, in load_weights_from_hdf5_group
    ' elements.')

ValueError: Layer #0 (named "conv2d_1" in the current model) was found to correspond to layer convolution2d_1 in the save file. However the new layer conv2d_1 expects 1 weights, but the saved weights have 2 elements.

both new and old weights don't load

Using Keras 2.2.0. I tried both the new and old weights in the main 28-8 script, and neither loads correctly.

ValueError: Layer #0 (named "conv2d_1" in the current model) was found to correspond to layer convolution2d_29 in the save file. However the new layer conv2d_1 expects 1 weights, but the saved weights have 2 elements.

The keras contrib version of this code does load the new weights, but the accuracy on the test set is around 30%. Am I missing something?

Maxpooling instead of conv(stride=2) in the WRN paper?

First of all, thanks a lot for implementing the Wide-ResNet in Keras!
However, I have a question regarding the Max Pooling layer that is used to reduce dimensionality between the ResNet blocks.

Is this really used in the WRN paper? I couldn't find any specific information in the paper, but according to their Torch implementation (https://github.com/szagoruyko/wide-residual-networks/blob/master/models/wide-resnet.lua), they use a stride of 2 at the beginning of each ResNet block type to reduce dimensionality, not a Max Pooling layer, which makes it similar to the original ResNet paper.

Or am I missing something (maybe it's also due to my poor Torch knowledge while reading their code)?
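For illustration, a sketch of downsampling by strided convolution, as in the authors' Torch code, instead of a MaxPooling2D layer (shapes and widths here are hypothetical):

from keras.layers import Input, Conv2D, Add

ip = Input(shape=(32, 32, 16))
# The first 3x3 conv of the group downsamples with stride 2...
y = Conv2D(32, (3, 3), strides=(2, 2), padding='same')(ip)
# ...and the shortcut is downsampled to match with a strided 1x1 conv.
shortcut = Conv2D(32, (1, 1), strides=(2, 2), padding='same')(ip)
out = Add()([y, shortcut])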

Batchnorm momentum

According to the definition of momentum in the Keras docs, the momentum of the BN layers should be 0.9, which is not the default momentum value of PyTorch's BN layer (0.1).
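If that reading of the docs is right, the fix would be a one-argument change (note that Keras's momentum is the moving-average decay, roughly 1 minus PyTorch's value; Keras's own default is 0.99):

from keras.layers import BatchNormalization

# momentum=0.9 in Keras corresponds to PyTorch's default momentum of 0.1.
bn = BatchNormalization(momentum=0.9)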

About the number of convs

In the function create_wide_residual_network,

    x = initial_conv(ip)
    nb_conv = 4

After the initial_conv, why does nb_conv equal 4 rather than 1?
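One plausible reading (an assumption based on the nb_layers = 4 + 6 * N formula above, not a statement from the author) is that the initial count of 4 pre-counts the layers that sit outside the residual-block loops:

# Hypothetical layer accounting for nb_layers = 6 * N + 4:
#   1   initial 3x3 convolution
#   3   1x1 projection convolutions (one per group where the width changes)
#   6N  convolutions inside the residual blocks (3 groups x N blocks x 2 convs)
N = 4
nb_layers = 1 + 3 + 6 * N  # = 28 for WRN-28-k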

Why subtract 4 from the number of layers?

Example: for a depth of 16, n = 16, so N = (16 - 4) / 6 = 2

I had another concern. In the original paper they mention that n is the number of convolutional layers, but I couldn't understand why we subtract 4 in the first place. Are we treating n as the total number of layers rather than just the convolutional ones, i.e., are we counting the batch norm and ReLU layers?

Thank you!!

N value not working properly

Hey, I've just found that if I set N to 2, the printed message says WRN-10-4, which corresponds to N = 1. I looked at the code and I think there's an issue on lines 123, 132, and 141, in the for i in range(N-1): loops.
