
keras-fractalnet's Introduction

FractalNet implementation in Keras

Information

I built the network as described in the paper, but the fractals are expanded iteratively instead of functionally (recursively) to avoid the extra complexity of merging the fractals.
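As a loose illustration of what iterative expansion can look like (not the repo's actual code; `fractal_block`, `conv`, and `join` are placeholder names), a minimal NumPy sketch:

```python
import numpy as np

def fractal_block(z, c, conv, join):
    """Iteratively expand a fractal block with c columns.

    Column 0 is the deepest (2**(c-1) convs); column i applies a conv at
    every row divisible by 2**i. Rows where several columns are active
    are merged by `join`, so no recursive definition is needed.
    """
    columns = [z] * c                      # running tensor of each column
    for row in range(1, 2 ** (c - 1) + 1):
        active = [i for i in range(c) if row % 2 ** i == 0]
        for i in active:
            columns[i] = conv(columns[i])
        if len(active) > 1:                # a Join layer sits here
            merged = join([columns[i] for i in active])
            for i in active:
                columns[i] = merged
    return columns[0]

# Toy usage: "conv" doubles the input, "join" averages the paths.
out = fractal_block(np.ones(4), c=3,
                    conv=lambda x: 2 * x,
                    join=lambda xs: np.mean(xs, axis=0))
```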

The Join layers share an indicator sampled from a binomial distribution that decides whether global or local drop-path is used.

When local drop-path is used, each Join layer samples its own paths. When global drop-path is used, all the Join layers share the same randomly sampled tensor, so a single column is selected globally.
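A minimal NumPy sketch of this sampling scheme (the 50% global probability and 15% local drop rate follow the paper; the function and parameter names are illustrative, not the repo's API):

```python
import numpy as np

def sample_drop_path(n_joins, n_paths, p_global=0.5, p_drop=0.15):
    """Sample drop-path masks for every Join layer of one batch.

    Returns an (n_joins, n_paths) 0/1 mask. With probability p_global a
    single column is chosen and shared by all joins (global drop-path);
    otherwise each join drops paths independently, re-sampling until at
    least one path survives (local drop-path).
    """
    if np.random.rand() < p_global:                 # shared indicator
        mask = np.zeros((n_joins, n_paths))
        mask[:, np.random.randint(n_paths)] = 1.0   # one column everywhere
        return mask
    masks = []
    for _ in range(n_joins):                        # each join on its own
        m = (np.random.rand(n_paths) >= p_drop).astype(float)
        while m.sum() == 0:                         # keep >= 1 path alive
            m = (np.random.rand(n_paths) >= p_drop).astype(float)
        masks.append(m)
    return np.stack(masks)
```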

Notes

In the paper, they state that the last Join layer of each block is swapped with the MaxPooling layer for convenience. I don't do this swap and instead end each block with Join->MaxPooling, which should not affect the model.

It's also not clear how and where the Dropout should be used. I found an implementation of the network here by Larsson (one of the paper's authors) in which it is added to each convolutional block (Convolution->Dropout->BatchNorm->ReLU), so I implemented it the same way.
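A minimal sketch of that block order in current Keras (the filter count, kernel size, and dropout rate are placeholders, not the repo's settings):

```python
from keras import layers

def conv_block(x, filters, dropout=0.1):
    # Convolution -> Dropout -> BatchNorm -> ReLU, in the order
    # used by Larsson's reference implementation described above.
    x = layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = layers.Dropout(dropout)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    return x
```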

For testing the deepest column, the network is built with all the columns, but the global drop-path indicator is always set and the path tensor is fixed to a constant array that enables only the desired column.
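In other words, evaluation fixes the two drop-path inputs to constants, roughly like this sketch (variable names are illustrative):

```python
import numpy as np

c = 3                         # number of columns in the block
global_switch = True          # always take the global drop-path branch
deepest_path = np.zeros(c, dtype='float32')
deepest_path[0] = 1.0         # enable only the deepest column
```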

Model

Model graph image of FractalNet(c=3, b=5) generated by Keras: link

Experiments

These results are from experiments with the code published here. The authors of the paper had not released a complete implementation of the network as of this writing, so I can't say how this code differs from theirs. Also, these raw tests apply no standardization, scaling, or normalization across the dataset (which they may have used).

So far the results are promising compared against Residual Networks, but I couldn't reproduce their deepest-column experiment.

The code here might have bugs too; if you find anything, write me or submit a PR and I will rerun the tests.

Test error (%)

| Method                                                      | C10   | C100  |
|-------------------------------------------------------------|-------|-------|
| ResNet (reported by [1])                                    | 13.63 | 44.76 |
| ResNet Stochastic Depth (reported by [1])                   | 11.66 | 37.80 |
| FractalNet (paper, w/SGD)                                   | 10.18 | 35.34 |
| FractalNet+dropout/drop-path (paper, w/SGD)                 | 7.33  | 28.20 |
| FractalNet+dropout/drop-path (this, w/SGD)                  | 8.76  | 31.10 |
| FractalNet+dropout/drop-path (this, w/Adam)                 | 8.33  | 31.30 |
| FractalNet+dropout/drop-path/deepest-column (paper, w/SGD)  | 7.27  | 29.05 |
| FractalNet+dropout/drop-path/deepest-column (this, w/SGD)   | 12.53 | 43.07 |
| FractalNet+dropout/drop-path/deepest-column (this, w/Adam)  | 12.28 | 41.32 |

[1] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Weinberger. Deep networks with stochastic depth. arXiv preprint arXiv:1603.09382, 2016.

CIFAR-10

Training follows the paper: SGD for 400 epochs, starting with a learning rate of 0.02 and dividing it by 10 each time half of the remaining epochs have elapsed (epochs 200, 300, 350, 375). Training with Adam uses default parameters.
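That schedule could be expressed with a standard Keras callback, e.g. (a sketch, not the repo's actual training script):

```python
from keras.callbacks import LearningRateScheduler

def schedule(epoch):
    # Start at 0.02 and divide by 10 at epochs 200, 300, 350 and 375,
    # i.e. each time half of the remaining epochs have elapsed.
    lr = 0.02
    for milestone in (200, 300, 350, 375):
        if epoch >= milestone:
            lr /= 10.0
    return lr

lr_callback = LearningRateScheduler(schedule)
# model.fit(..., epochs=400, callbacks=[lr_callback])
```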

CIFAR-100

Trained with SGD (as with CIFAR-10) and with Adam using default parameters.

Paper

arXiv: FractalNet: Ultra-Deep Neural Networks without Residuals

@article{larsson2016fractalnet,
  title={FractalNet: Ultra-Deep Neural Networks without Residuals},
  author={Larsson, Gustav and Maire, Michael and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1605.07648},
  year={2016}
}

keras-fractalnet's People

Contributors

aicentral, snf


keras-fractalnet's Issues

Loss is consistently NaN

I am trying to run the CIFAR-10 example using the default setup/values, with the only exception that I use verbose=1 when fitting. The printed loss is always NaN and never seems to budge. There is minor fluctuation in the accuracy but no improvement. I have tried this with the Adam and SGD optimizers with the same results.

I'm able to reproduce this on GPU and CPU. I'm using bleeding-edge Theano and the current Keras (as of writing).

Help Understanding Code and Appropriate Dependencies

Hello,

How many times is the fractal net built in this code? From the paper, it seems that it should be built numerous times; is that how this code works?

Also, what exactly are the dependencies for this code? I'm having issues getting it to run with the versions of Keras and TensorFlow I currently have installed. I've also been trying to use Python 3, but I'm not sure whether the code is meant for Python 3.

Advice for solving the problem in #3

I ran the code with the latest TensorFlow and Python 3, modifying only a few lines. However, I encountered the same problem mentioned in #3, and trying the direction described in #2 had no effect.

Afterwards I carefully checked the code and found that when the JoinLayer is added, the parameters 'global_switch' and 'global_path' are set to fixed values. So although the model owns five JoinLayers, each of them holds its own fixed global_switch and global_path. Once these JoinLayers are added to the TF graph, they can only use those fixed values, never fresh ones, so global drop-path takes no effect, and neither does the K.switch in _drop_path(self, inputs). I suggest initializing 'global_switch' and 'global_path' inside the JoinLayer so that they can take effect. @snf

Drop-path implementation

I have taken a deeper look into your FractalNet implementation, but there is one thing I don't understand. Your global_path_arr is initialized with a (seeded) NumPy random array. This means it is generated once at the start of training and then never changed.

The fact that you seed np.random at the beginning means that even if the random function were called over and over, it would always produce the same results.

I tried to implement FractalNet on CIFAR from scratch using your implementation as a reference, and I ran into the same problem: how to generate one path configuration for drop-path per batch.

Could you clarify if this is a bug or if I missed something entirely?
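One possible direction, sketched below under the assumption that the path configuration lives in a shared backend variable read by every JoinLayer (`path_var` and `DropPathResampler` are hypothetical names, not part of this repo): resample the variable at the start of every batch with a callback.

```python
import numpy as np
from keras import backend as K
from keras.callbacks import Callback

class DropPathResampler(Callback):
    """Resample a shared global drop-path configuration once per batch.

    `path_var` is assumed to be the K.variable from which every
    JoinLayer reads its global path mask.
    """
    def __init__(self, path_var, n_paths):
        super(DropPathResampler, self).__init__()
        self.path_var = path_var
        self.n_paths = n_paths

    def on_batch_begin(self, batch, logs=None):
        mask = np.zeros(self.n_paths, dtype='float32')
        mask[np.random.randint(self.n_paths)] = 1.0   # pick one column
        K.set_value(self.path_var, mask)

# Usage sketch:
# path_var = K.variable(np.ones(3, dtype='float32'))
# model.fit(..., callbacks=[DropPathResampler(path_var, n_paths=3)])
```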
