Batch Renormalization

Keras 2.0+ implementation of the Batch Renormalization algorithm from the paper by Sergey Ioffe, Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models.

Usage

Add the batch_renorm.py script to your repository and import the BatchRenormalization layer.

You can then use it as a drop-in replacement for Keras BatchNormalization layers:

from batch_renorm import BatchRenormalization
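
For example, a minimal sketch (the surrounding model below is hypothetical; only batch_renorm and the BatchRenormalization layer come from this repository):

from keras.models import Sequential
from keras.layers import Dense, Activation
from batch_renorm import BatchRenormalization

# Hypothetical small model: BatchRenormalization is placed exactly where
# a BatchNormalization layer would normally go.
model = Sequential()
model.add(Dense(64, input_shape=(784,)))
model.add(BatchRenormalization())  # drop-in replacement for BatchNormalization()
model.add(Activation('relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])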

Performance

A BatchRenormalization layer is slightly slower than the simpler BatchNormalization layer.

Observed epoch times for WRN-16-4 on a 980M GPU:

  1. Batch Normalization : 137 seconds per epoch.

  2. Batch Renormalization (Mode 0) : 152 seconds per epoch.

  3. Batch Renormalization (Mode 2) : 142 seconds per epoch.

Results

The following graph is from training a Wide Residual Network (WRN-16-4) on the CIFAR-10 dataset, with no data augmentation and no dropout. As a result, all models clearly overfit.

The graphs compare the WRN-16-4 model using Keras BatchNormalization (mode 0) against the same model using BatchRenormalization (mode 0 and mode 2), with all other parameters kept constant.

Training curve

Parameters

The BatchRenormalization layer accepts several parameters in addition to those of the BatchNormalization layer.

r_max_value: The maximum value that the internal parameter 'r' can take. After a sufficient number of iterations,
             r is clipped to the range (1 / r_max_value, r_max_value).
             The paper suggests a default value of 3.

d_max_value: The maximum value that the internal parameter 'd' can take. After a sufficient number of iterations,
             d is clipped to the range (-d_max_value, d_max_value).
             The paper suggests a default value of 5.

t_delta:     Determines how many iterations it takes for the internal r_max and d_max values to reach
             r_max_value and d_max_value.

             The default setting is 1, which means the internal parameters reach their maximum values
             within 5 iterations.

             Values larger than 1 can cause gradient explosion and prevent the model from learning anything useful.

             Very small values lead to slower learning, but eventually give the same result as t_delta = 1.

             Suggested t_delta values: 1 to 1e-3.
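
For intuition, here is a minimal sketch of the quantities these parameters control, following the formulas in the paper. The ramp schedule shown is an assumption consistent with the description above; the layer's internal implementation may differ.

import numpy as np

# From the paper: r and d compare minibatch statistics to the moving statistics.
#   r = sigma_batch / sigma_moving,            clipped to [1 / r_max, r_max]
#   d = (mu_batch - mu_moving) / sigma_moving, clipped to [-d_max, d_max]
def renorm_correction(mu_batch, sigma_batch, mu_moving, sigma_moving, r_max, d_max):
    r = np.clip(sigma_batch / sigma_moving, 1.0 / r_max, r_max)
    d = np.clip((mu_batch - mu_moving) / sigma_moving, -d_max, d_max)
    return r, d

# Assumed ramp schedule: r_max starts at 1, d_max starts at 0, and both grow by
# t_delta per iteration until they reach r_max_value and d_max_value
# (with t_delta = 1, both reach their maxima within 5 iterations).
def ramp(iteration, t_delta=1.0, r_max_value=3.0, d_max_value=5.0):
    r_max = min(1.0 + iteration * t_delta, r_max_value)
    d_max = min(iteration * t_delta, d_max_value)
    return r_max, d_max

These three values are passed to the BatchRenormalization constructor alongside the usual BatchNormalization arguments.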

Requirements

Keras 2.0+

Theano / Tensorflow

h5py

seaborn (optional, for plotting training graph)

