
snns's Introduction

Self-Normalizing Networks

Tutorials and implementations for "Self-Normalizing Neural Networks" (SNNs) as proposed by Klambauer et al. (arXiv pre-print).

Versions

  • See the environment file for the full list of prerequisites. The tutorial implementations use Tensorflow 2.x (Keras) or Pytorch; versions for Tensorflow 1.x users, based on the deprecated tf.contrib module (with a separate environment file), are also available.

Note for Tensorflow >= 1.4 users

Tensorflow >= 1.4 already provides the functions tf.nn.selu and tf.contrib.nn.alpha_dropout, which implement the SELU activation function and the suggested dropout variant.
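In Tensorflow 1.x this can be used roughly as follows (a minimal sketch, not the repository's exact code; the layer sizes and keep_prob are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
keep_prob = tf.placeholder(tf.float32)

# LeCun-normal weight initialisation (the same initializer selu.py uses)
initializer = tf.contrib.layers.variance_scaling_initializer(factor=1.0, mode='FAN_IN')
w = tf.get_variable("w", [784, 256], initializer=initializer)
b = tf.get_variable("b", [256], initializer=tf.zeros_initializer())

h = tf.nn.selu(tf.matmul(x, w) + b)            # SELU activation
h = tf.contrib.nn.alpha_dropout(h, keep_prob)  # variance-preserving dropout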

Note for Tensorflow >= 2.0 users

As of Tensorflow 2.3, the high-level Keras API provides the SELU activation function as tf.keras.activations.selu. It must be combined with the initializer tf.keras.initializers.LecunNormal; the corresponding dropout variant is tf.keras.layers.AlphaDropout.
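A minimal Keras sketch (the layer sizes and dropout rate are illustrative assumptions, not values from the tutorials):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="selu",
                          kernel_initializer=tf.keras.initializers.LecunNormal()),
    tf.keras.layers.AlphaDropout(0.05),   # dropout variant that preserves mean and variance
    tf.keras.layers.Dense(10, activation="softmax"),
])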

Note for Pytorch users

Pytorch versions >= 0.2 feature torch.nn.SELU and torch.nn.AlphaDropout. They must be combined with the correct initializer, namely torch.nn.init.kaiming_normal_(parameter, mode='fan_in', nonlinearity='linear'), which is identical to LeCun initialisation (mode='fan_in') with a gain of 1 (nonlinearity='linear').
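A minimal Pytorch sketch along these lines (layer sizes and dropout rate are illustrative assumptions):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.SELU(),
    nn.AlphaDropout(p=0.05),
    nn.Linear(256, 10),
)

# LeCun initialisation via kaiming_normal_ with mode='fan_in' and gain 1
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="linear")
        nn.init.zeros_(m.bias)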

Tutorials

Tensorflow 1.x

  • Multilayer Perceptron on MNIST (notebook)
  • Convolutional Neural Network on MNIST (notebook)
  • Convolutional Neural Network on CIFAR10 (notebook)

Tensorflow 2.x (Keras)

Pytorch

  • Multilayer Perceptron on MNIST (notebook)
  • Convolutional Neural Network on MNIST (notebook)
  • Convolutional Neural Network on CIFAR10 (notebook)

Further material

Design novel SELU functions (Tensorflow 1.x)

  • How to obtain the SELU parameters alpha and lambda for arbitrary fixed points (notebook)
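As a rough companion to that notebook, here is a minimal numerical sketch of the idea for the standard fixed point (zero mean, unit variance); it uses scipy, which is an assumption on my part and not part of the repository:

import math
from scipy.integrate import quad
from scipy.optimize import fsolve

def selu(z, alpha, lam):
    # SELU: lam * z for z > 0, lam * alpha * (exp(z) - 1) otherwise
    return lam * (z if z > 0 else alpha * (math.exp(z) - 1.0))

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def residual(params):
    alpha, lam = params
    # For standard-normal pre-activations, require mean 0 and unit second moment
    mean = quad(lambda z: selu(z, alpha, lam) * std_normal_pdf(z), -math.inf, math.inf)[0]
    second = quad(lambda z: selu(z, alpha, lam) ** 2 * std_normal_pdf(z), -math.inf, math.inf)[0]
    return [mean, second - 1.0]

alpha, lam = fsolve(residual, x0=[1.5, 1.0])
print(alpha, lam)  # approx. 1.6733 and 1.0507

For other fixed points, only the target moments in residual() change.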

Basic python functions to implement SNNs (Tensorflow 1.x)

are provided as code chunks here: selu.py
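As a rough orientation (a sketch using the SELU constants from the paper, not a verbatim copy of selu.py), the core activation defined there looks roughly like this:

import tensorflow as tf

def selu(x):
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    # scale * x for x >= 0, scale * alpha * (exp(x) - 1) otherwise
    return scale * tf.where(x >= 0.0, x, alpha * tf.nn.elu(x))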

Notebooks and code to produce Figure 1 (Tensorflow 1.x)

are provided here: Figure1; they build on top of the biutils package.

Calculations and numeric checks of the theorems (Mathematica)

are provided as Mathematica notebooks here:

UCI, Tox21 and HTRU2 data sets

snns's People

Contributors

avinashsai, gklambauer, kschweig, markhofm, untom


snns's Issues

cnn_graph

Thank you for sharing the code. I have successfully applied SELU and alpha_dropout when building a purely fully connected network (7 layers) for a regression problem (the R2 between the predicted and observed variable is greater than 0.99!).

Right now I'm trying to replace the ReLU in cnn_graph with SELU. Unlike a standard CNN, cnn_graph performs the convolution on the graph-Fourier-transformed inputs (a recursive process involving multiple matrix multiplications between the layer-specific graph Laplacian and the layer inputs). The originally normalized input is shifted to some unknown distribution by the graph Fourier transform, so I don't know how to apply SELU even to the first cnn_graph layer. Could you give me some suggestions on this?

Besides, can I apply some normalization to the output of cnn_graph before feeding it into a fully connected network that uses SELU as the activation function?

Thanks!

Can someone help me with creating the csv from sdf with exactly the same number of features?

So I used the skchem pipeline to extract meaningful features from the SDF train file mentioned on the official Tox21 challenge website, but I am not able to get the 801 features that are in the zipped CSV file. Can someone help me with that Python code? My aim is to experiment with the architecture and the SELU technique, not to get into domain-specific feature-extraction details.

information about step (1) in selu.py

Hi,
thank you for the paper and the code.
Could you please tell us how you scale the inputs to zero mean and unit variance in step (1) of selu.py?

Thank you
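(For reference, a common way to perform such scaling, given here only as an assumption about what step (1) refers to and not as the authors' code: standardize each feature with statistics computed on the training set.)

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(loc=3.0, scale=5.0, size=(1000, 20))  # placeholder data
x_test = rng.normal(loc=3.0, scale=5.0, size=(200, 20))

mean = x_train.mean(axis=0)
std = x_train.std(axis=0) + 1e-8   # guard against constant features

x_train = (x_train - mean) / std
x_test = (x_test - mean) / std     # reuse the training-set statistics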

batch normalization

@bioinf-jku, thank you for your nice work!
I am new to deep learning and have a simple question. Since the net in your test code is not very deep, it makes no big difference to add batch normalization layers after each convolution layer. But if the net is very deep, is it necessary to add a batch normalization layer after each convolution layer? Or is there no need to do so, since the SELU activation function already has a normalizing effect?
Thank you in advance!

Categorical and continuous variables preprocessing

With the UCI data, how did you preprocess the categorical and continuous variables?

Did you enforce a min/max or did you just standardize the continuous variables? And for the categorical ones, did you use one-hot/dummy coding or standardize them?

Edit: Also, what batch size did you use? Did it depend on the sample size?

Thanks!

UCI datasets benchmark code

Hi,
I couldn't find any code related to the UCI 121 tasks benchmark in this repo.
Could you please provide that code as well?

Thanks

small typo

I'm guessing you didn't mean to duplicate this assertion

keep_prob.get_shape().assert_is_compatible_with(tensor_shape.scalar())

in the dropout_selu_impl function

Does SELU alone have a positive impact on accuracy?

Hi,

In the MNIST and CIFAR-10 tutorials both SELU and alpha dropout are used, and the result of the experiments is that the SNN outperforms ReLU- and ELU-based models. MNIST models (LeNet) can reach quite good accuracy without dropout or batch norm, so my question is: does SELU alone (no dropout and no batch norm), according to your observations, increase accuracy? What I mean is: I have a basic CNN that works on MNIST (convolutions, ReLU, fully connected layers and softmax); assuming that weight initialization and input normalization are done correctly, can I expect increased accuracy?

Effect of bias in linear layers

I've been experimenting with SELUs and found that they provide an improvement in training computation time compared with batchnorm; thank you for your work.

I just have a question regarding the effect of the bias in linear layers. As I understand it, every neuron should have mean zero in order to stay in the self-normalizing regime, but the bias precisely shifts that mean. In my experiments, however, I didn't see much of an effect from either removing or adding biases. I see that bias is used in the tutorial notebook, and I wonder whether you've considered the issue.

Questions on the self-normalizing property

I think the proposed SELU is a powerful non-linearity for MLPs. The self-normalizing property comes from the derivation of the forward propagation. This property can be confirmed by the following code.

import torch

f = torch.nn.functional.selu
x = torch.randn(1024, 1024) * 456 + 123        # input deliberately far from zero mean / unit variance
lin = torch.nn.Linear(1024, 1024, bias=False)
_ = torch.nn.init.kaiming_normal_(lin.weight, nonlinearity="linear")  # LeCun-normal initialisation
with torch.no_grad():
    for i in range(100):
        x = f(lin(x))                          # apply the same SELU layer 100 times
print(f"mean = {x.mean()}")        # approx. 0.00253
print(f"var = {(x ** 2).mean()}")  # approx. 1.05135 (second moment; the mean is ~0)

However, the self-normalizing property only holds for the forward pass; it does not hold for the backward pass (is that right?). Noisy gradients will definitely be harmful to the learning process. The proposed SELU is based on ELU, which is based on a selective preference. I wonder whether there could exist a more general non-linearity that has self-normalizing properties for both the forward and backward propagations. If it is possible, how could we find it?
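One way to inspect the backward pass empirically (a sketch extending the snippet above, not a claim from the paper): backpropagate a zero-mean, unit-variance upstream gradient and watch how its statistics change from layer to layer.

import torch

torch.manual_seed(0)
f = torch.nn.functional.selu
depth = 32
layers = [torch.nn.Linear(1024, 1024, bias=False) for _ in range(depth)]
for lin in layers:
    torch.nn.init.kaiming_normal_(lin.weight, nonlinearity="linear")  # LeCun normal

h = torch.randn(1024, 1024)
activations = []
for lin in layers:
    h = f(lin(h))
    h.retain_grad()            # keep the gradients of intermediate activations
    activations.append(h)

h.backward(torch.randn_like(h))    # inject a zero-mean, unit-variance gradient at the top
for i in (0, depth // 2, depth - 1):
    g = activations[i].grad
    print(i, g.mean().item(), g.var().item())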

SELU values for a truncated normal distribution

In SNNs/selu.py (line 31 in f992b22), the weights are initialized with

initializer = layers.variance_scaling_initializer(factor=1.0, mode='FAN_IN')

and many other examples (e.g. Keras) do an additional trick where samples are resampled if they are not within two standard deviations of the mean. I'm curious how much of an effect this truncation has on the fixed-point derivation. Are they analytically identical for a normal distribution and a truncated normal distribution?

I read in the paper that "Uniform and truncated Gaussian distributions with these moments led to networks with similar behavior", but this feels unsatisfactory to me. Maybe a small discrepancy becomes really problematic for deeper networks? This aligns with my experience that it is still beneficial to have batchnorm/layernorm with SELU.
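A quick numeric check of the moment difference in question (a sketch using scipy, not a statement by the authors):

from scipy.stats import truncnorm

# Standard normal truncated to +/- 2 standard deviations
t = truncnorm(-2.0, 2.0, loc=0.0, scale=1.0)
print(t.mean(), t.var())   # mean stays 0, variance drops to roughly 0.774

# So samples drawn at nominal stddev sigma with 2-sigma resampling have a variance of
# about 0.774 * sigma**2, unless the initializer rescales to compensate.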
