
keras-sru's Introduction

Keras Simple Recurrent Unit (SRU)

Implementation of the Simple Recurrent Unit in Keras. Paper: Training RNNs as Fast as CNNs

This is a naive implementation with some speed gains over generic LSTM cells; however, its speed is not yet 10x that of cuDNN LSTMs.
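A minimal usage sketch, assuming the layer is importable as SRU from this repo's sru.py and follows the standard Keras 2 recurrent-layer interface (the sequence length, vocabulary size, and unit counts below are illustrative placeholders):

# Single SRU layer on an IMDB-style binary classification task.
from keras.models import Model
from keras.layers import Input, Embedding, Dense
from sru import SRU

inputs = Input(shape=(80,))                       # fixed length is needed for unroll=True
x = Embedding(20000, 128)(inputs)                 # word indices -> dense vectors
x = SRU(128, dropout=0.0, recurrent_dropout=0.0,
        unroll=True)(x)                           # unroll is currently required (see Issues)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])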

Issues

  • Fix the need to unroll the SRU to get it to work correctly

  • (Resolved) Input dim must exactly match the number of LSTM cells for now. Still working out how to overcome this problem.

It is no longer a problem to have an input dimension different from the output dimension.

  • Performance of a single SRU layer is slightly lower (about 0.5% on average over 5 runs) than that of a 1-layer LSTM (at least on IMDB, with a batch size of 32). Haven't tried stacking them yet, but this may improve performance.

Performance degrades substantially at larger batch sizes (about 6-7% lower on average over 5 runs) compared to a 1-layer LSTM with a batch size of 128. However, a multi-layer SRU (I've tried 3 layers), while a bit slower than a 1-layer LSTM, gets around the same score at a batch size of 32 or 128.

The solution seems to be to stack several SRUs together; the authors recommend stacks of 4 SRU layers (a sketch follows this list).

  • Speed gains aren't that impressive at small batch sizes. At a batch size of 32, the SRU takes around 32-34 seconds per epoch, while the LSTM takes around 60-70 seconds. That is only about a 50% reduction in training time, not the 5-10x speedup discussed in the paper.

However, once the batch size is increased to 128, the SRU takes just 7 seconds per epoch compared to the LSTM's 22 seconds. For comparison, CNNs take 3-4 seconds per epoch.
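A hedged sketch of such a 4-layer stack, assuming the SRU layer from this repo's sru.py and IMDB-style inputs (the sequence length, vocabulary size, and layer widths are illustrative placeholders, not tuned values):

# Stack of 4 SRU layers. Intermediate layers must return the full sequence
# (return_sequences=True) so the next SRU receives 3D input; only the last
# layer emits a single vector per example.
from keras.models import Model
from keras.layers import Input, Embedding, Dense
from sru import SRU

inputs = Input(shape=(80,))
x = Embedding(20000, 128)(inputs)
for _ in range(3):
    x = SRU(128, return_sequences=True, unroll=True)(x)
x = SRU(128, unroll=True)(x)
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)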


keras-sru's Issues

Broken with keras 2.0.9

Looks like from keras.layers.recurrent import _time_distributed_dense no longer works with the big RNN refactor in 2.0.9.
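One possible workaround (a sketch, not part of this repo) is to inline a local copy of the removed helper instead of importing it; the original simply applied a dense projection to every timestep of a 3D tensor (the dropout handling of the original helper is omitted here):

# Local stand-in for the removed _time_distributed_dense helper:
# apply y = x . w + b to every timestep of a (batch, timesteps, input_dim) tensor.
import keras.backend as K

def time_distributed_dense(x, w, b=None, input_dim=None, output_dim=None, timesteps=None):
    if timesteps is None:
        timesteps = K.shape(x)[1]
    if input_dim is None:
        input_dim = K.shape(x)[2]
    if output_dim is None:
        output_dim = K.int_shape(w)[1]
    x = K.reshape(x, (-1, input_dim))                  # (batch * timesteps, input_dim)
    x = K.dot(x, w)
    if b is not None:
        x = K.bias_add(x, b)
    return K.reshape(x, (-1, timesteps, output_dim))   # (batch, timesteps, output_dim)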

TypeError: ('Keyword argument not understood:', 'returns_sequnces')

File "C:\Users\user\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 409, in init
super(RNN, self).init(**kwargs)

File "C:\Users\user\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 128, in init
raise TypeError('Keyword argument not understood:', kwarg)

TypeError: ('Keyword argument not understood:', 'returns_sequnces')

Bottleneck of SRU

I think the bottleneck of the SRU may be the lack of CUDA-level optimization. Here is the official implementation of SRU:
SRU
But I don't know how to implement the CUDA operation in TensorFlow or Keras. Do you know how to implement it? I want to use the SRU in my TensorFlow program.
Thanks!

SRU is faster, but has lower accuracy

I tested the SRU against GRU and LSTM on the IMDB dataset. The SRU was the fastest, but it got the lowest accuracy. My log is here:

SRU:
10s - loss: 0.6226 - acc: 0.6436 - val_loss: 0.5841 - val_acc: 0.6807
Epoch 2/5
6s - loss: 0.4984 - acc: 0.7571 - val_loss: 0.5790 - val_acc: 0.7018
Epoch 3/5
6s - loss: 0.3955 - acc: 0.8204 - val_loss: 0.6177 - val_acc: 0.7202
Epoch 4/5
6s - loss: 0.3052 - acc: 0.8668 - val_loss: 0.6947 - val_acc: 0.7243
Epoch 5/5
6s - loss: 0.2293 - acc: 0.9030 - val_loss: 0.8090 - val_acc: 0.7266
Test score: 0.809049695206
Test accuracy: 0.726640000038

GRU:
20s - loss: 0.4735 - acc: 0.7584 - val_loss: 0.3708 - val_acc: 0.8388
Epoch 2/5
12s - loss: 0.2609 - acc: 0.8932 - val_loss: 0.3774 - val_acc: 0.8362
Epoch 3/5
12s - loss: 0.1740 - acc: 0.9346 - val_loss: 0.4637 - val_acc: 0.8232
Epoch 4/5
12s - loss: 0.1132 - acc: 0.9593 - val_loss: 0.5032 - val_acc: 0.8160
Epoch 5/5
12s - loss: 0.0691 - acc: 0.9765 - val_loss: 0.7080 - val_acc: 0.8158
Test score: 0.708041801739
Test accuracy: 0.815840000038

LSTM:
26s - loss: 0.4353 - acc: 0.7924 - val_loss: 0.4062 - val_acc: 0.8214
Epoch 2/5
16s - loss: 0.2580 - acc: 0.8982 - val_loss: 0.3686 - val_acc: 0.8398
Epoch 3/5
16s - loss: 0.1756 - acc: 0.9352 - val_loss: 0.4138 - val_acc: 0.8276
Epoch 4/5
16s - loss: 0.1143 - acc: 0.9592 - val_loss: 0.5257 - val_acc: 0.8198
Epoch 5/5
16s - loss: 0.0783 - acc: 0.9717 - val_loss: 0.6960 - val_acc: 0.8167
Test score: 0.696038662281
Test accuracy: 0.816680000038

When I tested the SRU in PyTorch, it was not only faster than the GRU but also achieved a better accuracy score. So, can you tell me how I can get a better accuracy score with the SRU than with the GRU here?

Overfitting too fast

I ran imdb_sru.py, but I found it overfits very easily. My log is here:

7s - loss: 0.6368 - acc: 0.6280 - val_loss: 0.5955 - val_acc: 0.6673
Epoch 2/100
5s - loss: 0.5224 - acc: 0.7412 - val_loss: 0.6085 - val_acc: 0.6791
Epoch 3/100
5s - loss: 0.4561 - acc: 0.7827 - val_loss: 0.6453 - val_acc: 0.6871
Epoch 4/100
5s - loss: 0.3931 - acc: 0.8183 - val_loss: 0.6873 - val_acc: 0.7012
Epoch 5/100
5s - loss: 0.3277 - acc: 0.8527 - val_loss: 0.7497 - val_acc: 0.7072
Epoch 6/100
5s - loss: 0.2661 - acc: 0.8853 - val_loss: 0.8440 - val_acc: 0.7120
Epoch 7/100
5s - loss: 0.2133 - acc: 0.9116 - val_loss: 0.9658 - val_acc: 0.7123
Epoch 8/100
5s - loss: 0.1696 - acc: 0.9330 - val_loss: 1.1144 - val_acc: 0.7143
Epoch 9/100
5s - loss: 0.1312 - acc: 0.9496 - val_loss: 1.3357 - val_acc: 0.7074
Epoch 10/100
5s - loss: 0.1020 - acc: 0.9623 - val_loss: 1.5486 - val_acc: 0.7066

As you can see, val_loss keeps increasing; the model is overfitting.

To avoid the overfitting, I tried things like these:

outputs = SRU(batch_size, dropout=0.2, recurrent_dropout=0.2)(prev_input)

opt = Adam(lr=0.001, clipnorm=0.03)
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

As you can see, dropout and clipnorm both fail to stop the overfitting. Why? Please help me.
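A hedged suggestion, not from the original thread: since val_loss bottoms out after 2-3 epochs, stopping on val_loss is a simple additional guard. The callback below is standard Keras API; the patience value is illustrative, and x_train/y_train/x_val/y_val stand in for the IMDB arrays from the example script.

# Stop training once val_loss stops improving for 2 consecutive epochs.
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100, batch_size=32,
          callbacks=[early_stop])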

TypeError: ('Keyword argument not understood:', 'return_state')

There is an error that occurs when running imdb_sru.py:

Traceback (most recent call last):
  File "imdb_sru.py", line 47, in <module>
    return_state=True)(prev_input)
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/vladik/ml/keras-SRU/sru.py", line 93, in __init__
    super(SRU, self).__init__(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 181, in __init__
    super(Recurrent, self).__init__(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 277, in __init__
    raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'return_state')

It seems you forgot to add the return_state parameter to the constructor of SRU.
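A hedged workaround until the layer accepts that argument: call it without return_state and keep only the output sequence (the other parameter values mirror the example script, and prev_input is the preceding layer's output as in imdb_sru.py).

# Workaround sketch: skip the final-state outputs that return_state would provide.
outputs = SRU(units=128, dropout=0.0, recurrent_dropout=0.0,
              return_sequences=True, unroll=True)(prev_input)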

TypeError: RNN.__init__() missing 1 required positional argument: 'cell'

I am running the example program imdb_sru.py. When it reaches the call

h, h_final, c_final = SRU(units=128, dropout=0.0, recurrent_dropout=0.0,
                          return_sequences=True, return_state=True,
                          unroll=True)(prev_input)

it fails with:
Traceback (most recent call last):
File "d:\張\TCGA-HNSC\test.py", line 60, in
SRU_ROC(X_train, X_test, y_train, y_test, learning_rate, depth, tree_num, X_colums)
File "d:\張\TCGA-HNSC\test.py", line 36, in SRU_ROC
h, h_final, c_final = SRU(units=128,activation='ReLU', dropout=0.0, recurrent_dropout=0.0,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\張\TCGA-HNSC\SRU_master\sru.py", line 152, in init
super(SRU, self).init(**kwargs)
TypeError: RNN.init() missing 1 required positional argument: 'cell'

Please help me solve it.

BatchNorm

Hi @titu1994, do you think your SRU could improve further with the use of BatchNorm as in Cooijmans et al. (2017), which has been implemented by @jihunchoi here for an LSTM?
Thank you.
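For reference, a much simpler (and not equivalent) option is to normalize the SRU's outputs with a plain BatchNormalization layer; the recurrent batch norm of Cooijmans et al. instead normalizes pre-activations inside the recurrence and would require changes to sru.py itself. A sketch of the simple variant, with illustrative input sizes:

# Plain BatchNormalization applied to the SRU's per-timestep outputs.
# This is NOT the recurrent batch norm of Cooijmans et al. (2017).
from keras.layers import Input, Embedding, BatchNormalization
from sru import SRU

inputs = Input(shape=(80,))
x = Embedding(20000, 128)(inputs)
x = SRU(128, return_sequences=True, unroll=True)(x)
x = BatchNormalization()(x)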

How to use stacked SRUs?

I tried to use the SRU like this:

SRU(batch_size, dropout=0., recurrent_dropout=0., unroll=True, implementation=1, recurrent_activation='elu')

It worked and gave some improvement, but it still can't match the GRU's score, so I want to try stacking SRUs. Can you give me some suggestions?
