
keras-sru's Introduction

Keras Simple Recurrent Unit (SRU)

Implementation of the Simple Recurrent Unit in Keras. Paper: Training RNNs as Fast as CNNs

This is a naive implementation with some speed gains over generic LSTM cells; however, its speed is not yet 10x that of cuDNN LSTMs.
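A minimal usage sketch, assuming the layer is importable as SRU from this repo's sru.py and follows the standard Keras 2 recurrent-layer interface (the sequence length, vocabulary size, and unit counts below are illustrative placeholders):

# Single SRU layer on an IMDB-style binary classification task.
from keras.models import Model
from keras.layers import Input, Embedding, Dense
from sru import SRU

inputs = Input(shape=(80,))                       # fixed length is needed for unroll=True
x = Embedding(20000, 128)(inputs)                 # word indices -> dense vectors
x = SRU(128, dropout=0.0, recurrent_dropout=0.0,
        unroll=True)(x)                           # unroll is currently required (see Issues)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])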

Issues

  • Fix the need to unroll the SRU to get it to work correctly

  • (Resolved) Input dim must exactly match the number of LSTM cells for now. Still working out how to overcome this problem.

It is no longer a problem to have an input dimension different from the output dimension.

  • Performance of a single SRU layer is slightly lower (about 0.5% on average over 5 runs) than that of a 1-layer LSTM (at least on IMDB, with a batch size of 32). Haven't tried stacking them yet, but this may improve performance.

Performance degrades substantially at larger batch sizes (about 6-7% lower on average over 5 runs) compared to a 1-layer LSTM with a batch size of 128. However, a multi-layer SRU (I've tried 3 layers), while a bit slower than a 1-layer LSTM, gets around the same score at a batch size of 32 or 128.

The solution seems to be to stack several SRUs together; the authors recommend stacks of 4 SRU layers (a sketch follows this list).

  • Speed gains aren't that impressive at small batch sizes. At a batch size of 32, the SRU takes around 32-34 seconds per epoch, while the LSTM takes around 60-70 seconds. That is only about a 50% reduction in training time, not the 5-10x speedup discussed in the paper.

However, once the batch size is increased to 128, the SRU takes just 7 seconds per epoch compared to the LSTM's 22 seconds. For comparison, CNNs take 3-4 seconds per epoch.
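A hedged sketch of such a 4-layer stack, assuming the SRU layer from this repo's sru.py and IMDB-style inputs (the sequence length, vocabulary size, and layer widths are illustrative placeholders, not tuned values):

# Stack of 4 SRU layers. Intermediate layers must return the full sequence
# (return_sequences=True) so the next SRU receives 3D input; only the last
# layer emits a single vector per example.
from keras.models import Model
from keras.layers import Input, Embedding, Dense
from sru import SRU

inputs = Input(shape=(80,))
x = Embedding(20000, 128)(inputs)
for _ in range(3):
    x = SRU(128, return_sequences=True, unroll=True)(x)
x = SRU(128, unroll=True)(x)
outputs = Dense(1, activation='sigmoid')(x)
model = Model(inputs, outputs)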


keras-sru's Issues

Broken with keras 2.0.9

Looks like from keras.layers.recurrent import _time_distributed_dense no longer works with the big RNN refactor in 2.0.9.
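One possible workaround (a sketch, not part of this repo) is to inline a local copy of the removed helper instead of importing it; the original simply applied a dense projection to every timestep of a 3D tensor (the dropout handling of the original helper is omitted here):

# Local stand-in for the removed _time_distributed_dense helper:
# apply y = x . w + b to every timestep of a (batch, timesteps, input_dim) tensor.
import keras.backend as K

def time_distributed_dense(x, w, b=None, input_dim=None, output_dim=None, timesteps=None):
    if timesteps is None:
        timesteps = K.shape(x)[1]
    if input_dim is None:
        input_dim = K.shape(x)[2]
    if output_dim is None:
        output_dim = K.int_shape(w)[1]
    x = K.reshape(x, (-1, input_dim))                  # (batch * timesteps, input_dim)
    x = K.dot(x, w)
    if b is not None:
        x = K.bias_add(x, b)
    return K.reshape(x, (-1, timesteps, output_dim))   # (batch, timesteps, output_dim)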

TypeError: ('Keyword argument not understood:', 'returns_sequnces')

File "C:\Users\user\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 409, in init
super(RNN, self).init(**kwargs)

File "C:\Users\user\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 128, in init
raise TypeError('Keyword argument not understood:', kwarg)

TypeError: ('Keyword argument not understood:', 'returns_sequnces')

Bottleneck of SRU

I think the bottleneck of the SRU may be the lack of CUDA-level optimization. Here is the official implementation of SRU:
SRU
But I don't know how to implement the CUDA operation in TensorFlow or Keras. Do you know how to implement it? I want to use the SRU in my TensorFlow program.
Thanks!

SRU is faster, but has lower accuracy

I tested the SRU against GRU and LSTM on the IMDB dataset. The SRU was the fastest, but it got the lowest accuracy. My log is here:

SRU:
10s - loss: 0.6226 - acc: 0.6436 - val_loss: 0.5841 - val_acc: 0.6807
Epoch 2/5
6s - loss: 0.4984 - acc: 0.7571 - val_loss: 0.5790 - val_acc: 0.7018
Epoch 3/5
6s - loss: 0.3955 - acc: 0.8204 - val_loss: 0.6177 - val_acc: 0.7202
Epoch 4/5
6s - loss: 0.3052 - acc: 0.8668 - val_loss: 0.6947 - val_acc: 0.7243
Epoch 5/5
6s - loss: 0.2293 - acc: 0.9030 - val_loss: 0.8090 - val_acc: 0.7266
Test score: 0.809049695206
Test accuracy: 0.726640000038

GRU:
20s - loss: 0.4735 - acc: 0.7584 - val_loss: 0.3708 - val_acc: 0.8388
Epoch 2/5
12s - loss: 0.2609 - acc: 0.8932 - val_loss: 0.3774 - val_acc: 0.8362
Epoch 3/5
12s - loss: 0.1740 - acc: 0.9346 - val_loss: 0.4637 - val_acc: 0.8232
Epoch 4/5
12s - loss: 0.1132 - acc: 0.9593 - val_loss: 0.5032 - val_acc: 0.8160
Epoch 5/5
12s - loss: 0.0691 - acc: 0.9765 - val_loss: 0.7080 - val_acc: 0.8158
Test score: 0.708041801739
Test accuracy: 0.815840000038

LSTM:
26s - loss: 0.4353 - acc: 0.7924 - val_loss: 0.4062 - val_acc: 0.8214
Epoch 2/5
16s - loss: 0.2580 - acc: 0.8982 - val_loss: 0.3686 - val_acc: 0.8398
Epoch 3/5
16s - loss: 0.1756 - acc: 0.9352 - val_loss: 0.4138 - val_acc: 0.8276
Epoch 4/5
16s - loss: 0.1143 - acc: 0.9592 - val_loss: 0.5257 - val_acc: 0.8198
Epoch 5/5
16s - loss: 0.0783 - acc: 0.9717 - val_loss: 0.6960 - val_acc: 0.8167
Test score: 0.696038662281
Test accuracy: 0.816680000038

When I tested the SRU in PyTorch, it was not only faster than the GRU but also achieved a better accuracy score. So, can you tell me how I can get a better accuracy score with the SRU than with the GRU here?

Overfitting too fast

I ran imdb_sru.py, but I found it overfits very easily. My log is here:

7s - loss: 0.6368 - acc: 0.6280 - val_loss: 0.5955 - val_acc: 0.6673
Epoch 2/100
5s - loss: 0.5224 - acc: 0.7412 - val_loss: 0.6085 - val_acc: 0.6791
Epoch 3/100
5s - loss: 0.4561 - acc: 0.7827 - val_loss: 0.6453 - val_acc: 0.6871
Epoch 4/100
5s - loss: 0.3931 - acc: 0.8183 - val_loss: 0.6873 - val_acc: 0.7012
Epoch 5/100
5s - loss: 0.3277 - acc: 0.8527 - val_loss: 0.7497 - val_acc: 0.7072
Epoch 6/100
5s - loss: 0.2661 - acc: 0.8853 - val_loss: 0.8440 - val_acc: 0.7120
Epoch 7/100
5s - loss: 0.2133 - acc: 0.9116 - val_loss: 0.9658 - val_acc: 0.7123
Epoch 8/100
5s - loss: 0.1696 - acc: 0.9330 - val_loss: 1.1144 - val_acc: 0.7143
Epoch 9/100
5s - loss: 0.1312 - acc: 0.9496 - val_loss: 1.3357 - val_acc: 0.7074
Epoch 10/100
5s - loss: 0.1020 - acc: 0.9623 - val_loss: 1.5486 - val_acc: 0.7066

As you can see, val_loss keeps increasing; the model is overfitting.

To avoid the overfitting, I tried things like these:

outputs = SRU(batch_size, dropout=0.2, recurrent_dropout=0.2)(prev_input)

opt = Adam(lr=0.001, clipnorm=0.03)
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

As you can see, dropout and clipnorm both fail to stop the overfitting. Why? Please help me.
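A hedged suggestion, not from the original thread: since val_loss bottoms out after 2-3 epochs, stopping on val_loss is a simple additional guard. The callback below is standard Keras API; the patience value is illustrative, and x_train/y_train/x_val/y_val stand in for the IMDB arrays from the example script.

# Stop training once val_loss stops improving for 2 consecutive epochs.
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=2)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100, batch_size=32,
          callbacks=[early_stop])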

TypeError: ('Keyword argument not understood:', 'return_state')

There is an error that occurs when running imdb_sru.py:

Traceback (most recent call last):
  File "imdb_sru.py", line 47, in <module>
    return_state=True)(prev_input)
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/vladik/ml/keras-SRU/sru.py", line 93, in __init__
    super(SRU, self).__init__(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 181, in __init__
    super(Recurrent, self).__init__(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 277, in __init__
    raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'return_state')

It seems you forgot to add the return_state parameter to the constructor of SRU.
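A hedged workaround until the layer accepts that argument: call it without return_state and keep only the output sequence (the other parameter values mirror the example script, and prev_input is the preceding layer's output as in imdb_sru.py).

# Workaround sketch: skip the final-state outputs that return_state would provide.
outputs = SRU(units=128, dropout=0.0, recurrent_dropout=0.0,
              return_sequences=True, unroll=True)(prev_input)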

TypeError: RNN.__init__() missing 1 required positional argument: 'cell'

I am running the example program imdb_sru.py. When it reaches the call

h, h_final, c_final = SRU(units=128, dropout=0.0, recurrent_dropout=0.0,
                          return_sequences=True, return_state=True,
                          unroll=True)(prev_input)

it fails with:
Traceback (most recent call last):
File "d:\張\TCGA-HNSC\test.py", line 60, in
SRU_ROC(X_train, X_test, y_train, y_test, learning_rate, depth, tree_num, X_colums)
File "d:\張\TCGA-HNSC\test.py", line 36, in SRU_ROC
h, h_final, c_final = SRU(units=128,activation='ReLU', dropout=0.0, recurrent_dropout=0.0,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\張\TCGA-HNSC\SRU_master\sru.py", line 152, in init
super(SRU, self).init(**kwargs)
TypeError: RNN.init() missing 1 required positional argument: 'cell'

Please help me solve it.

BatchNorm

Hi @titu1994, do you think your SRU could improve further with the use of BatchNorm as in Cooijmans et al. (2017), which has been implemented by @jihunchoi here for an LSTM?
Thank you.
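For reference, a much simpler (and not equivalent) option is to normalize the SRU's outputs with a plain BatchNormalization layer; the recurrent batch norm of Cooijmans et al. instead normalizes pre-activations inside the recurrence and would require changes to sru.py itself. A sketch of the simple variant, with illustrative input sizes:

# Plain BatchNormalization applied to the SRU's per-timestep outputs.
# This is NOT the recurrent batch norm of Cooijmans et al. (2017).
from keras.layers import Input, Embedding, BatchNormalization
from sru import SRU

inputs = Input(shape=(80,))
x = Embedding(20000, 128)(inputs)
x = SRU(128, return_sequences=True, unroll=True)(x)
x = BatchNormalization()(x)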

How to use stacked SRUs?

I tried to use the SRU like this:

SRU(batch_size, dropout=0., recurrent_dropout=0., unroll=True, implementation=1, recurrent_activation='elu')

It worked and gave some improvement, but it still can't match the GRU's score, so I want to try stacking SRUs. Can you give me some suggestions?
