
indrnn_pytorch's Introduction

Independently Recurrent Neural Networks

This code implements the IndRNN and the deep IndRNN. It is based on PyTorch.
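For reference, an IndRNN updates each hidden unit independently of the others: h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), where the recurrent weight u is a vector applied element-wise (Hadamard product) rather than a full recurrent matrix.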

cuda_IndRNN_onlyrecurrent is the CUDA version. It is much faster than the plain PyTorch implementation; for the sequential MNIST example (length 784), it runs over 31 times faster.

Please cite the following papers if you find this work useful.
Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, and Yanbo Gao. "Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN." CVPR 2018.
Shuai Li, Wanqing Li, Chris Cook, and Yanbo Gao. "Deep Independently Recurrent Neural Network (IndRNN)." arXiv preprint arXiv:1910.06251, 2019.
@inproceedings{li2018independently,
  title={Independently recurrent neural network (indrnn): Building a longer and deeper rnn},
  author={Li, Shuai and Li, Wanqing and Cook, Chris and Zhu, Ce and Gao, Yanbo},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={5457--5466},
  year={2018}
}

@article{li2019deep,
  title={Deep Independently Recurrent Neural Network (IndRNN)},
  author={Li, Shuai and Li, Wanqing and Cook, Chris and Gao, Yanbo and Zhu, Ce},
  journal={arXiv preprint arXiv:1910.06251},
  year={2019}
}

Summary of advantages

  • Able to process longer sequences (over 5000 steps): the gradient vanishing and exploding problems are addressed
  • Able to construct deeper networks (over 20 layers, and much deeper if GPU memory allows)
  • Able to be robustly trained with ReLU
  • Able to interpret the behaviour of each IndRNN neuron independently of the others
  • Reduced complexity (over 10x faster than cuDNN LSTM when the sequence is long)

Usage

IndRNN_onlyrecurrent.py provides only the recurrent+activation part of the IndRNN. Therefore, the input still needs to be processed with a dense (fully connected) or convolution operation. This is useful for adding batch normalization (BN) between the input processing and the activation function. Just consider it a ReLU function with recurrent connections. I believe this is more flexible, since you can apply any kind of processing to the inputs.
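To make this concrete, below is a minimal, illustrative sketch of such a recurrent+activation module and how it can be combined with a dense layer and BN. The class name, shapes, and parameter names are assumptions for illustration only and do not reproduce the exact interface of IndRNN_onlyrecurrent.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IndRNNOnlyRecurrent(nn.Module):
    # Recurrent + activation part only: h_t = relu(x_t + u * h_{t-1}),
    # where x_t is already the processed input (e.g., W x_t + b from a Linear).
    def __init__(self, hidden_size):
        super().__init__()
        # Per-neuron recurrent weight (a vector, not a matrix).
        self.weight_hh = nn.Parameter(torch.empty(hidden_size).uniform_(0, 1))

    def forward(self, inputs, h0=None):
        # inputs: (seq_len, batch, hidden_size)
        seq_len, batch, hidden = inputs.shape
        h = inputs.new_zeros(batch, hidden) if h0 is None else h0
        outputs = []
        for t in range(seq_len):
            h = F.relu(inputs[t] + h * self.weight_hh)
            outputs.append(h)
        return torch.stack(outputs), h

# One IndRNN "layer" = input processing + (optional) BN + recurrent activation.
hidden_size = 128
fc = nn.Linear(64, hidden_size)
bn = nn.BatchNorm1d(hidden_size)
rnn = IndRNNOnlyRecurrent(hidden_size)

x = torch.randn(784, 32, 64)                       # (seq_len, batch, input_size)
z = fc(x)                                          # dense processing of the input
z = bn(z.reshape(-1, hidden_size)).reshape_as(z)   # BN over (seq*batch, hidden)
out, h_last = rnn(z)                               # recurrent connections + ReLU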
As noted above, cuda_IndRNN_onlyrecurrent is the CUDA version of this module and is much faster than the plain PyTorch implementation (over 31 times faster on the sequential MNIST example of length 784).

Requirements

  • PyTorch

For the CUDA version

  • CuPy
  • pynvrtc

Running

Please refer to the tasks.

Considerations in implementation

1. Initialization of the recurrent weights

For ReLU, Uniform(0, 1) is used so that different neurons keep different kinds of memory. However, for problems that only use the output of the last time step, such as the adding problem, MNIST classification, and action recognition, the recurrent weights of the last IndRNN layer (caution: only the last one, not all of them) can be initialized to all 1s, or to a narrow range (1-epsilon, 1+epsilon) where epsilon is a small number, since only long-term memory is needed at the output of this layer. Examples are shown in Indrnn_action_network.py.
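As an illustration (assuming the recurrent weights are stored as a per-neuron vector, here called weight_hh, which is an illustrative name), the two initializations could look like this:

import torch

hidden_size = 512
epsilon = 0.01

# Hidden (non-last) IndRNN layers: Uniform(0, 1), so different neurons
# keep different kinds of (short- and long-term) memory.
weight_hh = torch.empty(hidden_size).uniform_(0, 1)

# Last IndRNN layer for tasks that only use the last time step
# (adding problem, sequential MNIST, action recognition):
# initialize near 1 so this layer keeps long-term memory.
last_weight_hh = torch.empty(hidden_size).uniform_(1 - epsilon, 1 + epsilon)
# or simply all ones:
# last_weight_hh = torch.ones(hidden_size)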

2. Constraint of the recurrent weights

For ReLU, the recurrent weights can generally be constrained to [-U_bound, U_bound], where U_bound = pow(args.MAG, 1.0 / seq_len) and MAG can be 2, 10, or another value. If the sequence is very long, the range can simply be [-1, 1], since U_bound is then very close to 1 and GPU precision is limited. If the sequence is short (e.g., 20), no constraint is needed. An example of the constraint is shown in Indrnn_action_train.py. By the way, this constraint can also be implemented as a weight decay of ||max(0, |U| - U_bound)||.
For simplicity, the constraint can always be set to [-1, 1], since this already keeps long-term memory and the difference in performance is small.
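A minimal sketch of enforcing the constraint by clipping after each optimizer step (assuming the recurrent weight parameters can be identified by the illustrative name 'weight_hh'):

import torch

def clip_recurrent_weights(model, mag=5.0, seq_len=784):
    # Clamp the IndRNN recurrent weights to [-U_bound, U_bound],
    # where U_bound = MAG ** (1 / seq_len). For very long sequences this is
    # close to 1; for short sequences (~20 steps) clipping can be skipped.
    u_bound = mag ** (1.0 / seq_len)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if 'weight_hh' in name:   # illustrative name for recurrent weights
                param.clamp_(-u_bound, u_bound)

In a training loop, this would be called right after optimizer.step().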

3. Usage of batch normalization (BN)

Generally, for networks with more than 3 layers, BN helps accelerate training. BN can be used before or after the activation function. In our experiments, we found that putting BN after the activation function converges faster. However, for tasks such as PTB_c, where the output of one batch is further used as the initialization of the next batch, it is better to put BN before the activation, as in the example mentioned above.
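The sketch below shows the two placements within one layer; it is a simplified illustration (not the repository's exact classes), with a flag choosing where BN is applied:

import torch
import torch.nn as nn
import torch.nn.functional as F

class IndRNNBlock(nn.Module):
    # One layer = dense input processing + BN + IndRNN recurrence (ReLU).
    # bn_location='after' applies BN to the layer output (found to converge
    # faster in our experiments); 'before' applies it to the pre-activation
    # input, which suits tasks like PTB_c where hidden states carry across batches.
    def __init__(self, input_size, hidden_size, bn_location='after'):
        super().__init__()
        self.fc = nn.Linear(input_size, hidden_size)
        self.bn = nn.BatchNorm1d(hidden_size)
        self.weight_hh = nn.Parameter(torch.empty(hidden_size).uniform_(0, 1))
        self.bn_location = bn_location

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        z = self.fc(x)
        if self.bn_location == 'before':
            z = self.bn(z.reshape(-1, z.size(-1))).reshape_as(z)
        h = z.new_zeros(z.size(1), z.size(2))
        outs = []
        for t in range(z.size(0)):
            h = F.relu(z[t] + h * self.weight_hh)
            outs.append(h)
        out = torch.stack(outs)
        if self.bn_location == 'after':
            out = self.bn(out.reshape(-1, out.size(-1))).reshape_as(out)
        return out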

4. Learning rate

In our experiments, Adam with a learning rate of 2e-4 works well.

5. Weight decay

If weight decay is used, there is no need to apply it to the recurrent weights.
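Putting the last two points together, a sketch of an optimizer setup (again assuming the recurrent weights can be recognized by the illustrative name 'weight_hh'; the weight decay value is just an example):

import torch

def build_optimizer(model, lr=2e-4, weight_decay=1e-4):
    # Adam with lr=2e-4; weight decay on everything except the
    # recurrent weights (identified here by the name 'weight_hh').
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        (no_decay if 'weight_hh' in name else decay).append(param)
    return torch.optim.Adam(
        [{'params': decay, 'weight_decay': weight_decay},
         {'params': no_decay, 'weight_decay': 0.0}],
        lr=lr)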

6. Usage of dropout

Dropout (if used) is applied with the same mask shared across all time steps.
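A minimal sketch of such time-shared ("locked") dropout; the module name and shapes are illustrative:

import torch
import torch.nn as nn

class SameMaskDropout(nn.Module):
    # Dropout that samples one mask per sequence and reuses it
    # at every time step.
    def __init__(self, p=0.25):
        super().__init__()
        self.p = p

    def forward(self, x):                 # x: (seq_len, batch, hidden)
        if not self.training or self.p == 0.0:
            return x
        # One Bernoulli mask per (batch, hidden) position, shared over time.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)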

Note

The above considerations are only suggestions. I did not explore many training techniques (e.g., alternative training methods or initialization schemes), so better results may be achievable with other options.

Other implementations

Theano and Lasagne:
https://github.com/Sunnydreamrain/IndRNN_Theano_Lasagne
TensorFlow:
https://github.com/batzner/indrnn
Keras:
https://github.com/titu1994/Keras-IndRNN
PyTorch:
https://github.com/StefOe/indrnn-pytorch
https://github.com/theSage21/IndRNN
https://github.com/zhangxu0307/Ind-RNN
Chainer:
https://github.com/0shimax/chainer-IndRNN

indrnn_pytorch's People

Contributors

anirudh257, sunnydreamrain


indrnn_pytorch's Issues

Not able to prepare the dataset shape correctly.

Hi ... Thanks a lot for this fabulous work. I'm still trying to understand the code and to run it on my machine.
The main issue I'm having is that I'm not able to prepare the dataset as required.

I'm trying to understand the following code

datasets = train_datasets                  # base name of the dataset split
dataname = datasets + '.npy'               # data array
labelname = datasets + '_label.npy'        # labels
lenname = datasets + '_len.npy'            # per-sample sequence lengths
data_handle = np.load(dataname)
label_handle = np.load(labelname)
len_handle = np.load(lenname)

What should train_datasets be? What shape does it need to have? Could you elaborate on this? I need to arrange the data to feed it to the network.

Question about the action recognition experiment in this PyTorch implementation

Hello, thanks for your excellent work. I notice that your implementation of action recognition differs from what the paper formulates. Why did you make those changes in your codebase?
P.S. Your code contains return F.relu(input + hx * self.weight_hh.unsqueeze(0).expand(hx.size(0), len(self.weight_hh))). Can you explain this line? Hoping for your reply.

issue on grad

Hi, thanks for your great work! However, I have run into a problem while reproducing the code.
During training, the gradients of most layers are None. Only the layers "classify_weight", "classify_bias", "RNN5_weight", and "RNN5_bias" have non-None gradients; all the others are None. As a result, an error occurs when reaching "grad_climp", as shown in the figure below.
I think something may be going wrong with the "RNN5_weight_hh" layer during loss.backward().
(screenshot: lossNone_issue)

I wonder how to address this problem. Looking forward to your reply, thank you!

Input sequence padding

Hello,
Thank you very much for sharing your work.
I have a question about the input shape [48000, 300, 50, 3]. As I understand it, 300 represents the sequence length of one .skeleton file. When .skeleton files have sequence lengths such as 154 or 155, did you zero-pad the rest to reach a sequence length of 300?
Will this affect the accuracy of action recognition?

Thank you, and I look forward to your reply.

CUDA version of IndRNNCell?

Hi, is there any chance of implementing a CUDA version of IndRNNCell? The purpose is to speed up processing of variable-length sequences.

Settings to reproduce resIndRNN results on PTB

First of all, thanks a lot for this great work!

I am trying to reproduce the results with resIndRNN on word-level PTB data. However, following the recommended settings in the paper, I was only able to get around 60 perplexity (in the paper it is around 59). Would it be possible for you to also share the configuration, similar to the denseIndRNN case? Thanks a lot in advance!

By the way, to get deterministic behavior, I would also add the following three lines in train_language.py; see the PyTorch note on reproducibility.

np.random.seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

Also, please correct me if I am wrong, but while reading your Theano implementation, it seems that the resIndRNN implemented there uses the original ResNet configuration (dense layer, then activation) while this version uses the newer pre-activation configuration (activation, then dense layer). Could this be the reason for the different results?

Word level PTB repo?

I noticed that there are word-level PTB results in the paper, but I only find the character-level version in the repo. Is there a folder for word-level PTB?

Not able to reproduce the Results

Hi,

Thanks for the wonderful work. I tried to reproduce the results for the action recognition task on the NTU RGB+D dataset with the subject split by running the provided command.

  • python -u Indrnn_action_train.py --dropout 0.25 --use_weightdecay_nohiddenW
    The maximum accuracy I am able to reach is 72.48%.
    Does something need to be changed to get the reported results?
