farizrahman4u / seq2seq Goto Github PK

Sequence to Sequence Learning with Keras

License: GNU General Public License v2.0

Python 100.00%

seq2seq's Introduction

Seq2seq

Sequence to Sequence Learning with Keras

Hi! You have just found Seq2Seq. Seq2Seq is a sequence to sequence learning add-on for the python deep learning library Keras. Using Seq2Seq, you can build and train sequence-to-sequence neural network models in Keras. Such models are useful for machine translation, chatbots (see [4]), parsers, or whatever that comes to your mind.

Getting started

Seq2Seq contains modular and reusable layers that you can use to build your own seq2seq models as well as built-in models that work out of the box. Seq2Seq models can be compiled as they are or added as layers to a bigger model. Every Seq2Seq model has 2 primary layers : the encoder and the decoder. Generally, the encoder encodes the input sequence to an internal representation called 'context vector' which is used by the decoder to generate the output sequence. The lengths of input and output sequences can be different, as there is no explicit one on one relation between the input and output sequences. In addition to the encoder and decoder layers, a Seq2Seq model may also contain layers such as the left-stack (Stacked LSTMs on the encoder side), the right-stack (Stacked LSTMs on the decoder side), resizers (for shape compatibility between the encoder and the decoder) and dropout layers to avoid overfitting. The source code is heavily documented, so lets go straight to the examples:

A simple Seq2Seq model:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')

That's it! You have successfully compiled a minimal Seq2Seq model! Next, let's build a 6 layer deep Seq2Seq model (3 layers for encoding, 3 layers for decoding).

Deep Seq2Seq models:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8, depth=3)
model.compile(loss='mse', optimizer='rmsprop')

Notice that we have specified the depth for both encoder and decoder as 3, and your model has a total depth of 3 + 3 = 6. You can also specify different depths for the encoder and the decoder. Example:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=20, depth=(4, 5))
model.compile(loss='mse', optimizer='rmsprop')

Notice that the depth is specified as tuple, (4, 5). Which means your encoder will be 4 layers deep whereas your decoder will be 5 layers deep. And your model will have a total depth of 4 + 5 = 9.

Advanced Seq2Seq models:

Until now, you have been using the SimpleSeq2Seq model, which is a very minimalistic model. In the actual Seq2Seq implementation described in [1], the hidden state of the encoder is transferred to decoder. Also, the output of decoder at each timestep becomes the input to the decoder at the next time step. To make things more complicated, the hidden state is propogated throughout the LSTM stack. But you have no reason to worry, as we have a built-in model that does all that out of the box. Example:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

Note that we had to specify the complete input shape, including the samples dimensions. This is because we need a static hidden state(similar to a stateful RNN) for transferring it across layers. (Update : Full input shape is not required in the latest version, since we switched to Recurrent Shop backend). By the way, Seq2Seq models also support the stateful argument, in case you need it.

You can also experiment with the hidden state propogation turned off. Simply set the arguments broadcast_state and inner_broadcast_state to False.

Peeky Seq2seq model:

Let's not stop there. Let's build a model similar to cho et al 2014, where the decoder gets a 'peek' at the context vector at every timestep.

To achieve this, simply add the argument peek=True:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4, peek=True)
model.compile(loss='mse', optimizer='rmsprop')

Seq2seq model with attention:

Let's not stop there either. In all the models described above, there is no allignment between the input sequence elements and the output sequence elements. But for machine translation, learning a soft allignment between the input and output sequences imporves performance.[3]. The Seq2seq framework includes a ready made attention model which does the same. Note that in the attention model, there is no hidden state propogation, and a bidirectional LSTM encoder is used by default. Example:

import seq2seq
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

As you can see, in the attention model you need not specify the samples dimension as there are no static hidden states involved(But you have to if you are building a stateful Seq2seq model). Note: You can set the argument bidirectional=False if you wish not to use a bidirectional encoder.

Final Words

That's all for now. Hope you love this library. For any questions you might have, create an issue and I will get in touch. You can also contribute to this project by reporting bugs, adding new examples, datasets or models.

Installation:

sudo pip install git+https://github.com/farizrahman4u/seq2seq.git

Requirements:

Working Example:

Training Seq2seq with movie subtitles - Thanks to Nicolas Ivanov

Papers:

seq2seq's People

Contributors

Stargazers

Watchers

Forkers

decisions tttwwy sowmya91 nicolas-ivanov libraua amacbee lukovkin hitwsl nimishzynga wuhaiyangit benjamesbabala hitluobin zuiwufenghua binbinbian wangxiong2015 arcodergh wangejie miznokruge gcj eriche2016 hydercps wavelets ml-lab mlzxy fandywang walkers-mv mphielipp amoliu nextdawn smartinsightsfromdata janggwan qfzhu jstarc mbollmann wgapl mathewlee11 hainm fdoperezi passstory phecy lengkujiaai t13m ye-lun avijeet1132 cojito ml-ai-nlp-ir manzilzaheer snormore wangxggc liangpj olehf jolinxql udibr rwclarity ricky1203 tammyyang vsooda einsnull happywwy dapeng2018 davegreenwood jhnlp eshijia devroy73 asmith26 joelthchao matthiasplappert liormagen lngvietthang codekansas buptpriswang beld syedaffanhamdani flowgrad dboyliao stifflerhe rishy giahy2507 sudy brunoalano chagge lw251 hctsai lixiangnlp davidfsemedo poyuwu lvapeab capers dodocho stevenlol rikima barneyeldinosaurio le02146 libcorner vunb ghs2015 alanguo001 appliedml kaeflint kentchun33333

seq2seq's Issues

pip install

Fariz, please make the correponding changes to your repository, so that it's possible to install it locally via pip install. Currently, after doing

sudo pip install git+ssh://[email protected]:farizrahman4u/seq2seq.git

I'm getting the following message:

ERROR: Repository not found.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
  Complete output from command git clone -q ssh://[email protected]:farizrahman4u/seq2seq.git /tmp/pip-uQpnEc-build:

Partial batches

Following up on a comment in issue #4:
#4 (comment)

Would it be possible to support partial batches for the "advanced" Seq2seq? That is, have data not rounded to the nearest batch size?

Error while using Seq2seq model

If I try to execute the example given in introduction

import seq2seq
from seq2seq.models import Seq2seq

model = Seq2seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

then I get the following error:

Traceback (most recent call last):
  File "/export/home/manzil/PycharmProjects/autoencoder/main.py", line 8, in <module>
    model2.compile(loss='mse', optimizer='rmsprop')
  File "/export/home/manzil/.local/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/models.py", line 507, in compile
    self.y_train = self.get_output(train=True)
  File "/export/home/manzil/.local/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/containers.py", line 130, in get_output
    return self.layers[-1].get_output(train)
  File "/export/home/manzil/PycharmProjects/autoencoder/seq2seq/layers/decoders.py", line 151, in get_output
    x_t = self.get_input(train)
  File "/export/home/manzil/.local/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/core.py", line 241, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/export/home/manzil/PycharmProjects/autoencoder/seq2seq/layers/state_transfer_lstm.py", line 36, in get_output
    assert K.ndim(X) == 3 #, 'Input should be at least 3D.'
AssertionError

I tried with latest version of Keras as well as the version 0.3.2 in PiPy.

Any pointers would be very helpful.

Can seq2seq be extended to handle RNN with overshooting?

I'm trying to make a RNN with overshooting. I was wondering if it's possible to extend your code to be able to represent a RNN with overshooting? Thanks!

Can the library support scheduled sampling?

hi, when I train a sequence 2 sequence model, i want to using a sampling mechanism proposed in this: http://arxiv.org/abs/1506.03099 . Does the library support it?

Example

Thanks for your code. Could you please update the code with real datasets?

any plan for attention seq2seq?

Thank you for great codes.

Do you have any plans to add 'attention' mechanism to your code as TensorFlow did.

Check https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/seq2seq.py#L351

Using LSTMEncoder raise ‘DisconnectedInputError(message)’

Hi！
When i using a Graph container, and add node LSTMEncoder, the code is:
layer = LSTMEncoder(input_dim=input_dim, output_dim=hidden_dim, state_input=False, \
return_sequences=True)
model.add_node(layer, name='end_encoder_layer', input='input1')
model.add_node(Dropout(dropout), name='end_encoder_drop', input='end_encoder_layer')

then, the DisconnectedInputError show up, the log like this:

Sequence-to-Sequence Viterbi/Beam Search

Currently the model always takes greedy decision for each step in the sequence path. During prediction, this is usually not optimal as the best path may not be the greedy path. Beam search is often used to improve the accuracy of sequence generating recurrent neural networks. See http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf for reference.

Any plans to implement this search? Or any good direction if I wanted to do this myself?

Attention model and masking

I've noticed that the attention-based model doesn't seem to be working with input masking:

model = Sequential()                                                                                                  
model.add(Embedding(10, 4, input_length=12, mask_zero=mask_zero))                                                         
s2s = AttentionSeq2seq(input_dim=4, input_length=12, hidden_dim=16, output_length=13, output_dim=11, depth=2)         
model.add(s2s)                                                                                                        
model.compile(loss='mse', optimizer='rmsprop')

This simplified model compiles fine with mask_zero=False, but gives a cryptic Theano error with mask_zero=True:

[...]
  File "/lib/python3.4/site-packages/theano/tensor/elemwise.py", line 146, in __init__
    (i, j, len(input_broadcastable)))
ValueError: new_order[2] is 2, but the input only has 2 axes.

Is this a bug, or is masking simply not possible with the attention-based model? I admit I didn't dive deeply into the theory of that model yet, but if masking is generally not possible with it, may I suggest adding a note to the README for other unwary experimenters like me? :)

'Seq2seq' object has no attribute 'layers'

Launching the vanila code from the documentation example leads to the following error:

/usr/bin/python2.7 /home/nicolas/Code/seq2seq/try.py
Traceback (most recent call last):
  File "/home/nicolas/Code/seq2seq/try.py", line 13, in <module>
    output_dim=embedding_dim, output_length=maxlen, batch_size=10, depth=4)
  File "/home/nicolas/Code/seq2seq/seq2seq.py", line 59, in __init__
    self.add(l)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/containers.py", line 35, in add
    self.layers.append(layer)
AttributeError: 'Seq2seq' object has no attribute 'layers'

What's wrong?

For the reference the code is

import keras
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from seq2seq import Seq2seq
from keras.preprocessing import sequence

vocab_size = 20000 #number of words
maxlen = 100 #length of input sequence and output sequence
embedding_dim = 200 #word embedding size
hidden_dim = 500 #memory size of seq2seq

embedding = Embedding(vocab_size, embedding_dim, input_length=maxlen)
seq2seq = Seq2seq(input_length=maxlen, input_dim=embedding_dim,hidden_dim=hidden_dim,
                  output_dim=embedding_dim, output_length=maxlen, batch_size=10, depth=4)

model = Sequential()
model.add(embedding)
model.add(seq2seq)

The program line mentions the pre-output to next-input

Hi! I have seen the pic in Readme, but I am not sure which part in lstm_decoder tells the process transferring the pre-output to next-input.

encoder and decoder issue for keras

Hi, could you tell me how do you implement equation : h_t = f(h_{t-1}, y_{t-1}, c) in keras. Thanks!

'Advanced' example (Seq2seq) not running

The README example for the advanced seq2seq model (below) is not running. It returns a TypeError: rnn() got an unexpected keyword argument 'mask'. The SimpleSeq2seq examples work fine for me.

import seq2seq
from seq2seq.models import Seq2seq
model = Seq2seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

AssertionError on compile

Following the fix for issue #20, the following example now works (using the latest Github versions of Seq2seq, Keras, and Theano):

from seq2seq.models import Seq2seq
model = Seq2seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)

But when one subsequently does

model.compile(loss='mse', optimizer='rmsprop')

the compilation fails with an AssertionError.

Here is the full traceback:

File "<ipython-input-2-30afb3e2b95c>", line 1, in <module>
    model.compile(loss='mse', optimizer='rmsprop')

File "C:\Anaconda3\lib\site-packages\keras\models.py", line 433, in compile
    self.y_train = self.get_output(train=True)

File "C:\Anaconda3\lib\site-packages\keras\layers\containers.py", line 128, in get_output
    return self.layers[-1].get_output(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 1116, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\decoders.py", line 194, in get_output
    x_t = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 964, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 28, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 639, in get_output
    X = self.get_input(train)

File "C:\Anaconda3\lib\site-packages\keras\layers\core.py", line 173, in get_input
    previous_output = self.previous.get_output(train=train)

File "C:\Anaconda3\lib\site-packages\seq2seq\layers\state_transfer_lstm.py", line 52, in get_output
    masking=masking)

File "C:\Anaconda3\lib\site-packages\keras\backend\theano_backend.py", line 461, in rnn
    go_backwards=go_backwards)

File "C:\Anaconda3\lib\site-packages\theano\scan_module\scan.py", line 745, in scan
    condition, outputs, updates = scan_utils.get_updates_and_outputs(fn(*args))

File "C:\Anaconda3\lib\site-packages\keras\backend\theano_backend.py", line 444, in _step
    output, new_states = step_function(input, states)

File "C:\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 434, in step
    assert len(states) == 2

AssertionError

pip install through ssh not working

The installation instructions in the Readme (install with pip) did not work for me using ssh.

sudo pip install git+ssh://github.com/farizrahman4u/seq2seq.git

I don't know what problem pip had with ssh, but I just replaced ssh with https and it all went smoothly. This was on a Ubuntu 15.10 machine, but verified with 14.04 on a different machine. Maybe there should be a pointer to this in the Readme.

workable example needed

Dear developers, thank you for your work!
It would be of great help to use your seq2seq implementation, however after a considerable amount of time and efforts I still can't do it due to the lack of documentation and examples. Keras docs don't tell much about seq2seq mapping either. So please add a simple workable code example that would demostrate the usage of your library: how to prepare the input data and how to implement the training and predicting procedures. Would be greatly appriciated.

Bug in SoftShuffle layer

model :

model = SoftShuffle(input_dim = 1, input_length = 10)
model.compile(loss='mse', optimizer='rmsprop')

Traceback (most recent call last):
File "s2.py", line 13, in
model.compile(loss='mse', optimizer='rmsprop')
File "build/bdist.macosx-10.6-intel/egg/keras/models.py", line 418, in compile
File "build/bdist.macosx-10.6-intel/egg/keras/backend/theano_backend.py", line 361, in function
File "build/bdist.macosx-10.6-intel/egg/keras/backend/theano_backend.py", line 354, in init
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/function.py", line 266, in function
profile=profile)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/pfunc.py", line 511, in pfunc
on_unused_input=on_unused_input)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/function_module.py", line 1466, in orig_function
defaults)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/function_module.py", line 1324, in create
input_storage=input_storage_lists)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/gof/link.py", line 519, in make_thunk
output_storage=output_storage)[:3]
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/gof/vm.py", line 897, in make_all
no_recycling))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 637, in make_thunk
import scan_perform_ext
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py", line 133, in
from scan_perform.scan_perform import *
File "numpy.pxd", line 155, in init theano.scan_module.scan_perform (/Users/abinsimon/.theano/compiledir_Darwin-14.3.0-x86_64-i386-64bit-i386-2.7.10-64/scan_perform/mod.cpp:7672)
ValueError: ('The following error happened while compiling the node', forall_inplace,cpu,scan_fn&scan_fn}(Shape_i{1}.0, Subtensor{int64:int64:int8}.0, IncSubtensor{InplaceSet;:int64:}.0, DeepCopyOp.0, IncSubtensor{InplaceSet;:int64:}.0, IncSubtensor{InplaceSet;:int64:}.0, Shape_i{1}.0, Shape_i{1}.0, <TensorType(float32, matrix)>, <TensorType(float32, vector)>, <TensorType(float32, matrix)>, <TensorType(float32, matrix)>, <TensorType(float32, vector)>, <TensorType(float32, matrix)>, <TensorType(float32, matrix)>, <TensorType(float32, vector)>, <TensorType(float32, matrix)>, <TensorType(float32, matrix)>, <TensorType(float32, vector)>, <TensorType(float32, matrix)>, InplaceDimShuffle{x,0}.0, InplaceDimShuffle{x,0}.0, InplaceDimShuffle{x,0}.0, InplaceDimShuffle{x,0}.0), '\n', 'numpy.dtype has the wrong size, try recompiling')

model.predict() for arbitrary number of sequences

After a successful model training goes prediction. Interestingly, current seq2seq implemenetation only lets you predict the same number of sequences as in your training batch... And in most cases I need a prediction only for one sequence at a time. Fix?

Code:

import numpy as np
from keras.models import Sequential
from seq2seq.seq2seq import Seq2seq


vocab_size = 10 #number of words
seq_maxlen = 3 #length of input sequence and output sequence
embedding_dim = 5 #word embedding size
hidden_dim = 50 #memory size of seq2seq
batch_size = 7

seq2seq = Seq2seq(input_length=seq_maxlen,
                  input_dim=embedding_dim,
                  hidden_dim=hidden_dim,
                  output_dim=vocab_size,
                  output_length=seq_maxlen,
                  batch_size=batch_size,
                  depth=1)

print 'Build model ...'
model = Sequential()
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

print 'Generate dummy train data ...'
train_examples_num = batch_size
X = np.zeros((train_examples_num, seq_maxlen, embedding_dim))
Y = np.zeros((train_examples_num, seq_maxlen, vocab_size))

for train_example_idx in xrange(train_examples_num):
    for word_idx in xrange(seq_maxlen):
        w2v_vector = np.random.rand(1, embedding_dim)[0]
        X[train_example_idx][word_idx] = w2v_vector

        bool_vector = np.zeros(vocab_size)
        bool_vector[np.random.choice(vocab_size)] = 1
        Y[train_example_idx][word_idx] = bool_vector

print X.shape, X
print Y.shape, Y

print 'Fit data ...'
model.fit(X, Y, batch_size=batch_size)

print 'Generate dummy predict data ...'
# predict_data = np.zeros((batch_size, seq_maxlen, embedding_dim))    # works this way
predict_data = np.zeros((1, seq_maxlen, embedding_dim))           # doesn't :(
for word_idx in xrange(seq_maxlen):
    w2v_vector = np.random.rand(1, embedding_dim)[0]
    predict_data[0][word_idx] = w2v_vector

print predict_data.shape, predict_data

model.predict(predict_data)

Log:

...
Generate dummy predict data ...
(1, 3, 5) [[[ 0.94333402  0.77712176  0.28421576  0.5250781   0.71062316]
  [ 0.40348298  0.18514321  0.42724825  0.30273698  0.70328719]
  [ 0.50133646  0.25419223  0.21284037  0.22480233  0.4796277 ]]]
Predict vectors ...
Traceback (most recent call last):
  File "/home/nicolas/Code/seq2seq/bin/try.py", line 53, in <module>
    print model.predict(predict_data)
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 499, in predict
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 257, in _predict_loop
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 672, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 661, in <lambda>
    self, node)
  File "scan_perform.pyx", line 356, in theano.scan_module.scan_perform.perform (/home/nicolas/.theano/compiledir_Linux-3.19--generic-x86_64-with-Ubuntu-15.04-vivid-x86_64-2.7.9-64/scan_perform/mod.cpp:3605)
  File "scan_perform.pyx", line 350, in theano.scan_module.scan_perform.perform (/home/nicolas/.theano/compiledir_Linux-3.19--generic-x86_64-with-Ubuntu-15.04-vivid-x86_64-2.7.9-64/scan_perform/mod.cpp:3537)
ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[1].shape[0] = 7)
Apply node that caused the error: Elemwise{mul,no_inplace}(<TensorType(int8, col)>, <TensorType(float64, matrix)>)
Inputs types: [TensorType(int8, col), TensorType(float64, matrix)]

Backtrace when the node is created:
  File "build/bdist.linux-x86_64/egg/seq2seq/lstm_encoder.py", line 94, in _step
    h_mask_tm1 = mask_tm1 * h_tm1

HINT: Use another linker then the c linker to have the inputs shapes and strides printed.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Apply node that caused the error: forall_inplace,cpu,scan_fn}(Elemwise{Composite{minimum(((i0 + i1) - i1), i2)}}.0, Subtensor{int64:int64:int8}.0, Subtensor{int64:int64:int8}.0, Subtensor{int64:int64:int8}.0, Subtensor{int64:int64:int8}.0, Elemwise{Cast{int8}}.0, IncSubtensor{InplaceSet;:int64:}.0, IncSubtensor{InplaceSet;:int64:}.0, <TensorType(float64, matrix)>, <TensorType(float64, matrix)>, <TensorType(float64, matrix)>, <TensorType(float64, matrix)>)
Inputs types: [TensorType(int64, scalar), TensorType(float64, 3D), TensorType(float64, 3D), TensorType(float64, 3D), TensorType(float64, 3D), TensorType(int8, (False, False, True)), TensorType(float64, 3D), TensorType(float64, 3D), TensorType(float64, matrix), TensorType(float64, matrix), TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(), (3, 1, 50), (3, 1, 50), (3, 1, 50), (3, 1, 50), (3, 1, 1), (1, 7, 50), (1, 7, 50), (50, 50), (50, 50), (50, 50), (50, 50)]
Inputs strides: [(), (400, 400, 8), (400, 400, 8), (400, 400, 8), (400, 400, 8), (1, 1, 1), (2800, 400, 8), (2800, 400, 8), (400, 8), (400, 8), (400, 8), (400, 8)]
Inputs values: [array(3), 'not shown', 'not shown', 'not shown', 'not shown', array([[[0]],

       [[1]],

       [[1]]], dtype=int8), 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown']

Question: Understanding AttentionDecoder Implementation.

I am trying to understand the code, but it's hard for me. This is my first project using keras.
To my understanding, the _step function can be divided in two parts:
Part1.- Compute the weighted observacion (v)

        s_tm1 = K.repeat(c_tm1, self.input_length)
        e = H + s_tm1

Is this a concatenation or just a regular sum? Now "e" is in the form of (nb_samples, nb_input_timesteps, hidden_dim+c_tm1_dim)?

       def a(x, states):
            output = K.dot(x, w_a) + b_a
            return output, []
        _, energy, _ = K.rnn(a, e, [], mask=None)
        energy = activations.get('linear')(energy) 
        energy = K.permute_dimensions(energy, (2, 0, 1))
        energy = energy[0]

Is the linear activation line really needed? It seems to return the same input. I don't understand why you permute dimmensions and then take the first element.

alpha = K.softmax(energy)

Can all this code from the K.rnn(a) till the K.softmax be expressed as a TimeDistributedDense(activation='softmax') over the sequence "e"?

        alpha = K.repeat(alpha, self.hidden_dim)
        alpha = K.permute_dimensions(alpha, (0, 2 , 1))
        weighted_H = H * alpha
        v = K.sum(weighted_H, axis=1)

Part 2.- Feed as that V as an input it to a standard LSTM (The code seems like the standard implementation of the LSTM, am I right ?)

Wrong imports

Please look at lstm_decoder.py, line 151:
base_config = super(FeedbackLSTM, self).get_config() (FeedbackLSTM never defined)

and lstm_encoder, line 143:
base_config = super(LSTM, self).get_config() (LSTM never defined)

change keras required version to 0.3.2

hi,

for some reason when I try to install it via pip it downgrade keras to 0.3.1 and it does not seem to work well with it, because keras.backend.rnn has argument "masking" in pip's keras 0.3.1 and you are using "maks" as in new 0.3.2

Ben

lstms list in AttentionSeq2seq

Could you explain the point of using the lstms list in the constructor of AttentionSeq2seq?

Some question regarding seq2seq library

Hi, thanks for sharing the wonderful library which can greatly help me implement a seq2seq model. However, while using the library, I found some problems.

1.The examples in seq2seq package uses mse loss function while in my experiments, I used categorical_crossentropy loss because the output layer is a softmax over vocabulary. To my surprise, the model using categorical_crossentropy performs poorly and the loss is very large. But if I change loss function to mse, as the example codes do, the model performs well. I'm curious why mse loss function performs better than categorical_crossentropy.

2.I found that a 1-layer(depth) seq2seq runs more slowly than a 3-layer(depth) seq2seq model, which is very strange. The time for training a batch in a 1-layer seq2seq is about 2000 seconds while training one batch in a 3-layer seq2seq only needs about 950 seconds. The model is as follows:

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 100, input_length=maxlen, mask_zero=True,batch_input_shape=(batch_size,maxlen)))
seq2seq = Seq2seq(batch_input_shape=(batch_size,maxlen,100),
hidden_dim=50,
output_dim=nb_classes,
output_length=maxlen,
depth=(1,1)) #depth=(3,3)
model.add(seq2seq)

Could you help me?

Thanks,
Tao

AssertionError: Keyword argument not understood: batch_size

Hello,I have met some error "AssertionError: Keyword argument not understood: batch_size",
Is it not right to pass the argument batch_size to Seq2seq?
code is
`
import keras
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from seq2seq.models import Seq2seq
from keras.preprocessing import sequence

vocab_size = 20000 #number of words
maxlen = 100 #length of input sequence and output sequence
embedding_dim = 200 #word embedding size
hidden_dim = 500 #memory size of seq2seq

embedding = Embedding(vocab_size, embedding_dim, input_length=maxlen)
seq2seq = Seq2seq(input_length=maxlen, input_dim=embedding_dim,hidden_dim=hidden_dim,
output_dim=embedding_dim, output_length=maxlen, batch_size=10, depth=4)

model = Sequential()
model.add(embedding)
model.add(seq2seq)`

Model Taking an exceedingly long time to compile/run

Hey, I'm trying to run your model on tensors of the shape
X=(16041, 100, 2) and Y=(16041, 50, 2)
The model is taking upwards of hours to compile (I have yet to get it to successfully compile) on FAST_RUN

on FAST_COMPILE the model takes 3 million seconds + for one epoch. Is this normal behavior? Do you have an example of the code working on a real dataset in a timely fashion? Thanks!

AttentionSeq2seq Shape Mismatches

I have x_train.shape = (12097, 5, 1) and y_train.shape=(12097, 10, 1).

I initialize a model:
model = AttentionSeq2seq(input_dim=1, input_length=x_train.shape[1], hidden_dim=64, output_length=y_train.shape[1], output_dim=1)
model.compile(loss='mse', optimizer='rmsprop')

I try to fit: model.fit(x_train, y_train,nb_epoch=1,batch_size=16,verbose=1)

And immediately I get the error:
Shape mismatch: x has 64 cols (and 16 rows) but y has 1 rows (and 64 cols)

For some reason the batch size isn't being applied to y samples.

Similar thing happens with Seq2seq model, where I specify input_batch_size = (16,x_train.shape[1],1), except the error comes at the end of the epoch, again having to do with the dimensionality of y.

Are my data or model initialization wrong? Worked fine with SimpleSeq2seq

'non_trainable_weights' and most recent Keras update

With the most recent updates to Keras and seq2seq, I am getting the following error when loading weights:

In [19]: model.load_weights('model_WSpark_weights.h5')
--------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-ad9285e31961> in <module>()
----> 1 model.load_weights('model_WSpark_weights.h5')

/home/ubuntu/miniconda3/lib/python3.5/site-packages/Keras-0.3.1-py3.5.egg/keras/models.py in load_weights(self, filepath)
    851             g = f['layer_{}'.format(k)]
    852             weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
--> 853             self.layers[k].set_weights(weights)
    854         f.close()
    855 

/home/ubuntu/miniconda3/lib/python3.5/site-packages/Keras-0.3.1-py3.5.egg/keras/layers/containers.py in set_weights(self, weights)
    156     def set_weights(self, weights):
    157         for i in range(len(self.layers)):
--> 158             nb_param = len(self.layers[i].trainable_weights) + len(self.layers[i].non_trainable_weights)
    159             self.layers[i].set_weights(weights[:nb_param])
    160             weights = weights[nb_param:]

AttributeError: 'Bidirectional' object has no attribute 'non_trainable_weights'

Use AttentionDecoder as a vanilla AttentionLSTM

Is it possible to use the class AttentionDecoder (because there is no AttentionEncoder afaik) as a standard LSTM equipped with Attention ?
Assuming the (h_i) play the role of input observations ?

Would it work ?
Thank you !

Question: Using AttentionDecoder without an encoder

Hello,

Reading the paper "Neural Machine Translation by Jointly learning to transcribe and align" I thought that the only input required by the Decoder was a sequence of annotations.
Am I supposed to be able to use your AttentionDecoder without an encoder? I mean, I have a set of annotations comming from a CNN similar to the paper "Show, Attend and Tell: Neural Image Caption
Generation with Visual Attention". There is no "hidden state" from the encoder to initialize de decoder. Also I had to force the input_shape parameter (even if its not the first layer in the model).

I have a layer whose output_shape is (n_timesteps, feature_dimmension) and I would like to add an
AttentionDecoder layer (output_length=16).
Before my final layer that is a TimeDistributedDense(nb_classes,activation='softmax')

Looking at your attention model example I see you create the attentiondecoder layer like this:
AttentionDecoder(hidden_dim=hidden_dim, output_length=output_length, state_input=False, **kwargs)

In my code I tried to use:
model.add(AttentionDecoder(hidden_dim=128, output_length=16, state_input=False))
that fails with an error:
Traceback (most recent call last):
File "", line 1, in
File "./seq2seq/seq2seq/layers/decoders.py", line 44, in init
super(LSTMDecoder, self).init(**kwargs)
File "./seq2seq/seq2seq/layers/state_transfer_lstm.py", line 13, in init
super(StateTransferLSTM, self).init(**kwargs)
TypeError: init() takes at least 2 arguments (1 given)

Then I tried to specify the input_shape
model.add(AttentionDecoder(hidden_dim=128, output_length=16, state_input=False,input_shape=(1,model.layers[-1].output_shape[-1])))

This allows me to compile the model, but then it fails on training due to a shape mismatch on matrices. It seems related to the difference between input shape and hidden_dim.
What would be the right way to insert an attentiondecoder into this kind of setup?

Invalid layer: SimpleSeq2seq

when I try to use model_from_json(open(NN_MODEL_PATH1).read() to read the model from the model text. i have this trouble . thanks for your time.

Implementation question

in your Seq2Seq, to layers you append LSTMEncoders which return sequences, then the final LSTMEncoder, then a Dense layer. After that you add an LSTMDecoder just to then add more layers of LSTMEncoders. Why are you appending the encoders after the decoder?
in the LSTMDecoder class you implement LSTM but you add an additional transformation x_t = K.dot(h_t, w_x) + b_x which is like adding a second layer, why not use the hidden output (h_t) directly?
if you stacked layers of these decoders you would not achieve to feed the output of the highest decoder layer back to the lowest, what would happen is that in each layer the decoder output is fed back to its own input. Is this a feature or a bug?

dimension mismatch error with attention model

Hi, I got an dimension mismatch error with attention model.

my model is:
model.add(embeddings.Embedding(input_dim=len(chars)+1, output_dim=EMB_DIM, input_length=MAXLEN)) seq2seq = AttentionSeq2seq(input_dim=EMB_DIM, input_length=MAXLEN, hidden_dim=HIDDEN_SIZE, output_length=MAXLEN, output_dim=len(chars), depth=1) model.add(seq2seq)
in the model:
len(chars) = 1408, HIDDEN_SIZE=1000

and got the error:
ValueError: dimension mismatch in args to gemm (100,1000)x(1408,1000)->(100,1000)

then I try to make the HIDDEN_SIZE equal to len(chars), then model can run. is this a parameter error in attention model?

AttributeError: 'list' object has no attribute 'shape'

model.save_weights() doesn't work.

Code:

import numpy as np
from keras.models import Sequential
from seq2seq.seq2seq import Seq2seq


vocab_size = 10 #number of words
maxlen = 3 #length of input sequence and output sequence
embedding_dim = 5 #word embedding size
hidden_dim = 50 #memory size of seq2seq
batch_size = 7

seq2seq = Seq2seq(input_length=maxlen,
                  input_dim=embedding_dim,
                  hidden_dim=hidden_dim,
                  output_dim=vocab_size,
                  output_length=maxlen,
                  batch_size=batch_size,
                  depth=1)

print 'Build model ...'
model = Sequential()
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

model_full_path = "./saved_model.pkl"
model.save_weights(model_full_path, overwrite=True)

Log:

Traceback (most recent call last):
  File "/home/nicolas/Code/seq2seq/bin/try.py", line 26, in <module>
    model.save_weights(model_full_path, overwrite=True)
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 555, in save_weights
AttributeError: 'list' object has no attribute 'shape'

RuntimeError: maximum recursion depth exceeded while calling a Python object

When I run Seq2seq model with attention，this error appears.

Layer Attribute Error when building seq2seq.py script

Hey Fariz,

I'm having a basic error that I think should be easy to resolve. When I use seq2seq.py, I keep getting the following error: Note that this occurs when I define the seq2seq model:

seq2seq = Seq2seq(
    input_length=x_maxlen*x_sent_len, 
    input_dim=word2vec_dimension,
    hidden_dim=hidden_variables_encoding,
    output_dim=y_matrix_axis, 
    output_length=y_sent_len*2,
    batch_size = batch_size/2, 
    depth=4)

AttributeError: 'Seq2seq' object has no attribute 'layers'

This occurs on line 61, when you do self.add(l).

Before line 61, if I write in:

self = Sequential()

It resolves the error, but then I can't add any more layers after the seq2seq model on my main script. Again I'm faced with the same error:

AttributeError: 'Seq2seq' object has no attribute 'layers'

Thanks alot! I feel that I should be able to figure this out, but I"m just not getting it.

Seq2seq batch input shape problem

With the latest Github versions of Keras and Seq2seq, the following example

from seq2seq.models import Seq2seq
model = Seq2seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)

returns an error message:

Exception: Invalid input shape - Layer expects input ndim=2, was provided with input shape (16, 7, 5)

If input_dim or input_shape are specified in addition to batch_input_shape (or instead of it), it returns

Exception: If a RNN is stateful, a complete input_shape must be provided (including batch size).

does the sequence y support embeeding and mask？

Seq2seq is broken: StateTransferLSTM.get_output needs to be updated to changes in LSTM

missing these lines:
https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L187-L188

Issue with SoftShuffle

Could not compile the model

    model = SoftShuffle(input_dim = 1, input_length = 10)
    model.compile(loss='mse', optimizer='rmsprop')

IndexError: index 3 is out of bounds for axis 0 with size 3

Dear humans,
either I'm missimg something or there is a bug in Fariz's seq2seq implementation. What's your bet?

Code:

import numpy as np
from keras.models import Sequential
from seq2seq.seq2seq import Seq2seq


vocab_size = 10 #number of words
maxlen = 3 #length of input sequence and output sequence
embedding_dim = 5 #word embedding size
hidden_dim = 50 #memory size of seq2seq
batch_size = 7

seq2seq = Seq2seq(input_length=maxlen,
                  input_dim=embedding_dim,
                  hidden_dim=hidden_dim,
                  output_dim=vocab_size,
                  output_length=maxlen,
                  batch_size=batch_size,
                  depth=1)

print 'Build model ...'
model = Sequential()
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

print 'Generate dummy data ...'
train_examples_num = batch_size
X = np.zeros((train_examples_num, maxlen, embedding_dim))
Y = np.zeros((train_examples_num, maxlen, vocab_size))

for train_example_idx in xrange(train_examples_num):
    for word_idx in xrange(maxlen):
        w2v_vector = np.random.rand(1, embedding_dim)[0]
        X[train_example_idx][word_idx] = w2v_vector

        bool_vector = np.zeros(vocab_size)
        bool_vector[np.random.choice(vocab_size)] = 1
        Y[train_example_idx][word_idx] = bool_vector

print X.shape, X
print Y.shape, Y

print 'Fit data ...'
model.fit(X, Y)

Log:

Build model ...
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Generate dummy data ...
(7, 3, 5) [[[ 0.88256245  0.77128004  0.11195964  0.0918089   0.57508599]
  [ 0.23917097  0.69324662  0.5007371   0.08220433  0.25601655]
  [ 0.31997691  0.96215576  0.37516188  0.4564258   0.44645146]]

 [[ 0.1164474   0.68527686  0.48801347  0.06237132  0.64461641]
  [ 0.21418609  0.56414103  0.69280567  0.09577648  0.46501309]
  [ 0.59522824  0.82593701  0.8952664   0.61032139  0.60784708]]

 [[ 0.50277342  0.18204284  0.6920746   0.23992536  0.5031889 ]
  [ 0.24719549  0.39098328  0.84927183  0.93091596  0.93981078]
  [ 0.76817661  0.68241358  0.97509582  0.78777374  0.41076285]]

 [[ 0.83762506  0.76151013  0.06292322  0.71097064  0.77048028]
  [ 0.78948919  0.77401108  0.39082489  0.66905667  0.54795132]
  [ 0.74940861  0.26011439  0.23257989  0.87033028  0.88954607]]

 [[ 0.98032484  0.29076576  0.76085615  0.53828208  0.92028479]
  [ 0.81111357  0.52959467  0.41101679  0.39434533  0.47918241]
  [ 0.18741232  0.68735943  0.27534715  0.18796185  0.89010293]]

 [[ 0.00484476  0.38136868  0.55200039  0.36352682  0.65304447]
  [ 0.19502873  0.86442676  0.82170956  0.90937185  0.93152998]
  [ 0.71814645  0.47181875  0.99475651  0.24588243  0.13357496]]

 [[ 0.92716079  0.83195725  0.50047687  0.86742848  0.27778597]
  [ 0.90902709  0.60421839  0.17206286  0.53972434  0.9863197 ]
  [ 0.63227496  0.14045515  0.88635036  0.72415621  0.88298206]]]
(7, 3, 10) [[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
  [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
  [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
  [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]]

 [[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
  [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]]]
Fit data ...
Epoch 1/100
Traceback (most recent call last):
  File "/home/nicolas/Code/seq2seq/bin/try.py", line 43, in <module>
    model.fit(X, Y)
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 495, in fit
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 216, in _fit
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 768, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/subtensor.py", line 2088, in perform
    out[0] = inputs[0].__getitem__(inputs[1:])
IndexError: index 3 is out of bounds for axis 0 with size 3
Apply node that caused the error: AdvancedSubtensor(Subtensor{int64::}.0, Subtensor{int64}.0, Subtensor{int64}.0)
Inputs types: [TensorType(float64, 3D), TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(3, 7, 10), (21,), (21,)]
Inputs strides: [(560, 80, 8), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']

Backtrace when the node is created:
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 75, in weighted
    filtered_y_pred = y_pred[weights.nonzero()[:-1]]

Repo with the source for your convenience: https://github.com/nicolas-ivanov/seq2seq

Decoding custom vector

I'm trying to get an output of a decoder for a specific input vector.

Layers are:

[<seq2seq.layers.encoders.LSTMEncoder at 0x7fb57c8c1810>,
 <keras.layers.core.Dropout at 0x7fb57c8c1990>,
 <seq2seq.layers.encoders.LSTMEncoder at 0x7fb57b29dc90>,
 <keras.layers.core.Dropout at 0x7fb57c7e1a50>,
 <keras.layers.core.Dense at 0x7fb57a3b5cd0>,
 <seq2seq.layers.decoders.LSTMDecoder at 0x7fb57b29dd50>,
 <seq2seq.layers.encoders.LSTMEncoder at 0x7fb57c80a790>,
 <keras.layers.core.Dropout at 0x7fb57c80a610>,
 <keras.layers.core.TimeDistributedDense at 0x7fb57c80a4d0>]

and the code is:

import seq2seq

model = Seq2seq(batch_input_shape=(16, 10, 5), hidden_dim=10, output_length=10, output_dim=5, depth=2)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# .. some data processing
model.fit(hot_input_tensor, hot_input_tensor, nb_epoch=100, verbose=0) # it fits just fine

encoding = model.layers[4].get_output(train=False).eval({model.layers[0].input: to_encode}) # works fine
to_decode = encoding

print model.layers[-1].get_output(train=False).eval({model.layers[5].get_input(): to_decode}) # fails

decoder = theano.function([model.layers[5].input], model.get_output(train=False), on_unused_input='warn') # fails too
print decoder([to_decode])

with error theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: Elemwise{add,no_inplace}.0. and if I set higher level of verbosity it gives MissingInputError: ("An input of the graph, used to compute DimShuffle{1,0,2}(<TensorType(float32, 3D)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, 3D)>)

but the same thing works just fine for ordinary keras models:

model = Sequential()
model.add(Dense(10, input_dim=2, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(3, init='uniform'))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd)

model.fit(X_train, y_train,
          nb_epoch=20,
          batch_size=16,
          show_accuracy=True,
          verbose=0)

print model.evaluate(X_test, y_test, batch_size=16)

import numpy as np
slice_id = 3
reprenestation = model.layers[slice_id].get_output(train=False) \
                                       .eval({model.layers[0].input: X_train.astype(np.float32)})

print reprenestation.shape
model.layers[-1].get_output(train=False).eval({model.layers[slice_id+1].get_input(): reprenestation})

full code here: https://gist.github.com/MInner/a5c88e6f31ab6fc0ee00

Python 3 support

There's some small issues in bidirectional.py (nothing too bad, python 3's pickle library is pickle instead of cPickle and the division in some of the indexing doesn't work because it needs to be integers) that make it so the code doesn't run in python 3. I have this fixed on a local copy (should work for both 2 and 3), do you accept PRs?

RuntimeError: maximum recursion depth exceeded while calling a Python object

Hi, thanks for sharing the great code. I met an error when using the AttentionSeq2seq model.The error information:

Traceback (most recent call last):
File "train.py", line 182, in
model.compile(loss='mse', optimizer='adam')
File "build/bdist.linux-x86_64/egg/keras/models.py", line 467, in compile
File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 250, in get_updates
File "build/bdist.linux-x86_64/egg/keras/optimizers.py", line 47, in get_gradients
File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 402, in gradients
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 545, in grad
grad_dict, wrt, cost_name)
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1283, in _populate_grad_dict
rval = [access_grad_cache(elem) for elem in wrt]
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
term = access_term_cache(node)[idx]
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 1241, in access_grad_cache
term = access_term_cache(node)[idx]
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gradient.py", line 951, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
...
...

File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
rval += local_traverse(inp, x)
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
rval += local_traverse(inp, x)
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
rval += local_traverse(inp, x)
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1030, in local_traverse
rval += local_traverse(inp, x)
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 1023, in local_traverse
if equal_computations([graph], [x]):
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_utils.py", line 410, in equal_computations
if x not in in_xs and x.type != y.type:
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/gof/utils.py", line 127, in ne
return not self == other
File "/users2/qfzhu/anaconda/lib/python2.7/site-packages/theano/tensor/type.py", line 260, in eq
return type(self) == type(other) and other.dtype == self.dtype
RuntimeError: maximum recursion depth exceeded while calling a Python object

When I use the Seq2seq layer, the model works well. However when i changes to the AttentionSeq2seq layer, i meet this error.Here is my model with AttentionSeq2seq:

model = Sequential()
model.add(embeddings.Embedding(input_dim=len(chars)+1, output_dim=EMB_DIM, input_length=MAXLEN))
seq2seq = AttentionSeq2seq(input_dim=EMB_DIM, input_length=MAXLEN, hidden_dim=HIDDEN_SIZE, output_length=MAXLEN, output_dim=len(chars), depth=1)
model.add(seq2seq)
model.compile(loss='mse', optimizer='adam')

And I am using the keras=0.3.1, numpy=1.10.4, seq2seq=0.0.2

Besides, I changed the parameter "masking=False" to "mask=None" in the decoders.py, due to an unexpected param error, I don't know if it is the reason of the error mentioned before.

Thanks a lot

Handling of depth=1 in SimpleSeq2seq is inconsistent with output_dim

In all other cases (depth>1 or Seq2Seq) the usage of output_dim is to define an output generated from the LSTM hidden size using Dropout and TimeDistributedDense

However in the case of SimpleSeq2seq when depth happens to be 1 the output_dim is used to replace the hidden size on the ouput RNN.

I tripped over this because I feed a large number to ouput_dim (vocabulary size) and that required huge matrix to handle an RNN which such a huge hidden size which cause the code to hang (when using theano+gpu)
In any case this was not what I expected...

add tensorflow support

tensorflow is not supported, but keras supports tensorflow

Batch Size in Seq2Seq Model Causing Mismatch Dimension Error

Hey Fariz, I've been really struggling with this error, and frankly, I'm not sure how to address it.

You give a batch_size argument for your seq2seq model, and I'm wondering how this batch_size should relate to the batch_size parameter of model.fit

When I make batch_size of model.fit and seq2seq equivalent: I get the following error:

IndexError: index 4 is out of bounds for axis 1 with size 4

I have tried making the batch_size of seq2seq one less and one more than the model.fit batch_size and I'm met with this error:

Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 3, but the output's size on that axis is 4.

What exactly the difference between the batch_size of seq2seq and batch_size of model.fit()? Any help is greatly appreciated!

Model compile Error with the latest keras

I use the latest keras, but there is an error when compiling the model.
`File "C:\Users\Hisense\Desktop\Testseq2seq\lib\nn_model\model.py", line 33, in get_nn_model
model.compile(loss='mse', optimizer='rmsprop')

File "D:\Anaconda2\lib\site-packages\keras\models.py", line 408, in compile
self.y_train = self.get_output(train=True)

File "D:\Anaconda2\lib\site-packages\keras\layers\containers.py", line 128, in get_output
return self.layers[-1].get_output(train)

File "D:\Anaconda2\lib\site-packages\keras\layers\core.py", line 1101, in get_output
X = self.get_input(train)