keras-team / keras

Deep Learning for humans

Home Page: http://keras.io/

License: Apache License 2.0

Python 99.95% Shell 0.05%

data-science deep-learning jax machine-learning neural-networks python pytorch tensorflow

keras's People

vasili-zolotov cosmith khalednasr adelq wgmueller1 nagadomi thebennos npow domenicosolazzo kedarbellare chrish42 sarvex bussiere ml-ai-nlp-ir jack-qiu cfandy zach14c zclfly snazz2001 jmrinaldi untom alouisos aleju vickkyy simudream charlesollion aptr322 wavelets ayoubkw capybaralet shyamalschandra dengcy028 hashsolo jvarley fduwjj dnuffer jfsantos song-tu 3dconv chagge nkhuyu logpie fdoperezi felsen chenglongchen mindis josephwinston shuimu mizdler flyingdisc xiaozhouwang kaynewest gregbowyer adi12 farukgencel haychris amos-zq voidexception daishichao beronx86 luis-wang zhoujialinmumu qiuyuew zhangyuancv lizhen-dlut lizhangzhan provemyself yliuhb xuzhuoran0106 amoliu vincent-poon trialanderror3481 xsongx twistedmove kaishengyao sungsingsong tigerneil sleepingkoala yangspeaking maoyuzhao davidwang8088 gunugantiaditya369 alexsisu txd866 ririw weilinear nshrk hkyang hanjun-dai bendalexis rtvt123 davideberti brakaus1 magiczhao nimmen defaultrobot gchrupala hengqujushi alihalabyah gaoyuankidult

keras's Issues

Does this support validation set while training to see generalization bound of the model?

Extract weight matrix

Is it possible to extract the weight matrix from a network?

Not sure if this is how it's supposed to work (it seems to be a Theano issue), but Embedding layers do not work in batch mode. They work fine as first layers in recurrent net builds, or with batch size = 1 in pure feedforward ones, but setting the batch size to 16 with an embedding layer included yields the following error:

ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[2].shape[1] = 16)
Apply node that caused the error: Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)](Reshape{3}.0, InplaceDimShuffle{x,x,0}.0, InplaceDimShuffle{x,0,1}.0)
Inputs types: [TensorType(float64, 3D), TensorType(float64, (True, True, False)), TensorType(float64, (True, False, False))]
Inputs shapes: [(16, 1, 1), (1, 1, 1), (1, 16, 1)]
Inputs strides: [(8, 8, 8), (8, 8, 8), (128, 8, 8)]
Inputs values: ['not shown', array([[[ 0.]]]), 'not shown']

Accessing internal states

Hey guys, cool project. The theano interface itself was really horrific and off-putting.

Maybe I'm doing it wrong but is there any way to access the activations of different layers? Similar to predict but only computed half way. Would be really useful for analysis and the likes.
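
To make the request concrete, here is a minimal sketch of "predict, but only computed halfway". It assumes a model is nothing more than an ordered list of layer callables; `forward_until` and the toy layers are hypothetical names used for illustration and are not part of the Keras API.

```python
def forward_until(layers, x, n_layers):
    """Apply only the first n_layers layers and return the intermediate activation."""
    if not 0 <= n_layers <= len(layers):
        raise ValueError("n_layers must be between 0 and len(layers)")
    for layer in layers[:n_layers]:
        x = layer(x)
    return x


# Toy "layers" operating on a list of floats (hypothetical stand-ins).
def scale(x):
    return [2.0 * v for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def total(x):
    return [sum(x)]


layers = [scale, relu, total]
hidden = forward_until(layers, [-1.0, 3.0], 2)  # activation after scale + relu
```

Calling `forward_until(layers, x, len(layers))` reproduces the full forward pass, so the same helper covers both analysis and ordinary prediction in this toy setting.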

Connecting one layer with two other layers

Hi,

Can one create a layer with this library that is connected to two other layers and not only to one?

For example - one can apply a conv and then max pooling on an image and call this layer 1 and then apply only a conv on the original image and call this layer 2. Now we can create a fully connected layer that will be connected to layer1 and layer2. Therefore the network is not linear and can be any kind of a directed acyclic graph.

Thank you!

Add interrupt handlers

Since many experiments can take a while and will be running on an environment where the user does not have a lot of control (e.g., a shared cluster), it would be interesting to have interrupt handlers to do something in case the operating system sends a signal to kill the process during the execution of the fit method. Blocks does this by using the signal module (which is part of the standard library). That way, you can save the current model state (using pickle, for example) before letting the OS kill your process. Would you be interested in adding something similar to the fit method in model.py?
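
A rough sketch of the idea, using only the standard library `signal` and `pickle` modules. The `StopFlag` class and the toy `fit` function are hypothetical; the dict `state` stands in for a model, and nothing here reflects how Keras or Blocks actually implement it.

```python
import pickle
import signal


class StopFlag:
    """Records that a termination signal arrived, so the loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only set a flag here; the training loop decides when to save and stop.
        self.requested = True

    def install(self, signum=signal.SIGTERM):
        # signal.signal must be called from the main thread.
        signal.signal(signum, self.handler)


def fit(state, nb_epoch, stop_flag, checkpoint_path):
    """Toy training loop: `state` is a plain dict standing in for a model."""
    for epoch in range(nb_epoch):
        state["epoch"] = epoch  # placeholder for one epoch of real training
        if stop_flag.requested:
            with open(checkpoint_path, "wb") as f:
                pickle.dump(state, f)
            break
    return state
```

In use, one would create a `StopFlag`, call `install()` once before training, and pass the flag to `fit`. Checking the flag at epoch (or batch) boundaries avoids pickling a half-updated model.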

Activation penalties

I am just starting to explore Keras and, if I understand the layout, it seems like penalties/constraints are not really abstracted to the extent that other concepts are. Is there some obvious reason this would not work or would be dangerous?

For example, I could imagine applying generic penalties to either weights or the activations. Like a sparsity inducing KL penalty that I typically want to apply to activations. If it was fully abstracted, I could try to apply it to the weights of some layer. This would be strange but it seems like it would maximize modularity and separation of concepts.

It seems like PR #77 is moving toward a kind of specialized penalty, and there is already an L2/L1 penalty in the optimizers.
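
For reference, the sparsity-inducing KL penalty mentioned above can be written as a plain function of per-unit mean activations. This is a sketch under assumptions: the function name, the `target` default, and the clamping constant `eps` are illustrative choices, and activations are assumed to lie in (0, 1), as with sigmoid units.

```python
import math


def kl_sparsity_penalty(mean_activations, target=0.05, eps=1e-8):
    """Sum over units of KL(target || rho_hat), rho_hat being a unit's mean activation."""
    penalty = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)  # keep the logarithms finite
        penalty += target * math.log(target / rho_hat)
        penalty += (1.0 - target) * math.log((1.0 - target) / (1.0 - rho_hat))
    return penalty
```

Because the function only sees a list of numbers, nothing in it is tied to activations specifically, which is the kind of separation the issue is asking about.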

any plan to add the support of maxout

It seems that maxout works well for many cases.
And you can find the definition here: http://techtalks.tv/talks/maxout-networks/58135/

Autoencoder Architecture

Given the discussion of weight initializations, any opinions on how an autoencoder architecture should be added?

- Stopping conditions for each in a pre-created set of layers [vs.] manual saving -> layer addition -> compile -> start next level of training?
- Noise addition (perhaps as a layer) over a distribution?
- Used only as a pretraining device [vs.] allow backpropagation to create an encoder + decoder?

How can I compute a meaningful clip_norm threshold for my particular Network?

Model training diverges after some level ?

Here is my training output with a Sequential model. As you can see, the model diverges from epoch 10 onward. Any ideas about the reason?

Epoch 0
61878/61878 [==============================] - 5s - loss: 1.1788
Epoch 1
61878/61878 [==============================] - 5s - loss: 1.0403
Epoch 2
61878/61878 [==============================] - 5s - loss: 0.9919
Epoch 3
61878/61878 [==============================] - 5s - loss: 0.9397
Epoch 4
61878/61878 [==============================] - 5s - loss: 0.8915
Epoch 5
61878/61878 [==============================] - 4s - loss: 0.8484
Epoch 6
61878/61878 [==============================] - 5s - loss: 0.8145
Epoch 7
61878/61878 [==============================] - 5s - loss: 0.7909
Epoch 8
61878/61878 [==============================] - 4s - loss: 0.7627
Epoch 9
61878/61878 [==============================] - 5s - loss: 0.7407
Epoch 10
61878/61878 [==============================] - 6s - loss: 13.3614
Epoch 11
61878/61878 [==============================] - 5s - loss: 26.6396
Epoch 12
61878/61878 [==============================] - 5s - loss: 26.6453
Epoch 13
61878/61878 [==============================] - 5s - loss: 26.6462
Epoch 14
61878/61878 [==============================] - 5s - loss: 26.6461
Epoch 15
61878/61878 [==============================] - 5s - loss: 26.6470
Epoch 16
61878/61878 [==============================] - 5s - loss: 26.6468
Epoch 17
61878/61878 [==============================] - 5s - loss: 26.6465
Epoch 18
61878/61878 [==============================] - 3s - loss: 26.6468
Epoch 19
61878/61878 [==============================] - 3s - loss: 26.6469

Rename Time Distributed dense/softmax

I'm thinking that "Temporal" is a better prefix than "Time Distributed". It fits most papers better, and means essentially the same thing.

Create a setup.py

Awesome library. I've been looking for something with an LSTM that's this simple for some time. I can only seem to run the scripts from within the keras folder. I added the location for the keras directory, downloaded from git, to my sys.path and I can't import the keras module.

cifar10.py - imports cPickle error

Line 6:

 import six.moves.cPickle

The following code changes fixed the issue for me:
Code change at Line 6:

  from six.moves import cPickle

and at Line 20:

  d = cPickle.load(f)

How to save and load model?

Hello,
Thank you for this module, it looks awesome.
I am using the cifar10_cnn example. Is there any efficient way to save and load the trained network?

I tried to use cPickle on model but I hit the "Maximum recursion depth" error...

preprocessing utils would greatly benefit from sklearn

These preprocessing utils would greatly benefit from a fast Cython rewrite.

Preprocessing utils would greatly benefit from sklearn.feature_extraction.text, no?
Or do you want to keep dependencies low and have more fine-grained vectorization?

Adding Batch Size as explicit parameter for Batch Normalization layer

I think it would be much more clear and easy to have a "batch size" parameter separately for Batch Normalization layer. We can just directly pass the outputs of our convolution or pooling layers to it. The layers as a whole will be more coherent.
(As an aside, did anyone have any luck with batch normalization? I tried many times, but actually got worse results most of the time.)

no pip yet?

tried to install keras with pip install keras, but what I got looks like this:

Collecting keras
  Could not find a version that satisfies the requirement keras (from versions: )
  No matching distribution found for keras

pooling size > stride

Hey,

Last I checked, theano did not support max-pooling op with size > stride. For example:

MaxPooling2D(poolsize=(3, 3), poolstride=(2,2))

Does keras support it using the cudnn backend?

btw, great work guys!
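
To pin down what "size > stride" means, here is a small pure-Python sketch of overlapping max pooling in one dimension. It is an illustration only (the function name is hypothetical) and says nothing about what Theano or cuDNN support.

```python
def max_pool_1d(values, pool_size, stride):
    """Max pooling over a 1-D list, keeping only windows that fit entirely."""
    if pool_size <= 0 or stride <= 0:
        raise ValueError("pool_size and stride must be positive")
    # Number of complete windows; zero when the input is shorter than the pool.
    n_out = max((len(values) - pool_size) // stride + 1, 0)
    return [max(values[i * stride:i * stride + pool_size]) for i in range(n_out)]


# With pool_size=3 and stride=2, neighbouring windows share one element.
pooled = max_pool_1d([1, 5, 2, 4, 3, 0, 6], pool_size=3, stride=2)
```

The 2-D case with `poolsize=(3, 3)` and `poolstride=(2, 2)` applies the same windowing along both axes.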

early stopping

Does Keras support early stopping right now? I have tried to implement this feature myself, but I would like to know whether the library supports it natively.
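
For readers implementing this by hand, a patience-based rule is one common formulation. The sketch below is framework-independent; `EarlyStopper` and `epochs_run` are hypothetical names, and the validation losses are assumed to be computed elsewhere.

```python
class EarlyStopper:
    """Signals a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def epochs_run(val_losses, patience=2):
    """Return how many epochs would run on a precomputed validation-loss curve."""
    stopper = EarlyStopper(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)
```

In a real loop, `should_stop` would be called once per epoch with the loss measured on a held-out validation set, and the best weights would typically be saved whenever `wait` is reset.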

Can i add a dropout after the input layer?

Problem with return_sequences=True

Hey guys so I'm trying to feed a bunch of 2d (batches,seq_len) indices of text sequences into the model in an attempt to try to predict the next word. This leads to a 3d output of (batches,seq_len,vocab_size) time distributed softmax.

model = Sequential()
model.add(Embedding(vocabsize,128)) 
model.add(GRU(128, 128, return_sequences=True))
model.add(Reshape(seqlen,128))
model.add(Dense(128,vocabsize))
model.add(Reshape(seqlen,vocabsize))
model.add(Activation('time_distributed_softmax'))

So far so good. Running the model.predict_probas(sequences) yields a (batches,seq_len,vocab_size) output matrix. Problem is that doing the model.fit(sequences, bin_sequences, batch_size= 4) gives me the theano error of:

('Bad input argument to theano function with name "build/bdist.linux-x86_64/egg/keras/models.py:66" at index 1(0-based)', 'Wrong number of dimensions: expected 2, got 3 with shape (4, 16, 15423).')

Even though the model does output a 3d array, theano still expects a 2d array. Or am I overlooking something?
Is there any way to deal with multiple multi-dimensional time sequences then?

Reconfiguring a model after training

Certain layers, such as Dropout and BatchNormalization, are supposed to be used only during training. It would be nice if we could disable them during testing or for using a model in production. This could be done by changing the connections between layers or replacing a layer by an Identity layer. Any ideas?

SimpleRNN Error

The SimpleRNN layer seems to require a tensor3 input of shape (batch_size, time_steps, features). However, Keras as a whole seems to expect only matrix inputs/outputs.

model = Sequential()
#model.add(Dense(20, 5, init='uniform', activation='tanh'))
model.add(SimpleRNN(5, 20, activation='sigmoid'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mse', optimizer='sgd')

/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.pyc in output(self, train)
     47         X = self.get_input(train) # shape: (nb_samples, time (padded with zeros at the end), input_dim)
     48         # new shape: (time, nb_samples, input_dim) -> because theano.scan iterates over main dimension
---> 49         X = X.dimshuffle((1,0,2))
     50 
     51         x = T.dot(X, self.W) + self.b

/usr/local/lib/python2.7/dist-packages/theano/tensor/var.pyc in dimshuffle(self, *pattern)
    332             pattern = pattern[0]
    333         op = theano.tensor.basic.DimShuffle(list(self.type.broadcastable),
--> 334                                             pattern)
    335         return op(self)
    336 

/usr/local/lib/python2.7/dist-packages/theano/tensor/elemwise.pyc in __init__(self, input_broadcastable, new_order, inplace)
    139                     raise ValueError(("new_order[%d] is %d, but the input "
    140                         "only has %d axes.") %
--> 141                         (i, j, len(input_broadcastable)))
    142                 if j in new_order[(i + 1):]:
    143                     raise ValueError((

ValueError: new_order[2] is 2, but the input only has 2 axes.

standardize_y does not support using alternative classes as datasets

I implemented a class to be able to use slices of an HDF5 dataset as data matrices/vectors in keras. Even though the class emulates the ndarray API (at least for len and __getitem__ stuff), you can't call np.asarray on it. Since I am not sure how this could be fixed, I preferred to post this as an issue to ask for advice. Maybe we could change it such that it only calls np.asarray if it makes sense (i.e., its input is a list, list of tuples, tuple, tuple of tuples, tuple of lists or ndarray)?

The class in question is in the following gist: https://gist.github.com/jfsantos/14ae9631716a2aa328c4.
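
One way to read the proposal is sketched below: convert only plain Python containers and pass everything else through. Both `as_array_if_plain` and `SliceableDataset` are hypothetical names written for this illustration; the latter is a stand-in for the HDF5 wrapper, not the class from the gist.

```python
import numpy as np


def as_array_if_plain(data):
    """Convert lists and tuples to ndarray; leave any other object untouched."""
    if isinstance(data, (list, tuple)):
        return np.asarray(data)
    return data


class SliceableDataset:
    """Stand-in for a disk-backed matrix: supports len() and indexing only."""

    def __init__(self, rows):
        self._rows = rows

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]
```

With this rule, an ndarray or a dataset wrapper reaches the training loop unchanged, so batches can still be taken by slicing without loading the whole dataset into memory.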

Which version of Python is used?

It seems to be Python 2 (I found print without (…)), but I couldn't find anything confirming it in either README.md or setup.py, for instance.

Fix batch normalization during test time

This discussion was done in #79 but since that issue is closed I figured it would make sense to open another issue. We need to fix the batch normalization layer such that:

1. It can measure the mean and variance of the activations of each batch it sees and store them, and
2. It uses that information instead of the mean and variance of the current batch during testing.

Regarding 1, I think maybe it's not good to measure the activation statistics during training because they will be changing over time. Maybe a safer way is to wait until training is over, and then measure these over a single epoch, with all network parameters static.
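
The "measure after training" variant can be sketched in pure Python as follows. This is a toy illustration under assumptions: `RunningStats` and `normalize` are hypothetical names, activations are lists of floats, and the learned scale and shift of a real batch normalization layer are left out.

```python
class RunningStats:
    """Accumulates per-feature mean and variance over all batches it is shown."""

    def __init__(self, n_features):
        self.count = 0
        self.total = [0.0] * n_features
        self.total_sq = [0.0] * n_features

    def update(self, batch):
        """batch: list of samples, each a list of n_features activations."""
        for sample in batch:
            self.count += 1
            for j, value in enumerate(sample):
                self.total[j] += value
                self.total_sq[j] += value * value

    def mean(self):
        return [t / self.count for t in self.total]

    def variance(self):
        return [sq / self.count - m * m for sq, m in zip(self.total_sq, self.mean())]


def normalize(sample, stats, eps=1e-6):
    """Test-time normalization using the stored statistics, not the current batch."""
    return [(x - m) / (v + eps) ** 0.5
            for x, m, v in zip(sample, stats.mean(), stats.variance())]
```

One pass over the training data with all network parameters frozen, calling `update` per batch, would fill in the statistics that `normalize` then uses at test time.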

Move regularizers to layer definitions?

Hello,
Great job with keras! I wanted to see what you thought about this before I began hacking on it since it would involve some breaking changes.
It seems to me that the regularizers, i.e. maxnorm, L1 and L2, would be more flexible if they were incorporated into the layer definitions, so that different regularization and/or constraints could be applied at each layer if desired. The reason I bring this up is that I wanted to add a non-negativity constraint at a particular layer but there didn't seem to be a straight-forward way to do so.
Let me know any thoughts.
Best,
Mike
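
A toy sketch of what a per-layer constraint could look like, assuming each layer applies its own constraint right after the gradient step. `ToyLayer`, `identity`, and `nonneg` are hypothetical names and do not describe the Keras API at the time of this issue.

```python
def identity(weights):
    """Default constraint: leave the weights as they are."""
    return weights


def nonneg(weights):
    """Project weights onto the non-negative orthant."""
    return [max(0.0, w) for w in weights]


class ToyLayer:
    """Layer that owns its constraint instead of delegating it to the optimizer."""

    def __init__(self, weights, constraint=identity):
        self.weights = list(weights)
        self.constraint = constraint

    def apply_update(self, gradients, lr=0.1):
        stepped = [w - lr * g for w, g in zip(self.weights, gradients)]
        self.weights = self.constraint(stepped)


constrained = ToyLayer([0.05, 1.0], constraint=nonneg)
constrained.apply_update([1.0, 1.0])  # first weight would go negative and is clipped
```

Because the constraint is just a callable stored on the layer, two layers in the same model can use different constraints, which is the flexibility the issue asks for.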

Is there an example to show feature extraction?

Hi, fchollet,

After training a CNN model, how can I extract features using the pre-trained model?

How can I get hidden layer representation of the given data?

After training I want to extract the hidden layer representation of the given data instead of the final probabilities. How can I do it with Keras?

Issues loading sub-modules

First off, great work. I've been looking for something like this for a while now: powerful yet simple to use.

I am trying to import keras from outside the module. I added the parent folder to my PYTHONPATH variable, but when I run the scripts, I'm getting errors loading the modules below the root:

e.g.
/Users/simon.hughes/GitHub/keras/activations.py in <module>()
     24     return x
     25
---> 26 from utils.generic_utils import get_from_module
     27 def get(identifier):
     28     return get_from_module(identifier, globals(), 'activation function')

ImportError: No module named generic_utils

I've tried adding some of the subfolders to the python path:

sys.path.insert(0, "/Users/simon.hughes/GitHub")
sys.path.insert(0, "/Users/simon.hughes/GitHub/keras")
sys.path.insert(0, "/Users/simon.hughes/GitHub/keras/utils")

But that prevents loading modules like utils.generic_utils. It sees generic_utils as a module, but not utils.generic_utils.

Would it be possible to create a setup.py script to install an egg file? Or is there something simple I can do to make this work? The only way I can run code successfully is from the examples folder.

Initiate a ToDo List

Hey, I'm very interested in contributing to the project. Can you share a list of things to be done, probably a roadmap?

init methods in layers/embeddings.py make reference to argument that is not in the argument list

The init methods for both Embedding and WordContextProduct set self.normalize = normalize, but normalize is not an argument of the __init__ method. This causes the IMDB (LSTM) example not to work.

Working with large datasets like Imagenet

Hi Guys,

First and foremost, I think Keras is quite amazing !!

So far, I see that the largest dataset has about 50000 images. I was wondering if it is possible to work on Imagenet-scale datasets (around 1,000,000 images, which are too big to fit in memory) by pre-processing the data (i.e., splitting it into, say, 1000 containers of 1000 images each) and feeding one container at a time to the model.fit() function. Or do I have to save_weights() and load_weights() after each container?

Thanks for reading.
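
The container-by-container scheme described above can be sketched as follows. This is a toy, framework-independent illustration: `ToyModel`, `iter_containers`, `train_out_of_core`, and the `load_container` callback are all hypothetical, and the point is only that parameters kept on the model object carry over between successive `fit` calls.

```python
def iter_containers(n_samples, container_size):
    """Yield (start, stop) index ranges; a real loader would read each range from disk."""
    for start in range(0, n_samples, container_size):
        yield start, min(start + container_size, n_samples)


class ToyModel:
    """Learns the mean of its inputs by SGD; its state persists across fit() calls."""

    def __init__(self):
        self.w = 0.0

    def fit(self, xs, lr=0.1):
        for x in xs:
            self.w -= lr * (self.w - x)  # gradient of 0.5 * (w - x) ** 2


def train_out_of_core(model, load_container, n_samples, container_size, nb_epoch=1):
    """Feed one container at a time so only that container is in memory."""
    for _ in range(nb_epoch):
        for start, stop in iter_containers(n_samples, container_size):
            model.fit(load_container(start, stop))
    return model
```

In this sketch no saving or reloading of weights is needed between containers, because the same model object is reused for every call.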

Setting up tests

One of our goals for the v1 release is to have full unit test coverage. Let's discuss tests!

We want tests to be:

- modular (for maintainability); essentially each module should have an independent test file, with independent test functions for each feature of the module.
- fast. It should take a few seconds to test the entirety of the library. Otherwise tests would probably not be run often enough, or would result in a significant waste of time, which is very contrary to the Keras philosophy.

What are some best practices that you know of for unit-testing a ML library? I am not a big fan of the way tests are handled in Torch7 (one large file concatenating all test functions).
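
As one possible shape for such a test file, here is a sketch using the standard library `unittest` module: one small file per module, one short test method per behaviour. The `relu` function is a hypothetical stand-in for a library function under test, not the Keras implementation.

```python
import unittest


def relu(x):
    """Stand-in for the function under test (hypothetical)."""
    return [max(0.0, v) for v in x]


class TestRelu(unittest.TestCase):
    def test_negative_values_are_zeroed(self):
        self.assertEqual(relu([-1.0, -0.5]), [0.0, 0.0])

    def test_positive_values_pass_through(self):
        self.assertEqual(relu([0.5, 2.0]), [0.5, 2.0])


def run_tests():
    """Run only this module's tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRelu)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests of this kind use tiny hand-written inputs and no training, which keeps each file fast enough to run on every change.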

Fix in cifar example

Hi,

I had an error on the cifar10 example.
Traceback (most recent call last):
File "cifar10_cnn.py", line 49, in <module>
model.add(Flatten(64*8*8))
TypeError: __init__() takes exactly 1 argument (2 given)

You should remove the argument for the flatten layer, and it works !

Is it possible to merge two different input layers into one?

For example,
How to concatenate image embedding and word embeddings together as one sequence which is then fed to LSTM just like what is done in Google's image caption generation paper http://arxiv.org/pdf/1411.4555v2.pdf ?

l1, l2 regularization

I tried to use l1 and l2 regularization in the optimizer (Adam), but the optimization seems to be the same as without using regularization.

Multiple sequences

Hey guys,

Is this thing actually supported?:
" Eats inputs with shape:
(nb_samples, max_sample_length (samples shorter than this are padded with zeros at the end), input_dim)
"
The recurrent models only take (input_length, input_dim) sized inputs. Perhaps change the comments to remove this part of the description.

To deal with multiple sequences currently implies merging them into one big sequence and padding them to alignment and I see no other way around it.
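
The padding convention quoted from the docstring (shorter samples padded with zeros at the end) can be sketched in pure Python. `pad_with_zeros` is a hypothetical helper written for this illustration; it produces a nested list of shape (nb_samples, max_sample_length, input_dim).

```python
def pad_with_zeros(sequences, input_dim, max_len=None):
    """Pad each sequence of feature vectors with zero vectors at the end."""
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        steps = [list(step) for step in seq[:max_len]]  # truncate if too long
        steps += [[0.0] * input_dim for _ in range(max_len - len(steps))]
        padded.append(steps)
    return padded


batch = pad_with_zeros(
    [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]],
    input_dim=2,
)
```

After padding, every sample has the same number of time steps, so the batch can be stored as a single 3-D array.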

Model serialization

This discussion started in #51, but as things can get complicated I decided to start another issue.

It seems to be a good idea to store the weights for a model separately in an HDF5 file (or even a Numpy npy file, but HDF5 would be more portable). I wanted to compare how large a serialized model is with and without the weights, so I did the following test:

    model = Sequential()
    model.add(Dense(n_input, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, 2048, init='uniform', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2048, n_output, init='uniform', activation='linear'))

(the model is intentionally large!)

I then compiled the model, serialized it after compilation, and removed the weights and post-compilation Theano objects/functions as follows:

    weights = []  # collected per-layer weights, to be stored separately
    for l in model.layers:
        weights.append(l.get_weights())
        l.params = []
        try:
            l.W = None
            l.b = None
        except ValueError:
            pass
    model.X = None
    model.y = None
    model.y_test = None
    model.y_train = None
    model._train  = None
    model._train_with_acc  = None
    model._test = None
    model._test_with_acc = None
    model._predict = None

The full compiled model ends up at 243 MB, and the cleaned-up model at 120 MB (which is exactly what we would get from pickling the non-compiled model with the weight matrices deleted). Is there anything else we could remove to make the serialized model smaller?

glorot_normal init should be glorot_uniform?

I'm assuming this is meant to implement the novel initialization proposed in this paper: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

at the bottom of page 253, but that is a uniform initialization, and the numerator is 6, not 2.
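
For comparison, the uniform scheme from the paper draws weights from U[-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The sketch below is a pure-Python illustration; the function name and the nested-list representation are assumptions, not the Keras implementation.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_in x fan_out weight matrix from U[-limit, limit]."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


weights = glorot_uniform(3, 5, random.Random(0))
```

A uniform distribution on [-a, a] has variance a^2 / 3, so this choice gives variance 2 / (fan_in + fan_out). A normal distribution with that same variance is one way a numerator of 2 can arise in a "normal" variant.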

Requirements for 1Dconvolution

What exactly is required for providing the 1Dconvolution and 1Dpooling layers? Aren't both special cases of 2Dconvolution and 2Dpooling?
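
The "special case" reading can be illustrated in pure Python: a 1-D convolution is a 2-D one applied to an image of height 1. Both functions are hypothetical helpers, compute cross-correlation without kernel flipping, and ignore channels, strides, and padding.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation ('valid' mode) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def conv1d_valid(sequence, kernel):
    """1-D version expressed as a 2-D call on a height-1 image."""
    return conv2d_valid([sequence], [kernel])[0]


edges = conv1d_valid([1, 2, 3, 4], [1, 0, -1])
```

Pooling follows the same pattern, so dedicated 1-D layers would mainly save the user from reshaping inputs by hand.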

LSTM - Sequences with different num of time steps

Hi,

Could you explain how this library handles sequences with different numbers of time steps? Specifically, can we have sequences with different numbers of time steps and, if so, where can one supply the length of each sequence?

Thank you!

How to get the output of Conv layer and FC layer?

Hi, fchollet

I just spent several hours reading the documentation and looking through the example cifar10_cnn.py, and I find it really easy to use keras. But as the title says, I am confused by these two questions:

1. How can I get the output of a convolution layer? I want to visualize the feature map after each convolution layer. Although it is not so important, I need this when writing a paper. Does the framework offer any other method for this?
2. I want to use a CNN as a feature extractor, so the output of the fully connected layer should be saved. It seems that keras does not support this?

Thanks!

General questions

Not an issue per se, but it is good to see a Theano-based deep learning library, as Theano can be pretty difficult to understand when all that is needed is plug-and-play functionality. Are there plans to support word2vec and sentence2vec? Anything else planned?

💗 💓 💕 💖

One dose of pure 💗 💓 💕 💖 for the name
(and the code)

How to use the pretrained model such as imagenet-vgg-f?

Hi, fchollet

I'm wondering whether it is possible to use a pretrained model such as imagenet-vgg-f with keras?

New datasets and application examples

We're very interested in adding new datasets and new example scripts.

If you've used Keras to do something neat with open data, we would love to check it out, and possibly include your script and/or add support for the dataset.

Recurrent Models with sequences of mixed length

The training process for LSTM only supports tensor3. If the sequences are of different length, then X must be a list, however models.py:90 does not support lists as input. I think a quick fix would be to cast X_batch to tensor3 if batch_size=1, and also fix y_batch accordingly.