dnouri / nolearn Goto Github PK

Combines the ease of use of scikit-learn with the power of Theano/Lasagne

Home Page: http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

License: MIT License

Python 100.00%

scikit-learn lasagne deep-learning machine-learning

nolearn's Introduction

nolearn contains a number of wrappers and abstractions around existing neural network libraries, most notably Lasagne, along with a few machine learning utility modules. All code is written to be compatible with scikit-learn.

Note

nolearn is currently unmaintained. However, if you follow the installation instructions, you should still be able to get it to work (namely with library versions that are outdated at this point).

If you're looking for an alternative to nolearn.lasagne, a library that integrates neural networks with scikit-learn, then take a look at skorch, which wraps PyTorch for scikit-learn.

https://travis-ci.org/dnouri/nolearn.svg?branch=master

Installation

We recommend using venv (when using Python 3) or virtualenv (Python 2) to install nolearn.

nolearn comes with a list of known good versions of dependencies that we test with in requirements.txt. To install the latest version of nolearn from Git along with these known good dependencies, run these two commands:

pip install -r https://raw.githubusercontent.com/dnouri/nolearn/master/requirements.txt
pip install git+https://github.com/dnouri/nolearn.git

Documentation

If you're looking for how to use nolearn.lasagne, then there's two introductory tutorials that you can choose from:

For specifics around classes and functions out of the lasagne package, such as layers, updates, and nonlinearities, you'll want to look at the Lasagne project's documentation.

nolearn.lasagne comes with a number of tests that demonstrate some of the more advanced features, such as networks with merge layers, and networks with multiple inputs.

nolearn's own documentation is somewhat out of date at this point. But there's more resources online.

Finally, there's a few presentations and examples from around the web. Note that some of these might need a specific version of nolearn and Lasange to run:

Oliver Dürr's Convolutional Neural Nets II Hands On with code
Roelof Pieters' presentation Python for Image Understanding comes with nolearn.lasagne code examples
Benjamin Bossan's Otto Group Product Classification Challenge using nolearn/lasagne
Kaggle's instructions on how to set up an AWS GPU instance to run nolearn.lasagne and the facial keypoint detection tutorial
An example convolutional autoencoder
Winners of the saliency prediction task in the 2015 LSUN Challenge have published their lasagne/nolearn-based code.
The winners of the 2nd place in the Kaggle Diabetic Retinopathy Detection challenge have published their lasagne/nolearn-based code.
The winner of the 2nd place in the Kaggle Right Whale Recognition challenge has published his lasagne/nolearn-based code.

Support

If you're seeing a bug with nolearn, please submit a bug report to the nolearn issue tracker. Make sure to include information such as:

how to reproduce the error: show us how to trigger the bug using a minimal example
what versions you are using: include the Git revision and/or version of nolearn (and possibly Lasagne) that you're using

Please also make sure to search the issue tracker to see if your issue has been encountered before or fixed.

If you believe that you're seeing an issue with Lasagne, which is a different software project, please use the Lasagne issue tracker instead.

There's currently no user mailing list for nolearn. However, if you have a question related to Lasagne, you might want to try the Lasagne users list, or use Stack Overflow. Please refrain from contacting the authors for non-commercial support requests directly; public forums are the right place for these.

Citation

Citations are welcome:

Daniel Nouri. 2014. nolearn: scikit-learn compatible neural network library https://github.com/dnouri/nolearn

License

See the LICENSE.txt file for license rights and limitations (MIT).

nolearn's People

Contributors

Stargazers

Watchers

Forkers

blazej-wieliczko indirakumar cometyang wavelets viveksck waytai alessandroferrari daviddumenil ivanajw aaronjoel ajjcoppola udibr c3h3 ddofer pjankiewicz algofantasy cancan101 nkhuyu ljdawn vovoma sdutheone chagge abhishekkrthakur msegala trunghlt reply2vikas hjweide makistsantekidis yanweifu manu-dikzit zhxfl urwithajit9 weizier biddyweb kazeriahm bmcfee jllanfranchi ledmaster smalliao bamine benjaminbossan sveitser bmilde aaxwaz alex-medvedev-msc leiyu2 fch808 52nlp milesqli cbonnett yliuhb raphaelleis daviddjchen yanshuaicao oiclid asantinc arunsingh zhigexie2015 imclab saltydizz aberb jyt109 ledbetdr olivierml carlosf dengcy028 echohenry2006 danielmckeown sigal-shaked small-yellow-duck rtvt123 dsriv alito scottlittle felixlaumon ml-ai-nlp-ir vipuldivyanshu92 rubensmachado boshra williford zhouyunan fdoperezi lixiangnlp lc82111 commonlibs josephmisiti craftsliu mehdidc dwiel churchill0221 ikrishneel ralphgragutaya nkthiebaut cvml pspiegelhalter sundeepteki mcolic parthasen jigyasu10 suryanarayadev

nolearn's Issues

Impossible to use a gpu on Amazon EC2

I wanted to use the g2.2xlarge plan on amazon EC2 for fitting a DBN, but It doesn't detect the GPU. Cuda and Cudamat are installed, and the sample code from NVidia deviceQuery correctly detect the GPU. My code work well with the CPU alone.

net = DBN(
    [x.shape[1], nhidden, 2],
    scales=[numpy.sqrt(6/x_train.shape[1]/nhidden), numpy.sqrt(6/nhidden)],
    epochs=50,
    verbose=1,
    dropouts=0.5,
    minibatch_size = 64,
    minibatches_per_epoch = 4500 # So we have 288k items 
    )
net.fit(x, y)

$ export GNUMPY_USE_GPU=yes; python DBN\ with\ Nolearn.py 
((304007, 683), (304007, 1)) ((5000, 683), (5000, 1)) ((5000, 683), (5000, 1)) (96136, 683)
[DBN] fitting X.shape=(304007, 683)
[DBN] layers [683, 30, 2]
Traceback (most recent call last):
  File "DBN with Nolearn.py", line 112, in <module>
    net.fit(x, y)
  File "/usr/local/lib/python2.7/dist-packages/nolearn/dbn.py", line 340, in fit
    self.net_ = self._build_net(X, y)
  File "/usr/local/lib/python2.7/dist-packages/nolearn/dbn.py", line 246, in _build_net
    v(self.uniforms),
  File "/usr/local/lib/python2.7/dist-packages/gdbn/dbn.py", line 84, in buildDBN
    initialBiases = [gnp.garray(0*num.random.rand(1, layerSizes[i])) for i in range(1, len(layerSizes))]
  File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 724, in __init__
    cm = _new_cm(npa.size)
  File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 215, in _new_cm
    _init_gpu()
  File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 80, in _init_gpu
    if _boardId==-1: raise Exception('No gpu board is available. gnumpy will not function. Consider telling it to run on the CPU by setting environment variable GNUMPY_USE_GPU to "no".')
Exception: No gpu board is available. gnumpy will not function. Consider telling it to run on the CPU by setting environment variable GNUMPY_USE_GPU to "no".

$ ./NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release/deviceQuery
Set compute mode to DEFAULT for GPU 0000:00:03.0.
All done.
./NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GRID K520"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  ( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
  GPU Clock rate:                                797 MHz (0.80 GHz)
  Memory Clock rate:                             2500 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 3
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GRID K520
Result = PASS

ValueError: 'total size of new array must be unchanged'

Am I doing something wrong here:

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(32, 1, 300, 400),  # 32 images per batch times
    hidden_num_units=100,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=len(classes), 

    # optimization method:
    upate=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=False,  # flag to indicate we're not dealing with regression problem
    use_label_encoder=True,
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    batch_iterator=LoadBatchIterator(batch_size=32),

    conv1_num_filters=4, conv1_filter_size=(3, 3), pool1_ds=(2, 2),
    dropout1_p=0.1,
    )

leads to:

/home/ubuntu/git/nolearn/nolearn/lasagne.pyc in fit(self, X, y)
    155 
    156         try:
--> 157             self.train_loop(X, y)
    158         except KeyboardInterrupt:
    159             pdb.set_trace()

/home/ubuntu/git/nolearn/nolearn/lasagne.pyc in train_loop(self, X, y)
    193 
    194             for Xb, yb in self.batch_iterator(X_train, y_train):
--> 195                 batch_train_loss = self.train_iter_(Xb, yb)
    196                 train_losses.append(batch_train_loss)
    197 

/home/ubuntu/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    603                     gof.link.raise_with_op(
    604                         self.fn.nodes[self.fn.position_of_error],
--> 605                         self.fn.thunks[self.fn.position_of_error])
    606                 else:
    607                     # For the c linker We don't have access from

/home/ubuntu/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    593         t0_fn = time.time()
    594         try:
--> 595             outputs = self.fn()
    596         except Exception:
    597             if hasattr(self.fn, 'position_of_error'):

/home/ubuntu/git/Theano/theano/gof/op.pyc in rval(p, i, o, n)
    751 
    752         def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 753             r = p(n, [x[0] for x in i], o)
    754             for o in node.outputs:
    755                 compute_map[o][0] = True

/home/ubuntu/git/Theano/theano/sandbox/cuda/basic_ops.pyc in perform(self, node, inp, out_)
   2349             else:
   2350                 raise ValueError("total size of new array must be unchanged",
-> 2351                                  x.shape, shp)
   2352 
   2353         out[0] = x.reshape(tuple(shp))

ValueError: ('total size of new array must be unchanged', (31, 4, 298, 398), array([128,   1, 298, 398]))
Apply node that caused the error: GpuReshape{4}(GpuElemwise{Composite{[mul(i0, add(i1, Abs(i1)))]},no_inplace}.0, TensorConstant{[128   1 298 398]})
Inputs types: [CudaNdarrayType(float32, 4D), TensorType(int64, vector)]
Inputs shapes: [(31, 4, 298, 398), (4,)]
Inputs strides: [(474416, 118604, 398, 1), (8,)]
Inputs values: ['not shown', array([128,   1, 298, 398])]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.

No self.best_weights in the function train_loop() ?

It seems that the train_loop() function inside the NeuralNetwork does not provide a self.best_weights which save the ConvNet parameters for the highest validation accuracy along with the epoch iterations.
Or do I miss something? Hope someone could help. Thank you.

What if the training dataset is too large to fit into memory?

@dnouri
@cancan101
Hey, guys. Thanks for sharing your ideas and bring endless contribution to nolearn community. I have a simple question when I am using nolearn.lasagne.NeuralNet() to do kaggle competition. After we construct a neural network architecture by calling NeuralNet() function, we then train the network by function network.fit(X_train, y_train). For small datasets such as MNIST, this is O.K. However, when the data size is too large so that it can not be loaded into memory one time, what should we do? Thank you for your tips.

Request for Python 3 support

Would be nice. Right now I'm hacking together a solution with use_2to3 and fixing things as they come up for me. Nothing systematic.

See also my comment here Lasagne/Lasagne#23

Zero divison error

when running:

# import the necessary packages
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn import datasets
from nolearn.dbn import DBN

# scale the data to the range [0, 1] and then construct the training
# and testing splits
(trainX, testX, trainY, testY) = train_test_split( features , targets , test_size = 0.33)

print trainX.shape
print trainY.shape

dbn = DBN(
[trainX.shape[1], 80, 80, trainY.shape[1]],
learn_rates = 0.3,
learn_rate_decays = 0.9,
epochs = 10,
verbose = 1)
dbn.fit(trainX, trainY)

# compute the predictions for the test data and show a classification
# report
preds = dbn.predict(testX)```

It fails for some reason I cannot find:

100%

ZeroDivisionError Traceback (most recent call last)
in ()
19 epochs = 10,
20 verbose = 1)
---> 21 dbn.fit(trainX, trainY)
22
23 # compute the predictions for the test data and show a classification

/usr/local/lib/python2.7/dist-packages/nolearn/dbn.pyc in fit(self, X, y)
388 loss_funct,
389 self.verbose,
--> 390 self.use_dropout,
391 )):
392 losses_fine_tune.append(loss)

/usr/local/lib/python2.7/dist-packages/gdbn/dbn.pyc in fineTune(self, minibatchStream, epochs, mbPerEpoch, loss, progressBar, useDropout)
207 prog.tick()
208 prog.done()
--> 209 yield sumErr/float(totalCases), sumLoss/float(totalCases)
210
211 def totalLoss(self, minibatchStream, lossFuncts):

ZeroDivisionError: float division by zero

gnumpy: failed to import cudamat. Using npmat instead. No GPU will be used.
(1, 200)
(1, 125)
[DBN] fitting X.shape=(1, 200)
[DBN] layers [200, 80, 80, 125]
[DBN] Fine-tune...

Consider moving EarlyStopping into nolearn

Consider moving EarlyStopping to nolearn.

Train Val split and KFold

Hi Daniel

If I understand correctly, the train validation split is done once for the whole training. Only using indices from the first fold. Once the indices are stored, for each epoch, all the batches within the train set is used to train the net and then the net is validated against the validation set. Right ?
Is it possible to train on all the folds and get the average validation loss across all the folds ?

Also, continuing, is it possible to have an access to all the validation losses in an epoch (across all the folds), s.t, I can check the std and mean both rather than checking only the minimum to finalize the best epoch ?

Thanks

Pass nn to BatchIterator

Right now the __call__ method for BatchIterator gets just x and y. Compare that to the on_epoch_finished and on_training_finished functions which are given a references to the nn.

Not being passed in the nn make is very hard for the BatchIterator to have access to any fields on the NeuralNet. I suggest adding nn as an additional argument to the iterator.

One issues that has come up already is that from the BatchIterator I would like access to the nn.enc_.
There may be other uses in the future and I don't see why on_epoch_finished and on_training_finished get nn but BatchIterator does not.

on_epoch_finished are not notified when fitting is complete

Right now the list of functions in on_epoch_finished are called after each epoch but there is no explicit call to them to say training is complete. What that means is that something like EarlyStopping will never correctly transfer over the best weights if the StopIteration is not raised. What you probably want to do is to on training complete is nn.load_weights_from(self.best_weights) to leave the network with the best weights seen.

I suggest adding an extra callback:
on_training_complete

Improve train_test_split

For NeuralNet:

Why use your own implementation of train_test_split rather than using sklearn.cross_validation::train_test_split?
-- This would fix #7 because it uses safe_indexing.
Allow the user to pass in, through composition (like the batch iterator), a class responsible for performing the train vs valid split. Right now the only option is to subclass NeuralNet which is not ideal.

Do we have to set eval_size with a fraction ?

@dnouri Daniel, hi again. I have noticed that in the lasagne.py, the X_train and X_valid are separated actually calling sklearn train_test_split() function. This may look convenient at the first blush. However, we can not handle or control X_train or X_valid since the are not with self. In other words, if I would like test how my network overfitting the training dataset. The X_train would just be lost. Correct me if I am wrong. Thanks.

batch_size and the input_shape dependency

There is currently a non-transparent / non-intuitive dependency between the batch_size and the input_shape.

Currently the default batch_iterator is BatchIterator(batch_size=128). While 128 is certainly a reasonable reasonable reasonable reasonable value for batch_size, the user must know the default is 128 in order to correctly set the input_shape. Ideally there would be some way for the user to change the batch_size without having to remember to update the input shape. One idea would be some sort of lazily resolved BATCH_SIZE constant that could be used in the input shape. The iterator could then have an additional method get_batch_size which is used by the NeuralNet to set the BATCH_SIZE constant.

train_test_split does not work for Pandas Series with non Dense Index

If using a Pandas Series where the Index values are not dense, then train_test_split will select index values that do not exist.

This because this line:

train_indices, valid_indices = iter(kf).next()

returns the row numbers but then the indexing is done with index values`

Weight Decay

Add support for weight decay to nolearn.

It looks like lasagne has support although it might need some more work.
Also see cuda-convnet docs.

the -1 in DBN(layer_sizes=[-1,...] applies only to first call to fit method

Setting -1 as the first list element in the layer_sizes parameter of the DBN class works when you first
call the fit method. If you latter call the fit method again with a data set with a different number of features the method will crash

running the network backwards? real-valued output layer?

In nolearn.dbn, is it possible to run the trained network backwards, ie generatively? I'm thinking of eg Hinton's lecture where he presents the output layer of his trained network with a value ('2') and runs it backwards, and it produces lots of possible images (input layer) of the digit '2'.

Also, is it possible to use a real-valued output layer? I know real_valued_vis works for the input layer.

My ultimate goal is to learn a small, simple parameter space -- say 5 real-valued parameters -- which I can use to interactively explore the original feature space, by tweaking these parameters. The two ideas above would allow that.

I know these are naive questions, I'm not a neural networks person. Thanks for making nolearn which is helping to make this field more accessible!

train_test_split signature

Any reason the signature for train_test_split takes in the eval_size rather than just reason self. eval_size?

nolearn/nolearn/lasagne.py

Line 361 in 12f263c

def train_test_split(self, X, y, eval_size):

Consider splitting up batch_iterator for test and train

Rather than requiring the user to use the same class for the batch_iterator used for train and test, consider adding a batch_iterator_test.

This would simply the FlipBatchIterator in the blog post and allow just using the standard BatchIterator for the batch_iterator_test.

Randomizing Batches in training gives bad predictions though validation errors are much less

Hi
I posted this on nolearn too, but may be dnouri is away for a while. Any help will be really appreciated as I really need to understand this asap. Please feel free to close this.

The BatchIterator uses a loop as below
for i in range((n_samples + bs - 1) / bs):

That means - for each epoch, the training progresses through the set of batches in the same sequence.
I wanted to see what happens if I randomize the sequence of batches.
So I added this code

v = [i for i in range((n_samples + bs - 1) / bs)]
random.shuffle(v)
for i in v:
#for i in range((n_samples + bs - 1) / bs):

This means, each epoch will have the sequence of batches different. It will go through all the images - but in a different sequence.

This change gave me much faster descent and considerably lower validation errors. So I was happy.
BUT - the prediction on the (same) test set plummeted like hell. From 86% to 33%.

Why ?

Similarly, I trained with the unmodified loop and stored a net. I loaded that net in a separate code, and used it to predict on a test set. Once with the unmodified loop, once with the modified loop. Again the same behavior, massive decrease in the unmodified loop BatchIterator.

For the fact of the matter, I never thought that the decrease can be caused by the BatchIterator during prediction. It was after several hours of playing around that I figured.

So again question - what is happening here ? Why the prediction on same test set is so bad when using my modified loop even though the validation errors are much lower during training ? And what is the BatchIterator being used for while prediction ?

Thanks a lot helping out in advance

Train on full dataset

I would like to train on a full dataset and was wondering if there is anything I need to do besides
eval_size=0.0?

Exactly when are the final weights saved, is it for the best validation score, best training score, last epoch, ect.?

Thank you for the help and great code!

enhancement: convnet

An enhancement for convnet would be verbose, which prints the percentage of the files done that are supplied to convnet and time taken

Getting Epoch number in BatchIterator subclass

I was wondering if there is a way to get the current epoch number in a BatchIterator subclass?
The basic idea is I only want to perform data augmentations on the first 90% of epochs.
For example, assume we have 100 epochs AND had access to train_history:

class FlipBatchIterator(BatchIterator):
epoch = train_history[-1]['epoch']
def transform(self, Xb, yb):
if epoch < 90:
Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
return Xb, yb

Cannot match validation loss from training when calculating after training

I am facing this strange problem - which I am struggling for ages now.

I am training a net - using images and labels. So my regression is false and uselabelencoder is true and in train, test split, the StratifiedKFold gets invoked etc.

Now, I am using EarlyStopping. When my net trains and exists out due to early stopping, the best weights are loaded into the net (as in the example for face rectangles). I can see the validation loss on that best epoch

After training is finished, I use the net and get the validation set again - using the same StratifiedKFold logic (I made sure that the indices and labels are exactly the same of the validation set used inside the training loop). I now use predict_proba (I have tried with _output_layer.get_output too). and a normal numpy method to get the loss (negative log likelihood) - and the validation loss is different from the best validation loss during training. And that difference is very big (I understand there might be some decimals off here and there for stuff being calculated on the GPU). It seems - the loss I am getting is more close to the last loss I see during the training. Not the best loss.

Now - just to add some points

I have made sure there is no transformation or augmentation of images in real time - both while training and predicting
I have made sure that the code I am using to calculate the loss is exactly the same as the negative_log_likelihood function.
I have made sure that the weights are being copied over only when (and finally once at the least) the valid loss is decreasing.

Can some one tell me what I am missing here ? or is there some problem lurking somewhere ?

Thanks
Regards

Consider fixing the necessity to do 'fit' prior to 'predict' after 'load_weights'

Currently, after re-loading the weights of a pre-trained network, you have to do nn.fit prior to nn.predict. Consider fixing that for better usability.

Thanks,
David

Don't Pickle Theano Functions

Discussion: http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/#comment-1775120366

Two part fix:
cancan101@f04308c
cancan101@3f80b31

About Class order in predict_proba() ?

I know that nolearn.lasagne has already provided the use_label_encoder to quantize the class names(string type). However, I have never successfully enjoy such convenience since I always run into theano type errors. Then, I make a detour that I use sklearn.label_encoding to transform my y_train and then feed it to NeuralNet. So my question is: when I call NeuralNet.predict_proba, it will return a matrix of probabilities, right? The column values are those classes probabilities. What are the orders of these column classes ? Are the :0, 1, 2..., N ?

BatchIterator.transform() could give copies?

Have you considered

    def transform(self, Xb, yb):
        return Xb.copy(), yb.copy()

in BatchIterator class?
It doesn't look like a big computational overhead and it could be really convenient, since usually we don't wan't to transform the original input.

nolearn.lasagne: Create usage and caveats docs

The intention here is to write some usage documentation for nolearn.lasagne, and also mention a few of the caveats that have led to user confusion and tickets before.

Let's collect a list of issues that require a note in the docs here:

#7: train_test_split does not work for Pandas Series with non Dense Index
#13: Ability to Shuffle Data Before Each Epoch
#26: Option to make the nets more deterministic
#214: max_epochs and re-running fit
#163: Example of Loading Pretrained Model
#287: scores_train/valid take different parameters than custom_scores

[anything else?]

Dealing with large datasets: consider adding a functionality to re-load X and y in each epoch

Problem migrating from DBN to lasagne NeuralNet: NaN for each epoch

Lasagne looks fantastic, thanks for integrating it into nolearn! However, I have trouble transitioning from nolearn's DBN to the new lasagne NeuralNet.

Here is what happens:

Done loading and transforming data, traindata size: 83.5334777832 MB
Distribution of classes in train data:
[[ 0.00000000e+00 5.82160000e+04]
[ 1.00000000e+00 5.12730000e+04]] 2
conf: momentum: 0.01 self.learn_rates: 0.01
fitting classifier... nolearn
InputLayer (None, 200) produces 200 outputs
DenseLayer (None, 50) produces 50 outputs
DenseLayer (None, 2) produces 2 outputs

Epoch	Train loss	Valid loss	Train / Val	Valid acc	Dur
1	nan	nan	nan	45.05%	0.7s
2	nan	nan	nan	46.47%	0.6s
3	nan	nan	nan	45.77%	0.6s
4	nan	nan	nan	47.06%	0.6s
5	nan	nan	nan	47.07%	0.7s
6	nan	nan	nan	47.06%	0.7s
7	nan	nan	nan	47.08%	0.7s
8	nan	nan	nan	53.71%	0.7s
9	nan	nan	nan	47.05%	0.6s
10	nan	nan	nan	47.05%	0.6s
11	nan	nan	nan	47.05%	0.6s
12	nan	nan	nan	47.05%	0.7s
13	nan	nan	nan	47.05%	0.6s
14	nan	nan	nan	47.05%	0.6s

I tried fiddling with different learning rates (1,0.1,0.01,... 0.0000001 even 0.0), momentum rates, different optimisers (sgd,nestrov, rmsprop ...every method that lasagne offers), input sizes, no. of hidden units, two and one hidden layer, all to no avail.

The mnist example from lasagne runs fine though.

Here is my DBN code, which also runs fine and produces models with >0.90% accuracy (on an audio gender detection task), on the same data:

            clf = DBN([X_train.shape[1], self.hid_layer_units, self.hid_layer_units, self._no_classes],
                    dropouts=self.dropouts,
                    learn_rates=self.learn_rates,
                    learn_rates_pretrain=self.learn_rates_pretrain,
                    minibatch_size=self.minibatch_size,
                    learn_rate_decays=self.learn_rate_decays,
                    learn_rate_minimums=self.learn_rate_minimums,
                    epochs_pretrain=self.pretrainepochs,
                    epochs=self.epochs,
                    momentum= self.momentum,
                    real_valued_vis=True,
                    use_re_lu=True,
                    verbose=1)

I've translated that into:

            clf = NeuralNet(
                    layers=[  # three layers: one hidden layer
                            ('input', layers.InputLayer),
                            ('hidden', layers.DenseLayer),
                            #('hidden', layers.DenseLayer),
                            ('output', layers.DenseLayer),
                            ],
                            # layer parameters:
                            input_shape=(None, X_train.shape[1]),  
                            hidden_num_units=self.hid_layer_units,  
                            output_num_units=self._no_classes,
                            output_nonlinearity=None,

                            eval_size=0.1,

                            # optimization method:
                            update=sgd,
                            update_learning_rate=self.learn_rates,
                            #update_momentum=momentum,

                            regression=False, 
                            max_epochs=self.epochs,  
                            verbose=1,
                            )

Is there anything obvious that I've missed here? How can I debug this?

Disable pdb.set_trace() by default

Consider not having pdb.set_trace() called on KeyboardInterrupt by default. Currently this causes issues with IPython Notebooks.

PendingDeprecationWarning: object.format with a non-empty format string is deprecated

I am getting:

/home/ubuntu/git/nolearn/nolearn/lasagne.py:408: PendingDeprecationWarning: object.__format__ with a non-empty format string is deprecated

which I believe is coming from:

            print("  {:<18}\t{:<20}\tproduces {:>7} outputs".format(
                layer.__class__.__name__,
                output_shape,
                reduce(operator.mul, output_shape[1:]),
                ))

Automatically set output_num_units

Similar to #9:

Set the output_num_units when use_label_encoder=True. Ideally the output units can be calculated from the number of classes: len(self.classes_).

Inconsistent Checking of use_label_encoder

Here the predicate is not self.regression and self.use_label_encoder whereas here it is just: self.use_label_encoder.

Having self.regression==True and self.use_label_encoder==True should just be an assertion error in the constructor which would simplify checking use_label_encoder.

Option to make the nets more deterministic

What do you think about introducing an option to seed the random before doing the KFold on train test split ? That way the net predictions and loss details will be more deterministic over multiple runs on the same set.
Thanks

Problem pickling convnet NeuralNet: Can't pickle <function values_eq_approx at 0x7f2d4bef4e60>: it's not found as theano.sandbox.cuda.opt.values_eq_approx

I encounter the following error while pickling a convnet NeuralNet:

pickle.PicklingError: Can't pickle <function values_eq_approx at 0x7f2d4bef4e60>: it's not found as theano.sandbox.cuda.opt.values_eq_approx

This problem also surfaced somewhere in the comments at http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

I'm using the original Conv and MaxPool implementations in Lasagne

from lasagne import layers
Conv2DLayer = layers.Conv2DLayer
MaxPool2DLayer = layers.MaxPool2DLayer

As a workaround, I can use https://pypi.python.org/pypi/dill. But it would be nice if I also could use the standard pickle interface.

Enhancement: multiple input layers

This recent commit allows to specify network architectures that a different from the standard tower architecture: 12f263c

It would certainly also be interesting, if one could use multiple input layers. E.g. a convolutional neural network, which does representation merging with hand crafted features at its last fully connected layers.

Given that NeuralNet builds on lasagne, would something like that be feasible with lasagne at all?

Pre-processing the training data

I want to pre-process my training data by subtracting the mean. I could do this by subtracting the mean from my training data before I pass it to nolearn.lasagne.NeuralNet, but this would contaminate my validation set. Instead, it would be nice if one could pass a StandardScaler to the NeuralNet, which could compute the mean on the training set, apply it to the validation set, and store the StandardScaler for when the NeuralNet is used to predict on a held-out test set.

This might be done in the train_loop just after the train_test_split happens.

Ability to Shuffle Data Before Each Epoch

This will improve SGD and deal with missing residual data.

See #11 (comment).

Pull requests

Hi Daniel
Can you let me know the general rule, process and contract for making contribution to Nolearn.

Regards

Assertion Check for Misspelled Parameters

See discussion:

Would be cool to have some assertion for unused parameters to catch issues like this.

Loading weights - what is stored and fetched ?

Daniel
When I pickle a network and later load it's weights - do/should I need to make sure that I am loading weights into a network which has the same number of neurons ? I would normally assume so. But I am seeing that I am able to load weights from a network into another where there is substantial difference in the number of filters, number layers etc.
So, what is going on underneath ? Specially with such storing and loading between networks which have conv layers, pool layers and hidden layers.

Thanks
Regards

Shape mismatch in dbn.fit

Hi,

When I'm passing my dataset (cohn-kanade dataset) for lib training, I receive this follow error. Did anyone already saw something similar? (Right now, just for test, I'm using only a small part of dataset)

[DBN] fitting X.shape=(365, 784)
[DBN] layers [784, 300, 10]
[DBN] Fine-tune...
Traceback (most recent call last):
File "deep.py", line 46, in
dbn.fit(trainX, trainY)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/nolearn/dbn.py", line 409, in fit
self.use_dropout,
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 202, in fineTune
err, outMB = step(inpMB, targMB, self.learnRates, self.momentum, self.L2Costs, useDropout)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 303, in stepNesterov
errSignals, outputActs, error = self.fpropBprop(inputBatch, targetBatch, useDropout)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 262, in fpropBprop
outputErrSignal = -self.outputActFunct.dErrordNetInput(targetBatch, self.state[-1], outputActs)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/activationFunctions.py", line 138, in dErrordNetInput
return acts - targets
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 965, in sub
else: return self + -as_garray(other) # if i need to broadcast, making use of the row add and col add methods is probably faster
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 926, in add
def add(self, other): return _check_number_types(self._broadcastable_op(as_garray_or_scalar(other), 'add'))
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 614, in broadcastable_op
if reduce(operator.or, ( other.shape[i] not in (1, self.shape[i]) for i in range(self.ndim)), False): raise ValueError('shape mismatch: objects cannot be broadcast to a single shape')

I've no idea what is causing the problem. Just describing the dataset and my steps before training.

My dataset:

All images are colored and have 640x490 pixels.

Before training (After loaded Image I do the follow steps):

I convert to grayscale
Cut the image to get 490x490 pixels
Resize each image to 28x28.
Finally, I change the image shape, from matrix to a vector, using the following command

im = im.reshape(1, im.shape[0]*im.shape[1])

Thanks in advance

npmat.py:488 RuntimeWarning

Hi,

Is anyone familiar with the error below? It appears sometimes when I change the number of hidden or end layers, or scale the data. The code continues to run, but the loss is always "nan".

[DBN] Fine-tune...
/Library/Python/2.7/site-packages/npmat.py:488: RuntimeWarning: invalid value encountered in multiply
target.numpy_array[:] = vec.numpy_array * self.numpy_array
/Library/Python/2.7/site-packages/npmat.py:433: RuntimeWarning: invalid value encountered in add
target.numpy_array[:] = vec.numpy_array + self.numpy_array
/Library/Python/2.7/site-packages/npmat.py:617: RuntimeWarning: invalid value encountered in greater
target.numpy_array[:] = (self.numpy_array > val).astype(DTYPE)
/Library/Python/2.7/site-packages/npmat.py:588: RuntimeWarning: invalid value encountered in less
target.numpy_array[:] = self.numpy_array < val
100%
Epoch 1:
loss nan
err 0.13142703202
(0:00:02)

Peter

Activation threshold for nonlinearity

Hi
Is it possible (does it make sense ?) to have an activation threshold (like axion hill) for tanh or sigmoid ? The get_output_for method in Conv layer will always return an output (i.e a neuron will fire unless it is Relu) - right ? (however small it is).

Model is not being trained

I am trying to train the 'Convolutional Neural Network' by following this tutorial
but while running the code it seems that no actual training is being done. You can see my code here.
Here is a log of code for first few epochs.

%run train.py
... loading data
float32
(35126L, 1L, 100L, 100L)
(35126L,)
  InputLayer            (None, 1, 100, 100)     produces   10000 outputs
  Conv2DLayer           (None, 1, 98, 98)       produces    9604 outputs
  MaxPool2DLayer        (None, 1, 49, 49)       produces    2401 outputs
  Conv2DLayer           (None, 1, 48, 48)       produces    2304 outputs
  MaxPool2DLayer        (None, 1, 24, 24)       produces     576 outputs
  DenseLayer            (None, 50)              produces      50 outputs
  DenseLayer            (None, 5)               produces       5 outputs

 Epoch  |  Train loss  |  Valid loss  |  Train / Val  |  Valid acc  |  Dur
--------|--------------|--------------|---------------|-------------|-------
     1  |         nan  |         nan  |          nan  |     73.50%  |  93.6s
     2  |         nan  |         nan  |          nan  |     73.50%  |  93.2s
     3  |         nan  |         nan  |          nan  |     73.50%  |  109.9s
     4  |         nan  |         nan  |          nan  |     73.50%  |  93.6s
     5  |         nan  |         nan  |          nan  |     73.50%  |  88.5s

I can't figure out what I am doing wrong

Loading nn from weights file, for prediction

Hi,

Post fit()'ing, I save the best weights to file using

net.save_weights_to(weights_file)

However, upon loading the weights back from the file into a net

net_train = pickle.load(f)
net.load_weights_from(net_train)
net.predict()

predict()'ion fails with message

    File "/Users/bgarg/anaconda/lib/python2.7/site-packages/nolearn/lasagne.py", line 247, in predict return self.predict_proba(X)
    File "/Users/bgarg/anaconda/lib/python2.7/site-packages/nolearn/lasagne.py", line 242, in predict_proba probas.append(self.predict_iter_(Xb))
    AttributeError: 'NeuralNet' object has no attribute 'predict_iter_

It does work on refitting though. Was this intended to work for predict()?

Would layer visualization be considered in nolearn ?

Since we can train a ConvNet by nolearn.lasagne and save the trained weights. Will nolearn provide a visualization function that plots e.g. first weight W1 as a figure? Usually it will be some edge detectors like shown in the paper "Visualization and Understanding Convolutional Network".

Two code paths for predicting (non-regression)

During training, the code path is:

        predict_proba = output_layer.get_output(X_batch, deterministic=True)
        predict = predict_proba.argmax(axis=1)
        accuracy = T.mean(T.eq(predict, y_batch))

        eval_iter = theano.function(
            inputs=[theano.Param(X_batch), theano.Param(y_batch)],
            outputs=[loss_eval, accuracy],
            givens={
                X: X_batch,
                y: y_batch,
                },
            )
          for Xb, yb in self.batch_iterator_test(X_valid, y_valid):
              _, accuracy = eval_iter(Xb, yb)
              valid_accuracies.append(accuracy)
         avg_valid_accuracy = np.mean(valid_accuracies)

but when calling score, this code path is taken:

        predict_proba = output_layer.get_output(X_batch, deterministic=True)
        predict_iter = theano.function(
            inputs=[theano.Param(X_batch)],
            outputs=predict_proba,
            givens={
                X: X_batch,
                },
            )
        for Xb, yb in self.batch_iterator_test(X):
            probas.append(predict_iter(Xb))
        predict_proba = np.vstack(probas)
        y_pred = np.argmax(predict_proba, axis=1)
       ...

I'm not sure having these two variations is ideal.