dnouri / nolearn
Combines the ease of use of scikit-learn with the power of Theano/Lasagne
License: MIT License
Currently, after reloading the weights of a pre-trained network, you have to call nn.fit before nn.predict. Consider fixing that for better usability.
Thanks,
David
Consider moving EarlyStopping to nolearn.
Lasagne looks fantastic, thanks for integrating it into nolearn! However, I am having trouble transitioning from nolearn's DBN to the new Lasagne-based NeuralNet.
Here is what happens:
Done loading and transforming data, traindata size: 83.5334777832 MB
Distribution of classes in train data:
[[ 0.00000000e+00 5.82160000e+04]
[ 1.00000000e+00 5.12730000e+04]] 2
conf: momentum: 0.01 self.learn_rates: 0.01
fitting classifier... nolearn
InputLayer (None, 200) produces 200 outputs
DenseLayer (None, 50) produces 50 outputs
DenseLayer (None, 2) produces 2 outputs
Epoch | Train loss | Valid loss | Train / Val | Valid acc | Dur |
---|---|---|---|---|---|
1 | nan | nan | nan | 45.05% | 0.7s |
2 | nan | nan | nan | 46.47% | 0.6s |
3 | nan | nan | nan | 45.77% | 0.6s |
4 | nan | nan | nan | 47.06% | 0.6s |
5 | nan | nan | nan | 47.07% | 0.7s |
6 | nan | nan | nan | 47.06% | 0.7s |
7 | nan | nan | nan | 47.08% | 0.7s |
8 | nan | nan | nan | 53.71% | 0.7s |
9 | nan | nan | nan | 47.05% | 0.6s |
10 | nan | nan | nan | 47.05% | 0.6s |
11 | nan | nan | nan | 47.05% | 0.6s |
12 | nan | nan | nan | 47.05% | 0.7s |
13 | nan | nan | nan | 47.05% | 0.6s |
14 | nan | nan | nan | 47.05% | 0.6s |
I tried different learning rates (1, 0.1, 0.01, ..., 0.0000001, even 0.0), momentum rates, different optimisers (sgd, nesterov, rmsprop, ... every method that Lasagne offers), input sizes, numbers of hidden units, and one or two hidden layers, all to no avail.
The MNIST example from Lasagne runs fine, though.
Here is my DBN code, which also runs fine and produces models with >90% accuracy (on an audio gender detection task) on the same data:
clf = DBN([X_train.shape[1], self.hid_layer_units, self.hid_layer_units, self._no_classes],
dropouts=self.dropouts,
learn_rates=self.learn_rates,
learn_rates_pretrain=self.learn_rates_pretrain,
minibatch_size=self.minibatch_size,
learn_rate_decays=self.learn_rate_decays,
learn_rate_minimums=self.learn_rate_minimums,
epochs_pretrain=self.pretrainepochs,
epochs=self.epochs,
momentum= self.momentum,
real_valued_vis=True,
use_re_lu=True,
verbose=1)
I've translated that into:
clf = NeuralNet(
layers=[ # three layers: one hidden layer
('input', layers.InputLayer),
('hidden', layers.DenseLayer),
#('hidden', layers.DenseLayer),
('output', layers.DenseLayer),
],
# layer parameters:
input_shape=(None, X_train.shape[1]),
hidden_num_units=self.hid_layer_units,
output_num_units=self._no_classes,
output_nonlinearity=None,
eval_size=0.1,
# optimization method:
update=sgd,
update_learning_rate=self.learn_rates,
#update_momentum=momentum,
regression=False,
max_epochs=self.epochs,
verbose=1,
)
Is there anything obvious that I've missed here? How can I debug this?
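One thing worth checking (an assumption, not a confirmed diagnosis): the translated NeuralNet sets output_nonlinearity=None, i.e. an identity output layer, while with regression=False nolearn, as far as I can tell, trains with a categorical cross-entropy loss. Cross-entropy takes the log of the probability assigned to the true class, so it expects each output row to be a probability distribution; raw identity outputs can be negative, and their log is nan. The NumPy sketch below, with made-up outputs and illustrative helper names, shows the difference.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(probs, y):
    # Mean negative log-probability assigned to the true class of each row.
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

# Raw (identity) outputs of a 2-unit output layer for three samples.
raw = np.array([[ 1.5, -0.3],
                [-0.8,  0.4],
                [ 0.2, -1.1]])
y = np.array([0, 1, 1])

with np.errstate(invalid="ignore", divide="ignore"):
    loss_identity = categorical_crossentropy(raw, y)  # log(-1.1) -> nan
loss_softmax = categorical_crossentropy(softmax(raw), y)

print(loss_identity)  # nan
print(loss_softmax)   # a finite, positive number
```

With a softmax output layer (lasagne.nonlinearities.softmax), every row sums to one and the loss stays finite.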
Hi Daniel
Can you let me know the general rules, process, and contract for contributing to nolearn?
Regards
I want to pre-process my training data by subtracting the mean. I could do this by subtracting the mean from my training data before I pass it to nolearn.lasagne.NeuralNet, but this would contaminate my validation set. Instead, it would be nice if one could pass a StandardScaler to the NeuralNet, which could compute the mean on the training set, apply it to the validation set, and store the StandardScaler for when the NeuralNet is used to predict on a held-out test set.
This might be done in the train_loop just after the train_test_split happens.
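A minimal NumPy sketch of the proposed behavior: compute the mean (and standard deviation) on the training split only, apply the same statistics to the validation split, and keep the fitted scaler around for the held-out test set. MeanStdScaler and split_and_scale are hypothetical helpers, not part of nolearn; scikit-learn's StandardScaler follows the same fit/transform pattern.

```python
import numpy as np

class MeanStdScaler(object):
    """Stores statistics computed on training data only (a hypothetical
    stand-in for sklearn's StandardScaler)."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid dividing by zero for constant features
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

def split_and_scale(X, y, eval_size=0.2, seed=0):
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_valid = int(round(len(X) * eval_size))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    # Statistics come from the training rows only, so no validation data leaks in.
    scaler = MeanStdScaler().fit(X[train_idx])
    return (scaler.transform(X[train_idx]), y[train_idx],
            scaler.transform(X[valid_idx]), y[valid_idx], scaler)

X = np.random.RandomState(1).normal(5.0, 2.0, size=(100, 3))
y = np.arange(100) % 2
X_tr, y_tr, X_va, y_va, scaler = split_and_scale(X, y)

# The same stored scaler is reused for a held-out test set at prediction time.
X_test_scaled = scaler.transform(X[:10])
```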
Am I doing something wrong here:
net1 = NeuralNet(
layers=[  # six layers: a conv/pool/dropout block plus one hidden layer
('input', layers.InputLayer),
('conv1', layers.Conv2DLayer),
('pool1', layers.MaxPool2DLayer),
('dropout1', layers.DropoutLayer),
('hidden', layers.DenseLayer),
('output', layers.DenseLayer),
],
# layer parameters:
input_shape=(32, 1, 300, 400),  # 32 images per batch, 1 channel, 300x400 pixels
hidden_num_units=100, # number of units in hidden layer
output_nonlinearity=None, # output layer uses identity function
output_num_units=len(classes),
# optimization method:
upate=nesterov_momentum,
update_learning_rate=0.01,
update_momentum=0.9,
regression=False, # flag to indicate we're not dealing with regression problem
use_label_encoder=True,
max_epochs=400, # we want to train this many epochs
verbose=1,
batch_iterator=LoadBatchIterator(batch_size=32),
conv1_num_filters=4, conv1_filter_size=(3, 3), pool1_ds=(2, 2),
dropout1_p=0.1,
)
leads to:
/home/ubuntu/git/nolearn/nolearn/lasagne.pyc in fit(self, X, y)
155
156 try:
--> 157 self.train_loop(X, y)
158 except KeyboardInterrupt:
159 pdb.set_trace()
/home/ubuntu/git/nolearn/nolearn/lasagne.pyc in train_loop(self, X, y)
193
194 for Xb, yb in self.batch_iterator(X_train, y_train):
--> 195 batch_train_loss = self.train_iter_(Xb, yb)
196 train_losses.append(batch_train_loss)
197
/home/ubuntu/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
603 gof.link.raise_with_op(
604 self.fn.nodes[self.fn.position_of_error],
--> 605 self.fn.thunks[self.fn.position_of_error])
606 else:
607 # For the c linker We don't have access from
/home/ubuntu/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
593 t0_fn = time.time()
594 try:
--> 595 outputs = self.fn()
596 except Exception:
597 if hasattr(self.fn, 'position_of_error'):
/home/ubuntu/git/Theano/theano/gof/op.pyc in rval(p, i, o, n)
751
752 def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 753 r = p(n, [x[0] for x in i], o)
754 for o in node.outputs:
755 compute_map[o][0] = True
/home/ubuntu/git/Theano/theano/sandbox/cuda/basic_ops.pyc in perform(self, node, inp, out_)
2349 else:
2350 raise ValueError("total size of new array must be unchanged",
-> 2351 x.shape, shp)
2352
2353 out[0] = x.reshape(tuple(shp))
ValueError: ('total size of new array must be unchanged', (31, 4, 298, 398), array([128, 1, 298, 398]))
Apply node that caused the error: GpuReshape{4}(GpuElemwise{Composite{[mul(i0, add(i1, Abs(i1)))]},no_inplace}.0, TensorConstant{[128 1 298 398]})
Inputs types: [CudaNdarrayType(float32, 4D), TensorType(int64, vector)]
Inputs shapes: [(31, 4, 298, 398), (4,)]
Inputs strides: [(474416, 118604, 398, 1), (8,)]
Inputs values: ['not shown', array([128, 1, 298, 398])]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.
In nolearn.dbn, is it possible to run the trained network backwards, i.e. generatively? I'm thinking of, e.g., Hinton's lecture where he presents the output layer of his trained network with a value ('2') and runs it backwards, and it produces lots of possible images (input layer) of the digit '2'.
Also, is it possible to use a real-valued output layer? I know real_valued_vis works for the input layer.
My ultimate goal is to learn a small, simple parameter space -- say 5 real-valued parameters -- which I can use to interactively explore the original feature space, by tweaking these parameters. The two ideas above would allow that.
I know these are naive questions; I'm not a neural networks person. Thanks for making nolearn, which is helping to make this field more accessible!
It seems that the train_loop() method of NeuralNet does not provide a self.best_weights attribute that saves the ConvNet parameters with the highest validation accuracy, together with the epoch at which they were reached.
Or am I missing something? I hope someone can help. Thank you.
The intention here is to write some usage documentation for nolearn.lasagne, and also mention a few of the caveats that have led to user confusion and tickets before.
Let's collect a list of issues that require a note in the docs here:
[anything else?]
If you use a pandas Series whose index values are not contiguous, train_test_split will select index values that do not exist.
This is because this line:
train_indices, valid_indices = iter(kf).next()
returns row positions, but the indexing is then done with index labels.
Is there any reason the signature of train_test_split takes eval_size as an argument rather than just using self.eval_size?
Line 361 in 12f263c
I know that nolearn.lasagne already provides use_label_encoder to encode class names (strings) as integers. However, I have never managed to benefit from this convenience, since I always run into Theano type errors. As a workaround, I use sklearn's LabelEncoder to transform my y_train and then feed it to NeuralNet. So my question is: when I call NeuralNet.predict_proba, it returns a matrix of probabilities, right? Each column holds the probability of one class. In what order are these columns? Are they 0, 1, 2, ..., N?
During training, the code path is:
predict_proba = output_layer.get_output(X_batch, deterministic=True)
predict = predict_proba.argmax(axis=1)
accuracy = T.mean(T.eq(predict, y_batch))
eval_iter = theano.function(
inputs=[theano.Param(X_batch), theano.Param(y_batch)],
outputs=[loss_eval, accuracy],
givens={
X: X_batch,
y: y_batch,
},
)
for Xb, yb in self.batch_iterator_test(X_valid, y_valid):
_, accuracy = eval_iter(Xb, yb)
valid_accuracies.append(accuracy)
avg_valid_accuracy = np.mean(valid_accuracies)
but when calling score, this code path is taken:
predict_proba = output_layer.get_output(X_batch, deterministic=True)
predict_iter = theano.function(
inputs=[theano.Param(X_batch)],
outputs=predict_proba,
givens={
X: X_batch,
},
)
for Xb, yb in self.batch_iterator_test(X):
probas.append(predict_iter(Xb))
predict_proba = np.vstack(probas)
y_pred = np.argmax(predict_proba, axis=1)
...
I'm not sure having these two variations is ideal.
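One concrete way the two paths can report different numbers (an observation about the averaging, not a claim about any specific nolearn version): the training-time path averages per-batch accuracies with np.mean, which gives a short final batch the same weight as a full one, while score computes the accuracy over all predictions at once. This NumPy sketch with made-up labels shows the gap and a size-weighted average that matches the overall figure.

```python
import numpy as np

def batches(n, batch_size):
    # Yield slices covering 0..n-1; the last batch may be shorter.
    for start in range(0, n, batch_size):
        yield slice(start, min(start + batch_size, n))

y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.array([0] * 8 + [0] * 2)  # the last two samples are wrong

batch_size = 4
accs, sizes = [], []
for sl in batches(len(y_true), batch_size):
    accs.append(np.mean(y_pred[sl] == y_true[sl]))
    sizes.append(sl.stop - sl.start)

unweighted = np.mean(accs)                  # like the per-batch training loop
weighted = np.average(accs, weights=sizes)  # weights each batch by its size
overall = np.mean(y_pred == y_true)         # like score()
print(unweighted, weighted, overall)
```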
@dnouri
@cancan101
Hey, guys. Thanks for sharing your ideas and for your continuing contributions to the nolearn community. I have a simple question about using nolearn.lasagne.NeuralNet() in a Kaggle competition. After constructing a network architecture with NeuralNet(), we train it with network.fit(X_train, y_train). For small datasets such as MNIST, this is fine. However, what should we do when the dataset is too large to be loaded into memory at once? Thank you for your tips.
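One common approach (a sketch, not a nolearn feature) is to keep the data on disk and read only one minibatch at a time. Below, the arrays are saved as .npy files and opened with np.load(..., mmap_mode="r"), so slicing reads just the requested rows. A custom batch iterator built this way could feed the network batch by batch; the file names and the iterate_minibatches helper are hypothetical.

```python
import os
import tempfile
import numpy as np

def iterate_minibatches(path_X, path_y, batch_size):
    # Memory-map the arrays: data stays on disk until a slice is accessed.
    X = np.load(path_X, mmap_mode="r")
    y = np.load(path_y, mmap_mode="r")
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        # np.array copies only this slice into an in-memory ndarray.
        yield np.array(X[start:stop]), np.array(y[start:stop])

tmpdir = tempfile.mkdtemp()
path_X = os.path.join(tmpdir, "X_train.npy")
path_y = os.path.join(tmpdir, "y_train.npy")
np.save(path_X, np.random.rand(1000, 20).astype(np.float32))
np.save(path_y, np.random.randint(0, 2, size=1000).astype(np.int32))

n_seen = 0
for Xb, yb in iterate_minibatches(path_X, path_y, batch_size=128):
    n_seen += len(Xb)  # a real loop would run one training step per batch here
```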
This recent commit allows specifying network architectures that are different from the standard tower architecture: 12f263c
It would certainly also be interesting if one could use multiple input layers, e.g. a convolutional neural network that merges its learned representation with hand-crafted features in its last fully connected layers.
Given that NeuralNet builds on lasagne, would something like that be feasible with lasagne at all?
Since we can train a ConvNet with nolearn.lasagne and save the trained weights, will nolearn provide a visualization function that plots, e.g., the first-layer weights W1 as a figure? Usually these look like edge detectors, as shown in the paper "Visualizing and Understanding Convolutional Networks".
When running:
# import the necessary packages
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn import datasets
from nolearn.dbn import DBN
# construct the training and testing splits
(trainX, testX, trainY, testY) = train_test_split(features, targets, test_size=0.33)
print(trainX.shape)
print(trainY.shape)
dbn = DBN(
    [trainX.shape[1], 80, 80, trainY.shape[1]],
    learn_rates=0.3,
    learn_rate_decays=0.9,
    epochs=10,
    verbose=1)
dbn.fit(trainX, trainY)
# compute the predictions for the test data and show a classification
# report
preds = dbn.predict(testX)
It fails for some reason I cannot find:
100%
ZeroDivisionError Traceback (most recent call last)
in ()
19 epochs = 10,
20 verbose = 1)
---> 21 dbn.fit(trainX, trainY)
22
23 # compute the predictions for the test data and show a classification
/usr/local/lib/python2.7/dist-packages/nolearn/dbn.pyc in fit(self, X, y)
388 loss_funct,
389 self.verbose,
--> 390 self.use_dropout,
391 )):
392 losses_fine_tune.append(loss)
/usr/local/lib/python2.7/dist-packages/gdbn/dbn.pyc in fineTune(self, minibatchStream, epochs, mbPerEpoch, loss, progressBar, useDropout)
207 prog.tick()
208 prog.done()
--> 209 yield sumErr/float(totalCases), sumLoss/float(totalCases)
210
211 def totalLoss(self, minibatchStream, lossFuncts):
ZeroDivisionError: float division by zero
gnumpy: failed to import cudamat. Using npmat instead. No GPU will be used.
(1, 200)
(1, 125)
[DBN] fitting X.shape=(1, 200)
[DBN] layers [200, 80, 80, 125]
[DBN] Fine-tune...
I would like to train on the full dataset and was wondering whether there is anything I need to do besides setting eval_size=0.0?
When exactly are the final weights saved: at the best validation score, the best training score, the last epoch, etc.?
Thank you for the help and great code!
Setting -1 as the first element of the layer_sizes parameter of the DBN class works the first time you call the fit method. If you later call fit again with a dataset that has a different number of features, the method crashes.
I am getting:
/home/ubuntu/git/nolearn/nolearn/lasagne.py:408: PendingDeprecationWarning: object.__format__ with a non-empty format string is deprecated
which I believe is coming from:
print(" {:<18}\t{:<20}\tproduces {:>7} outputs".format(
layer.__class__.__name__,
output_shape,
reduce(operator.mul, output_shape[1:]),
))
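The warning comes from passing a tuple to a format spec like {:<20}: tuple does not define its own __format__, so it falls back to object.__format__, which Python 2 deprecates for non-empty format specs and Python 3 rejects with a TypeError. Converting the shape to a string first avoids it. A sketch with a made-up layer name and shape; functools.reduce is imported explicitly because it is not a builtin in Python 3.

```python
import operator
from functools import reduce

def describe_layer(layer_name, output_shape):
    # str() first: padding a raw tuple goes through object.__format__,
    # which warns on Python 2 and raises TypeError on Python 3.
    return " {:<18}\t{:<20}\tproduces {:>7} outputs".format(
        layer_name,
        str(output_shape),
        reduce(operator.mul, output_shape[1:]),
    )

line = describe_layer("DenseLayer", (None, 50))
print(line)
```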
There is currently a non-transparent / non-intuitive dependency between the batch_size and the input_shape.
Currently the default batch_iterator is BatchIterator(batch_size=128). While 128 is certainly a reasonable value for batch_size, the user must know that the default is 128 in order to set the input_shape correctly. Ideally, the user could change the batch_size without having to remember to update the input shape. One idea would be some sort of lazily resolved BATCH_SIZE constant that could be used in the input shape. The iterator could then have an additional method, get_batch_size, which the NeuralNet uses to set the BATCH_SIZE constant.
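A minimal sketch of the proposed lazily resolved constant. All names here (BATCH_SIZE, ToyBatchIterator, get_batch_size, resolve_shape) are hypothetical and not existing nolearn APIs: the shape is declared with a sentinel, and the net would substitute the iterator's batch size when it builds the layers.

```python
class _BatchSizePlaceholder(object):
    """Sentinel standing in for 'whatever batch size the iterator uses'."""
    def __repr__(self):
        return "BATCH_SIZE"

BATCH_SIZE = _BatchSizePlaceholder()

class ToyBatchIterator(object):
    # Hypothetical stand-in for nolearn's BatchIterator.
    def __init__(self, batch_size):
        self.batch_size = batch_size

    def get_batch_size(self):
        return self.batch_size

def resolve_shape(shape, batch_iterator):
    # Replace every occurrence of the sentinel with the iterator's batch size.
    return tuple(batch_iterator.get_batch_size() if dim is BATCH_SIZE else dim
                 for dim in shape)

input_shape = (BATCH_SIZE, 1, 300, 400)
print(resolve_shape(input_shape, ToyBatchIterator(batch_size=32)))  # (32, 1, 300, 400)
```

Where every layer supports it, leaving the batch dimension as None (as in the InputLayer (None, 200) output earlier on this page) sidesteps the coupling entirely.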
Have you considered
def transform(self, Xb, yb):
    return Xb.copy(), (yb.copy() if yb is not None else None)
in the BatchIterator class?
It doesn't look like a big computational overhead, and it could be really convenient, since usually we don't want to transform the original input.
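A sketch of the suggested default, written against a small stand-in class rather than nolearn's real BatchIterator (whose internals may differ). The base transform copies each batch, so a subclass that augments in place cannot modify the caller's array, and it tolerates yb being None, as it would be when iterating over unlabeled test data (an assumption about how prediction batches are produced). CopyingBatchIterator and FlipInPlace are hypothetical.

```python
import numpy as np

class CopyingBatchIterator(object):
    """Hypothetical stand-in: yields copied minibatches of X (and y)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

    def __call__(self, X, y=None):
        for start in range(0, len(X), self.batch_size):
            sl = slice(start, start + self.batch_size)
            yb = y[sl] if y is not None else None
            yield self.transform(X[sl], yb)

    def transform(self, Xb, yb):
        # Copy so subclasses can modify batches in place without touching X.
        return Xb.copy(), (yb.copy() if yb is not None else None)

class FlipInPlace(CopyingBatchIterator):
    def transform(self, Xb, yb):
        Xb, yb = super(FlipInPlace, self).transform(Xb, yb)
        Xb[:] = Xb[:, ::-1].copy()  # in-place horizontal flip of the copy
        return Xb, yb

X = np.arange(12, dtype=float).reshape(3, 4)
flipped = [Xb for Xb, _ in FlipInPlace(batch_size=2)(X)]
```

Without the copy in the base class, X[sl] is a view, and the in-place flip would silently overwrite X.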
I wanted to use the g2.2xlarge instance type on Amazon EC2 for fitting a DBN, but it doesn't detect the GPU. CUDA and cudamat are installed, and NVIDIA's deviceQuery sample correctly detects the GPU. My code works well on the CPU alone.
net = DBN(
    [x.shape[1], nhidden, 2],
    # 6.0 rather than 6: under Python 2, 6 / 683 is integer division and yields 0
    scales=[numpy.sqrt(6.0 / x_train.shape[1] / nhidden), numpy.sqrt(6.0 / nhidden)],
    epochs=50,
    verbose=1,
    dropouts=0.5,
    minibatch_size=64,
    minibatches_per_epoch=4500,  # 64 * 4500 = 288k samples per epoch
    )
net.fit(x, y)
$ export GNUMPY_USE_GPU=yes; python DBN\ with\ Nolearn.py
((304007, 683), (304007, 1)) ((5000, 683), (5000, 1)) ((5000, 683), (5000, 1)) (96136, 683)
[DBN] fitting X.shape=(304007, 683)
[DBN] layers [683, 30, 2]
Traceback (most recent call last):
File "DBN with Nolearn.py", line 112, in <module>
net.fit(x, y)
File "/usr/local/lib/python2.7/dist-packages/nolearn/dbn.py", line 340, in fit
self.net_ = self._build_net(X, y)
File "/usr/local/lib/python2.7/dist-packages/nolearn/dbn.py", line 246, in _build_net
v(self.uniforms),
File "/usr/local/lib/python2.7/dist-packages/gdbn/dbn.py", line 84, in buildDBN
initialBiases = [gnp.garray(0*num.random.rand(1, layerSizes[i])) for i in range(1, len(layerSizes))]
File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 724, in __init__
cm = _new_cm(npa.size)
File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 215, in _new_cm
_init_gpu()
File "/usr/local/lib/python2.7/dist-packages/gnumpy.py", line 80, in _init_gpu
if _boardId==-1: raise Exception('No gpu board is available. gnumpy will not function. Consider telling it to run on the CPU by setting environment variable GNUMPY_USE_GPU to "no".')
Exception: No gpu board is available. gnumpy will not function. Consider telling it to run on the CPU by setting environment variable GNUMPY_USE_GPU to "no".
$ ./NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release/deviceQuery
Set compute mode to DEFAULT for GPU 0000:00:03.0.
All done.
./NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GRID K520"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4096 MBytes (4294770688 bytes)
( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 0 / 3
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GRID K520
Result = PASS
Consider not having pdb.set_trace() called on KeyboardInterrupt by default. Currently this causes issues with IPython Notebooks.
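One way this could look, as a sketch: make the debugger opt-in through a flag (debug_on_interrupt is a hypothetical parameter, not an existing nolearn option) and otherwise end training cleanly on Ctrl-C, keeping whatever was learned so far.

```python
import pdb

def fit_loop(train_one_epoch, max_epochs, debug_on_interrupt=False):
    """Run epochs until done; Ctrl-C ends training instead of opening pdb."""
    completed = 0
    try:
        for epoch in range(max_epochs):
            train_one_epoch(epoch)
            completed += 1
    except KeyboardInterrupt:
        if debug_on_interrupt:
            pdb.set_trace()  # only when explicitly requested
    return completed

def fake_epoch(epoch):
    # Simulates the user pressing Ctrl-C during the fourth epoch.
    if epoch == 3:
        raise KeyboardInterrupt

print(fit_loop(fake_epoch, max_epochs=10))  # 3
```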
Hi,
Is anyone familiar with the error below? It appears sometimes when I change the number of hidden or end layers, or scale the data. The code continues to run, but the loss is always "nan".
[DBN] Fine-tune...
/Library/Python/2.7/site-packages/npmat.py:488: RuntimeWarning: invalid value encountered in multiply
target.numpy_array[:] = vec.numpy_array * self.numpy_array
/Library/Python/2.7/site-packages/npmat.py:433: RuntimeWarning: invalid value encountered in add
target.numpy_array[:] = vec.numpy_array + self.numpy_array
/Library/Python/2.7/site-packages/npmat.py:617: RuntimeWarning: invalid value encountered in greater
target.numpy_array[:] = (self.numpy_array > val).astype(DTYPE)
/Library/Python/2.7/site-packages/npmat.py:588: RuntimeWarning: invalid value encountered in less
target.numpy_array[:] = self.numpy_array < val
100%
Epoch 1:
loss nan
err 0.13142703202
(0:00:02)
Peter
Similar to #9: set output_num_units automatically when use_label_encoder=True. Ideally, the number of output units could be calculated from the number of classes: len(self.classes_).
@dnouri Daniel, hi again. I have noticed that in lasagne.py, X_train and X_valid are split by calling sklearn's train_test_split() function. This may look convenient at first blush. However, we cannot access or control X_train or X_valid, since they are not stored on self. In other words, if I would like to test how much my network is overfitting the training dataset, X_train is simply lost. Correct me if I am wrong. Thanks.
Right now the functions in on_epoch_finished are called after each epoch, but nothing explicitly signals that training is complete. This means that something like EarlyStopping will never transfer over the best weights if StopIteration is not raised. What you probably want to do on training completion is call nn.load_weights_from(self.best_weights), to leave the network with the best weights seen.
I suggest adding an extra callback: on_training_complete
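A sketch of how such a hook could be used, with a tiny fake network so it runs standalone. The EarlyStopping class is modeled loosely on the one in the facial keypoints tutorial (remember the best weights, raise StopIteration after patience epochs without improvement); on_training_complete, FakeNet, get_weights, and train are hypothetical stand-ins, and load_weights_from here is just a method on the fake net.

```python
class EarlyStopping(object):
    def __init__(self, patience=3):
        self.patience = patience
        self.best_valid = float("inf")
        self.best_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        # Called after every epoch, like the functions in on_epoch_finished.
        current = train_history[-1]
        if current["valid_loss"] < self.best_valid:
            self.best_valid = current["valid_loss"]
            self.best_epoch = current["epoch"]
            self.best_weights = nn.get_weights()
        elif self.best_epoch + self.patience < current["epoch"]:
            nn.load_weights_from(self.best_weights)
            raise StopIteration()

    def on_training_complete(self, nn, train_history):
        # Proposed extra hook: restore the best weights even if max_epochs
        # was reached before patience ran out (no StopIteration raised).
        if self.best_weights is not None:
            nn.load_weights_from(self.best_weights)

class FakeNet(object):
    """Hypothetical net whose 'weights' just record the epoch they came from."""
    def __init__(self):
        self.weights = None

    def get_weights(self):
        return self.weights

    def load_weights_from(self, weights):
        self.weights = weights

def train(nn, valid_losses, callback):
    history = []
    for epoch, loss in enumerate(valid_losses, start=1):
        nn.weights = "weights@epoch%d" % epoch  # pretend one epoch of training
        history.append({"epoch": epoch, "valid_loss": loss})
        try:
            callback(nn, history)
        except StopIteration:
            break
    callback.on_training_complete(nn, history)

net = FakeNet()
train(net, [0.9, 0.5, 0.6, 0.7], EarlyStopping(patience=3))
print(net.weights)  # weights@epoch2 (without the hook it would be weights@epoch4)
```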
A useful enhancement for the convnet would be a verbose option that prints the percentage of the supplied files processed so far and the time taken.
Hi Daniel
If I understand correctly, the train/validation split is done once for the whole training run, using only the indices from the first fold. Once the indices are stored, in each epoch all the batches in the training set are used to train the net, and then the net is validated against the validation set. Right?
Is it possible to train on all the folds and get the average validation loss across all the folds?
Also, is it possible to access all the validation losses in an epoch (across all the folds), so that I can check both the mean and the standard deviation rather than only the minimum when choosing the best epoch?
Thanks
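A NumPy sketch of the cross-validated variant being asked about: split the indices into k folds, record a per-epoch validation loss for each fold, then report the mean and standard deviation across folds for every epoch. fake_validation_losses is a hypothetical stand-in for training a fresh net on each fold, which is the expensive part (roughly k times the training cost).

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed=0):
    # Shuffle once, then cut into n_folds nearly equal validation chunks.
    idx = np.random.RandomState(seed).permutation(n_samples)
    for valid_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, valid_idx)
        yield train_idx, valid_idx

def fake_validation_losses(train_idx, valid_idx, max_epochs, rng):
    # Stand-in for: train a new net on train_idx and record its validation
    # loss on valid_idx after every epoch.
    return 1.0 / np.arange(1, max_epochs + 1) + 0.05 * rng.rand(max_epochs)

max_epochs, n_folds = 10, 5
rng = np.random.RandomState(42)
losses = np.array([fake_validation_losses(tr, va, max_epochs, rng)
                   for tr, va in kfold_indices(100, n_folds)])  # (folds, epochs)

mean_per_epoch = losses.mean(axis=0)
std_per_epoch = losses.std(axis=0)
best_epoch = int(np.argmin(mean_per_epoch)) + 1  # epochs numbered from 1
```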
Daniel
When I pickle a network and later load its weights, do I need to make sure that I am loading the weights into a network with the same number of neurons? I would normally assume so, but I find that I am able to load weights from one network into another even when they differ substantially in the number of filters, the number of layers, etc.
So, what is going on underneath, especially when storing and loading weights between networks that have conv layers, pool layers, and hidden layers?
Thanks
Regards
Hi
I posted this on nolearn too, but maybe dnouri is away for a while. Any help will be really appreciated, as I need to understand this ASAP. Please feel free to close this.
The BatchIterator uses a loop as below
for i in range((n_samples + bs - 1) / bs):
That means - for each epoch, the training progresses through the set of batches in the same sequence.
I wanted to see what happens if I randomize the sequence of batches.
So I added this code
v = list(range((n_samples + bs - 1) // bs))
random.shuffle(v)
for i in v:
#for i in range((n_samples + bs - 1) / bs):
This means each epoch processes the batches in a different order. It still goes through all the images, but in a different sequence.
This change gave me much faster descent and considerably lower validation errors. So I was happy.
BUT the prediction accuracy on the (same) test set plummeted like hell, from 86% to 33%.
Why ?
Similarly, I trained with the unmodified loop and stored a net. I loaded that net in a separate script and used it to predict on a test set, once with the unmodified loop and once with the modified loop. Again the same behavior: a massive decrease with the modified-loop BatchIterator.
As a matter of fact, I never thought the decrease could be caused by the BatchIterator during prediction. It was only after several hours of playing around that I figured it out.
So again, my question: what is happening here? Why is the prediction on the same test set so bad when using my modified loop, even though the validation errors are much lower during training? And what is the BatchIterator used for during prediction?
Thanks a lot in advance for helping out.
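One likely explanation, stated as an assumption about how prediction works: predict and predict_proba run the test data through a batch iterator and stack the batch outputs in the order they come back (compare the np.vstack(probas) loop quoted elsewhere on this page). If the iterator also shuffles batch order at prediction time, each prediction is still computed correctly but the stacked result no longer lines up with the rows of the test labels, so measured accuracy collapses. A sketch of an iterator that shuffles only when asked; ShufflingBatchIterator, its shuffle flag, and predict are hypothetical.

```python
import numpy as np

class ShufflingBatchIterator(object):
    """Hypothetical iterator: shuffles batch order only when shuffle=True."""

    def __init__(self, batch_size, shuffle=False, seed=None):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.RandomState(seed)

    def __call__(self, X, y=None):
        n_batches = (len(X) + self.batch_size - 1) // self.batch_size
        order = np.arange(n_batches)
        if self.shuffle:
            self.rng.shuffle(order)  # fine for training, wrong for prediction
        for i in order:
            sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
            yield X[sl], (y[sl] if y is not None else None)

def predict(X, batch_iterator):
    # Mimics a predict loop: outputs are stacked in the order batches arrive.
    return np.concatenate([Xb[:, 0] for Xb, _ in batch_iterator(X)])

X_test = np.arange(10, dtype=float).reshape(10, 1)
y_test = np.arange(10, dtype=float)  # a "perfect" model: prediction == label

ordered = predict(X_test, ShufflingBatchIterator(3, shuffle=False))
shuffled = predict(X_test, ShufflingBatchIterator(3, shuffle=True, seed=1))
print(np.mean(ordered == y_test), np.mean(shuffled == y_test))
```

The design choice is simply to keep shuffling out of whatever iterator is used for prediction, e.g. a separate non-shuffling iterator for test data.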
Would be cool to have some assertion for unused parameters to catch issues like this.
Rather than requiring the user to use the same class for the batch_iterator used for training and testing, consider adding a batch_iterator_test.
This would simplify the FlipBatchIterator in the blog post and allow simply using the standard BatchIterator for batch_iterator_test.
What do you think about introducing an option to seed the random number generator before doing the KFold in train_test_split? That way, the net's predictions and loss values will be more deterministic across multiple runs on the same data.
Thanks
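A NumPy sketch of a seeded split (random_state is a hypothetical parameter name here, borrowed from scikit-learn's convention): the same seed always yields the same train/validation indices, so runs on the same data start from the same split. Weight initialization and GPU arithmetic can still introduce run-to-run differences.

```python
import numpy as np

def seeded_train_valid_split(n_samples, eval_size=0.2, random_state=None):
    # A dedicated RandomState keeps the split independent of global RNG state.
    rng = np.random.RandomState(random_state)
    idx = rng.permutation(n_samples)
    n_valid = int(round(n_samples * eval_size))
    return idx[n_valid:], idx[:n_valid]  # train positions, valid positions

train_a, valid_a = seeded_train_valid_split(50, random_state=1234)
train_b, valid_b = seeded_train_valid_split(50, random_state=1234)
```

Because it returns row positions, the data should be indexed positionally (np.asarray(X)[train_a], or X.iloc[train_a] for pandas), which also avoids the non-contiguous pandas index problem reported earlier on this page.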
I am trying to train the 'Convolutional Neural Network' by following this tutorial
but while running the code it seems that no actual training is being done. You can see my code here.
Here is the log for the first few epochs:
%run train.py
... loading data
float32
(35126L, 1L, 100L, 100L)
(35126L,)
InputLayer (None, 1, 100, 100) produces 10000 outputs
Conv2DLayer (None, 1, 98, 98) produces 9604 outputs
MaxPool2DLayer (None, 1, 49, 49) produces 2401 outputs
Conv2DLayer (None, 1, 48, 48) produces 2304 outputs
MaxPool2DLayer (None, 1, 24, 24) produces 576 outputs
DenseLayer (None, 50) produces 50 outputs
DenseLayer (None, 5) produces 5 outputs
Epoch | Train loss | Valid loss | Train / Val | Valid acc | Dur
--------|--------------|--------------|---------------|-------------|-------
1 | nan | nan | nan | 73.50% | 93.6s
2 | nan | nan | nan | 73.50% | 93.2s
3 | nan | nan | nan | 73.50% | 109.9s
4 | nan | nan | nan | 73.50% | 93.6s
5 | nan | nan | nan | 73.50% | 88.5s
I can't figure out what I am doing wrong
Hi,
When I pass my dataset (the Cohn-Kanade dataset) to the library for training, I receive the following error. Has anyone seen something similar? (Right now, just for testing, I'm using only a small part of the dataset.)
[DBN] fitting X.shape=(365, 784)
[DBN] layers [784, 300, 10]
[DBN] Fine-tune...
Traceback (most recent call last):
File "deep.py", line 46, in <module>
dbn.fit(trainX, trainY)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/nolearn/dbn.py", line 409, in fit
self.use_dropout,
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 202, in fineTune
err, outMB = step(inpMB, targMB, self.learnRates, self.momentum, self.L2Costs, useDropout)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 303, in stepNesterov
errSignals, outputActs, error = self.fpropBprop(inputBatch, targetBatch, useDropout)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/dbn.py", line 262, in fpropBprop
outputErrSignal = -self.outputActFunct.dErrordNetInput(targetBatch, self.state[-1], outputActs)
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gdbn/activationFunctions.py", line 138, in dErrordNetInput
return acts - targets
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 965, in sub
else: return self + -as_garray(other) # if i need to broadcast, making use of the row add and col add methods is probably faster
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 926, in add
def add(self, other): return _check_number_types(self._broadcastable_op(as_garray_or_scalar(other), 'add'))
File "/Users/jpcanario/.virtualenvs/msdegree/lib/python2.7/site-packages/gnumpy.py", line 614, in _broadcastable_op
if reduce(operator.or_, ( other.shape[i] not in (1, self.shape[i]) for i in range(self.ndim)), False): raise ValueError('shape mismatch: objects cannot be broadcast to a single shape')
I have no idea what is causing the problem, so here is a description of the dataset and the steps I take before training.
My dataset:
Before training (after loading each image), I do the following steps:
Convert the image to grayscale.
Crop the image to 490x490 pixels.
Resize each image to 28x28.
Finally, reshape the image from a matrix into a vector using the following command:
im = im.reshape(1, im.shape[0]*im.shape[1])
Thanks in advance
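The traceback ends in a broadcasting error between the network's output activations and the targets, so one thing worth checking (an assumption, not a confirmed diagnosis) is that X has one row per image and that the last entry of the layer list matches the number of classes in y rather than being hard-coded. A NumPy sketch with random stand-in images and a made-up set of 7 labels:

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-ins for preprocessed 28x28 grayscale images and their labels.
images = [rng.rand(28, 28) for _ in range(365)]
labels = np.arange(365) % 7  # hypothetical: 7 classes

# Flatten each image and stack: X has shape (n_samples, 784), one row per image.
X = np.vstack([im.reshape(1, im.shape[0] * im.shape[1]) for im in images])
y = np.asarray(labels, dtype=np.int32)

# Derive the output layer size from the data instead of hard-coding it.
n_classes = len(np.unique(y))
layer_sizes = [X.shape[1], 300, n_classes]
```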
Add support for weight decay to nolearn.
It looks like Lasagne has support, although it might need some more work.
Also see cuda-convnet docs.
This will improve SGD and deal with missing residual data.
See #11 (comment).
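For reference, L2 weight decay adds λw to the gradient, so one SGD step becomes w ← w − η(∇L(w) + λw), which is a gradient step on L(w) + (λ/2)‖w‖². A NumPy sketch of that update; the function name and values are illustrative and this is not Lasagne's API.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, learning_rate, weight_decay):
    # Equivalent to a gradient step on loss + (weight_decay / 2) * ||w||^2.
    return w - learning_rate * (grad + weight_decay * w)

w = np.array([1.0, -2.0, 0.5])
zero_grad = np.zeros_like(w)

# With a zero loss gradient, decay alone shrinks every weight by the
# factor (1 - learning_rate * weight_decay) per step.
w_next = sgd_step_with_weight_decay(w, zero_grad, learning_rate=0.1, weight_decay=0.01)
```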
Hi,
After calling fit(), I save the best weights to a file using
net.save_weights_to(weights_file)
However, upon loading the weights back from the file into a net
net_train = pickle.load(f)
net.load_weights_from(net_train)
net.predict(X)
the predict() call fails with:
File "/Users/bgarg/anaconda/lib/python2.7/site-packages/nolearn/lasagne.py", line 247, in predict return self.predict_proba(X)
File "/Users/bgarg/anaconda/lib/python2.7/site-packages/nolearn/lasagne.py", line 242, in predict_proba probas.append(self.predict_iter_(Xb))
AttributeError: 'NeuralNet' object has no attribute 'predict_iter_'
It does work on refitting though. Was this intended to work for predict()?
I was wondering if there is a way to get the current epoch number in a BatchIterator subclass?
The basic idea is that I only want to perform data augmentation during the first 90% of epochs.
For example, assume we have 100 epochs AND had access to train_history:
class FlipBatchIterator(BatchIterator):
    def transform(self, Xb, yb):
        # hypothetical: train_history is not actually reachable from here
        epoch = train_history[-1]['epoch']
        if epoch < 90:
            Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
        return Xb, yb
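One workaround that needs no changes to nolearn, sketched under assumptions: give the iterator an epoch counter and advance it from an on_epoch_finished callback, which (per the discussion elsewhere on this page) receives the net and its train history. The epoch attribute, make_epoch_counter, and the flip augmentation are hypothetical, and a minimal stand-in base class is used so the sketch runs on its own.

```python
import numpy as np

class BaseBatchIterator(object):
    """Minimal stand-in for nolearn's BatchIterator."""
    def __init__(self, batch_size):
        self.batch_size = batch_size

    def __call__(self, X, y=None):
        for start in range(0, len(X), self.batch_size):
            sl = slice(start, start + self.batch_size)
            yield self.transform(X[sl], y[sl] if y is not None else None)

    def transform(self, Xb, yb):
        return Xb, yb

class EpochAwareFlipIterator(BaseBatchIterator):
    def __init__(self, batch_size, augment_until_epoch):
        super(EpochAwareFlipIterator, self).__init__(batch_size)
        self.augment_until_epoch = augment_until_epoch
        self.epoch = 0  # updated externally after each epoch

    def transform(self, Xb, yb):
        Xb, yb = super(EpochAwareFlipIterator, self).transform(Xb, yb)
        if self.epoch < self.augment_until_epoch:
            Xb = Xb[:, :, :, ::-1]  # horizontal flip as a new view; X is untouched
        return Xb, yb

def make_epoch_counter(iterator):
    # Hypothetical on_epoch_finished-style callback keeping iterator.epoch in sync.
    def on_epoch_finished(nn, train_history):
        iterator.epoch = train_history[-1]["epoch"]
    return on_epoch_finished

it = EpochAwareFlipIterator(batch_size=2, augment_until_epoch=90)
callback = make_epoch_counter(it)
X = np.arange(8, dtype=float).reshape(2, 1, 2, 2)

early = next(it(X))[0]               # epoch 0: augmented
callback(None, [{"epoch": 90}])      # as if 90 epochs have finished
late = next(it(X))[0]                # no longer augmented
```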
Hi
Is it possible (and does it make sense?) to have an activation threshold (like the axon hillock) for tanh or sigmoid? The get_output_for method of the Conv layer will always return an output, however small (i.e., a neuron will always fire unless it is a ReLU), right?
Right now the __call__ method of BatchIterator gets just x and y. Compare that to the on_epoch_finished and on_training_finished functions, which are given a reference to the nn.
Not being passed the nn makes it very hard for the BatchIterator to access any fields on the NeuralNet. I suggest adding nn as an additional argument to the iterator.
One issue that has already come up is that from the BatchIterator I would like access to nn.enc_.
There may be other uses in the future, and I don't see why on_epoch_finished and on_training_finished get nn but BatchIterator does not.
For NeuralNet:
train_test_split rather than using sklearn.cross_validation::train_test_split?
safe_indexing.
NeuralNet which is not ideal.
I encounter the following error while pickling a convnet NeuralNet:
pickle.PicklingError: Can't pickle <function values_eq_approx at 0x7f2d4bef4e60>: it's not found as theano.sandbox.cuda.opt.values_eq_approx
This problem also surfaced somewhere in the comments at http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
I'm using the original Conv and MaxPool implementations in Lasagne
from lasagne import layers
Conv2DLayer = layers.Conv2DLayer
MaxPool2DLayer = layers.MaxPool2DLayer
As a workaround, I can use dill (https://pypi.python.org/pypi/dill), but it would be nice if I could also use the standard pickle interface.
Hi
I am facing a strange problem that I have been struggling with for ages.
I am training a net using images and labels, so regression is False, use_label_encoder is True, and train_test_split invokes StratifiedKFold, etc.
Now, I am using EarlyStopping. When my net trains and exits due to early stopping, the best weights are loaded into the net (as in the example for face rectangles). I can see the validation loss at that best epoch.
After training is finished, I take the net and recreate the validation set using the same StratifiedKFold logic (I made sure that the indices and labels are exactly the same as those of the validation set used inside the training loop). I then use predict_proba (I have also tried _output_layer.get_output) and a plain NumPy computation of the loss (negative log likelihood), and the validation loss is different from the best validation loss during training. The difference is very large (I understand some decimals might be off here and there, since things are calculated on the GPU). The loss I am getting seems closer to the last loss I see during training, not the best loss.
Now, just to add some points:
Can someone tell me what I am missing here? Or is there some problem lurking somewhere?
Thanks
Regards
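When recomputing the loss outside the training loop, it helps to compute it the same way the training objective does. Below is a NumPy sketch of mean categorical cross-entropy (negative log likelihood) from a predict_proba-style matrix, with y encoded as integer class indices; the clipping constant is an assumption. Clipping, or averaging per batch rather than per sample, can shift the number slightly but not by a large margin. A large gap that matches the last epoch's loss might mean the best weights were not actually in place at prediction time; that is a guess worth checking, not a confirmed cause.

```python
import numpy as np

def negative_log_likelihood(proba, y, eps=1e-7):
    """Mean -log p(true class) over all samples, as in categorical cross-entropy."""
    proba = np.clip(proba, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(np.log(proba[np.arange(len(y)), y]))

proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y = np.array([0, 1, 2])
loss = negative_log_likelihood(proba, y)
```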
Would be nice. Right now I'm hacking together a solution with use_2to3 and fixing things as they come up for me. Nothing systematic.
See also my comment here Lasagne/Lasagne#23