Comments (17)
Hmm, maybe it'd help to see some code. Number 3 isn't entirely clear to me. What does your EarlyStopping implementation look like?
from nolearn.
Here it is. I do a few things with more_params, but the rest is almost the same as your code.
import numpy as np

class EarlyStopping(object):
    def __init__(self, patience=100):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        # Reset the tracked state when more_params asks for it.
        if(bool(nn.more_params) and 'reset' in nn.more_params and nn.more_params['reset'] == 1):
            self.best_valid = np.inf
            self.best_valid_epoch = 0
            self.best_weights = None
            nn.more_params['reset'] = 0
        #print 'Patience is set at ' + str(self.patience)
        #print 'Max epochs is ' + str(nn.max_epochs)
        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        #print str(current_epoch)
        #if(current_epoch%100==0):
        #    print("Saving state.")
        #    print("Best valid loss was {:.6f} at epoch {}.".format(
        #        self.best_valid, self.best_valid_epoch))
        #    nn.load_weights_from(self.best_weights)
        #    with open('models/' + current_epoch + '.model', 'wb') as f:
        #        pickle.dump(nn, f, -1)
        # Remember the weights whenever the validation loss improves.
        if current_valid < self.best_valid:
            print('Ressing best')
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = [w.get_value() for w in nn.get_all_params()]
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid
        # Restore the best weights on patience overrun or at the last epoch.
        if (self.best_valid_epoch + self.patience < current_epoch):
            print("Early stopping.")
            print("Best valid loss was {:.6f} at epoch {}.".format(
                self.best_valid, self.best_valid_epoch))
            nn.load_weights_from(self.best_weights)
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid
            raise StopIteration()
        elif (current_epoch == nn.max_epochs):
            print("Loading best weights")
            nn.load_weights_from(self.best_weights)
            nn.more_params['best_valid_fold_' + str(nn.more_params['fold'])] = self.best_valid
from nolearn.
So I can see "Loading best weights" being printed only when the valid loss decreases, and also at the max epoch (if it was a smooth decrease till the max epoch). You will also notice that I have put the "Resetting" bit (sorry, it is misspelled as "Ressing" there) in its own if, so that loading the weights is independent of that if (which I believe is right).
The point is: the weights get loaded correctly from the best weights, whether training ends at the max epoch or through the patience override. But when I use that net after the .fit() call returns, it gives me a different loss on the same validation set.
Here is how I am calculating the loss:
def get_log_loss(y_actual, y_pred):
    # Flatten the labels to a 1D array of class indices.
    y_actual = y_actual.reshape(y_actual.shape[0])
    # Build a one-hot matrix with the same shape as the predictions.
    vec_actual = np.zeros(y_pred.shape)
    sizeOfSet = vec_actual.shape[0]
    vec_actual[np.arange(sizeOfSet), y_actual.astype(int)] = 1
    # Mean negative log-probability of the true class.
    loss_sum = np.sum(vec_actual * np.log(y_pred))
    loss = -1.0 / sizeOfSet * loss_sum
    return loss
from nolearn.
If you're not sure that you're calculating the loss right, maybe you should try and call your numpy version and the Theano version used by the net with the same values, and verify that they produce the same output.
Here's an implementation that I have lying around:
import scipy as sp

def logloss(y_true, y_pred):
    epsilon = 1e-18
    y_pred = sp.maximum(epsilon, y_pred)
    y_pred = sp.minimum(1 - epsilon, y_pred)
    ll = (sum(y_true * sp.log(y_pred) +
              sp.subtract(1, y_true) *
              sp.log(sp.subtract(1, y_pred)))
          )
    ll = ll * -1.0 / len(y_true)
    return ll
from nolearn.
Daniel, there is a problem somewhere. It would be great if you could validate the best validation loss from training against the same loss computed after training, on any non-regression net you have. If you get the same value, then I have definitely mucked up somewhere. If not, then something is not quite right somewhere. I am working on it too (for a few days now :()
from nolearn.
@run2: Yes, I'm doing this on a classification net and it's giving me consistent results. Have you checked that your get_log_loss function is right?
from nolearn.
Daniel, the log loss used by default for non-regression problems is
    return -T.mean(T.log(output)[T.arange(prediction.shape[0]), prediction])
Does that not equate to the get_log_loss code I pasted above?
y_pred is [nsamples, nclasses] (2D array) from predict_proba
y_actual is [nsamples,] 1D array of actual class labels
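One way to check the question above without Theano is to compare the two formulations in plain NumPy. This is only a sketch: the function names and the sample data are illustrative, and it assumes 0-based integer labels and strictly positive probabilities.

```python
import numpy as np

def one_hot_log_loss(y_actual, y_pred):
    """One-hot formulation, mirroring get_log_loss from the thread."""
    y_actual = np.asarray(y_actual).reshape(-1).astype(int)
    one_hot = np.zeros(y_pred.shape)
    one_hot[np.arange(len(y_actual)), y_actual] = 1
    return -np.sum(one_hot * np.log(y_pred)) / len(y_actual)

def indexed_log_loss(y_actual, y_pred):
    """Indexing formulation, mirroring the -T.mean(T.log(...)[...]) expression."""
    y_actual = np.asarray(y_actual).reshape(-1).astype(int)
    return -np.mean(np.log(y_pred)[np.arange(len(y_actual)), y_actual])

# Illustrative data: 6 samples, 3 classes, rows normalised to sum to 1.
rng = np.random.default_rng(0)
raw = rng.random((6, 3)) + 0.1
y_pred = raw / raw.sum(axis=1, keepdims=True)
y_actual = np.array([0, 2, 1, 1, 0, 2])

loss_a = one_hot_log_loss(y_actual, y_pred)
loss_b = indexed_log_loss(y_actual, y_pred)
print(loss_a, loss_b)
```

Under these assumptions both functions pick the log-probability of the true class for each row and average it, so they should agree up to floating-point rounding. If the labels passed in were 1-based, both would index the wrong column, so the encoding has to match the one used during training.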
from nolearn.
I just tried to reproduce this issue. There's a test called test_lasagne_functional_mnist, and I added this bit of code right after the line assert accuracy_score...:
# assert accuracy_score(y_pred, y_test) > 0.85 ...
from nolearn.lasagne import negative_log_likelihood
X_train, X_valid, y_train, y_valid = nn.train_test_split(
    X_train, y_train, nn.eval_size)
y_pred = nn.predict_proba(X_valid)
loss = negative_log_likelihood(y_pred, y_valid).eval()
assert abs(nn.train_history_[-1]['valid_loss'] - loss) < 0.01
So looks like it's matching up for this small example. Any more ideas?
from nolearn.
let me try that.
from nolearn.
So this is getting trickier:
- I printed the size of my valid set from within the train_test_split method. It printed 5509.
- I printed the size of y_pred after calling predict_proba (as in your code) and got shape[0] as 5500. So it had skipped 9 examples. Note that my batch size is 20. That by itself can be a cause of the difference, but I am sure it is not the only reason.
- I could not get any further than that, as your method failed with raise TypeError('index must be integers') in File "/home/debanjan/pythonrepos/Theano/theano/tensor/subtensor.py", line 1980, in as_index_variable. I checked the y_pred and y_valid variables and they seemed to be fine. It is not because of a size mismatch; I checked that.
- Maybe a label encoding problem is lurking somewhere. I printed y_valid from within train_test_split; when it ran your code above, the labels were one more than those from the same print statement during training. Remember, I am using a label encoder, and my labels are integers starting from 1 (there is no 0). That's the reason they are one more when calling predict_proba directly.
- It is possible that not all labels are present in both the train and valid sets. Hmm. I am just thinking out loud.
Does that make any sense?
from nolearn.
Ah!! Finally it matches, Daniel! God, I spent days on this.
So I think there are two reasons (I am still running some more tests):
- The validation loss during training skips examples to fit the batch size. I would have thought it should pad the set (like n+p) and then, while calculating the loss, take the first n of the results and dismiss the last p. If I have a large batch size, this might cause quite a difference.
- While using a label encoder, you need to make sure that the labels are encoded too when predicting on a test set or validation set. This is something I had already done, so that was not the problem.
- The method I have written does not produce the same result as negative_log_likelihood from Theano. The problem mentioned above was sorted out by type casting y_pred to np.int32, so I got a result from your code. But it was 0.006 off from the result from my code. I am not too happy to see that difference. I am running further tests to see how bad that difference can be.
For some reason, when I execute your code for log loss, it gives me back an array, not a value. I will check again tomorrow after I catch some sleep.
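The effect of dropping the incomplete last batch can be sketched with plain NumPy. This is a hypothetical illustration, not nolearn's batch iterator: the function names and the synthetic per-example losses are assumptions, and only the sizes (5509 examples, batch size 20) come from the thread.

```python
import numpy as np

def mean_loss_dropping_remainder(per_example_loss, batch_size):
    """Average only over full batches, ignoring the leftover examples."""
    n_batches = len(per_example_loss) // batch_size
    used = per_example_loss[: n_batches * batch_size]
    return used.mean(), len(used)

def mean_loss_all_examples(per_example_loss, batch_size):
    """Average over every example, including the final partial batch."""
    total = 0.0
    for start in range(0, len(per_example_loss), batch_size):
        total += per_example_loss[start:start + batch_size].sum()
    return total / len(per_example_loss), len(per_example_loss)

# Synthetic per-example losses; the real values would come from the net.
losses = np.linspace(0.1, 2.0, 5509)

dropped_mean, n_used = mean_loss_dropping_remainder(losses, 20)
full_mean, n_all = mean_loss_all_examples(losses, 20)
print(n_used, n_all, dropped_mean, full_mean)
```

With these sizes the first function uses 5500 examples and the second uses all 5509, so the two means differ whenever the 9 skipped examples are not average ones. Summing per batch and dividing by the true example count avoids both the skipping and the need for padding.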
from nolearn.
OK, I was wrong. It matches, but with the last validation loss, not the best validation loss.
I am still clueless as to why it does not match the best validation loss even though the right weights (from the best validation loss) are being copied.
from nolearn.
OK, Daniel. I have just solved this issue, and it is a bug.
You need to check whether _output_layer is None before you initialize the layers in load_weights_from.
I am still figuring out why, but if you have the code as below, it thinks it has loaded the weights when it has not.
def load_weights_from(self, source):
    self._output_layer = self.initialize_layers()
If I change it to
    if self._output_layer is None:
        self._output_layer = self.initialize_layers()
then it works fine, and I get the same validation error outside of training as I get inside, for the best validation loss.
Please try it out on your side and confirm.
Point 1) from my post before last is also another reason for the difference.
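The thread does not confirm why the unguarded version fails. One plausible mechanism, offered purely as an assumption, is that re-initializing the layers creates new parameter objects while an already-compiled prediction function keeps using the old ones. The toy class below is hypothetical and is not nolearn's NeuralNet; it only illustrates that mechanism.

```python
import numpy as np

class ToyNet:
    """Hypothetical stand-in for a net; not nolearn's implementation."""

    def __init__(self):
        self._output_layer = None
        self._predict = None

    def initialize_layers(self):
        # Build a fresh parameter container (a single-weight "layer").
        return {"w": np.zeros(1)}

    def compile(self):
        self._output_layer = self.initialize_layers()
        layer = self._output_layer  # the "compiled" function captures this object
        self._predict = lambda x: x * layer["w"][0]

    def load_weights_unguarded(self, weights):
        # Always re-initializes: the weights land in a new container
        # that the compiled function never looks at.
        self._output_layer = self.initialize_layers()
        self._output_layer["w"][:] = weights

    def load_weights_guarded(self, weights):
        # Re-initializes only when nothing exists yet, so the weights are
        # written into the container the compiled function already uses.
        if self._output_layer is None:
            self._output_layer = self.initialize_layers()
        self._output_layer["w"][:] = weights

    def predict(self, x):
        return self._predict(x)

unguarded = ToyNet()
unguarded.compile()
unguarded.load_weights_unguarded([2.0])
stale = unguarded.predict(3.0)   # still computed with the old zero weight

guarded = ToyNet()
guarded.compile()
guarded.load_weights_guarded([2.0])
fresh = guarded.predict(3.0)     # computed with the loaded weight

print(stale, fresh)
```

In this toy setup the unguarded load appears to succeed, yet predictions still come from the original parameters, which would match the symptom of a loss that differs from the best validation loss.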
from nolearn.
@run2 Could you maybe help with reproducing this issue? I've added a test, but I'm not able to make it fail: a0769e0
from nolearn.
Daniel, you need two things to reproduce this issue:
- Have a batch size which does not divide evenly into your validation set size.
- Train a net (with early stopping) such that it improves for a while (storing the weights at every improvement), and then the validation loss does not improve for n epochs, where n is your patience, and the net exits by loading the stored weights. Make sure the last epoch is NOT an improvement epoch. Then assert the validation loss.
Let me know if you cannot reproduce it. Then I will have to create some dummy data, which will take some time.
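The second condition can be checked on a loss curve before training anything. The helper below is a hypothetical simulation of the patience rule from the EarlyStopping class earlier in the thread (stop once best epoch + patience < current epoch); the loss values are made up for illustration.

```python
def simulate_early_stopping(valid_losses, patience):
    """Return (stop_epoch, best_epoch, best_loss) for a list of per-epoch losses."""
    best_valid = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_valid:
            best_valid, best_epoch = loss, epoch
        # Same patience test as in the EarlyStopping handler.
        if best_epoch + patience < epoch:
            return epoch, best_epoch, best_valid
    # Patience never ran out: training ends at the last epoch.
    return len(valid_losses), best_epoch, best_valid

# Improves for three epochs, then gets worse until patience runs out.
curve = [0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.8]
stop_epoch, best_epoch, best_loss = simulate_early_stopping(curve, patience=3)
print(stop_epoch, best_epoch, best_loss)
```

A curve like this one stops on an epoch that is not the best epoch, which is the situation where the last validation loss and the restored best validation loss should differ.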
from nolearn.
I think the bug in load_weights_from that you describe in this comment might have been fixed since your report.
If I understand your report right, this means that only point 1) remains. If that's so, would you kindly distill a description of bug 1) and put it into a separate issue and then we'll have a look.
from nolearn.
Closing due to lack of feedback.
from nolearn.