
Comments (9)

pierluigiferrari commented on May 24, 2024

I've changed the BatchGenerator API recently but forgot to update train_ssd7.ipynb accordingly, so thanks for raising the issue! I've fixed this now. I've also just run 10 epochs of training and the model learns as expected, so I'm unable to reproduce the effect you're seeing. Pull the latest master and let me know if that solves the issue.


luckyuho commented on May 24, 2024

Thanks for your reply,
however, it still doesn't seem to work.
I also modified train_labels and val_labels by deleting some bad data, but it still doesn't work.
Here is the result on my computer after your modification.
Is it possible that some other file changed?
And could you re-upload the dataset? (In your dataset there are only label.csv and label_debug.csv.)
I would like to try your work without changing anything
(because several other SSDs I trained didn't turn out well after the training process; I just want a working tutorial to make sure I understand how to build a good SSD architecture, thanks!).
Besides, I have a question:
do you use a file like VGG-16 to fine-tune SSD7, or does it not need fine-tuning?
Thank you a lot!


pierluigiferrari commented on May 24, 2024

It looks like it's working though, just keep training longer. It does look strange that the loss isn't really moving for several epochs, but that can happen. It eventually goes down further in epoch 10, so there's nothing to worry about at that point yet.

Just keep training another 10 or 20 epochs and see what happens. After every 10 epochs or so, display a few predictions (further down in the Jupyter notebook) to see when the point has come that the model starts predicting anything half-decent. In the beginning, after decoding the raw model output, there will be no good predictions left, so the decoded predictions array will be empty, but you will see that that starts to change at some point.

I also used the full dataset (i.e. all of labels.csv) for the training when I ran the tests the other day, just like you, so if you didn't change any of the other parameters in the notebook, then you're doing exactly what I did. I don't provide a split into training and validation datasets for this dataset, I leave it to everyone to make the split however they see fit.

In order to speed things up a bit, just leave out the validation_data and validation_steps arguments from the model.fit_generator() call and change monitor from 'val_loss' to 'loss' in the other arguments. Do this just for now, while you want to make sure that the model learns at all. There's no point in evaluating the model on the training dataset after each epoch if all you want to do right now is see whether the model is learning.
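
To make that concrete, here is a minimal sketch of what I mean. The variable names (train_generator, steps_per_epoch) and the checkpoint file name are just placeholders, use whatever your notebook cells actually define:

```python
from keras.callbacks import ModelCheckpoint

# Watch the training loss instead of the validation loss while sanity-checking.
checkpoint = ModelCheckpoint('ssd7_epoch{epoch:02d}_loss{loss:.4f}.h5',
                             monitor='loss',        # was 'val_loss'
                             save_best_only=True,
                             mode='min')

# No validation_data / validation_steps: skip the per-epoch evaluation for now.
history = model.fit_generator(train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=10,
                              callbacks=[checkpoint])
```

Once you've confirmed that the loss is actually going down, switch back to a proper validation split and monitor='val_loss'.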

As for the fine-tuning question: I'm not sure I understand the question. Fine-tuning a model on a dataset means taking a model whose weights are already (partially or fully) trained and then training that model some more on said dataset. SSD7 is not based on VGG-16, that's in fact exactly the point of it. The model is small enough that it can be trained from scratch relatively quickly without issues. SSD300 is based on a modified VGG-16, which is then fine-tuned in the process of training the SSD300. Is that what you mean?


luckyuho commented on May 24, 2024

Thank you!!!
OMG, it works!! Just by following your suggestion to change the monitor setting!! What a miracle!!!
And here is the result.

If you don't mind, could you give me some suggestions?
(1) About the training epochs: when the loss changes only a little for a long time, I'm confused about which way to go, train all over again, or save the weights and keep training? For example, I saw an issue from "balancap" here, in the middle of the page.
I have no idea whether we should stop or keep training before 100k iterations if we don't know what will happen after 100k. Is there any rule for knowing that?

(2) I think using 'loss' instead of 'val_loss' is meant to speed up fitting the model, and since we use the same data for validation and training, we don't have to consider the overfitting problem. Is that right?

(3) The fine-tuning question: I thought the process was to first test the model on classification to check that the network is deep enough, and then do object detection by fine-tuning the network from the classification task, in case training too many weights at once would not learn. However, after running your great work, I think that may not be necessary.

(4) Now, actions speak louder than words: once I figure out how you made this happen, I've decided to build a simple network to detect cars only. However, what I can learn from are the final, brilliant results of intelligent people, with no intermediate-level tutorials, so I am very confused about how to test whether my loss function is built correctly and whether my network is deep enough. So far, I think here is a nice tutorial, but I am not sure it is appropriate for object detection. In particular, what kind of data is suitable for testing a model?

Many thanks for your teaching!


pierluigiferrari commented on May 24, 2024

Glad I could help! Although the reason why it started working is not that you changed the metric to be monitored from val_loss to loss, but simply that you kept training. You actually forgot to change one value in the ModelCheckpoint callback, so it didn't have the effect I was aiming for, namely to omit the evaluation after every epoch and thus speed up the training 🙂

As for your questions:

  1. Unfortunately, there is no rule that lets you predict whether you can expect a loss that decreases very slowly to all of a sudden decrease faster in the future. There are, however, a few guidelines for when to stop training: With any model, you will see the loss converge eventually, meaning that it reaches a point where it seems to stay more or less constant within narrow bounds. At that point, your best guess has to be that you've reached a minimum. The minimum could be global, but most likely it will be local, meaning that the model could theoretically do better, but it might not necessarily get there from the weights you have now. Once that apparent convergence point comes, you still keep training for a while longer though. Why? Because you're saving the weights during the training, so you have nothing to lose. If the loss doesn't improve and you potentially run the risk of overfitting the training dataset, then you just take the weights from the last epoch when the loss was still improving and you just throw away the weights from the subsequent epochs. And if the loss does improve, well then you win.

  2. Yes, correct. We were ignoring the overfitting problem in this case, because we just wanted to find out as quickly as possible whether the model is learning at all, and not running the evaluation on the validation dataset after every epoch simply saves time, that's all. Of course when you're actually training the model for real, you should always split a validation dataset from the training dataset (at least 10-20% of the training dataset, the more the better) and evaluate the model on that validation dataset during the training. It doesn't have to be once after every epoch, it can be more or less frequently depending on the learning rate you use and other factors. The loss on the validation dataset is your guideline for when to stop training. (There's a minimal split sketch right after this list.)

  3. Using a trained model, building on top of that, and then fine-tuning that is almost always a good idea. That's exactly what SSD300 does. It takes the VGG-16 that has been trained to convergence on ImageNet as a base network and then goes from there. One reason why this is almost always a good idea is that the closer a layer is to the network input, the more general are the representations that layer learns (e.g. the first one or two layers always learn colors and simple shapes like edges, regardless of the concrete dataset they are trained on), and so the earlier layers of the network will look quite similar even for different classification or detection tasks. That's why it makes sense to take a trained VGG-16 as the base network and save yourself the trouble of training everything from scratch (there's a rough sketch of this pattern at the end of this comment). Another reason is that for models like VGG-16 which don't use batch normalization or tricks like residual connections, it wouldn't even be possible to train the entire SSD300 completely from scratch. The network is too deep, the first few layers end up learning nothing, and you're screwed. Even VGG-16 alone (without all the layers that SSD300 adds on top) was trained layer by layer. SSD7, however, can be trained from scratch end to end without issues, first because it is a relatively small network, and second because batch normalization helps a lot. As for knowing when a network is deep enough: You can never know. Whatever network you consider, it might always be possible that an even deeper or wider network can theoretically perform even better. All you can do is try it out. The word "theoretically" in the previous sentence matters, because there is always a trade-off of course: While a deeper (and/or wider) network might theoretically be able to perform even better, it obviously gets harder and harder to train a network the deeper/wider it gets, so you might never achieve the theoretical performance.

  4. At first glance the article you linked looks good to me, but the best advice I can give you is to start acquiring profound knowledge of some more low-level, fundamental things. It might be painful, but it will greatly improve your understanding of why things work or fail and of what to try to solve problems. Here are three great sources:
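
Regarding point 2 above: here is a minimal sketch of one way to make such a split, grouping by image so that all boxes of an image end up on the same side. The file names and the 'frame' column name are assumptions, adjust them to whatever your labels CSV actually uses:

```python
import numpy as np
import pandas as pd

labels = pd.read_csv('labels.csv')

# Split by unique image filename so that all boxes of an image stay together.
# 'frame' is a placeholder for whatever column holds the image filename.
images = labels['frame'].unique()
rng = np.random.RandomState(42)
rng.shuffle(images)

val_fraction = 0.2                      # hold out ~20% of the images
n_val = int(len(images) * val_fraction)
val_images = set(images[:n_val])

val_labels = labels[labels['frame'].isin(val_images)]
train_labels = labels[~labels['frame'].isin(val_images)]

train_labels.to_csv('train_labels.csv', index=False)
val_labels.to_csv('val_labels.csv', index=False)
```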

For testing the model: You should test a model on data that is similar to the data you trained it on (so that the comparison is fair), but that you did not use to train. And then the testing data should cover as much as possible of the space to which you will apply the model in practice.
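
Regarding point 3 above: here is a rough sketch of the general pattern of building on top of a pre-trained base network and freezing its early layers in Keras. This is not the actual SSD300 construction (which uses a modified VGG-16 and the SSD loss); the number of frozen layers, the extra layer, and the loss below are purely illustrative:

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Conv2D
from keras.models import Model

# Base network with ImageNet weights, without the fully connected classifier.
base = VGG16(weights='imagenet', include_top=False, input_shape=(300, 300, 3))

# Freeze the early layers; their low-level features (edges, colors) transfer well.
for layer in base.layers[:10]:
    layer.trainable = False

# Build new, randomly initialized layers on top of the pre-trained base.
# A real detector would add its prediction layers here; this is just a stand-in.
x = Conv2D(256, (3, 3), padding='same', activation='relu', name='extra_conv')(base.output)
model = Model(inputs=base.input, outputs=x)

model.compile(optimizer='adam', loss='mse')  # placeholder loss, not the SSD loss
```

The frozen layers keep their general low-level features, and only the later layers and the new ones get adapted to the new task during training.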


luckyuho commented on May 24, 2024

Okay!!
thank you again sincerely!!!
I will try my best to understand those references!!


mohanhanmo commented on May 24, 2024

Thank you very much for your detailed explanation! It really makes sense.
By the way, Pierluigi, I have trained train_ssd300.ipynb but it doesn't seem to be working, probably because I used too few epochs. Just to double-check: should we update train_ssd300.ipynb according to the change in BatchGenerator as well? Thank you very much!


pierluigiferrari commented on May 24, 2024

@monica352 train_ssd300.ipynb is already up to date, it was only train_ssd7.ipynb that I had forgotten to update. I've recently trained SSD300 on Pascal VOC for 20k steps to make sure that everything still works and the results are what you can see in the README of this repo, so it is working. How many iterations have you trained for?


mohanhanmo commented on May 24, 2024

@pierluigiferrari Thank you for your reply! I trained for only 3 iterations, and I think that is the problem. I also have another question, if you could kindly give me some suggestions: I am looking for an SSD model pre-trained on ImageNet. But the existing models are all trained on other datasets, and the trained ImageNet model from the original Caffe code (the trained model is at the very bottom of this page) is not based on Keras. I tried to convert the Caffe model weights to a Keras h5 file using this method, and I successfully converted their given example from Caffe to Keras, but when I then tried it on the original SSD Caffe model weights, the conversion failed. I googled the error, and it is probably due to a problem with the Caffe model, which I cannot fix from Python. So how should I solve this problem? Or where can I find a model pre-trained on ImageNet? Thank you very much!!!!

