
Comments (7)

pierluigiferrari commented on May 24, 2024

According to your label examples above, your input format for the parser is set incorrectly. You're setting it to

input_format=['image_name', 'class_id', 'xmin', 'xmax', 'ymin', 'ymax']

when it should be

input_format=['image_name', 'class_id', 'xmin', 'ymin', 'xmax', 'ymax']

But that's probably not the reason for the exploding loss. Did you change any other parameters in the notebook? It'd be easiest if I could take a look at your notebook.
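For reference, the corrected order passed to the CSV parser might look roughly like this (a sketch only: the generator object and the other argument names depend on which version of the repo you have, so verify them against your checkout):

```python
# `train_dataset` stands for the repo's CSV-based data generator object.
train_dataset.parse_csv(
    labels_filename='path/to/labels.csv',   # placeholder path
    input_format=['image_name', 'class_id', 'xmin', 'ymin', 'xmax', 'ymax'],
    include_classes='all')                  # assumption: check your version's signature
```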


mwindowshz commented on May 24, 2024

Hi,
Thanks. I changed the input format, but as you assumed, that did not help.
Then I tried changing the batch size, because my dataset is very small, so I set it to 5, and now it seems to be starting to work.
The new problem is that the training does not complete the number of epochs that I have set:

Epoch 3/70
1/2 [==============>...............] - ETA: 50s - loss: 22575830.0000Epoch 00003: val_loss improved from 164947.45486 to 109449.64497, saving model to ssd300_weights_epoch-03_loss-14560252.6667_val_loss-109449.6450.h5
2/2 [==============================] - 108s 54s/step - loss: 13892287.8889 - val_loss: 109449.6450
Epoch 4/70
1/2 [==============>...............] - ETA: 47s - loss: 138616.1562Epoch 00004: val_loss improved from 109449.64497 to 104.74940, saving model to ssd300_weights_epoch-04_loss-77501.8083_val_loss-104.7494.h5
2/2 [==============================] - 121s 60s/step - loss: 72408.9460 - val_loss: 104.7494
Epoch 5/70
1/2 [==============>...............] - ETA: 41s - loss: 100.7861Epoch 00005: val_loss improved from 104.74940 to 15.58130, saving model to ssd300_weights_epoch-05_loss-63.8355_val_loss-15.5813.h5
2/2 [==============================] - 96s 48s/step - loss: 60.7563 - val_loss: 15.5813

Epoch 12/70
1/2 [==============>...............] - ETA: 32s - loss: 8.6975Epoch 00012: val_loss improved from 8.57205 to 7.91139, saving model to ssd300_weights_epoch-12_loss-8.6270_val_loss-7.9114.h5
2/2 [==============================] - 79s 40s/step - loss: 8.6211 - val_loss: 7.9114
Epoch 13/70
1/2 [==============>...............] - ETA: 33s - loss: 9.9627Epoch 00013: val_loss improved from 7.91139 to 7.49004, saving model to ssd300_weights_epoch-13_loss-9.1180_val_loss-7.4900.h5
2/2 [==============================] - 80s 40s/step - loss: 9.0476 - val_loss: 7.4900
Epoch 14/70
1/2 [==============>...............] - ETA: 33s - loss: 9.2483Epoch 00014: val_loss improved from 7.49004 to 7.16202, saving model to ssd300_weights_epoch-14_loss-8.4601_val_loss-7.1620.h5
2/2 [==============================] - 80s 40s/step - loss: 8.3945 - val_loss: 7.1620
Epoch 15/70
1/2 [==============>...............] - ETA: 32s - loss: 6.9160Epoch 00015: val_loss improved from 7.16202 to 6.82837, saving model to ssd300_weights_epoch-15_loss-6.6018_val_loss-6.8284.h5
2/2 [==============================] - 78s 39s/step - loss: 6.5756 - val_loss: 6.8284
Epoch 16/70
1/2 [==============>...............] - ETA: 32s - loss: 7.6039Epoch 00016: val_loss improved from 6.82837 to 6.36522, saving model to ssd300_weights_epoch-16_loss-7.0709_val_loss-6.3652.h5
2/2 [==============================] - 76s 38s/step - loss: 7.0265 - val_loss: 6.3652
Epoch 17/70
1/2 [==============>...............] - ETA: 31s - loss: 6.1941Epoch 00017: val_loss did not improve
2/2 [==============================] - 76s 38s/step - loss: 6.2313 - val_loss: 6.5592
Epoch 18/70
1/2 [==============>...............] - ETA: 31s - loss: 6.9346Epoch 00018: val_loss improved from 6.36522 to 6.09256, saving model to ssd300_weights_epoch-18_loss-6.4488_val_loss-6.0926.h5
2/2 [==============================] - 76s 38s/step - loss: 6.4083 - val_loss: 6.0926
Epoch 19/70
1/2 [==============>...............] - ETA: 32s - loss: 5.7371Epoch 00019: val_loss did not improve
2/2 [==============================] - 77s 38s/step - loss: 6.1198 - val_loss: 6.3816
Epoch 20/70
1/2 [==============>...............] - ETA: 32s - loss: 5.8425Epoch 00020: val_loss did not improve
2/2 [==============================] - 78s 39s/step - loss: 6.4159 - val_loss: 6.3717

Model saved under learning/ssd300.h5

Weights also saved separately under learning/ssd300_weights.h5

It stops and saves the model and weights.
Is this because of the EarlyStopping callback?
If so, what do you recommend? Is the loss always expected to go down, or can it go up, go down, or stay the same for a while?

By the way, how do you generate the labels for the images? What app do you use to draw the boxes and set the classes?

I have been working with YOLO and used Yolo_mark from https://github.com/AlexeyAB/Yolo_mark;
now I am converting that format to the SSD format (a rough sketch of such a conversion follows below).
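For illustration, such a conversion boils down to this (assuming YOLO's normalized center-format labels and absolute corner coordinates on the SSD/CSV side; the image size below is a placeholder):

```python
def yolo_to_corners(x_c, y_c, w, h, img_w, img_h):
    """Convert one YOLO box (normalized center x/y, width, height)
    to absolute corner coordinates (xmin, ymin, xmax, ymax)."""
    xmin = (x_c - w / 2.0) * img_w
    ymin = (y_c - h / 2.0) * img_h
    xmax = (x_c + w / 2.0) * img_w
    ymax = (y_c + h / 2.0) * img_h
    return int(round(xmin)), int(round(ymin)), int(round(xmax)), int(round(ymax))

# Example: a centered box covering half the image in each dimension.
print(yolo_to_corners(0.5, 0.5, 0.5, 0.5, img_w=640, img_h=480))  # (160, 120, 480, 360)
```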

Thanks again
M.


mwindowshz commented on May 24, 2024

the notebook:
ssd300_training.zip
the images:
ssd.zip


pierluigiferrari commented on May 24, 2024

There are probably many tools for annotating images for object detection, but the one I'm using is labelImg: https://github.com/tzutalin/labelImg

Yes, the training is stopping because of the EarlyStopping callback. You have to change the settings of the callbacks to suit your needs: you could increase the patience or remove the callback entirely.

The important thing to understand here is this: since the training performs stochastic gradient descent (with some fancy modifications) on mini-batches, the resulting gradients will not point in the direction of the steepest descent over the whole dataset. The gradient decreases the loss on average, but any individual training step does not necessarily change the weights in the right direction; in some training steps, the weights can actually get changed for the worse. If there are many training steps in an epoch, the loss usually decreases over the epoch as long as the model hasn't converged yet. If there are only very few training steps in an epoch, as in your example, then the probability increases that a given epoch might not improve the loss. Your epochs consist of only two training steps each, which is very little. Either increase the number of steps per epoch, increase the patience of the EarlyStopping callback, or remove the callback altogether.
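For example, relaxing the callback could look like this (a minimal sketch using the standard Keras EarlyStopping API; the patience value is only an illustration):

```python
from keras.callbacks import EarlyStopping

# With only two training steps per epoch, individual epochs are noisy,
# so give val_loss more room to fluctuate before training is stopped.
early_stopping = EarlyStopping(monitor='val_loss',
                               min_delta=0.001,
                               patience=10)

# Pass it in the `callbacks` list of `fit_generator()` as before,
# or leave it out of that list entirely to disable early stopping.
```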


mwindowshz commented on May 24, 2024

Hi,
I have tried training SSD300 on the Pascal VOC images combined with another set of my own images.

The training fails: the loss stops improving.
What should I do?
Is there any preparation I needed to take care of before training?
Do I need to compute the prior boxes in some way?
Do I need to change the learning rate?

Here are my results:

C:\Anaconda3\python.exe Train300.py
Epoch 1/20000

Epoch 10/20000
Epoch 00010: val_loss did not improve
 - 494s - loss: 2.7386 - val_loss: 2.9425
Epoch 11/20000
Epoch 00011: val_loss improved from 2.84980 to 2.81891, saving model to ssd300_weights_epoch-11_loss-2.6524_val_loss-2.8189.h5
 - 496s - loss: 2.6531 - val_loss: 2.8189
Epoch 12/20000
Epoch 00012: val_loss improved from 2.81891 to 2.51415, saving model to ssd300_weights_epoch-12_loss-2.5794_val_loss-2.5141.h5
 - 495s - loss: 2.5780 - val_loss: 2.5141
Epoch 13/20000
Epoch 00013: val_loss did not improve
 - 494s - loss: 2.5007 - val_loss: 2.6986
Epoch 14/20000
Epoch 00014: val_loss did not improve
 - 494s - loss: 2.4322 - val_loss: 2.6762
Epoch 15/20000
Epoch 00015: val_loss did not improve
 - 495s - loss: 2.3674 - val_loss: 2.7229
Epoch 16/20000
Epoch 00016: val_loss improved from 2.51415 to 2.42342, saving model to ssd300_weights_epoch-...
...
Epoch 00060: val_loss did not improve
 - 499s - loss: 1.2097 - val_loss: 2.1573
Epoch 61/20000
Epoch 00061: val_loss did not improve
 - 498s - loss: 1.2003 - val_loss: 2.1810
Epoch 62/20000
...
Epoch 65/20000
Epoch 00065: val_loss did not improve
 - 498s - loss: 1.1685 - val_loss: 1.9278
Epoch 66/20000
Epoch 00066: val_loss did not improve
 - 499s - loss: 1.1635 - val_loss: 2.0360
Epoch 67/20000
Epoch 00067: val_loss did not improve
 - 498s - loss: 1.1503 - val_loss: 1.7000
Epoch 68/20000
Epoch 00068: val_loss did not improve
 - 498s - loss: 1.1491 - val_loss: 2.0532
Epoch 69/20000
Epoch 00069: val_loss did not improve
 - 498s - loss: 1.1379 - val_loss: 1.8849

Model saved under learning/ssd300.h5

Here the training stopped.

The loss seems to improve, but val_loss does not.

Also a question: how many epochs do you need to train on Pascal VOC? You mentioned that you need 120,000, but in the weights file it is written as iterations. Is this the same?

Thanks in advance
M.


pierluigiferrari commented on May 24, 2024

Also a question: how many epochs do you need to train on Pascal VOC? You mentioned that you need 120,000, but in the weights file it is written as iterations. Is this the same?

When the word "iterations" is used anywhere in this repo, it means weight update iterations, so iterations and epochs are not the same. One iteration (or training step) is one forward and backward pass during training, resulting in one weight update. One epoch consists of a series of training steps, usually the number of training steps needed to iterate once over the entire dataset at the used batch size. The notebook says that the original models were trained on Pascal VOC for 120,000 training steps (or more, depending on the model), not 120,000 epochs.
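To make the relationship concrete, here is a small illustrative calculation (the dataset size and batch size are made-up numbers, not the actual Pascal VOC figures):

```python
import math

num_train_images = 16000   # hypothetical training set size
batch_size = 32            # hypothetical batch size

steps_per_epoch = math.ceil(num_train_images / batch_size)   # 500 weight updates per epoch
target_iterations = 120000                                   # training steps, not epochs

epochs_needed = math.ceil(target_iterations / steps_per_epoch)
print(epochs_needed)   # 240 epochs to accumulate 120,000 iterations in this example
```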

What should I do?
Is there any preparation I needed to take care of before training?
Do I need to compute the prior boxes in some way?
Do I need to change the learning rate?

It looks like the model is over-fitting. The training loss keeps decreasing, but the validation loss stagnates.

Pull the latest master. I realized that I forgot to add the default L2 regularization that the original Caffe models use and added that a couple of commits ago. This might help prevent the overfitting.
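For orientation, the mechanism behind that is just Keras' standard L2 weight regularization on the convolutional layers. A minimal sketch of the idea follows (this is not the repo's actual model-building code; 0.0005 is the weight decay the original Caffe models use):

```python
from keras.layers import Conv2D
from keras.regularizers import l2

# A convolutional layer with the same L2 weight decay (0.0005) that the
# original Caffe SSD models apply to their convolutional weights.
conv = Conv2D(64, (3, 3),
              padding='same',
              activation='relu',
              kernel_regularizer=l2(0.0005))
```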

Also, try using data augmentation a lot more aggressively. The data augmentation settings that are preset in the training notebook are not meant to be the ideal settings. You will have to experiment with data augmentation yourself. For example, try out the random scaling and translation options, which will result in much more variety in the dataset in combination with the preset random padding and random cropping options.
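Purely to illustrate what random scaling and translation do to the box labels (this is generic NumPy code, not the repo's built-in augmentation options, and the ranges are made-up values):

```python
import numpy as np

def jitter_boxes(boxes, img_w, img_h, scale_range=(0.8, 1.2), max_shift=0.1):
    """Apply a random zoom and shift to corner-format boxes (xmin, ymin, xmax, ymax)
    given in pixels. The image itself would be warped with the same parameters;
    only the label arithmetic is shown here."""
    s = np.random.uniform(*scale_range)                    # random zoom factor
    dx = np.random.uniform(-max_shift, max_shift) * img_w  # horizontal shift in pixels
    dy = np.random.uniform(-max_shift, max_shift) * img_h  # vertical shift in pixels

    out = boxes.astype(np.float32)
    out[:, [0, 2]] = out[:, [0, 2]] * s + dx               # scale and shift x coordinates
    out[:, [1, 3]] = out[:, [1, 3]] * s + dy               # scale and shift y coordinates

    # Clip to the image and drop boxes that ended up completely outside.
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, img_w)
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, img_h)
    keep = (out[:, 2] > out[:, 0]) & (out[:, 3] > out[:, 1])
    return out[keep]
```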

I also recommend checking out some machine learning theory material; there are a bunch of good online books and courses out there. That will help you identify, for example, when overfitting is happening.


stale commented on May 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
