Comments (7)
According to your label examples above, your input format for the parser is set incorrectly. You're setting it to
input_format=['image_name', 'class_id', 'xmin', 'xmax', 'ymin', 'ymax']
when it should be
input_format=['image_name', 'class_id', 'xmin', 'ymin', 'xmax', 'ymax']
But that's probably not the reason for the exploding loss. Did you change any other parameters in the notebook? It'd be easiest if I could take a look at your notebook.
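To make the effect of the column order concrete, here is a hypothetical mini-parser (not the repo's actual parsing code): if the parser simply zips each raw row onto the names in `input_format`, a swapped order silently mislabels the box coordinates.

```python
# Hypothetical mini-parser to illustrate the effect of column order;
# the real parser in the repo is more involved.
def parse_row(row, input_format):
    """Map one raw CSV row onto the declared column names."""
    return dict(zip(input_format, row))

row = ['img_001.jpg', '1', '24', '30', '100', '120']

wrong = parse_row(row, ['image_name', 'class_id', 'xmin', 'xmax', 'ymin', 'ymax'])
right = parse_row(row, ['image_name', 'class_id', 'xmin', 'ymin', 'xmax', 'ymax'])

# With the wrong order, '30' is read as xmax and '100' as ymin,
# silently corrupting every ground-truth box.
print(wrong['xmax'], right['xmax'])  # 30 100
```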
from ssd_keras.
Hi
Thanks
I changed the input format but, as you assumed, it did not help.
Then I tried changing the batch size; because my dataset is very small, I set it to 5, and now it seems to be starting to work.
The new problem is that training does not complete the number of epochs that I have set:
Epoch 3/70
1/2 [==============>...............] - ETA: 50s - loss: 22575830.0000
Epoch 00003: val_loss improved from 164947.45486 to 109449.64497, saving model to ssd300_weights_epoch-03_loss-14560252.6667_val_loss-109449.6450.h5
2/2 [==============================] - 108s 54s/step - loss: 13892287.8889 - val_loss: 109449.6450
Epoch 4/70
1/2 [==============>...............] - ETA: 47s - loss: 138616.1562
Epoch 00004: val_loss improved from 109449.64497 to 104.74940, saving model to ssd300_weights_epoch-04_loss-77501.8083_val_loss-104.7494.h5
2/2 [==============================] - 121s 60s/step - loss: 72408.9460 - val_loss: 104.7494
Epoch 5/70
1/2 [==============>...............] - ETA: 41s - loss: 100.7861
Epoch 00005: val_loss improved from 104.74940 to 15.58130, saving model to ssd300_weights_epoch-05_loss-63.8355_val_loss-15.5813.h5
2/2 [==============================] - 96s 48s/step - loss: 60.7563 - val_loss: 15.5813
...
Epoch 12/70
1/2 [==============>...............] - ETA: 32s - loss: 8.6975
Epoch 00012: val_loss improved from 8.57205 to 7.91139, saving model to ssd300_weights_epoch-12_loss-8.6270_val_loss-7.9114.h5
2/2 [==============================] - 79s 40s/step - loss: 8.6211 - val_loss: 7.9114
Epoch 13/70
1/2 [==============>...............] - ETA: 33s - loss: 9.9627
Epoch 00013: val_loss improved from 7.91139 to 7.49004, saving model to ssd300_weights_epoch-13_loss-9.1180_val_loss-7.4900.h5
2/2 [==============================] - 80s 40s/step - loss: 9.0476 - val_loss: 7.4900
Epoch 14/70
1/2 [==============>...............] - ETA: 33s - loss: 9.2483
Epoch 00014: val_loss improved from 7.49004 to 7.16202, saving model to ssd300_weights_epoch-14_loss-8.4601_val_loss-7.1620.h5
2/2 [==============================] - 80s 40s/step - loss: 8.3945 - val_loss: 7.1620
Epoch 15/70
1/2 [==============>...............] - ETA: 32s - loss: 6.9160
Epoch 00015: val_loss improved from 7.16202 to 6.82837, saving model to ssd300_weights_epoch-15_loss-6.6018_val_loss-6.8284.h5
2/2 [==============================] - 78s 39s/step - loss: 6.5756 - val_loss: 6.8284
Epoch 16/70
1/2 [==============>...............] - ETA: 32s - loss: 7.6039
Epoch 00016: val_loss improved from 6.82837 to 6.36522, saving model to ssd300_weights_epoch-16_loss-7.0709_val_loss-6.3652.h5
2/2 [==============================] - 76s 38s/step - loss: 7.0265 - val_loss: 6.3652
Epoch 17/70
1/2 [==============>...............] - ETA: 31s - loss: 6.1941
Epoch 00017: val_loss did not improve
2/2 [==============================] - 76s 38s/step - loss: 6.2313 - val_loss: 6.5592
Epoch 18/70
1/2 [==============>...............] - ETA: 31s - loss: 6.9346
Epoch 00018: val_loss improved from 6.36522 to 6.09256, saving model to ssd300_weights_epoch-18_loss-6.4488_val_loss-6.0926.h5
2/2 [==============================] - 76s 38s/step - loss: 6.4083 - val_loss: 6.0926
Epoch 19/70
1/2 [==============>...............] - ETA: 32s - loss: 5.7371
Epoch 00019: val_loss did not improve
2/2 [==============================] - 77s 38s/step - loss: 6.1198 - val_loss: 6.3816
Epoch 20/70
1/2 [==============>...............] - ETA: 32s - loss: 5.8425
Epoch 00020: val_loss did not improve
2/2 [==============================] - 78s 39s/step - loss: 6.4159 - val_loss: 6.3717
Model saved under learning/ssd300.h5
Weights also saved separately under learning/ssd300_weights.h5
It stops and saves the model and weights.
Is this because of the EarlyStopping callback?
If so, what do you recommend? Is the loss always expected to go down, or can it go up or stay the same for a while?
By the way, how do you generate the labels for images? What app do you use to draw the boxes and set the classes?
I have been working with YOLO, and used Yolo_mark from https://github.com/AlexeyAB/Yolo_mark
Now I am converting the labels to SSD format.
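In case it helps anyone doing the same conversion, the box math is essentially this (a sketch only, assuming the usual Yolo_mark convention of class, x_center, y_center, width, height, all normalized to [0, 1]):

```python
def yolo_to_corners(x_c, y_c, w, h, img_w, img_h):
    """Convert a normalized YOLO center-format box to absolute
    (xmin, ymin, xmax, ymax) corner coordinates."""
    xmin = int(round((x_c - w / 2.0) * img_w))
    ymin = int(round((y_c - h / 2.0) * img_h))
    xmax = int(round((x_c + w / 2.0) * img_w))
    ymax = int(round((y_c + h / 2.0) * img_h))
    return xmin, ymin, xmax, ymax

# A centered box covering 20% x 40% of a 300x300 image:
print(yolo_to_corners(0.5, 0.5, 0.2, 0.4, 300, 300))  # (120, 90, 180, 210)
```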
Thanks again
M.
the notebook:
ssd300_training.zip
the images:
ssd.zip
There are probably many tools for annotating images for object detection, but the one I'm using is labelImg: https://github.com/tzutalin/labelImg
Yes, the training is stopping because of the EarlyStopping callback. You have to change the settings of the callbacks to suit your needs. You could increase the patience or even remove the callback entirely.

The important thing to understand here is: since the training performs stochastic gradient descent (with some fancy modifications) on mini-batches, the resulting gradients will not point in the direction of the actual steepest descent over the whole dataset. The gradient will decrease the loss on average, but any individual training step does not necessarily change the weights in the right direction. In some training steps, the weights can actually change for the worse. If there are many training steps in an epoch, the loss usually decreases over the epoch as long as the model hasn't converged yet. But if there are only very few training steps in an epoch, as in your example, then the probability increases that a given epoch might not improve the loss. Your epochs consist of only two training steps, which is very few. Either increase the number of steps per epoch, increase the patience of the EarlyStopping callback, or remove the callback altogether.
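To make the patience effect concrete, here is a toy simulation of the stopping rule (a sketch of the logic, not the actual Keras implementation): with only two noisy steps per epoch, a small patience triggers long before the model has converged.

```python
def epochs_until_stop(val_losses, patience):
    """Simulate EarlyStopping-style logic: stop once `patience`
    consecutive epochs pass without a new best val_loss."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training would stop here
    return len(val_losses)  # never stopped early

# A noisy but overall-improving loss curve trips a small patience early,
# even though the loss would have kept falling:
losses = [10.0, 8.0, 9.0, 8.5, 7.0, 6.5]
print(epochs_until_stop(losses, patience=2))  # 4
print(epochs_until_stop(losses, patience=5))  # 6 (runs to the end)
```

In Keras itself, the equivalent knob is the `patience` argument of the callback, e.g. `EarlyStopping(monitor='val_loss', patience=10)`.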
Hi
I have tried training SSD300 on Pascal VOC images combined with another set of my own images.
The training fails: the loss stops improving.
What should I do?
Is there any preparation needed before training that I should have taken care of?
Do I need to compute prior boxes in some way?
Do I need to change the learning rate?
Here are my results:
C:\Anaconda3\python.exe Train300.py
Epoch 1/20000
...
Epoch 10/20000
Epoch 00010: val_loss did not improve
- 494s - loss: 2.7386 - val_loss: 2.9425
Epoch 11/20000
Epoch 00011: val_loss improved from 2.84980 to 2.81891, saving model to ssd300_weights_epoch-11_loss-2.6524_val_loss-2.8189.h5
 - 496s - loss: 2.6531 - val_loss: 2.8189
Epoch 12/20000
Epoch 00012: val_loss improved from 2.81891 to 2.51415, saving model to ssd300_weights_epoch-12_loss-2.5794_val_loss-2.5141.h5
 - 495s - loss: 2.5780 - val_loss: 2.5141
Epoch 13/20000
Epoch 00013: val_loss did not improve
 - 494s - loss: 2.5007 - val_loss: 2.6986
Epoch 14/20000
Epoch 00014: val_loss did not improve
 - 494s - loss: 2.4322 - val_loss: 2.6762
Epoch 15/20000
Epoch 00015: val_loss did not improve
 - 495s - loss: 2.3674 - val_loss: 2.7229
Epoch 16/20000
Epoch 00016: val_loss improved from 2.51415 to 2.42342, saving model to ssd300_weights_epoch-
...
Epoch 00060: val_loss did not improve
 - 499s - loss: 1.2097 - val_loss: 2.1573
Epoch 61/20000
Epoch 00061: val_loss did not improve
 - 498s - loss: 1.2003 - val_loss: 2.1810
Epoch 62/20000
...
Epoch 65/20000
Epoch 00065: val_loss did not improve
 - 498s - loss: 1.1685 - val_loss: 1.9278
Epoch 66/20000
Epoch 00066: val_loss did not improve
 - 499s - loss: 1.1635 - val_loss: 2.0360
Epoch 67/20000
Epoch 00067: val_loss did not improve
 - 498s - loss: 1.1503 - val_loss: 1.7000
Epoch 68/20000
Epoch 00068: val_loss did not improve
 - 498s - loss: 1.1491 - val_loss: 2.0532
Epoch 69/20000
Epoch 00069: val_loss did not improve
 - 498s - loss: 1.1379 - val_loss: 1.8849
Model saved under learning/ssd300.h5
Here the training stopped.
The training loss keeps improving, but the validation loss does not.
Another question: how many epochs do you need for training on Pascal VOC? You mentioned that you need 120,000, but in the weights file it is written as iterations; is this the same?
Thanks in advance
M.
Another question: how many epochs do you need for training on Pascal VOC? You mentioned that you need 120,000, but in the weights file it is written as iterations; is this the same?
When the word "iterations" is used anywhere in this repo, it means weight update iterations, so iterations and epochs are not the same. One iteration (or training step) is one forward and backward pass during training, resulting in one weight update. One epoch consists of a series of training steps, usually the number of training steps needed to iterate once over the entire dataset at the used batch size. The notebook says that the original models were trained on Pascal VOC for 120,000 training steps (or more, depending on the model), not 120,000 epochs.
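To translate between the two, a quick back-of-the-envelope calculation helps. The numbers below are purely illustrative (roughly 16,500 images, in the ballpark of VOC 07+12 trainval, at batch size 32); plug in your own dataset size and batch size.

```python
import math

def iterations_to_epochs(dataset_size, batch_size, iterations):
    """How many epochs a given number of weight-update iterations
    corresponds to, at one full pass over the dataset per epoch."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch, math.ceil(iterations / steps_per_epoch)

steps, epochs = iterations_to_epochs(16500, 32, 120000)
print(steps, epochs)  # 516 233
```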
What should I do?
Is there any preparation needed before training that I should have taken care of?
Do I need to compute prior boxes in some way?
Do I need to change the learning rate?
It looks like the model is over-fitting. The training loss keeps decreasing, but the validation loss stagnates.
Pull the latest master. I realized that I forgot to add the default L2 regularization that the original Caffe models use and added that a couple of commits ago. This might help prevent the overfitting.
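As a sketch of what that regularization does, assuming the 0.0005 weight decay the Caffe SSD solver uses: a penalty proportional to the sum of squared weights is added to the loss, which discourages large weights and thereby helps against overfitting. In Keras this is typically attached per layer, e.g. `kernel_regularizer=l2(0.0005)` on each `Conv2D`.

```python
def l2_penalty(weights, weight_decay=0.0005):
    """L2 regularization term added to the training loss:
    weight_decay * sum(w^2). 0.0005 mirrors the Caffe SSD setting."""
    return weight_decay * sum(w * w for w in weights)

# 0.0005 * (1.0 + 4.0 + 0.25) = 0.002625
print(l2_penalty([1.0, -2.0, 0.5]))
```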
Also, try using data augmentation a lot more aggressively. The data augmentation settings that are preset in the training notebook are not meant to be the ideal settings. You will have to experiment with data augmentation yourself. For example, try out the random scaling and translation options, which will result in much more variety in the dataset in combination with the preset random padding and random cropping options.
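One point worth emphasizing about geometric augmentation, whichever options you use: the ground-truth boxes must move with the pixels. Here is an illustrative sketch of the box side of a random translation (not the repo's actual augmentation code); the image would be shifted by the same offset.

```python
def translate_box(box, dx, dy, img_w, img_h):
    """Shift a (xmin, ymin, xmax, ymax) box by (dx, dy), matching the
    offset applied to the image, and clip it to the image canvas."""
    xmin, ymin, xmax, ymax = box

    def clip(v, hi):
        return max(0, min(v, hi))

    return (clip(xmin + dx, img_w), clip(ymin + dy, img_h),
            clip(xmax + dx, img_w), clip(ymax + dy, img_h))

# Shifting right/down by (40, 20) on a 300x300 image:
print(translate_box((24, 30, 100, 120), 40, 20, 300, 300))  # (64, 50, 140, 140)
```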
I also recommend checking out some machine learning theory material. There are plenty of good online books and courses out there. This will help you identify, for example, when over-fitting is happening.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.