Comments (3)
This is not expected behavior. I have to think about this issue for a bit, but the batch size should not affect the loss structurally, because at the end of the loss function, the loss is divided by the number of items that go into it, `n_positive` (`n_negative` is a multiple of `n_positive`).
The loss value has no absolute meaning, of course, only a relative one, and there is generally no point in changing the batch size in the middle of training: keeping the hyperparameters constant is exactly what you want in order to see how the loss improves over time. Still, the behavior you describe could hint at a possible bug. For some loss functions the loss value can depend on the batch size, but the particular loss function used here should be independent of the batch size for the reason explained above.
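The intended normalization can be sketched in a few lines of numpy (an illustration with made-up numbers, not the actual `keras_ssd_loss` code):

```python
import numpy as np

def normalized_batch_loss(per_item_loss, per_item_positives):
    # Total loss summed over the batch, divided by the total number of
    # positive ground truth boxes in the batch (n_positive).
    n_positive = max(1.0, per_item_positives.sum())
    return per_item_loss.sum() / n_positive

# Identical items: doubling the batch doubles both the summed loss
# and n_positive, so the normalized loss stays the same.
batch_2 = normalized_batch_loss(np.full(2, 6.0), np.full(2, 3.0))
batch_4 = normalized_batch_loss(np.full(4, 6.0), np.full(4, 3.0))
print(batch_2, batch_4)  # both 2.0
```

Structurally, the numerator and denominator both scale with the batch size, so their ratio should not.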
Did this happen within the same IPython session or with two separate, fresh sessions, where you loaded the weights of the last session into the model of the new session and then changed the batch size?
from ssd_keras.
This occurred across two separate Python script runs: I stopped training, loaded the weight files from the previous run, and then restarted training with the new `batch_size`.
I was able to reproduce and track down the issue. Keras displays the average of the loss across the batch (the sum of the loss over all batch items divided by the batch size) rather than the total loss for the batch (the sum of the loss over all batch items). Since the SSD loss function already divides by the number of positive ground truth boxes in the batch, which on average is proportional to the batch size, the total summed loss over the batch is effectively being divided by the batch size (or a number proportional to it) twice. This is why the displayed loss is inversely proportional to the batch size. I tested this by commenting out the division by `tf.maximum(1.0, n_positive)` in the penultimate line of `keras_ssd_loss.py`: doing so makes the loss independent of the batch size (and makes the absolute loss value much larger, obviously).
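The double division can be demonstrated with a small numpy sketch (hypothetical numbers; `displayed_loss` stands in for Keras's per-batch averaging, not a real Keras API):

```python
import numpy as np

def ssd_style_loss(per_item_loss, per_item_positives):
    # Per-item loss divided by the *batch-wide* number of positive boxes,
    # mimicking the division by tf.maximum(1.0, n_positive).
    n_positive = max(1.0, per_item_positives.sum())
    return per_item_loss / n_positive  # shape: (batch_size,)

def displayed_loss(per_item_loss, per_item_positives):
    # Keras reports the mean of the per-sample loss values over the batch,
    # which divides by the batch size a second time.
    return ssd_style_loss(per_item_loss, per_item_positives).mean()

# Identical items, so n_positive is proportional to the batch size.
batch_2 = displayed_loss(np.full(2, 6.0), np.full(2, 3.0))  # 6/6  -> 1.0
batch_4 = displayed_loss(np.full(4, 6.0), np.full(4, 3.0))  # 6/12 -> 0.5
print(batch_2, batch_4)  # displayed value halves when the batch doubles
```

Because `n_positive` grows with the batch while the reported mean also divides by the batch size, the displayed loss ends up inversely proportional to the batch size even though training itself is unaffected.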
This means that this behavior has no implications for the performance of the training. I had never thought about this behavior before, but it is now expected behavior.