I changed the batch size (x2) in the middle of training and the loss cut in half with

Loss is inversely proportional to batch_size (ssd_keras issue, closed)

pierluigiferrari commented on May 24, 2024

Loss is inversely proportional to batch_size

Comments (3)

pierluigiferrari commented on May 24, 2024

This is not expected behavior. I have to think about this issue for a bit, but the batch size should not affect the loss structurally, because at the end of the loss function the loss is divided by n_positive, the number of positive boxes that go into it (n_negative is a multiple of n_positive).

The loss value has no absolute meaning of course, only a relative one. There is also generally no point in changing the batch size in the middle of training, because keeping the hyperparameters constant is exactly what lets you see how the loss improves over time. Still, the behavior you describe could hint at a bug. For some loss functions the loss value can depend on the batch size, but the particular loss function used here should be independent of it for the reason explained above.

Did this happen within the same IPython session or with two separate, fresh sessions, where you loaded the weights of the last session into the model of the new session and then changed the batch size?

lababidi commented on May 24, 2024

This occurred after stopping training in a Python script and then loading the weight files from the previous training run with the new batch_size.

pierluigiferrari commented on May 24, 2024

I was able to reproduce and track down the issue. Keras seems to display the average of the loss across the batch (sum of the loss for all batch items divided by batch size) rather than the total loss for the batch (sum of the loss for all batch items). Since the SSD loss function divides by the number of positive ground truth boxes in the batch, which is proportional to the batch size on average, this produces the effect you are seeing. Effectively, the total summed loss over the batch is being divided by the batch size (or a number proportional to the batch size) twice. This is why the loss is inversely proportional to the batch size. I tested this by commenting out the division by tf.maximum(1.0, n_positive) in the penultimate line of keras_ssd_loss.py. Doing this makes the loss independent of the batch size (and it makes the absolute loss much larger, obviously).

This means that this behavior has no implications for the performance of the training. I hadn't thought about this behavior before, but it turns out to be expected.
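
The double division described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions, not the code from keras_ssd_loss.py: it assumes every batch item has the same summed loss and the same number of positive boxes, and the function name and parameters (displayed_loss, per_item_loss, positives_per_item, normalize) are made up for the illustration.

```python
def displayed_loss(batch_size, per_item_loss=8.0, positives_per_item=4,
                   normalize=True):
    """Toy model of the loss value shown during training.

    Assumptions (hypothetical, for illustration only): every batch item
    contributes the same summed loss and the same number of positive boxes.
    """
    # One loss value per batch item.
    item_losses = [per_item_loss] * batch_size

    # Positives are counted over the whole batch, so this count grows
    # linearly with the batch size.
    n_positive = positives_per_item * batch_size

    if normalize:
        # First division: the loss function divides by the positive count.
        item_losses = [loss / max(1.0, n_positive) for loss in item_losses]

    # Second division: the displayed value is the mean over the batch.
    return sum(item_losses) / batch_size


for b in (8, 16, 32):
    print(b, displayed_loss(b), displayed_loss(b, normalize=False))
```

With the normalization enabled, doubling the batch size halves the displayed value. With it disabled, the displayed value stays the same for every batch size, which matches the effect of commenting out the division by tf.maximum(1.0, n_positive).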
