Hi pierluigiferrari, thank you very much. after I study "train_ssd7", there are some p

Confused about boxes for localization about ssd_keras HOT 4 CLOSED

pierluigiferrari commented on May 25, 2024

from ssd_keras.

Comments (4)

pierluigiferrari commented on May 25, 2024

I don't think I fully understand the questions, but here are a few points regarding what I believe is at least part of what you're asking:

The AnchorBoxes layer does not participate in the training of the model at all. Its only purpose is to output the anchor box coordinates and variances so that decode_y() or decode_y2() can decode the raw model prediction tensor without needing any other information. This is also why the last axis of the model output tensor has length n_classes + 4 + 4 + 4. The last 8 elements of that axis are just the four anchor box coordinates and the four variances for each box. As mentioned, they are not relevant to training, only to decoding the model output at inference time. I also recommend reading the documentation and inline comments of AnchorBoxes, which should help you understand why the model output tensor has this particular shape.
The variances are just scaling factors for the encoded ground truth box coordinates. They allow you to scale the individual coordinate offsets independently. I recommend you take a look at the comments at the bottom of the definition of encode_y() to better understand what they do. The coordinate offsets are divided by the variances. For example, the variances chosen in the original SSD300 are [0.1, 0.1, 0.2, 0.2], meaning that the (cx, cy) offsets are scaled up by a factor of 10 and the (w, h) offsets by a factor of 5. Among other things, this means that the box center coordinates are weighted more strongly than the width and height values. The idea behind trying different values for these variances is simply to see whether the model learns better with a certain set of values.
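
To make the two points above concrete, here is a minimal NumPy sketch. It is not the repository's encode_y() or decode_y(): the function names encode_box and decode_box are hypothetical, boxes are assumed to be in (cx, cy, w, h) format, and the last axis is assumed to be laid out as [class scores | 4 offsets | 4 anchor coordinates | 4 variances], as described above.

```python
import numpy as np

def encode_box(gt, anchor, variances):
    """Encode one ground truth box relative to one anchor box.

    Hypothetical helper: all boxes are (cx, cy, w, h).
    """
    gt = np.asarray(gt, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    variances = np.asarray(variances, dtype=float)
    offsets = np.empty(4)
    # Center shift, measured relative to the anchor's width and height.
    offsets[:2] = (gt[:2] - anchor[:2]) / anchor[2:]
    # Size change, measured as the log of the size ratio.
    offsets[2:] = np.log(gt[2:] / anchor[2:])
    # Dividing by variances < 1 scales the offsets up.
    return offsets / variances

def decode_box(pred_row, n_classes):
    """Decode one prediction row using only what the row itself contains.

    Assumed layout: [class scores | 4 offsets | 4 anchor coords | 4 variances].
    """
    offsets = pred_row[n_classes:n_classes + 4]
    anchor = pred_row[n_classes + 4:n_classes + 8]
    variances = pred_row[n_classes + 8:n_classes + 12]
    scaled = offsets * variances  # undo the division done when encoding
    box = np.empty(4)
    box[:2] = scaled[:2] * anchor[2:] + anchor[:2]
    box[2:] = np.exp(scaled[2:]) * anchor[2:]
    return box

# Example values, chosen only for illustration.
anchor = np.array([100.0, 100.0, 50.0, 50.0])
gt = np.array([110.0, 95.0, 60.0, 40.0])
variances = np.array([0.1, 0.1, 0.2, 0.2])

encoded = encode_box(gt, anchor, variances)
# Pretend the model predicted the offsets perfectly (3 dummy class scores).
row = np.concatenate([np.zeros(3), encoded, anchor, variances])
decoded = decode_box(row, n_classes=3)
```

With these example values the encoded center offsets come out at about 2.0 and -1.0, ten times the raw relative shifts of 0.2 and -0.1, and decoding the row recovers the ground truth box without any information from outside the row.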

luckyuho commented on May 25, 2024

Oh my god,
thank you so much, pierluigiferrari!!!
I think I finally understand your comments!!
It was my fault, I totally misunderstood it.
Now I suddenly see the light.
Thanks for your help!!!
But...
there is one thing I still cannot figure out: why do you encode the width and height with np.log?
np.log is asymmetric. Doesn't that mean that, at the same IoU, anchor boxes bigger or smaller than the ground truth would get different losses?

pierluigiferrari commented on May 25, 2024

The idea is this:

We have four scalar anchor box coordinates, cx, cy, w, h. For each of these four coordinates, the desired prediction for a given ground truth box could be either larger or smaller than the respective anchor box coordinate. For each of these four coordinates, we want the model to predict positive offsets in one direction and negative offsets in the other direction. And we want the predicted offsets to be relative to the respective absolute coordinate values of the anchor box. The chosen formula for the width and height fulfills both of these criteria:

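With g denoting the ground truth width (or height) and d the corresponding anchor box value:

ln(g/d) = ln(g) - ln(d), which is > 0 if g > d and < 0 if g < d, and

ln(a*g) - ln(a*d) = ln(g) - ln(d) for any positive number a.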

Whether or not this is the best transformation of the target coordinates for the model to learn optimally is a different story, but it is at least one possible transformation that works well.
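
The two properties can be checked numerically. The following is a small sketch, with encode_size as a hypothetical helper name and the sizes chosen only for illustration. It also compares the log encoding with a plain relative difference (g - d) / d, which is my own point of comparison and not something used in the repository.

```python
import numpy as np

def encode_size(g, d):
    """Log-encode a ground truth size g relative to an anchor size d (hypothetical helper)."""
    return np.log(g / d)

d = 50.0
bigger = encode_size(100.0, d)   # ground truth twice the anchor size
smaller = encode_size(25.0, d)   # ground truth half the anchor size

# Property 1: the sign tells the direction of the correction.
sign_ok = bigger > 0 > smaller

# Property 2: rescaling both sizes by the same positive factor changes nothing.
rescaled = encode_size(3.0 * 100.0, 3.0 * d)
scale_invariant = np.isclose(rescaled, bigger)

# In terms of ratios the log is symmetric: doubling and halving
# give offsets of equal magnitude and opposite sign.
ratio_symmetric = np.isclose(bigger, -smaller)

# For comparison only: a plain relative difference is not symmetric in this sense.
linear_bigger = (100.0 - d) / d    # 1.0
linear_smaller = (25.0 - d) / d    # -0.5
```

So in ratio terms the log encoding treats "twice as large" and "half as large" as equally big corrections, whereas a plain relative difference would not.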

luckyuho commented on May 25, 2024

Oh oh oh,
I think I have learned a lot and understand things better now.
Thank you very much!!!
