Comments (4)

pierluigiferrari commented on May 25, 2024

I don't think I fully understand the questions, but here are a few points regarding what I believe is at least part of what you're asking:

  1. The AnchorBoxes layer does not participate in the training of the model at all. Its only purpose is to output the anchor box coordinates and variances so that decode_y() or decode_y2() can decode the raw model prediction tensor without needing any other information. This is also why the last axis of the model output tensor has length n_classes + 4 + 4 + 4. The last 8 elements of the last axis are just the four anchor box coordinates and the four variances for each box. As mentioned before, they are not relevant to the training, only to decoding the model output at inference time. I also recommend reading the documentation and inline comments of AnchorBoxes, which should help you understand why the model output tensor has this particular shape.
  2. The variances are just scaling factors for the encoded ground truth box coordinates. They allow you to scale the individual coordinate offsets independently. I recommend you take a look at the comments at the bottom of the definition of encode_y() to better understand what they do. The coordinate offsets are divided by the variances. For example, the variances chosen in the original SSD300 are [0.1, 0.1, 0.2, 0.2], meaning that the (cx, cy) offsets are upscaled by a factor of 10 and the (w, h) offsets are upscaled by a factor of 5. Among other things, this means that the box center coordinates are weighted more strongly than the width/height values. The idea behind trying different values for these variances is simply to see whether the model learns better with a certain set of values.
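To make the two points above concrete, here is a minimal NumPy sketch. It is not the repository's encode_y() or decode_y(): the helper names encode_box and decode_row are hypothetical, and it assumes centroid coordinates (cx, cy, w, h), the usual SSD offset formulas, and a last-axis layout of [class scores | 4 offsets | 4 anchor coordinates | 4 variances].

```python
import numpy as np

def encode_box(gt, anchor, variances):
    """Encode one ground truth box relative to one anchor box (hypothetical helper).

    Both boxes are (cx, cy, w, h). The offsets are divided by the variances.
    """
    g_cx, g_cy, g_w, g_h = gt
    a_cx, a_cy, a_w, a_h = anchor
    offsets = np.array([
        (g_cx - a_cx) / a_w,   # center offsets relative to the anchor size
        (g_cy - a_cy) / a_h,
        np.log(g_w / a_w),     # log ratio of the sizes
        np.log(g_h / a_h),
    ])
    return offsets / np.asarray(variances)

def decode_row(row, n_classes):
    """Decode one row using only what the row itself carries (assumed layout)."""
    offsets = row[n_classes:n_classes + 4]
    anchor = row[-8:-4]        # four anchor box coordinates
    variances = row[-4:]       # four variances
    scaled = offsets * variances
    cx = scaled[0] * anchor[2] + anchor[0]
    cy = scaled[1] * anchor[3] + anchor[1]
    w = np.exp(scaled[2]) * anchor[2]
    h = np.exp(scaled[3]) * anchor[3]
    return np.array([cx, cy, w, h])

n_classes = 3
variances = np.array([0.1, 0.1, 0.2, 0.2])
anchor = np.array([100.0, 100.0, 50.0, 80.0])
gt = np.array([110.0, 96.0, 60.0, 70.0])

encoded = encode_box(gt, anchor, variances)
# One row of length n_classes + 4 + 4 + 4, with dummy class scores.
row = np.concatenate([np.zeros(n_classes), encoded, anchor, variances])
decoded = decode_row(row, n_classes)
```

Because the row carries its own anchor box and variances, decode_row needs no information beyond the row and the number of classes, which is the reason for the extra 8 elements described in point 1.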

from ssd_keras.

luckyuho commented on May 25, 2024

Oh my god,
thank you so much, pierluigiferrari!!!
I think I finally understand your comments!!
It was my fault, I totally misunderstood it.
Now I suddenly see the light.
Thanks for your help!!!
But...
there is one thing I still can't figure out: why do you encode the width and height with np.log?
np.log is asymmetric. Does that mean that, at the same IoU, anchor boxes that are bigger or smaller than the ground truth would get different losses?

pierluigiferrari commented on May 25, 2024

The idea is this:

We have four scalar anchor box coordinates, cx, cy, w, h. For each of these four coordinates, the desired prediction for a given ground truth box could be either larger or smaller than the respective anchor box coordinate. For each of these four coordinates, we want the model to predict positive offsets in one direction and negative offsets in the other direction. And we want the predicted offsets to be relative to the respective absolute coordinate values of the anchor box. The chosen formula for the width and height fulfills both of these criteria:

ln(g/d) = ln(g) - ln(d), which is > 0 if g > d and < 0 if g < d, and ln(a*g) - ln(a*d) = ln(g) - ln(d) for any positive number a. Here g is the ground truth value and d is the corresponding anchor box value.

Whether or not this is the best transformation of the target coordinates for the model to learn optimally is a different story, but it is at least one possible transformation that works well.
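The two properties can be checked numerically with a small sketch. The function name log_offset and the sample values are made up for illustration; only np.log is taken from the discussion above.

```python
import numpy as np

def log_offset(g, d):
    """Log-ratio offset between a ground truth size g and an anchor size d."""
    return np.log(g / d)

d = 50.0
larger = log_offset(100.0, d)    # ground truth twice as wide as the anchor
smaller = log_offset(25.0, d)    # ground truth half as wide as the anchor

# Scaling both sizes by the same positive factor leaves the offset unchanged.
scaled = log_offset(3.0 * 100.0, 3.0 * d)

print(larger > 0, smaller < 0)
print(np.isclose(larger, -smaller))
print(np.isclose(scaled, larger))
```

Note that a ground truth box twice as wide as the anchor and one half as wide produce offsets of equal magnitude and opposite sign, so in terms of size ratios the log encoding treats "bigger" and "smaller" symmetrically.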

luckyuho commented on May 25, 2024

Oh oh oh,
I think I have learned a lot.
Thank you very much!!!
