Comments (5)

pierluigiferrari commented on May 24, 2024

There isn't one correct answer given the information you provided, but here are a few points to consider:

  1. The larger the input image size, the more computation is needed for each forward and backward pass. Doubling the height and width of the input quadruples the number of pixels, and four times as many pixels mean roughly four times as many arithmetic operations. That can translate into up to four times the time per training step (or less, depending on how well the CUDA cores of your GPU are utilized). The point is that training might take significantly longer with a larger image size, which leads to the second consideration:
  2. Think about what image resolution you actually need in order to detect the objects you care about. Can the model still learn to detect them equally well if you reduce their resolution drastically? What resolution is enough? How complex are the objects you're trying to detect? Telling different dog breeds apart might require a higher resolution than detecting cars. Are the objects in your images small, large, or a mix of both? I would choose the lowest resolution that still allows the model to learn the task sufficiently well. It's a trade-off between how fast the model will be and how good its predictions can get.
  3. You also need to consider the receptive fields of the predictor layers. If you increase the input image size while keeping the network architecture the same, the maximum relative detectable object size decreases: the absolute receptive fields of the predictor layers stay constant while the number of pixels that constitute a given object instance increases. So using a larger image size without also increasing the receptive fields of the predictor layers (e.g. by making the model deeper) might hurt performance. It all depends on the relative sizes of the objects you're trying to detect. If you need to detect large objects in high-resolution images, you might have to make the network deeper, use larger conv filters, or pool more aggressively to increase the receptive fields of the predictor layers.
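The receptive-field concern in point 3 can be checked numerically with the standard single-path recurrence RF ← RF + (k − 1) · jump, jump ← jump · stride. Here is a minimal sketch in plain Python; the layer list is illustrative, not the actual SSD300 stack:

```python
# Estimate the receptive field of each layer in a sequential conv net.
# Hypothetical layer specs for illustration; not the SSD300 configuration.

def receptive_fields(layers):
    """layers: list of (name, kernel_size, stride) tuples.
    Returns a list of (name, receptive_field) pairs."""
    rf, jump = 1, 1  # receptive field and cumulative stride ("jump")
    out = []
    for name, k, s in layers:
        rf += (k - 1) * jump  # each layer widens the RF by (k-1) * jump
        jump *= s             # striding dilates all subsequent growth
        out.append((name, rf))
    return out

# Example: a few 3x3 convs interleaved with 2x2/stride-2 pooling.
layers = [
    ("conv1", 3, 1), ("pool1", 2, 2),
    ("conv2", 3, 1), ("pool2", 2, 2),
    ("conv3", 3, 1),
]
for name, rf in receptive_fields(layers):
    print(name, rf)  # final entry: ('conv3', 18)
```

Running this shows how quickly pooling inflates the receptive field of later layers, which is exactly why a deeper or more aggressively pooled network is needed to "see" large objects in large inputs.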

from ssd_keras.

licheng1995 commented on May 24, 2024

Thank you!

MBoaretto25 commented on May 24, 2024

@pierluigiferrari on item 3, when you refer to "without increasing the receptive fields of the predictor layers", are you saying that the number of convolutional predictor layers should be increased, or the number of convolutional layers in the base network?

pierluigiferrari commented on May 24, 2024

@MBoaretto25 the number of conv layers of the base network. What matters is that the receptive fields of the predictor layers are large enough that they can "see" entire objects of the kind you're trying to detect. If your input size is 300x300 and you want to be able to detect objects that occupy almost the entire image, then at least one of your predictor layers should have a receptive field of at least 300x300. The SSD300 architecture caters exactly to this need: the predictor layer attached to conv9_2 has a receptive field of 300x300.

Now, if your input size is 1200x1600 and you want to detect objects that occupy almost the entire image, you're likely not going to get very good results with SSD300, because the largest receptive field of any predictor layer is still 300x300. Even the predictor layer with the largest receptive field will only ever see a fraction of such a large object, and therefore only a fraction of the relevant information. You would want to adjust the receptive fields of some or (most likely) all of your predictor layers to make sure they are suitable for the sizes of the objects in your images. By "sizes of the objects" I mean the number of pixels the objects occupy in the image.

How you ensure suitable receptive field sizes is up to you. You could pool more, use strided convolutions, add additional conv and pooling layers, or whatever else works. For example, if you introduce one additional 2x2 pooling layer, increase the pool size of one existing pooling layer to 4x4, or make one of the convolutions strided with a stride of 2x2, then the receptive fields of all subsequent layers double in both spatial dimensions. In general, the architecture should be carefully tuned to the requirements of the data.
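The doubling effect described above can be verified with a small sketch. The layer stacks are hypothetical (just a few convs and pools, not a real SSD variant); the point is that inserting one extra 2x2/stride-2 pooling layer doubles the receptive-field growth contributed by every layer after it:

```python
# Sketch: inserting one extra 2x2 stride-2 pooling layer doubles the
# receptive-field contribution of all subsequent layers.
# Layer specs are illustrative, not a real SSD configuration.

def final_receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples for a sequential net."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # growth of this layer, scaled by cumulative stride
        jump *= s
    return rf

base  = [(3, 1), (2, 2), (3, 1), (3, 1)]
# Same stack with an extra 2x2/stride-2 pool inserted after the first pool;
# the two trailing convs now contribute twice as much receptive field.
wider = [(3, 1), (2, 2), (2, 2), (3, 1), (3, 1)]

print(final_receptive_field(base))   # -> 12
print(final_receptive_field(wider))  # -> 22
```

In the `base` stack the two convs after the pool add 8 pixels of receptive field (12 − 4); with the extra pool they add 16 (22 − 6), i.e. exactly twice as much, which is the mechanism behind the "doubles in both spatial dimensions" remark.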

MBoaretto25 commented on May 24, 2024

@pierluigiferrari got it, thanks for the quick and detailed answer. I understand now how the architecture depends on the image and object sizes.

Thanks for your great work.
