Comments (5)

pierluigiferrari commented on May 24, 2024

There isn't one correct answer given the information you provided, but here are a few points to consider:

  1. The larger the input image size, the more computation is needed for each forward and backward pass. Doubling the height and width of the input quadruples the number of pixels, and four times as many pixels mean roughly four times as many arithmetic operations. That can translate into up to four times the time per training step (or less, depending on how well the CUDA cores of your GPU are utilized). The point is that training might take significantly longer with a larger image size, which leads to the second consideration:
  2. Think about what image resolution you actually need in order to detect the objects you care about. Can the model still learn to detect them equally well if you reduce their resolution drastically? What resolution is enough? How complex are the objects you're trying to detect? Telling different dog breeds apart might require a higher resolution than detecting cars. Are the objects in your images small, large, or a mix of both? I would choose the lowest resolution that still allows the model to learn the task sufficiently well. It's a trade-off between how fast the model will be and how good its predictions can get.
  3. You also need to consider the receptive fields of the predictor layers. If you increase the input image size while keeping the network architecture the same, the maximum relative detectable object size decreases: the absolute receptive fields of the predictor layers stay constant while the number of pixels that constitute a given object instance increases. So using a larger image size without also increasing the receptive fields of the predictor layers (e.g. by making the model deeper) might hurt performance. It all depends on the relative sizes of the objects you're trying to detect. If you need to detect large objects in high-resolution images, you might have to make the network deeper, use larger conv filters, or pool more aggressively to increase the receptive fields of the predictor layers.
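The receptive-field concern in point 3 can be checked numerically with the standard single-path recurrence RF ← RF + (k − 1) · jump, jump ← jump · stride. Here is a minimal sketch in plain Python; the layer list is illustrative, not the actual SSD300 stack:

```python
# Estimate the receptive field of each layer in a sequential conv net.
# Hypothetical layer specs for illustration; not the SSD300 configuration.

def receptive_fields(layers):
    """layers: list of (name, kernel_size, stride) tuples.
    Returns a list of (name, receptive_field) pairs."""
    rf, jump = 1, 1  # receptive field and cumulative stride ("jump")
    out = []
    for name, k, s in layers:
        rf += (k - 1) * jump  # each layer widens the RF by (k-1) * jump
        jump *= s             # striding dilates all subsequent growth
        out.append((name, rf))
    return out

# Example: a few 3x3 convs interleaved with 2x2/stride-2 pooling.
layers = [
    ("conv1", 3, 1), ("pool1", 2, 2),
    ("conv2", 3, 1), ("pool2", 2, 2),
    ("conv3", 3, 1),
]
for name, rf in receptive_fields(layers):
    print(name, rf)  # final entry: ('conv3', 18)
```

Running this shows how quickly pooling inflates the receptive field of later layers, which is exactly why a deeper or more aggressively pooled network is needed to "see" large objects in large inputs.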

from ssd_keras.

licheng1995 commented on May 24, 2024

Thank you!

MBoaretto25 commented on May 24, 2024

@pierluigiferrari on item 3, when you refer to "without increasing the receptive fields of the predictor layers", are you saying that the number of convolutional predictor layers should be increased, or the number of convolutional layers in the base network?

pierluigiferrari commented on May 24, 2024

@MBoaretto25 the number of conv layers of the base network. What matters is that the receptive fields of the predictor layers are large enough that they can "see" entire objects of the kind you're trying to detect. If your input size is 300x300 and you want to be able to detect objects that occupy almost the entire image, then at least one of your predictor layers should have a receptive field of at least 300x300. The SSD300 architecture caters exactly to this need: the predictor layer attached to conv9_2 has a receptive field of 300x300.

Now, if your input size is 1200x1600 and you want to detect objects that occupy almost the entire image, you're likely not going to get very good results with SSD300, because the largest receptive field of any predictor layer is still 300x300. Even the predictor layer with the largest receptive field will only ever see a fraction of such a large object, and therefore only a fraction of the relevant information. You would want to adjust the receptive fields of some or (most likely) all of your predictor layers to make sure they are suitable for the sizes of the objects in your images. By "sizes of the objects" I mean the number of pixels the objects occupy in the image.

How you ensure suitable receptive field sizes is up to you. You could pool more, use strided convolutions, add additional conv and pooling layers, or whatever else works. For example, if you introduce one additional 2x2 pooling layer, increase the pool size of one existing pooling layer to 4x4, or make one of the convolutions strided with a stride of 2x2, then the receptive fields of all subsequent layers double in both spatial dimensions. In general, the architecture should be carefully tuned to the requirements of the data.
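The doubling effect described above can be verified with a small sketch. The layer stacks are hypothetical (just a few convs and pools, not a real SSD variant); the point is that inserting one extra 2x2/stride-2 pooling layer doubles the receptive-field growth contributed by every layer after it:

```python
# Sketch: inserting one extra 2x2 stride-2 pooling layer doubles the
# receptive-field contribution of all subsequent layers.
# Layer specs are illustrative, not a real SSD configuration.

def final_receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples for a sequential net."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # growth of this layer, scaled by cumulative stride
        jump *= s
    return rf

base  = [(3, 1), (2, 2), (3, 1), (3, 1)]
# Same stack with an extra 2x2/stride-2 pool inserted after the first pool;
# the two trailing convs now contribute twice as much receptive field.
wider = [(3, 1), (2, 2), (2, 2), (3, 1), (3, 1)]

print(final_receptive_field(base))   # -> 12
print(final_receptive_field(wider))  # -> 22
```

In the `base` stack the two convs after the pool add 8 pixels of receptive field (12 − 4); with the extra pool they add 16 (22 − 6), i.e. exactly twice as much, which is the mechanism behind the "doubles in both spatial dimensions" remark.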

MBoaretto25 commented on May 24, 2024

@pierluigiferrari got it, thanks for the quick and detailed answer. I understand now how the architecture depends on the image and object sizes.

Thanks for your great work.
