Comments (5)
There isn't one correct answer given the information you provided, but here are a few points to consider:
- The larger the input image, the more computation each forward and backward pass requires. Four times as many pixels means roughly four times as many arithmetic operations, which can translate into roughly four times as much time per training step (or less, depending on how well your GPU's CUDA cores are utilized). The point is that your training might take significantly longer with a larger image size, which leads to the second consideration:
- You should think about what image resolution you need in order to detect whatever objects you want to detect. Can the model still learn to detect your objects equally well if you reduce their resolution drastically? What resolution is enough? How complex are the objects you're trying to detect? Telling different dog breeds apart might require a higher resolution than detecting cars. Are the objects in your images rather small or rather large or a mix of both? I would choose the lowest resolution that still allows the model to learn the task at hand sufficiently well. It's a trade-off between how fast the model will be and how good its predictions can get. But you also need to
- Consider the receptive fields of the predictor layers. If you increase the input image size while keeping the network architecture the same, then the maximum relative detectable object size decreases, because the absolute receptive fields of the predictor layers remain constant while the number of pixels that constitute the same object instance increases. So using a large image size without increasing the receptive fields of the predictor layers (e.g. by making the model deeper) might also lead to worse performance. But it all depends on what the relative sizes are of the objects you're trying to detect. If you need to detect large objects in high-resolution images, you might have to make the network deeper, use larger conv filters, or pool more aggressively to increase the receptive fields of the predictor layers.
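The compute argument in the first point can be made concrete. For a stride-1 convolution with "same" padding, the number of multiply-accumulates is proportional to the number of output pixels, so doubling each image side quadruples the cost. A minimal sketch (the layer shape below is illustrative, not taken from any particular SSD configuration):

```python
def conv_macs(h, w, c_in, c_out, k, stride=1):
    """Approximate multiply-accumulates for one conv layer with 'same'
    padding: one k*k*c_in dot product per output pixel per output channel."""
    out_h, out_w = h // stride, w // stride
    return out_h * out_w * c_in * c_out * k * k

# Cost of an illustrative 3x3 conv (64 -> 64 channels) at two input sizes:
small = conv_macs(300, 300, 64, 64, 3)
large = conv_macs(600, 600, 64, 64, 3)
print(large / small)  # 4.0 -- four times the pixels, four times the arithmetic
```

The same linear-in-pixels scaling applies to every conv layer in the network (until pooling shrinks the feature maps), which is why the per-step training time grows roughly with the pixel count.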
from ssd_keras.
Thank you
@pierluigiferrari On item 3, when you refer to "without increasing the receptive fields of the predictor layers", are you saying that the number of convolutional predictor layers must be increased, or that the number of convolutional layers in the base network should be increased?
@MBoaretto25 The number of conv layers in the base network. What matters is that the receptive fields of the predictor layers are large enough that they can "see" the entire objects you're trying to detect. If your input size is 300x300 and you want to be able to detect objects that occupy almost the entire image, then at least one of your predictor layers should have a receptive field of at least 300x300. The SSD300 architecture caters exactly to this need, because the predictor layer attached to conv9_2 has a receptive field of 300x300.

Now, if your input size is 1200x1600 and you want to detect objects that occupy almost the entire image, you're likely not going to get very good results with SSD300, because the largest receptive field of any predictor layer is still 300x300. That means even the predictor layer with the largest receptive field will only ever see a fraction of such a large object, and therefore only a fraction of the relevant information. You would want to adjust the receptive fields of some or (most likely) all of your predictor layers to make sure they are suitable for the sizes of the objects in your images. By "sizes of the objects" I mean the number of pixels the objects occupy in the image.

How you ensure suitable receptive field sizes is up to you. You could pool more, you could use strided convolutions, or you could add additional conv layers and additional pooling layers. For example, if you introduce one additional 2x2 pooling layer, increase the pool size of one existing pooling layer to 4x4, or make one of the convolutions strided with a stride of 2x2, then the receptive fields of all subsequent layers double in both spatial dimensions. In general, the architecture should be carefully tuned to the requirements of the data.
@pierluigiferrari Got it, thanks for the quick and detailed answer. I understand now how the architecture depends on the image/object size.
Thanks for your great work.