Inference times are often expressed as "X + Y", in which X is time taken

Inference Time Explanation about detectron (CLOSED)

facebookresearch commented on June 3, 2024 3

Inference Time Explanation

from detectron.

Comments (3)

rbgirshick commented on June 3, 2024 2

I see the confusion. Yes, the total time is additive as in X plus Y.

When the --multi-gpu-testing flag is used with {train,test}_net.py, inference happens on the dataset in a map-reduce way: the dataset is partitioned into NUM_GPUS subsets, which are processed in parallel. Inference on each individual image is always run on a single GPU.
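The map-reduce pattern described above can be sketched roughly as follows. This is only an illustration of the idea, not Detectron's actual code: `partition`, `run_subset`, and `multi_gpu_test` are hypothetical names, the per-image inference is a placeholder, and threads stand in for whatever worker mechanism the real scripts use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(num_images, num_gpus):
    """Split range(num_images) into num_gpus contiguous (start, end) ranges."""
    base, extra = divmod(num_images, num_gpus)
    ranges, start = [], 0
    for i in range(num_gpus):
        # The first `extra` subsets get one additional image.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_subset(gpu_id, images):
    """Hypothetical placeholder: each image is handled by exactly one GPU."""
    return [(gpu_id, image) for image in images]


def multi_gpu_test(images, num_gpus):
    """Map: one subset per GPU, run in parallel. Reduce: concatenate results."""
    ranges = partition(len(images), num_gpus)
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        futures = [
            pool.submit(run_subset, gpu_id, images[start:end])
            for gpu_id, (start, end) in enumerate(ranges)
        ]
        results = []
        for future in futures:  # collected in subset order
            results.extend(future.result())
    return results
```

Under this scheme the per-image time is unchanged by the number of GPUs; only the wall-clock time for the whole dataset shrinks.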


rbgirshick commented on June 3, 2024

The explanation is correct; the "Y" time is indeed unoptimized CPU code. The fact that it's often so small is why it's left unoptimized :). The main point is that when considering how fast a model is, we can take the timing to be essentially just X because Y can be made much smaller with some engineering effort (e.g., the Y for Mask R-CNN is mostly time spent upsampling 100 predicted masks, one at a time, not in parallel; this could be replaced with a parallelized GPU implementation and take almost no time at all).
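To make the "X + Y" reading concrete, here is a small sketch that splits such a timing string into its parts and sums them. The helper name `parse_inference_time` is hypothetical, and the numbers in the usage line are invented for illustration only; they are not measured results.

```python
def parse_inference_time(text):
    """Parse an 'X + Y' timing string (in seconds) into (x, y, total).

    X is the part the thread describes as running on the GPU; Y is the
    unoptimized CPU part. The total per-image time is their sum.
    """
    x_str, y_str = text.split("+")
    x, y = float(x_str), float(y_str)
    return x, y, x + y


# Invented values, purely to show the arithmetic.
x, y, total = parse_inference_time("0.100 + 0.025")
print(f"X: {x:.3f}s  Y: {y:.3f}s  total: {total:.3f}s  Y share: {y / total:.0%}")
```

When comparing models by speed, the comment above suggests looking mainly at X, since Y could be reduced with further engineering.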


beetleskin commented on June 3, 2024

So, if I got this right, the total inference time is always X + Y, i.e. some parts of the inference are run on the GPU and some on the CPU? From the explanation I thought X was the inference time on the GPU and Y the inference time on the CPU, i.e. the same algorithm on different hardware. But I guess the "+" expresses exactly that :)

Does the inference time also relate to the hardware of "8 NVIDIA Tesla P100 GPU", run in parallel?

