
Comments (5)

jkjung-avt commented on May 14, 2024

Does your processing time include image preprocessing? (Do you get a different result with trt_ssd_async.py?)

from tensorrt_demos.

PythonImageDeveloper commented on May 14, 2024

The time is only for trt_ssd.detect(img, 0.3). That call includes the preprocessing step, but for simplicity I concatenate the same image 3 times before passing it to np.copyto(self.host_inputs[0], img_resized.ravel()), like this:
img_pred = np.concatenate([img_pred, img_pred, img_pred])
I haven't tested with trt_ssd_async.py.

I made the following changes:
1- changed the input size from (1,3,300,300) to (3,3,300,300)
2- set builder.max_batch_size = 3
3- changed the inference call to self.context.execute_async(
batch_size=3,
bindings=self.bindings,
stream_handle=self.stream.handle)

Note that when I comment out self.stream.synchronize() in ssd.py, the first few results take 0.002 sec, and then the time grows until it reaches 0.06 sec. With self.stream.synchronize() left uncommented, I get 0.06 sec for every result. Why?
In my opinion, self.stream.synchronize() should be asynchronous rather than blocking, if possible.


jkjung-avt commented on May 14, 2024

Instead of timing the whole trt_ssd.detect() function, I think it makes more sense to time only the "cuda.memcpy_xxx" calls, "context.execute_async" and "stream.synchronize" in that function.

By the way, the "self.stream.synchronize" call cannot be commented out. Without it, you cannot be sure the GPU has finished processing the image.
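To illustrate why the asynchronous calls look almost free while the synchronize call absorbs the real cost, here is a minimal sketch that uses only the Python standard library. It is an analogy, not TensorRT or PyCUDA code: `FakeStream`, `enqueue` and `timed` are hypothetical names, and the sleep durations are made-up stand-ins for GPU work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: queued work runs in order on one worker."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, seconds):
        # Returns immediately, like an *_async call; the work itself runs later.
        self._pending.append(self._pool.submit(time.sleep, seconds))

    def synchronize(self):
        # Blocks until everything queued so far has finished.
        for future in self._pending:
            future.result()
        self._pending.clear()


def timed(label, fn, *args):
    """Call fn(*args), print the wall-clock time it took, and return that time."""
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return elapsed


stream = FakeStream()
t_copy_in = timed("enqueue copy in", stream.enqueue, 0.01)     # made-up duration
t_infer = timed("enqueue inference", stream.enqueue, 0.05)     # made-up duration
t_copy_out = timed("enqueue copy out", stream.enqueue, 0.01)   # made-up duration
t_sync = timed("synchronize", stream.synchronize)
```

In this sketch the three enqueue calls return almost instantly and nearly all of the elapsed time shows up in `synchronize`. Skipping the synchronize step would not make the work faster; the results would simply be read before the queued work is done.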


PythonImageDeveloper commented on May 14, 2024

This is with my custom TensorRT OCR model. With batch_size = 1 I get 0.02 sec, and with batch_size = 10 I get 0.2 sec, which suggests the images in a batch are processed serially rather than in parallel. Why?

Batch_size = 1

TensorRT All Time: 0.02888178825378418
cuda.memcpy_htod_async: cuda_inputs: 0.00016927719116210938
self.context.execute_async: 0.0031588077545166016
cuda.memcpy_dtoh_async : host_outputs: 9.1552734375e-05
stream.synchronize(): 0.018606901168823242

Batch_size = 10

TensorRT All Time: 0.22867369651794434
cuda.memcpy_htod_async: cuda_inputs: 0.00018334388732910156
self.context.execute_async: 0.0013976097106933594
cuda.memcpy_dtoh_async : host_outputs: 9.894371032714844e-05
stream.synchronize(): 0.20677971839904785
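
As a rough check on the numbers above, the short sketch below divides each "TensorRT All Time" value by its batch size to get a per-image cost. It uses only the two totals reported in this comment; the variable names are illustrative.

```python
# "TensorRT All Time" totals from the measurements above, in seconds.
total_time = {1: 0.02888178825378418, 10: 0.22867369651794434}

# Per-image cost for each batch size.
per_image = {batch: t / batch for batch, t in total_time.items()}

# How much cheaper one image is at batch_size=10 than at batch_size=1.
speedup = per_image[1] / per_image[10]

for batch, t in per_image.items():
    print(f"batch_size={batch}: {t * 1000:.2f} ms per image")
print(f"per-image speedup from batching: {speedup:.2f}x")
```

By this arithmetic the per-image time drops only from about 28.9 ms to about 22.9 ms, a speedup of roughly 1.26x, so the total time scales almost linearly with batch size in these measurements.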


jkjung-avt commented on May 14, 2024

Duplicated issue: #106

