Comments (5)
Does your processing time include image preprocessing? (Do you get a different result with trt_ssd_async.py?)
from tensorrt_demos.
The time is only for trt_ssd.detect(img, 0.3). That call includes the preprocessing step, but for simplicity I concatenate the same image 3 times before feeding it to np.copyto(self.host_inputs[0], img_resized.ravel()), like this:
img_pred = np.concatenate([img_pred, img_pred, img_pred])
I have not tested with trt_ssd_async.py.
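As a sanity check on the batching step described above, here is a minimal NumPy sketch (the shapes and the host_inputs buffer name follow the SSD demo; the image itself is a random placeholder):

```python
import numpy as np

# Hypothetical preprocessed image in CHW layout, as in the SSD demo
img_resized = np.random.rand(3, 300, 300).astype(np.float32)

# Flatten, then repeat the same image 3 times to fill a batch of 3
img_pred = np.concatenate([img_resized.ravel()] * 3)

# The batched buffer now holds 3 * 3 * 300 * 300 = 810000 floats,
# matching an engine input shape of (3, 3, 300, 300)
print(img_pred.shape)  # (810000,)

# np.copyto(self.host_inputs[0], img_pred) would then copy it into the
# page-locked host buffer before cuda.memcpy_htod_async(...)
```

Note the host buffer must have been allocated for the batched size, otherwise np.copyto will fail on the shape mismatch.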
I did the following steps:
1- change the input size from (1,3,300,300) to (3,3,300,300)
2- change to builder.max_batch_size = 3
3- change to:
self.context.execute_async(
    batch_size=3,
    bindings=self.bindings,
    stream_handle=self.stream.handle)
Notice that when I comment out self.stream.synchronize() in ssd.py, I get the first few results in 0.002 sec and then the time grows until it reaches 0.06; when self.stream.synchronize() is left uncommented, I get 0.06 for every result. Why?
In my opinion, self.stream.synchronize() seems to behave asynchronously rather than synchronously.
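The growing times without synchronize() can be reproduced without a GPU. In this CPU analogy (a one-worker thread pool standing in for the CUDA stream, and a sleep standing in for the kernel), timing only the launch measures almost nothing, because the call returns before the work finishes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A CPU stand-in for the CUDA stream: one worker running queued jobs in order
stream = ThreadPoolExecutor(max_workers=1)

def fake_inference():
    time.sleep(0.05)  # pretend the GPU kernel takes 50 ms

# Timing only the launch (no synchronize): returns almost instantly
t0 = time.perf_counter()
future = stream.submit(fake_inference)   # like context.execute_async(...)
launch_time = time.perf_counter() - t0

# Timing through the wait (with synchronize): includes the real work
t0 = time.perf_counter()
future.result()                          # like stream.synchronize()
synced_time = time.perf_counter() - t0

print(f"launch only: {launch_time:.6f} s, with wait: {synced_time:.6f} s")
```

This would also explain the 0.002 → 0.06 growth: the first launches return immediately, but once the stream's queue backs up, each new launch has to wait for earlier work, so the apparent launch time creeps toward the true per-frame time.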
Instead of timing the whole trt_ssd.detect() function, I think it makes more sense for you to time only the "cuda.memcpy_xxx"s, "context.execute_async" and "cuda.stream.synchronize" calls in that function.
By the way, the "self.stream.synchronize" call cannot be commented out; otherwise you cannot be sure the GPU has finished processing the image.
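A minimal sketch of that per-stage timing, using a small context manager; the four stage functions here are placeholders for the real CUDA calls in trt_ssd.detect():

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(name):
    """Record the wall-clock duration of the enclosed stage."""
    t0 = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - t0

# Hypothetical stand-ins for the real CUDA calls
def memcpy_htod(): time.sleep(0.001)
def execute():     time.sleep(0.001)
def memcpy_dtoh(): time.sleep(0.001)
def synchronize(): time.sleep(0.010)

with timed("cuda.memcpy_htod_async"):
    memcpy_htod()
with timed("context.execute_async"):
    execute()
with timed("cuda.memcpy_dtoh_async"):
    memcpy_dtoh()
with timed("stream.synchronize"):   # on a real GPU, most of the work lands here
    synchronize()

for name, dt in timings.items():
    print(f"{name}: {dt:.6f} s")
```

Because the first three calls are asynchronous, expect nearly all of the real compute time to show up under stream.synchronize, exactly as in the measurements below.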
This is my custom TensorRT OCR model. When I use batch_size = 1 I get 0.02 sec, and when I use batch_size = 10 I get 0.2 sec, which suggests the batched input images are being processed serially, not in parallel. Why?
Batch_size = 1
TensorRT All Time: 0.02888178825378418
cuda.memcpy_htod_async (cuda_inputs): 0.00016927719116210938
self.context.execute_async: 0.0031588077545166016
cuda.memcpy_dtoh_async (host_outputs): 9.1552734375e-05
stream.synchronize(): 0.018606901168823242
Batch_size = 10
TensorRT All Time: 0.22867369651794434
cuda.memcpy_htod_async (cuda_inputs): 0.00018334388732910156
self.context.execute_async: 0.0013976097106933594
cuda.memcpy_dtoh_async (host_outputs): 9.894371032714844e-05
stream.synchronize(): 0.20677971839904785
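Normalizing the two totals above per image makes the comparison concrete: the per-image cost barely changes with batch size, which is consistent with the images being processed largely one after another on the GPU rather than concurrently (for example, if the engine was effectively built for batch 1):

```python
# Measured totals from the comment above
t_batch1 = 0.02888178825378418
t_batch10 = 0.22867369651794434

per_image_1 = t_batch1 / 1
per_image_10 = t_batch10 / 10

print(f"per-image, batch 1:  {per_image_1:.4f} s")   # ~0.0289 s
print(f"per-image, batch 10: {per_image_10:.4f} s")  # ~0.0229 s

# If batching were fully parallel, per_image_10 would be close to
# t_batch1 / 10 ≈ 0.003 s; instead it is nearly the same as batch 1.
```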
Duplicate issue: #106