Comments (5)
from onnxruntime.
Each session creates a thread pool that is optimized to run on all cores. That thread pool is used for intra-op parallelization, meaning many CPU kernels attempt to distribute their work across all the cores for a given session.
Multiple sessions would create lots of contention and context switches.
You may want to experiment with your model and set a different number of threads for the intra-op thread pool.
Alternatively, you can make all the sessions share a global thread pool and see whether that makes things faster.
For CUDA, though, disabling the thread pool altogether by setting the number of threads to 1 seems to be the best option for each of the sessions, since CUDA kernels do not use CPU thread pools.
Hi, which kernels do you have in your model?
For implementation reasons, ORT's CUDA EP has some kernels that currently do not support being launched in parallel, for example the Conv kernel.
Just double-check whether that is the case.
Thank you for your reply, @souptc.
In this case, there is no Conv kernel in the model; only MatMul and elementwise kernels are used. The reason for using multiple sessions is to create multiple streams on the GPU so it can launch kernels concurrently. Is this lock caused by the CUDA runtime?
I set a separate CUDA context for each thread to avoid the lock, but it does not take effect.
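One documented knob for stream behavior in the CUDA EP is the `do_copy_in_default_stream` provider option. The sketch below only builds the provider configuration each session would receive; whether this helps with the lock described above is an assumption, not confirmed in this thread, and the option has documented caveats around synchronization.

```python
# Provider options for each session's CUDA Execution Provider.
# do_copy_in_default_stream=0 lets host/device copies use side streams,
# which can improve overlap across sessions but risks subtle races,
# per the CUDA EP documentation. Values here are illustrative.
cuda_options = {
    "device_id": 0,
    "do_copy_in_default_stream": 0,
}
providers = [("CUDAExecutionProvider", cuda_options)]

# Each session would then be created with its own providers list:
# sess = ort.InferenceSession("model.onnx", providers=providers)
```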
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.