Comments (5)
from onnxruntime.
Each session creates a thread pool that is optimized to run on all cores. That thread pool is used for intra-op parallelization, meaning many CPU kernels attempt to distribute their work across all the cores for a given session.
Multiple sessions would create lots of contention and context switches.
You may want to experiment with your model and set a different number of threads for the intra-op thread pool.
Alternatively, you can make all the sessions share a global thread pool and see whether that makes things faster.
For CUDA, though, disabling the thread pool altogether by setting the number of threads to 1 seems to be the best option for each of the sessions, since CUDA kernels do not use CPU thread pools.
Hi, which kernels do you have in your model?
For implementation reasons, ORT's CUDA EP has some kernels that currently do not support being launched in parallel, for example the Conv kernel.
Just double-check whether that is the case.
Thank you for your reply, @souptc.
In this case, there is no Conv kernel in the model; only MatMul and elementwise kernels are used. The reason for using multiple sessions is to create multiple streams on the GPU so it can launch kernels concurrently. Is this lock caused by the CUDA runtime?
I set a separate CUDA context for each thread to avoid the lock, but it does not take effect.
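One documented knob for stream behavior in the CUDA EP is the `do_copy_in_default_stream` provider option. The sketch below only builds the provider configuration each session would receive; whether this helps with the lock described above is an assumption, not confirmed in this thread, and the option has documented caveats around synchronization.

```python
# Provider options for each session's CUDA Execution Provider.
# do_copy_in_default_stream=0 lets host/device copies use side streams,
# which can improve overlap across sessions but risks subtle races,
# per the CUDA EP documentation. Values here are illustrative.
cuda_options = {
    "device_id": 0,
    "do_copy_in_default_stream": 0,
}
providers = [("CUDAExecutionProvider", cuda_options)]

# Each session would then be created with its own providers list:
# sess = ort.InferenceSession("model.onnx", providers=providers)
```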
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.