
Comments (5)

malfet commented on September 22, 2024

Adding triage review to discuss what to do about various performance issues and how we should approach them.


tringwald commented on September 22, 2024

By using os.sched_setaffinity() you are basically telling your OS to only allow the current process to use certain CPU cores. By default, torch uses torch.get_num_threads() threads (usually CPU cores / 2) for computation. So in your example you are forcing the OS to schedule N threads onto N-1 cores, which probably leads to some very heavy congestion.
You should probably use torch.set_num_threads() to limit the number of threads torch will use.
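A minimal sketch of that suggestion (Linux-only, since os.sched_setaffinity() is not available on every platform; the core set here is purely illustrative):

```python
import os
import torch

# Illustrative core set: pin the current process to four cores.
core_ids = {0, 1, 2, 3}
os.sched_setaffinity(0, core_ids)  # pid 0 = the current process

# Size torch's intra-op thread pool to match the pinned cores, so the OS
# is never asked to schedule more compute threads than it has cores for.
torch.set_num_threads(len(os.sched_getaffinity(0)))

print(torch.get_num_threads())  # now equals len(core_ids)
```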


malfet commented on September 22, 2024

It feels like this issue belongs on https://discuss.pytorch.org/

Though if I were to rephrase it, it is probably about documenting the interoperability expectations between the PyTorch runtime (which uses OpenMP) and Python's threading library.

Also, I wonder if torch.set_num_threads(len(core_ids)) would help.


ye3084 commented on September 22, 2024

@tringwald @malfet
Thank you for your responses. I used torch.set_num_threads() and it solved the problem, but I found that when I use more than 8 cores, the thread execution time increases. It seems that using 8 cores gives the best performance. I've created a chart where the X-axis represents the number of cores used and the Y-axis shows the execution time of the threads.
[Chart: thread execution time vs. number of cores used]

The results I obtained on another server with a 72-core CPU are as follows (this server is shared with other users, which might affect performance).
[Chart: thread execution time vs. number of cores used, 72-core server]
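For reference, one way such a timing sweep could be reproduced; the workload size, repetition count, and thread counts below are illustrative, not the exact benchmark behind the charts:

```python
import time
import torch

# Fixed matmul workload, timed while varying torch's intra-op thread pool.
x = torch.randn(2048, 2048)

for n in (1, 2, 4, 8, 16):
    torch.set_num_threads(n)
    start = time.perf_counter()
    for _ in range(20):
        torch.mm(x, x)
    print(f"{n:>2} threads: {time.perf_counter() - start:.3f} s")
```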


tringwald commented on September 22, 2024

Your CPU only has 8 (or 36) physical cores; with hyperthreading/SMT that makes 16/72 logical cores. SMT only really helps when threads are waiting on something like I/O. In a heavy number-crunching task like NN inference, there is no real advantage to using logical cores. Having multiple threads fight over the same physical core only leads to congestion and cache eviction.
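One way to act on this, sketched under the assumption that the third-party psutil package is available (torch itself does not expose a physical-core count):

```python
import torch
import psutil  # assumption: third-party package, not part of torch

# cpu_count(logical=False) counts physical cores only; plain cpu_count()
# would also include the SMT/hyperthreading siblings.
physical_cores = psutil.cpu_count(logical=False)
torch.set_num_threads(physical_cores)
```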

