Comments (6)
That's worrisome, but I'm afraid I don't have any Apple hardware to test this myself. Hopefully others in the community can share their experience and help debug what's going on.
One small tip: if you haven't set num_threads manually, the RAYON_NUM_THREADS environment variable will also override the default setting.
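To check which thread count a default-configured pool is likely to end up with, the resolution order can be approximated with the standard library alone. This is a minimal sketch, not rayon's actual code: it assumes the documented behaviour that a positive integer in RAYON_NUM_THREADS wins and that rayon otherwise falls back to the number of available CPUs. The helper names `parse_override` and `effective_default_threads` are made up for illustration.

```rust
use std::env;
use std::thread;

/// Hypothetical helper: accept the override only if it is a positive integer.
/// Anything else (unset, empty, "0", non-numeric) is ignored.
fn parse_override(raw: Option<&str>) -> Option<usize> {
    raw.and_then(|s| s.parse::<usize>().ok()).filter(|&n| n > 0)
}

/// Hypothetical helper: approximate the default pool size when num_threads
/// has not been set manually.
fn effective_default_threads() -> usize {
    let raw = env::var("RAYON_NUM_THREADS").ok();
    parse_override(raw.as_deref()).unwrap_or_else(|| {
        // Fall back to the parallelism the OS reports, or 1 if unknown.
        thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
    })
}

fn main() {
    println!("expected default pool size: {}", effective_default_threads());
}
```

Running the program with and without `RAYON_NUM_THREADS=4` set in the environment shows whether the override is being picked up before any code changes are made.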
from rayon.
I think you'll need to figure out what that System time is actually doing, because that looks pathological. Does Xcode profiling or anything like that reveal System details?
I found that the reason may relate to competition for CPU resources between rayon's work-stealing strategy and the system.
Again, note that everything worked well on versions of macOS before Sonoma 14.4.
When num_threads(...) is not set manually, the default num_threads is 14 (M3 Max 14 + 16). In this case, the system will "rob" the CPU resources back, and the "robbing" itself is costly.
Then I limited num_threads to 4 as follows:
rayon::ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();
The system is still "robbing", but the user process can use these 4 threads most of the time. In my program's case, performance improves, although it is still much slower than before because the CPU cores are not fully used. This is just a temporary workaround.
I think the problem may not be entirely related to rayon.
From my experience so far, the number of threads needs to be limited to below the number of cores. The details are as follows:
- if I limit num_threads to 4, parallel works stably.
- if I limit num_threads to 8, parallel works stably sometimes, but there's a chance of being "robbed".
- if I limit num_threads to 12, parallel works stably sometimes, but there's a higher chance of being "robbed".
Maybe the larger the num_threads used, the higher the probability of being "robbed".
And this is how the resources were "robbed" by the system:
and maybe this is why a num_threads of 4 can work.
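Rather than hard-coding 4, the cap could be derived from the machine's core count. The following is only a sketch of that idea using the standard library: `capped_threads`, the reserve of 2 cores and the ceiling of 4 are illustrative assumptions, not values confirmed by anyone in this thread beyond the observation that 4 threads ran stably.

```rust
use std::thread;

/// Hypothetical helper: leave `reserve` cores free for the system and never
/// exceed `max` worker threads. The result is always at least 1.
fn capped_threads(cores: usize, reserve: usize, max: usize) -> usize {
    cores.saturating_sub(reserve).clamp(1, max.max(1))
}

fn main() {
    // Ask the OS how much parallelism is available; assume 1 if unknown.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    // Assumed tuning values: reserve 2 cores, cap at 4 threads.
    let n = capped_threads(cores, 2, 4);
    println!("cores = {cores}, worker threads = {n}");

    // The value would then be passed to the builder quoted earlier, e.g.
    // rayon::ThreadPoolBuilder::new().num_threads(n).build_global().unwrap();
}
```

Whether any particular cap avoids the slowdown would still need to be measured on the affected macOS version.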
I think you'll need to figure out what that System time is actually doing, because that looks pathological. Does Xcode profiling or anything like that reveal System details?
I just checked the information in Activity Monitor, and as you said, there were pathological and conflicting phenomena.
The picture shows the system taking CPU resources away, yet the per-process cost shows that my process uses the most CPU. But I'm sure my process is slowed down, so that CPU time is not being spent on its computation.
Anyway, I will look into this problem more deeply soon, following your suggestion.
Things are clearer.
I did deeper profiling and found that with higher parallelism my process needs more memory, which then triggers security checking in the kernel, and that checking is costly.
This might be confirmed by https://appleinsider.com/articles/24/03/21/apple-silicon-vulnerability-leaks-encryption-keys-and-cant-be-patched-easily
Luckily, rayon still works well.