Comments (8)

arnabanimesh commented on August 17, 2024

No. I am running on Windows only.

from rayon.

cuviper commented on August 17, 2024

Is it possible for you to share your code? Or a similar reproducer? Otherwise, I can only guess at the cause...

> The task manager shows that all the cores are being used, but not at 100%

With par_bridge, this could be the threads waiting on the Mutex that guards the sequential iterator input. If so, the iterator cannot produce items fast enough to keep the parallel part busy. Make sure you're doing as much as possible on the parallel side though: prefer iter.par_bridge().map(...) over iter.map(...).par_bridge(), since anything done before par_bridge() runs sequentially.
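To make the point about the Mutex concrete, here is a minimal std-only sketch. It is not rayon's implementation; it only models the idea of several workers pulling from one sequential iterator behind a lock. The names `drain_shared` and `expensive` are made up for this illustration. Work done inside the iterator's `next()` happens while the lock is held, while work done in the worker closure happens after the lock is released.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the expensive per-item work.
fn expensive(x: u64) -> u64 {
    (0..1_000u64).fold(x, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
}

/// Workers pull items from one shared iterator guarded by a Mutex and run
/// `work` on each item after the lock has been released.
/// Returns a wrapping sum of all results (order-independent).
fn drain_shared<I, F>(iter: I, workers: usize, work: F) -> u64
where
    I: Iterator<Item = u64> + Send,
    F: Fn(u64) -> u64 + Sync,
{
    let source = Mutex::new(iter);
    let total = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The guard is a temporary, so the lock is held only for the
                // duration of `next()`. Anything `next()` does is serialized.
                let item = source.lock().unwrap().next();
                match item {
                    Some(x) => {
                        let y = work(x); // runs without holding `source`
                        let mut t = total.lock().unwrap();
                        *t = t.wrapping_add(y);
                    }
                    None => break,
                }
            });
        }
    });
    total.into_inner().unwrap()
}

fn main() {
    // Expensive step inside the sequential iterator: runs under the lock.
    let serial_side = drain_shared((0..64u64).map(expensive), 4, |x| x);
    // Expensive step in the worker closure: runs outside the lock.
    let parallel_side = drain_shared(0..64u64, 4, expensive);
    // Same answer either way; only the amount of serialized work differs.
    assert_eq!(serial_side, parallel_side);
    println!("checksum: {serial_side}");
}
```

Both calls compute the same checksum. The first one just keeps the other workers waiting on the lock while one of them runs `expensive`, which is the situation described above for iter.map(...).par_bridge().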

With par_iter, rayon shouldn't be adding much blocking, but maybe you have a Mutex or similar synchronization of your own that's blocking progress?

arnabanimesh commented on August 17, 2024

I don't have a mutex or anything. The code is fairly simple. All the complexity is within the single-threaded recursive function. Even the cache for storing results of recursive calls is implemented within the recursive function, so even that's not shared.

I can't share the original code as it is proprietary, but I'll try to create a minimal reproducible example where par_iter doesn't work by this weekend.

JulianKnodt commented on August 17, 2024

Is it possible you're running this in WSL? I recently tried running some code in WSL, and it seemed to slow down significantly because of syscalls.

wagnerf42 commented on August 17, 2024

The most likely explanation is that you have a severe load imbalance: a few function calls (= 10% of the cores) take a very long time.
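As a rough illustration of what load imbalance does to CPU utilisation, here is a small greedy scheduling model. It is only a sketch under assumptions: each call goes to whichever worker frees up first, which is not how rayon's work stealing actually schedules. The 0.2 s and 2 s durations match figures reported later in this thread, but the mix of 15 short calls and one long call is invented, and the function names are hypothetical.

```rust
/// Greedy model: each task goes to whichever worker becomes idle first.
/// Returns the wall-clock time (ms) until the last worker finishes.
/// Panics if `workers` is 0.
fn makespan_ms(tasks_ms: &[u64], workers: usize) -> u64 {
    let mut busy_until = vec![0u64; workers];
    for &t in tasks_ms {
        // Pick the worker that becomes idle earliest (first one on ties).
        let next = busy_until.iter_mut().min().expect("at least one worker");
        *next += t;
    }
    busy_until.into_iter().max().unwrap_or(0)
}

/// Average utilisation in percent: total work / (workers * makespan).
fn utilisation_pct(tasks_ms: &[u64], workers: usize) -> u64 {
    let total: u64 = tasks_ms.iter().sum();
    let span = makespan_ms(tasks_ms, workers);
    if span == 0 {
        0
    } else {
        100 * total / (workers as u64 * span)
    }
}

fn main() {
    // One 2 s call followed by fifteen 0.2 s calls, on 8 workers.
    let mut skewed = vec![2000u64];
    skewed.extend(std::iter::repeat(200).take(15));
    // Sixteen equal 0.2 s calls, on 8 workers.
    let balanced = vec![200u64; 16];

    println!("skewed:   {}%", utilisation_pct(&skewed, 8)); // prints 31%
    println!("balanced: {}%", utilisation_pct(&balanced, 8)); // prints 100%
}
```

In the skewed case seven workers finish early and sit idle while one works through the long call, so the average drops to about a third even though every core was busy at some point.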

arnabanimesh commented on August 17, 2024

Each function call takes ~0.2 seconds on average (tested using a single-core Rust app). The worst case is 1-2 seconds.

Currently I am running it on multiple cores using Python's multiprocessing module with C++ code packaged using pybind11, and it is able to hit 100% on all cores (tested on 16-core and 20-core Intel systems).

I tried using tokio instead of rayon, but the same problem occurs there. I thought that maybe the antivirus was causing the issue, but then I ran it inside Windows Sandbox and still saw the same problem.

My personal laptop has 8 cores, and the multi-core rayon-based Rust app is able to hit 30-40% there continuously. It fluctuates a lot though.

arnabanimesh commented on August 17, 2024

> the most likely is that you have a severe load imbalance : few function calls (=10% of cores) take very long.

That still doesn't explain the significant slowdown compared to the single-core version.

arnabanimesh commented on August 17, 2024

I had rewritten the existing C++ code in Rust without giving it much thought. I finally got the time to investigate. The caching was implemented using a four-level nested vector whose innermost elements hold a tuple of an int and a vector of structs. Basically, the index at each level corresponded to one of the function arguments. I think it was bottlenecking memory bandwidth. Enough memory was already allocated early on, so repeated allocations were not required. After I removed the cache, it ran using all CPU cores. I then rewrote the caching logic using a fast hashmap at a global level (kind of like redis), and now it is performing as it should.

TL;DR: Caching using a deeply nested vector was hurting performance. I replaced it with a hashmap-based key-value store, and now it is performing as it should.
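For anyone landing here with a similar setup, a global hashmap-based cache could look roughly like the sketch below. This is not the original (proprietary) code. The key and value types, the `Entry` struct, and the `cached` helper are all assumptions made for illustration, and it uses the standard library's `HashMap` behind a `Mutex` rather than whichever "fast hashmap" was actually used.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical key: the four function arguments that used to index the
/// four levels of the nested vector.
type Key = (u32, u32, u32, u32);

/// Hypothetical payload struct; the real fields are not known.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    score: i64,
}

/// Mirrors "a tuple of int and vector of structs" from the description above.
type Value = (i32, Vec<Entry>);

/// One process-wide cache, created lazily on first use.
fn cache() -> &'static Mutex<HashMap<Key, Value>> {
    static CACHE: OnceLock<Mutex<HashMap<Key, Value>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached value for `key`, computing and storing it on a miss.
/// The lock is not held while `compute` runs, so two threads may compute
/// the same key at the same time; the later insert simply overwrites.
fn cached<F: FnOnce() -> Value>(key: Key, compute: F) -> Value {
    let hit = cache().lock().unwrap().get(&key).cloned();
    if let Some(value) = hit {
        return value;
    }
    let value = compute();
    cache().lock().unwrap().insert(key, value.clone());
    value
}

fn main() {
    let key = (1, 2, 3, 4);
    let first = cached(key, || (7, vec![Entry { score: 42 }]));
    // Second call hits the cache, so this closure is never run.
    let second = cached(key, || unreachable!("value should come from the cache"));
    assert_eq!(first, second);
}
```

A single `Mutex` is the simplest choice here and may itself become a point of contention under heavy parallel load; sharding the map or using a concurrent map would be the next thing to try if profiling shows that.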

I think this matter is done and dusted. Hence closing the issue.
