Comments (12)
The other yield loop is in steal_until, which is harder because that's looping for either latch completion or available work -- suggestions welcome.
Yeah, I figured I'd have to remove the busy loops sooner or later. It'd be worth benchmarking on other systems too (I'm not sure which system you're using there). In the past, I've found that Linux in particular has a more efficient condvar implementation than other systems (I think it's based on futexes, but I haven't researched this stuff in detail in a while). Another option is to use a spin loop with an increasing backoff -- i.e., sleep for increasing amounts of time.
from rayon.
Heh, I was wondering why `wait` was having such an impact on performance: I thought that I only used it to block until the worker threads started, but I see I also used it to wait for the main threads to start. That was... silly. Good catch, a condvar definitely seems to make sense there.
I tried rayon with my embarrassingly parallel evolutionary algorithm. It was about 10-15% slower than the `simple_parallel` version, IIRC. I will try again to see whether performance is on par.
Bit of a hanger-on to this issue, but is `SeqCst` necessary for this `AtomicBool`? My (hurried) reading of it is that `Latch` essentially waits until `b` is true, and then allows other code to run. That being the case, it should be possible to use `Ordering::Acquire` for the `probe` method and `Ordering::Release` for the `set` method. This would potentially have performance benefits on at least some platforms.
@nikomatsakis The numbers I gave were on Linux, Fedora 23 x86_64 on an i7-2600. I just tried Win10 x64 on an i7-3770K, and it goes from around 3-3.5x parallel speedup (quite variable!) to a more stable 4x speedup. (FWIW both of those are quad-core, giving 8 logical CPUs with hyperthreading.) I don't have OSX to try.
I can send a PR for the easy latch-wait condvar -- anything you want tweaked from the patch above?
@mneumann I'm curious to hear how much this helps you too!
@AlisdairO I think you're right that ordering could be relaxed. However, if we can get the latch fully converted to condvars, the atomic bool might go away in favor of just mutex-guarded state.
Hmm, I think I'd take the patch, but I'd prefer to separate the case of stolen jobs from other jobs, since only the latter benefit from the mutex condvar arrangement, and for the rest it is just overhead.
Niko
Well, supposedly stealing is rare anyway, so overhead doesn't matter. Plus I hope eventually to figure out how to make stealing sleep more precisely too. But until then, sure, I can try tweaking it so only fully-awaited jobs will notify the condvar.
Stealing is rare, but creating a latch is very, very common -- it happens on every fork.
On Thu, Dec 31, 2015, at 07:44 PM, Josh Stone wrote the comment quoted above.
@nikomatsakis @cuviper : I redid my benchmarks. My numbers are:
Simple parallel: ~50 sec
Rayon rev e15798d: ~55.8 sec
Rayon rev 97b04cd: ~56 sec
Rayon rev e15798d with threshold 1.0: ~54.7 sec
Rayon rev 97b04cd with threshold 1.0: ~49.8 sec.
Note that I am using `par_iter`, and in my case it's important to run every single iteration in parallel. Only changing the THRESHOLD constant did the trick. Good news: it's now on par with simple_parallel. So when setting the threshold, I do see a 10% improvement with the new code! Now waiting for a way to set the threshold without patching the code :)
@mneumann Interesting. I had planned to include the option in the `par_iter` API to add weights, so that you don't need to manually adjust the threshold. I also wanted to have an "autotune" operation that would try to find an ideal weight experimentally. (But I'm not really sure if that's the best overall approach.)
@nikomatsakis To clarify: the 10% improvement I saw was in CPU time, not in wall-clock time. `simple_parallel` was still a bit faster, but it also kept the CPU slightly busier.
I experimented a bit with making `ParallelLen` a trait and making the functions generic over it. The decision about when to fall back to sequential mode would then be encapsulated there. One would use it like:

```rust
// this is equivalent to THRESHOLD = 1.0
pop.into_par_iter::<FullyParallel>().map(|x| ...).collect_into();

struct WeightedParallel { cost: f64, /* ... */ }

impl ParallelLen for WeightedParallel {
    const THRESHOLD: f64 = 10_000.0;
    // ...
    fn is_parallel(&self) -> bool {
        self.cost > Self::THRESHOLD && self.max_length > 1
    }
}

pop.into_par_iter::<WeightedParallel>().map(|x| ...)
```

But I am not sure it isn't overkill. Also, one could of course use associated constants to specify the THRESHOLD.

I think it all depends on the type of algorithms being used. I can imagine algorithms where the left and right costs aren't always equally distributed (left and right subtrees, for instance). So we'd need specialized `ParallelLen` implementations anyway.
@mneumann this is more germane to #7, so I'm going to post my reply there.