
Comments (4)

cuviper commented on July 18, 2024

I do think yielding is the appropriate default behavior, and documentation PRs are fine to clarify this. I think we could also add a variant or two of install with different behavior. Here's the current install behavior, and some thoughts:

  • If you're not in any pool, queue the work in the target pool and then block.
    • This should be uncontroversial.
  • If you're in a different pool, queue the work in the target pool and then yield.
    • We could add a mode that blocks here instead. The potential downside is that this pool could get completely blocked, and even cause deadlocks if any of that inner pool work tries to queue and wait for any work back in the outer pool. But as long as you know there's no "loopback", it may be fine.
  • If you're already in the target pool, execute it directly.
    • Note that this may also lead to work-stealing of outer work if it ends up waiting for other threads to complete its inner work.
    • We could have an even stronger mode that blocks here as well, but that would have an even higher risk of deadlocking if all its threads are tied up this way.

BTW, your Mutex scenario is also discussed in #592, which may or may not involve multiple pools.

from rayon.

benkay86 commented on July 18, 2024

@cuviper, I appreciate your explanation of the install() behavior. I do want to make an important distinction that yielding is not the opposite of blocking. For example, consider the behavior of tokio::spawn():

statement1;
let handle = spawn(async { some_work });
statement2;
handle.await.unwrap();

statement1 is executed on what we'll call thread1. The call to spawn() returns immediately without blocking. statement2 is executed on thread1. Then the call to await blocks, potentially yielding execution to another task on thread1. Here, the non-blocking spawn and the yield occur in different statements, ensuring that statement2 will always execute after statement1 on thread1. Furthermore, the explicit await syntax makes it obvious to the programmer where the yield point is.
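The ordering guarantee described above can be sketched with `std::thread` in place of tokio (a hedged analogy, not tokio itself, but the same non-blocking-spawn-then-blocking-wait shape applies):

```rust
use std::thread;

// Toy illustration: spawn returns immediately, so "statement2" always runs
// on this thread before we wait, just like statement2 before handle.await.
fn run() -> Vec<&'static str> {
    let mut log = vec!["statement1"];
    let handle = thread::spawn(|| "some_work"); // non-blocking, like tokio::spawn
    log.push("statement2"); // executes here before any wait happens
    let result = handle.join().unwrap(); // blocking wait, like handle.await
    log.push(result);
    log
}

fn main() {
    assert_eq!(run(), vec!["statement1", "statement2", "some_work"]);
}
```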

If you're in a different pool, queue the work in the target pool and then yield.

* We could add a mode that blocks here instead. The potential downside is that this pool could get completely blocked, and even cause deadlocks if any of that inner pool work tries to queue and wait for any work back in the outer pool. But as long as you know there's no "loopback", it may be fine.

In the simplest case:

some_par_iter.par_for_each(|_| {
    statement1;
    other_pool.install(|| { some_work; });
    statement2;
});

There is no need for install() to block here. It could queue the work on the other thread pool and then return immediately, just like spawn(). What's confusing is that install() doesn't merely avoid blocking -- it also yields, which means there may be several calls to statement1 before statement2 ever gets a chance to execute. Rayon's syntax doesn't make yield points explicit, and the implicit yield is not documented (except in this GitHub issue). This leads to surprising results not only with mutexes as in #592, but also with unbounded memory growth and stack overflows, as mentioned in my first comment. Even more surprisingly, install() may or may not implicitly yield() depending on the context!
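That surprise can be modeled with a toy single-threaded queue (a sketch, not rayon's actual scheduler; all names here are illustrative): while the "install" waits, the thread picks up the next queued outer iteration, so every statement1 can run before any statement2 does, and the nesting grows with the queue length.

```rust
use std::collections::VecDeque;

// Toy model of a yielding wait: instead of blocking, the thread runs the
// next queued "outer iteration" recursively, the way work-stealing would.
fn step(queue: &mut VecDeque<u32>, trace: &mut Vec<String>) {
    if let Some(i) = queue.pop_front() {
        trace.push(format!("statement1 #{i}"));
        // "install": while waiting for inner work, yield by running another
        // outer job on this same thread (note the growing recursion depth).
        step(queue, trace);
        trace.push(format!("statement2 #{i}"));
    }
}

fn trace_of(n: u32) -> Vec<String> {
    let mut queue: VecDeque<u32> = (1..=n).collect();
    let mut trace = Vec::new();
    step(&mut queue, &mut trace);
    trace
}

fn main() {
    // Both statement1 calls happen before either statement2 does.
    assert_eq!(
        trace_of(2),
        vec!["statement1 #1", "statement1 #2", "statement2 #2", "statement2 #1"]
    );
}
```

The recursion depth in this toy also hints at why the real behavior can produce unbounded memory growth and stack overflows.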

IMHO, if we were writing Rayon from scratch, I would suggest making all yields explicit. The default behavior would then be expressed as follows. Making the yield explicit makes the behavior obvious to the programmer, and it also gives the programmer the choice of moving statement2 up between install() and yield().

some_par_iter.par_for_each(|_| {
    statement1;
    other_pool.install(|| { some_work; });
    rayon::yield_now();
    statement2;
});

I do think yielding is the appropriate default behavior, and documentation PRs are fine to clarify this. I think we could also add a variant or two of install with different behavior.

I have to agree, but only because changing the behavior of install() could cause deadlocks in existing code that depends on the implicit yield. I would love to see clearer documentation and yield-free versions of install() and similar methods (I'm not sure whether spawn() can yield in rayon). It looks like this would require some intimate work on unsafe parts of rayon. Do you envision this as something a maintainer might work on, or something you would like an outside PR for?


cuviper commented on July 18, 2024

In the simplest case:

some_par_iter.par_for_each(|_| {
    statement1;
    other_pool.install(|| { some_work; });
    statement2;
});

There is no need to make install() block here.

Since install returns a value, that is inherently blocking with respect to following statements. It also allows borrowing since its closure is not constrained to 'static lifetimes -- from the compiler borrowck perspective, those borrows only last for the duration of this call, so we cannot safely return until we're done with that work. If you don't want that imperative control flow, that's what ThreadPool::spawn and Scope::spawn are for.
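A standard-library analogy for this point (a sketch, not rayon's implementation): `std::thread::scope` has the same shape, returning a value and permitting non-`'static` borrows precisely because the call cannot return until all scoped work has finished.

```rust
use std::thread;

// scoped_sum borrows `data` inside spawned work without moving it; this is
// only sound because thread::scope blocks until the scoped thread is done,
// exactly the property cuviper describes for install's borrows.
fn scoped_sum(data: &[i32]) -> i32 {
    thread::scope(|s| {
        let handle = s.spawn(|| data.iter().sum::<i32>()); // borrows `data`
        handle.join().unwrap() // scope cannot return before this completes
    })
}

fn main() {
    let data = vec![1, 2, 3];
    assert_eq!(scoped_sum(&data), 6);
    assert_eq!(data.len(), 3); // the borrow ended when scope returned
}
```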

So when I talk about yielding vs. blocking here, I'm only talking about what the thread should do while it waits, but we do have to wait for install completion regardless.

We've also had discussions about actual awaitable futures, like my old WIP #679, but that didn't address the executor part, which you would need if you wanted to await that within a parallel for_each -- or at least a block_on kind of thing.

I would love to see clearer documentation and a yield-free versions of install() and similar methods (not sure if spawn() can yield in rayon). It looks like this would require some intimate work on unsafe parts of rayon. Do you envision this as something a maintainer might work on, or something you would like an outside PR for?

The design is the more interesting part that needs discussion. Regarding spawn, that only queues the work and returns right away. The other main primitives are join and scope, but I think it will get a lot hairier if we try to add variations of those, and I haven't fully thought about those implications. Higher-level stuff like the parallel for_each also runs on join, and I definitely don't want to fork the entire ParallelIterator API.

Another idea that I've raised before is that we could have a kind of "critical section" call, inhibiting all work-stealing on that thread until that call returns. (Possibly with some nuance like allowing local work-stealing of work queued while in that call.) This might be a better way to have broad impact without forking a bunch of APIs.
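A rough sketch of what such a critical-section marker might look like, using a thread-local flag (the names `IN_CRITICAL` and `critical_section` are invented for illustration; this is not a rayon API, and a real implementation would have the scheduler consult the flag before stealing work onto the thread):

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread flag a work-stealing scheduler could check
    // before running stolen work on this thread.
    static IN_CRITICAL: Cell<bool> = Cell::new(false);
}

fn in_critical() -> bool {
    IN_CRITICAL.with(|c| c.get())
}

// Run `f` with work-stealing (notionally) inhibited on this thread.
fn critical_section<R>(f: impl FnOnce() -> R) -> R {
    IN_CRITICAL.with(|c| c.set(true));
    let result = f(); // a scheduler would skip stealing onto this thread here
    IN_CRITICAL.with(|c| c.set(false));
    result
}

fn main() {
    assert!(!in_critical());
    let v = critical_section(|| {
        assert!(in_critical());
        42
    });
    assert_eq!(v, 42);
    assert!(!in_critical());
}
```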

If we decide on just adding install variations, I can mentor the implementation if you're keen, and as a first step take a look at Registry::in_worker. The variations I described before are pretty much contained in the three different branches in that function, where the "cold" part is the fully blocking case.

pub(super) fn in_worker<OP, R>(&self, op: OP) -> R
where
    OP: FnOnce(&WorkerThread, bool) -> R + Send,
    R: Send,
{
    unsafe {
        let worker_thread = WorkerThread::current();
        if worker_thread.is_null() {
            self.in_worker_cold(op)
        } else if (*worker_thread).registry().id() != self.id() {
            self.in_worker_cross(&*worker_thread, op)
        } else {
            // Perfectly valid to give them a `&T`: this is the
            // current thread, so we know the data structure won't be
            // invalidated until we return.
            op(&*worker_thread, false)
        }
    }
}

The cold path currently debug_assert!s that it's not in any pool though, so we'd have to re-evaluate that at least. In a way, that represents the philosophy of the current design, that pool threads should always be trying to find work -- but I do think there are reasonable scenarios, like yours, where it makes sense to opt out of that work-stealing behavior.


benkay86 commented on July 18, 2024

Since install returns a value, that is inherently blocking with respect to following statements. It also allows borrowing since its closure is not constrained to 'static lifetimes -- from the compiler borrowck perspective, those borrows only last for the duration of this call, so we cannot safely return until we're done with that work. If you don't want that imperative control flow, that's what ThreadPool::spawn and Scope::spawn are for.

Ah, I stand corrected. Since install() borrows and returns a value, it must either block or yield before returning.

If you don't want that imperative control flow, that's what ThreadPool::spawn and Scope::spawn are for.

I'll refactor my code with spawn() and scope().

EDIT It looks like calling ThreadPool::scope() will internally call ThreadPool::install(), triggering the yield behavior.

Another idea that I've raised before is that we could have a kind of "critical section" call, inhibiting all work-stealing on that thread until that call returns. (Possibly with some nuance like allowing local work-stealing of work queued while in that call.) This might be a better way to have broad impact without forking a bunch of APIs.

This sounds difficult to implement, but it sounds like it would be the most ergonomic solution in the long term.

EDIT Might still need a non-yielding install() variant to prevent install() from work-stealing on the parent thread pool but still allow work-stealing on the inner thread pool. Otherwise, it's not clear how you would specify the critical section to cover one but not the other.

