Comments (16)
Hi, thanks for your report,
as I explained in your other issue this code was a dead code:
{
self.waker = Some(cx.waker().clone())
}
it is now removed.
from kanal.
But how is the waker updated then? You need to update the waker each time poll
is called (or at least ensure it hasn't changed, which is also a concurrent operation). For instance here:
Lines 101 to 111 in ac93742
the waker is never updated.
from kanal.
Could you please give me some examples of updating the waker when using the library?
I have an idea if we need to implement that too, without losing any performance. but I need some examples to see if we really need to implement that.
from kanal.
The thing is that after being polled once and registering the first waker, the next poll
may give you another waker and the first waker will no longer be actual: even if you call it (which is OK, in the sense it's not UB), it will not wake the task.
I have some tests for that (passing change_waker = true
), but really there isn't much to it, it simply changes the waker:
from kanal.
I have an idea if we need to implement that too, without losing any performance. but I need some examples to see if we really need to implement that.
I'd be interested, because it is really a hard problem. The problem is that even if you call Waker::will_wake
to avoid storing a new waker, you are accessing the waker concurrently so this comes back to the initial issue.
from kanal.
I'd be interested, because it is really a hard problem. The problem is that even if you call
Waker::will_wake
to avoid storing a new waker, you are accessing the waker concurrently so this comes back to the initial issue.
I think concurrent reading is not problematic, as we can read wakers concurrently at the same time.
So calling Waker::will_wake
will be safe.
Is this statement correct?
from kanal.
So calling
Waker::will_wake
will be safe.
Is this statement correct?
Yes, absolutely. The problem is indeed when it actually needs to be changed, because you will need to synchronize the update.
Note however that even when using will_wake
, issue #16 still applies: you must "tell" the compile that you have in fact only an &Waker
and not an &mut Waker
, otherwise the compiler may perform unsound optimizations.
from kanal.
Could you please give me some examples of updating the waker when using the library?
Sorry, I realized you probably wanted to see how to implement and not how to test this. One slow solution is Atomic Waker and I wrote a faster one called Diatomic Waker. But I'm not sure either of them is the best for an MPMC. I think you really should have a look at the event-listener
crate to have an idea of what is involved: the problem is really much harder than it looks at first glance.
from kanal.
Please check the latest commit. I think it's fixed. The waker update is now handled correctly.
from kanal.
I mean it feel a bit like a band aid and the spin loop is not something I would use personally, but you may be able to make it work.
I think there is still a race at the moment:
Lines 75 to 77 in 73befe0
Since the waker is no longer under the mutex lock (dropped on line 76), calling Waker::wake()
will probably race with other accesses. In this specific case it is enough to drop the lock guard after the call to wake
, but you probably should review the code to make sure there are no other similar races.
This is the problem with races: no matter how much brainpower you use at chasing them, it is very hard to say when/if you have found them all. I will sadly not be able for this reason to make further reviews because I need to focus on my own bugs ;-)
So please feel free to close the issues as you feel appropriate, but before you do I would really encourage you to build an extensive battery of tests. Loom is from far my preference because it makes an exhaustive exploration of all interleavings, but it does require some discipline; for instance it would detect those races only if the Waker
was in an UnsafeCell
(which as discussed in #16 it probably should, on top of Aliasable
). MIRI is of course great too and can even catch things that Loom won't like some specific weak orderings and many UBs, but it is not deterministic so will very often fail to detect concurrency bugs, which is why I'd use it mostly as a complement or for non-concurrent UB.
from kanal.
Thanks, Serge. You helped a lot and caught an important possible race condition in the library.
It's a few nanoseconds of spin, when signal is not in the waiting list the data is already transferring from/to another thread, and in my tests, if the user is not using the poll API directly and changing the Waker
, they will not pay the price of switching waker. so to me, it's not like a band-aid solution, it is optimizing for the common use case and paying the price for the uncommon one.
About:
Lines 75 to 77 in 73befe0
It's not a race condition as only the sender has
mut
access to the destination memory location, when recv signal is pushed to the queue, the receiver will not be permitted to do any operation on the shared location until a signal arrives.
As the Waker
is not going to be updated from anywhere else but the creator thread, I think it's not required to be in the UnsafeCell
.
I'm extensively using Miri on every change to make sure there is no error, I'm going to try Loom too, thanks for suggesting that.
I'm going to open another issue about writing those tests and asking others for help with writing tests too.
from kanal.
I am not sure if we speak about the same race. My point is that at this point the waker is updated, under the lock:
Lines 233 to 236 in e415d69
To make sure that the sender does not call Waker::wake()
at the same time when called here in first.send()
:
Lines 75 to 77 in e415d69
you would need to acquire the lock as well in the sender (which you do, just not long enough). These are not the same thread, so I think you were thinking of another situation.
Regarding UnsafeCell
: it's goal is precisely to tell the compiler that you know what you do and that this is not a concurrent access. Otherwise the compiler applies optimizations that can lead to UB. But again, this is not enough: you need in your case to wrap the waker in Aliasable<UnsafeCell<Waker>>
, as discussed in #16.
from kanal.
Aha, got what you mean.
There is no sender at that point, as no sender has access to that signal. internal.next_recv()
removes the signal from the list, so if the sender gets ownership of the signal recv_signal_exists
will return false, and else
statement will be executed.
from kanal.
Oh yes, I see, my bad.
from kanal.
No problem! Thank you for the time that you are putting to investigate Kanal for bugs.
Even reports like that help me to recheck ideas and make sure they are sound.
from kanal.
Thank you for the time that you are putting to investigate Kanal for bugs.
No problem. But I'm back to my bugs now ;-)
from kanal.
Related Issues (20)
- Add Oneshot implementation HOT 17
- Async can't utilize `KanalPtr` as effective as Sync HOT 1
- Are kanal functions cancel-safe? HOT 9
- Miri error when forgetting Box<T> HOT 2
- oneshot deadlock in 0.1.0-pre8 HOT 4
- Intrusive variants HOT 3
- Feature: implement recv_timeout for AsyncReceiver HOT 2
- Kanal is slower than crossbeam HOT 1
- Incorrect Send and Sync bounds HOT 2
- Oneshot: Data race detected HOT 4
- Unsound implementation of `as_sync` HOT 3
- Usage example
- Feature: Provide a select! macro HOT 6
- Slow usize send when using `MiMalloc` HOT 1
- mixing sync and async context HOT 2
- `Stream` that take ownership of the `Receiver` HOT 1
- OneshotSender is not Sync HOT 1
- API design for oneshot sender
- Oneshot: UB triggered when using Axum HOT 2
- Sink impl for AsyncSender?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kanal.