Git Product home page Git Product logo

Comments (16)

fereidani avatar fereidani commented on July 28, 2024

Hi, thanks for your report,
as I explained in your other issue this code was a dead code:

     { 
         self.waker = Some(cx.waker().clone()) 
     } 

it is now removed.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

But how is the waker updated then? You need to update the waker each time poll is called (or at least ensure it hasn't changed, which is also a concurrent operation). For instance here:

kanal/src/future.rs

Lines 101 to 111 in ac93742

FutureState::Waiting => {
let r = this.sig.poll(cx);
match r {
Poll::Ready(v) => {
*this.state = FutureState::Done;
if v == state::UNLOCKED {
return Poll::Ready(Ok(()));
}
Poll::Ready(Err(SendError::Closed))
}
Poll::Pending => Poll::Pending,

the waker is never updated.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

Could you please give me some examples of updating the waker when using the library?
I have an idea if we need to implement that too, without losing any performance. but I need some examples to see if we really need to implement that.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

The thing is that after being polled once and registering the first waker, the next poll may give you another waker and the first waker will no longer be actual: even if you call it (which is OK, in the sense it's not UB), it will not wake the task.

I have some tests for that (passing change_waker = true), but really there isn't much to it, it simply changes the waker:

https://github.com/asynchronics/tachyonix/blob/f6859d0fecf3e5561ad0b05f90a679922c1e34fe/src/event.rs#L832-L849

from kanal.

sbarral avatar sbarral commented on July 28, 2024

I have an idea if we need to implement that too, without losing any performance. but I need some examples to see if we really need to implement that.

I'd be interested, because it is really a hard problem. The problem is that even if you call Waker::will_wake to avoid storing a new waker, you are accessing the waker concurrently so this comes back to the initial issue.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

I'd be interested, because it is really a hard problem. The problem is that even if you call Waker::will_wake to avoid storing a new waker, you are accessing the waker concurrently so this comes back to the initial issue.

I think concurrent reading is not problematic, as we can read wakers concurrently at the same time.
So calling Waker::will_wake will be safe.
Is this statement correct?

from kanal.

sbarral avatar sbarral commented on July 28, 2024

So calling Waker::will_wake will be safe.
Is this statement correct?

Yes, absolutely. The problem is indeed when it actually needs to be changed, because you will need to synchronize the update.

Note however that even when using will_wake, issue #16 still applies: you must "tell" the compile that you have in fact only an &Waker and not an &mut Waker, otherwise the compiler may perform unsound optimizations.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

Could you please give me some examples of updating the waker when using the library?

Sorry, I realized you probably wanted to see how to implement and not how to test this. One slow solution is Atomic Waker and I wrote a faster one called Diatomic Waker. But I'm not sure either of them is the best for an MPMC. I think you really should have a look at the event-listener crate to have an idea of what is involved: the problem is really much harder than it looks at first glance.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

Please check the latest commit. I think it's fixed. The waker update is now handled correctly.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

I mean it feel a bit like a band aid and the spin loop is not something I would use personally, but you may be able to make it work.

I think there is still a race at the moment:

kanal/src/future.rs

Lines 75 to 77 in 73befe0

if let Some(first) = internal.next_recv() {
drop(internal);
unsafe { first.send(std::ptr::read(this.data.as_mut_ptr())) }

Since the waker is no longer under the mutex lock (dropped on line 76), calling Waker::wake() will probably race with other accesses. In this specific case it is enough to drop the lock guard after the call to wake, but you probably should review the code to make sure there are no other similar races.

This is the problem with races: no matter how much brainpower you use at chasing them, it is very hard to say when/if you have found them all. I will sadly not be able for this reason to make further reviews because I need to focus on my own bugs ;-)

So please feel free to close the issues as you feel appropriate, but before you do I would really encourage you to build an extensive battery of tests. Loom is from far my preference because it makes an exhaustive exploration of all interleavings, but it does require some discipline; for instance it would detect those races only if the Waker was in an UnsafeCell (which as discussed in #16 it probably should, on top of Aliasable). MIRI is of course great too and can even catch things that Loom won't like some specific weak orderings and many UBs, but it is not deterministic so will very often fail to detect concurrency bugs, which is why I'd use it mostly as a complement or for non-concurrent UB.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

Thanks, Serge. You helped a lot and caught an important possible race condition in the library.
It's a few nanoseconds of spin, when signal is not in the waiting list the data is already transferring from/to another thread, and in my tests, if the user is not using the poll API directly and changing the Waker, they will not pay the price of switching waker. so to me, it's not like a band-aid solution, it is optimizing for the common use case and paying the price for the uncommon one.

About:

kanal/src/future.rs

Lines 75 to 77 in 73befe0

if let Some(first) = internal.next_recv() {
drop(internal);
unsafe { first.send(std::ptr::read(this.data.as_mut_ptr())) }

It's not a race condition as only the sender has mut access to the destination memory location, when recv signal is pushed to the queue, the receiver will not be permitted to do any operation on the shared location until a signal arrives.

As the Waker is not going to be updated from anywhere else but the creator thread, I think it's not required to be in the UnsafeCell.

I'm extensively using Miri on every change to make sure there is no error, I'm going to try Loom too, thanks for suggesting that.
I'm going to open another issue about writing those tests and asking others for help with writing tests too.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

I am not sure if we speak about the same race. My point is that at this point the waker is updated, under the lock:

kanal/src/future.rs

Lines 233 to 236 in e415d69

let mut internal = acquire_internal(this.internal);
if internal.recv_signal_exists(this.sig.as_signal()) {
// signal is not shared with other thread yet so it's safe to update waker locally
this.sig.register(cx.waker());

To make sure that the sender does not call Waker::wake() at the same time when called here in first.send():

kanal/src/future.rs

Lines 75 to 77 in e415d69

if let Some(first) = internal.next_recv() {
drop(internal);
unsafe { first.send(std::ptr::read(this.data.as_mut_ptr())) }

you would need to acquire the lock as well in the sender (which you do, just not long enough). These are not the same thread, so I think you were thinking of another situation.

Regarding UnsafeCell: it's goal is precisely to tell the compiler that you know what you do and that this is not a concurrent access. Otherwise the compiler applies optimizations that can lead to UB. But again, this is not enough: you need in your case to wrap the waker in Aliasable<UnsafeCell<Waker>>, as discussed in #16.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

Aha, got what you mean.
There is no sender at that point, as no sender has access to that signal. internal.next_recv() removes the signal from the list, so if the sender gets ownership of the signal recv_signal_exists will return false, and else statement will be executed.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

Oh yes, I see, my bad.

from kanal.

fereidani avatar fereidani commented on July 28, 2024

No problem! Thank you for the time that you are putting to investigate Kanal for bugs.
Even reports like that help me to recheck ideas and make sure they are sound.

from kanal.

sbarral avatar sbarral commented on July 28, 2024

Thank you for the time that you are putting to investigate Kanal for bugs.

No problem. But I'm back to my bugs now ;-)

from kanal.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.