lf-lang / reactor-rs
Reactor runtime implementation in Rust
License: MIT License
As discussed in lf-lang/lingua-franca#1309
When executing the guided search benchmark, it sometimes softlocks during an arbitrary iteration. This might be related to #2, but there it only locks up during startup.
Here is an execution log excerpt where it hangs up:
[2022-01-25T14:36:48Z INFO ] Worker 0: search path from [25, 26, 17] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[1]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 1: search path from [25, 27, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[2]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 2: search path from [26, 26, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[3]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 3: search path from [26, 26, 15] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[4]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 4: search path from [25, 28, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[5]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 5: search path from [26, 27, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[6]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 6: search path from [26, 28, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[7]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 7: search path from [26, 28, 12] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[8]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 8: search path from [27, 28, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[9]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 9: search path from [27, 28, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[10]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 10: search path from [28, 27, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[11]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 11: search path from [29, 26, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[12]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 12: search path from [28, 28, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[13]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 13: search path from [28, 28, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[14]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 14: search path from [29, 27, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[15]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 15: search path from [29, 26, 14] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[16]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 16: search path from [29, 27, 12] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[17]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 17: search path from [28, 28, 12] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[18]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 18: search path from [29, 27, 13] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[19]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 19: search path from [28, 26, 15] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 283): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 283): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3966544146 ns = 3966544 µs = 3966 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] - Level 10
[2022-01-25T14:36:48Z TRACE] - Executing /workers[0]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 0: search path from [23, 24, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[1]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 1: search path from [23, 25, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[2]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 2: search path from [25, 24, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[3]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 3: search path from [25, 25, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[4]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 4: search path from [24, 26, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[5]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 5: search path from [24, 25, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[6]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 6: search path from [23, 27, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[7]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 7: search path from [24, 27, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[8]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 8: search path from [25, 25, 20] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[9]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 9: search path from [25, 26, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[10]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 10: search path from [25, 24, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[11]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 11: search path from [26, 25, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[12]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 12: search path from [27, 26, 16] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[13]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 13: search path from [27, 26, 17] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[14]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 14: search path from [25, 27, 19] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[15]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 15: search path from [25, 27, 18] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] - Executing /workers[16]/1 (level 10)
[2022-01-25T14:36:48Z INFO ] Worker 16: search path from [26, 26, 18] to [24, 24, 24]
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 284): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 284): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967258312 ns = 3967258 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 285): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 285): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967310524 ns = 3967310 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 286): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 286): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967361034 ns = 3967361 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 287): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 287): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967411629 ns = 3967411 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 288): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 288): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967461870 ns = 3967461 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
[2022-01-25T14:36:48Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 289): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 289): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967512653 ns = 3967512 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
The following lines then repeat forever:
[2022-01-25T14:36:48Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 285): run [6: {/manager/2}]
[2022-01-25T14:36:48Z TRACE] - Running late by 3967310524 ns = 3967310 µs = 3967 ms
[2022-01-25T14:36:48Z TRACE] - Level 6
[2022-01-25T14:36:48Z TRACE] - Executing /manager/2 (level 6)
I've been working on implementing the MatMul benchmark and have come across a bug where an action isn't scheduled, even though it should be. The important reaction is here. When data is set on line 93, the connected reactions in Worker trigger, but the reaction to next never seems to trigger, even though it is scheduled at the end of the same reaction. This problem disappears when I remove the parallel-runtime feature.
Here is the execution trace:
[2021-12-17T11:35:38Z INFO ] Assembling runner
[2021-12-17T11:35:38Z INFO ] Assembling manager
[2021-12-17T11:35:38Z INFO ] Assembling workers
[2021-12-17T11:35:38Z INFO ] Registering workers
[2021-12-17T11:35:38Z INFO ] Registering manager
[2021-12-17T11:35:38Z INFO ] Registering runner
[2021-12-17T11:35:38Z INFO ] Triggering startup...
[2021-12-17T11:35:38Z TRACE] - Level 1
[2021-12-17T11:35:38Z TRACE] - Executing /runner/0
[2021-12-17T11:35:38Z TRACE] - Executing /0
Benchmark: MatMulBenchmark
Arguments:
numIterations = 12
dataLength = 1024
[2021-12-17T11:35:38Z TRACE] - Executing /manager/0
blockThreshold = 16384
priorities = 10
numWorkers = 20
System information:
O/S Name = Linux
[2021-12-17T11:35:38Z TRACE] Pushing at (T0 + 0 ns = 0 ms, 1): run [2: {/runner/1}]
[2021-12-17T11:35:38Z TRACE] Processing event at (T0 + 0 ns = 0 ms, 1): run [2: {/runner/1}]
[2021-12-17T11:35:38Z TRACE] - Running late by 136662519 ns = 136662 µs = 136 ms
[2021-12-17T11:35:38Z TRACE] - Level 2
[2021-12-17T11:35:38Z TRACE] - Executing /runner/1 (level 2)
[2021-12-17T11:35:38Z TRACE] - Level 5
[2021-12-17T11:35:38Z TRACE] - Executing /manager/1 (level 5)
[2021-12-17T11:35:38Z TRACE] - Level 8
[2021-12-17T11:35:38Z TRACE] - Executing /workers[15]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[8]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[3]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[4]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[17]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[2]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[9]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[16]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[0]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[19]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[5]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[13]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[12]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[11]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[14]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[6]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[18]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[10]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[1]/0
[2021-12-17T11:35:38Z TRACE] - Executing /workers[7]/0
[2021-12-17T11:35:38Z TRACE] Will wait for asynchronous event without timeout
[2021-12-17T11:35:38Z INFO ] Event queue is empty forever, shutting down.
[2021-12-17T11:35:38Z INFO ] Scheduler is shutting down, at (T0 + 137417726 ns = 137 ms, 0)
[2021-12-17T11:35:38Z INFO ] Scheduler has been shut down
For all the runtime implementations so far, we've used reactor-<file-extension>. Should we do the same for the Rust implementation?
lf-lang/lingua-franca#1228 contains a test called ReadOutputOfContainedBank for which the last reaction, which has two triggers, fires twice. It should only be invoked once, however, since triggers should be deduplicated.
Sparked by lf-lang/lingua-franca#1098, it would be useful to have a function to retrieve the current number of workers, something like ReactionCtx::num_workers.
See #20
The bounded buffer benchmark doesn't work if parallelised. It quits before any output is produced by the benchmark runner.
Blue here is Rust.
Curiously, I've been toying with the merge_plans_after function and wrote a few different variants of it. You can find them on the mpa-var* branches. I did some benchmarking and accidentally discovered that this bug does not exist for variations 3 and 5. It appears to have something to do with reaction plans potentially being discarded on merging.
There was a bug in the C++ runtime, and theoretically it can also happen in Rust (I wasn't able to reproduce it with an unmodified runtime; it depends on thread interleaving).
In C++ there is a global event queue and a global mutex protecting it. The fix is to put the time reading and the pushing of the event in the same critical section.
In Rust the event queue is split: there is a Sender for pushing events to the scheduler asynchronously, while the Receiver end maintains an unsorted buffer of events that is periodically flushed into the main queue by the scheduler thread. Events pushed through the Sender have already been assigned a tag. We can assume the Sender and Receiver communicate atomically.
We could reproduce the C++ solution by introducing a mutex to guard the receiver and sender. This would however defeat part of the purpose of using channels, which is that we don't need to block the async sender thread when sending something.
Another solution would be to let the scheduler thread assign tags to asynchronous events. There are several possible problems with this:
We could use the asynchronously assigned tag as long as it is greater than the latest processed tag. If it isn't, then we're in the problematic situation described above, and we can fall back to something else:
None of these look super appealing in the general case; maybe the behaviour should be selectable.
Clippy finds a lot of issues with the current state of the code. That should be addressed.
The Philosophers benchmark I ported from C++ has an issue where execution sometimes deadlocks and never finishes, but only some of the time.
If the first iteration of the benchmark succeeds, all of them do, which suggests this is an issue that occurs during initialisation.
Attached is an execution trace of such a deadlocked run.
philosophers_fail.log
While porting the FilterBank benchmark, I noticed that it uses the ports of a bank as dependency for a reaction. This is what that looks like: https://github.com/lf-lang/lingua-franca/blob/master/benchmark/Cpp/Savina/src/parallelism/FilterBank.lf#L299.
When trying to implement this in Rust, the generated code looks like this:
// --- reaction(startup) -> banks.setF, banks.setH {= ... =}
fn react_0(&mut self,
#[allow(unused)] ctx: &mut ::reactor_rt::ReactionCtx,
banks__setF: ::reactor_rt::WritablePort<Matrix<f64>>,
banks__setH: ::reactor_rt::WritablePort<Matrix<f64>>,)
{ ... }
If my understanding is correct, then banks__setF and banks__setH should be WritablePortBanks.
Currently the repository name is reactor-rs and the crate name is reactor-rt (rt for runtime). I think this is pretty confusing and we should consider renaming the crate. As @oowekyala pointed out in #5, naming the crate reactor-rs does not make much sense either, due to the redundant rs. Probably just reactor would be a more suitable crate name, in analogy to the reactor namespace in reactor-cpp.
The C++ runtime supports sparse multiports more efficiently since lf-lang/reactor-cpp#24. The set ports can be queried more efficiently because they're saved in a vector in the multiport. This apparently has a significant performance impact on benchmarks like Big.
The Rust runtime cannot support this directly yet: setting a port does not notify any of the downstream ports, and ports do not know which multiports contain them anyway. I imagine this will be difficult to implement because of the constraints on circular references in Rust. Maybe a solution would be to introduce new data structures, prepared during initialization, to track which port can influence which multiport.
Output from GitHub Actions:
info: checking for self-updates
warning: tool `rustfmt` is already installed, remove it from `/home/runner/.cargo/bin`, then run `rustup update` to have rustup manage this tool.
Warning: warning: tool `cargo-fmt` is already installed, remove it from `/home/runner/.cargo/bin`, then run `rustup update` to have rustup manage this tool.
This suggests that the rustfmt component listed on line 59 of .github/workflows/rust.yml might be redundant. Is it?
Currently LFC is bound to a specific git revision of the crate, not a semantic version number. Using semver would allow us to publish bugfixes to the runtime without needing to upgrade LFC. But we also would need to maintain compatibility with the code generator, and also commit to the stability of the ReactionCtx and related user-facing APIs. I don't think the crate is stable enough right now to allow this.
If we start doing that, then we need a defined release cycle, possibly using beta versions for all major versions that are not directly used by a released LFC version. Otherwise we will clutter Crates.io with deprecated, unsupported releases.
This function does a lot of cloning and attempts to optimize it, but I suspect the optimizations don't do anything.