Here we can discuss ideas around the spawn API and the underlying implementation.
- 2020-07-23 initial post, first thoughts/ideas.
- 2020-10-04 spawn from anywhere
- 2020-10-05 spawn from anywhere is here, POC implementation available
- 2020-10-18 implemented and merged to master for testing in upcoming 0.6-alpha
Background
RTIC builds on the Stack Resource Policy (SRP), which provides many outstanding properties (deadlock-free scheduling, single-stack execution, bounded priority inversion, etc.). The model has no notion of communication within the system; spawn/schedule is implemented on top of the model, and RTIC is responsible for both its safety and its analysis. Spawned or scheduled tasks can be seen as emerging from the environment (and indeed they do under the hood, as pended interrupts). RTIC knows the sender/receiver relations through the app and task attributes, allowing it to generate RTIC resources for queues, timers, etc. under the hood.
Aside: multi-core support is not part of SRP; our approach essentially abstracts a multi-core system as a partition into multiple single-core systems with local environments (so a message passes from the sender's environment to the receiver's environment).
Message queues
Focusing in on message passing and safety. As mentioned in the Background, SRP does not really cover message passing/message queues directly. However, we may leverage the concept of resources for the implementation (locking of queues).
E.g., if we have two senders, a at priority 2 and b at priority 4, spawning a task c at priority 3, then we can see this as an MPSC queue. One approach is to adopt the RTIC resource abstraction and use locks on the sender side. One can also exploit the underlying HW for lock-free implementations (note that locks are not enough to deal with the multi-core case, as SRP-based locking offers only core-local protection).
Locking has the drawback of priority inversion: in the above, (other) tasks at priority 3 will be blocked by queue locks taken from task a (priority 2), as the send queue would have ceiling 4. Notice though that the blocking is non-transitive (a property of SRP, see below).
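As a rough sketch of the resource-based variant (assuming the rtic::Mutex lock API; ReadyQueue, its push method and the send helper are placeholders for illustration, not generated RTIC code), both senders would funnel their messages through a lock on the shared queue, which therefore gets ceiling 4:

use rtic::Mutex;

/// Placeholder for whatever queue type the shared resource wraps.
pub struct ReadyQueue { /* slots, indices, ... */ }

impl ReadyQueue {
    pub fn push(&mut self, _msg: u32) { /* enqueue elided */ }
}

/// Called by both senders (`a` at priority 2 and `b` at priority 4), so the
/// shared queue resource gets ceiling 4.
fn send(queue: &mut impl Mutex<T = ReadyQueue>, msg: u32) {
    queue.lock(|q| {
        // While task `a` holds this lock, tasks at priorities 3 and 4 cannot
        // preempt: the (bounded) priority inversion discussed below.
        q.push(msg);
    });
}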
Schedulability
Under SRP, we have bounded priority inversion. A task at priority i is blocked at most for the duration of the longest critical section among lower-priority tasks j < i accessing a resource with ceiling k >= i. It may sound like theoretical "mumbo jumbo", but it brings a valuable insight: if the critical sections for queue accesses are short enough, they will NEVER dominate the blocking (other resource accesses will dominate).
Let us assume that we treat all queues as having a ceiling equal to the maximum priority of the system; then we can simply use an interrupt-free critical section. Assuming this critical section is only a few machine instructions, the impact on schedulability will be small. (In comparison, only tasks at higher priority than the highest-priority sender in the system would actually benefit from resource-based locking.)
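A minimal sketch of that alternative (the PENDING slot and enqueue function are illustrative placeholders, not the actual RTIC queues): the queue update runs inside an interrupt-free section, so the blocking it can cause is bounded by a handful of machine instructions.

use core::cell::RefCell;
use cortex_m::interrupt::{self, Mutex};

// Illustrative single-slot "queue"; a fixed-size queue works the same way.
static PENDING: Mutex<RefCell<Option<u32>>> = Mutex::new(RefCell::new(None));

/// Enqueue a message under a global critical section. Interrupts are disabled
/// only for the few instructions needed to store the value, so this blocking
/// never dominates the blocking caused by ordinary resource locks.
fn enqueue(msg: u32) -> Result<(), u32> {
    interrupt::free(|cs| {
        let mut slot = PENDING.borrow(cs).borrow_mut();
        if slot.is_none() {
            *slot = Some(msg);
            Ok(())
        } else {
            Err(msg) // "queue" full, the spawn is rejected
        }
    })
}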
What about lock-free queues? Using lock-free data structures has both pros and cons. It (typically (1)) requires HW support for atomics, and the execution time is non-uniform. For Cortex-M3 and above, the execution time (number of possible retries) will depend on the number of preemptions, independent of actual contention. If the inter-arrival time between interrupts at each priority level is known, this can be taken into consideration for analysis. However, atomics are not HW-supported on the M0.
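As an illustration of the retry behaviour (a sketch only; CLAIMED, CAPACITY and claim_slot are placeholders, not part of RTIC), a lock-free slot claim is typically a compare-and-swap loop. The loop only repeats when another task preempted us between the load and the store, and on the M0 this would not compile at all, since the required atomic CAS support is missing:

use core::sync::atomic::{AtomicUsize, Ordering};

const CAPACITY: usize = 4;
static CLAIMED: AtomicUsize = AtomicUsize::new(0);

/// Claim one of CAPACITY slots without locking; `None` means no slot is left.
fn claim_slot() -> Option<usize> {
    let mut current = CLAIMED.load(Ordering::Relaxed);
    loop {
        if current >= CAPACITY {
            return None;
        }
        match CLAIMED.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return Some(current),
            // Another sender preempted us between load and store: retry.
            Err(now) => current = now,
        }
    }
}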
As mentioned above, for multi-core RTIC critical sections (locks) do not suffice; we need either HW atomics or some software synchronization.
(1) https://en.wikipedia.org/wiki/Dekker%27s_algorithm
Why bother?
As of now, we provide a spawn API where each spawnable task is a method on the spawn struct provided in the task context. It is unclear how (or even if) the user can install a callback to RTIC tasks when creating a driver using the current API. As the underlying SRP scheduling does not really care from where a task is requested for execution, this should be possible and would be a welcome extension to the programming model (maybe also facilitating the triggering mechanism for async dispatchers, but that is another story). The only thing we need to ascertain is that the queue management is safe.
Potential solutions:
- Make spawn of messages "free for all" (i.e., a direct API to the ready queue for the task). This would allow the spawns list to be removed from the task attribute, and allow callbacks to be freely used (e.g., as part of initializing a driver). This requires either lock-free queues or the adoption of global critical sections (the latter may not be terrible regarding schedulability, as discussed above).
- Change or add to the context a more general way to get a handle to the specific spawn, e.g. cx.spawn(task), where task implements some trait Spawnable (a rough sketch of such a trait follows this list). Here we have to make sure that the priority used in the locking mechanism is correctly handled (an under-the-hood detail). This requires the RTIC task that invokes the driver (typically on behalf of an interrupt) to somehow pass the current priority (part of the context), as opposed to the "free for all" approach discussed above.
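A rough sketch of what such a trait could look like (the trait name, its shape, and the explicit priority parameter are assumptions for illustration, not an existing RTIC API):

/// Hypothetical trait implemented by every software task accepted by `cx.spawn`.
pub trait Spawnable {
    /// Payload matching the task's argument list.
    type Payload;

    /// Enqueue the payload; the caller's current priority is passed in so the
    /// sender-side lock can be taken at the correct ceiling (an under-the-hood
    /// detail supplied by the context, not by the end user).
    fn spawn(priority: u8, payload: Self::Payload) -> Result<(), Self::Payload>;
}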
Spawn from anywhere, 2020-10-04
- Use a global critical section for the underlying queue (low implementation effort; a CS for just allocation/de-allocation should be enough)
- Optimization: for tasks without arguments we can implement the queue as a simple counter.
Syntax:
#[rtic::app(device = lm3s6965)]
mod app {
    // needed for hprintln!/debug::exit below
    use cortex_m_semihosting::{debug, hprintln};

    #[init(spawn = [foo, foo2])]
    fn init(_c: init::Context) {
        foo::spawn(1, 2).unwrap();
    }

    #[task()]
    fn foo(_c: foo::Context, x: i32, y: u32) {
        hprintln!("foo {}, {}", x, y).unwrap();
        if x == 2 {
            debug::exit(debug::EXIT_SUCCESS);
        }
        foo2::spawn(2).unwrap();
    }

    #[task()]
    fn foo2(_c: foo2::Context, x: i32) {
        hprintln!("foo2 {}", x).unwrap();
        foo::spawn(x, 0).unwrap();
    }

    // RTIC requires that unused interrupts are declared in an extern block when
    // using software tasks; these free interrupts will be used to dispatch the
    // software tasks.
    extern "C" {
        fn SSI0();
    }
}
Implementation/code gen:
Each priority level p holding software tasks will be associated with a dispatcher D(p).
A dispatcher D(p) holds an array [Q; n] of per-task queues Q(c, d[a; c]), where c is an optional capacity and d[a; c] is the static data storage for arguments a with size c; in case a = {} (no arguments), d[{}; c] is a simple counter.
The dispatcher also holds a FIFO [i; sum(c)] of tasks spawned (but not yet dispatched); this allows us to implement fairness and ordering between tasks at the same priority level. If the capacity of a single task is reached, the spawn is rejected and never inserted in the FIFO. When the dispatcher is run it will always pick the "oldest" task to run. Spawning a task amounts to pending the associated interrupt.
We need to generate code for:
- the dispatcher: dequeues the FIFO, i = FIFO.deq, and calls dispatch on Q[i], which dequeues the arguments and calls the corresponding software task; this is done in a while loop until the FIFO is empty.
- the spawn API: each software task amounts to a module, which contains a free function fn spawn(args) -> Result<(), (args)> that enqueues the arguments and, if the capacity is not reached, enqueues its index in the FIFO (args matching the task's payload signature).
A minimal self-contained sketch of both pieces is given below.
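The sketch below covers a single task foo(x: i32, y: u32) with capacity 2 at priority 1. All names are illustrative, the plain arrays with head/length counters stand in for the statically allocated (heapless) queues of the actual code generation, and pending the dispatch interrupt is elided.

use core::mem::MaybeUninit;

const FOO_CAP: usize = 2;

/// Tasks dispatched at this priority level; the slot index travels alongside.
#[derive(Clone, Copy)]
enum P1Task {
    Foo,
}

/// Storage for the dispatcher D(1).
struct Dispatcher1 {
    // d[a; c]: argument slots for in-flight `foo` messages.
    foo_inputs: [MaybeUninit<(i32, u32)>; FOO_CAP],
    // Free list: indices of unused `foo_inputs` slots.
    foo_free: [u8; FOO_CAP],
    foo_free_len: usize,
    // Ready FIFO [i; sum(c)]: (task, slot) pairs in spawn order.
    ready: [Option<(P1Task, u8)>; FOO_CAP],
    ready_head: usize,
    ready_len: usize,
}

impl Dispatcher1 {
    fn new() -> Self {
        Self {
            foo_inputs: [MaybeUninit::uninit(); FOO_CAP],
            foo_free: [0, 1],
            foo_free_len: FOO_CAP,
            ready: [None; FOO_CAP],
            ready_head: 0,
            ready_len: 0,
        }
    }

    /// What `foo::spawn(x, y)` boils down to: claim a slot, store the arguments,
    /// and record the task in the ready FIFO (pending the interrupt is elided).
    fn spawn_foo(&mut self, x: i32, y: u32) -> Result<(), (i32, u32)> {
        if self.foo_free_len == 0 {
            return Err((x, y)); // capacity reached, the spawn is rejected
        }
        self.foo_free_len -= 1;
        let index = self.foo_free[self.foo_free_len];
        self.foo_inputs[usize::from(index)] = MaybeUninit::new((x, y));
        let tail = (self.ready_head + self.ready_len) % FOO_CAP;
        self.ready[tail] = Some((P1Task::Foo, index));
        self.ready_len += 1;
        Ok(())
    }

    /// The dispatcher body: drain the ready FIFO oldest-first, release each slot
    /// back to the free list, and run the corresponding software task.
    fn dispatch(&mut self, foo: impl Fn(i32, u32)) {
        while self.ready_len > 0 {
            let (task, index) = self.ready[self.ready_head].take().unwrap();
            self.ready_head = (self.ready_head + 1) % FOO_CAP;
            self.ready_len -= 1;
            match task {
                P1Task::Foo => {
                    let (x, y) =
                        unsafe { self.foo_inputs[usize::from(index)].assume_init() };
                    self.foo_free[self.foo_free_len] = index;
                    self.foo_free_len += 1;
                    foo(x, y);
                }
            }
        }
    }
}

In the generated code shown further down, foo_FQ, foo_INPUTS and P1_RQ play the roles of foo_free, foo_inputs and ready, and the dispatch loop runs in the handler for the interrupt declared in the extern block (SSI0 in the example above).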
POC implementation, using mod instead of const:
https://github.com/rtic-rs/cortex-m-rtic/tree/spawn_experiment
For a software task (foo in the above), the following code is generated.
///Software task
pub mod foo {
    ...
    pub fn spawn(_0: i32, _1: u32) -> Result<(), (i32, u32)> {
        use rtic::Mutex as _;
        let input = (_0, _1);
        unsafe {
            if let Some(index) = crate::app::foo_FQ.dequeue() {
                crate::app::foo_INPUTS
                    .get_unchecked_mut(usize::from(index))
                    .as_mut_ptr()
                    .write(input);
                crate::app::P1_RQ.enqueue_unchecked((crate::app::P1_T::foo, index));
                rtic::pend(lm3s6965::Interrupt::SSI0);
                Ok(())
            } else {
                Err(input)
            }
        }
    }
}
The actual code generation is done in macros/src/codegen/module, since we generate the spawn function in the namespace of the software task.
The corresponding code:
items.push(quote!(
    #(#cfgs)*
    pub fn spawn(#(#args,)*) -> Result<(), #ty> {
        // #let_instant // do we need it?
        use rtic::Mutex as _;
        let input = #tupled;
        // TODO: use critical section, now we are unsafe
        unsafe {
            if let Some(index) = #app_path::#fq.dequeue() {
                #app_path::#inputs
                    .get_unchecked_mut(usize::from(index))
                    .as_mut_ptr()
                    .write(input);
                // #write_instant, do we need?
                #app_path::#rq.enqueue_unchecked((#app_path::#t::#name, index));
                #pend
                Ok(())
            } else {
                Err(input)
            }
        }
    }));
The current POC is likely unsound; we need critical sections or atomic/lock-free implementations of the underlying queues. One low-effort way to close this hole is sketched below.
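The sketch (an idea, not the final design) simply wraps the allocation and enqueue of the generated spawn in cortex_m::interrupt::free, matching the global critical section approach discussed earlier:

pub fn spawn(_0: i32, _1: u32) -> Result<(), (i32, u32)> {
    let input = (_0, _1);
    // The queue operations now run with interrupts disabled, so concurrent
    // spawns (from tasks, init, or driver callbacks) cannot race on the queues.
    cortex_m::interrupt::free(|_| unsafe {
        if let Some(index) = crate::app::foo_FQ.dequeue() {
            crate::app::foo_INPUTS
                .get_unchecked_mut(usize::from(index))
                .as_mut_ptr()
                .write(input);
            crate::app::P1_RQ.enqueue_unchecked((crate::app::P1_T::foo, index));
            rtic::pend(lm3s6965::Interrupt::SSI0);
            Ok(())
        } else {
            Err(input)
        }
    })
}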
Another caveat is that the RTIC channel analysis is a bit too smart: if a software task is neither spawned nor dispatched, RTIC will not generate the corresponding queues (and dispatcher). For spawn from anywhere, the implementation needs to change to always generate this code. (The analysis is done in the syntax crate.) If we move to the new spawn from anywhere, the syntax can be simplified, as well as the accompanying analysis and code generation.
Some remarks:
We need to decide where to put the queues. For now they all reside in the centralized namespace of the RTIC mod app, prefixed by the task name. However, one could think of using the enclosing modules instead. This is mostly a matter of taste: access to the queues outside the API remains unsafe (due to the static declarations), so it is not really a matter of safety, nor of end-user convenience or application performance (it only matters for the implementation behind the scenes).
The dispatcher is currently untouched, and the new spawn implementation can live concurrently with the old spawn. The implementation does not yet completely follow the sketched design.