
rfcs's Introduction

Real-Time Interrupt-driven Concurrency

Coordination repository of the development of the Real-Time Interrupt-driven Concurrency framework

This repository's issue tracker is used by the developers to coordinate efforts towards making RTIC a great choice for embedded real-time development.

Want to get started with real-time embedded development with Rust?

Check out the Rust-embedded book and the RTIC book

The developers

Hibernating

The following members have put themselves into the hibernation state, due to being absent or busy for an extended period of time. See ops/hibernating.md.

Contact

  • To get in contact with us, please open an issue here
  • Or come and talk to us in the RTIC Matrix Room

RFCs

When the team deems it necessary the RFC process may be used to make decisions or to design processes, user interfaces, APIs, etc.

Learn more about Rust's RFC process (which is the same as our own) here.

To create an RFC, simply:

  • fork this repository to your own personal account
  • copy 0000-template.md to text/0000-my-feature.md (where "my-feature" is descriptive; don't assign an RFC number yet)
  • fill in the details of your RFC in that file
  • open a pull request against this repository

rfcs's People

Contributors

afoht, bors[bot], japaric, korken89, perlindgren


rfcs's Issues

[discussion] changing the project name / acronym

Over the course of the two years I have been working on this project I have received several comments expressing displeasure over the acronym chosen for the project (RTFM). These comments have been along the lines of:

  • "The acronym of the project is terrible for searching it on the web because it coincides with the acronym for 'Read The Fucking Manual'."

  • "Some corporate environments may frown upon the acronym."

  • Some people have pointed out the disconnect (contradiction?) between the acronym and Rust's goal of being a welcoming, inclusive and empowering programming language / community.

Of course, I have also heard some people say that they find the acronym funny. However, those comments have been few in comparison to the first kind. As the project has been growing in popularity, the number of disapproving comments I receive has increased as well; I think the upcoming minor release is a good time to address this issue.

This thread is to discuss the possibility of changing the name of the project, or at the very least changing the acronym (e.g. to RTmass). I would greatly appreciate people sharing their thoughts on why they find the current acronym problematic and / or how the project would benefit from changing its name / acronym. Name and acronym suggestions are welcome as well.

A name / acronym change would entail renaming the crate for the upcoming minor release (v0.5.0) and using this new name for the GitHub organization proposed in RFC rtic-rs/rtic#203.


I would gladly welcome a name that better conveys that the framework is for building any kind of multitasking / event-driven embedded application. I have often heard phrases like "RTFM is for building real-time applications", implying that one would only use RTFM for coding real-time applications; this is clearly not true, as evidenced by the many non-real-time projects built with RTFM.

Finally, I must stress that renaming the project or changing its acronym would not change or invalidate its history. This project, regardless of its name, will continue to be related to the Real Time for the Masses language and all the research associated with it -- this connection is stated in the project's README and book and will remain there.

Some thoughts on safety.

This "issue" is intended to spawn a discussion on hardware abstractions, safe vs. unsafe, and possibly other methods to allow for improving reliability and robustness of embedded applications.

Some initial reflections on the Rust embedded ecosystem, safe vs. unsafe etc.

  • cortex-m
    Writing the BASEPRI register is marked unsafe. The register access is atomic, so why unsafe? Well, one explanation is that it can be used (as in RTFM) to implement resource protection, and a write could violate the memory safety guaranteed by the protection mechanism.

Writing the PSP register is marked unsafe. The register access is atomic. So why unsafe? Well, one explanation is that a write can indirectly cause erroneous stack accesses.

The list goes on...

Take-away: we currently use unsafe to protect from indirect effects on code execution OUTSIDE of the Rust memory model. Rust has no notion of resource protection by itself.

  • svd2rust/volatile-register
    A register write is atomic, yet it is marked unsafe. Why? Notice that access through generated fields is marked safe. Is there any difference regarding the Rust aliasing rules?

As seen above, the use of unsafe has no direct bearing on the Rust aliasing rules. So why are we using unsafe then? Well, to prevent unintended use. (And one easy way to accomplish that is through an unsafe barrier, but perhaps not the best/only way.)

Alternatives:

  1. Propose a primitive extension to Rust, like root { ... } to indicate that root access is needed, and have user code marked #![forbid(root_code)], possibly extending https://internals.rust-lang.org/t/disabling-unsafe-by-default/7988.

  2. Use the Rust type system. (Similar to type state, with ownership of root access.)

  3. Use scoping, to hide dangerous APIs, allowing only "root" crates to access "root" functions. (Not sure how this can be done without 1, but perhaps ...)

  4. Deploy post processing, analysing generated code, and rejecting illegal/dangerous accesses.

  5. Something else....
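Alternative 2 could be sketched along the following lines: a non-Copy, zero-sized token represents ownership of "root" access, so the privileged operation itself is safe to call while forging the token remains unsafe. All names here (`Root`, `write_basepri`, `conjure`) are hypothetical, purely for illustration:

```rust
/// Hypothetical zero-sized capability token: holding a `Root` proves
/// that the trusted runtime granted "root" access. It is neither
/// `Copy` nor `Clone`, so user code cannot duplicate or forge it.
pub struct Root {
    _private: (), // prevents construction outside this module
}

impl Root {
    /// Called once by trusted init code (e.g. generated by RTFM).
    /// `unsafe` because conjuring a second token would defeat the scheme.
    pub unsafe fn conjure() -> Self {
        Root { _private: () }
    }

    /// A privileged operation that is safe to *call*: the token itself
    /// is the proof of authority. On a real target this would write
    /// BASEPRI; here it just returns the value so the sketch runs on a host.
    pub fn write_basepri(&mut self, value: u8) -> u8 {
        value
    }
}

fn main() {
    // The trusted entry point conjures the single token...
    let mut root = unsafe { Root::conjure() };
    // ...user code then uses it without any `unsafe`.
    assert_eq!(root.write_basepri(0xa0), 0xa0);
    // The token costs nothing at runtime:
    assert_eq!(core::mem::size_of::<Root>(), 0);
}
```

The point of the sketch is that unsafe then marks only the *granting* of authority, not its everyday use, which is closer to the "root access" idea than blanket unsafe on every register write.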

With the goal of giving guarantees of safe and reliable operation (even without additional HW support and/or costly SYSCALL APIs), we need to figure out how the Rust embedded ecosystem should be designed. I believe we have a unique opportunity to offer a correct-by-design approach, reaching far beyond what embedded Rust currently offers.

Some Challenges:

  • Mode changes, wake-up from deep sleep, hard fault recovery etc.
  • Liveness and robustness, (user injected NMIs, System Reset etc.)
  • ...

Addressing these ambitious goals, while still leveraging the ongoing efforts, HAL development etc., adds another dimension to the problem. We don't want to end up in a MISRA-like situation, where only a crippled subset of Rust (and Rust developments) can be used for developing robust, reliable, safe and sound applications.

One approach may be to take a step back, and put embedded Rust in the scope of the embedded system as a whole. This shifts the paradigm from seeing the obligations of embedded Rust as merely comprising the Rust code per se, to the view that the Rust application co-exists with other processes and communicates with these through hardware I/O, special-purpose processor registers etc. What are the semantics of embedded Rust in that context, and how can we model such external dependencies? (E.g., a register read that clears a bit in another memory location, like a peripheral. Can we model bit-banding, etc.?)

What other aspects should we consider?

SeqCst around critical sections.

The low level lock operation resides in src/export.rs and implements mutual exclusion by manipulating either the global interrupt enable (interrupt::free) or masking interrupts through the BASEPRI register (on armv7m).

In order to manipulate the underlying HW, we need to perform a few ARM assembly instructions. Here is the crux: currently, inline asm is not stabilized (see rust-lang/rfcs#2873 for ongoing work).

In the meantime (to get it working with current stable toolchain) we use a "hack", where the actual assembly functions are linked with the application.

Although not stated explicitly (at least not in https://doc.rust-lang.org/nomicon/ffi.html), it makes sense to assume that SeqCst ordering is ensured for FFI calls (as they can, as in our case, have side effects).

As a proof of concept the ordering branch implements compiler_fence(Ordering::SeqCst) around critical sections. This implies a "compiler fence" around the closure to be executed as a critical section for the locked resource.

#[cfg(armv7m)]
#[inline(always)]
pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    nvic_prio_bits: u8,
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get();

    if current < ceiling {
        if ceiling == (1 << nvic_prio_bits) {
            priority.set(u8::max_value());
            let r = interrupt::free(|_| {
                compiler_fence(Ordering::SeqCst);
                let r = f(&mut *ptr);
                compiler_fence(Ordering::SeqCst);
                r
            });
            priority.set(current);
            r
        } else {
            priority.set(ceiling);
            basepri::write(logical2hw(ceiling, nvic_prio_bits));
            compiler_fence(Ordering::SeqCst);
            let r = f(&mut *ptr);
            compiler_fence(Ordering::SeqCst);
            basepri::write(logical2hw(current, nvic_prio_bits));
            priority.set(current);
            r
        }
    } else {
        f(&mut *ptr)
    }
}

and

#[cfg(not(armv7m))]
#[inline(always)]
pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    _nvic_prio_bits: u8,
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get();

    if current < ceiling {
        priority.set(u8::max_value());
        let r = interrupt::free(|_| {
            compiler_fence(Ordering::SeqCst);
            let r = f(&mut *ptr);
            compiler_fence(Ordering::SeqCst);
            r
        });
        priority.set(current);
        r
    } else {
        f(&mut *ptr)
    }
}

Initial measurements verify no added overhead due to the explicit fences.

Is it needed? Likely not.
Does it hurt? Likely not.

What about aggressive LTO optimization? Well, as long as semantic information is preserved throughout compilation and link-time optimization, nothing bad should happen. What this experiment demonstrates is that it is trivial to put a safeguard around our critical sections. With the assumed FFI semantics enforcing SeqCst, it should not have any adverse effect on possible optimizations (as the SeqCst is already there).

Can we do better in the future?

Well, maybe. What we need to enforce is that the data structure passed to the closure is accessed only from within the critical section (enforced by means of the HW manipulation). So a more fine-grained consistency model might be possible to adopt. The C++/LLVM model focuses on ordering of atomics. In our case a resource can be a non-atomic data structure, so I'm not sure the atomic model is useful here.

Discussion: What exactly is a critical section?

In the original Real-Time For the Masses Model of Computation (RTFM-MoC), a resource is merely a named critical section (not related to any specific data structure - rather a way to control concurrency). In this context, locking a resource just implies that you can perform a sequence of operations guaranteed not to be preempted by any equally named critical section. In effect, the SeqCst gives us exactly that (even if we currently use it mostly for avoiding race conditions). What are the potential uses of named critical sections then? Well, if you want to ensure that a specific sequence of side effects is obtained, e.g., first enabling/powering some HW, then writing some registers in a predefined order. How is that accomplished in RTIC? Well, using the PAC abstraction, Register Blocks are (typically) treated as RTIC Resources. So in order to access the HW (through the Register Block) we can either have exclusive access, or gain access to the block by locking the wrapping resource. The actual access is then done through an &mut to the Register Block. It is an abstraction that works, but can in cases feel too restrictive or too permissive. We might want to take a step back to the idea of named critical sections, and see if that would allow for better abstractions (take a shared bus as an example). Is exclusive access to atomic registers really the right abstraction (what kind of guarantees does it really give)? See #29 for some further thoughts.

Road Map

Place Holder for further discussions.

The idea is that this issue would provide a list of "issues" and "rfcs" for discussing the overall direction of RTFM.

Feature requests

  • API to cancel / re-schedule tasks. Asked in #8; RFC in #7
  • RTFM & TrustZone / statically controlling secure / non-secure subroutines / memory. Requested in #5
  • Multiple timer queues. See #10 & #23
  • read-write locks. Originally proposed in #14 and then briefly revisited in rtic-rs/rtic#212
  • Generators. RFC in #22. See also #18
  • Statistics collection. See #21
  • Extended timing semantics. See #23
  • Plugin architecture #25
  • RTFM quickstart template. See #26
  • (..)

Memory Lane - Zero Cost Logging with deferred formatting

This RFC is discussing experiments on Zero Cost Logging for RTFM based applications.

v0.1.0, 2020-03-21
v0.1.1, 2020-03-21 (added more considerations)
v0.1.2, 2020-03-24 (altered syntax, to be more Rust like)
v0.2.0, 2020-03-25 (alternative approach leveraging messages queues, and relation to run-time verification)
v0.2.1 2020-03-31 (alternative approach considerations, improved clarity)

Motivation

Logging (aka printf debugging) is an instrumental tool (no pun intended) for the developer. However, traditional target-side logging implies costly formatting and transmission delays that might be intrusive to the behaviour of the application.

  • Deferred formatting.
    By deferring formatting to the host side the overhead may be significantly reduced. This however implies that the host needs to be able to reconstruct the type of data produced by the target.

  • Transport.
    By using in-memory buffers polled by the host, instead of the target using the CPU to actively transmit data, we may reduce the negative impact of logging/tracing on the behaviour of the application.

  • Soundness.
    In a preemptive system like RTFM, logging operations may be interrupted by higher priority tasks (that may also perform logging). Multiple logging streams may be adopted to allow for lock- and race-free access.


Overall Design

In order to address the above challenges, different approaches may be adopted. Here, an approach based on typed "memory lanes" is proposed that exploits the declarative and static nature of RTFM.

  • Deferred formatting.

The task declaration is extended with a set of typed logging channels.

#[task(log = enum Task1 {PkgLen([u8;1]), Header([[u8;4];1], NO_TIME_STAMP), Time([Instant;2])})]
fn task1(cx: task1::Context) {
  log.info(Task1::Time(cx.scheduled)).unwrap();
  ...
  log.trace(Task1::PkgLen(pkg.len())).unwrap();
  ...
  log.info(Task1::Header(pkg.header)).unwrap();
  ...
  log.info(Task1::Time(Instant::now())).unwrap(); // should be very close to the time stamp done by the logger.
}

RTFM will generate static SPSC queues for each enum variant, where the producer is the application and the consumer is the host.

The implementation of log.trace/info etc. enqueues the raw data (T: Copy).

On the host side the corresponding data is dequeued (and the consumer pointer updated). Here one might opt out of a time stamp (though time stamping should likely be the default).

An Err(_) result indicates that the queue is saturated (to this end, we need to increase the size of the corresponding buffer).

The RTFM app proc macro generates the host side logger (valid for the particular compilation instance), with shared knowledge of the set of lanes and their enum types (including buffer sizes). The generated host level logger also holds information on the task instance.

A possible rendering of the above:

[0x0000_0123] Info  : task1: Time(0x0000_0100) // the scheduled time
[0x0000_0140] Trace : task1: PkgLen(32)
[0x0000_0160] Info  : task2: ... some logging from a higher prio task
// here we can think of the logger to split incoming data per task to different output streams (similar to the `itm` tool).
...
[-----------] Info  : task1: Header([1,2,3,4])
[0x0000_0223] Info  : task1: Time(0x0000_0210) // Instant.now()

If wall clock time translation of Instant is available, we could even provide time in seconds, etc. but this is another topic... (Notice, we do not want the host to time stamp the logging data since that will be fairly inaccurate in comparison.)

  • Transport.

The generated host side logger will poll the queues by asynchronous read-memory operations. To this end probe.rs may be used as a starting point. The actual locations of the generated SPSC queues are found in the elf (after linking). Tools that may come in handy are:

  • xmas-elf/gimli, for retrieving elf information.

  • probe.rs, for asynchronous memory operations.

  • Soundness.
    Since we use SPSC queues, races on both sides should be possible to avoid (by careful implementation).
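To make the per-lane queue discipline concrete (producer advances only the write index, consumer advances only the read index, Err on saturation), here is a minimal single-threaded model of one lane. The generated code would use real static SPSC queues with atomic indices; the names and the capacity here are made up:

```rust
/// Minimal model of one "memory lane": a fixed-capacity SPSC queue for
/// one log variant's payload. In the real design the producer (target)
/// only advances `write` and the consumer (host) only advances `read`.
struct Lane<T: Copy, const N: usize> {
    buf: [Option<T>; N],
    write: usize, // monotonic; slot is write % N
    read: usize,  // monotonic; slot is read % N
}

impl<T: Copy, const N: usize> Lane<T, N> {
    fn new() -> Self {
        Lane { buf: [None; N], write: 0, read: 0 }
    }

    /// Target side: enqueue raw data; `Err(val)` signals saturation
    /// (the buffer for this variant needs to be enlarged).
    fn log(&mut self, val: T) -> Result<(), T> {
        if self.write - self.read == N {
            return Err(val);
        }
        self.buf[self.write % N] = Some(val);
        self.write += 1;
        Ok(())
    }

    /// Host side: dequeue the next raw item, if any.
    fn poll(&mut self) -> Option<T> {
        if self.read == self.write {
            return None;
        }
        let val = self.buf[self.read % N].take();
        self.read += 1;
        val
    }
}

fn main() {
    let mut lane: Lane<u32, 2> = Lane::new();
    assert!(lane.log(1).is_ok());
    assert!(lane.log(2).is_ok());
    assert!(lane.log(3).is_err()); // saturated: this buffer is too small
    assert_eq!(lane.poll(), Some(1)); // host drains one item
    assert!(lane.log(3).is_ok()); // now there is room again
}
```

Since each side mutates only its own index, the real two-party version needs only release/acquire publication of the indices, which is what makes the lock- and race-free claim plausible.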


Considerations

This is just a sketch. Some initial experiments and work by @japaric (on deferred formatting and in-memory logging) and @korken89 (on elf parsing) have explored similar approaches. None, however, has taken the outset of RTFM and automatic code generation.

The hereby proposed approach brings some unique opportunities by exploiting type information present in the system under test. It also allows fine grained control over buffer sizes.

However, the approach comes with the complexity of code generation for the logger, and parsing of the elf to deduce the memory locations. To this end a wrapping build script could be a possible solution, generating a "runner" for the compilation instance.

The memory lane based approach offers logging/tracing without the need for additional communication channels for the transport, e.g. ITM/serial etc. Thus, it simplifies setup. Moreover, it would enable tracing on targets without ITM support, without allocating additional serial peripherals. Depending on implementation, it could complement traditional tracing (with target-side formatting, such as ITM tracing, potentially using the log facade).

In theory it should play well with other debug features such as breakpoints, over the same host interface (probe.rs already provides rudimentary gdb stubs, and initial vscode integration).


Zero Cost?

The question to ask here is: "can we do better?".

  • Logging cost:
    The cost of logging boils down to copying the data. It would be very hard to prove soundness (race freeness) without copying. (Perhaps possible by using a guard on the data, but the copy approach seems the right way here.) Since we use an SPSC queue, we already have info on the availability of new data; we need no additional signalling. Structured data such as structs, enums, arrays etc. are copied verbatim to memory (and this is efficiently implemented by the Rust compiler). As time stamping can be opted out of, we only need to take the cost when we want it.

In the extreme, a typed log message would require no copying on the sender side, just an increment by 1 of the producer pointer of the SPSC (a plain enum variant like ThisHasNoData would still be implemented as a byte-sized data item, as we are using the producer pointer as a means to indicate an "event"). This should boil down to a few CPU cycles!

  • Transport cost:
    The only foreseeable implication is bus contention (between the host and the target accessing the same memory bus in the target). This overhead might be mitigated for systems with several memory banks on different busses (by allocating the SPSC queues in a dedicated bank), an esoteric optimization not likely to be needed unless under extreme stress (e.g., DMA activity).

Implementation complexity.

Well, there is no free lunch! This will take some effort to iron out. To mention a few remaining challenges: the code generation is likely quite tricky; the build system must ensure consistency between source code and the generated logger; and the transport and queue soundness need careful implementation.

Limitations and correctness.

The approach builds on Copy types, and makes the assumption that we can re-construct the structural representation on the host side directly by transmuting the data (retrieved from the target). This makes some assumptions on byte ordering and layout. Since we do not depend on pointers, and other Rust types have explicit sizes, there is a good chance it holds (under the assumption that we use repr(C)). Maybe there will be dragons? Let's find out!
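The core assumption can be demonstrated on a single machine: a repr(C) Copy payload is viewed as raw bytes (as the probe would read them out of target memory) and then reconstructed from those bytes. The Header type here is invented for illustration; across differing targets and hosts, endianness would also need to match:

```rust
use core::mem::size_of;

/// Illustrative repr(C) log payload (not a type from this RFC).
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct Header {
    len: u16,
    kind: u8,
    flags: u8,
}

fn main() {
    let original = Header { len: 32, kind: 1, flags: 0b10 };

    // "Target" side: the raw bytes as the host probe would read them.
    let bytes: [u8; size_of::<Header>()] =
        unsafe { core::mem::transmute(original) };

    // "Host" side: rebuild the value from raw bytes. Sound only if both
    // sides agree on layout (hence repr(C)) and byte order; the read is
    // unaligned because a byte buffer carries no alignment guarantee.
    let rebuilt: Header =
        unsafe { core::ptr::read_unaligned(bytes.as_ptr() as *const Header) };

    assert_eq!(original, rebuilt);
}
```

Note that any padding bytes would round-trip as garbage-but-stable values here; a real implementation would want padding-free layouts (or to mask padding) before comparing or displaying data.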

rtic-sync: DynSender and DynReceiver

I'd like to have dynamically dispatched versions of Sender and Receiver with type erased queue size.

struct Sender<'a, T, const N: usize>(&'a Channel<T, N>);

// no queue size in the type!
struct DynSender<'a, T>(&'a dyn ChannelImpl<T>);

A similar feature is already implemented in embassy_sync.

Motivation

I'm trying to implement an actor model around RTIC. Each actor is supposed to have an address that can be used to send messages to it.

struct Addr<A: Actor, const N: usize> {
    sender: Sender<'static, A::Msg, N>
}

Having the channel's size embedded in the Addr type is inconvenient and makes some patterns impossible.

Implementation

trait ChannelImpl<T> {
    fn send_footer(&mut self, idx: u8, val: T);
    fn try_send(&mut self, val: T) -> Result<(), TrySendError<T>>;
    // oh no! #![feature(async_fn_in_trait)] required, more on that later...
    async fn send(&mut self, val: T) -> Result<(), NoReceiver<T>>;
    fn try_recv(&mut self) -> Result<T, ReceiveError>;
    async fn recv(&mut self) -> Result<T, ReceiveError>;
    fn is_closed(&self) -> bool;
    fn is_full(&self) -> bool;
    fn is_empty(&self) -> bool;
    fn handle_sender_clone(&mut self);
    fn handle_receiver_clone(&mut self);
    fn handle_sender_drop(&mut self);
    fn handle_receiver_drop(&mut self);
}

impl<T, const N: usize> ChannelImpl<T> for Channel<T, N> {
     // -- snip! --
}

struct Sender<'a, T, const N: usize>(&'a Channel<T, N>);

impl<'a, T, const N: usize> Sender<'a, T, N> {
        fn into_dyn(self) -> DynSender<'a, T> {
            let sender = DynSender(self.0);
            core::mem::forget(self);
            sender
        }

        #[inline(always)]
        fn try_send(&mut self, val: T) -> Result<(), TrySendError<T>> {
             // forward
             self.0.try_send(val)
        }
        
        // forward implementation for the other methods
        // -- snip! --
}

struct DynSender<'a, T>(&'a dyn ChannelImpl<T>);

impl<'a, T> DynSender<'a, T> {
        #[inline(always)]
        fn try_send(&mut self, val: T) -> Result<(), TrySendError<T>> {
             // forward
             self.0.try_send(val)
        }
        
        // forward implementation for the other methods
        // -- snip! --
}

Of course, we can't have async functions in traits, but we could work around that by manually implementing SendFuture and RecvFuture types and returning them from regular functions.

fn send(&mut self, val: T) -> SendFuture;
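A minimal sketch of that workaround: an object-safe trait whose send returns a hand-written named future instead of using async fn. The trait, the trivially-ready future, and the noop waker below are all illustrative, not the actual rtic-sync API:

```rust
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Illustrative object-safe trait: no `async fn`; the method returns a
/// named future type instead.
trait ChannelLike<T> {
    fn send(&mut self, val: T) -> SendFuture;
}

/// Hand-written replacement for `async fn send`. A real implementation
/// would borrow the channel and register a waker; this one resolves
/// immediately, just to show the shape of the pattern.
struct SendFuture {
    ready: bool,
}

impl Future for SendFuture {
    type Output = Result<(), ()>;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.ready { Poll::Ready(Ok(())) } else { Poll::Pending }
    }
}

/// Stand-in channel that always has capacity.
struct DummyChannel;

impl ChannelLike<u32> for DummyChannel {
    fn send(&mut self, _val: u32) -> SendFuture {
        SendFuture { ready: true }
    }
}

/// Waker that does nothing, so we can poll the future by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(core::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(core::ptr::null(), &VTABLE)) }
}

fn main() {
    let mut ch = DummyChannel;
    let mut fut = ch.send(42);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // `SendFuture` is `Unpin`, so pinning on the stack is trivial.
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(Ok(())));
}
```

With this shape the trait stays object-safe, so a DynSender could hold a &dyn trait object; the price is naming the future type and threading the channel borrow through its lifetime.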

I'd be happy to write a PR if I get a green light, I may want to ask for some help implementing the futures as I've never done this before.

Lock optimization RFC Do Not Merge

Hi Folks.

Guess what, the world's fastest scheduler just got a tiny bit faster :)
This approach should merge nicely with Goodbye Exclusive #17 and mut Resources #14.

Edit: Typos and clarification, added disassembly and updated Notes.md.

Notes for lock optimization

Idea

The current implementation always reads and writes BASEPRI on entry/exit of an interrupt (this is done by cortex-m-rtfm/src/export::run, which is a trampoline to execute the actual task).

Using this approach, we read BASEPRI if and only if we are actually changing BASEPRI.

On restoring BASEPRI (in lock) we choose to restore the original BASEPRI value if we are at the outermost nesting level (the initial priority of the task). In this way, we can avoid unnecessary BASEPRI accesses and reduce register pressure.

If you want to play around, check out the lockopt branch and use:

> arm-none-eabi-objdump target/thumbv7m-none-eabi/release/examples/lockopt -d > lockopt.asm

We extend cortex-m-rtfm/src/export::Priority with additional fields storing init_logic (the initial priority of the task) and old_basepri_hw. The latter field is initially None on creation.

// Newtype over `Cell` that forbids mutation through a shared reference
pub struct Priority {
    init_logic: u8,
    current_logic: Cell<u8>,
    #[cfg(armv7m)]
    old_basepri_hw: Cell<Option<u8>>,
}

impl Priority {
    #[inline(always)]
    pub unsafe fn new(value: u8) -> Self {
        Priority {
            init_logic: value,
            current_logic: Cell::new(value),
            old_basepri_hw: Cell::new(None),
        }
    }

    #[inline(always)]
    fn set_logic(&self, value: u8) {
        self.current_logic.set(value)
    }

    #[inline(always)]
    fn get_logic(&self) -> u8 {
        self.current_logic.get()
    }

    #[inline(always)]
    fn get_init_logic(&self) -> u8 {
        self.init_logic
    }

    #[cfg(armv7m)]
    #[inline(always)]
    fn get_old_basepri_hw(&self) -> Option<u8> {
        self.old_basepri_hw.get()
    }

    #[cfg(armv7m)]
    #[inline(always)]
    fn set_old_basepri_hw(&self, value: u8) {
        self.old_basepri_hw.set(Some(value));
    }
}

The corresponding lock is implemented as follows:

#[cfg(armv7m)]
#[inline(always)]
pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    nvic_prio_bits: u8,
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get_logic();

    if current < ceiling {
        if ceiling == (1 << nvic_prio_bits) {
            priority.set_logic(u8::max_value());
            let r = interrupt::free(|_| f(&mut *ptr));
            priority.set_logic(current);
            r
        } else {
            match priority.get_old_basepri_hw() {
                None => priority.set_old_basepri_hw(basepri::read()),
                _ => (),
            };
            priority.set_logic(ceiling);
            basepri::write(logical2hw(ceiling, nvic_prio_bits));
            let r = f(&mut *ptr);
            if current == priority.get_init_logic() {
                basepri::write(priority.get_old_basepri_hw().unwrap());
            } else {
                basepri::write(logical2hw(priority.get_logic(), nvic_prio_bits));
            }
            priority.set_logic(current);
            r
        }
    } else {
        f(&mut *ptr)
    }
}

The highest priority is achieved through interrupt::free and does not affect BASEPRI at all. Thus it manipulates only the "logic" priority (used to optimize out locks).

For the normal case, on entry we check if the BASEPRI register has been read; if not, we read it and update old_basepri_hw. On exit we check whether we should restore a logical priority (inside a nested lock) or restore BASEPRI (previously stored in old_basepri_hw).

Safety

We can safely unwrap get_old_basepri_hw: Option<u8>, as the path leading up to the unwrap either performs an update to Some or the field was already Some. Updating old_basepri_hw is monotonic; the API offers no way of making old_basepri_hw None again (besides new).

Moreover new is the only public function of Priority, thus we are exposing nothing dangerous to the user. (Externally changing old_basepri_hw could lead to memory unsafety, as an incorrect BASEPRI value may allow starting a task that should have been blocked, and once started access to resources with the same ceiling (or lower) is directly granted under SRP).

Implementation

The implementation mainly concerns two files: rtfm/src/export.rs (discussed above) and macros/src/codegen/hardware_tasks.rs. For the latter, the task dispatcher is updated as follows:

        ...
        const_app.push(quote!(
            #[allow(non_snake_case)]
            #[no_mangle]
            #section
            #cfg_core
            unsafe fn #symbol() {
                const PRIORITY: u8 = #priority;
                #let_instant
                crate::#name(
                    #locals_new
                    #name::Context::new(&rtfm::export::Priority::new(PRIORITY) #instant)
                    );
            }
        ));
        ...

Basically we create Priority (on the stack) and use that to create a Context. The beauty is that LLVM completely optimizes out the data structure (and related code) while taking into account its implications for control flow. Thus, the locks AND the initial reading of BASEPRI are optimized at compile time, at zero cost.

Overall, using this approach, we don't need a trampoline (run). We reduce the overhead by at least two machine instructions (the additional reading/writing of BASEPRI) for each interrupt. It also reduces register pressure (as less information needs to be stored).

Evaluation

The examples/lockopt.rs shows that locks are effectively optimized out.

Old Implementation

00000130 <GPIOB>:
 130:	21a0      	movs	r1, #160	; 0xa0
 132:	f3ef 8011 	mrs	r0, BASEPRI
 136:	f381 8811 	msr	BASEPRI, r1
 13a:	f240 0100 	movw	r1, #0
 13e:	f2c2 0100 	movt	r1, #8192	; 0x2000
 142:	680a      	ldr	r2, [r1, #0]
 144:	3201      	adds	r2, #1
 146:	600a      	str	r2, [r1, #0]
 148:	21c0      	movs	r1, #192	; 0xc0
 14a:	f381 8811 	msr	BASEPRI, r1
 14e:	f380 8811 	msr	BASEPRI, r0
 152:	4770      	bx	lr

00000154 <GPIOC>:
 154:	f240 0100 	movw	r1, #0
 158:	f3ef 8011 	mrs	r0, BASEPRI
 15c:	f2c2 0100 	movt	r1, #8192	; 0x2000
 160:	680a      	ldr	r2, [r1, #0]
 162:	3202      	adds	r2, #2
 164:	600a      	str	r2, [r1, #0]
 166:	f380 8811 	msr	BASEPRI, r0
 16a:	4770      	bx	lr

With lock opt, we see a 20% improvement for short/small tasks.

00000128 <GPIOB>:
 128:	21a0      	movs	r1, #160	; 0xa0
 12a:	f3ef 8011 	mrs	r0, BASEPRI
 12e:	f381 8811 	msr	BASEPRI, r1
 132:	f240 0100 	movw	r1, #0
 136:	f2c2 0100 	movt	r1, #8192	; 0x2000
 13a:	680a      	ldr	r2, [r1, #0]
 13c:	3201      	adds	r2, #1
 13e:	600a      	str	r2, [r1, #0]
 140:	f380 8811 	msr	BASEPRI, r0
 144:	4770      	bx	lr

00000146 <GPIOC>:
 146:	f240 0000 	movw	r0, #0
 14a:	f2c2 0000 	movt	r0, #8192	; 0x2000
 14e:	6801      	ldr	r1, [r0, #0]
 150:	3102      	adds	r1, #2
 152:	6001      	str	r1, [r0, #0]
 154:	4770      	bx	lr

GPIOB/C are sharing a resource (C higher prio). Notice, for GPIOC there is no BASEPRI manipulation at all.

For GPIOB, there is a single read of BASEPRI (stored in old_basepri_hw) and just two writes, one for entering critical section, one for exiting. On exit we detect that we are indeed at the initial priority for the task, thus we restore the old_basepri_hw instead of a logic priority.

Limitations and Drawbacks

None spotted so far.

Observations

> llvm-objdump target/thumbv7m-none-eabi/release/examples/lockopt -d > lockopt.asm

> cargo objdump --example lockopt --release -- -d > lockopt.asm

Neither gives an assembly dump with symbols (it is very annoying to have to rely on arm-none-eabi-objdump for proper objdumps); maybe an option is just missing?

Binding panic handlers

Apologies if this is the wrong repository for the issue.

It would be really useful to be able to bind a task to the panic handler, allowing shared resources to be used during a panic. Maybe this is really hard to make sound (I'm not sure), but it would make it much easier to keep systems safe and emit diagnostics at panic. I appreciate that a lot of people use the various panic handler crates (panic-halt, panic-semihosting, etc.), which wouldn't really tie in to RTFM, but I often need a custom panic handler to ensure hardware safety, which would be easier if it could integrate with RTFM.

Timer Queue without Systick

I am experimenting with RTFM on the nrf52840 and am having some issues with the timer queue because of the chip's unconventional Systick implementation.

According to page 19 of the nrf52840 Product Specification v1.1, the Systick does not count when the CPU is stopped, i.e. during WFI, WFE, or SLEEPONEXIT. This means that the idle function has to be an empty loop if otherwise not used, and cannot go to sleep. This somewhat limits the value of the timer queue since the application cannot turn off the CPU and save power when there are no timer events ready.

I expect the timer queue to keep using SysTick by default, but I wonder if there is room for some kind of interface to use another timer when advantageous for a particular target. Nordic suggests using the RTC instead, or a TIMER. I don't know what other platforms this might apply to.
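One shape such an interface could take is a trait the timer queue is generic over. The trait and names below are purely hypothetical (a sketch, not RTFM's actual interface); the mock "RTC" stands in for a 32768 Hz counter that keeps running while the CPU sleeps:

```rust
/// A monotonic, non-decreasing clock the timer queue could be
/// parameterized over (hypothetical trait, not RTFM's API).
trait Monotonic {
    type Instant: Ord + Copy;
    /// Current time in ticks.
    fn now(&self) -> Self::Instant;
    /// Ticks per second, so offsets can be given in wall-clock units.
    fn ticks_per_second(&self) -> u32;
}

/// Mock RTC: a 32768 Hz counter (on real hardware it would keep
/// counting during WFI/WFE, unlike the nRF52840's SysTick).
struct MockRtc {
    counter: u32,
}

impl Monotonic for MockRtc {
    type Instant = u32;
    fn now(&self) -> u32 {
        self.counter
    }
    fn ticks_per_second(&self) -> u32 {
        32_768
    }
}

/// Convert a millisecond offset into an absolute instant on any
/// Monotonic implementation.
fn schedule_after<M: Monotonic<Instant = u32>>(clock: &M, ms: u32) -> u32 {
    clock.now() + ms * clock.ticks_per_second() / 1_000
}

fn main() {
    let rtc = MockRtc { counter: 1_000 };
    // 1 s from "now" lands 32768 ticks later:
    assert_eq!(schedule_after(&rtc, 1_000), 33_768);
}
```

With something like this, a target could plug in whichever timer survives sleep, and idle could go back to WFI without losing the timer queue.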

Plugin architecture

This is not really an RFC as of yet, but rather a thought.

How would we best allow for customization of RTFM, e.g., adding a custom reporting pass (based on the collected task/resource data)?

E.g., in the original RTFM implementation, a dot graph for the complete system and resource sharing was produced.

We could of course build such things into the Rust RTFM mainline, but adding more and more will bloat RTFM, and depending on the use case we might want to present different types of graphs.

So maybe some sort of parse-friendly output format should be used, and custom reporting could be done by an external crate (invoked by a build script, perhaps).

What data should we put there, and in what format (JSON, HTML, TOML)? Hit me up...

That was the easy part (unfortunately).

Other use cases for plugins could be custom syntax for tasks, code generation for testing, benchmarking, and other sorts of analysis. This is more tricky as it needs to be part of the build process.

One thing I was thinking about is introducing clear passes into RTFM, with hooks for each pass.

Maybe RTFM could depend on external crates giving default implementations of those hooks. If you want to make a custom build, you add a patch section in your Cargo.toml, overriding the default implementation for the hook(s) at hand. I assume that the result of running a hook should render some metadata, propagated by RTFM to downstream hooks. So, e.g., if you have custom syntax, the custom (meta)data should be propagated downstream (for use by an extended analysis pass and eventually by extended code generation).

Not sure how this approach would work out. There might be better alternatives.

Goodbye Exclusive, Welcome Symmetrical Lock and [mut/&resource] optimization.

Currently RTFM has the notion of asymmetric access to shared resources. The task with the highest priority accessing shared: T gets a &mut T API, while other tasks (if any) get a resource proxy with a lock API (formerly claim). Below is examples/lock.rs from the current upstream.

#[rtfm::app(device = lm3s6965)]
const APP: () = {
    struct Resources {
        #[init(0)]
        shared: u32,
    }

    #[init]
    fn init(_: init::Context) {
        rtfm::pend(Interrupt::GPIOA);
    }

    // when omitted priority is assumed to be `1`
    #[task(binds = GPIOA, resources = [shared])]
    fn gpioa(mut c: gpioa::Context) {
        hprintln!("A").unwrap();

        // the lower priority task requires a critical section to access the data
        c.resources.shared.lock(|shared| {
            // data can only be modified within this critical section (closure)
            *shared += 1;

            // GPIOB will *not* run right now due to the critical section
            rtfm::pend(Interrupt::GPIOB);

            hprintln!("B - shared = {}", *shared).unwrap();

            // GPIOC does not contend for `shared` so it's allowed to run now
            rtfm::pend(Interrupt::GPIOC);
        });

        // critical section is over: GPIOB can now start

        hprintln!("E").unwrap();

        debug::exit(debug::EXIT_SUCCESS);
    }

    #[task(binds = GPIOB, priority = 2, resources = [shared])]
    fn gpiob(c: gpiob::Context) {
        // the higher priority task does *not* need a critical section
        *c.resources.shared += 1;

        hprintln!("D - shared = {}", *c.resources.shared).unwrap();
    }

    #[task(binds = GPIOC, priority = 3)]
    fn gpioc(_: gpioc::Context) {
        hprintln!("C").unwrap();
    }
};

This is unfortunate, as

  1. Introducing another task in the application may break code. gpiob's API would need to change if gpioc were given access to shared.
  2. If we want gpioa to pass the resource access to another (external) function (like a HAL), we would need to wrap shared by Exclusive (which gives a lock API to inner).
  3. Context cannot have a 'static lifetime if it is allowed to hold &mut T resources. This prevents captures in statically stored generators (which require a 'static lifetime). Generators can be useful as is (initial experiments confirm the possibility), and they are used under the hood for async/await, so if we want to keep this door open we will eventually run into this problem.

Proposal.

Offer only a single API (always resource proxy) for all shared resources.
Pros:

  • Symmetrical API with no risk of breakage.
  • Mitigates the need for Exclusive.
  • Potentially reducing the complexity of codegen internally in RTFM. (Right now there are a bunch of edge cases in the codegen.)

Cons:

  • More verbose user code, the price we pay for ensuring the composition property. With the new multi-lock Mutex (korken), we can reduce rightward drift.

Performance:

  • Locks will be optimized out by the compiler in case your task has exclusive access.
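A host-side mock can illustrate the symmetrical API: every task sees the same `lock` interface, whether or not it actually contends. The `Proxy` type and its flag are illustrative only; in real generated code the "needs a critical section" decision is static, so the branch folds away for tasks with exclusive access.

```rust
/// Mock resource proxy: the single, symmetrical API every task gets.
struct Proxy<'a, T> {
    data: &'a mut T,
    needs_critical_section: bool,
}

impl<'a, T> Proxy<'a, T> {
    /// The lock lends out `&mut T` for the duration of the closure.
    fn lock<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        if self.needs_critical_section {
            // Real code would raise BASEPRI here; on the host this is a
            // no-op. For a task with exclusive access this branch is
            // statically false and the compiler removes it.
        }
        f(self.data)
    }
}

fn main() {
    let mut shared = 0u32;
    // Even the "highest priority" task uses the same lock API, so adding
    // a new contender later does not change this task's code:
    let mut proxy = Proxy { data: &mut shared, needs_critical_section: false };
    proxy.lock(|v| *v += 1);
    proxy.lock(|v| *v += 1);
    assert_eq!(shared, 2);
}
```

The point of the sketch: the caller's code is identical in both the contended and uncontended cases, which is exactly the composition property argued for above.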

[resource] vs [mut resource] (or [&resource] vs [resource])
In a related RFC, Jorge proposed to distinguish between mutable and immutable access. With the above proposal, this amounts to a slight syntax extension. (Following Rust, I would think that [mut resource] would be appropriate, but the & syntax would be backwards compatible, so I don't have a strong opinion.)

Pros:

  • Improved schedulability (less blocking from lower priority tasks)
  • Improved performance (unnecessary locks are optimized out at compile time)

Cons:

  • None.

Under the "Goodbye Exclusive" proposal, the optimization can be done fully transparently to the API. lock will lend out &T/&mut T for immutable/mutable access respectively, and the compiler will block misuse.

I might have missed something, looking forward to input.
Best regards
Per

Cortex-m-rtfm quickstart template?

There is a quickstart for cortex-m-rt.
It's an excellent starting point for a bare metal project, and from there one can start adding RTFM specifics.

The goal is to make it as easy as possible to get started with RTFM.

Would we want a separate quickstart for cortex-m-rtfm, or alternatively, would it be possible to make the cortex-m-rt quickstart support RTFM (through some CLI option)?

Generators in RTFM, short progress report.

Jorge and I have been discussing the potential use of generators back and forth over quite some time. Finally, we sat down to bite the bullet, and here are some first experiences.

Some observations:

  • Generators depend on quite a few experimental features when used outside of the compiler (if I understand correctly they are used internally for the implementation of async/await, but the user-facing API is not stabilized).
  • Generators can indeed be used on core-only systems. We can even put generators in static memory, as below.
  • Generators stored in a static can capture only static memory (the parameter x in the example).
  • Generators currently cannot be resumed with an argument.
#![feature(generator_trait)]
#![feature(generators)]
#![feature(never_type)]
#![feature(type_alias_impl_trait)]

use core::{mem::MaybeUninit, ops::Generator, pin::Pin};

#[rustfmt::skip]
type G = impl Generator<Yield = (), Return = !>;

fn task(x: &'static mut u32) -> G {
    move || loop {
        println!("Hello {}", &x);
        *x += 1;
        yield;
        println!("World {}", &x);
        yield;
    }
}

static mut X: MaybeUninit<G> = MaybeUninit::uninit();

fn main() {
    unsafe {
        static mut x: u32 = 0;
        X.as_mut_ptr().write(task(&mut x));
        let g: &mut dyn Generator<Yield = (), Return = !> =
            &mut *X.as_mut_ptr();
        Pin::new_unchecked(&mut *g).resume();
        Pin::new_unchecked(&mut *g).resume();
    }
}

So let's go, the main idea:
We want to be able to write sequences (linear code) that can yield and resume where they left off. Under the task/resource model of RTFM that is OK as long as no resources are locked at the point of yielding (holding a resource implies that the system ceiling (BASEPRI) is held at the level of that resource, which would essentially block other tasks from executing). Luckily, the closure based resource access of RTFM prevents code from yielding inside a lock.

Here is a snippet of hand written code for a task (after running the RTFM proc-macro).

type Generatorfoo1 = impl Generator<Yield = (), Return = !>;
static mut GENERATOR_FOO1: core::mem::MaybeUninit<Generatorfoo1> =
    core::mem::MaybeUninit::uninit();
#[allow(non_snake_case)]
fn foo1(mut ctx: foo1::Context) -> Generatorfoo1 {
    move || loop {
        ctx.resources.shared2.lock(|v| {
            *v += 1;
        });
        yield;
        ctx.resources.shared2.lock(|v| {
           ....
        });
        yield;
        ...
    }
}

The Context/resource proxy was hand written for this small example, but it shows the proof of concept to work. I have verified that locking properly prevents other tasks from executing. The implementation of Priority was changed to an owned Cell instead of a reference to make it work in a static. This seems to impede the compiler's ability to optimize out unnecessary locks (it amounts to a comparison and a branch as overhead, plus some extra unneeded code). I assume the reason is that it is stored in a static and thus the compiler cannot assume exclusive access to the Cell's inner value.

So in conclusion, we can have it working one way or another; under RFC #17 the implementation may be simplified. The problem with lock optimization needs some love. If optimization is not possible, the overhead is acceptable if applied only to the case of locking inside generators. (In a prior version of RTFM, the current ceiling value was passed around to achieve this lock optimization; that is always an option but a bit verbose.)

Some caveats:

  • We cannot use the message passing API due to the lack of resume parameters. The user would need to set up an SPSC queue (similar to a legacy thread-based RTOS) for passing data.
  • Just a proof of concept as of yet, implementation pending.
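The SPSC workaround from the first caveat can be sketched on the host. This is a minimal fixed-capacity ring buffer in the spirit of heapless::spsc (names and layout are illustrative, not the heapless API): the spawning task pushes before pending the generator's dispatcher, and the generator pops on resume instead of taking a resume argument.

```rust
/// Minimal single-producer single-consumer ring buffer with a fixed
/// capacity of N elements (slot occupancy doubles as the full check).
struct Spsc<T, const N: usize> {
    buf: [Option<T>; N],
    head: usize, // next slot to pop
    tail: usize, // next slot to push
}

impl<T, const N: usize> Spsc<T, N> {
    fn new() -> Self {
        Spsc { buf: [(); N].map(|_| None), head: 0, tail: 0 }
    }

    /// Returns the value back if the queue is full.
    fn push(&mut self, v: T) -> Result<(), T> {
        if self.buf[self.tail].is_some() {
            return Err(v);
        }
        self.buf[self.tail] = Some(v);
        self.tail = (self.tail + 1) % N;
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        let v = self.buf[self.head].take();
        if v.is_some() {
            self.head = (self.head + 1) % N;
        }
        v
    }
}

fn main() {
    let mut q: Spsc<u32, 4> = Spsc::new();
    // Producer side (e.g. the spawning task) pushes before resuming:
    q.push(7).unwrap();
    // Generator side pops on resume, in place of a resume argument:
    assert_eq!(q.pop(), Some(7));
    assert_eq!(q.pop(), None);
}
```

In a real application the producer and consumer halves would be split so each side can live in a different task; the sketch keeps both in one scope for brevity.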

Why?:
So why the interest in generators? Well, they offer sequential-style programming, essentially a state machine, suitable for implementing transactions. Secondly, this may open up async/await under RTFM (we have the dispatchers/tasks, we have generators, what's left ...).

Thanks to Jorge for numerous discussions and code sketches.

Please use this issue for discussions/ideas/and progress reporting on generators under RTFM.

[RFC] Add API to re-schedule & cancel tasks

Summary

Add support for re-scheduling and canceling scheduled tasks.

Current behavior

Today there is no way to re-schedule or cancel scheduled tasks as:

#[rtfm::app]
const APP: () = {
    #[task(schedule = [timeout])]
    fn task(cx: task::Context) {
        // ... some work that needs timeout

        cx.schedule.timeout(SOME_TIME).unwrap();
        // ^^^ There is no way to make this re-schedule or cancel today
        // without an external resource and working around RTFM
    }

    #[task]
    fn timeout(cx: timeout::Context) {
        // Does the work if there was a timeout ...
    }
};

Proposal

Add an API to be able to re-schedule and cancel a scheduled task. This is something that has also been requested verbally and officially in rtfm-rs/cortex-m-rtfm#220, and is a sane feature. Creating watchdog & timeout tasks, or scheduling on incomplete information, is tricky today and requires working around RTFM.

Proposed syntax

For re-scheduling there are 2 options:

#[rtfm::app]
const APP: () = {
    #[task(schedule = [timeout])]
    fn task(cx: task::Context) {
        // ... some work that needs timeout

        // Option 1:
        cx.schedule.timeout(SOME_TIME).unwrap();
        // ^^^ This automatically re-schedules

        // Option 2: Add a specific API 
        cx.reschedule.timeout(SOME_TIME).unwrap();
    }

    #[task]
    fn timeout(cx: timeout::Context) {
        // Does the work if there was a timeout ...
    }
};

Here a Result should be returned, telling whether the task was rescheduled (Ok) or has already been pended for execution (Err).

And for canceling, the straightforward syntax is as follows:

#[rtfm::app]
const APP: () = {
    #[task(schedule = [timeout])]
    fn task(cx: task::Context) {
        // ... some work that needs timeout
        cx.schedule.timeout(SOME_TIME).unwrap();

        // decide that we want to cancel the scheduled task
        cx.cancel.timeout().unwrap();
    }

    #[task]
    fn timeout(cx: timeout::Context) {
        // Does the work if there was a timeout ...
    }
};

Also here a Result should be returned, telling whether the task was canceled (Ok) or has already started/executed (Err).

Not in scope

  • This proposal is for tasks with a capacity of 1. Rescheduling and canceling tasks that have a larger capacity would be a future extension, if the need arises.

Issues to tackle

  1. The schedule list probably becomes an O(n) list instead of using the binary heap we have today. This can most likely be solved by running sift up/down.
  2. Bike-shed the API
  3. More concerns?
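To make issue 1 concrete, here is a host-side sketch of one possible shape: the queue as a Vec kept sorted by scheduled time, with `cancel` doing the O(n) linear scan noted above. All names and the exact data layout are illustrative, not a proposed implementation.

```rust
/// Toy timer queue: entries kept sorted ascending by scheduled time.
struct TimerQueue {
    entries: Vec<(u32 /* time */, u32 /* task id */)>,
}

impl TimerQueue {
    fn schedule(&mut self, time: u32, id: u32) {
        // Insert after any equal times, preserving FIFO among ties.
        let pos = self.entries.partition_point(|&(t, _)| t <= time);
        self.entries.insert(pos, (time, id));
    }

    /// Ok if the entry was still pending; Err if it already ran (or was
    /// never scheduled) -- matching the proposed cancel semantics.
    fn cancel(&mut self, id: u32) -> Result<(), ()> {
        match self.entries.iter().position(|&(_, i)| i == id) {
            Some(pos) => {
                self.entries.remove(pos);
                Ok(())
            }
            None => Err(()),
        }
    }

    /// Pop the next task whose time has come, if any.
    fn pop_due(&mut self, now: u32) -> Option<u32> {
        if self.entries.first().map_or(false, |&(t, _)| t <= now) {
            Some(self.entries.remove(0).1)
        } else {
            None
        }
    }
}

fn main() {
    let mut q = TimerQueue { entries: Vec::new() };
    q.schedule(100, 1);
    q.schedule(50, 2);
    assert_eq!(q.cancel(1), Ok(())); // canceled before it ran
    assert_eq!(q.pop_due(60), Some(2)); // task 2 is due
    assert_eq!(q.cancel(2), Err(())); // already executed
}
```

A binary heap with sift up/down after removal would keep the same observable behavior while improving the asymptotics, which is what issue 1 suggests.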

Multiple timers, wall-clock time, Monotonic ratio, and Mode Changes

While the title touches on several topics, they are somewhat connected/interdependent.
In the future we may want to support multiple timers (with separate timer queues). This has the advantage of reducing priority inversion, allowing different pre-scalers to accommodate the trade-off between range and accuracy, and reducing the worst-case overhead for timer queue operations.

However, this raises the question of a common notion of time (wall-clock). It would give a common reference for the time offsets used when spawning tasks in the future (all relating to the same time base, even if queued and dispatched by different timers).

This also relates to the problem of the ratio between a timer and the timing reference (be that the system clock or the wall-clock). If we allow either the system clock, the timer pre-scaler, or the clock tree to change, we are inducing a mode change in the system.

This poses a problem for the timing semantics (it is unclear at what point in time a queued (outstanding) event should be dispatched).

So it boils down to defining a suitable timing semantics (one could consider wall-clock time as the reference). For practical reasons we cannot expect an exact wall-clock unless relying on an RTC, and that might not be fine-grained enough (and might also be costly to access), so an approximate wall-clock should do in most cases. Network time might be something that certain applications need, but that is likely best handled at the application level.

So what about mode changes in the system. What are the general goals, challenges and possible solutions.

A mode represents an operating condition (RTFM now has two modes, init and run). In the setting of an IoT application, we might have a low-power mode (where the system awaits some stimuli) and one (or several) active modes. In the low-power mode we might have access only to a limited set of timers/timer queues (e.g., an RTC based queue for waking the system every 24 h to poll some sensor); when woken by the RTC we can have another set of timers/queues for performing the measurement(s); and perhaps there is a third mode, triggered by a radio wakeup, where we have a set of timers/queues for the management of radio communication, etc. Notice here that each mode (or state of operation) may well have a unique set of tasks and timing requirements. Tasks (and resources) may be shared between modes (e.g., a task to read some sensor may be scheduled under several modes, e.g. on behalf of the RTC wakeup or by a request from the radio).

Take another use case, a control application. Under normal operation we have a set of tasks, resources, and constraints, e.g., we have access to sensors x, y, and z. What if sensor z is faulty (e.g. the wire to z is broken)? Then we go into a limp-home mode, where we have access only to sensors x and y. In this mode the resource z is not available, and it should still be possible to do best-effort control. What if x and y are also broken? Then we need to do an emergency stop (as the system is no longer controllable, stability may not be possible). Emergency stop should then be a third mode of operation.

One can think of zillions of such use cases. Can RTFM support them? Well, currently RTFM supports them only at the application level. It is up to the implementer to "encode" the mode of operation; resources can be masked behind Option<T>, etc. It's very generic, but also very weak regarding guarantees.

Alternatively, RTFM could be extended to natively capture modes (e.g., similarly to the multi-core extension, where we have different task sets/resources, we could define different modes). Code reuse can still be done, using the tasks only as trampolines to shared code.

A fundamental problem here is mode changes: what semantics do we want? What about outstanding messages? E.g., requesting a transition back to a low-power state (in the IoT case) should likely be graceful to outstanding messages (they should be allowed to complete). When we have detected a faulty z sensor, we need to transition to the limp-home state. In that case we want to carry over the resources to the new mode (besides the broken z resource). In the case that we need an emergency stop, the mode change may need to be immediate, but we may still need to finish off already started tasks before the mode change takes place.

As seen, the semantics wanted/required may vary from case to case, so it may be hard to foresee all possible requirements. Hopefully the mode change semantics can be pinned down to a set of features: e.g., criteria for when mode changes should be taken (regarding outstanding messages and started tasks), and criteria for how new messages (emitted on behalf of already started tasks vs. scheduled but not yet started tasks) should be handled. Also assumptions on resources/state during and after a transition: e.g., resuming after deep sleep may render all resources void, essentially a cold start if memory is not retained, while going into limp-home mode can carry over all resources. Once the semantics have been pinned down, we can think about syntax for transitions, how carried-over resources should be marked, etc. Can we think of hybrid transitions, where outstanding messages may execute under the new mode (e.g., logging messages may be allowed to carry over and complete under the new mode)?

Why would we want this kind of support from RTFM. To put it simple, static guarantees and robustness.

In the context of control applications, fault mode handling is very tricky to get right by hand.
Similarly in the context of IoT, where we want to exploit power modes aggressively, it takes a lot to ensure robustness.

In both cases a well-designed framework that allows for correct-by-construction design could be extremely helpful. While it's a grand challenge to come up with the framework, it would allow the programmer to fearlessly design applications where, based on the task/resource/mode-change model, the framework would reject unsound models (e.g., preventing the use of data from a faulty sensor, or reliance on data in memory that would have been erased due to non-retained memory during sleep).

[RFC] shared access to resources (resources = [&FOO])

Summary

Extend the syntax of the resources field to let the user specify when shared,
instead of exclusive (today's default), access to a resource is required.
Specifying shared access can reduce the number of required critical sections in
some scenarios.

Background

Consider this example:

#[rtfm::app(device = ..)]
const APP: () = {
    // NOTE: `ComplexThing` implements the `Sync` trait
    static mut FOO: ComplexThing = /* .. */;

    // ..

    #[task(priority = 1, resources = [FOO])]
    fn foo() {
       resources.FOO.lock(mutate);
    }

    #[task(priority = 2, resources = [FOO])]
    fn bar() {
       resources.FOO.lock(print);
    }

    #[task(priority = 3, resources = [FOO])]
    fn baz() {
        print(resources.FOO);
    }

    // ..
};

fn mutate(foo: &mut ComplexThing) {
    // ..
}

fn print(foo: &ComplexThing) {
    println!("{}", foo);
}

With the current rules, the task bar needs a critical section to access FOO.
However, that critical section is not really needed because task baz cannot
modify FOO.

Design

The parser will be modified to accept both $ident and &$ident in lists of
resources. The former syntax, which exists today, indicates exclusive access to
the resource -- its semantics are unchanged; the latter syntax indicates shared
access to the resource.

When shared access is specified the task will receive a shared reference (&_)
to the resource data rather than a Mutex proxy.

If shared access is requested from two or more tasks that run at different
priorities then the resource type must implement the Sync trait. This is
required to prevent types with interior mutability (like Cell and RefCell)
from being accessed without a critical section, and potentially modified, from
those tasks as that could result in UB.
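The Sync requirement can be illustrated on the host, with threads standing in for tasks at different priorities (an analogy only; RTFM tasks are interrupt handlers, not threads):

```rust
use std::thread;

fn main() {
    // u32 is Sync, so a shared reference may safely cross "task"
    // boundaries without a critical section:
    let foo: u32 = 42;
    thread::scope(|s| {
        s.spawn(|| assert_eq!(foo, 42)); // "task" at priority 2
        s.spawn(|| assert_eq!(foo, 42)); // "task" at priority 3
    });
    // Replacing `u32` with `Cell<u32>` would fail to compile, because
    // Cell is not Sync: both "tasks" could then mutate through the
    // shared reference without synchronization, which is exactly the
    // UB the Sync bound rules out.
    println!("both readers observed foo = {foo}");
}
```

This is why the RFC only demands Sync when shared access is requested from two or more priority levels: within a single priority level no preemption between the accessors is possible, so the bound is unnecessary.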

With the proposed changes our opening example can be updated as follows:

#[rtfm::app(device = ..)]
const APP: () = {
    // ..

    #[task(priority = 1, resources = [FOO])]
    fn foo() {
       resources.FOO.lock(mutate);
    }

    // NOTE: `&FOO` instead of `FOO` in `resources`
    #[task(priority = 2, resources = [&FOO])]
    fn bar() {
        // NOTE: no critical section is needed this time
        let reference: &ComplexThing = resources.FOO;

        print(reference);
    }

    // NOTE: `&FOO` instead of `FOO` in `resources`
    #[task(priority = 3, resources = [&FOO])]
    fn baz() {
        print(resources.FOO);
    }

    // ..
};

Unresolved questions

It's unclear what to do in this scenario, other than raise a compiler error:

#[rtfm::app(device = ..)]
const APP: () = {
    // ..

    #[task(priority = 1, resources = [&FOO])]
    fn foo() { /* .. */ }

    #[task(priority = 2, resources = [FOO])]
    fn bar() { /* .. */ }

No optimization can be applied here. The task foo needs a critical section to
access the resource data, in any form, but the lock API doesn't make sense
here because that grants an exclusive reference (&mut _) to the data and
shared access was requested.

#[init()] functions should provide a CriticalSection

In my program's #[init()] function (using rtfm-0.5) I have to reconfigure a bunch of GPIO pins from the stm32f0xx_hal crate in order to pass them to HAL constructors like Spi::spi1() and I2c::i2c1(). Each of those GPIO-configuration calls requires a CriticalSection. OK, I can run the call inside a cortex_m::interrupt::free() callback ... but the RTFM framework already disables interrupts before it calls the init() function, so the read-PRIMASK, cpsid, and test-old-PRIMASK-and-branch sequences from cortex_m::interrupt::free are unnecessary inside init(). But I can't express that (unless I unsafely create a CriticalSection of my own at the top of init()).

It would be nice if init::Context contained a CriticalSection so I could pass a reference to it into the HAL crate's gpioa::PA0::into_alternate_af4() function (or whichever one I need to call).
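The request is essentially for init::Context to carry a zero-cost token proving interrupts are disabled. A host-side model of such a token is sketched below (cortex-m's real CriticalSection plays the same role; the types and names here are simplified stand-ins, not the cortex-m API):

```rust
/// Zero-sized proof that we are inside a critical section.
struct CriticalSection(());

impl CriticalSection {
    /// Safety: only construct this where interrupts are genuinely
    /// disabled, e.g. inside RTFM's `init`, where the framework has
    /// already disabled them.
    unsafe fn new() -> Self {
        CriticalSection(())
    }
}

/// A HAL-style function that demands the proof instead of taking its
/// own critical section (hypothetical stand-in for the GPIO calls).
fn configure_pin(_cs: &CriticalSection, pin: u8) -> u8 {
    pin // pretend to configure the pin and hand it back
}

fn main() {
    // If init::Context provided this token, the read-PRIMASK/cpsid
    // sequences of interrupt::free would be avoided entirely:
    let cs = unsafe { CriticalSection::new() };
    assert_eq!(configure_pin(&cs, 4), 4);
}
```

Since the token is zero-sized, handing it out through init::Context would cost nothing at runtime; the unsafe construction would live once inside the framework instead of in every user's init.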

Fine grained concurrency on thumbv6m (no BASEPRI).

The Cortex-M0/M0+ architecture is getting increasing attention: cheap, lightweight, and low-power. While RTIC currently supports the M0/M0+, the locking mechanism implemented is coarse-grained, essentially disabling interrupts globally while a resource is held. This approach is very simple and robust, but loses out when it comes to concurrency, and thus real-time performance.

In our early work on RTFM, the principles for implementing an SRP based scheduler on the v6m architecture were proposed.

A similar approach has been implemented and is available in the m0 branch.

Please try, feedback appreciated.

/Per

Extended timing semantics

As of today, tasks do not carry deadline and inter-arrival time information. These are required for response time analysis and automatic priority assignment. This was part of the original work on RTFM, and in an experimental branch of RTFMv3 I had the deadline and inter-arrival info implemented and working.

Now the syntax has changed (and the internals as well, all for the better;)

One could think of adding to the task attribute:
#[task(..., deadline = D, inter-arrival = I)]

Where D and I are given in some suitable time format matching the monotonic timer.
Priorities could be derived ordered by deadline (shorter deadline, higher priority).

If no deadline is given, the deadline can be derived from the inter-arrival time (rate monotonic). There are cases, though, where the deadline will be shorter than the inter-arrival time (e.g., emitting IO with low jitter), or the opposite, where we can allow buffering of events processed under relaxed timing requirements.
(This is 101 scheduling for real-time systems, nothing fancy/new and certainly not my invention.)
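Deadline-monotonic assignment into a reserved range could be sketched as below. This is a toy model of the idea, not a proposed RTFM implementation: the range, the tie handling, and all names are assumptions.

```rust
#[derive(Debug, PartialEq)]
struct Task {
    name: &'static str,
    deadline: u32, // in ticks; smaller = more urgent
}

/// Map tasks onto priorities in `lo..=hi`: shorter deadline gets a
/// higher priority (deadline-monotonic assignment).
fn assign(mut tasks: Vec<Task>, lo: u8, hi: u8) -> Vec<(&'static str, u8)> {
    assert!(
        tasks.len() <= (hi - lo + 1) as usize,
        "more tasks than priority levels in the reserved range"
    );
    // Longest deadline first, so it receives the lowest priority.
    tasks.sort_by(|a, b| b.deadline.cmp(&a.deadline));
    tasks
        .iter()
        .enumerate()
        .map(|(i, t)| (t.name, lo + i as u8))
        .collect()
}

fn main() {
    // Range 5..=15 reserved for automatic assignment, as in the example:
    let out = assign(
        vec![
            Task { name: "uart", deadline: 100 },
            Task { name: "ctrl", deadline: 10 },
            Task { name: "log", deadline: 1_000 },
        ],
        5,
        15,
    );
    assert_eq!(out, vec![("log", 5), ("uart", 6), ("ctrl", 7)]);
}
```

In a real implementation equal deadlines could share a priority level (SRP permits this), which would also use fewer of the scarce hardware levels.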

An open question is how automatic priority assignment could co-exist with manual assignment. (This we never settled in the original RTFM work/implementation.)

Perhaps another attribute could be used to declare the range of assignable priorities.
#[rtfm_priorities(5..)]

Allowing 5..(1<<NR_PRIO_BITS) to be used by RTFM for automatic assignment, and the lowest priorities reserved for manual assignment, (1..=4) or equally (1..5) using Rust range syntax. 0 is the idle priority.

#[rtfm_priorities(5..=15)]
Would allow the user to manually assign priority 16 (amounting to an interrupt disable, assuming we have 4 priority bits).

I guess that would cover the typical use cases:
We have some real-time tasks with known (minimum) inter-arrival times. These are typically our IOs, like serial, Ethernet, etc., where the transmission speed stipulates the inter-arrival time.
In addition, we have application logic with hard real-time requirements, e.g., a periodic control loop.
These will all be annotated with the corresponding timing data and assigned by RTFM to priorities 5..=15 (for the example above).

Besides real-time tasks, we can have other application logic running without explicit deadlines; here fixed priorities make sense, as we can use them to relate their importance. E.g., it may be more important to update a control model than to emit logging data (or vice versa).

And finally, we might have some real high-priority things going on which must be dealt with immediately (but do not have any sensible deadline), e.g., some panic button that should immediately put the system in a fault mode.

The last case might be solved by just assigning a very short deadline (the shortest among all) to give it the highest priority, but that is not really robust: at some point we might want to change some other associated deadline (or inter-arrival time, with a derived deadline), and then it may no longer be the shortest.

Perhaps setting a deadline of 0 could indicate that the task must be hardwired to the highest priority; that is of course robust.

Such tasks are usually left out of the overall schedulability analysis, as they indicate an abnormal condition for which the inter-arrival time should be infinite under normal operation. In a related post I touched upon the problem of mode changes; this can be seen as such a mode change. (If given a deadline, we can still check the response time for the abnormal case.)

There is also another crux:
If we have tasks without inter-arrival information (like the low-priority logging and application logic), we cannot do full-blown schedulability analysis. That makes full sense, since these tasks are "best effort" and we have a mixed-criticality system. Typically, response time analysis for higher priority tasks would then not be possible, but RTFM is underpinned by SRP, which provides a solution: SRP ensures single, non-transitive blocking, so we just have to take into account (for each resource) the maximum critical section. Thus, even under overload we can still ensure that our real-time tasks will meet their deadlines, ain't that a blast!
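The single-blocking property can even be checked numerically: under SRP, a task is blocked at most once, by the longest critical section of any lower-priority task on a resource whose ceiling is at or above the task's priority. A small sketch of that computation (all numbers and names illustrative):

```rust
/// One critical section in the system: which task holds it, the
/// ceiling of the resource it holds, and its worst-case length.
struct CritSection {
    task_priority: u8,
    resource_ceiling: u8,
    duration: u32, // worst-case length in ticks
}

/// Worst-case blocking for a task of priority `p` under SRP: the single
/// longest lower-priority critical section whose ceiling reaches `p`.
fn max_blocking(sections: &[CritSection], p: u8) -> u32 {
    sections
        .iter()
        .filter(|s| s.task_priority < p && s.resource_ceiling >= p)
        .map(|s| s.duration)
        .max()
        .unwrap_or(0)
}

fn main() {
    let sections = [
        CritSection { task_priority: 1, resource_ceiling: 3, duration: 40 },
        CritSection { task_priority: 2, resource_ceiling: 3, duration: 25 },
        CritSection { task_priority: 1, resource_ceiling: 1, duration: 90 },
    ];
    // A priority-3 task is blocked by at most ONE of the ceiling-3
    // sections (the longest), even though two lower-priority tasks
    // hold them; the ceiling-1 section never blocks it at all:
    assert_eq!(max_blocking(&sections, 3), 40);
}
```

Note that the best-effort tasks contribute only through their critical sections, which is exactly why overload on them cannot sink the real-time tasks' response times.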


R&D: TrustZone

Background: With @solokeys, we're working towards replacing C with Rust, based on RTFM, and would like to take advantage of TrustZone to enhance security. We'd love to be one of the first commercial products demonstrating all the promise open source, Rust and RTFM have.

I do understand the current focus of RTFM is on supporting multi-core; I am not aware of the rest of the current roadmap (besides @korken89 telling me there is no current effort, and asking me to formulate this in an issue). I'm also still pretty much an RTFM noob, despite the nice workshop :)

May I suggest as next major R&D adding support for Cortex-M TrustZone? In a way, this can be considered a variation on multi-core: some resources are doubled, others are configurably split between "secure" and "non-secure" operation. Given RTFM's position as owner of all the things, it would naturally take on the responsibility of configuring and organizing these splits.

To give a concrete use case: For our security keys, I would like to both keep secret key material (for signatures/decryption) in a secure flash partition managed by littleFS, and have associated cryptographic operations implemented in secure mode, with "apps" (like FIDO2/OpenPGP/PIV/WireGuard/crazy-cryptocurrency-stuff/experimental that multiplex against dispatched USB requests) using the SG instruction as a kind of system call. In this way, a rogue application might be able to perform unauthorized operations, but not extract secret material (or at least, there would be more layers of security to pierce). I am not currently thinking of a general "plug-in" model with completely untrusted "apps", but rather simple things like USB glitching or other external attacks, that make a tested/trusted app go rogue: a SecureFault should stop this.

Current practice for this kind of things is using two chips, one general purpose microcontroller, and one "secure" element which is closed source and locked behind NDAs. My motivation is to replace this kind of obscurity with an open model -- in the future, Cortex-M35P might add the physical tamper-resistance to complete the picture. </lengthy use case excursion>

Of course, this kind of secured/trusted computing tends to get broken eventually (for instance the SAML11 Cortex-M23 seems to be fundamentally broken), but I still hope you're enticed by the prospect of doing multi-core with just one chip :) Or to put it more academically, coming up with the right abstractions to operate TrustZone. On the C side, I'm not aware of much beyond bloated frameworks and the incomplete LTZVisor.

On the Arm side, @hug-dev has been doing some work on implementing the Security Extensions as Rust attributes in an unpublished trust crate; I understand this is currently blocked on upstreaming those in LLVM.

If this is tackled, I'd suggest using the LPCXpresso55S69 board to implement against. FYI, I'm currently working on a basic HAL which runs in non-secure mode ("as if" there were no TrustZone), and you'll be aware of lpcxpresso55S69.

Thoughts?

[RFC] remove the difference between resources and late resources

Summary

Resources and late resources are distinct in RTFM, when this distinction is not really needed. static definitions with SHOUTING_NAMES are also a bit strange when accessed through the resources magic parameter.

This RFC proposes a new syntax for declaring resources that removes the distinction between resources and late resources, and feels a bit more rustic.

Background

Consider this example:

#[rtfm::app(device = ..)]
const APP: () = {
    // Strange syntax of late resource
    static mut FOO: ComplexThing = ();

    // As resources are presented as statics, they are in SHOUTING_SNAKE_CASE
    static mut BAR: i32 = 42;

    // `init::LateResources` is a magic type
    #[init]
    fn init() -> init::LateResources {
        init::LateResources {
            // strange SHOUTING_FIELD
            FOO: make_foo(),
        }
    }

    #[task(priority = 1, resources = [FOO, BAR])]
    fn foo() {
        // strange SHOUTING_FIELDS
        resources.BAR = resources.FOO.get_bar();
    }
};

Design

The main idea is to declare a struct representing the resources. This struct is returned by init. Then, you have a partial view of this struct in the handlers.

#[rtfm::app(device = ..)]
const APP: () = {
    // define the resources by defining a struct
    #[resources]
    struct Resources {
        // as this is fields of a struct, we don't use shouting case here
        foo: ComplexThing,
        bar: i32,
    }

    // just return the declared Resources struct
    #[init]
    fn init() -> Resources {
        Resources {
            foo: make_foo(),
            // no more static initialization
            bar: 42,
        }
    }

    // declare the fieds of our `resources` variable as a subset of the Resource fields
    #[task(priority = 1, resources = { foo, bar })]
    fn foo() {
        // natural naming
        resources.bar = resources.foo.get_bar();
    }
};

Drawbacks

init is no longer optional (when there are resources), thus complicating the simple case a bit.

As the init function returns all the resources, there is no more static initialisation of the static resources, potentially adding a useless memcpy.

Resource proxy mutability, Mutex trait and closure captures (async fn)

Resource proxy mutability

Mutable references to resource proxies

Currently in RTIC/RTFM we need a mutable reference to the resource proxies:

#[task(binds = GPIOA, resources = [shared])]
fn gpioa(mut c: gpioa::Context) {
    // the lower priority task requires a critical section to access the data
    c.resources.shared.lock(|shared| {
        // data can only be modified within this critical section (closure)
        *shared += 1;
        c.resources.shared.lock(|shared2| { // <- here Rust would spot the error (*ERROR*)
            // both shared and shared2 would point to the same location
        });
    });
}

Rust requires that no two mutable references in scope point to the same location. We use the borrow checker to ensure this at compile time.

Pros

  • Statically checked, error messages upfront.
  • Zero run-time cost (no ref counting).
  • ...

Cons

  • The corresponding Mutex trait requires a mutable reference to the resource proxy. This has proven difficult to adopt for general purpose use.
  • Mutable references cannot be captured by multiple closures, making it difficult for async functions to use RTIC resources.
  • ...

Alternative approach

Currently, lock optimization is done by the LLVM backend using "meta data" holding the current priority. This is done by passing an additional parameter (priority) that is optimized out.

pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    nvic_prio_bits: u8,
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get();

    if current < ceiling {
        if ceiling == (1 << nvic_prio_bits) {
            priority.set(u8::max_value());
            let r = interrupt::free(|_| f(&mut *ptr));
            priority.set(current);
            r
        } else {
            priority.set(ceiling);
            basepri::write(logical2hw(ceiling, nvic_prio_bits));
            let r = f(&mut *ptr);
            basepri::write(logical2hw(current, nvic_prio_bits));
            priority.set(current);
            r
        }
    } else {
        f(&mut *ptr)
    }
}

We could think of a similar approach, but applied to the resource proxies. In this case it would be a reference counter (in fact a locked: bool would do). All resource proxies in the context could be initialised to false, set to true when locked and reset to false when unlocked. When locking, we could panic in case locked == true, indicating the *ERROR* in the code above.
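A minimal host-side sketch of the idea (type and method names hypothetical; the real proxies are generated by the app macro): the `locked` flag lets `lock` take `&self` instead of `&mut self`, and a nested lock of the same proxy hits the panic that corresponds to the *ERROR* case above.

```rust
use core::cell::{Cell, UnsafeCell};

// Hypothetical resource proxy: `locked` is the run-time check that replaces
// the `&mut` requirement; `data` stands in for the shared resource.
struct Proxy<T> {
    locked: Cell<bool>,
    data: UnsafeCell<T>,
}

impl<T> Proxy<T> {
    fn new(v: T) -> Self {
        Proxy { locked: Cell::new(false), data: UnsafeCell::new(v) }
    }

    // Note: `&self` instead of `&mut self`
    fn lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // a second, nested `lock` on the same proxy panics here
        assert!(!self.locked.get(), "resource locked twice");
        self.locked.set(true);
        // sound on a single core: the flag guarantees exclusive access;
        // the real implementation would also raise the system ceiling here
        let r = f(unsafe { &mut *self.data.get() });
        self.locked.set(false);
        r
    }
}

fn main() {
    let shared = Proxy::new(0u32);
    shared.lock(|s| *s += 1); // fine
    shared.lock(|s| *s += 1); // fine again: the flag was reset
    assert_eq!(shared.lock(|s| *s), 2);
}
```

Note that this sketch omits the ceiling handling entirely; it only demonstrates the `&self` + panic-on-double-lock mechanism.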

Pros

  • We could adhere to a relaxed requirement on the resource proxy (now &resource_proxy instead of &mut resource_proxy), allowing the Mutex trait to be relaxed accordingly.
  • Would allow sharing of resource proxies to closures, such as async functions.

Cons

  • Run-time checking instead of static analysis. (We cannot be sure without further analysis that the code will not panic.)
  • Memory OH (likely a byte for each resource, or maybe even a word depending on padding). Even if LLVM optimizes out the check, the memory for the locked field will still be allocated (but never touched).
  • Run-time OH in case LLVM fails to optimize out the check.

Soundness

As we panic if a "double lock" occurs, it should be safe.

Complexity of implementation

Fairly low.

Breaking change?

The change will likely not break current code, since mutable references will be downgraded to immutable references.

move/unique resources

EDIT 2020-04-38

  • RFC title changed
  • Syntax updated
  • Discussion regarding #[cfg(...)] for resource declarations and use added

Following up on a discussion with @korken89, and the recent comparison done by @therealprof on various ways to share data with interrupt handlers. RTFM came out on top regarding robustness/reliability (static guarantees), performance and code size, but was argued against due to added complexity.

The added complexity of RTFM can to some extent be explained by the asymmetric resource access pattern (the task with highest priority or unique access to a resource gets an owned reference to the resource, while other tasks need to use the lock API).

One way to approach the problem is to follow the lines of #17, with the addition of explicitly moved resources.

use lm3s6965::Interrupt;
use panic_semihosting as _;

#[rtfm::app(device = lm3s6965)]
const APP: () = {
    struct Resources {
        // An early resource
        #[init(0)]
        shared: u32,

        // A local (move), early resource
        #[task_local]
        #[init(1)]
        l1: u32,

        // An exclusive, early resource
        #[lock_free]
        #[init(1)]
        e1: u32,

        // A local (move), late resource
        #[task_local]
        l2: u32,

        // An exclusive, late resource
        #[lock_free]
        e2: u32,
    }

    #[init]
    fn init(_: init::Context) -> init::LateResources {
        rtfm::pend(Interrupt::UART0);
        rtfm::pend(Interrupt::UART1);
        init::LateResources { e2: 2, l2: 2 }
    }

    // `shared` cannot be accessed from this context
    #[idle(resources =[l1, e2])]
    fn idle(cx: idle::Context) -> ! {
        hprintln!("IDLE:l1 = {}", cx.resources.l1).unwrap();
        hprintln!("IDLE:e2 = {}", cx.resources.e2).unwrap();
        debug::exit(debug::EXIT_SUCCESS);
        loop {}
    }

    // `shared` can be accessed from this context
    #[task(priority = 1, binds = UART0, resources = [shared, l2, e1])]
    fn uart0(cx: uart0::Context) {
        let shared: &mut u32 = cx.resources.shared;
        *shared += 1;
        *cx.resources.e1 += 10;
        hprintln!("UART0: shared = {}", shared).unwrap();
        hprintln!("UART0:l2 = {}", cx.resources.l2).unwrap();
        hprintln!("UART0:e1 = {}", cx.resources.e1).unwrap();
    }

    // l2 should be rejected
    // notice that from a memory safety perspective it's still sound,
    // but it does not reflect the "scoping" of a task local resource
    #[task(priority = 1, binds = UART1, resources = [shared, l2, e1])]
    fn uart1(cx: uart1::Context) {
        let shared: &mut u32 = cx.resources.shared;
        *shared += 1;

        hprintln!("UART1: shared = {}", shared).unwrap();
        hprintln!("UART1:l2 = {}", cx.resources.l2).unwrap();
        hprintln!("UART1:e1 = {}", cx.resources.e1).unwrap();
    }
    // if priority is changed we should report a better error message
    // currently, we get an error since RTFM detects a potential race
};
UART0: shared = 1
UART0:l2 = 2
UART0:e1 = 11
UART1: shared = 2
UART1:l2 = 2
UART1:e1 = 11
IDLE:l1 = 1
IDLE:e2 = 2

Semantics:

  • #[lock_free], there might be several tasks with the same priority accessing the resource without a critical section.
  • #[task_local], there must be only one task accessing the resource, similar to a task local variable, but (optionally) set up by init. This is similar to move. (Using the keyword move causes problems since it is reserved; r#move would be acceptable but somewhat ugly.)

Implementation:

  • Implementation complexity Low.
  • Run-time OH. Zero.
  • Guarantees. A symmetric API for shared resources. The intention of lock_free resources will be clear.

Ergonomics/Usability:
Syntax can be discussed. It should however indicate the semantics in an intuitive way. lock_free refers to the scheduling (you get exclusive access); with task_local the resource is moved into the task (becoming a task local resource).

Limitations:
This will not allow further move of resources at run-time (following the static nature and guarantees by RTFM).

Alternative solutions.
RTFM already allows you to express this behaviour implicitly, so doing nothing is an option. However, these new constructs might appeal to the non-RTFM:ers out there. With the current implementation, the asymmetric API can be confusing. This approach would go well with #17, letting the designer have explicit control over task_local and lock_free resources.

Implications.
A stepping stone towards a fully symmetric API. We can also get rid of task local variables declared as statics; now they will be declared together with other resources instead, for good or bad, not sure.


  • Resources behind #[cfg(...)] cannot be statically analysed. Mitigation may be possible by generating run-time errors in setup code and/or generating code that the compiler statically rejects.

Soundness of RTIC

At some point we want formal proofs of RTIC correctness. How exactly remains to be seen (and what exact properties should be proven). For now we can start by annotating the code with informal argumentation:

2022-01-25 Small update to v6m implementation.

/// Lock implementation using BASEPRI and global CS
///
/// # Safety
///
/// The system ceiling is raised from current to ceiling
/// by either
/// - raising the BASEPRI to the ceiling value, or
/// - disabling all interrupts in case we want to
///   mask interrupts with maximum priority
///
/// Dereferencing a raw pointer inside CS
///
/// The priority.set/priority.get can safely be outside of the CS
/// As being a context local cell (not affected by preemptions).
/// It is merely used in order to omit masking in case current
/// priority is current priority >= ceiling.
///
/// Lock Efficiency:
/// Experiments validate (sub)-zero cost for CS implementation
/// (Sub)-zero as:
/// - Either zero OH (lock optimized out), or
/// - Amounting to an optimal assembly implementation
///   - The BASEPRI value is folded to a constant at compile time
///   - CS entry, single assembly instruction to write BASEPRI
///   - CS exit, single assembly instruction to write BASEPRI
///   - priority.set/get optimized out (their effect not)
/// - On par or better than any hand written implementation of SRP
///
/// Limitations:
/// The current implementation reads/writes BASEPRI once
/// even in some edge cases where this may be omitted.
/// Total OH per task is max 2 clock cycles, negligible in practice
/// but can in theory be fixed.
///
#[cfg(armv7m)]
#[inline(always)]
pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    nvic_prio_bits: u8,
    _mask: &[u32; 4],
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get();

    if current < ceiling {
        if ceiling == (1 << nvic_prio_bits) {
            priority.set(u8::max_value());
            let r = interrupt::free(|_| f(&mut *ptr));
            priority.set(current);
            r
        } else {
            priority.set(ceiling);
            basepri::write(logical2hw(ceiling, nvic_prio_bits));
            let r = f(&mut *ptr);
            basepri::write(logical2hw(current, nvic_prio_bits));
            priority.set(current);
            r
        }
    } else {
        f(&mut *ptr)
    }
}

And for the upcoming v6m SRP implementation.

/// Lock implementation using interrupt masking
///
/// # Safety
///
/// The system ceiling is raised from current to ceiling
/// by computing a 32 bit `mask` (1 bit per interrupt)
/// 1: ceiling >= priority > current
/// 0: else
///
/// On CS entry, `clear_enable_mask(mask)` disables interrupts
/// On CS exit,  `set_enable_mask(mask)` re-enables interrupts
///
/// The priority.set/priority.get can safely be outside of the CS
/// As being a context local cell (not affected by preemptions).
/// It is merely used in order to omit masking in case current
/// priority is current priority >= ceiling.
///
/// Dereferencing a raw pointer is done safely inside the CS
///
/// Lock Efficiency:
/// Early experiments validate (sub)-zero cost for CS implementation
/// (Sub)-zero as:
/// - Either zero OH (lock optimized out), or
/// - Amounting to an optimal assembly implementation
///   - if ceiling == (1 << nvic_prio_bits)
///     - we execute the closure in a global critical section (interrupt free)
///     - CS entry cost, single write to core register
///     - CS exit cost, single write to core register
///   else
///     - The `mask` value is folded to a constant at compile time
///     - CS entry, single write of the 32 bit `mask` to the `icer` register
///     - CS exit, single write of the 32 bit `mask` to the `iser` register
/// - priority.set/get optimized out (their effect not)
/// - On par or better than any hand written implementation of SRP
///
/// Limitations:
/// Current implementation does not allow for tasks with shared resources
/// to be bound to exception handlers, as these cannot be masked in HW.
///
/// Possible solutions:
/// - Mask exceptions by global critical sections (interrupt::free)
/// - Temporary lower exception priority
///
/// These possible solutions are set goals for future work
#[cfg(not(armv7m))]
#[inline(always)]
pub unsafe fn lock<T, R>(
    ptr: *mut T,
    priority: &Priority,
    ceiling: u8,
    nvic_prio_bits: u8,
    masks: &[u32; 4],
    f: impl FnOnce(&mut T) -> R,
) -> R {
    let current = priority.get();
    if current < ceiling {
        if ceiling == (1 << nvic_prio_bits) {
            // safe to manipulate outside critical section
            priority.set(ceiling);
            // execute closure under protection of raised system ceiling
            let r = interrupt::free(|_| f(&mut *ptr));
            // safe to manipulate outside critical section
            priority.set(current);
            r
        } else {
            // safe to manipulate outside critical section
            priority.set(ceiling);
            let mask = compute_mask(current, ceiling, masks);
            clear_enable_mask(mask);

            // execute closure under protection of raised system ceiling
            let r = f(&mut *ptr);

            set_enable_mask(mask);

            // safe to manipulate outside critical section
            priority.set(current);
            r
        }
    } else {
        // execute closure without raising system ceiling
        let r = f(&mut *ptr);
        r
    }
}

#[cfg(not(armv7m))]
#[inline(always)]
fn compute_mask(from_prio: u8, to_prio: u8, masks: &[u32; 4]) -> u32 {
    let mut res = 0;
    masks[from_prio as usize..to_prio as usize]
        .iter()
        .for_each(|m| res |= m);
    res
}

// enables interrupts
#[cfg(not(armv7m))]
#[inline(always)]
unsafe fn set_enable_mask(mask: u32) {
    (*NVIC::ptr()).iser[0].write(mask)
}

// disables interrupts
#[cfg(not(armv7m))]
#[inline(always)]
unsafe fn clear_enable_mask(mask: u32) {
    (*NVIC::ptr()).icer[0].write(mask)
}
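The mask folding in compute_mask can be checked without hardware. The sketch below is a host-side copy of the function from the listing above, exercised with hypothetical per-priority masks (one bit per interrupt running at that level):

```rust
// Host-side copy of `compute_mask` from the listing above, so the mask
// folding can be checked without hardware.
fn compute_mask(from_prio: u8, to_prio: u8, masks: &[u32; 4]) -> u32 {
    let mut res = 0;
    masks[from_prio as usize..to_prio as usize]
        .iter()
        .for_each(|m| res |= m);
    res
}

fn main() {
    // Hypothetical per-priority-level masks: one bit per interrupt that
    // runs at that level.
    let masks = [0b0001, 0b0110, 0b1000, 0b0000];
    // Raising the ceiling from 0 to 2 masks the union of levels 0 and 1.
    assert_eq!(compute_mask(0, 2, &masks), 0b0111);
    // Already at the ceiling: nothing to mask.
    assert_eq!(compute_mask(2, 2, &masks), 0);
}
```

Since the priorities and masks are constants known at compile time, LLVM can fold the whole computation into the single constant that is written to ICER/ISER, which is what the "(sub)-zero cost" claim above relies on.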

Similarly we could provide argumentation for the computation of ceilings, the trampolines, etc. Of course we also need to argue correctness at a higher abstraction level, and refer to the annotated code.

/ Per

Type based Layout for Rust RTIC

Background

Problem:

Detailed information on the memory layout (implied address/size) of RTIC resources has several important use cases (run-time verification, documentation and perhaps even static analysis).

Suggested solution:

A trait based layout system, building on the properties given by the Rust language together with the declarative model of RTIC.

layout-trait

A prototype is found here layout-trait.

pub trait GetLayoutType {
    fn get_layout_type<const N: usize>(layout: &mut heapless::Vec<Layout, N>);
}

impl<T> GetLayoutType for T {
    default fn get_layout_type<const N: usize>(_layout: &mut heapless::Vec<Layout, N>) {}
}

impl<T, U> GetLayoutType for T
where
    // for now assume this to ZST peripheral proxy
    T: Deref<Target = U>,
{
    default fn get_layout_type<const N: usize>(layout: &mut heapless::Vec<Layout, N>) {
        // hopefully there is a better way to do this
        // for now we crate a &ZST out of thin air!!!
        let t: &T = unsafe { core::mem::transmute(&()) };
        let data = t.deref();
        layout
            .push(Layout {
                address: data as *const _ as usize,
                size: core::mem::size_of_val(data),
            })
            .unwrap();
    }
}

This handles the case of ZST peripheral proxies (which implement Deref for access to the corresponding peripheral RegisterBlock).

Other structured data is treated as thin references (&T).

pub trait GetLayout {
    fn get_layout<const N: usize>(&self, layout: &mut heapless::Vec<Layout, N>);
}

impl<T> GetLayout for T
{
    default fn get_layout<const N: usize>(&self, layout: &mut heapless::Vec<Layout, N>) {
        layout
            .push(Layout {
                address: self as *const _ as usize,
                size: core::mem::size_of_val(self.deref()),
            })
            .unwrap();

        T::get_layout_type(layout);
    }
}

impl<T, U> GetLayout for T
where
    T: Deref<Target = U>,
{
    fn get_layout<const N: usize>(&self, layout: &mut heapless::Vec<Layout, N>) {
        let data = self.deref();
        layout
            .push(Layout {
                address: data as *const _ as usize,
                size: core::mem::size_of_val(data),
            })
            .unwrap();
    }
}

A heapless::Vec<Layout> is used to represent the set of memory areas (ranges) accessible.
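To make the representation concrete, a Layout entry is just an address/size pair. A minimal host-side sketch of collecting layouts (using a std Vec instead of heapless to stay dependency-free; the helper name is hypothetical, corresponding to the default thin-reference case of get_layout):

```rust
// Minimal stand-in for the `Layout` entry used by layout-trait: a plain
// address/size pair describing one reachable memory range.
#[derive(Debug, PartialEq)]
struct Layout {
    address: usize,
    size: usize,
}

// Sketch of the default (thin-reference) case of `get_layout`: the only
// reachable range is the value itself.
fn get_layout_of<T>(v: &T, layout: &mut Vec<Layout>) {
    layout.push(Layout {
        address: v as *const _ as usize,
        size: core::mem::size_of_val(v),
    });
}

fn main() {
    let data: u32 = 0;
    let data2: u64 = 0;
    let mut layout = Vec::new();
    get_layout_of(&data, &mut layout);
    get_layout_of(&data2, &mut layout);
    assert_eq!(layout.len(), 2);
    assert_eq!(layout[0].size, 4);
    assert_eq!(layout[1].size, 8);
}
```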


In order to deal with type nesting a prototype of a custom derive is provided here layout-derive.

#[derive(Layout)]
struct Simple {
    data: u32,
    data2: u64,
}

Expands to:

...

impl layout_trait::GetLayout for Simple {
    fn get_layout<const N: usize>(
        &self,
        layout: &mut layout_trait::heapless::Vec<layout_trait::Layout, N>,
    ) {
        self.data.get_layout(layout);
        self.data2.get_layout(layout);
    }
}
impl layout_trait::GetLayoutType for Simple {
    fn get_layout_type<const N: usize>(
        layout: &mut layout_trait::heapless::Vec<layout_trait::Layout, N>,
    ) {
        <u32>::get_layout_type(layout);
        <u64>::get_layout_type(layout);
    }
}

Derive functionality is implemented for structured data (structs, tuples, enums) with generic type parameters. (Unions are still TODO, but should follow the enum pattern).


Evaluation and soundness

The layout-trait and layout-derive repositories include example use cases. The use cases are properly handled and showcase the feasibility of the approach. However, a proper testing framework is not yet in place (consider the work a POC).

Regarding soundness: We make here the assumption that all types that provide Deref are zero-sized (this is indeed the case for the peripheral abstraction, but please be warned, since we create an instance out of thin air, which is UB for non-ZSTs).


Peripheral Proxies and HALs

The experiments are based on the assumption that peripheral access is based on svd2rust-generated PACs. svd2rust generates proxies that implement Deref, mapping the RegisterBlock struct to the corresponding address in memory based on the SVD information. The (default) assumption here is that the generated proxies represent non-overlapping regions in memory.

RTIC takes the peripheral proxies. Moving individual proxies into RTIC resources is possible. RTIC then guarantees unique access to peripherals during task execution.

So far so good, but here comes the HAL.

HALs might intentionally break the uniqueness (alias-free) property of svd2rust resource proxies (and that for good reasons). One can even argue that Rust aliasing rules are not invalidated by splitting a RegisterBlock into its parts (individual registers). However, doing so based on the current svd2rust resource proxies requires unsafe code.

As an example, a GPIO.split() implementation consumes the GPIO and produces an array of individual GPIO pins.

So how do we get the layout for these GPIO pins? Well there are several options here.

  1. the HAL implementation explicitly implements GetLayout/GetLayoutType for the returned data structure. This might provide a high fidelity resolution of addresses reachable through each part.

  2. the returned type wraps an unsafe "copy" of the whole GPIO proxy, and uses layout-derive on the type. In essence this is a conservative representation, stating that each part of the GPIO is allowed access to the whole GPIO block. Seen from an RTIC perspective this means that the HAL breaks the aliasing guarantee. But as argued earlier, each individual access to the underlying HW is atomic w.r.t. other accesses, so it is still sound.

  3. Developing a new type of peripheral resource abstraction where each register implements Deref. In this case, the HAL can freely combine parts into structs (turning the problem around: instead of split, we combine). Individual registers that should belong to several HAL abstractions could use an internal unsafe copy.


Related work

Async idle

The networking and async I/O support in embedded Rust is slowly getting better. For that support to be accessible in RTIC, I think it would make sense to allow the idle task to be async.

e.g.

#[idle]
async fn idle(cx: idle::Context) {
    ...
}

The regular idle would still work of course, but when the function is prefixed with async, it's allowed to return (and would return an impl Future<Output = ()>).

Users can already run an executor in the idle task of course, but having an executor built into RTIC would let it do smart things by default, like wfi when the future isn't running.

Future Possibilities

This could eventually be extended to allow for all tasks to be async, but idle seems like a safe bet first.

Emulation of Monotonic Timers

This is a placeholder for discussing emulation of ARM targets.

Currently we use qemu for examples and some CI tests, on the model of the TI Stellaris LM3S6965.

The PAC is hand generated, but there is an svd2rust generated counterpart.
pac

As the qemu model supports timers, we could potentially implement Monotonic for it and have tests running in CI without HW.

qemu

The Luminary Micro Stellaris LM3S811EVB emulation includes the following devices:

    Cortex-M3 CPU core.

    64k Flash and 8k SRAM.

    Timers, UARTs, ADC and I2C interface.

    OSRAM Pictiva 96x16 OLED with SSD0303 controller on I2C bus.

The Luminary Micro Stellaris LM3S6965EVB emulation includes the following devices:

    Cortex-M3 CPU core.

    256k Flash and 64k SRAM.

    Timers, UARTs, ADC, I2C and SSI interfaces.

    OSRAM Pictiva 128x64 OLED with SSD0323 controller connected via SSI.

Alternatively, we could try out
xpack

A fork of qemu supporting

    STM32F103RB
    STM32F107VC
    STM32F405RG
    STM32F407VG
    STM32F407ZG
    STM32F429ZI
    STM32L152RE

This would potentially allow prototyping and CI testing of timer implementations on popular targets.

Another alternative is to look into the renode project, but that might be harder to get to work in a CI setting.
renode

Synchronous task

Synchronous tasks

In the originating work on Real Time for the Masses, tasks could also be run synchronously, i.e., one could call into other tasks as function calls. This allowed re-factoring code along the following lines (not actual syntax):

task a () // no resources used
  c (some_args) // synchronous invocation of task_c


task b () // no resources used
  c (some_args) // sync call to task_c

task c (some_parameters) // task that uses X
  X.lock(
     ...
  )

The static analysis could figure out that X was (synchronously) accessed by tasks a and b. The resource usage was local to c, so you did not need to declare it as a resource of both a and b.

Also, as full program analysis was done, you did not need to declare the use of resources; it was derived in the analysis.

In RTIC we don't aim for full program analysis (re-parsing all Rust is out of scope here), so deriving resource usage and call structure is not possible. Nevertheless, we could declare sync = [c] in the task attribute and have cx.sync.c(some_arg) to achieve this type of re-factoring. Not sure if it's a common enough pattern to make it worth implementing.
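A host sketch of what such a sync call could lower to (all names hypothetical; a std Mutex stands in for the SRP lock on resource X): the sync = [c] attribute would generate a plain function call into task c, and c performs its own locking, so callers need not declare X.

```rust
use std::sync::Mutex;

// Task `c` uses resource `X`; the resource usage stays local to `c`.
// A std Mutex stands in for the SRP ceiling-based lock on the host.
fn task_c(x: &Mutex<u32>, v: u32) {
    *x.lock().unwrap() += v;
}

// Tasks `a` and `b` use no resources themselves; they invoke `c`
// synchronously, as an ordinary function call.
fn task_a(x: &Mutex<u32>) {
    task_c(x, 1)
}
fn task_b(x: &Mutex<u32>) {
    task_c(x, 2)
}

fn main() {
    let x = Mutex::new(0);
    task_a(&x);
    task_b(&x);
    assert_eq!(*x.lock().unwrap(), 3);
}
```

For ceiling analysis, RTIC would still have to treat X as if it were accessed by a and b, which is exactly the information the sync = [c] declaration would provide.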

Resource handling take 2

After further evaluation of #30, here comes a second take on the problem.

First sketch (this will be updated).

...
#[rtic::app(device = ... )]
mod app {
  
  #[shared] 
  struct Shared {
    x: u64,
    
    #[lock_free] 
    y: i64,  
  }

  #[local]
  struct Local {
    tx: Producer<'static, u8, U4>,
    rx: Consumer<'static, u8, U4>,
  }

  #[init(local = [queue: Queue<u8, U4> = Queue::new()])]
  fn init(cx: init::Context) -> (Shared, Local) {
    let (tx, rx) = cx.local.queue.split();
    (Shared { x: 0, y: 1 }, Local { tx, rx })
  }

  #[task(shared = [x, y], local = [l: u8 = 0])]
  fn foo(mut cx: foo::Context) {
    *cx.local.l += 1; 
    *cx.shared.y += 1; 
    cx.shared.x.lock(|x| *x += 1);
  }

  #[task(shared = [x, y], local = [rx])]
  fn bar(mut cx: bar::Context) {
    if let Some(v) = cx.local.rx.dequeue() {
      *cx.shared.y += v as i64;
      cx.shared.x.lock(|x| *x += v as u64);
    }
  }

  #[task(local = [tx])]
  fn baz(cx: baz::Context) {
    let _ = cx.local.tx.enqueue(1);
  }
}

Major takeaways.

  • init cannot take shared resources (rather, init will initialize the shared resources); as a consequence there is no #[init(...)]. The return of init is now the full Shared and Local structs instead of a "remastered" struct.

  • local attribute in init requires initialization (see init in example).

  • local attribute in task can be locally initialized (see foo in example).

  • local attribute in task can be initialized by init (see bar/baz in example). RTIC "moves" ownership from init into tasks (later taking ownership of the resource on spawn).

  • lock_free access is checked ensuring all tasks accessing the resource are at the same priority (hence lock free access of the shared resource is sound).

Pros/cons to RTIC 0.5, and #30.

  • Pros: The major advantage is that we can get rid of the "static mut transform". In essence all tasks are now pure Rust code. We get rid of #[init()] and all resources are now returned by init, thus no special case for "late resources" is needed any longer. Overall this gives a symmetric, simple and easy to understand abstraction.

  • Cons: Well, the "static mut transform" is ergonomic; now the user needs to move the static mut into the task attribute. This "cost" should be weighed against the advantages of pure Rust in all tasks. Besides that, there is no obvious loss of ergonomics.


Alternative light-weight syntax:

  ...
  #[task(shared = [x, y], local = [l: u8 = 0])]
  fn foo(mut cx: foo::Context) {
    *cx.l += 1; 
    *cx.y += 1; 
    cx.x.lock(|x| *x += 1);
  }

  #[task(shared = [x, y], local = [rx])]
  fn bar(mut cx: bar::Context) {
    if let Some(v) = cx.rx.dequeue() {
      *cx.y += v as u64; 
      cx.x.lock(|x| *x += v as u64);
    }
  }

RTIC on RISC-V - roadmap / tasks

Hi everyone!

I think it would be great to port RTIC to RISC-V, especially since the core support is already separated from the cortex-m code and RISC-V is going more and more mainstream.

Each and every day RISC-V gets more adoption on the silicon market. Notably, Espressif has launched its ESP32-C3 chip, which is RISC-V based and an easy enabler for Rust embedded WiFi applications. It might also expose the huge ESP32 maker community to Rust.

The main obstacle to wide Rust adoption on ESP32 was the lack of support in upstream LLVM for the xtensa-lx arch, which is not the case with RISC-V.

I guess everything else is more or less in place, like the embedded-hal implementation and an experimental WiFi library (see esp-rs). It needs further investigation, but I expect the peripherals to be very similar to the xtensa-lx version of the chip.

I would love to contribute along the way, but I am not in a position to make it entirely on my own, since I have a very limited skill set besides a couple of years of general embedded dev experience and a little RTIC experience from an end-user perspective.

I think plenty of users are very eager to make this happen, so perhaps spending some effort on providing a roadmap, along with a set of tasks small enough to play with as side projects, would make this happen.

Spawn and message queues.

Here we can discuss ideas around the spawn API and underlying implementation.

  • 2020-07-23 initial post, first thoughts/ideas.
  • 2020-10-04 spawn from anywhere
  • 2020-10-05 spawn from anywhere is here, POC implementation available
  • 2020-10-18 implemented and merged to master for testing in upcoming 0.6-alpha

Background

RTIC builds on the Stack Resource Policy (SRP), providing many outstanding properties (deadlock-free scheduling, single-stack execution, bounded priority inversion, etc.). This model has no notion of the internal communication within the system; spawn/schedule is implemented on top of the model (for which RTIC is responsible regarding both safety and analysis). It can be seen as if spawned or scheduled tasks emerge from the environment (and indeed they do under the hood, as pended interrupts). RTIC has knowledge of the sender/receiver relations through the app and task attributes, allowing us to generate RTIC resources for queues, timers, etc. under the hood.

Aside: multi-core support is not part of SRP; our approach essentially abstracts a multi-core system as partitioned into multiple single-core systems with local environments (so message passing goes from the sender's environment to the receiver's environment).

Message queues

Focusing in on message passing and safety: as mentioned in the Background, SRP does not really cover message passing/message queues directly. However, we may leverage the concept of resources for the implementation (locking of queues).

E.g., if we have two senders, a at priority 2 and b at priority 4, spawning a task c at priority 3, then we can see this as an MPSC queue. One approach is to adopt the RTIC resource abstraction and use locks on the sender side. One can also exploit the underlying HW for lock-free implementations (note that locks are not enough to deal with the multi-core case, as SRP-based locking offers only core-local protection).

Locking has the drawback of priority inversion, in the above, (other) tasks at priority 3 will be blocked due to queue locks from task a (priority 2), as the send queue would have the ceiling 4. Notice though that the blocking is non-transitive (a property of SRP, see below).
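To make the sender-side locking concrete, here is a host sketch (all names hypothetical; on target the std Mutex would be replaced by an SRP lock whose ceiling is the highest sender priority, here max(2, 4) = 4, and the dispatcher would be a pended interrupt at priority 3):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Host stand-in for the generated spawn queue of task `c`. On target, the
// std Mutex would be an SRP lock with ceiling = max sender priority.
struct SpawnQueue(Mutex<VecDeque<u32>>);

impl SpawnQueue {
    fn new() -> Self {
        SpawnQueue(Mutex::new(VecDeque::new()))
    }

    // Roughly what `c::spawn(msg)` lowers to on the sender side (a or b);
    // on target, pending the priority-3 dispatcher would follow the push.
    fn spawn(&self, msg: u32) {
        self.0.lock().unwrap().push_back(msg);
    }

    // The dispatcher side: task `c` drains its queue in FIFO order.
    fn dispatch(&self) -> Vec<u32> {
        self.0.lock().unwrap().drain(..).collect()
    }
}

fn main() {
    let q = SpawnQueue::new();
    q.spawn(1); // sender `a` at priority 2
    q.spawn(2); // sender `b` at priority 4
    assert_eq!(q.dispatch(), vec![1, 2]);
}
```

The ceiling-4 lock is what causes the priority inversion discussed above: while a (priority 2) holds the queue lock, unrelated priority-3 tasks are blocked for the duration of the push.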

Schedulability

Under SRP, we have bounded priority inversion. A task at priority i is blocked at most for the longest critical section of any lower-priority task j < i accessing a resource with ceiling k >= i. This may sound like theoretical "mumbo jumbo", but it brings a valuable insight: if the critical sections for queue accesses are short enough, they will NEVER dominate the blocking (other resource accesses will).

Let us assume instead that we treat all queues as having a ceiling equal to the maximum priority of the system; then we can simply use an interrupt-free critical section. If this critical section is only a few machine instructions, the impact on schedulability will be small. (In comparison, only the tasks at higher priority than the highest-priority sender in the system would actually benefit from resource-based locking.)
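
A minimal host-side sketch of such a globally-locked queue (a hedged model, not RTIC's actual implementation; `with_cs` stands in for `cortex_m::interrupt::free` on the target):

```rust
/// Host-side stand-in for `cortex_m::interrupt::free`: on the target this
/// would disable interrupts, run `f`, and then restore PRIMASK.
fn with_cs<R>(f: impl FnOnce() -> R) -> R {
    f()
}

/// A fixed-capacity FIFO where every access is wrapped in the global
/// critical section; each section is only a handful of instructions long.
struct SpawnQueue<T, const N: usize> {
    buf: [Option<T>; N],
    head: usize,
    len: usize,
}

impl<T, const N: usize> SpawnQueue<T, N> {
    fn new() -> Self {
        Self { buf: [(); N].map(|_| None), head: 0, len: 0 }
    }

    fn enqueue(&mut self, v: T) -> Result<(), T> {
        with_cs(|| {
            if self.len == N {
                return Err(v); // capacity reached: reject the spawn
            }
            self.buf[(self.head + self.len) % N] = Some(v);
            self.len += 1;
            Ok(())
        })
    }

    fn dequeue(&mut self) -> Option<T> {
        with_cs(|| {
            if self.len == 0 {
                return None;
            }
            let v = self.buf[self.head].take();
            self.head = (self.head + 1) % N;
            self.len -= 1;
            v
        })
    }
}
```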

What about lock-free queues? Using lock-free data structures has both pros and cons. It (typically (1)) requires HW support for atomics, and execution time is non-uniform. For Cortex-M3 and above, the execution time (number of possible retries) depends on the number of preemptions, independent of actual contention. If the inter-arrival time between interrupts at each priority level is known, this can be taken into account in the analysis. However, atomics are not supported in HW on the M0.

As mentioned above, for multi-core RTIC critical sections (locks) do not suffice; we need either HW atomics or some software synchronization.
(1) https://en.wikipedia.org/wiki/Dekker%27s_algorithm
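
For illustration, the lock-free retry behavior can be sketched with a CAS loop over a free-slot counter (a hedged host-side model; the names are illustrative):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hedged sketch of the lock-free alternative: claim a free slot by
/// decrementing a counter with a CAS retry loop. On armv7-m this maps to
/// an LDREX/STREX pair whose store fails if a preemption happens in
/// between (hence the retry); armv6-m (Cortex-M0) lacks these
/// instructions, as noted above.
fn try_claim_slot(free_slots: &AtomicUsize) -> Option<usize> {
    let mut cur = free_slots.load(Ordering::Relaxed);
    loop {
        if cur == 0 {
            return None; // no capacity left: reject the spawn
        }
        match free_slots.compare_exchange_weak(
            cur,
            cur - 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return Some(cur - 1),
            Err(seen) => cur = seen, // lost a race (or spurious fail): retry
        }
    }
}
```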

Why bother?

As of now, we provide a spawn API where each spawnable task is a method on the spawn struct provided to the task context. It is unclear how (or even if) the user can install a callback to RTIC tasks when creating a driver using the current API. Since the underlying SRP scheduling does not really care from where a task is requested for execution, this should be possible and a welcome extension to the programming model (perhaps also facilitating the triggering mechanism for async dispatchers, but that is another story). The only thing we need to ascertain is that the queue management is safe.

Potential solutions.

  • Make spawn of messages "free for all" (i.e. a direct API to the ready queue for the task). This would allow the spawns list to be removed from the task attribute and allow callbacks to be used freely (e.g., as part of initializing a driver). It requires either lock-free queues or the adoption of global critical sections (the latter may not be terrible for schedulability, as discussed above).

  • Change or add to the context a more general way to get a handle to the specific spawn, e.g. cx.spawn(task), where task implements some trait Spawnable. Here we have to make sure that the priority used in the locking mechanism is correctly handled (an under-the-hood detail). This requires the RTIC task that invokes the driver (typically on behalf of an interrupt) to somehow pass the current priority (part of the context), as opposed to the "free for all" approach discussed above.


Spawn from anywhere, 2020-10-04

  • Use global critical section for the underlying queue (low implementation effort, CS for just allocation/de-alloc should be enough)
  • Optimization, for tasks without arguments we can implement the queue as a simple counter.
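
The counter optimization for argument-less tasks could look roughly like this (a hedged sketch; `CAPACITY`, `PENDING`, and `spawn_no_args` are illustrative names, not the real codegen):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hedged sketch of the counter optimization: for a task that takes no
/// arguments there is nothing to store, so the pending-spawn queue
/// degenerates to a saturating counter.
const CAPACITY: usize = 4;
static PENDING: AtomicUsize = AtomicUsize::new(0);

fn spawn_no_args() -> Result<(), ()> {
    PENDING
        .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |n| {
            // only increment while below capacity; `None` aborts the update
            if n < CAPACITY { Some(n + 1) } else { None }
        })
        .map(|_| {
            // the target code would `rtic::pend(...)` the dispatcher here
        })
        .map_err(|_| ())
}
```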

Syntax:

#[rtic::app(device = lm3s6965)]
mod app {
    #[init(spawn = [foo, foo2])]
    fn init(_c: init::Context) {
        foo::spawn(1, 2).unwrap();
    }

    #[task()]
    fn foo(_c: foo::Context, x: i32, y: u32) {
        hprintln!("foo {}, {}", x, y).unwrap();
        if x == 2 {
            debug::exit(debug::EXIT_SUCCESS);
        }
        foo2::spawn(2).unwrap();
    }

    #[task()]
    fn foo2(_c: foo2::Context, x: i32) {
        hprintln!("foo2 {}", x).unwrap();
        foo::spawn(x, 0).unwrap();
    }

    // RTIC requires that unused interrupts are declared in an extern block when
    // using software tasks; these free interrupts will be used to dispatch the
    // software tasks.
    extern "C" {
        fn SSI0();
    }
}

Implementation/code gen:

Each priority level p holding software tasks will be associated with a dispatcher D(p).
A dispatcher D(p) holds an array [Q; n], where each
Q(c, d[a; c]) has an optional capacity c and static data storage d[a; c] for arguments a with size c; in case a = {} (no arguments), d[{}; c] is a simple counter.

The dispatcher also holds a FIFO [i; sum(c)] of tasks spawned (but not yet dispatched); this allows implementing fairness and ordering between tasks at the same priority level. If the capacity of a single task is reached, the spawn is rejected and never inserted in the FIFO. When the dispatcher runs it always picks the "oldest" task to run. Spawning a task amounts to pending the associated interrupt.

We need to generate code for

  1. the dispatcher
    Dequeues the FIFO, i = FIFO.deq, and calls dispatch on Q[i], which dequeues the arguments and calls the corresponding software task; this is done in a while loop until the FIFO is empty.
  2. the spawn API
    Each software task amounts to a module, which contains a free function fn spawn(args) -> Result<(), (args)> that enqueues the arguments; if capacity is not reached it will enqueue its index in the FIFO (args matching the task's payload signature).
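
To make steps 1 and 2 concrete, here is a hedged host-side model of the dispatcher/FIFO interplay (Vec-backed queues stand in for the generated static queues; `Task` mirrors the per-priority enum, `P1_T` in the POC):

```rust
use std::collections::VecDeque;

/// Host model of a dispatcher D(p) for one priority level with two
/// software tasks; real code uses static, fixed-capacity queues.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Task {
    Foo,
    Foo2,
}

struct Dispatcher {
    fifo: VecDeque<Task>,             // ready FIFO: preserves spawn order
    foo_inputs: VecDeque<(i32, u32)>, // argument storage per task
    foo2_inputs: VecDeque<i32>,
}

impl Dispatcher {
    /// Step 1: drain the FIFO, always running the oldest spawn first.
    fn run(&mut self, log: &mut Vec<String>) {
        while let Some(task) = self.fifo.pop_front() {
            match task {
                Task::Foo => {
                    let (x, y) = self.foo_inputs.pop_front().unwrap();
                    log.push(format!("foo({}, {})", x, y));
                }
                Task::Foo2 => {
                    let x = self.foo2_inputs.pop_front().unwrap();
                    log.push(format!("foo2({})", x));
                }
            }
        }
    }

    /// Step 2: a spawn enqueues the arguments and then the task index.
    fn spawn_foo(&mut self, x: i32, y: u32) {
        self.foo_inputs.push_back((x, y));
        self.fifo.push_back(Task::Foo);
        // the target code would also `rtic::pend(...)` the dispatcher
    }
}
```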

POC implementation, using mod instead of const.

https://github.com/rtic-rs/cortex-m-rtic/tree/spawn_experiment

For a software task (foo in the above), the following code is generated.

///Software task
pub mod foo {
    ...
    pub fn spawn(_0: i32, _1: u32) -> Result<(), (i32, u32)> {
        use rtic::Mutex as _;
        let input = (_0, _1);
        unsafe {
            if let Some(index) = crate::app::foo_FQ.dequeue() {
                crate::app::foo_INPUTS
                    .get_unchecked_mut(usize::from(index))
                    .as_mut_ptr()
                    .write(input);
                crate::app::P1_RQ.enqueue_unchecked((crate::app::P1_T::foo, index));
                rtic::pend(lm3s6965::Interrupt::SSI0);
                Ok(())
            } else {
                Err(input)
            }
        }
    }
}

The actual code generation is done in macros/src/codegen/module, since we generate the spawn function in the namespace of the software task.

The corresponding code:

items.push(quote!(
        #(#cfgs)*
        pub fn spawn(#(#args,)*) -> Result<(), #ty> {
            // #let_instant // do we need it?
            use rtic::Mutex as _;

            let input = #tupled;
            // TODO: use critical section, now we are unsafe
            unsafe {
                if let Some(index) = #app_path::#fq.dequeue() {
                    #app_path::#inputs
                        .get_unchecked_mut(usize::from(index))
                        .as_mut_ptr()
                        .write(input);

                    // #write_instant, do we need?

                    #app_path::#rq.enqueue_unchecked((#app_path::#t::#name, index));

                    #pend

                    Ok(())
                } else {
                    Err(input)
                }
            }
        }));

The current POC is likely unsound, we need critical sections or atomic/lock free implementations of the underlying queues.

Another caveat is that the RTIC channel analysis is a bit too smart: if a software task is neither spawned nor dispatched, RTIC will not generate the corresponding queues (and dispatcher). For spawn from anywhere, the implementation needs to change to always generate this code. (The analysis is done in the syntax crate.) If moving to the new spawn from anywhere, the syntax can be simplified, as well as the accompanying analysis and code generation.

Some remarks:
We need to decide where to put the queues. For now they all reside in the centralized namespace of the RTIC mod app, prefixed with the task name. However, one could think of using the enclosing modules instead. This is mostly a matter of taste: access to the queues outside the API remains unsafe (due to the static declarations), so it's not really a matter of safety, nor of end-user convenience or application performance (it only matters for the implementation behind the scenes).

The dispatcher is currently untouched, and the new spawn implementation can live concurrently with the old spawn. The implementation does not yet completely follow that of the sketched design.

[RFC] Restartable sequences

Summary

Add functionality similar to Linux's restartable sequences, which allows for the running of code sequences atomically, without disabling interrupt handlers.

Motivation

This allows the execution of side-effect-free code atomically without blocking interrupts. For example, a device could repeatedly poll its input pins in the idle task, and continuously commit a fresh, but consistent, view of the state of its inputs to be read and used by other tasks.

Design

Add a per-core sequence number that's atomically incremented at the beginning of every interrupt handler, and run a given lambda until we have the same sequence number before and after.
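
A minimal sketch of such a counter (hedged; on the target the increment would be injected into every interrupt handler's prologue):

```rust
use core::sync::atomic::{AtomicU32, Ordering};

/// Hedged sketch of the per-core sequence counter assumed by the design.
static SEQ: AtomicU32 = AtomicU32::new(0);

/// Called at the top of every interrupt handler.
fn on_interrupt_entry() {
    SEQ.fetch_add(1, Ordering::Release);
}

/// Read the current sequence number (the `get_sequence_number` helper
/// referenced by the `RSeq` sketch).
fn get_sequence_number() -> u32 {
    SEQ.load(Ordering::Acquire)
}
```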

The implementation should hopefully be fairly straight-forward, and will probably resemble something like the following:

struct RSeq<T, F: Fn() -> Result<T, ()>> {
  f: F,
}

impl<T: Copy, F: Fn() -> Result<T, ()>> RSeq<T, F> {
  fn new(f: F) -> RSeq<T, F> {
    RSeq {
      f
    }
  }

  /// Get the result of an atomically executed restartable sequence.
  fn run(self) -> Result<T, ()> {
    loop {
      let before: u32 = get_sequence_number();
      let value = (self.f)()?;
      let after: u32 = get_sequence_number();
      if before == after {
        return Ok(value);
      }
    }
  }

  /// Execute a restartable sequence and commit its result, atomically.
  fn commit<U>(self, g: impl Fn(T) -> U) -> Result<U, ()> {
    loop {
      let before: u32 = get_sequence_number();
      let value = (self.f)()?;

      let commit = interrupt::free(|| {
        let after: u32 = get_sequence_number();
        if before == after {
          Some(g(value))
        } else {
          None
        }
      });

      if let Some(result) = commit {
        return Ok(result);
      }
    }
  }
}

Example

(handwavey strawman)

static mut INPUT_VALUE: AtomicUsize = AtomicUsize::new(0);
static mut BUTTON0_PIN: Option<gpio::gpioa::PA0<gpio::Input<gpio::PullUp>>> = None;
static mut BUTTON1_PIN: Option<gpio::gpioa::PA1<gpio::Input<gpio::PullUp>>> = None;
static mut BUTTON2_PIN: Option<gpio::gpioa::PA2<gpio::Input<gpio::PullUp>>> = None;
static mut BUTTON3_PIN: Option<gpio::gpioa::PA3<gpio::Input<gpio::PullUp>>> = None;

#[app(device = stm32f1xx_hal::stm32)]
const APP: () = {
  #[init]
  fn init() {
    let mut gpioa = device.GPIOA.split(&mut rcc.apb2);

    unsafe {
      BUTTON0_PIN = Some(gpioa.pa0.into_pull_up_input(&mut gpioa.crl));
      BUTTON1_PIN = Some(gpioa.pa1.into_pull_up_input(&mut gpioa.crl));
      BUTTON2_PIN = Some(gpioa.pa2.into_pull_up_input(&mut gpioa.crl));
      BUTTON3_PIN = Some(gpioa.pa3.into_pull_up_input(&mut gpioa.crl));
    };
  }

  #[interrupt]
  fn USB_HP_CXN_TX() {
    // Send the inputs across USB.
  }

  #[idle]
  fn idle() -> ! {
    loop {
      RSeq::new(|| {
        unsafe {
          Ok((
            BUTTON0_PIN.as_ref().unwrap().is_low(),
            BUTTON1_PIN.as_ref().unwrap().is_low(),
            BUTTON2_PIN.as_ref().unwrap().is_low(),
            BUTTON3_PIN.as_ref().unwrap().is_low(),
          ))
        }
      }).commit(|(b0, b1, b2, b3)| {
        unsafe {
          INPUT_VALUE.store(convert_to_bits(b0, b1, b2, b3), Ordering::Relaxed);
        }
      });
    }
  }
};

Unresolved questions

What should the return type of a restartable sequence be?

Three obvious options:

  • T
  • Option<T> (or Result<T, ()>)
    • +: lets the sequence bail out in commit without taking a critical section
  • Result<T, E>
    • +: lets the user return an error while bailing out in commit without taking a critical section
    • +: maybe lets the compiler generate better code if the error type is !?
    • -: breaks type inference?

What do we do if we can never finish?

If there's too much work in the restartable sequence relative to the frequency of interrupts, we might never actually commit, and it probably won't be obvious what's happening. Should we have a cap (user-specified? hard-coded?) on the number of retries we attempt before assuming that we're never going to succeed, so that we can return failure instead of silently doing nothing but generating heat?
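
A hedged sketch of what a capped variant of `run` could look like (the sequence-number source is passed in as a closure purely to keep the sketch host-testable):

```rust
/// Run the sequence at most `max_tries` times and report failure instead
/// of spinning forever.
fn run_capped<T>(
    mut seq: impl FnMut() -> u32,
    f: impl Fn() -> T,
    max_tries: u32,
) -> Result<T, ()> {
    for _ in 0..max_tries {
        let before = seq();
        let value = f();
        if seq() == before {
            return Ok(value); // no interrupt observed: commit
        }
    }
    Err(()) // interrupted on every attempt: surface the failure
}
```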

Is it possible to get rid of the horrible unsafes?

Shared access to resources (#129) might help?

Exporting app description during compilation for tracing purposes?

Yesterday I met with @perlindgren to discuss the state of RTIC Scope now that a v0.2.0 release is approaching. The topic of how RTIC Scope associates ITM packets with RTIC tasks was discussed. Currently, in a preparatory information recovery step before the target is flashed and traced, the source code of the RTIC application is parsed so that the #[app(...)] mod app { ... } module can be extracted and forwarded to rtic_syntax::parse2. From the returned structures, the hardware tasks and their binds are read, and thus all necessary information required to associate ITM packets (relating to interrupts) with RTIC tasks has been recovered.

This approach is not stable. Among other reasons, rtic_syntax is not meant to be used as a library, and it has yet to reach a stable release. Using it for information recovery will be a game of catch-up which I'd like to avoid. I believe it is of interest that RTIC Scope does not succumb to entropy (too quickly) after my thesis is done later this year. This will require off-loading some work to upstream RTIC instead.

During the meeting the possibility of extracting a description of the RTIC app during compilation came up. For example, a JSON description that the tracer (RTIC Scope, or something else) catches and deserializes. This description would, for example, contain a list of all the tasks and what interrupts they are bound to. This description could be locked behind some #[rtic::app(export_json_description=true)] argument flag.
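
As a strawman, the exported description might look something like the structures below (field names are illustrative, not a settled schema; a real implementation would derive serde's `Serialize`, while the hand-rolled formatting here just keeps the sketch dependency-free):

```rust
/// Hedged sketch of what the exported description might contain.
struct TaskDescription {
    name: String,
    /// `Some(interrupt)` for hardware tasks, `None` for software tasks.
    binds: Option<String>,
}

struct AppDescription {
    device: String,
    tasks: Vec<TaskDescription>,
}

impl AppDescription {
    /// Serialize manually; with serde this would be `serde_json::to_string`.
    fn to_json(&self) -> String {
        let tasks: Vec<String> = self
            .tasks
            .iter()
            .map(|t| {
                let binds = t
                    .binds
                    .as_deref()
                    .map(|b| format!("\"{}\"", b))
                    .unwrap_or_else(|| "null".to_string());
                format!("{{\"name\":\"{}\",\"binds\":{}}}", t.name, binds)
            })
            .collect();
        format!(
            "{{\"device\":\"{}\",\"tasks\":[{}]}}",
            self.device,
            tasks.join(",")
        )
    }
}
```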

Pros

  • Information structures are already available. #[rtic::app] just needs to export them to JSON. An initial implementation can probably derive serde's traits for these structures.
  • Less source code parsing required for tracers; easier implementations.
  • rtic_syntax will then only be used for these serde structures, unless they are moved to some other crate.

Cons

  • More to maintain in RTIC.

Possible pitfalls

For software tasks, an auxiliary cortex-m-rtic-trace crate can be used for its setup functions and #[trace] macro. During recovery the source code is parsed again so that these can be counted and associated with unique IDs and task names. #[trace] is a simple macro that wraps the decorated function with two statements: one for when the task enters, and one for when it exits. E.g.

#[trace]
fn some_task() {
    let _x = 42;
    
    #[trace]
    fn nested() {}
    nested()
}

#[trace]
fn some_other_task() {
    let _y = 42;
}

expands to

fn some_task() {
    cortex_m_rtic_trace::__software_task_enter(0);
    let _x = 42;
    fn nested() {
        cortex_m_rtic_trace::__software_task_enter(1);
        cortex_m_rtic_trace::__software_task_exit(1);
    }
    nested();
    cortex_m_rtic_trace::__software_task_exit(0);
}

fn some_other_task() {
    cortex_m_rtic_trace::__software_task_enter(2);
    let _y = 42;
    cortex_m_rtic_trace::__software_task_exit(2);
}

Can #[rtic::app] find the #[trace] macros and record them as some "unknown" macro attached to the associated function, adding it to the JSON description?


I'll let this simmer a bit and bring it up in a later weekly RTIC meeting. Afterwards I'll draft up an RFC if we decide to go ahead with this.

Anything to amend, @perlindgren?

[RFC] ride the Rust trains

Summary

Publish pre-releases of new minor versions, e.g. v0.5.0-beta.1, that target
the beta channel of the Rust compiler. When the beta release of the compiler
makes it way into stable publish the final release of the new minor version,
e.g. v0.5.0.

Motivation

The framework currently depends on (or works around) unstable core APIs like
MaybeUninit and unstable language features like const_fn, but thankfully these
features will soon make their way into stable.

We'd like to make the benefits of internally using these soon-to-be-stabilized
APIs / features to our stable users as soon as possible, but at the same time
we'd like to have some time to test these changes before publishing a new minor
version.

Doing pre-releases that target beta hits the sweet spot between these two
requirements. It gives us a 6 week test cycle while at the same time letting
stable users evaluate the next minor version of RTFM without switching to
the nightly channel.

Design

Let's illustrate the proposed timeline with an example.

Assume that MaybeUninit will be stabilized by 1.35. This means that the API
will not require a feature gate by 1.35-beta, which is scheduled for release
on April 12.

The idea would be to pre-release RTFM v0.5.0-alpha, which internally uses
MaybeUninit and requires rustc 1.35-beta, on that date or slightly later and
then have a 3-week alpha period. During the alpha period we are allowed to
make breaking changes to the API of RTFM, e.g. v0.5.0-alpha.2 changes an API
introduced in v0.5.0-alpha.1.

Once the alpha period ends we'll publish v0.5.0-beta.1 and start a 3-week beta
period where no breaking changes can occur, e.g. v0.5.0-beta.2 can fix bugs but
not change the API exposed by v0.5.0-beta.1.

When Rust 1.35 is released on May 24, or slightly later, we'll release v0.5.0 of
RTFM and yank all the alpha and beta pre-releases.

The table below shows the full timeline for our example

Date      Stable  Beta   RTFM
March 01  1.33    1.34
April 12  1.34    1.35   0.5.0-alpha
May 3     1.34    1.35   0.5.0-beta
May 24    1.35    1.36   0.5.0

These are meant to be guidelines; the RTFM devs may decide to tweak the
timeline, for example by shrinking the alpha period to 2 weeks or extending it
to 4 weeks.

Add support for the async functions

(Sorry if this is duplicate but the only similar thing I found was issue #50 )

Embedded development often depends on state machines. The most convenient way of encoding state machines in Rust is async functions. While it is possible to encode them using RTIC, it involves a bit of unsafe boilerplate which would ideally be moved into the macros.

For example here is a fragment of a proof of concept I did for myself:

// Access to local and shared variables
static mut UART_READ_CONTEXT: Option<core::ptr::NonNull<uart_read_task::Context>> = None;
#[task(
    local = [
        uart_read_consumer, // The queue for new futures (part of implementation)
        red_led // An example resource
    ]
)]
// A task to wake up when there is work to do
fn uart_read_task(mut ctx: uart_read_task::Context) {
    // We need to store a future somewhere
    static mut UART_READ_FUTURE: Option<core::mem::MaybeUninit<<UartReadAsyncTask as AsyncTask>::Future>> = None;
    // Waker vtable
    static mut UART_READ_FUTURE_VTABLE: core::task::RawWakerVTable = core::task::RawWakerVTable::new(
        |data| unsafe { core::task::RawWaker::new(data, &UART_READ_FUTURE_VTABLE) },
        |_| { let _ = uart_read_task::spawn(); },
        |_| { let _ = uart_read_task::spawn(); },
        |_| {}
    );
    unsafe {
        if UART_READ_FUTURE.is_none() {
            // There is no running future - spawn a new one if queue is not  empty
            UART_READ_FUTURE = ctx.local.uart_read_consumer.dequeue().map(|rlt| {
                core::mem::MaybeUninit::new(rlt.run())
            });
        }
        // If there is a running future - run it
        if let Some(future) = UART_READ_FUTURE.as_mut() {
            use core::future::Future;
            use core::task::*;
            UART_READ_CONTEXT = core::ptr::NonNull::new(core::mem::transmute(&mut ctx));
            let future = core::pin::Pin::new_unchecked(&mut *future.as_mut_ptr());
            let waker = Waker::from_raw(RawWaker::new(core::ptr::null(), &UART_READ_FUTURE_VTABLE));
            let mut task_context = Context::from_waker(&waker);
            match future.poll(&mut task_context) {
                Poll::Pending => {},
                Poll::Ready(()) => {
                    UART_READ_FUTURE = None;
                }
            }
            UART_READ_CONTEXT = None;
        }
    }
}

pub struct UartReadAsyncTask {
    uart: uart::Uart<uart::Config<uart::Pads<sercom::Sercom3, sercom::IoSet1, gpio::Pin<gpio::PA23, gpio::Alternate<gpio::C>>, gpio::Pin<gpio::PA22, gpio::Alternate<gpio::C>>>, uart::EightBit>, uart::Duplex>
}

struct UartReadFutureContex;

impl UartReadFutureContex {
    // This allows access to the context. Chances are this is not really needed for a full
    // implementation but is a required workaround as I don't know RTIC data's true lifetimes
    fn lock<F, R>(&mut self, f: F) -> R
    where
        F: FnOnce(&mut uart_read_task::Context) -> R
    {
        unsafe {
            f(UART_READ_CONTEXT.unwrap().as_mut())
        }
    }
}

impl AsyncTask for UartReadAsyncTask {
    type Output = ();
    type Future = impl core::future::Future<Output = Self::Output>;
    fn run(self) -> Self::Future {
        let UartReadAsyncTask {
            uart
        } = self;
        uart_read_task_async(UartReadFutureContex, uart)
    }
}

async fn uart_read_task_async(mut ctx: UartReadFutureContex, mut uart: uart::Uart<uart::Config<uart::Pads<sercom::Sercom3, sercom::IoSet1, gpio::Pin<gpio::PA23, gpio::Alternate<gpio::C>>, gpio::Pin<gpio::PA22, gpio::Alternate<gpio::C>>>, uart::EightBit>, uart::Duplex>) {
    // Just an echo
    let mut next_light_level = embedded_hal::digital::v2::PinState::High;
    loop {
        let word = uart.async_read().await.unwrap();
        if word == b'\r' {
            ctx.lock(|ctx| {
                ctx.local.red_led.set_state(next_light_level).unwrap();
                next_light_level = !next_light_level;
            });
        }
        uart.async_write(word).await.unwrap();
        if word == b'\r' {
            uart.async_write(b'\n').await.unwrap();
        }
    }
}

// WAR for lack of typeof
pub trait AsyncTask {
    type Output;
    type Future: core::future::Future<Output = Self::Output>;
    fn run(self) -> Self::Future;
}

And the spawn equivalent would be:

uart_read_producer.enqueue(UartReadAsyncTask {
    uart: uart3
}).map_err(|_| ()).unwrap();
let _ = uart_read_task::spawn();

Currently this requires:

#![feature(generic_associated_types)]
#![feature(type_alias_impl_trait)]
#![feature(pin_static_ref)]

Modular RTIC

A place to discuss the design of modular RTIC.

Lockall API

Problem

The current lock API allows locking only a single resource. Thanks to the Mutex design, it is possible to lock multiple resources, but requires some "boilerplate" code. E.g.,

    #[task(shared = [shared1, shared2, shared3])]
    fn locks(c: locks::Context) {
        let s1 = c.shared.shared1;
        let s2 = c.shared.shared2;
        let s3 = c.shared.shared3;

        (s1, s2, s3).lock(|s1, s2, s3| {
            *s1 += 1;
            *s2 += 1;
            *s3 += 1;

            hprintln!("Multiple locks, s1: {}, s2: {}, s3: {}", *s1, *s2, *s3).unwrap();
        });

        debug::exit(debug::EXIT_SUCCESS); // Exit QEMU simulator
    }

RTIC ensures at compile time (and at zero cost) that each resource can only be locked once. (Allowing a resource to be locked several times would break the soundness invariant for mutable aliasing.) This is done by moving the resource proxy into the lock (the lock consumes the proxy). While elegant, this requires the resource proxies to be split before the lock.

The multi-lock conceptually leads to nested critical sections, something like s1.lock(|s1| s2.lock(|s2| s3.lock(|s3| ...))). In case resources have different ceilings, the rendered code may (depending on lock order) perform numerous system ceiling updates.
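
A hedged host-side model of this ceiling handling (the `updates` counter just counts ceiling writes; on the target each would be e.g. a BASEPRI update):

```rust
/// Host model of SRP-style locking: `lock` raises the system ceiling to
/// the resource ceiling, runs the closure, and restores the old value,
/// counting every ceiling update it performs.
struct Sys {
    ceiling: u8,
    updates: u32,
}

impl Sys {
    fn lock<R>(&mut self, resource_ceiling: u8, f: impl FnOnce(&mut Sys) -> R) -> R {
        let old = self.ceiling;
        if resource_ceiling > self.ceiling {
            self.ceiling = resource_ceiling; // e.g. a BASEPRI write
            self.updates += 1;
        }
        let r = f(&mut *self);
        if self.ceiling != old {
            self.ceiling = old; // restore on exit
            self.updates += 1;
        }
        r
    }
}
```

With resource ceilings 2, 4 and 3 locked in that order, the nested form performs four ceiling updates, while a single lock at the joint ceiling 4 performs only two; this is the saving the lockall API aims for.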

Currently the multi-lock is limited to a fixed number of nesting levels. While this is typically not an issue in practice, it feels a bit unsatisfying from a design point of view.


Suggested alternative approach

The complete set of shared resources for the task can be locked at once, e.g.:

    #[task(shared = [shared1, shared2, shared3])]
    fn locks(c: locks::Context) {
        c.locks(|s| {
            *s.shared1 += 1;
            *s.shared2 += 1;
            *s.shared3 += 1;
            
            hprintln!("Multiple locks, s1: {}, s2: {}, s3: {}", *s.shared1, *s.shared2, *s.shared3).unwrap();
        });

        debug::exit(debug::EXIT_SUCCESS); // Exit QEMU simulator
    }

and it should also allow

    #[task(shared = [shared1, shared2, shared3])]
    fn locks(c: locks::Context) {
        c.locks(| Shared {shared1, shared2, shared3} | {
            **shared1 += 1;
            **shared2 += 1;
            **shared3 += 1;
            
            hprintln!("Multiple locks, s1: {}, s2: {}, s3: {}", *shared1, *shared2, *shared3).unwrap();
        });

        debug::exit(debug::EXIT_SUCCESS); // Exit QEMU simulator
    }

Soundness

The lock consumes the entire resource proxy structure, so the "locked once" invariant is preserved and soundness is maintained.


Performance

This approach leads to a single critical section (and thus reduces the number of system ceiling updates compared to the current multi-lock).


Implementation

While seemingly straightforward, the devil is always in the details. Under the hood, a resource proxy for the complete set of shared resources needs to be generated for each context (task). Additional ceiling analysis is required. In release mode the compiler should be able to optimize out the overhead, leading to a zero-cost abstraction (but this needs to be implemented and verified).

Design decisions include: where the additional types should be created, and what name should be adopted for the passed structure (Shared was suggested in the second example above).


Pros and cons

Foreseen advantages: better ergonomics; guaranteed performance (as good as or better than the current multi-lock); the artificial limit on the number of lockable resources is removed (though in practice this is likely not a problem).

Foreseen disadvantages: added complexity to RTIC code generation; the new API may lead users to lock unnecessarily (as a lock-all is so easy and "inviting"); lock-all locks immediately at the highest ceiling, which in some cases restricts concurrency.


Conclusion

An easy-to-use and easy-to-understand API extension. As long as the programmer understands the implications, they can make deliberate design decisions, taking the impact on concurrency into consideration.


Status

POC in branch lockall

Tracing RTIC tasks over ITM

I'm working on ITM trace support for RTIC application tasks. At the moment, I have proofs of concept for tracing both software and hardware tasks. The approaches differ: hardware tasks are traced practically for free, while software tasks are traced with the overhead of a dedicated DWT comparator and a u32 register write on task enter/exit (depending on dispatcher configuration). The approaches are detailed below. The purpose of this thread is to discuss the final implementation: whether it should be included in RTIC or offered as a separate crate, etc.

Common to both approaches is the setup of the ITM and DWT, and of course the serial link between target and host. My setup pushes the ITM packets over SWO, which I then read with an hs-probe. The received bitstream is then decoded via itm-decode.

Hardware tasks

By enabling exception tracing in the ITM/DWT we get TracePacket::ExceptionTraces containing integers. Each integer is an offset into the vector table and must be translated back to the correct PAC::Interrupt enum variant. This is done by

  1. parsing the source code of the RTIC application with rtic_syntax::parse2;

  2. parsing the device argument into the form of first::second where
    it is assumed that

    • first is the name of the PAC crate;
    • second is the crate feature that must be enabled for the PAC crate;
    • the interrupts of the device are available under first::second::Interrupts

    For example, stm32f4::stm32f401 is a valid device argument.

  3. parsing out all binds used by the application;

  4. building an adhoc cdylib crate that maps the application binds to their interrupt numbers. For example, if an stm32f4::stm32f401 application binds EXTI0 and EXTI1, the generated crate contains

     use cortex_m::interrupt::Nr;
     use stm32f4::stm32f401::Interrupt;
    
     #[no_mangle]
     pub extern fn rtic_scope_func_EXTI0() -> u8 {
         Interrupt::EXTI0.nr()
     }
    
     #[no_mangle]
     pub extern fn rtic_scope_func_EXTI1 () -> u8 {
         Interrupt::EXTI1.nr()
     }

    This crate (shared object) is then dynamically loaded and a Map<Ident, u8> is built.

See https://github.com/tmplt/rtic-scope for the implementation (sans all the error handling a program like this should have). An example run yields

git clone https://github.com/tmplt/rtic-scope.git
cd rtic-scope/scope
cargo run -- ../rtic-apps/src/bin/tracing-example.rs
[...]
blah binds EXTI0 (6)
exti1 binds EXTI1 (7)
exti2 binds EXTI2 (8)

where the number in the parentheses are the exception numbers of the binds.

Software tasks

Software tasks are governed by the set of dispatchers declared in the rtic::app macro, and could technically be traced just like hardware tasks if each dispatcher only handled a single task. This approach has not been investigated further; it is assumed that dispatchers usually handle multiple software tasks. Thus, to properly trace them, a DWT comparator is employed to emit a single TracePacket::DataTraceValue when a specific address is written to. The idea is then to assign a unique integer ID to each software task (preferably starting from 0 so as to minimize the payload sent over ITM) that is written to this address when the software task is entered and exited. A Map<Ident, u8> would then be exposed by rtic_syntax to associate the software task ID back to its name on the host.
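
A hedged sketch of the enter/exit writes (illustrative names; on the target the DWT comparator would be configured with the address of `WATCH`, and a real implementation might use a volatile write to a plain static instead of an atomic):

```rust
use core::sync::atomic::{AtomicU32, Ordering};

/// The address watched by the DWT comparator: each store emits a
/// `DataTraceValue` packet on the target. On the host nothing watches it.
static WATCH: AtomicU32 = AtomicU32::new(0);

fn software_task_enter(id: u32) {
    WATCH.store(id, Ordering::SeqCst); // a single u32 store per event
}

fn software_task_exit(id: u32) {
    WATCH.store(id, Ordering::SeqCst);
}
```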

A proof of concept is this application using this external crate. The decoded trace contains

ExceptionTrace { exception: ExternalInterrupt(7), action: Entered }
DataTraceValue { comparator: 1, access_type: Write, value: [42, 0, 0, 0] }
ExceptionTrace { exception: ExternalInterrupt(7), action: Exited }

Bringing it all together

Now that ITM packets are emitted at the start and end of both software and hardware tasks, we enable both local and global timestamps so that we can associate timestamps with the enter/exit events of each task. Global timestamps are emitted at intervals and denote the time since target boot; local timestamps are emitted after every set of back-to-back ITM packets and denote the time since the last local timestamp. The foundation for this association has been laid in itm-decode. In short, a decode yields a (Vec<TracePacket>, Timestamp), the timestamp containing a base from the latest global timestamp and a delta from the sum of the local timestamps since the last global one.

With all this in hand we can graphically plot the execution trace over time, like an oscilloscope (but with RTIC task enter/exit statuses instead of signals), as @perlindgren described it when he roped me into this thesis. @Yatekii has been working on a proof-of-concept web browser frontend application that does just this.

Questions

Should tracing be added to RTIC, or be implemented as some external rtic-rs/cortex-m-rtic-trace crate?

Pros if included in RTIC:

  • Enabling tracing could be as simple as setting some trace = true argument that configures ITM/DWT before init. For software tasks, some trace_dwt_unit = DWT0 argument must also be passed. Code is then injected into the dispatchers to write the software task ID on enter/exit.

Cons:

  • Adds complexity to an already complex crate, for a feature I predict a very small subset of users will utilize.
  • If the target does not support some ITM/DWT feature required by the tracing implementation, there is no way of notifying the user.

Pros if implemented in an external crate:

  • Adds no complexity to common RTIC crates.
  • Likely much more flexible.
  • If some required hardware feature is missing, the proper error type can simply be returned and handled in init.

Cons:

  • A small amount of boilerplate for the user: either one must add calls that write to the watched address at the start and end of software tasks or decorate them with some other macro (assuming procedural macros can be chained). This, however, also enables the user to trace subsections of their tasks (and even allowing them to assign custom IDs if they so want). Perhaps some:
    #[task]
    fn important(_: important::Context) {
        #[trace("some important section")]
        {
            // ...
        }
        
        // code we don't care to trace
        
        #[trace("some other important section")]
        {
            // ...
        }
    }

Problems in either case:

  • From what I can gather, all ITM/DWT configuration can be done via cortex-m's CorePeripherals, but what should be done if we need to touch the PAC for something vendor-specific? Establishing some ITM/DWT trait and making HALs implement it?
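One possible shape for such a trait — entirely hypothetical, nothing like it exists in RTIC or the HALs today:

```rust
// A HAL would implement this so that vendor-specific trace plumbing
// (e.g. enabling a trace clock via a PAC register) stays out of RTIC
// itself. Trait and method names are illustrative assumptions.
pub trait ConfigureTrace {
    type Error;

    /// Perform any device-specific setup required before ITM/DWT can
    /// be configured via cortex-m's CorePeripherals.
    fn configure_trace(&mut self) -> Result<(), Self::Error>;
}
```

In the external-crate approach, init could call this and handle the returned error; in the in-RTIC approach, the macro would need the type to be named in an app argument.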

Thoughts?

Statistics feature

  1. Do we want to add (opt-in) run-time collection of statistics?
  2. What statistics should we support (e.g., for each message queue, the maximum number of elements ever used)?
  3. Should we have an API for accessing the statistics, or should they just be accessible from GDB?
  4. Other thoughts on statistics...

Multiple Timer Queues and Timer trait

Rough idea/placeholder for discussion.

Motivation:

Currently Systick is used for triggering timer events. While this is a viable approach it comes with some cons.

  • Systick is only 24 bits, causing overhead in case the next message lies further into the future than representable by 24 bits. (Some architectures may allow for dividing the core clock, which could extend the "horizon".)
  • Using a single "tick" source, we can have only a single timer queue. The timer task, executing on behalf of a message to be dispatched at lower priority than the "tick" source, causes interference (a sort of priority inversion) in the system.
  • The complexity of insertion into the (single) timer queue is O(log n) plus some constant overhead, meaning larger n gives higher cost.
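To put a number on the first point, a quick back-of-the-envelope calculation (assuming a 48 MHz core clock, which is only an example value):

```rust
const CORE_HZ: u64 = 48_000_000;       // assumed example core clock
const SYSTICK_MAX: u64 = 1 << 24;      // 24-bit down-counter

// Furthest future instant a single SysTick countdown can reach:
// 2^24 / 48 MHz ~= 0.35 s.
fn horizon_us() -> u64 {
    SYSTICK_MAX * 1_000_000 / CORE_HZ
}

// Number of full SysTick periods (i.e. extra interrupts) a longer
// delay costs, rounding up.
fn reloads_for(delay_us: u64) -> u64 {
    (delay_us * CORE_HZ / 1_000_000 + SYSTICK_MAX - 1) / SYSTICK_MAX
}
```

For example, scheduling a message 10 s ahead at 48 MHz requires 29 SysTick reload interrupts before the message is finally dispatched.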

Idea.
Allow for multiple timer queues and a Timer trait for the "tick" implementation. These are two separate issues, but not orthogonal ones: to implement multiple timer queues we need multiple timers, and to that end we need a trait-based approach, as RTFM is device agnostic.

It would also allow for a trade-off between range and precision. E.g., a 32-bit timer with a pre-scaler of 2^16 to the core clock would give us a 48-bit range (compared to the core clock) with a granularity of 2^16 (compared to the core clock). This could be very useful for messages postponed far into the future, for which that granularity will be sufficient. In combination with low-power modes (wfi) this can save energy. One could even think of RTC-based implementations of the Timer trait, allowing for deeper sleep modes (but I think we defer that to another "issue", as wake-up would likely require special handling). On systems with only 16-bit timers (like the ST L0s), pre-scaling combined with higher granularity is already in the works.
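The 32-bit-timer/2^16-pre-scaler example from the text, made concrete (again assuming a 48 MHz core clock as an example):

```rust
const CORE_HZ: u64 = 48_000_000; // assumed example core clock
const PRESCALER: u64 = 1 << 16;  // 2^16 core cycles per timer tick

// Granularity: one timer tick lasts 2^16 / 48 MHz ~= 1.4 ms.
fn granularity_us() -> u64 {
    PRESCALER * 1_000_000 / CORE_HZ
}

// Range: the 32-bit counter spans 2^32 * 2^16 = 2^48 core cycles,
// i.e. roughly two months before wrapping.
fn range_days() -> u64 {
    (1u64 << 32) * PRESCALER / CORE_HZ / 86_400
}
```

So the pre-scaled timer trades millisecond-level precision for a horizon of about 67 days, which is why it suits messages scheduled far into the future.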

Syntax:
Multiple timer queues would require some additional information passed to app. As a first step, one could make the binding of schedule identify the Timer implementation. (In the long term, one could think of giving a free list of Timer implementers, and have RTFM do a best-fit assignment of message queues (and set the pre-scaling for each timer) based on the timing requirements of the messages. But this requires additional information/syntax extensions, so it will go in a separate "issue" as well.)

References:
For the original C implementation of RTFM, we made some initial experiments with Abstract Timers and Their Implementation, exploring (some of the) potential improvements.

http://ltu.diva-portal.org/smash/record.jsf?pid=diva2%3A1013030&dswid=-513
