Comments (9)
I'm going to look at abstracting out std::mutex
and locking/synchronization logic from rclcpp this week.
from rclcpp.
Here is a naive lock-free executor implementation with "n" threads that uses a lock-free queue:
Start one writer thread and n - 1 reader threads.
Writer thread:
while (rclcpp::utilities::ok() && spinning.load()) {
// get_next_executable will block until work is available. A timeout can be added
auto any_exec = get_next_executable();
lock_free_queue.push(any_exec);
}
Reader thread:
while (rclcpp::utilities::ok() && spinning.load()) {
auto any_exec = lock_free_queue.pop();
execute_any_executable(any_exec);
}
This implementation relies on two fundamental assumptions:
- lock_free_queue is a correct implementation of a lock-free single-producer multi-consumer queue. I will probably use an external implementation for this, e.g. Boost lockfree: http://www.boost.org/doc/libs/1_59_0/doc/html/lockfree.html
- execute_any_executable is a read-only operation.
We think the second assumption is reasonable, but it might not be true, given the difficulty we've had with the MultiThreadedExecutor in OpenSplice.
Before considering a dependency on Boost, we should make sure that it doesn't affect using the real-time approach on embedded devices. What are the effects on library/executable size and/or memory requirements when adding it?
I would ask @codebot to weigh in here about Boost and embedded devices.
From what I know, for the class of 32-bit microcontrollers that Morgan is focusing on, you generally have one core and probably don't even have an operating system, so multithreading is both impractical and unavailable. Therefore you wouldn't need a lock-free executor and should just use the SingleThreadedExecutor instead. The lock-free executor is for deterministic synchronization in a multi-threaded setting.
As for the case of multithreading on an embedded device with multiple cores, I don't know what the numbers are like off the top of my head. From referencing Morgan's 2015 ROSCon talk, the next larger class of microcontrollers from 32-bit MCUs, which includes small multicore systems, probably has at least 1GB of RAM. From briefly looking at Stack Overflow and other sources, compiling a program with Boost can have an overhead on the order of 1 MB. http://stackoverflow.com/questions/2839172/why-my-c-output-executable-is-so-big Adding 1MB executable size when you have 1GB of RAM sounds safe to me, but then again I don't have any embedded systems experience.
This also means that the lock-free executor using Boost must be in a separate library from the pieces that would be interesting to an embedded developer, since the STM32 class of microcontrollers will have RAM on the order of kilobytes.
The Boost lockfree queue is designed to be allocation free. From http://www.boost.org/doc/libs/1_59_0/doc/html/lockfree.html:
"... all data structures of boost.lockfree can be configured to avoid memory allocations (instead the specific calls will fail). This is especially useful for real-time systems that require lock-free memory allocations."
As a matter of fact, there's a problem with using the Boost lockfree queue here that points out a fundamental problem with this outline.
boost::lockfree::queue requires that T have a trivial assignment operator and a trivial destructor, i.e. effectively be a Plain Old Datatype.
executor::AnyExecutable does not fit this bill, since it contains shared pointers as member variables and its destructor compares and exchanges an atomic boolean (thus it is not trivial).
I asked about this on Stack Overflow and the first suggestion was to use Intel TBB's concurrent_queue:
http://stackoverflow.com/questions/34009485/non-pod-types-in-lock-free-data-structures
which is not lock-free, but is very fast.
A discussion on the TBB forums leads me to believe that most lock-free data structures will be restricted in the size of the types because you need to use a DCAS (double word compare and swap) instruction to remove a node from a singly linked list. I won't pretend to understand everything that's going on here, but it sounds like to use a truly lock-free queue I would have to store raw pointers instead of shared pointers, which would require a bit more refactoring and forethought.
There may be other issues that you cannot work around, but for the trivial-destructor issue you could try swapping those shared pointers for weak pointers.
It appears that weak_ptr does not have a trivial destructor.
repost from closed PR:
Ideas discussed today (12/1/15):
We can only put POD structures into lock-free queues. That's how we got to using raw pointers here.
Storing raw pointers is dangerous; we could consider some kind of mock object instead, but we would still have the problem of managing ownership and storage of the underlying data.
We're currently not queuing work, but rather relying on DDS to manage it. It's not ideal to introduce a queue on our side, irrespective of whether it's lock-free or not.
To make a decision on how to proceed here, we need to get a better idea of the requirements. @jacquelinekay, please write up a proposal for what is and what is not supported in a real-time-safe / lock-free manner, e.g.:
- are message-based callbacks supported?
- vs. just timers, similar to how OROCOS RTT works: a real-time-safe component can only ask to be invoked at a given frequency; when it wakes up, it is able to check for work
- are multiple threads operating on a single subscriber supported?
- vs. just single-threaded executors for real-time-safe work (could have implications on design of executor to allow, for example, a node with one real-time subscription and one non-real-time subscription)
For what it's worth, the RTM executor works in the same way as Orocos's: a real-time component is limited to periodic invocation.
Being able to have an event-driven node that is real-time safe would be great, but I suspect that the majority use case for real-time is periodically executing control loops. Even a (relatively linear) data processing pipeline executing in such an environment seems likely to be executed at a fixed rate with each step executed in turn in a fixed sequence.
Some data on the commonality of real-time event-based systems would be useful here; RTM doesn't have any amongst its users but that's because it doesn't support it.