Git Product home page Git Product logo

Comments (11)

wjwwood avatar wjwwood commented on July 4, 2024

Well, I'm not sure that's the right fix for this. Instead, maybe there should be a thread-agnostic signal handler in the Python layer that acts like the rclcpp signal handler which triggers the sigint guard condition for all wait sets:

So to recap, I think the issue is that rclpy_wait (which as you said calls rcl_wait) is not thread-safe and should be. I think the solution is to have one signal handler which is registered during "rclpy init" (either in C or Python, not sure which yet), rather than within rclpy_wait, and then multiple guard conditions, one for each wait set just like rclcpp.

I'm not sure it ever make sense to have one "stored signal handler to be restored" for each thread. I did mention that when we were talking in person, but after thinking about it more, I'm not sure that would work correctly.

from rclpy.

pbaughman avatar pbaughman commented on July 4, 2024

@dejanpan

I think we're getting burned by this right now. We have a python process with multiple nodes and multiple executors on different threads, but our process is still blowing up with

"Executor.wait_for_ready_callbacks() called concurrently in multiple threads"

Apparently in c++ it's OK to run multiple executors on multiple threads, but in python we're being prevented from doing this by the global g_wait_set_spinning bool defined in the module https://github.com/ros2/rclpy/blob/master/rclpy/rclpy/executors.py#L34

We can work around this by having a single worker thread pump messages, but this will mean needing additional synchronization whenever we want to access data received by a subscriber from the 'main' thread

from rclpy.

esteve avatar esteve commented on July 4, 2024

I think the following might work:

  • when a WaitSet is created, create a GuardCondition and add it as a
    PyCapsule to the Python object for the WaitSet
  • in rclpy_wait, retrieve the GuardCondition from the PyCapsule
  • wait on the GuardCondition before calling rcl_wait, after
    https://github.com/ros2/rclpy/blob/master/rclpy/src/rclpy/_rclpy.c#L2572
  • trigger the GuardCondition once rcl_wait finishes to let other rclpy_wait calls run

I'll go ahead and implement this, but feel free to add any feedback in
case there's something I'm missing.

from rclpy.

esteve avatar esteve commented on July 4, 2024

@wjwwood I've been working on this, but then I realized that I'd have to call rcl_wait twice, once with a WaitSet that only contains the GuardCondition and a timeout set to -1, and then for the existing rcl_wait call. Is that what you thought or is there more to it? Thanks.

from rclpy.

wjwwood avatar wjwwood commented on July 4, 2024

@esteve sorry, I haven't had time to try and wrap my head around it. I'll get back to you asap.

from rclpy.

dejanpan avatar dejanpan commented on July 4, 2024

@esteve did you want to open a PR that @wjwwood or someone else can look at or how did we want to proceed with this one?

from rclpy.

esteve avatar esteve commented on July 4, 2024

@dejanpan I wanted to run it by @wjwwood first to make sure I understood the problem correctly, but I'll create a pull request with what I have just to have something tangible that we discuss over.

from rclpy.

wjwwood avatar wjwwood commented on July 4, 2024

I don't fully follow your second comment about needing to call it twice, but from the bullet list proposal, I'd wonder if it actually solves the issue, because if a waitset is waiting and has not work, but another waitset is waiting on it to finish before actually waiting despite the fact that it has data available, then you'd starve a ready waitset with a blocking waitset. Which effectively doesn't let you wait from two threads at the same time, if I understand it correctly.

Perhaps I don't have a grasp on the solution you've proposed yet. I'm happy to keep discussing it, but a pr, even a partial one, might be a more concrete thing to discuss.

Sorry for taking so long to look at it.

from rclpy.

esteve avatar esteve commented on July 4, 2024

@wjwwood I pushed master...esteve:thread_safe_rclpy_wait but I'm not sure if that's the correct approach. @pbaughman do you have a test or a script that fails? I'd like to check if my branch fixed the bug. Thanks.

from rclpy.

pbaughman avatar pbaughman commented on July 4, 2024

@esteve - I do not - I'm actually not exactly sure what subtle failures were seen before the big scary runtime error RuntimeError: Executor.wait_for_ready_callbacks() called concurrently in multiple threads' was added.

Maybe something like this:

import threading                                                  
import time                                                       
                                                                  
import rclpy                                                      
import rclpy.executors                                            
import std_msgs.msg                                               
                                                                  
                                                                  
class ThreadSpinner(threading.Thread):                            
                                                                  
    def __init__(self, node_name):                                
        super().__init__()                                        
        self._node = rclpy.create_node(node_name)                 
        self._sub = self._node.create_subscription(               
            std_msgs.msg.String,                                  
            "/chatter",                                           
            self._cb                                              
        )                                                         
                                                                  
    def run(self):                                                
        executor = rclpy.executors.SingleThreadedExecutor()       
        try:                                                      
            executor.add_node(self._node)                         
            while rclpy.ok():                                     
                executor.spin_once(1.1)                           
        finally:                                                  
            executor.remove_node(self._node)                      
                                                                  
    def _cb(self, msg):                                           
        print("{} got {}".format(self._node.get_name(), msg.data))
                                                                  
                                                                  
rclpy.init()                                                      
                                                                  
spinner1 = ThreadSpinner("Node1")                                 
spinner2 = ThreadSpinner("Node2")                                 
                                                                  
spinner1.start()                                                  
spinner2.start()                                                  
                                                                  
try:                                                              
    while True:                                                   
        time.sleep(100)                                           
except KeyboardInterrupt:                                         
    rclpy.shutdown()                                              
terminal 1> ros2 run demo_nodes_py talker
terminal 2> python3 script.py
# Expected output:
Node1 got Hello World: 10
Node2 got Hello World: 10
Node1 got Hello World: 11
Node2 got Hello World: 11
Node1 got Hello World: 12
Node2 got Hello World: 12
# etc. . .

Would probably want to run it for a while to make sure both nodes continue to receive data

from rclpy.

wjwwood avatar wjwwood commented on July 4, 2024

I'm actually not exactly sure what subtle failures were seen before the big scary runtime error

It is to do with the management of the signal handlers from within the C extension, as the description of the issue says:

Before rcl_wait is called, an existing signal handler is stored so that it can be called from within rclpy's signal handler. The existing signal handler is restored after waking up.

This is not currently robust to rclpy_wait being called from multiple threads; each thread ought to store its own signal handler to restore.

So, running it for a long time is not the issue, I think instead you'd need to run it, sigint it, run it again, in a loop until you see (or do not see) some failure or deadlock.

from rclpy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.