Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threads should have their own signal handler #192

Closed
dhood opened this issue Jun 19, 2018 · 11 comments · Fixed by #308
Closed

Threads should have their own signal handler #192

dhood opened this issue Jun 19, 2018 · 11 comments · Fixed by #308
Assignees
Labels
enhancement New feature or request

Comments

@dhood
Copy link
Member

dhood commented Jun 19, 2018

Before rcl_wait is called, an existing signal handler is stored so that it can be called from within rclpy's signal handler. The existing signal handler is restored after waking up.

This is not currently robust to rclpy_wait being called from multiple threads; each thread ought to store its own signal handler to restore.

@dhood dhood added the enhancement New feature or request label Jun 19, 2018
@wjwwood
Copy link
Member

wjwwood commented Jun 19, 2018

Well, I'm not sure that's the right fix for this. Instead, maybe there should be a thread-agnostic signal handler in the Python layer that acts like the rclcpp signal handler which triggers the sigint guard condition for all wait sets:

So to recap, I think the issue is that rclpy_wait (which as you said calls rcl_wait) is not thread-safe and should be. I think the solution is to have one signal handler which is registered during "rclpy init" (either in C or Python, not sure which yet), rather than within rclpy_wait, and then multiple guard conditions, one for each wait set just like rclcpp.

I'm not sure it ever make sense to have one "stored signal handler to be restored" for each thread. I did mention that when we were talking in person, but after thinking about it more, I'm not sure that would work correctly.

@pbaughman
Copy link

pbaughman commented Feb 1, 2019

@dejanpan

I think we're getting burned by this right now. We have a python process with multiple nodes and multiple executors on different threads, but our process is still blowing up with

"Executor.wait_for_ready_callbacks() called concurrently in multiple threads"

Apparently in c++ it's OK to run multiple executors on multiple threads, but in python we're being prevented from doing this by the global g_wait_set_spinning bool defined in the module https://github.com/ros2/rclpy/blob/master/rclpy/rclpy/executors.py#L34

We can work around this by having a single worker thread pump messages, but this will mean needing additional synchronization whenever we want to access data received by a subscriber from the 'main' thread

@esteve
Copy link
Member

esteve commented Feb 12, 2019

I think the following might work:

  • when a WaitSet is created, create a GuardCondition and add it as a
    PyCapsule to the Python object for the WaitSet
  • in rclpy_wait, retrieve the GuardCondition from the PyCapsule
  • wait on the GuardCondition before calling rcl_wait, after
    https://github.com/ros2/rclpy/blob/master/rclpy/src/rclpy/_rclpy.c#L2572
  • trigger the GuardCondition once rcl_wait finishes to let other rclpy_wait calls run

I'll go ahead and implement this, but feel free to add any feedback in
case there's something I'm missing.

@esteve
Copy link
Member

esteve commented Feb 12, 2019

@wjwwood I've been working on this, but then I realized that I'd have to call rcl_wait twice, once with a WaitSet that only contains the GuardCondition and a timeout set to -1, and then for the existing rcl_wait call. Is that what you thought or is there more to it? Thanks.

esteve added a commit to esteve/rclpy that referenced this issue Feb 12, 2019
@wjwwood
Copy link
Member

wjwwood commented Feb 15, 2019

@esteve sorry, I haven't had time to try and wrap my head around it. I'll get back to you asap.

@dejanpan
Copy link

dejanpan commented Mar 4, 2019

@esteve did you want to open a PR that @wjwwood or someone else can look at or how did we want to proceed with this one?

@esteve
Copy link
Member

esteve commented Mar 4, 2019

@dejanpan I wanted to run it by @wjwwood first to make sure I understood the problem correctly, but I'll create a pull request with what I have just to have something tangible that we discuss over.

@wjwwood
Copy link
Member

wjwwood commented Mar 5, 2019

I don't fully follow your second comment about needing to call it twice, but from the bullet list proposal, I'd wonder if it actually solves the issue, because if a waitset is waiting and has not work, but another waitset is waiting on it to finish before actually waiting despite the fact that it has data available, then you'd starve a ready waitset with a blocking waitset. Which effectively doesn't let you wait from two threads at the same time, if I understand it correctly.

Perhaps I don't have a grasp on the solution you've proposed yet. I'm happy to keep discussing it, but a pr, even a partial one, might be a more concrete thing to discuss.

Sorry for taking so long to look at it.

esteve added a commit to esteve/rclpy that referenced this issue Mar 22, 2019
esteve added a commit to esteve/rclpy that referenced this issue Mar 22, 2019
@esteve
Copy link
Member

esteve commented Mar 22, 2019

@wjwwood I pushed master...esteve:thread_safe_rclpy_wait but I'm not sure if that's the correct approach. @pbaughman do you have a test or a script that fails? I'd like to check if my branch fixed the bug. Thanks.

@pbaughman
Copy link

pbaughman commented Mar 22, 2019

@esteve - I do not - I'm actually not exactly sure what subtle failures were seen before the big scary runtime error RuntimeError: Executor.wait_for_ready_callbacks() called concurrently in multiple threads' was added.

Maybe something like this:

import threading                                                  
import time                                                       
                                                                  
import rclpy                                                      
import rclpy.executors                                            
import std_msgs.msg                                               
                                                                  
                                                                  
class ThreadSpinner(threading.Thread):                            
                                                                  
    def __init__(self, node_name):                                
        super().__init__()                                        
        self._node = rclpy.create_node(node_name)                 
        self._sub = self._node.create_subscription(               
            std_msgs.msg.String,                                  
            "/chatter",                                           
            self._cb                                              
        )                                                         
                                                                  
    def run(self):                                                
        executor = rclpy.executors.SingleThreadedExecutor()       
        try:                                                      
            executor.add_node(self._node)                         
            while rclpy.ok():                                     
                executor.spin_once(1.1)                           
        finally:                                                  
            executor.remove_node(self._node)                      
                                                                  
    def _cb(self, msg):                                           
        print("{} got {}".format(self._node.get_name(), msg.data))
                                                                  
                                                                  
rclpy.init()                                                      
                                                                  
spinner1 = ThreadSpinner("Node1")                                 
spinner2 = ThreadSpinner("Node2")                                 
                                                                  
spinner1.start()                                                  
spinner2.start()                                                  
                                                                  
try:                                                              
    while True:                                                   
        time.sleep(100)                                           
except KeyboardInterrupt:                                         
    rclpy.shutdown()                                              
terminal 1> ros2 run demo_nodes_py talker
terminal 2> python3 script.py
# Expected output:
Node1 got Hello World: 10
Node2 got Hello World: 10
Node1 got Hello World: 11
Node2 got Hello World: 11
Node1 got Hello World: 12
Node2 got Hello World: 12
# etc. . .

Would probably want to run it for a while to make sure both nodes continue to receive data

@wjwwood
Copy link
Member

wjwwood commented Mar 22, 2019

I'm actually not exactly sure what subtle failures were seen before the big scary runtime error

It is to do with the management of the signal handlers from within the C extension, as the description of the issue says:

Before rcl_wait is called, an existing signal handler is stored so that it can be called from within rclpy's signal handler. The existing signal handler is restored after waking up.

This is not currently robust to rclpy_wait being called from multiple threads; each thread ought to store its own signal handler to restore.

So, running it for a long time is not the issue, I think instead you'd need to run it, sigint it, run it again, in a loop until you see (or do not see) some failure or deadlock.

@sloretz sloretz self-assigned this Apr 5, 2019
@sloretz sloretz added the in progress Actively being worked on (Kanban column) label Apr 5, 2019
@sloretz sloretz added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 18, 2019
@sloretz sloretz removed the in review Waiting for review (Kanban column) label Apr 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants