Ensure no redundant rcl_logging initialization and finalization (alternative) #573
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
So, the atomic will make rcl_init() thread-safe with itself (same for rcl_shutdown()), but it does not make them thread-safe with each other, which is what I was speaking about before. And I think that's at least as important as (and probably more important than) self thread-safety for the functions.
Furthermore, if the calling code has to ensure rcl_init() and rcl_shutdown() are not called concurrently, then ensuring they are not called concurrently with themselves is likely trivial, and so it doesn't offer us much to use the atomic counter (though arguably it doesn't cost much either) rather than just a normal counting variable.
Also, the docstrings for rcl_init() and rcl_shutdown() were never updated to say they were thread-safe.
rcutils_get_error_string().str);
goto fail;
if (0u == rcutils_atomic_fetch_add_uint64_t(&g_logging_ref_count, 1)) {
  ret = rcl_logging_configure(&context->global_arguments, &allocator);
⚡️ context switch here, then:
- in thread 2, start rcl_shutdown()
- my theory is that rcl_shutdown() will fail because, though g_logging_ref_count is > 0, trying to do rcl_logging_fini() will fail as it has not yet been configured
My conclusion is that rcl_init() is not thread-safe with rcl_shutdown().
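To make the interleaving described above concrete, here is a minimal sketch that replays it by hand on a single thread, so the outcome is deterministic. Everything in it is a hypothetical stand-in rather than the rcl API: `ref_count` plays the role of g_logging_ref_count, `fake_logging_fini()` is an assumed finalizer that reports an error when nothing was configured, and the shutdown side is assumed to finalize when the count drops from 1 to 0 (the shutdown code is not shown in this thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the logging state; not the rcl API. */
static atomic_uint_fast64_t ref_count = 0;
static bool logging_configured = false;

/* Assumed behaviour: finalizing fails when nothing was configured. */
static int fake_logging_fini(void)
{
  if (!logging_configured) {
    return -1;
  }
  logging_configured = false;
  return 0;
}

/* Replays "init starts, shutdown runs, init resumes" step by step. */
static int replay_init_shutdown_race(void)
{
  atomic_store(&ref_count, 0);
  logging_configured = false;

  /* Thread 1, init: count goes 0 -> 1, so it decides to configure. */
  bool thread1_must_configure = (atomic_fetch_add(&ref_count, 1) == 0);
  /* The count is already positive, but nothing is configured yet. */
  assert(!logging_configured);

  /* Context switch. Thread 2, shutdown: count goes 1 -> 0, so it finalizes. */
  int fini_result = 0;
  if (atomic_fetch_sub(&ref_count, 1) == 1) {
    fini_result = fake_logging_fini();
  }

  /* Thread 1 resumes and finishes configuring. */
  if (thread1_must_configure) {
    logging_configured = true;
  }
  return fini_result;
}
```

Under these assumptions the replay returns an error from the finalizer and ends with the count at zero while logging is still configured, which is the inconsistency the comment above is worried about.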
Mmm, I see what you meant now.
So, I always assumed that the upper layer will ensure that each context calls rcl_init and rcl_shutdown in a thread-safe manner (rcl_init and rcl_shutdown are still not thread-safe).
That's the case now in rclcpp (it's not the case in rclpy, but that's definitely wrong).
In that situation, using an atomic count adds protection between rcl_init and rcl_shutdown calls on different objects, and avoids the need for a "global" mutex.[*]
My main reason for pushing this alternative is that it solves the problem in a more "minimalistic" way (only one PR, without adding functions like rcl_logging_increase_ref_count that we might want to deprecate later). It's easy to backport too.
[*] If calls in each context between rcl_init and rcl_shutdown are protected by an upper layer, the following situations are possible:
- Only one context: It's thread safe because the upper layer ensures that.
- Two contexts, init1-init2-shutdown1-shutdown2: The atomic ensures thread safety between init1 and init2. The upper layer ensures thread safety between init1 and shutdown1.
- Two contexts, init1-shutdown2-shutdown1: Calling shutdown in a non-inited context is handled correctly.
But what about this case:
init1-shutdown1-init2-...
Where shutdown1 is interrupted after adjusting the atomic count, but before it does log fini, and context switches to init2 which then increments it, finding that according to the count logging was not initialized, and tries to initialize logging when it already has been. Either the "initialize while already initialized" case will fail, or if that silently passes, when shutdown1 continues it will shutdown logging and leave init2 initialized but without logging.
I remain unconvinced that this is thread-safe even when using separate context objects.
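The init1-shutdown1-init2 case can be replayed the same way. The sketch below is a single-threaded, deterministic walk through that interleaving, using hypothetical stand-ins only: `ref_count` and `logging_configured` model the global logging state, and `fake_logging_configure()` takes a `strict` flag to model both possibilities mentioned above (configuring twice fails, or silently passes). None of these names come from rcl.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the logging state; not the rcl API. */
static atomic_uint_fast64_t ref_count;
static bool logging_configured;

/* strict: configuring twice is an error; otherwise it silently passes. */
static int fake_logging_configure(bool strict)
{
  if (logging_configured && strict) {
    return -1;
  }
  logging_configured = true;
  return 0;
}

typedef struct
{
  int init2_result;          /* what init2 saw when it tried to configure */
  bool logging_alive_after;  /* whether logging is still configured at the end */
} outcome_t;

static outcome_t replay_shutdown_init_race(bool strict)
{
  /* State after init1 has fully completed. */
  atomic_store(&ref_count, 1);
  logging_configured = true;

  /* shutdown1: count goes 1 -> 0, it decides to finalize, then is preempted. */
  bool shutdown1_must_fini = (atomic_fetch_sub(&ref_count, 1) == 1);
  assert(shutdown1_must_fini && logging_configured);

  /* init2: count goes 0 -> 1, so it believes logging is not configured. */
  int init2_result = 0;
  if (atomic_fetch_add(&ref_count, 1) == 0) {
    init2_result = fake_logging_configure(strict);
  }

  /* shutdown1 resumes and finalizes logging underneath init2. */
  if (shutdown1_must_fini) {
    logging_configured = false;
  }

  outcome_t out = {init2_result, logging_configured};
  return out;
}
```

In this model both branches end badly: with `strict` set, init2 gets an error; without it, init2 reports success but is left with a count of 1 and no logging.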
Oh 🤦♂️, I completely missed that case. Thanks William!
It's ok, this stuff is tricky. I've only learned to be very suspicious of creating thread-safety with just atomics after messing it up many, many times. :)
Maybe it's possible to salvage this approach with enough iteration, but honestly I think a global mutex provided by rclpy/rclcpp is the safest thing to do to protect the init/shutdown and therefore logging init.
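For comparison, here is a rough sketch of the mutex idea, written in C with POSIX threads purely for illustration. The suggestion above is for the lock to live in rclpy/rclcpp, so this is not how either client library does it; `g_init_mutex`, `guarded_init()` and `guarded_shutdown()` are hypothetical names, and a boolean stands in for the real logging configure/fini calls. The point is only that the count check and the configure/finalize step happen under one lock, so no other thread can run between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global state; every access happens with g_init_mutex held. */
static pthread_mutex_t g_init_mutex = PTHREAD_MUTEX_INITIALIZER;
static size_t g_ref_count = 0;  /* a plain counter is enough under the lock */
static bool g_logging_configured = false;

/* Stand-in for "init a context": configures logging for the first user. */
static void guarded_init(void)
{
  pthread_mutex_lock(&g_init_mutex);
  if (g_ref_count++ == 0) {
    assert(!g_logging_configured);
    g_logging_configured = true;  /* stand-in for configuring logging */
  }
  pthread_mutex_unlock(&g_init_mutex);
}

/* Stand-in for "shut down a context": finalizes logging for the last user. */
static void guarded_shutdown(void)
{
  pthread_mutex_lock(&g_init_mutex);
  if (g_ref_count > 0 && --g_ref_count == 0) {
    assert(g_logging_configured);
    g_logging_configured = false;  /* stand-in for finalizing logging */
  }
  pthread_mutex_unlock(&g_init_mutex);
}
```

Whenever the lock is released, logging is configured exactly when the count is positive, which is the invariant the atomic-only version could not maintain across a context switch.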
> Maybe it's possible to salvage this approach with enough iteration, but honestly I think a global mutex provided by rclpy/rclcpp is the safest thing to do to protect the init/shutdown and therefore logging init.
Yes, I agree.
@fujitatomoya do you want to iterate and open PRs in rclpy and rclcpp? If not, I can take it.
You can go ahead and take them! Thanks for your effort!
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Alternative to #560.