Lifecycle primary state error transitions #283

thebyohazard · 2020-05-21T19:46:53Z

The purpose of this PR is to clarify where error transitions from primary states belong in the node lifecycle design. The original conversation is over in rcl_interfaces PR#97. In that conversation, I suggested adding a transition from Inactive to ErrorProcessing, having in mind a node that connects to external hardware. If the external hardware reports an error while the node is inactive, error recovery code needs to be executed, and a transition to ErrorProcessing is the natural choice for that. I have edited the diagrams and the article to show and describe such a transition.

Also in the other PR, @tfoote suggested we discuss here whether or not there should also be a transition from Unconfigured to ErrorProcessing:

Originally there wasn't expected to be anything happening in the inactive state or configured states, but there may be other threads or activity that could cause transitions (such as hardware errors) that would change the status. So to that end adding a transition from Unconfigured to ErrorProcessing would also make sense since something could go wrong in that state too, potentially recoverable with some error recovery.

The reason I didn't include this at first in the other PR is that I have been designing my lifecycle nodes so that nothing happens in the node, and no connections to the hardware are made, until configuration. An error therefore could not occur in the unconfigured state. Also, the design says regarding the Unconfigured state,

In this state there is expected to be no stored state

So here is my contrived example of how such a transition could be undesirable: if the onError transition expects some object to be filled out, that object may not be filled out if the node isn't configured yet. But to that I'd say that an onError transition should probably check for that anyway. Besides, it's the code in the lifecycle node that would raise the error in the first place, so if you don't need to raise an error, don't raise it.

On the argument for the transition, the design currently says regarding the ErrorProcessing state:

It is possible to enter this state from any state where user code will be executed.

And I think there definitely could be some kind of user code being executed in the Unconfigured state if a node has some combination of lifecycle components and also non-lifecycle components. The non-lifecycle components could always be running even when the node is unconfigured and then cause the errors.

So I guess my vote is to add the Unconfigured to ErrorProcessing error transition, but I honestly haven't delved too deeply into lifecycle to know what else it might break.
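To make the proposal concrete, here is a minimal sketch in plain Python. It is a toy model, not the rcl_lifecycle or rclcpp API: `ToyLifecycleNode`, `State`, `ERROR_SOURCES` and the dictionary used as a "driver" are all hypothetical names. It assumes, following my reading of the design article, that successful error processing returns the node to Unconfigured and failed error processing sends it to Finalized. It also shows the point made above that the error callback should check whether configuration ever filled in the objects it touches.

```python
from enum import Enum, auto


class State(Enum):
    UNCONFIGURED = auto()
    INACTIVE = auto()
    ACTIVE = auto()
    ERROR_PROCESSING = auto()
    FINALIZED = auto()


# Primary states allowed to raise an error under the proposal in this PR.
ERROR_SOURCES = {State.UNCONFIGURED, State.INACTIVE, State.ACTIVE}


class ToyLifecycleNode:
    """Hypothetical stand-in for a lifecycle node; not a ROS 2 class."""

    def __init__(self):
        self.state = State.UNCONFIGURED
        self.driver = None  # only set during configuration

    def configure(self, driver):
        if self.state is not State.UNCONFIGURED:
            raise RuntimeError("configure is only valid from UNCONFIGURED")
        self.driver = driver
        self.state = State.INACTIVE

    def on_error(self):
        """Recovery callback; returns True when recovery succeeded."""
        # The node may never have been configured, so check before use.
        if self.driver is not None:
            self.driver["connected"] = False
            self.driver = None
        return True

    def raise_error(self):
        if self.state not in ERROR_SOURCES:
            raise RuntimeError(f"cannot raise an error from {self.state.name}")
        self.state = State.ERROR_PROCESSING
        recovered = self.on_error()
        # Assumed outcome mapping: success -> UNCONFIGURED, failure -> FINALIZED.
        self.state = State.UNCONFIGURED if recovered else State.FINALIZED
        return self.state
```

In this sketch, raising an error from a freshly constructed (unconfigured) node is harmless because `on_error` tolerates a missing driver, which is the behavior argued for above.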


thebyohazard · 2020-05-21T19:47:10Z

One other thing I noticed: the original diagram doesn't show transitions for onDeactivate[FAILURE], onCleanup[FAILURE], onShutdown[FAILURE], or onError[Error Raised], which are in the implementation. Shall I add them to this PR, or should that be a separate branch/PR for documentation purposes?

fujitatomoya · 2020-05-22T13:00:51Z

@thebyohazard

Shall I add them to this PR, or should that be a separate branch/PR for documentation purposes?

I believe those should be added along with implementation.


sloretz · 2020-06-25T16:46:32Z

@Karsten1987 @wjwwood friendly ping :)

thebyohazard · 2020-07-02T22:13:26Z

@fujitatomoya I agree with you that a shutdown failure should go back to the previous state so it can be retried. You may be relying on the shutdown code to execute, and that can't be guaranteed right now. However, this PR and the related PRs are enhancements, whereas changing that behavior in the implementation could be a breaking change for existing systems. If we were to fix it, I definitely think a separate PR should be opened in rcl_lifecycle, apart from the one that adds the error transitions. I also don't like the idea of the design not matching the implementation, so I would vote to keep the design matching the current implementation and open another set of PRs to fix shutdown failure so it has the expected behavior.
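The two behaviors being compared can be sketched as a small pure function. This is an illustration only: `state_after_shutdown` and its `return_on_failure` flag are hypothetical, and the "current" branch reflects my understanding that the default state machine moves to Finalized when the shutdown callback returns FAILURE, which should be checked against rcl_lifecycle before relying on it.

```python
def state_after_shutdown(previous: str, result: str, return_on_failure: bool) -> str:
    """Return the state reached after the shutdown callback finishes.

    previous:          primary state the node was in before shutting down
    result:            "SUCCESS", "FAILURE" or "ERROR" from the callback
    return_on_failure: False models the assumed current behavior,
                       True models the retry-friendly behavior discussed above
    """
    if result == "SUCCESS":
        return "Finalized"
    if result == "FAILURE":
        # Proposed: go back so the caller can retry the shutdown.
        # Assumed current: end up in Finalized even though cleanup did not run.
        return previous if return_on_failure else "Finalized"
    if result == "ERROR":
        return "ErrorProcessing"
    raise ValueError(f"unknown callback result: {result}")
```

With `return_on_failure=True`, a node that fails to shut down from Inactive stays in Inactive, so a supervisor can call shutdown again. That is also why the change could break systems that currently expect to land in Finalized.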

ivanpauno · 2020-08-27T14:03:36Z

@Karsten1987 @wjwwood friendly ping

fujitatomoya · 2020-12-07T07:31:19Z

Related PRs based on this design:

Primary state error transitions rcl_interfaces#97, transition ids assignment.
Primary state error transitions rcl#618, register transition map.
Primary state error transitions rclcpp#1064, update LifecycleNode class with raise_error.
Primary state error transitions demos#436, demo sample.

I think that some follow-up will be required (rebase, requested changes, and missing test code). If @thebyohazard is not responding and nobody is working on this, I'm willing to do it.

UomoJallo · 2022-02-01T14:41:39Z

@fujitatomoya @Karsten1987 @wjwwood any update on this PR? Or what do you suggest when an error is detected while in the ACTIVE state?

Should the node first try to deactivate itself and then clean up to reach the UNCONFIGURED state? Even though it would be better to avoid self-transitions...
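The fallback described in this question, deactivate and then clean up, can be sketched with a plain transition table. This is a toy model in Python rather than rclcpp code: `TRANSITIONS`, `apply_transition` and `fall_back_to_unconfigured` are hypothetical names, and only the success paths between primary states are modeled.

```python
# (current primary state, requested transition) -> resulting primary state
TRANSITIONS = {
    ("Unconfigured", "configure"): "Inactive",
    ("Inactive", "activate"): "Active",
    ("Active", "deactivate"): "Inactive",
    ("Inactive", "cleanup"): "Unconfigured",
}


def apply_transition(state: str, transition: str) -> str:
    """Look up the next state, rejecting transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, transition)]
    except KeyError:
        raise ValueError(f"'{transition}' is not valid from {state}") from None


def fall_back_to_unconfigured(state: str) -> str:
    """Walk a node back to Unconfigured using only ordinary transitions."""
    if state == "Active":
        state = apply_transition(state, "deactivate")
    if state == "Inactive":
        state = apply_transition(state, "cleanup")
    return state
```

The sketch makes the cost of the workaround visible: the node has to drive two ordinary transitions on itself, whereas a direct transition to ErrorProcessing would run dedicated recovery code in one step.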

tfoote

Adding these transitions makes sense to me.

fujitatomoya · 2022-02-09T16:14:18Z

could be related to ros2/rclcpp#1880 (just leaving the reference)

ros-discourse · 2023-07-11T15:14:28Z

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/deferrable-canceleable-lifecycle-transitions/32318/1

cboostjvisser · 2024-07-15T07:35:14Z

This feature seems almost finished. What can we do to get this merged?
There are quite a few PRs that reference or are referenced by this PR. Does anyone know what the order of merging should be?
@thebyohazard @fujitatomoya @tfoote @Karsten1987 @gbiggs @wjwwood

thebyohazard added 2 commits May 21, 2020 11:52

edit lifecycle state machine to show a transition from inactive to error processing

a034d33

Signed-off-by: thebyohazard <patrick@jlpengineering.com>

Edit article to describe errors in the Inactive state.

af4f18a

Signed-off-by: thebyohazard <patrick@jlpengineering.com>

* update lifecycle asta and png to show unconfigured->errorProcessing transition and failure transitions currently in the implementation
* update article to explicitly mention the added transitions

f6b4467

Signed-off-by: thebyohazard <patrick@jlpengineering.com>

fujitatomoya reviewed May 28, 2020

articles/node_lifecycle.md (two resolved review threads, the first outdated)

edit article: error can be triggered in Unconfigured State

8db7505

Signed-off-by: thebyohazard <patrick@jlpengineering.com>

sloretz assigned wjwwood Jun 11, 2020

sloretz requested a review from Karsten1987 June 11, 2020 23:47

fujitatomoya approved these changes Jul 3, 2020

fujitatomoya mentioned this pull request Oct 4, 2021

Primary state error transitions ros2/rcl_interfaces#97

Open

tfoote approved these changes Feb 4, 2022

bpwilcox mentioned this pull request Feb 9, 2022

Reconsider lifecycle state transition when returning FAILURE for on_shutdown transition. ros2/rclcpp#1763

Open

fujitatomoya mentioned this pull request May 25, 2022

Lifecycle node trigger_transition() function caused transition publish failed ros2/rclcpp#1941

Closed

This was referenced Jun 15, 2023

Deferrable + Cancelable lifecycle change_state transition functions ros2/rclcpp#2213

Open

Deferrable + Cancelable lifecycle change_state transition function implementation ros2/rclcpp#2214

Open
