Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes if:cancelled() composite steps not running and normal composite steps not interrupting when the job is cancelled. #2638

Merged
merged 2 commits into from
Jul 10, 2023

Conversation

fhammerl
Copy link
Contributor

@fhammerl fhammerl commented Jun 1, 2023

This fix gives composite steps their own cancellation tokens instead of sharing one from their parent step (usually the workflow step that uses: the composite action).
This separation of tokens exposed a bug where action_status was not set to cancelled when the job is cancelled, which would allow an if: success() composite step to not be interrupted in the event of cancellation. Fixed it by setting action_status to cancelled in CompositeActionHandler when the job is cancelled.

Bug description: an if:cancelled() composite action step is not always executed when the job is cancelled while the composite action is running.

Such a step should run given:

  • The composite action is entered (the parent step that uses:' it starts executing)
  • The step itself is cancelled()
  • The job is cancelled

Workflow steps have their own cancellation tokens. When a job is cancelled, the currently running step's cancellation token is triggered, unless the step is always or cancelled. The token of a 'success' step is cancelled. This token is used to interrupt any process invoked by that step, e.g. 'sleep 600'.

Other steps, with their 'untriggered' cancellation tokens in the workflow are then evaluated for their if: conditions and run accordingly.

Composite action steps all share their parent steps cancellation token (in order to inherit a stepTimeout set on their parent step - this was introduced when composite actions could not have conditional steps). Just like in workflow steps: When a job is cancelled, the currently running step's cancellation token is triggered, unless the step is always or cancelled. The token of a 'success' step is cancelled. This token is used to interrupt any process invoked by that step, e.g. 'sleep 600'.

Other steps, with their 'untriggered' cancellation tokens in the workflow are then evaluated for their if: conditions and try to run accordingly, but because they share a cancellation token, this token interrupts any process started by them, leading to none of these processes finishing, even if they are cancelled or always.
This bug does not show if the interrupted composite step is always or cancelled, because then the shared token is not cancelled.

Bug repro

...   
    steps:
      - uses: fhammerl/nested-composite/actions/composite-steps@main

fhammerl/nested-composite/actions/composite-steps@main

name: My Composite Action
description: A composite action with four steps

runs:
  using: "composite"
  steps:
    - name: Sleep 
      run: sleep 121 # Cancelling the job will interrupt this composite step
      shell: bash

    - run: echo foo
      if: cancelled() # Doesn't run due to the bug
      shell: bash

@fhammerl fhammerl requested a review from a team as a code owner June 1, 2023 19:56
@@ -406,7 +415,7 @@ public void RegisterPostJobStep(IStep step)

/// <summary>
/// An embedded execution context shares the same record ID, record name, logger,
/// and a linked cancellation token.
Copy link
Contributor Author

@fhammerl fhammerl Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This token used to be linked so that stepTimeout would propagate down
We don't need to share a token if we set a new stepTimeout instead, calculated as parent step's timeout - timeElapsed

@@ -593,7 +602,31 @@ public void SetTimeout(TimeSpan? timeout)
if (timeout != null)
{
_cancellationTokenSource.CancelAfter(timeout.Value);
m_timeoutSetAt = DateTime.UtcNow;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_timeoutSetAt - perhaps m_timeoutStartedAt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just m_startedAt? so that elapsed = DateTime.UtcNow - m_startedAt.Value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep it m_timeoutStartedAt to clearly imply it's for timeout related stuff, unlike _record.StartedAt

@@ -420,6 +422,8 @@ private async Task RunStepAsync(IStep step)
{
Trace.Info($"Starting: {step.DisplayName}");
step.ExecutionContext.Debug($"Starting: {step.DisplayName}");
// composite steps inherit the timeout from the parent, set by https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes
step.ExecutionContext.SetTimeout(step.ExecutionContext.Parent.GetRemainingTimeout());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inherit the parent's remaining timeout

@fhammerl fhammerl force-pushed the fhammerl/fix-composite-cancelled-steps-not-running branch from 8f70490 to c35f112 Compare July 7, 2023 14:31
Also make composite step inherit timeout from parent
@fhammerl fhammerl force-pushed the fhammerl/fix-composite-cancelled-steps-not-running branch from c35f112 to e930403 Compare July 10, 2023 10:02
Copy link
Contributor

@nikola-jokic nikola-jokic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@rentziass rentziass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@fhammerl fhammerl changed the title [WIP] Fixes if:cancelled() composite steps not running and normal composite steps not interrupting when the job is cancelled. Fixes if:cancelled() composite steps not running and normal composite steps not interrupting when the job is cancelled. Jul 10, 2023
@fhammerl fhammerl merged commit f25c9df into main Jul 10, 2023
10 checks passed
@fhammerl fhammerl deleted the fhammerl/fix-composite-cancelled-steps-not-running branch July 10, 2023 11:32
ashb pushed a commit to ashb/runner that referenced this pull request Jul 11, 2023
…te steps not interrupting when the job is cancelled. (actions#2638)

* Set composite step's action_status when job is cancelled

Also make composite step inherit timeout from parent

* Fix eof line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants