client: ensure task only runs with prestart hooks #18662

Merged on Oct 5, 2023 (2 commits)

lgfa29 (Contributor) commented Oct 4, 2023

Since the allocation in the task runner is updated in a separate goroutine, a race condition may happen where the task is started but the prestart hooks are skipped because the allocation became terminal.

Checking for a terminal allocation before proceeding with the task start ensures the task only runs if the prestart hooks are also executed.

Since shouldShutdown() only uses terminal allocation status, it remains true after the first transition, so it's safe to check it again after the prestart hooks as it will never revert to false.
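The reasoning above can be sketched with a simplified model. This is not the actual task runner code: the `runner` type, its fields, `markTerminal`, and `simulate` are hypothetical names made up for illustration, and the two callbacks only exist to inject the allocation update at a chosen point instead of relying on a real concurrent goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical, stripped-down stand-in for a task runner.
type runner struct {
	mu       sync.RWMutex
	terminal bool // in the real system this is updated from another goroutine
	hooksRan bool
	started  bool
}

// markTerminal models the allocation becoming terminal.
func (r *runner) markTerminal() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.terminal = true
}

// shouldShutdown only looks at terminal status, so once it returns
// true it never returns false again.
func (r *runner) shouldShutdown() bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.terminal
}

// prestart skips the hooks when the allocation is already terminal.
func (r *runner) prestart() {
	if r.shouldShutdown() {
		return
	}
	r.hooksRan = true
}

// run re-checks shouldShutdown after prestart. If the hooks were
// skipped, the check must also be true here, so the task cannot
// start without its hooks.
func (r *runner) run(beforePrestart, afterPrestart func()) {
	beforePrestart()
	r.prestart()
	afterPrestart()
	if r.shouldShutdown() {
		return
	}
	r.started = true
}

// simulate makes the allocation terminal at the named point.
func simulate(terminalAt string) (hooksRan, started bool) {
	r := &runner{}
	noop := func() {}
	before, after := noop, noop
	switch terminalAt {
	case "before-prestart":
		before = r.markTerminal
	case "after-prestart":
		after = r.markTerminal
	}
	r.run(before, after)
	return r.hooksRan, r.started
}

func main() {
	for _, at := range []string{"never", "before-prestart", "after-prestart"} {
		h, s := simulate(at)
		fmt.Printf("terminal=%-15s hooksRan=%-5v started=%v\n", at, h, s)
	}
}
```

In every case the task starts only if the hooks ran. The reverse does not hold: the hooks can run and the task still not start when the allocation becomes terminal in between.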

Some other implementation ideas I considered:

  1. Move the check for shouldShutdown() from within prestart() to before it is called. I think this would be more or less equivalent to this approach.
  2. Create a read lock on the task runner alloc so that shouldShutdown() is guaranteed not to change while prestart() runs and the task starts. This is probably the most "correct" approach, but since shouldShutdown() can only transition from false to true, checking it again after prestart() seems enough.
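For comparison, the second idea (holding a read lock across both steps) might look roughly like the sketch below. It is only an illustration under assumed names: `lockedRunner`, `markTerminal`, and `raceOnce` are hypothetical and do not come from the real code. The lock guards only the `terminal` flag; the other two fields are written by the single goroutine calling `run`.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedRunner is a hypothetical model of the read-lock alternative.
type lockedRunner struct {
	mu       sync.RWMutex
	terminal bool
	hooksRan bool
	started  bool
}

// markTerminal needs the write lock, so it waits while run holds
// the read lock.
func (r *lockedRunner) markTerminal() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.terminal = true
}

// run holds the read lock across both the hooks and the task start,
// so terminal status cannot change between the two steps.
func (r *lockedRunner) run() {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.terminal {
		return
	}
	r.hooksRan = true // stands in for the prestart hooks
	r.started = true  // stands in for starting the task
}

// raceOnce runs the update and the task start concurrently.
func raceOnce() (hooksRan, started bool) {
	r := &lockedRunner{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); r.markTerminal() }()
	go func() { defer wg.Done(); r.run() }()
	wg.Wait()
	return r.hooksRan, r.started
}

func main() {
	for i := 0; i < 1000; i++ {
		if h, s := raceOnce(); h != s {
			panic("hooks and task start disagreed")
		}
	}
	fmt.Println("hooks and task start always agreed")
}
```

The trade-off in this sketch is that the allocation update blocks for as long as the hooks and the task start take, which the re-check approach avoids.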

Closes #18659

lgfa29 added the backport/1.4.x, backport/1.5.x, and backport/1.6.x labels on Oct 4, 2023
schmichael (Member) left a comment:

LGTM! Sad that shouldShutdown() really means "should-shutdown-because-alloc-is-terminal" ... I think it would be safe to stuff the killCtx and shutdownCtx checks inside of it as well, but then we have to consider what that would do in the checks that call shouldShutdown(). So this seems like a nice precise fix.

tgross (Member) left a comment:

Nice catch!

lgfa29 (Contributor, Author) commented Oct 5, 2023

> LGTM! Sad that shouldShutdown() really means "should-shutdown-because-alloc-is-terminal" ... I think it would be safe to stuff the killCtx and shutdownCtx checks inside of it as well, but then we have to consider what that would do in the checks that call shouldShutdown(). So this seems like a nice precise fix.

Yeah, I wanted to make this part of the code more "transactional": either the task starts in full or it doesn't start at all. But that would be a much bigger change, I think, so I went for the smallest change necessary to fix the problem.

Linked issue: Task is started despite skipped pre-start hooks