Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task Lifecycle PostStart Hook #8366

Closed
10 tasks done
jazzyfresh opened this issue Jul 6, 2020 · 7 comments · Fixed by #8390
Closed
10 tasks done

Task Lifecycle PostStart Hook #8366

jazzyfresh opened this issue Jul 6, 2020 · 7 comments · Fixed by #8390
Assignees
Milestone

Comments

@jazzyfresh
Copy link
Contributor

jazzyfresh commented Jul 6, 2020

feature for the task lifecycle stanza
relevant to PostStop Hook #8193

Behavior

  • PostStart hooks start after the main tasks have started (see question 1)
  • Ephemeral PostStart hooks run until the task completes successfully
  • Sidecar PostStart hooks run until the end of the allocation

Implementation

  • add another task map to the client/allocrunner/task_hook_coordinator.go to track main tasks that haven't started
  • add another context to the TaskHookCoordinator to signal PostStart hook tasks to start
  • add another switch case to close the context for PostStart hook tasks to start
  • add constants & cases to structs.go & api/tasks.go for TaskLifecyclePoststartHook
  • poststart tasks start after main tasks have started
  • account for poststart hooks in scheduler for allocation resource addition (structs.go in the AllocatedResources methods)
    • probably dont have to do anything, treat like a main task
  • unit tests
    • allocrunner tests that confirm poststart tasks start after main tasks start
    • allocrunner tests that confirm poststart sidecar tasks are killed when the allocation is killed
  • e2e tests

Questions

  1. what does it mean for main tasks to be "started": after the tasks have been sent to be run by the task driver or after they have been confirmed to be running by the task driver?
  • defer to the task driver, if the task driver sets a task's StartedAt to something non-zero
  • potential long-term implementation for involving consul services
@jazzyfresh
Copy link
Contributor Author

jazzyfresh commented Jul 8, 2020

Discussion w/ @schmichael

Question 1. what does it mean for main tasks to be "started": after the tasks have been sent to be run by the task driver or after they have been confirmed to be running by the task driver?

  • defer to the task driver, if the task driver sets a task's StartedAt to something non-zero
  • ^this was the historical discussion we had long long ago before we sent the RFC out, which included a discussion of needing a StartTimeout to give services time to spin up (TODO: @jazzyfresh follow up with @notnoop is this implemented already?)
  • potential long-term implementation for involving consul services (this was proposed ages ago, but it was not tracked in a ticket or project board, TODO: jasmine)
  • TODO-RFC: lifecycle transition configuration (allows users to configure how tasks transition from one phase to another)

@tgross tgross modified the milestones: 0.12.1, 0.12.2 Jul 23, 2020
@jazzyfresh
Copy link
Contributor Author

jazzyfresh commented Aug 11, 2020

TODO

  • e2e tests (copy from poststop branch)
    • e2e service job test fails bc the ***-ran files are not present? this does not happen with the batch job test
  • better unit tests (more tests? allocrunner??)
    • something that catches the bug we identified today (poststart never started bc of small optimization in taskStateUpdate
  • manual testing
    • service jobs
    • batch jobs
    • sidecar works too
  • bug fixing

@tgross tgross modified the milestones: 0.12.2, 0.12.4 Aug 17, 2020
@jazzyfresh
Copy link
Contributor Author

Added e2e tests, this is about ready for review

@jazzyfresh
Copy link
Contributor Author

I think there's a bug in the e2e service job test tho, i'll be looking into that today

@jazzyfresh
Copy link
Contributor Author

jazzyfresh commented Aug 20, 2020

e2e test is almost there! Yesterday debugged a race condition where we were killing the allocation before the prestart task has a chance to finish (which blocks progress of all other tasks). Fix was to block the allocation kill signal until poststart has started.

There is still a tiny bit more to debug, why isn't the main task cleaning up its main-running file? The sidecar task has the same trap functionality, but it cleans up its sidecar-running file.

Screenshot from 2020-08-20 08-11-34
Screenshot from 2020-08-20 08-12-05

@jazzyfresh
Copy link
Contributor Author

Next thing to look into is stop functionality in some mysterious place in alloc runner/task runner

@github-actions
Copy link

github-actions bot commented Nov 2, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants