
nomad job stop sends interrupt signal for docker driver? #8932

Closed
jf opened this issue Sep 19, 2020 · 7 comments · Fixed by #9009

Comments

@jf
Contributor

jf commented Sep 19, 2020

Nomad version

Nomad v0.13.0-dev (fb170f37a05d712e3046d604c362804c7934cfc9+CHANGES)

Operating system and Environment details

Ubuntu 20.04.1 LTS

Issue

Reproduction steps

Not sure why nobody has noticed this, but it seems that nomad job stop sends a Docker task the interrupt signal (SIGINT) instead of the expected TERM signal documented at https://www.nomadproject.io/docs/job-specification/task#kill_signal

To confirm this, I added debug logging to the Docker driver's Kill method:

func (h *taskHandle) Kill(killTimeout time.Duration, signal os.Signal) error {

My debug output for the signal parameter shows interrupt.

My first clue came from my alloc logs, where I saw 'Sent interrupt'. The event triggered by nomad job stop is marked below:

Time                  Type        Description
2020-09-19T10:50:04Z  Killing     Sent interrupt. Waiting 30s before force killing
2020-09-19T10:50:04Z  Killed      Task successfully killed
2020-09-19T10:50:04Z  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
2020-09-19T10:50:02Z  Killing     Sent interrupt. Waiting 30s before force killing                 --> did `nomad job stop` here
2020-09-19T10:45:51Z  Started     Task started by client
2020-09-19T10:45:37Z  Driver      Downloading image
2020-09-19T10:45:37Z  Task Setup  Building Task Directory
2020-09-19T10:45:34Z  Received    Task received by client

Job file (if appropriate)

Any standard job file will do; the example file generated by nomad job init works too.

@jf jf changed the title nomad job stop sends signal interrupt for docker driver? nomad job stop sends interrupt signal for docker driver? Sep 19, 2020
@shoenig
Member

shoenig commented Sep 21, 2020

Thanks for reporting, and the PR, @jf!

Let me see if I can reproduce this across older Nomad versions. I'm pretty sure the Docker driver was intended to use SIGTERM, to behave more like the real docker stop command, but now I'm wondering whether that was ever actually true.

@jf
Contributor Author

jf commented Sep 21, 2020

Sure. I just had the impression from the docs (https://www.nomadproject.io/docs/job-specification/task#kill_signal), as I believe most folks would, that my tasks would receive TERM instead of INT.

@shoenig shoenig self-assigned this Sep 22, 2020
@shoenig
Member

shoenig commented Sep 29, 2020

I was able to confirm this hasn't worked as intended since at least Nomad v0.8.x.

Since fixing this changes behavior, I think we should add a section to the version upgrade guide, call it out as a backwards incompatibility in the changelog, and add some tests. I'll go ahead and merge #8933 and then take care of the rest.

@shoenig shoenig added this to the 0.13 milestone Sep 29, 2020
@jf
Contributor Author

jf commented Sep 29, 2020

Great, thank you @shoenig! Would you mind tagging me when you do the rest? Especially the tests, so I can learn from them and hopefully handle that myself in future PRs. Thank you!

@DWSR

DWSR commented Sep 30, 2020

Does this affect stopping allocations individually as well?

@jf
Contributor Author

jf commented Oct 2, 2020

Does this affect stopping allocations individually as well?

I believe so. I don't see why stopping allocations would use a different code path. Can you try it out and see whether you hit this as well?

shoenig added a commit that referenced this issue Oct 2, 2020
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.

Closes #8932 which was fixed in #8933
roaks3 pushed a commit that referenced this issue Oct 7, 2020
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.

Closes #8932 which was fixed in #8933
fredrikhgrelland pushed a commit to fredrikhgrelland/nomad that referenced this issue Oct 22, 2020
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.

Closes hashicorp#8932 which was fixed in hashicorp#8933
jrasell added a commit that referenced this issue Dec 2, 2020
lifecycle: update e2e test for service job with new docker signal #8932
@github-actions

github-actions bot commented Nov 1, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 1, 2022