progress_deadline set to 0 fails to validate #17335

Closed
gorset opened this issue May 27, 2023 · 6 comments · Fixed by #17342
Labels: stage/accepted, theme/jobspec, type/bug

gorset commented May 27, 2023

Nomad 1.5.6 with #16761 introduces a check for kill_timeout, but fails to consider progress_deadline set to 0.

It fails with

Job validation errors:
1 error occurred:
	* Task group some-group validation failed: 1 error occurred:
	* Task some-task has a kill timout (5s) longer than the group's progress deadline (0s)

We don't have a kill timeout set, so 5s is the default.

As per documentation, progress_deadline set to 0 has special meaning:

If the progress_deadline is set to 0, the first allocation to be marked as unhealthy causes the deployment to fail.

We now have jobs that fail to validate and deploy. Is this a bug, or is progress_deadline set to 0 no longer supported?

@lgfa29 lgfa29 added theme/jobspec stage/accepted Confirmed, and intend to work on. No timeline committment though. labels May 29, 2023
@lgfa29 lgfa29 self-assigned this May 29, 2023

lgfa29 commented May 29, 2023

Thanks for the report @gorset.

Unfortunately this is an unexpected regression, and I have opened #17342 to fix it. Since this validation is performed on the servers, I can't think of a good workaround for this problem. Apologies for the problems it caused.

martinmcnulty commented

Is there any chance of a release 1.3.16 that includes this fix...? 🤞


gorset commented Aug 13, 2023

Thank you @lgfa29. Looks good after upgrading to 1.6.1.


lgfa29 commented Aug 18, 2023

> Is there any chance of a release 1.3.16 that includes this fix...? 🤞

Hi @martinmcnulty 👋

Normally we wouldn't, as we usually only release n-2 versions and, with 1.6.0 released, the 1.3.x line is deprecated. But the team agrees that this is a particularly bad bug to have around so we'll release a 1.3.16 just with this bug fix.

If all goes well it will be available by end of day today.


lgfa29 commented Aug 18, 2023

@martinmcnulty v1.3.16 with the fix is now available
https://releases.hashicorp.com/nomad/1.3.16


martinmcnulty commented Aug 21, 2023

@lgfa29 Many thanks!
