Prevent kill_timeout greater than progress_deadline #16761

Merged
merged 6 commits into from
Apr 4, 2023

@Juanadelacuesta Juanadelacuesta commented Apr 3, 2023

This PR addresses the bug reported in #8487.

If a task's kill_timeout is greater than the job's progress_deadline, allocations may keep running after the initial kill signal for long enough that the deployment fails. However, the allocations scheduled by that job remain pending after the failure, and are placed and started once the previous allocations exit. This PR adds a validation that rejects this configuration.

@jrasell jrasell left a comment

Looking good, thanks @Juanadelacuesta! It looks like the new validation has meant some tests are now breaking, which need to be fixed up. Let's also add a changelog entry to this PR.

@@ -5136,7 +5144,7 @@ func (u *UpdateStrategy) IsEmpty() bool {
 		return true
 	}

-	return u.MaxParallel == 0
+	return *u == *DefaultUpdateStrategy

Isn't this doing an equality check rather than checking if the strategy is empty?

@@ -5145,6 +5153,38 @@ func (u *UpdateStrategy) Rolling() bool {
 	return u.Stagger > 0 && u.MaxParallel > 0
 }

+func (u *UpdateStrategy) Canonicalize() {

I worry that having this new functionality will mean users upgrading will see their job specification change between versions. I wonder why we need this, when the canonicalization should be handled by the API before it reaches the RPC.

I took a look at the use of structs.DefaultUpdateStrategy and it seems this only currently gets used by mocks and tests, which suggests we currently rely on the API default instantiation as mentioned above.

@@ -0,0 +1,3 @@
+```release-note:improvement
+code: Prevent Kill Timeout greater than Progress Deadline on Update block

code or core? It'd be nice if the parameters were detailed in their jobspec form, as it makes it easier for users to grok.

@jrasell jrasell left a comment

A couple of nitpick changes, but LGTM, thanks!

We will need to add backport labels before merging this in.

.changelog/16761.txt (outdated, resolved)
nomad/structs/structs.go (outdated, resolved)
Juanadelacuesta and others added 2 commits April 4, 2023 17:53
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
@Juanadelacuesta Juanadelacuesta added the backport/1.3.x, backport/1.4.x, and backport/1.5.x labels Apr 4, 2023
@Juanadelacuesta Juanadelacuesta merged commit ed80d70 into main Apr 4, 2023
@Juanadelacuesta Juanadelacuesta deleted the b-gh-8487 branch April 4, 2023 16:17
jrasell added a commit that referenced this pull request Apr 5, 2023
* func: add validation for kill timeout smaller than progress deadline

* style: add changelog

* style: typo in changelog

* style: remove refactored test

* Update .changelog/16761.txt

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

* Update nomad/structs/structs.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Juanadelacuesta added a commit that referenced this pull request Apr 6, 2023
Co-authored-by: Juana De La Cuesta <juanita.delacuestamorales@hashicorp.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
@lgfa29 lgfa29 added backport/1.4.x and removed backport/1.4.x labels May 16, 2023
lgfa29 pushed a commit that referenced this pull request May 16, 2023
@lgfa29 lgfa29 added backport/1.5.x and removed backport/1.5.x labels May 16, 2023
lgfa29 added a commit that referenced this pull request May 16, 2023
…release/1.4.x (#17205)

* test: fix TestJobEndpoint_Scale_BatchJob

* Prevent kill_timeout greater than progress_deadline  (#16761)

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Juana De La Cuesta <juanita.delacuestamorales@hashicorp.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>