Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: fix panic when preempting and evicting allocs #6792

Merged
merged 4 commits into from
Dec 3, 2019

Conversation

schmichael
Copy link
Member

@schmichael schmichael commented Dec 3, 2019

Sorry about the noisy review. I search/replaced all the noErr helpers to require.NoError. Commit
6112ad9 contains just the fix.

Fixes #6787

In ProposedAllocs the proposed alloc slice was being copied while its
contents were not. Since RemoveAllocs nils elements of the proposed
alloc slice and is called twice, it could panic on the second call when
erroneously accessing a nil'd alloc.

The fix is to not copy the proposed alloc slice and pass the slice
returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining
the trimmed length.

Fixes #6787

In ProposedAllocs the proposed alloc slice was being copied while its
contents were not. Since RemoveAllocs nils elements of the proposed
alloc slice and is called twice, it could panic on the second call when
erroneously accessing a nil'd alloc.

The fix is to not copy the proposed alloc slice and pass the slice
returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining
the trimmed length.
@schmichael schmichael added this to the 0.10.2 milestone Dec 3, 2019
Copy link
Contributor

@langmartin langmartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@preetapan preetapan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@notnoop
Copy link
Contributor

notnoop commented Dec 3, 2019

Code LGTM.

This seems like a serious regression. It would be useful to elaborate in the issue (and upgrade guides) explanation the triggering conditions and affected as well as workarounds (e.g. disabling system preemption for folks that don't need it?).

@schmichael
Copy link
Member Author

This seems like a serious regression. It would be useful to elaborate in the issue (and upgrade guides) explanation the triggering conditions and affected as well as workarounds (e.g. disabling system preemption for folks that don't need it?).

The bug has existed since 0.9.0-beta1, so it's somewhat rare to hit thankfully.

I pushed a docs update to document it. It's a bit of a funky place to mention it, but I can't think of anywhere else more appropriate.

Rendered version:

image

@schmichael schmichael merged commit b2dd21a into master Dec 3, 2019
@schmichael schmichael deleted the b-propose-panic branch December 3, 2019 18:40
schmichael added a commit that referenced this pull request Dec 3, 2019
scheduler: fix panic when preempting and evicting allocs
schmichael added a commit that referenced this pull request Dec 4, 2019
schmichael added a commit that referenced this pull request Dec 4, 2019
schmichael added a commit that referenced this pull request Dec 4, 2019
docs: add #6792 backport to 0.9.7 changelog
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad server crash after raft quorum on 0.9.3, 0.10.1
4 participants