Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drainer: defer CSI plugins until last #12324

Merged
merged 1 commit into from
Mar 22, 2022
Merged

drainer: defer CSI plugins until last #12324

merged 1 commit into from
Mar 22, 2022

Conversation

tgross
Copy link
Member

@tgross tgross commented Mar 18, 2022

Fixes #11614 (when combined with #11892)

When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.


The changeset here is fairly trivial and I've tested it by hand; I don't know that a unit test is going to tell us anything interesting here. But I'm going to follow up this PR next week with an E2E test for CSI node drain behavior more generally.

When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
last. You should always use this flag when draining a node running
[CSI node plugins][internals-csi].
stopping system job allocations. By default system jobs (and CSI
plugins) are stopped last, after the `deadline` time has expired.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this "after the deadline time has expired" bit is existing behavior that didn't seem to be explicitly documented.

last. You should always use this flag when draining a node running
[CSI node plugins][internals-csi].
stopping system job allocations. By default system jobs (and CSI
plugins) are stopped last, after the `deadline` time has expired.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this "after the deadline time has expired" bit is existing behavior that didn't seem to be explicitly documented.

@tgross tgross requested a review from shoenig March 22, 2022 14:08
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tgross tgross merged commit 879e137 into main Mar 22, 2022
@tgross tgross deleted the csi_node_drain branch March 22, 2022 14:26
tgross added a commit that referenced this pull request May 2, 2022
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remaining, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
tgross added a commit that referenced this pull request May 3, 2022
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remaining, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smarter node draining handling when using CSI plugins
2 participants