Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of CSI: persist previous mounts on client to restore during restart into release/1.3.x #17878

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #17840 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

WARNING automatic cherry-pick of commits failed. Commits will require human attention.

The below text is copied from the body of the original PR.


When claiming a CSI volume, we need to ensure the CSI node plugin is running before we send any CSI RPCs. This extends even to the controller publish RPC because it requires the storage provider's "external node ID" for the client. This primarily impacts client restarts because the client's initial fingerprint won't include restored plugins.

Unfortunately there's no mapping of volume to plugin ID available in the jobspec, so we don't have enough information to wait on plugins until we either get the volume from the server or retrieve the plugin ID from data we've persisted on the client.

If we always require getting the volume from the server before making the claim, a client restart for disconnected clients will cause all the allocations that need CSI volumes to fail. Even while connected, checking in with the server to verify the volume's plugin before trying to make a claim RPC is inherently racy, so we'll leave that case as-is and it will fail the claim if the node plugin needed to support a newly-placed allocation is flapping such that the node fingerprint is changing.

So instead of blocking for information from the server, this changeset persists a minimum subset of data about the volume and its plugin in the client state DB, and retrieves that data during the CSI hook's prerun to avoid re-claiming and remounting the volume unnecessarily. This fixes the client restart bug but also reduces Nomad RPC and CSI RPC traffic during client restarts.

This changeset also updates the RPC handler to use the external node ID from the claim whenever it is available.

Fixes: #13028


Note for reviewers: this looks like a bit of a monster of a PR because of all the plumbing required for adding a new client state DB object. The bulk of the non-plumbing changes are in client/allocrunner/csi_hook.go, which GitHub is helpfully folding. 😀 An unfortunate amount of the churn in that file is having to break up the claimVolume function a bit, but I think the top-level flow in Prerun is a lot more legible as a result. In addition to expanding the test coverage in the unit test, I've also end-to-end tested this with the AWS EBS plugin and restarting the client in various configurations. (Also note this will not do anything for #15415)

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/csi-agent-restart-13028/accurately-optimum-monkfish branch 2 times, most recently from 25a59e9 to de5c624 Compare July 10, 2023 17:20
@hashicorp-cla
Copy link

hashicorp-cla commented Jul 10, 2023

CLA assistant check
All committers have signed the CLA.

)

When claiming a CSI volume, we need to ensure the CSI node plugin is running
before we send any CSI RPCs. This extends even to the controller publish RPC
because it requires the storage provider's "external node ID" for the
client. This primarily impacts client restarts but also is a problem if the node
plugin exits (and fingerprints) while the allocation that needs a CSI volume
claim is being placed.

Unfortunately there's no mapping of volume to plugin ID available in the
jobspec, so we don't have enough information to wait on plugins until we either
get the volume from the server or retrieve the plugin ID from data we've
persisted on the client.

If we always require getting the volume from the server before making the claim,
a client restart for disconnected clients will cause all the allocations that
need CSI volumes to fail. Even while connected, checking in with the server to
verify the volume's plugin before trying to make a claim RPC is inherently racy,
so we'll leave that case as-is and it will fail the claim if the node plugin
needed to support a newly-placed allocation is flapping such that the node
fingerprint is changing.

This changeset persists a minimum subset of data about the volume and its plugin
in the client state DB, and retrieves that data during the CSI hook's prerun to
avoid re-claiming and remounting the volume unnecessarily.

This changeset also updates the RPC handler to use the external node ID from the
claim whenever it is available.

Fixes: #13028
@tgross tgross force-pushed the backport/csi-agent-restart-13028/accurately-optimum-monkfish branch from f8165d8 to 33ce2c6 Compare July 10, 2023 17:30
@tgross tgross merged commit f9fff7f into release/1.3.x Jul 10, 2023
19 of 20 checks passed
@tgross tgross deleted the backport/csi-agent-restart-13028/accurately-optimum-monkfish branch July 10, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants