Automatically Retry Retriable RPC failures to CSI Plugins #6863

Closed
endocrimes opened this issue Dec 16, 2019 · 3 comments · Fixed by #7549

Comments

@endocrimes
Contributor

The CSI Specification defines various gRPC Errors and how they may be retried. As part of our work to implement support for CSI, we should audit the CSI Calls that we require and implement a reasonable backoff strategy and automatic retries where possible.

@endocrimes
Contributor Author

This has been done for most of the attachment flow in recent PRs, using grpc-retry middleware on a per-request basis.

@tgross
Member

tgross commented Mar 30, 2020

Just for tracking my work to audit the calls we're making, here's the list of interfaces, whether they're "ok" as is, and whether they're "done" yet (that is, checked off in the PR I'll open). Some of the calls already use gRPC retries, but there's some language in the spec about which things we can retry and whether we need to modify the request first. I'm going to re-audit those as part of this, just to make sure we're not assuming our early work was 100% correct without checking.

Edit: also added a field for #7278

Controller client RPCs

| RPC | used in | ok? | has cancel? | done? |
| --- | --- | --- | --- | --- |
| ControllerGetCapabilities | plugin fingerprint | | | |
| ControllerPublishVolume | controller endpoint | 🚫 | | #7549 |
| ControllerUnpublishVolume | controller endpoint | 🚫 | | #7549 |
| ValidateVolumeCapabilities | volume registration | 🚫 | | #7549 |

Node client RPCs

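| RPC | used in | ok? | has cancel? | done? |
| --- | --- | --- | --- | --- |
| NodeGetCapabilities | plugin fingerprint | | | |
| NodeGetInfo | plugin fingerprint | | | |
| NodeStageVolume | volume attach | | | |
| NodeUnstageVolume | volume attach | | | |
| NodePublishVolume | volume attach | | | |
| NodeUnpublishVolume | volume attach | | | |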

Identity client RPCs

| RPC | used in | ok? | has cancel? | done? |
| --- | --- | --- | --- | --- |
| GetPluginInfo | plugin fingerprint | 🚫 | | #7549 |
| GetPluginCapabilities | plugin fingerprint | | | |
| Probe | plugin health check | | | |

tgross added a commit that referenced this issue Mar 30, 2020
The CSI Specification defines various gRPC Errors and how they may be retried. After auditing all our CSI RPC calls in #6863, this changeset:

* adds retries and backoffs to the RPCs where they were needed but not implemented
* annotates those CSI RPCs that do not need retries so that we don't wonder whether they've been left off accidentally
* adds a timeout and cancellation context to the `Probe` call, which didn't have one.
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022