csi: dynamic plugins don't become available until next fingerprint #7296

Closed
tgross opened this issue Mar 9, 2020 · 4 comments
Comments

@tgross
Member

tgross commented Mar 9, 2020

When a CSI plugin job is run, it's registered as a dynamic plugin as soon as the plugin supervisor hook gets an OK response back from PluginProbe. But the plugin's capabilities aren't fingerprinted until the dynamic plugin manager's fingerprint loop fires, which can be ~30s later. This delays the point at which the plugin becomes available to serve other CSI operations.

(Note: this is an optimization issue and not a correctness issue.)
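
To make the gap concrete, here is a minimal Go sketch of the two-step behavior described above. It is an illustration only, not Nomad's code: `pluginManager`, `Register`, `Fingerprint`, and `Available` are hypothetical names, and the periodic loop is modeled as an explicit `Fingerprint` call rather than a timer.

```go
package main

// pluginManager is a hypothetical stand-in for the client's dynamic plugin
// manager. It is NOT Nomad's actual type; all names here are assumptions.
type pluginManager struct {
	registered   map[string]bool     // plugins whose probe returned OK
	capabilities map[string][]string // filled in only by a fingerprint pass
}

func newPluginManager() *pluginManager {
	return &pluginManager{
		registered:   make(map[string]bool),
		capabilities: make(map[string][]string),
	}
}

// Register records the plugin as soon as its probe succeeds.
func (m *pluginManager) Register(id string) {
	m.registered[id] = true
}

// Fingerprint models one pass of the fingerprint loop: it records
// capabilities for every registered plugin that does not have them yet.
// The probe callback is a placeholder for querying the plugin.
func (m *pluginManager) Fingerprint(probe func(id string) []string) {
	for id := range m.registered {
		if _, ok := m.capabilities[id]; !ok {
			m.capabilities[id] = probe(id)
		}
	}
}

// Available reports whether the plugin can serve other CSI operations,
// which in this sketch requires registration AND a completed fingerprint.
func (m *pluginManager) Available(id string) bool {
	_, fingerprinted := m.capabilities[id]
	return m.registered[id] && fingerprinted
}
```

In this model, calling `Register("ebs")` alone leaves `Available("ebs")` false. It only turns true after the next `Fingerprint` pass, so if that pass runs on a fixed interval, the wait can be up to one full interval.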

@tgross
Member Author

tgross commented Mar 23, 2020

The fingerprinting updates in #7386 have made a big improvement in this timing but it could still be faster. I'm going to mark this as an improvement to complete after we ship the beta.

@tgross tgross removed their assignment Mar 25, 2020
@tgross tgross removed this from the 0.11.0 milestone Mar 30, 2020
@tgross
Member Author

tgross commented May 5, 2020

Some clarification on the current impact: when an alloc for a CSI plugin is placed, nomad job status and nomad alloc status show the job running at the usual pace. However, the plugin is not available for use by volumes until the Nomad client fingerprint fires and completes the plugin's registration.

This primarily affects operators when they first launch a plugin on a given Nomad client, which includes adding new clients to the cluster and upgrading a plugin in place. (It also makes local development a bit tedious if you're frequently adding/removing plugins, but that primarily impacts Nomad team developers.)

@tgross
Member Author

tgross commented Oct 9, 2020

Now that #8699 has landed and we have an accurate view of expected plugin counts, I've been able to verify the timing behaviors here. The node update is edge-triggered, so client state changes from the plugin registration are happening immediately. The source of the remaining delay is the gap between the taskrunner's PreRun and PostStart hooks, or in other words primarily the amount of time it takes to download the container image. I've been able to confirm this with pre-downloaded Docker images. That makes this issue safe to finally close.
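
For readers unfamiliar with the term, "edge-triggered" here means an update is sent when state changes rather than on a fixed schedule. The Go sketch below illustrates that idea in isolation. It is not Nomad's implementation: `nodeUpdater` and `Observe` are hypothetical names, and "sending" an update is modeled by appending to a slice.

```go
package main

// nodeUpdater is a hypothetical sketch of an edge-triggered notifier.
// It is NOT Nomad's implementation; all names here are assumptions.
type nodeUpdater struct {
	last    map[string]bool // last observed health per plugin ID
	updates []string        // plugin IDs for which an update was "sent"
}

func newNodeUpdater() *nodeUpdater {
	return &nodeUpdater{last: make(map[string]bool)}
}

// Observe records the plugin's current health and sends a node update only
// on an edge: the first observation of a plugin, or a change from the
// previously observed value. It returns true when an update was sent.
func (u *nodeUpdater) Observe(id string, healthy bool) bool {
	prev, seen := u.last[id]
	if seen && prev == healthy {
		return false // no state change, so nothing to send
	}
	u.last[id] = healthy
	u.updates = append(u.updates, id)
	return true
}
```

With this shape, the first healthy observation of a newly registered plugin triggers an update right away, and repeated identical observations send nothing. Any remaining delay then comes from how long it takes the plugin task to reach the point where it is first observed.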

@tgross tgross closed this as completed Oct 9, 2020
@github-actions

github-actions bot commented Nov 1, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 1, 2022