Start serving the plugin API prior to registering with kubelet #55

Merged: 1 commit into NVIDIA:master on Sep 14, 2022

@cdesiniotis commented Sep 12, 2022

Serving the plugin API before registering with kubelet is the recommended startup order for a device plugin. In addition, registering before the server is up causes failures starting with K8s 1.25. It is unclear exactly what changed in 1.25 to surface this issue, but kubelet may now implicitly assume that a plugin is already serving the API when it receives a registration request.

Without this change, we see the following kubelet logs:

"Got registration request from device plugin with resource" resourceName="nvidia.com/***" 
"Unable to connect to device plugin client with socket path" err="failed to dial device plugin: context deadline exceeded" path="/var/lib/kubelet/device-plugins/kubevirt-***.sock"

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>

@rthallisey rthallisey merged commit d0339b1 into NVIDIA:master Sep 14, 2022