Provisioner does not allow rescheduling if a Node is deleted after a pod is scheduled #121
@k8s-triage-robot: The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@k8s-triage-robot: The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
@k8s-triage-robot: The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
/remove-lifecycle rotten
/reopen
@amacaskill: Reopened this issue.
Repro using VolumeSnapshot to delay provisioning: https://gist.github.com/pwschuurman/fd9c8c50889ce2382bcdca259c51d3e4
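For orientation, a minimal client-go sketch of the race being reproduced (the namespace and PVC name are placeholders; the gist's VolumeSnapshot dataSource just makes provisioning slow enough to hit the window):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// The scheduler has already stamped the PVC (placeholder name) with its chosen node.
	pvc, err := client.CoreV1().PersistentVolumeClaims("default").Get(ctx, "repro-pvc", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	node := pvc.Annotations["volume.kubernetes.io/selected-node"]

	// Delete that node before external-provisioner finishes provisioning.
	if err := client.CoreV1().Nodes().Delete(ctx, node, metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	// The PVC and pod now stay Pending: the provisioner keeps returning
	// ProvisioningNoChange and the annotation is never removed.
	fmt.Println("deleted node:", node)
}
```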
Some ideas on how to handle this:
@k8s-triage-robot: The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I think this is the same issue as kubernetes/kubernetes#100485.
Another option we discussed is to remove the annotation when the provisioner tries to access a Node that doesn't exist, detected via `errors.NewNotFound`.
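A minimal sketch of how that option could look against the sig-storage-lib-external-provisioner `Provisioner` interface. The wrapper type and elided plumbing here are hypothetical; `ProvisioningState`, `ProvisionOptions.SelectedNode`, and the return-value behavior come from the library, though the import version may differ:

```go
package provisioner

import (
	"context"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller"
)

// nodeAwareProvisioner is a hypothetical wrapper around a real CSI provisioner.
type nodeAwareProvisioner struct {
	client kubernetes.Interface
	inner  controller.Provisioner
}

func (p *nodeAwareProvisioner) Provision(ctx context.Context, opts controller.ProvisionOptions) (*v1.PersistentVolume, controller.ProvisioningState, error) {
	if opts.SelectedNode != nil {
		_, err := p.client.CoreV1().Nodes().Get(ctx, opts.SelectedNode.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			// The selected node is gone for good. Reporting ProvisioningFinished
			// (instead of ProvisioningNoChange) lets the library treat the error
			// as final, strip the volume.kubernetes.io/selected-node annotation,
			// and hand the pod back to the scheduler.
			return nil, controller.ProvisioningFinished, err
		}
	}
	return p.inner.Provision(ctx, opts)
}

func (p *nodeAwareProvisioner) Delete(ctx context.Context, pv *v1.PersistentVolume) error {
	return p.inner.Delete(ctx, pv)
}
```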
/assign @sunnylovestiramisu
Reproduced the error by the following steps:
Manually tested with the fix #139.
We should cherry-pick this to external-provisioner 3.2, 3.3, and 3.4.
/close
@sunnylovestiramisu: Closing this issue.
If a Node is deleted after a Pod has been scheduled onto it, but before the Pod's claim is provisioned, the Pod can become stuck in a Pending state indefinitely.

Typically, when a provisioning failure occurs, the provisioner relinquishes control back to the Scheduler so the Pod can be rescheduled somewhere else. It does this by removing the `volume.kubernetes.io/selected-node` annotation from the PVC; the controller returns `ProvisioningFinished` in `provisionClaimOperation`. This can happen when storage cannot be scheduled on the selected node: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1420

However, if a Node becomes unavailable after it has been selected by the Scheduler, the provisioner does not remove the annotation, because it returns `ProvisioningNoChange` in `provisionClaimOperation`. That is potentially useful where there is eventual consistency and a selected Node may still become available. But when the Node has been deleted, the condition is unrecoverable and requires the user to intervene: add the exact Node back (infeasible for dynamically provisioned node names), delete and re-create the Pod and let the Scheduler reschedule it, or manually remove the `selected-node` annotation from the PVC.
annotation on the PVC).The text was updated successfully, but these errors were encountered: