
Provisioner does not allow rescheduling if a Node is deleted after a pod is scheduled #121

Closed
pwschuurman opened this issue Mar 2, 2022 · 19 comments
Assignees
Labels
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pwschuurman

If a node is deleted after a pod has been scheduled onto it (but before the pod's claim is provisioned), the pod can become stuck in a Pending state indefinitely.

Typically, when provisioning fails, the provisioner relinquishes control back to the Scheduler so the Pod can be rescheduled somewhere else. It does this by removing the volume.kubernetes.io/selected-node annotation from the PVC; the controller returns ProvisioningFinished in provisionClaimOperation. This can happen when storage cannot be scheduled on the selected node: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1420

However, if a Node becomes unavailable after the Scheduler has selected it, the provisioner does not remove this annotation, since provisionClaimOperation returns ProvisioningNoChange. That behavior is useful when the selected Node is only temporarily unavailable and will eventually come back. But when the Node has been deleted, the condition is unrecoverable and requires user intervention: adding the exact node back (infeasible for dynamically provisioned node names), deleting and re-creating the pod so the Scheduler can reschedule it, or manually removing the selected-node annotation from the PVC.
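Both the manual workaround and any automated fix ultimately come down to clearing that annotation so the Scheduler can pick a new node. A minimal client-go sketch of that step, assuming a hypothetical helper name (clearSelectedNode is not part of the provisioner library):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const selectedNodeAnnotation = "volume.kubernetes.io/selected-node"

// clearSelectedNode is a hypothetical helper: it drops the selected-node
// annotation from a PVC so the Scheduler is free to pick a new node.
func clearSelectedNode(ctx context.Context, client kubernetes.Interface, namespace, pvcName string) error {
	pvc, err := client.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, pvcName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if _, ok := pvc.Annotations[selectedNodeAnnotation]; !ok {
		return nil // annotation already gone, nothing to do
	}
	delete(pvc.Annotations, selectedNodeAnnotation)
	_, err = client.CoreV1().PersistentVolumeClaims(namespace).Update(ctx, pvc, metav1.UpdateOptions{})
	return err
}
```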

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@amacaskill
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 13, 2022
@amacaskill
Member

/reopen

@k8s-ci-robot
Contributor

@amacaskill: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Oct 13, 2022
@pwschuurman
Author

Repro using VolumeSnapshot to delay provisioning: https://gist.github.com/pwschuurman/fd9c8c50889ce2382bcdca259c51d3e4

  1. Create a VolumeSnapshot that references a non-existent disk (or a disk that takes a lot of time to be copied in order for the VolumeSnapshot to become ready)
  2. Create a PVC that references the VolumeSnapshot as a DataSource
  3. Create a pod that references said PVC. The Scheduler will select a node for the pod and add the volume.kubernetes.io/selected-node annotation to the PVC.
  4. While the operation from (1) is still pending, delete the node that was selected for the PVC. This could happen under normal conditions due to node repair, upgrade, or autoscaling.
  5. Once the VolumeSnapshot becomes ready, the provisioner will start emitting "failed to get target node" errors. The PVC must be deleted (or the annotation removed) to fix this problem.

Some ideas on how to handle this:

  1. Add a timeout that removes the annotation after some period of time: if a volume.kubernetes.io/selected-node annotation becomes stale, remove it from the PVC. This is troublesome because some delays (e.g., waiting for a snapshot to be created) can take a long time and may not fit into a well-defined timeout period.
  2. Update csi-provisioner to use an informer, rather than a lister. This would allow the provisioner to be aware of Node deletion events and remove the annotation from the affected volumes. The provisioner would likely need to keep a node -> volume cache in order to find the affected volumes (see the sketch after this list).
  3. Update the scheduler to remove the annotation on node deletion.
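A rough sketch of idea (2), assuming the provisioner keeps its own node -> claims index elsewhere. The Node informer and delete handler below are standard client-go; watchNodeDeletions and onNodeDeleted are hypothetical names:

```go
package sketch

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNodeDeletions (hypothetical) wires up a Node informer so the provisioner
// hears about Node deletions and can clean up affected claims.
func watchNodeDeletions(client kubernetes.Interface, stopCh <-chan struct{}, onNodeDeleted func(nodeName string)) {
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// Deletions can arrive wrapped in a tombstone when the watch lagged.
			if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = tombstone.Obj
			}
			if node, ok := obj.(*v1.Node); ok {
				// The caller would look up PVCs whose selected-node annotation
				// points at this node (via a node -> claim index) and clear it.
				onNodeDeleted(node.Name)
			}
		},
	})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}
```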

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2023
@msau42

msau42 commented Feb 22, 2023

/remove-lifecycle stale
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 22, 2023
@msau42

msau42 commented Feb 22, 2023

I think this is the same issue as kubernetes/kubernetes#100485

@sunnylovestiramisu
Contributor

Another option we discussed: remove the annotation when the provisioner tries to access a Node that doesn't exist, by detecting the NotFound error (errors.IsNotFound).
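A sketch of how that check might look (hypothetical helper, not the library's actual code; apierrors is k8s.io/apimachinery/pkg/api/errors, and clearSelectedNode is the hypothetical helper sketched in the issue description above):

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureSelectedNodeExists (hypothetical): if the node named in the PVC's
// selected-node annotation no longer exists, clear the annotation so the
// Scheduler can pick a new node instead of the provisioner retrying forever.
func ensureSelectedNodeExists(ctx context.Context, client kubernetes.Interface, claim *v1.PersistentVolumeClaim, nodeName string) error {
	_, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// The node is permanently gone: hand the claim back to the Scheduler
		// by removing volume.kubernetes.io/selected-node from the PVC.
		return clearSelectedNode(ctx, client, claim.Namespace, claim.Name)
	}
	// Any other error (e.g. a transient API failure) is returned unchanged so
	// the provisioner can keep retrying without touching the annotation.
	return err
}
```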

@msau42

msau42 commented Mar 7, 2023

/assign @sunnylovestiramisu

@sunnylovestiramisu
Contributor

sunnylovestiramisu commented Mar 8, 2023

Reproduced the error by the following step:

  1. kubetest --build --up
  2. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  3. Create a storage class, create a PVC with the volume.kubernetes.io/selected-node annotation pointing at a non-existent node, and create a pod
  4. The PVC stayed in the Pending state
  5. Check the csi-provisioner logs via k logs -n gce-pd-csi-driver csi-gce-pd-controller-container csi-provisioner:
W0308 00:51:37.588114       1 controller.go:934] Retrying syncing claim "xxxxxx", failure 12
E0308 00:51:37.588141       1 controller.go:957] error syncing claim "xxxxxx": failed to get target node: node "non-exist-node" not found
I0308 00:51:37.588381       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"task-pv-claim", UID:"xxxxxx", APIVersion:"v1", ResourceVersion:"4824", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to get target node: node "non-exist-node" not found

@sunnylovestiramisu
Contributor

sunnylovestiramisu commented Mar 9, 2023

Manually tested with the fix #139:

  1. Copy the sig-storage-lib-external-provisioner code with the fix into the external-provisioner vendor directory
  2. Run make container to build a new external-provisioner image
  3. Upload it to GCR and then replace the driver image link in the stable-master image.yaml
  4. Spin up a k8s cluster on GCE via kubetest --build --up
  5. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  6. Create a storage class, create a PVC with the volume.kubernetes.io/selected-node annotation, and create a pod
  7. The PVC was provisioned successfully: "Successfully provisioned volume pvc-xxxxxx"

@sunnylovestiramisu
Contributor

We should cherry-pick to external-provisioner 3.2, 3.3, 3.4

@sunnylovestiramisu
Contributor

/close

@k8s-ci-robot
Contributor

@sunnylovestiramisu: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
