
fix(localpv-hostpath):pv deletion caused panic, if node not found #1662

Merged
merged 3 commits into from
Apr 10, 2020


@kmova (Contributor) commented Apr 10, 2020

Ref: openebs/openebs#2993

What this PR does / why we need it:

After the PV is created and its node affinity is set
based on the kubernetes.io/hostname label, deleting the PV
caused a panic if either:

  • the hostname label changed on the node, or
  • the node was deleted from the cluster.

This PR fixes the panic by returning an error that reports
the hostname of the node that no longer exists
in the cluster.

The code does not force-delete the PV, since
the node might only be temporarily out of the cluster.
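The guard described above can be sketched as follows. This is a minimal, self-contained sketch, not the provisioner's code: it assumes the panic came from taking the first element of an empty node list (the pre-fix function body is not shown here, so that is an assumption). `Node`, `nodeLister`, `fakeLister` and `getNodeFromHostName` are hypothetical stand-ins for `v1.Node`, the client-go `Nodes().List` call and `GetNodeObjectFromHostName`; the real code uses `errors.Errorf` rather than `fmt.Errorf`.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the Kubernetes v1.Node type.
type Node struct {
	Name     string
	Hostname string // value of the kubernetes.io/hostname label
}

// nodeLister is a hypothetical stand-in for the label-selector
// Nodes().List call the provisioner makes against the API server.
type nodeLister interface {
	ListByHostname(hostName string) ([]Node, error)
}

// fakeLister returns the nodes whose hostname label matches.
type fakeLister struct{ nodes []Node }

func (f fakeLister) ListByHostname(hostName string) ([]Node, error) {
	var out []Node
	for _, n := range f.nodes {
		if n.Hostname == hostName {
			out = append(out, n)
		}
	}
	return out, nil
}

// getNodeFromHostName returns an error, instead of panicking on an
// out-of-range index, when no node carries the requested hostname.
func getNodeFromHostName(l nodeLister, hostName string) (*Node, error) {
	nodes, err := l.ListByHostname(hostName)
	if err != nil {
		return nil, fmt.Errorf("listing nodes for hostname %q: %w", hostName, err)
	}
	// len of a nil slice is 0, so this one check also covers a nil result.
	if len(nodes) == 0 {
		return nil, fmt.Errorf("Unable to get the Node with the NodeHostName [%s]", hostName)
	}
	return &nodes[0], nil
}

func main() {
	l := fakeLister{nodes: []Node{{Name: "node-1", Hostname: "node-1"}}}

	if n, err := getNodeFromHostName(l, "node-1"); err == nil {
		fmt.Println("found", n.Name)
	}
	// A renamed or deleted node now yields an error the caller can return,
	// so the delete is retried instead of the process crashing.
	if _, err := getNodeFromHostName(l, "renamed-host"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Unlike the combined check in the diff, this sketch reports a failed list call separately (wrapping the underlying error) from the "no such node" case, which keeps the API error visible in the logs.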

If the user decides that the node is not going to come back,
the user can force-delete the PV using
`kubectl delete pv`.

Alternatively, the node's hostname may really have changed.
In this case, the user will have to manually migrate
the localpv-hostpath PV to the new node using the
following steps:

  • Scale down the Deployment or StatefulSet that uses the PVC.
  • Save the PVC and PV YAML files.
  • Delete the PVC and PV.
  • Modify the saved PV YAML with the new node hostname and apply it.
    Note: when the YAMLs are re-applied, the UIDs of the PV and PVC objects change, so server-assigned metadata such as the object's own UID, etc. needs to be cleared.
  • Modify the saved PVC YAML to remove stale references and re-apply it.
  • Update the PV YAML with the UID of the newly created PVC (spec.claimRef.uid).
  • Scale the Deployment or StatefulSet back up.

Signed-off-by: kmova <kiran.mova@mayadata.io>

Special notes for your reviewer:
Tested by renaming the hostname after the PV was created. Deletion no longer panics; instead, it displays the following error message and keeps reconciling the delete:

E0410 09:10:56.656083       1 controller.go:952] error syncing volume "pvc-3feef744-a565-4216-a3fd-050d8b89a93d": failed to delete volume pvc-3feef744-a565-4216-a3fd-050d8b89a93d: failed to delete volume pvc-3feef744-a565-4216-a3fd-050d8b89a93d: Unable to get the Node with the NodeHostName [gkea-kmova-helm-default-pool-1bdf01a5-2mx3]

Manually running `kubectl delete pv` clears the PV from the system. If the node was renamed, the PV folder will still exist on the node and must also be cleared manually.

Checklist:

  • Fixes #
  • Labelled this PR & related issue with documentation tag
  • PR message has documentation-related information
  • Labelled this PR & related issue with breaking-changes tag (no breaking changes)
  • PR message has breaking-changes-related information (no breaking changes)
  • Labelled this PR & related issue with requires-upgrade tag (no upgrade changes)
  • PR message has upgrade-related information (no upgrade changes)
  • Commit has unit tests (not in this PR)
  • Commit has integration tests (will be added in future PRs; see BDD: Local PV hostpath for hostname label change #1661)

@kmova added the pr/release-note label (PR should be included in release notes) on Apr 10, 2020
@vishnuitta (Contributor) left a comment

Changes are good. Do we want to throw an error if len is not 1? The test case can come in a later PR.

@@ -134,8 +134,12 @@ func (p *Provisioner) GetNodeObjectFromHostName(hostName string) (*v1.Node, erro
 		Limit: 1,
 	}
 	nodeList, err := p.kubeClient.CoreV1().Nodes().List(listOptions)
-	if err != nil {
-		return nil, errors.Errorf("Unable to get the Node with the NodeHostName")
+	if err != nil || nodeList.Items == nil || len(nodeList.Items) == 0 {
Contributor


Do we need both checks, for Items not being nil and for len(Items)?

Contributor Author


@AmitKumarDas, what is the idiomatic Go way to write the above checks?


A len check alone is good for slices.
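Why the len check alone is enough: in Go, `len` of a nil slice is 0, so `len(nodeList.Items) == 0` already covers `nodeList.Items == nil`. A small demonstration, using a hypothetical `itemsEmpty` helper and a plain `[]string` in place of the real `[]v1.Node`:

```go
package main

import "fmt"

// itemsEmpty reports whether a list result has no items. One len check is
// enough: a nil slice and an empty non-nil slice both have length 0.
func itemsEmpty(items []string) bool {
	return len(items) == 0
}

func main() {
	var nilItems []string    // nil slice, like an unset Items field
	emptyItems := []string{} // non-nil but empty

	fmt.Println(nilItems == nil, itemsEmpty(nilItems))     // true true
	fmt.Println(emptyItems == nil, itemsEmpty(emptyItems)) // false true
	fmt.Println(itemsEmpty([]string{"node-1"}))            // false
}
```

Indexing element 0 of either empty slice would panic, which is why the emptiness check must come before any access. On the earlier `len != 1` question: since the diff's list options set `Limit: 1`, the server should return at most one item, so a `len != 1` check would behave the same as `len == 0` here.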

@akhilerm (Contributor) left a comment

Changes are good. I've given a small comment on the error checking.

kmova added 2 commits April 10, 2020 12:39
Signed-off-by: kmova <kiran.mova@mayadata.io>
…path.go

Signed-off-by: kmova <kiran.mova@mayadata.io>
@vishnuitta merged commit 728a003 into openebs-archive:master on Apr 10, 2020
vishnuitta pushed a commit that referenced this pull request Apr 10, 2020
…1662) (#1664)

(cherry picked from commit 728a003)
4 participants