🐛 Machine deletion: try up to 10s to delete the Node, then move on #1452

Merged

tahsinrahman
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1446

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 26, 2019
controllers/machine_controller.go: 3 outdated, resolved review threads
@tahsinrahman tahsinrahman force-pushed the fix-delete-machine branch 3 times, most recently from 3119c2c to ac36bf3 Compare September 26, 2019 17:56
return ctrl.Result{}, err
waitErr := wait.PollImmediate(2*time.Second, 10*time.Second, func() (bool, error) {
if err = r.deleteNode(ctx, cluster, m.Status.NodeRef.Name); err != nil && !apierrors.IsNotFound(err) {
return false, nil
Contributor

Maybe we should record the error as deleteNodeErr and report that back to the user in the event instead of waitErr (I fear waitErr will be "timed out waiting for condition")?

Contributor Author

@tahsinrahman tahsinrahman Sep 26, 2019

Yes, waitErr will be "timed out waiting for condition".
Do you mean assigning deleteNodeErr = r.deleteNode..... inside the condition function, and keeping waitErr as it is?


var deleteNodeErr error
err = wait.PollImmediate(2*time.Second, 10*time.Second, func() (bool, error) {
if deleteNodeErr = r.deleteNode(ctx, cluster, m.Status.NodeRef.Name); err != nil && !apierrors.IsNotFound(err) {
Member

You are assigning to deleteNodeErr and then testing against err.

Contributor Author

thanks! fixed

}
return true, nil
})
if deleteNodeErr != nil {
Member

Won't this also trigger if apierrors.IsNotFound(deleteNodeErr) is true, giving a false error?

Contributor Author

thanks! fixed

if waitErr != nil {
// TODO: remove m.Name after #1203
r.Log.Error(deleteNodeErr, "timed out deleting Machine's node, moving on", "node", m.Status.NodeRef.Name, "machine", m.Name)
r.recorder.Eventf(m, corev1.EventTypeWarning, "FailedDeleteNode", "error deleting Machine's node: %v", deleteNodeErr)
Contributor

I assume the answer is "no" but it's worth asking: the if condition here is a non-nil waitErr. Is there a chance that waitErr is not nil, but deleteNodeErr is nil (since we're including deleteNodeErr in the event)?

Contributor Author

No. If waitErr is not nil, deleteNodeErr will definitely be non-nil.
If deleteNodeErr is nil, the conditionFunc returns true and waitErr will be nil.

@ncdc
Contributor

ncdc commented Sep 26, 2019

/lgtm
/assign @vincepri @detiber
for final review

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2019
Member

@vincepri vincepri left a comment

/lgtm

@ncdc
Contributor

ncdc commented Sep 26, 2019

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc, tahsinrahman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2019
@vincepri
Member

Whoops thanks, I assumed you approved without looking :D

@ncdc
Contributor

ncdc commented Sep 26, 2019

No, mistake on my part. I meant to /approve and then assign to you.

@k8s-ci-robot k8s-ci-robot merged commit 6fc621a into kubernetes-sigs:master Sep 26, 2019
@tahsinrahman tahsinrahman deleted the fix-delete-machine branch September 27, 2019 20:38
Development

Successfully merging this pull request may close these issues.

Machine deletion: try up to n times to delete the Node, then move on
5 participants