🐛 fix issue when webhook server refreshing cert #260

Merged: 1 commit into kubernetes-sigs:master from the fixwebhookserver branch on Jan 16, 2019

Conversation

mengqiy
Member

@mengqiy mengqiy commented Dec 18, 2018

fixes #191
fixes #192

Tests are on the way.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 18, 2018
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 18, 2018
@mengqiy mengqiy changed the title ✨ fix issue when webhook server refreshing cert [WIP] ✨ fix issue when webhook server refreshing cert Dec 18, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2018
@mengqiy
Member Author

mengqiy commented Dec 18, 2018

/cc @anfernee

@k8s-ci-robot
Contributor

@mengqiy: GitHub didn't allow me to request PR reviews from the following users: anfernee.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @anfernee

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@anfernee
Member

/lgtm

it doesn't fix #191 tho.

@k8s-ci-robot
Contributor

@anfernee: changing LGTM is restricted to assignees, and only kubernetes-sigs/controller-runtime repo collaborators may be assigned issues.

In response to this:

/lgtm

it doesn't fix #191 tho.


Contributor

@droot droot left a comment

The change looks good. I will wait for the tests before giving LGTM.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 19, 2018
@mengqiy
Member Author

mengqiy commented Dec 19, 2018

/hold
hold it until the tests are ready.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2018
@mengqiy mengqiy changed the title [WIP] ✨ fix issue when webhook server refreshing cert 🐛 fix issue when webhook server refreshing cert Dec 19, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 19, 2018
@mengqiy
Member Author

mengqiy commented Dec 19, 2018

Made some additional changes in pkg/webhook/server.go and added tests.

@mengqiy
Member Author

mengqiy commented Dec 19, 2018

it doesn't fix #191 tho.

@anfernee Explained in #191 :)

@anfernee @droot PTAL

@@ -0,0 +1,148 @@
/*
Member

Can you add a test case for when the server exceeds the cert refresh interval? You can change defaultCertRefreshInterval to a smaller interval. I am worried that if you start a server on the same port immediately after shutdown, you will have a port conflict.

Member Author

IIUC
https://github.com/kubernetes-sigs/controller-runtime/pull/260/files#diff-8b0412ec4fd52af1419bd0fcd0f2e101R121 is doing what you ask. The server restarts multiple times and rotates the certs.

go serveFn()
case <-stop:
return nil
case e := <-errCh:
return e
Member

Ideally, if you start the server here, it should just work.

Member Author

IMHO, if there is an unexpected error, we should surface the error instead of continuing to retry.

continue
}
log.Info("server is shutting down to reload the certificates.")
err = srv.Shutdown(context.Background())
shutdownHappend = true
err = s.httpServer.Shutdown(context.Background())
Member

My understanding is that Shutdown is async, meaning that the return of Shutdown doesn't necessarily mean the server is already done, whereas the return of ListenAndServeTLS guarantees it.

Member Author

"Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down."

Per https://golang.org/pkg/net/http/#Server.Shutdown, it is a graceful shutdown and a synchronous call. Did I miss anything?

Member

I confused those two methods; you are right. Please ignore it.

if err != nil {
log.Error(err, "encountering error when shutting down")
return err
}
timer = time.Tick(wait.Jitter(defaultCertRefreshInterval, 0.1))
go serveFn()
Member

This might cause a port conflict.

Member Author

I think the question here is whether shutting down the server, which unbinds the port, completes in time before the server tries to bind the same port again.
I have been searching online about this for quite a while, but I'm still not sure what the 100% correct thing to do here is :/
It seems that when the Listener is closed, the port should have been unbound.
Some tests I added in this PR rotate the cert and reload the server multiple times. I haven't seen any port conflict issues.

I'm open to suggestions if you have a better solution. If not, I will probably merge it as is :)
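
One way to probe the port-conflict concern is a small loop that serves, shuts down, and rebinds the same address repeatedly. This is only a sketch under stated assumptions: restartOnSameAddr is a hypothetical helper, it uses plain HTTP on a loopback port, and it waits for both Shutdown and Serve to return before rebinding so that the listener is certainly closed. It does not prove the absence of conflicts on every platform.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
)

// restartOnSameAddr (hypothetical helper) binds addr, serves, shuts down, and
// rebinds the same address n times. It returns the concrete address used and
// the first bind or shutdown error encountered.
func restartOnSameAddr(addr string, n int) (string, error) {
	for i := 0; i < n; i++ {
		ln, err := net.Listen("tcp", addr)
		if err != nil {
			return addr, fmt.Errorf("bind attempt %d: %w", i, err)
		}
		// Pin the port picked on the first pass so later passes reuse it.
		addr = ln.Addr().String()

		srv := &http.Server{}
		done := make(chan error, 1)
		go func() { done <- srv.Serve(ln) }()

		if err := srv.Shutdown(context.Background()); err != nil {
			return addr, fmt.Errorf("shutdown %d: %w", i, err)
		}
		// Serve closes the listener before returning, so after this receive
		// the port is no longer held by this server.
		<-done
	}
	return addr, nil
}

func main() {
	addr, err := restartOnSameAddr("127.0.0.1:0", 5)
	fmt.Println("address:", addr, "error:", err)
}
```

Waiting on the Serve goroutine removes a startup race that exists only in this toy loop, where Shutdown could run before Serve has registered the listener. In a long-running server the listener is already registered by the time a reload is triggered.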

@mengqiy mengqiy force-pushed the fixwebhookserver branch 2 times, most recently from 43a3f31 to acd82f6 Compare January 9, 2019 21:36
}

cg := &generator.SelfSignedCertGenerator{}
s.CertDir, err = ioutil.TempDir("/tmp", "controller-runtime-")
Contributor

Are we cleaning up this temp dir?

Member Author

@mengqiy
Member Author

mengqiy commented Jan 15, 2019

PTAL

@anfernee
Member

/lgtm

@k8s-ci-robot
Contributor

@anfernee: changing LGTM is restricted to assignees, and only kubernetes-sigs/controller-runtime repo collaborators may be assigned issues.

In response to this:

/lgtm


@droot droot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 16, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: mengqiy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@droot droot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2019
@k8s-ci-robot k8s-ci-robot merged commit d561f55 into kubernetes-sigs:master Jan 16, 2019
@mengqiy mengqiy deleted the fixwebhookserver branch February 19, 2019 19:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants