apiserver: Update genericapiserver to panic on listener error #42272

Merged
merged 1 commit into from
Apr 20, 2017

Conversation

marun
Contributor

@marun marun commented Feb 28, 2017

Previously runServer would try to listen again if a listener error occurred. This commit changes the response to a panic to allow a process manager (systemd/kubelet/etc) to react to the failure.

Release note:

The Kubernetes API server now exits if it encounters a networking failure (e.g. the network interface hosting its address goes away) to allow a process manager (systemd/kubelet/etc) to react to the problem. Previously the server would log the failure and try again to bind to its configured address:port.

cc: @liggitt @sttts @deads2k @derekwaynecarr

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Feb 28, 2017

err := server.Serve(listener)

stopRequested := strings.Contains(err.Error(), "use of closed network connection")
Contributor

Why not keep the old logic with the channel? Comparing error text looks fishy.

Member

Definitely don't check error text; it drifts between releases.

Contributor Author

Done.


stopRequested := strings.Contains(err.Error(), "use of closed network connection")
if stopRequested {
	glog.Info("Stop requested")
Contributor

I would make this a bit more verbose: Stop listening for network connection on ...

Contributor Author

Done.


if stopRequested {
	glog.Info("Stop requested")
} else {
	panic(fmt.Sprintf("Stopping due to error: %v", err))
Member

I still expected a HandleCrash call here.

Contributor Author

Member

There is a dispute over that TODO.

if stopRequested {
	glog.Info("Stop requested")
} else {
	panic(fmt.Sprintf("Stopping due to error: %v", err))
Contributor

As we potentially have two listeners, add the addr here.

Contributor Author

Done.

	ln.Close()
}()

go func() {
	defer utilruntime.HandleCrash()
Contributor

What happens when the goroutine panics now? Will the apiserver terminate, or will the socket stay bound with nobody listening for incoming connections?

Contributor Author

I've added this back as per @liggitt. I misread a comment on HandleCrash that suggested it might no longer be necessary.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 28, 2017
@marun marun force-pushed the apiserver-fail-fast branch from e0cbf12 to 0da4be0 Compare February 28, 2017 19:14
@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Feb 28, 2017
@sttts
Contributor

sttts commented Mar 1, 2017

@k8s-bot test this

@sttts
Contributor

sttts commented Mar 1, 2017

lgtm

@sttts
Contributor

sttts commented Mar 1, 2017

/approve
/lgtm

@k8s-ci-robot
Contributor

@sttts: you can't LGTM a PR unless you are an assignee.

In response to this comment:

/approve
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sttts sttts self-assigned this Mar 1, 2017
@sttts
Contributor

sttts commented Mar 1, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2017
@liggitt liggitt added this to the v1.7 milestone Mar 1, 2017
@marun
Contributor Author

marun commented Mar 2, 2017

This PR is also useful for #42224 (Make bind port zero work with kube-apiserver) since binding more than once would complicate use of an ephemeral port.
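To illustrate why re-binding interacts badly with an ephemeral port: when a listener is opened on port 0, the operating system picks the port, and the caller has to read it back from the listener. The sketch below uses only the standard library; `listenEphemeral` is a hypothetical helper and is not taken from #42224.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral is a hypothetical helper: it binds to port 0 on loopback
// and returns the listener together with the port the OS actually assigned.
func listenEphemeral() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	addr, ok := ln.Addr().(*net.TCPAddr)
	if !ok {
		ln.Close()
		return nil, 0, fmt.Errorf("unexpected address type %T", ln.Addr())
	}
	return ln, addr.Port, nil
}
```

Each call may be assigned a different port, so a server that bound again after a listener error could end up on a port that differs from the one already reported to clients. Binding exactly once and failing fast avoids that ambiguity.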

@k8s-reviewable

This change is Reviewable

@marun
Contributor Author

marun commented Mar 13, 2017

@k8s-bot cvm gce e2e test this
@k8s-bot non-cri node e2e test this

@marun
Contributor Author

marun commented Mar 30, 2017

@k8s-bot kops aws e2e test this

@marun marun force-pushed the apiserver-fail-fast branch from 0da4be0 to 462b6aa Compare April 6, 2017 18:45
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2017
@marun
Contributor Author

marun commented Apr 6, 2017

rebased

Previously runServer would try to listen again if a listener error
occurred.  This commit changes the response to a panic to allow a
process manager (systemd/kubelet/etc) to react to the failure.
@marun marun force-pushed the apiserver-fail-fast branch from 462b6aa to 30fb3be Compare April 18, 2017 22:49
@marun
Contributor Author

marun commented Apr 18, 2017

rebased

@marun
Contributor Author

marun commented Apr 19, 2017

@sttts tests are passing again!

@marun marun added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 19, 2017
@sttts
Contributor

sttts commented Apr 20, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 20, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marun, sttts

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit afc01d9 into kubernetes:master Apr 20, 2017
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
7 participants