
KEP for adding initializationFailureThreshold to health probes #860

Merged
merged 1 commit into kubernetes:master on Apr 10, 2019

Conversation

matthyx
Contributor

@matthyx matthyx commented Feb 28, 2019

/assign @kubernetes/sig-node-proposals

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/design Categorizes issue or PR as related to design. labels Feb 28, 2019
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2019
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 28, 2019
@matthyx
Contributor Author

matthyx commented Mar 5, 2019

/assign @RobertKrawitz

@k8s-ci-robot
Contributor

@matthyx: GitHub didn't allow me to assign the following users: RobertKrawitz.

Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @RobertKrawitz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@matthyx
Contributor Author

matthyx commented Mar 5, 2019

/assign @derekwaynecarr
@derekwaynecarr, can you merge this KEP now that I have the green light from sig-node and @RobertKrawitz as reviewer?

@matthyx
Contributor Author

matthyx commented Mar 5, 2019

The target release is 1.15


### Goals

Add an elegant and self-explanatory solution to running slow starting containers in Kubernetes with health probes enabled.
Contributor

Wording: something like

Allow slow starting containers to run safely during startup with health probes enabled.

"Elegant" and "self-explanatory" are opinion; the goal should be objective.

Contributor

I think you also need to answer comments from @thockin @liggitt @yujuhong on kubernetes/kubernetes#71449 and update this based on your decisions on that.


Add an elegant and self-explanatory solution to running slow starting containers in Kubernetes with health probes enabled.

By using `maxInitialFailureCount` * `periodSeconds` > **T** you have the following benefits:
Contributor

This belongs under the proposal. The proposal is specifically for maxInitialFailureCount, so start by noting that, then describe how it works. The description of the mechanism needs to be more specific: failures up to a point (maxInitialFailureCount) are ignored until the container has successfully started. Once a probe has ever succeeded, the container is considered to have been initialized, and the existing criteria are used to determine failure.

For example, if periodSeconds is 10, maxInitialFailureCount is 20, and failureThreshold is 3:

  • The kubelet will allow the container 200 seconds to start (20 probes, spaced 10 seconds apart).
  • If a probe succeeds at any time during that interval, the container is considered to have started, and failureThreshold is used thereafter.

For example:

  • The container fails 20 probes at startup. It is considered to have failed, and is terminated.
  • The container fails 10 probes at startup, succeeds once, and fails 3 more probes. The container is considered to have started at 100 seconds, and even though it hits the failure condition of 3 consecutive failures 140 seconds into its lifetime (less than the 200 seconds permitted by maxInitialFailureCount), it is considered to have failed and is terminated.
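For readers skimming the thread, here is a minimal Go sketch of the decision logic this comment describes. The `probeWorker` type, its field names, and the `observe` helper are illustrative assumptions, not the actual kubelet implementation.

```go
package main

import "fmt"

// probeWorker is a hypothetical sketch of the per-container probe state
// described above; the names are illustrative only.
type probeWorker struct {
	failureThreshold       int  // existing: consecutive failures tolerated once started
	maxInitialFailureCount int  // proposed: failures tolerated before the first success
	hasInitialized         bool // set once any probe has succeeded
	consecutiveFailures    int
}

// observe records one probe result and reports whether the container
// should be killed under the semantics described in the comment above.
func (w *probeWorker) observe(success bool) (kill bool) {
	if success {
		w.hasInitialized = true
		w.consecutiveFailures = 0
		return false
	}
	w.consecutiveFailures++
	if w.hasInitialized {
		// Once a probe has ever succeeded, the existing criterion applies.
		return w.consecutiveFailures >= w.failureThreshold
	}
	// Before the first success, only the larger initial budget applies.
	return w.consecutiveFailures >= w.maxInitialFailureCount
}

func main() {
	// periodSeconds=10, maxInitialFailureCount=20, failureThreshold=3:
	// 10 startup failures, one success, then 3 failures.
	w := &probeWorker{failureThreshold: 3, maxInitialFailureCount: 20}
	killed := false
	for i := 0; i < 10; i++ {
		killed = w.observe(false)
	}
	w.observe(true) // first success: the container has initialized
	for i := 0; i < 3 && !killed; i++ {
		killed = w.observe(false)
	}
	fmt.Println("killed:", killed) // true: failureThreshold applies after init
}
```

Running it reproduces the second example above: the container is considered failed on the 14th probe, roughly 140 seconds into its lifetime.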

Contributor

Also, what happens if maxInitialFailureCount is set to less than failureThreshold (which needs to be addressed in the code as well)?

Contributor Author

I am going to add clear numerical examples...


If the `kubelet` is restarted and `maxInitialFailureCount` is set very high, the effect would be the same as previously having a high `failureThreshold`, potentially making the container downtime much longer if a deadlock occurs during initialization.

Documentation must clearly state that this feature should only be used in particular cases.
Contributor

This isn't a risk. It's either something that needs to be done, in which case it's a goal, or something you won't do, in which case it's a non-goal.

There is potentially a risk involved in someone setting an extremely high threshold, and therefore stuck pods not getting killed in a timely manner, but that risk exists today.

Contributor Author

I probably need to rewrite the issue on stateless kubelet
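A minimal sketch of the limitation being discussed, assuming (purely for illustration) that the proposed `hasInitialized` flag lives only in the probe worker's memory; the KEP does not prescribe how or whether it is stored:

```go
package main

import "fmt"

// Hypothetical sketch only: the proposed startup flag kept purely in the
// probe worker's memory. Nothing is persisted, so a kubelet restart
// recreates the worker with the zero value and the startup grace period
// effectively restarts.
type worker struct {
	hasInitialized bool
}

func main() {
	w := &worker{} // state right after a kubelet restart
	fmt.Println("treated as uninitialized again:", !w.hasInitialized)
	// Worst case: a container that deadlocks after the restart can survive
	// up to periodSeconds * maxInitialFailureCount before being killed,
	// the same exposure as an equally high failureThreshold today.
}
```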

@matthyx
Contributor Author

matthyx commented Mar 6, 2019

@RobertKrawitz sorry, I am in the middle of editing... I have just marked the points already done as resolved to keep track of the things I have to finish later. (I am doing this during my lunch break and coffee breaks)

@matthyx
Contributor Author

matthyx commented Mar 6, 2019

@RobertKrawitz you can review again whenever you have time, thanks for your patience!


## Summary

Slow starting containers are difficult to address with the current status of health probes: they are either killed before being up, or could be left dead-locked for a very long time before being killed.
Contributor

Nit: dead-locked => deadlocked


## Motivation

The main problem with slow starting containers is that you have to leave them enough time to start before having `livenessProbe` fail `failureThreshold` times (otherwise `kubelet` kills them before they have a chance to be up).
Contributor

Nit: again, don't write in the second person.

More substantively, I'm not quite sure what the "problem" is here from the way it's written. I can see a few problems, but you need to be explicit about what you're trying to solve here.

## Motivation

The main problem with slow starting containers is that you have to leave them enough time to start before having `livenessProbe` fail `failureThreshold` times (otherwise `kubelet` kills them before they have a chance to be up).
This is even worse when load on the nodes can increase this long startup time, since you have to take into account the worst-case scenario (let's call it **T**).
Contributor

This proposal doesn't address that, and it's hard for me to see how it could without much deeper changes (the pod creator cannot know what the worst case scenario is). That's OK; this does help, but I would like you to add as an explicit non-goal that this proposal does not attempt to take into account system load; it simply allows a longer grace period at the start.


You can handle that today with various strategies:

- Delay the initial `livenessProbe` sufficiently to permit the container to start up (set `initialDelaySeconds` greater than **T**). While this ensures that the `livenessProbe` will not fail during the startup period, it also prevents the container from being killed early in case of a deadlock. Also, since the `livenessProbe` isn't run during that delay, there is no feedback loop on the container state.
Contributor

How does setting initialDelaySeconds prevent the container from being killed early if it deadlocks any more than this proposal does? Is the concern about the container coming up (so that a liveness probe would see it succeed) and only after that deadlock? If that's what you mean, say so. If it's something else, please state that as well.

You can handle that today with various strategies:

- Delay the initial `livenessProbe` sufficiently to permit the container to start up (set `initialDelaySeconds` greater than **T**). While this ensures that the `livenessProbe` will not fail during the startup period, it also prevents the container from being killed early in case of a deadlock. Also, since the `livenessProbe` isn't run during that delay, there is no feedback loop on the container state.
- Increase the allowed number of `livenessProbe` failures until `kubelet` kills the container (set `failureThreshold` so that `failureThreshold` times `periodSeconds` is greater than **T**). While this gives enough time for the container to start up and allows a feedback loop, it prevents the container from being killed in a timely manner in case of a deadlock.
Contributor

Killed in a timely manner if it deadlocks or otherwise hangs after it has initially successfully come up.
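To make the trade-off concrete, here is a small Go sketch of the arithmetic behind the second workaround; the startup time and probe period below are assumed values, not figures from the KEP:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Illustrative numbers only; they are not taken from the KEP.
	worstCaseStartupSeconds := 300.0 // T: worst-case startup time
	periodSeconds := 10.0

	// Workaround: raise failureThreshold so that
	// failureThreshold * periodSeconds > T.
	failureThreshold := int(math.Ceil(worstCaseStartupSeconds/periodSeconds)) + 1

	// Cost: after the container has come up, a post-startup deadlock is
	// only detected after the same long window.
	worstCaseDetection := float64(failureThreshold) * periodSeconds

	fmt.Printf("failureThreshold=%d -> deadlock detected after up to %.0fs\n",
		failureThreshold, worstCaseDetection)
}
```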

- The container fails 10 probes at startup, starts successfully, and after a long time fails 3 probes. The container is considered to have failed and is terminated after 30 seconds of downtime.
- The container fails 10 probes at startup, succeeds once, and fails 3 more probes. The container is considered to have started at 100 seconds, and even though it is still within the first 200 seconds of its lifetime covered by `maxInitialFailureCount`, it is considered to have failed because of `failureThreshold`, since it had initially started successfully; it is therefore terminated after 30 seconds of downtime.

You can see the implementation in PR [#71449].
Contributor

"This is implemented in PR..." -- again, avoid the second person.

- The container is running: `hasInitialized` is reverted to false until the next probe, which will succeed and reset it to true immediately afterwards.
- The container is dead-locked: `hasInitialized` is reverted to false, which means the container will have a maximum downtime of `periodSeconds` times `maxInitialFailureCount`. This is also the case today with a high `failureThreshold`.

#### maxInitialFailureCount is set smaller than failureThreshold
Contributor

This isn't a risk. This is a case that you need to decide how you want to handle, and state it in the description.

Contributor

Take this out of the risks and add whatever you decide to do to the description.
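Purely as an illustration of one way this open question could be resolved (the KEP has not decided it, and the rule below is hypothetical), the combination could be rejected at validation time:

```go
package main

import (
	"errors"
	"fmt"
)

// validateProbe sketches one possible answer to the open question above:
// reject specs whose initial budget is smaller than the regular threshold.
// Purely illustrative; the KEP has not decided how this case is handled.
func validateProbe(maxInitialFailureCount, failureThreshold int) error {
	if maxInitialFailureCount != 0 && maxInitialFailureCount < failureThreshold {
		return errors.New("maxInitialFailureCount must be >= failureThreshold")
	}
	return nil
}

func main() {
	fmt.Println(validateProbe(2, 3))  // rejected under this hypothetical rule
	fmt.Println(validateProbe(20, 3)) // accepted
}
```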

The following test cases can be covered by calling the `fakeExecProber` a number of times to verify:

- the container is killed after `maxInitialFailureCount` if it has never initialized (emulated by always calling `fakeExecProber{probe.Failure, nil}`)
- the container is killed after `failureThreshold` once it has initialized (emulated by calling `fakeExecProber{probe.Success, nil}` once) with total probes > `maxInitialFailureCount`
Contributor

I don't know that you need to test this; what's important in this regard is simply that it fails after failureThreshold failures after it has initialized successfully. But the case of it failing with the total number of probes < maxInitialFailureCount is simply a more interesting corner case.

Contributor Author

I think this is important to check for regressions concerning the hasInitialized boolean. If it is not properly set after the first probe success, the container will be killed when the total number of probes surpasses maxInitialFailureCount, regardless of failureThreshold.
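A hedged sketch of how these cases could be exercised, using a plain boolean sequence as a stand-in for `fakeExecProber` results; the helper names and the `simulate` function are illustrative, not the actual kubelet test code:

```go
package main

import "fmt"

// simulate runs a sequence of probe results through the decision logic and
// reports after how many probes the container is killed (0 = never killed).
func simulate(results []bool, maxInitialFailureCount, failureThreshold int) int {
	hasInitialized := false
	consecutiveFailures := 0
	for i, success := range results {
		if success {
			hasInitialized = true
			consecutiveFailures = 0
			continue
		}
		consecutiveFailures++
		limit := maxInitialFailureCount
		if hasInitialized {
			limit = failureThreshold
		}
		if consecutiveFailures >= limit {
			return i + 1
		}
	}
	return 0
}

func repeat(v bool, n int) []bool {
	out := make([]bool, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	// Case 1: never initialized -> killed once the initial budget is spent.
	never := repeat(false, 25)
	fmt.Println("never initialized, killed at probe", simulate(never, 20, 3)) // 20

	// Case 2: initialized after 10 failures -> killed 3 failures later.
	once := append(append(repeat(false, 10), true), repeat(false, 3)...)
	fmt.Println("initialized once, killed at probe", simulate(once, 20, 3)) // 14
}
```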

- the container is killed after `failureThreshold` once it has initialized (emulated by calling `fakeExecProber{probe.Success, nil}` once) with total probes > `maxInitialFailureCount`
- the container is killed after `failureThreshold` once it has initialized (emulated by calling `fakeExecProber{probe.Success, nil}` once) with total probes < `maxInitialFailureCount`

## Implementation History
Contributor

You've also implemented (and tested?) code in the PR, correct? You should note the prototype that is concurrently under review.

Contributor Author

Yes, it works well in my case - this is why I am pushing for this feature.


### Risks and Mitigations

#### Stateless kubelet
Contributor

This is really a limitation rather than a risk; you don't have a way at present of persisting hasInitialized. But OK, it's reasonable enough to put it here.

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Mar 7, 2019
@matthyx matthyx force-pushed the master branch 2 times, most recently from 9e13b9d to 568f257 on March 7, 2019 10:38
@matthyx
Contributor Author

matthyx commented Mar 7, 2019

@RobertKrawitz I have edited the KEP to reflect your excellent review, thanks!

- Delay the initial `livenessProbe` sufficiently to permit the container to start up (set `initialDelaySeconds` greater than **startup time**). While this ensures no `livenessProbe` will run and fail during the startup period (triggering a kill), it also delays deadlock detection if the container starts faster than `initialDelaySeconds`. Also, since the `livenessProbe` isn't run at all during startup, there is no feedback loop on the actual startup time of the container.
- Increase the allowed number of `livenessProbe` failures until `kubelet` kills the container (set `failureThreshold` so that `failureThreshold` times `periodSeconds` is greater than **startup time**). While this gives enough time for the container to start up and allows a feedback loop, it prevents the container from being killed in a timely manner if it deadlocks or otherwise hangs after it has initially successfully come up.

However, none of these strategies give an acceptable protection against deadlocks, which is the primary use case of `livenessProbe`.
Contributor

This proposal allows faster detection of deadlocks or otherwise unresponsive pods after they have started up, which is a good thing, but it doesn't actually protect against deadlocks.


- long data initialization: only the first startup takes a lot of time
- heavy workload: every startup takes a lot of time
- underpowered/overloaded node: startup times depend on external factors
Contributor

This is really a different situation; this problem is extrinsic to the pod.

Contributor Author

True, but the result at the pod level is also a delayed startup... let's remove it for now, and I might come back to the subject in a new proposal.

Contributor

This is a genuine cause of slow startup. List it, but note that this proposal is not intended to solve that problem.


### Non-Goals

- Make probe parameters (`initialDelaySeconds`, `failureThreshold`, `periodSeconds`, ...) auto-tunable depending on node performance (only provide a way to allow a longer grace period at the start of the container).
Contributor

More broadly, this does not address the issue of pod load affecting startup (or any other liveness probe that may be delayed due to load). This proposal is strictly at the pod level, not the node level.


@@ -74,7 +74,7 @@ There are various strategies to handle this situation with the current API:
- Delay the initial `livenessProbe` sufficiently to permit the container to start up (set `initialDelaySeconds` greater than **startup time**). While this ensures no `livenessProbe` will run and fail during the startup period (triggering a kill), it also delays deadlock detection if the container starts faster than `initialDelaySeconds`. Also, since the `livenessProbe` isn't run at all during startup, there is no feedback loop on the actual startup time of the container.
- Increase the allowed number of `livenessProbe` failures until `kubelet` kills the container (set `failureThreshold` so that `failureThreshold` times `periodSeconds` is greater than **startup time**). While this gives enough time for the container to start up and allows a feedback loop, it prevents the container from being killed in a timely manner if it deadlocks or otherwise hangs after it has initially successfully come up.

However, none of these strategies give an acceptable protection against deadlocks, which is the primary use case of `livenessProbe`.
However, none of these strategies provide a timely answer to slow starting containers stuck in a deadlock, which in theory is the primary reason for setting up a `livenessProbe`.
Contributor

No need for "in theory".

@RobertKrawitz
Contributor

/lgtm
This looks good to me at this point.

@k8s-ci-robot
Contributor

@RobertKrawitz: changing LGTM is restricted to assignees, and only kubernetes/enhancements repo collaborators may be assigned issues.

In response to this:

/lgtm
This looks good to me at this point.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@matthyx
Contributor Author

matthyx commented Mar 8, 2019

@derekwaynecarr can you review it as well?

@matthyx
Contributor Author

matthyx commented Mar 8, 2019

@RobertKrawitz after working again on the implementation, I wonder if it wouldn't make more sense to rename the new parameter to InitializationFailureThreshold, to make its relationship with FailureThreshold clearer. What do you think?

@RobertKrawitz
Contributor

I agree, that will be more obvious.

@matthyx
Contributor Author

matthyx commented Mar 8, 2019

I agree, that will be more obvious.

done

@matthyx matthyx changed the title KEP for adding maxInitialFailureCount to health probes KEP for adding initializationFailureThreshold to health probes Mar 8, 2019
@matthyx
Contributor Author

matthyx commented Mar 11, 2019

Can we try to merge it in order to launch the review process initiated by an issue in enhancements?

@RobertKrawitz
Contributor

cc @derekwaynecarr @dchen1107: unless more review is needed, this probably should get merged.

@RobertKrawitz
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 11, 2019
@RobertKrawitz
Contributor

/assign @derekwaynecarr

@matthyx
Contributor Author

matthyx commented Mar 21, 2019

@derekwaynecarr @dchen1107 now that code thaw for 1.14 has passed, can we move on to this KEP?

Member

@derekwaynecarr derekwaynecarr left a comment

I think this is fine for merging as provisional.

Before it transitions to implementable, we should make clear the graduation criteria for the feature. I assume this needs to be mapped to feature gates, and we should have some documentation on how we handle apiserver and node version skew.


### Risks and Mitigations

#### Stateless kubelet
Member

the kep should include a section on graduation criteria?

i assume this new field would have to go through the normal process of feature gates.

Contributor Author

There is already an API review here: https://github.com/orgs/kubernetes/projects/13#card-19727773
I am quite new to the KEP process; do you mean something else?

@matthyx
Contributor Author

matthyx commented Apr 9, 2019

Alright, now I see what you mean by graduation criteria... well if you look at the checklist, we have other points to validate first.
Maybe we could merge first, then look at the points in order?
First step would be to open an issue in release milestones, but I need the merge first.

@derekwaynecarr
Member

I am fine w/ merging and iterating in follow-on PRs adding missing items before it goes implementable.

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, matthyx, RobertKrawitz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2019
@k8s-ci-robot k8s-ci-robot merged commit 62727f3 into kubernetes:master Apr 10, 2019