Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-753: Update documentation to clarify restartPolicy behavior #4880

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

gjkim42
Copy link
Member

@gjkim42 gjkim42 commented Sep 29, 2024

  • One-line PR description:

The implementation is different from the current documentation.
This updates the documentation to clarify the relationship between the container-level restartPolicy and the pod-level restartPolicy.

We may introduce a new concept (or policy) for the restart behavior of the initialization.

/cc @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 29, 2024
@pacoxu
Copy link
Member

pacoxu commented Sep 30, 2024

/lgtm
if we do not want to change the behavior, we should sync the current behavior to the KEP.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 30, 2024
Once sidecar container is started (`postStart` completed and startup probe
succeeded), this containers will be restarted even when the Pod `restartPolicy`
is `Never` or `OnFailure`.
The `restartPolicy` field for individual init containers can override the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sidecar container failed during startup will NOT be restarted

failed meant:

  • PostStart failed
  • Image pull failed
  • CRI failed

In all these cases the implemented behavior is that we will restart the sidecar container even if the Pod restartPolicy is Never.

If one want to express the semantic of "fail if the sidecar cannot start" - it can be implemented by moving this logic into the Init container executed just before the sidecar. Like if certificates are needed, but we cannot download them.

@thockin any objections with this behavior?

cc: @kubernetes/api-reviewers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could argue that "CRI failed" and "Image pull failed" are not failures of the container, but failures to START the container, and so retrying is not a "restart" at all.

Restarting when PostStart fails is kind of dubious, though. At that point, the container HAS started. If the policy is "Never", it seems like we should not restart. Tell me why I am wrong?

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restarting when PostStart fails is kind of dubious, though. At that point, the container HAS started. If the policy is "Never", it seems like we should not restart. Tell me why I am wrong?

We are talking about sidecar containers. They will restart if failed even in Pod with restartPolicy=Never. So the only question is whether we should allow sidecars (which has started already, just executing the PostStart) to be able to "fail" the Pod if PostStart has failed. Originally we thought it may be useful to be able to express this. But now I am not that confident. It feels a bit strange to have different restart behavior if code failed during PostStart or PostStart itself has failed. This is why I am asking for more opinions

Copy link
Member

@thockin thockin Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was thinking it was if the container had restart policy "Never". Yes, I agree with you. They can't fail the pod that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for confirming.

/lgtm
/approve

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign @mrunalp

for actual approval

Copy link
Contributor

@mrunalp mrunalp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjkim42, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 10703e0 into kubernetes:master Oct 10, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 10, 2024
@SergeyKanzhelev SergeyKanzhelev mentioned this pull request Oct 14, 2024
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

6 participants