Request PRR approval for two KEPs: "In-place Pod Vertical Scaling" and "Kubelet CRI extension for In-Place Pod Vertical Scaling" #2474

Merged
merged 5 commits into from
May 13, 2021

vinaykul
Member

@vinaykul vinaykul commented Feb 9, 2021

Per the updated process for including a KEP in a release milestone, a Production Readiness Review (PRR) section is required. This PR adds it, with answers to the items required for an alpha target.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 9, 2021
@k8s-ci-robot
Contributor

Hi @vinaykul. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 9, 2021
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 9, 2021
@vinaykul
Member Author

vinaykul commented Feb 9, 2021

/assign @derekwaynecarr
/assign @dchen1107

@dims
Member

dims commented Feb 9, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 9, 2021
@kikisdeliveryservice
Member

Related-to: #2273

@vinaykul
Member Author

vinaykul commented May 4, 2021

Tracking issue for this enhancement is #1287

Member

@ehashman ehashman left a comment

@vinayakankugoyal you will need to finish filling out your kep.yaml and set a PRR approver:

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha|beta|stable
# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.19"
# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.19"
  beta: "v1.20"
  stable: "v1.22"

@vinayakankugoyal
Contributor

Hi @ehashman - I have no context about this KEP. Was I added by mistake?

@vinaykul
Member Author

vinaykul commented May 4, 2021

@vinayakankugoyal Yes, Elana was responding to me. Please ignore.

@vinaykul vinaykul changed the title Add PRR section to Kubelet CRI extension KEP Request PRR approval for two KEPs: "In-place Pod Vertical Scaling" and "Kubelet CRI extension for In-Place Pod Vertical Scaling" May 5, 2021
@vinaykul
Member Author

vinaykul commented May 5, 2021

@ehashman @dchen1107 @derekwaynecarr Please review PRR section for the two KEPs related to this feature. Thanks,

@ehashman
Member

ehashman commented May 5, 2021

sorry @vinayakankugoyal! Different Vinay K! darn autocomplete 🤐

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2021
Member

@ehashman ehashman left a comment

/assign

Please rebase and I'll get to these ASAP.

@vinaykul
Member Author

vinaykul commented May 6, 2021

/test pull-enhancements-verify

@vinaykul
Member Author

vinaykul commented May 6, 2021

/assign

Please rebase and I'll get to these ASAP.

Done.

@vinaykul
Member Author

vinaykul commented May 7, 2021

PRR review comments addressed in above commit.

@ehashman
Member

ehashman commented May 7, 2021

Hi @vinaykul, this looks almost ready to go. It is still missing sections for Upgrade/Downgrade Strategy and Version Skew Strategy per my comment above. Please complete these: https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template#upgrade--downgrade-strategy

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 7, 2021
### Version Skew Strategy
Kubelet and the CRI runtime versions are expected to match, so we don't have to worry about skew between them.
Previous versions of clients that are unaware of the new ResizePolicy fields would set them
to nil. API server mutates such updates by copying non-nil values from old Pod to current Pod
Member

What happens when the API server supports the feature but the kubelets do not? Keep in mind that kubelets can be up to n-2 versions behind the API server.

Member Author

In that case the feature would not work as expected. Enabling the feature gate allows the apiserver to accept a PATCH to the PodSpec.Containers[].Resources field, but the kubelet will interpret it as a change to the container definition and restart the container. Worse, on restart it would apply the new resources spec, and thus may oversubscribe the node.

I'm not sure if there's a good way to mitigate this without ugly stop-gap apiserver hacks to check node version. Is stretching out alpha to n+2 a reasonable approach?
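
The skew window in question can be sketched as a small check. This is a minimal Go sketch, not code from the KEP or from Kubernetes: `atRisk`, `supportedSkew`, and the version numbers are hypothetical, and it assumes only the n-2 skew figure quoted in the review comment above.

```go
package main

import "fmt"

// supportedSkew is the maximum number of minor versions a kubelet may lag
// behind the API server (the n-2 figure quoted in the review comment).
const supportedSkew = 2

// atRisk reports whether a kubelet at minor version kubeletMinor is permitted
// to run against an API server at apiMinor (it is within the supported skew)
// while being older than featureMinor, the hypothetical minor release in which
// the kubelet first understands in-place resize. Such a kubelet would restart
// the container instead of resizing it.
func atRisk(apiMinor, kubeletMinor, featureMinor int) bool {
	withinSkew := kubeletMinor <= apiMinor && apiMinor-kubeletMinor <= supportedSkew
	return withinSkew && kubeletMinor < featureMinor
}

func main() {
	// Illustrative values only: API server at 1.22, feature first in 1.22.
	const apiMinor, featureMinor = 22, 22
	for k := apiMinor; k >= apiMinor-3; k-- {
		fmt.Printf("kubelet 1.%d at risk: %v\n", k, atRisk(apiMinor, k, featureMinor))
	}
}
```

With these illustrative inputs, the kubelets one and two minors behind the API server are the ones exposed to the restart behavior. A kubelet three minors behind is outside the supported skew and is not considered.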

Member Author

@ehashman I've updated the KEPs to target 1.24 for beta. This way, the above issue with n-2 kubelet versions will not surprise anyone, since the feature gate is off by default in alpha.

Member

This is why we ask these questions, to catch them before we've implemented it without a strategy :)

I don't think we should delay beta, but this would delay GA, since we'd need to ensure that the feature gate was on in kubelets before removing it in the API server.
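
The GA constraint above reduces to simple arithmetic. The following is a rough Go sketch under stated assumptions, not part of the KEP: `earliestGateRemoval` and `oldestSupportedKubelet` are hypothetical helpers, the n-2 skew comes from the earlier review comment, and 1.24 is used only because it is the beta target mentioned in this thread.

```go
package main

import "fmt"

// supportedSkew is the maximum number of minor versions a kubelet may lag
// behind the API server (the n-2 figure from the review comment).
const supportedSkew = 2

// oldestSupportedKubelet returns the oldest kubelet minor version that may
// run against an API server at apiMinor.
func oldestSupportedKubelet(apiMinor int) int {
	return apiMinor - supportedSkew
}

// earliestGateRemoval returns the first API server minor release whose oldest
// supported kubelet already has the gate on by default, given that the gate
// first defaults to on in defaultOnMinor.
func earliestGateRemoval(defaultOnMinor int) int {
	return defaultOnMinor + supportedSkew
}

func main() {
	// Assumption for illustration: the gate defaults to on in 1.24.
	const defaultOnMinor = 24
	removal := earliestGateRemoval(defaultOnMinor)
	fmt.Printf("gate on by default in 1.%d\n", defaultOnMinor)
	fmt.Printf("earliest API server release to drop the gate: 1.%d\n", removal)
	fmt.Printf("oldest kubelet it must tolerate: 1.%d\n", oldestSupportedKubelet(removal))
}
```

The point of the sketch is only that the removal release trails the default-on release by the width of the skew window.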

Member Author

Good catch indeed! I believe we turn the feature gate on in beta. So, is a surprise such as this OK in beta?

Member

It shouldn't be a surprise, but rather, a documented failure mode.

@ehashman
Member

Please update the writeup for version skew strategy documenting possible failure modes, and this is good to go from a PRR perspective.

@vinaykul
Member Author

Updated the Version Skew Strategy section to capture the above issue. I noticed some features are in beta with the gate defaulting to false. We are good if we default the gate to true in beta at n+2.

Member

@ehashman ehashman left a comment

/approve

I think PRR is good to go for alpha.

@derekwaynecarr @dchen1107 can you please give this a final approval?

@dchen1107
Member

I asked @Random-Liu for a last-minute review of the CRI-related extension last Tuesday. In our offline discussion, he didn't have much concern, since both containerd and runc already do checkpointing. Both of us can imagine, though, that some state reconciliation will be needed during containerd restart to avoid a potential race condition. We agreed those details can be addressed during the implementation phase. I am approving this KEP.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, ehashman, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit 68fc272 into kubernetes:master May 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 13, 2021
allocations will see values that may not represent actual allocations. As a
mitigation, this change needs to be documented and highlighted in the
release notes, and in top-level Kubernetes documents.
1. Resizing memory lower: Lowering cgroup memory limits may not work as pages
Member

Memory QoS may help on cgroup v2, depending on the threshold we apply for memory.high.

see: #2571

@derekwaynecarr
Member

Agree with @dchen1107; I see no major issues with proceeding.

@carlisia

Hello @vinaykul 👋, 1.22 Docs Shadow here.
This enhancement is marked as ‘Needs Docs’ for the 1.22 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri July 9, 11:59 PM PDT.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you! 🙏

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
9 participants