
feat: support in-place update for pod #7000

Merged
merged 122 commits into from
Apr 10, 2024

free6om
Contributor

@free6om free6om commented Apr 9, 2024

Background

In StatefulSet, Deployment, and DaemonSet, any change to a PodTemplate field results in Pod recreation, which is not ideal for applications with high availability requirements, such as databases.
In Kubernetes versions below 1.27, the Pod API supports in-place updates for certain fields. Starting from version 1.27, the Resources field also supports in-place updates.
Feedback from the community and from our customers shows strong demand for in-place updates. Several open-source or self-developed projects have emerged to meet that demand, such as OpenKruise, the self-developed Resources In-Place Update by Kuaishou, and our own solution in the KubeBlocks commercial version.
The goal of this solution is to enable RSM to support the in-place update fields supported by the Pod API. Specifically:

  • Starting from Kubernetes version 1.20, support in-place updates for labels, annotations, spec.containers[*].image, spec.initContainers[*].image, spec.activeDeadlineSeconds, and spec.tolerations.
  • Starting from Kubernetes version 1.27 (with InPlacePodVerticalScaling enabled), support in-place updates for cpu and memory in spec.containers[*].resources.
  • Support the IgnorePodVerticalScaling feature gate to adapt to self-developed solutions like InPlacePodVerticalScaling by Kuaishou and KubeBlocks commercial.
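
The version gating in the list above can be sketched roughly as follows. This is a minimal illustration, not code from this PR: the function `inPlaceFields` and its parameters are hypothetical, and the field names are simply the ones listed above.

```go
package main

import "fmt"

// inPlaceFields returns the Pod fields that are assumed to be updatable in
// place for a Kubernetes 1.<minor> cluster. It is a hypothetical helper that
// only encodes the list from the PR description.
func inPlaceFields(minor int, verticalScalingEnabled bool) []string {
	var fields []string
	if minor >= 20 {
		fields = append(fields,
			"metadata.labels",
			"metadata.annotations",
			"spec.containers[*].image",
			"spec.initContainers[*].image",
			"spec.activeDeadlineSeconds",
			"spec.tolerations",
		)
	}
	// cpu and memory are only added when the cluster is new enough and the
	// InPlacePodVerticalScaling feature gate is reported as enabled.
	if minor >= 27 && verticalScalingEnabled {
		fields = append(fields, "spec.containers[*].resources (cpu, memory)")
	}
	return fields
}

func main() {
	for _, f := range inPlaceFields(27, true) {
		fmt.Println(f)
	}
}
```

How the minor version and the feature gate state are obtained from the cluster is left out here on purpose.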

Implementation

API

This solution does not involve any API changes.

RSM

  • RSM adds the ability to get the Kubernetes version to determine which fields support in-place updates.
  • RSM updates the pod template revision generation algorithm to filter out the fields that support in-place updates, i.e., the fields that support in-place updates are not included in the revision calculation.
  • RSM supports the IgnorePodVerticalScaling switch. When enabled, if there are changes in cpu and memory in Resources, RSM will ignore them.
    The drawback of this design is that consistency between the RSM Spec and the Pod Spec cannot be guaranteed. When a Pod is subsequently recreated, its Resources are set back to the old values. The user has to resolve this, for example by updating both the RSM Spec and the Pod Spec at the same time.
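
To illustrate the revision idea described above, here is a minimal sketch in which fields that can be updated in place are cleared before the template is hashed. The types `podTemplate` and `container` and the function `revision` are simplified stand-ins invented for this example; they are not the types or the hashing scheme used in the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// container and podTemplate are simplified, hypothetical stand-ins for the
// Kubernetes types; only a few fields are modeled.
type container struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
}

type podTemplate struct {
	Labels     map[string]string `json:"labels,omitempty"`
	Containers []container       `json:"containers"`
}

// revision hashes the template after dropping the fields that are assumed to
// support in-place updates (labels and container images in this sketch).
func revision(t podTemplate) (string, error) {
	filtered := podTemplate{Containers: make([]container, len(t.Containers))}
	for i, c := range t.Containers {
		c.Image = "" // c is a copy, so the caller's template is not modified
		filtered.Containers[i] = c
	}
	data, err := json.Marshal(filtered)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:8]), nil
}

func main() {
	t := podTemplate{
		Labels:     map[string]string{"app": "db"},
		Containers: []container{{Name: "db", Image: "db:1.0"}},
	}
	before, _ := revision(t)
	t.Containers[0].Image = "db:1.1" // an in-place updatable change
	after, _ := revision(t)
	fmt.Println(before == after)
}
```

With this approach an image or label change leaves the revision untouched, so it does not by itself trigger Pod recreation, while a change to any other modeled field (such as `Command`) still produces a new revision.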

Testing

After modifying the fields that support in-place updates, the Pod objects are ultimately updated without recreation (i.e., the UID remains unchanged).

fixed #6910

Update 2024/8/7

To resolve the 'unknown revision v0.0.0' error caused by the dependency on "k8s.io/apiserver/pkg/util/feature", manual replace directives were added to go.mod. Check this for more info.

@free6om free6om added this to the Release 0.9.0 milestone Apr 9, 2024
@free6om free6om self-assigned this Apr 9, 2024
@github-actions github-actions bot added the size/XXL Denotes a PR that changes 1000+ lines. label Apr 9, 2024

codecov bot commented Apr 9, 2024

Codecov Report

Attention: Patch coverage is 63.08540%, with 134 lines in your changes missing coverage. Please review.

Project coverage is 65.79%. Comparing base (f96f464) to head (9852672).

Files                                          Patch %   Lines
pkg/controller/rsm2/in_place_update_util.go    72.95%    45 Missing and 21 partials ⚠️
pkg/controller/rsm2/revision_util.go           37.93%    12 Missing and 6 partials ⚠️
pkg/controller/rsm2/reconciler_update.go       29.16%    13 Missing and 4 partials ⚠️
pkg/controller/rsm/update_plan.go              45.45%    10 Missing and 2 partials ⚠️
pkg/controller/rsm2/instance_util.go           59.09%    5 Missing and 4 partials ⚠️
pkg/controller/builder/builder_pod.go          0.00%     8 Missing ⚠️
pkg/controller/rsm2/reconciler_status.go       57.14%    2 Missing and 1 partial ⚠️
controllers/apps/systemaccount_util.go         0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7000      +/-   ##
==========================================
+ Coverage   65.64%   65.79%   +0.14%     
==========================================
  Files         340      340              
  Lines       41402    41678     +276     
==========================================
+ Hits        27180    27421     +241     
- Misses      11876    11905      +29     
- Partials     2346     2352       +6     
Flag Coverage Δ
unittests 65.79% <63.08%> (+0.14%) ⬆️


for _, container := range src.Spec.InitContainers {
	for i, c := range dst.Spec.InitContainers {
		if container.Name == c.Name {
			dst.Spec.InitContainers[i].Image = container.Image
Collaborator

add a break?

Contributor Author

fixed: 382925a
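
The suggestion in this thread can be sketched as below. This is an illustration only, using a hypothetical local `container` type and a hypothetical `copyImages` helper; it is not the content of the referenced commit.

```go
package main

import "fmt"

// container is a hypothetical stand-in for corev1.Container.
type container struct {
	Name  string
	Image string
}

// copyImages copies the image of every src container onto the dst container
// with the same name. dst is modified in place.
func copyImages(src, dst []container) {
	for _, s := range src {
		for i := range dst {
			if dst[i].Name == s.Name {
				dst[i].Image = s.Image
				// Container names are unique within a Pod, so there is
				// nothing left to match once the name has been found.
				break
			}
		}
	}
}

func main() {
	dst := []container{{Name: "init", Image: "busybox:1.35"}, {Name: "setup", Image: "tool:1.0"}}
	copyImages([]container{{Name: "init", Image: "busybox:1.36"}}, dst)
	fmt.Println(dst[0].Image, dst[1].Image)
}
```

The `break` does not change the result; it only avoids scanning the rest of `dst` after a match.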

dst.Spec.ActiveDeadlineSeconds = src.Spec.ActiveDeadlineSeconds
mergeList(&src.Spec.Tolerations, &dst.Spec.Tolerations, func(item corev1.Toleration) func(corev1.Toleration) bool {
	return func(t corev1.Toleration) bool {
		return false
Collaborator

Why does the match function always return false here? If the src Tolerations is not empty, won't the dst keep growing? There is also no deduplication.

Contributor Author

@free6om free6om Apr 10, 2024

According to the Kubernetes docs, a Pod's tolerations are append-only.
But the append logic here is a problem; it should be a replacement.

Contributor Author

fixed: 054f0ac
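
The difference between the two behaviors discussed in this thread can be sketched as follows. The `toleration` type and both helpers are hypothetical and exist only for this illustration; the actual fix in the referenced commit may look different.

```go
package main

import "fmt"

// toleration is a hypothetical stand-in for corev1.Toleration.
type toleration struct {
	Key, Operator, Value, Effect string
}

// appendTolerations mimics a merge whose match function always returns false:
// every src item is treated as new and appended, so dst grows on each call.
func appendTolerations(src, dst []toleration) []toleration {
	return append(dst, src...)
}

// replaceTolerations returns a copy of src, ignoring whatever dst held before.
func replaceTolerations(src []toleration) []toleration {
	out := make([]toleration, len(src))
	copy(out, src)
	return out
}

func main() {
	src := []toleration{{Key: "dedicated", Operator: "Equal", Value: "db", Effect: "NoSchedule"}}
	appended := []toleration{}
	replaced := []toleration{}
	for i := 0; i < 3; i++ { // simulate three reconcile passes
		appended = appendTolerations(src, appended)
		replaced = replaceTolerations(src)
	}
	fmt.Println(len(appended), len(replaced))
}
```

Repeated passes with the append variant accumulate duplicates, while the replace variant stays at the size of `src`.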

mergeList(&template.Tolerations, &templateExt.Spec.Tolerations,
	func(item corev1.Toleration) func(corev1.Toleration) bool {
		return func(t corev1.Toleration) bool {
			return false
Collaborator

ditto

Contributor Author

fixed: 9852672

if err != nil {
	return false
}
if semver.Compare(kubeVersion, "v1.29") >= 0 {
Collaborator

In-place update has been available since 1.27 (alpha). If there isn't too much difference (for this part) between 1.27 and 1.29, the threshold can be set to 1.27.

Contributor Author

We should test the InPlacePodVerticalScaling feature gate in versions below 1.29, as line 57 does.
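
One possible reading of this thread, sketched with the standard library only: versions from 1.29 on pass the check as in the reviewed snippet, versions 1.27 and 1.28 pass only when the feature gate is on, and older versions never pass. The function names and the `gateEnabled` parameter are hypothetical, and these thresholds are an assumption drawn from the discussion, not verified against the merged code.

```go
package main

import "fmt"

// parseMajorMinor extracts the major and minor numbers from a version string
// such as "v1.27.3". Anything after the minor number is ignored.
func parseMajorMinor(v string) (major, minor int, err error) {
	_, err = fmt.Sscanf(v, "v%d.%d", &major, &minor)
	return major, minor, err
}

// supportsInPlaceResize is a hypothetical helper. gateEnabled stands for the
// state of the InPlacePodVerticalScaling feature gate, however it is read.
func supportsInPlaceResize(kubeVersion string, gateEnabled bool) bool {
	major, minor, err := parseMajorMinor(kubeVersion)
	if err != nil || major < 1 {
		return false
	}
	switch {
	case major > 1 || minor >= 29:
		return true // assumption: same threshold as the reviewed snippet
	case minor >= 27:
		return gateEnabled // 1.27 and 1.28: rely on the feature gate
	default:
		return false // resources cannot be resized in place before 1.27
	}
}

func main() {
	fmt.Println(supportsInPlaceResize("v1.28.4", true))
	fmt.Println(supportsInPlaceResize("v1.26.0", true))
}
```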

@free6om free6om merged commit 0ba22d5 into main Apr 10, 2024
58 checks passed
@free6om free6om deleted the support/in-place-update branch April 10, 2024 10:39
Contributor Author

free6om commented Apr 10, 2024

/cherry-pick release-0.9


🤖 says: Error cherry-picking.

Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
error: could not apply 0ba22d5... feat: support in-place update for pod (#7000)
hint: After resolving the conflicts, mark them with
hint: "git add/rm ", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".


🤖 says: ‼️ cherry pick action failed.
See: https://github.com/apecloud/kubeblocks/actions/runs/8629915427

@free6om free6om changed the title from "feat: support pod in-place update" to "feat: support in-place update for pod" Apr 10, 2024
@free6om free6om restored the support/in-place-update branch April 10, 2024 10:46
Development

Successfully merging this pull request may close these issues.

[BUG]All pods restart after hscale out / hscale in
5 participants