fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480

Yesphet · 2023-12-29T15:25:59Z

…into account pods that fail later in the updateIndexes

0. Backgroud

Advanced statefulset provides MaxUnavailable guarantee that the number of unavailable pods during the update cannot exceed this value. However, in the current implementation code, there is no consideration for unavailable pods with smaller ordinal or later in the update order. For example, for an asts with maxUnavailable= 1 and replicas=3, if the asts updates while pod-0 is in a crash, it will still trigger a rolling upgrade, and 2 pods will be unavailable because pod-2 is updated and restarted.

It can be easily reproduced in this way：

create a asts

apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet
metadata:
  name: test
spec:
  selector:
    matchLabels:
      app: busybox-test
  replicas: 3
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: busybox-test
    spec:
      containers:
      - name: busybox
        image: busybox
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: Restart
          value: "2"
        command: ["sh","-c"]
        args:
        - |-
          [ "$MY_POD_NAME" == "test-0" ] && exit 1;tail -f /dev/null

List pods, test-0 is in crash

Update an env of the container in asts to trigger rolling update

kubectl patch asts test --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/env/1/value", "value":"3"}]'

We can see that test-2 is deleted which breaking the maxUnavailable guarantee

I have tested k8s official statefulset with MaxUnavailable feature gate enabled, it can still guarantee that the number of unavailable pods does not exceed in this scenario.

Ⅰ. Describe what this PR does

So in this PR, I separate out the update pods logics to a new function rollingUpdateStatefulsetPods, counts all unavailable pods before update pods.

Ⅱ. Does this pull request fix one issue?

no

Ⅲ. Describe how to verify it

refer to part 0

Ⅳ. Special notes for reviews

In k8s official statefulsets controller, if the unavailable pods count reach the MaxUnavailable limit, the rolling update won't make progress even if the first pod in the update sequence is unavailable. But in my implementation, rolling updates are still process in this case because update one unavailable pod does not increase the total number of unavailable Pods.

kruise-bot · 2023-12-29T15:26:15Z

Welcome @Yesphet! It looks like this is your first PR to openkruise/kruise 🎉

codecov · 2024-01-02T14:15:32Z

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (fa7a1da) 47.66% compared to head (46e4e41) 48.41%.
Report is 4 commits behind head on master.

Files	Patch %	Lines
pkg/controller/statefulset/stateful_set_control.go	66.66%	17 Missing and 6 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1480      +/-   ##
==========================================
+ Coverage   47.66%   48.41%   +0.75%     
==========================================
  Files         157      157              
  Lines       22427    22460      +33     
==========================================
+ Hits        10689    10875     +186     
+ Misses      10570    10391     -179     
- Partials     1168     1194      +26

Flag	Coverage Δ
unittests	`48.41% <66.66%> (+0.75%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Yesphet · 2024-01-10T02:42:11Z

@veophi @Fei-Guo Could you please help to review this PR when you are free?

Yesphet · 2024-01-19T02:52:42Z

@veophi @Fei-Guo @zmberg It's been two weeks, and I still haven't received any feedback on this PR. I understand that everyone is busy, I just want to ensure this PR hasn't been overlooked. Your time and attention to it would be much appreciated ~

furykerry · 2024-01-22T01:55:28Z

sorry, guys are fully occupied these days, i will take a look at this patch.

pkg/controller/statefulset/stateful_set_control.go

…into account pods that fail later in the updateIndexes. Signed-off-by: Yesphet <mildtheorem@gmail.com>

furykerry · 2024-01-23T07:45:50Z

/lgtm

zmberg · 2024-01-23T09:32:40Z

@Yesphet good.

zmberg · 2024-01-23T09:34:38Z

/approve

kruise-bot · 2024-01-23T09:34:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zmberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [zmberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Yesphet · 2024-01-23T10:59:17Z

@furykerry @zmberg thanks guys~

…into account pods that fail later in the updateIndexes. (openkruise#1480)

…into account pods that fail later in the updateIndexes. (#1480) Signed-off-by: Yesphet <mildtheorem@gmail.com>

kruise-bot requested review from Fei-Guo and veophi December 29, 2023 15:26

kruise-bot added the size/L size/L: 100-499 label Dec 29, 2023

Yesphet force-pushed the fix/asts_maxunavailable branch from 9f7882e to de224fa Compare December 30, 2023 04:02

Yesphet changed the title ~~fix(statefulset): fix maxUnavailable for rolling upgrades not taking …~~ [WIP] fix(statefulset): fix maxUnavailable for rolling upgrades not taking … Dec 30, 2023

kruise-bot added the do-not-merge/work-in-progress label Dec 30, 2023

Yesphet force-pushed the fix/asts_maxunavailable branch 2 times, most recently from a322891 to f5997f3 Compare January 2, 2024 14:06

Yesphet changed the title ~~[WIP] fix(statefulset): fix maxUnavailable for rolling upgrades not taking …~~ fix(statefulset): fix maxUnavailable for rolling upgrades not taking … Jan 2, 2024

kruise-bot removed the do-not-merge/work-in-progress label Jan 2, 2024

Yesphet force-pushed the fix/asts_maxunavailable branch from f5997f3 to d4cecdd Compare January 4, 2024 07:29

furykerry reviewed Jan 22, 2024

View reviewed changes

pkg/controller/statefulset/stateful_set_control.go Show resolved Hide resolved

pkg/controller/statefulset/stateful_set_control.go Show resolved Hide resolved

fix(statefulset): fix maxUnavailable for rolling upgrades not taking …

46e4e41

…into account pods that fail later in the updateIndexes. Signed-off-by: Yesphet <mildtheorem@gmail.com>

Yesphet force-pushed the fix/asts_maxunavailable branch from d4cecdd to 46e4e41 Compare January 23, 2024 03:27

kruise-bot assigned furykerry Jan 23, 2024

kruise-bot added the lgtm label Jan 23, 2024

kruise-bot added the approved label Jan 23, 2024

kruise-bot merged commit 8af135d into openkruise:master Jan 23, 2024
27 checks passed

zmberg added this to the 1.6 milestone Mar 7, 2024

ppbits pushed a commit to ppbits/kruise that referenced this pull request Apr 4, 2024

fix(statefulset): fix maxUnavailable for rolling upgrades not taking …

18a2f36

…into account pods that fail later in the updateIndexes. (openkruise#1480)

ABNER-1 pushed a commit that referenced this pull request Sep 19, 2024

fix(statefulset): fix maxUnavailable for rolling upgrades not taking …

a67947e

…into account pods that fail later in the updateIndexes. (#1480) Signed-off-by: Yesphet <mildtheorem@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480

fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480

Yesphet commented Dec 29, 2023 •

edited

Loading

kruise-bot commented Dec 29, 2023

codecov bot commented Jan 2, 2024 •

edited

Loading

Yesphet commented Jan 10, 2024

Yesphet commented Jan 19, 2024

furykerry commented Jan 22, 2024

furykerry commented Jan 23, 2024

zmberg commented Jan 23, 2024

zmberg commented Jan 23, 2024

kruise-bot commented Jan 23, 2024

Yesphet commented Jan 23, 2024

fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480

fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480

Conversation

Yesphet commented Dec 29, 2023 • edited Loading

0. Backgroud

Ⅰ. Describe what this PR does

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

kruise-bot commented Dec 29, 2023

codecov bot commented Jan 2, 2024 • edited Loading

Codecov Report

Yesphet commented Jan 10, 2024

Yesphet commented Jan 19, 2024

furykerry commented Jan 22, 2024

furykerry commented Jan 23, 2024

zmberg commented Jan 23, 2024

zmberg commented Jan 23, 2024

kruise-bot commented Jan 23, 2024

Yesphet commented Jan 23, 2024

Yesphet commented Dec 29, 2023 •

edited

Loading

codecov bot commented Jan 2, 2024 •

edited

Loading