fix(statefulset): fix maxUnavailable for rolling upgrades not taking … #1480
Welcome @Yesphet! It looks like this is your first PR to openkruise/kruise 🎉 |
Codecov Report
Coverage diff, master vs. #1480:
              master    #1480    +/-
Coverage      47.66%   48.41%   +0.75%
Files            157      157
Lines          22427    22460     +33
Hits           10689    10875    +186
Misses         10570    10391    -179
Partials        1168     1194     +26
Sorry, we have been fully occupied these days; I will take a look at this patch.
…into account pods that fail later in the updateIndexes. Signed-off-by: Yesphet <mildtheorem@gmail.com>
/lgtm |
@Yesphet good. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: zmberg
@furykerry @zmberg thanks guys~ |
0. Background
Advanced StatefulSet provides a MaxUnavailable guarantee: the number of unavailable pods during an update cannot exceed this value. However, the current implementation does not account for unavailable pods that have a smaller ordinal or come later in the update order. For example, take an asts with maxUnavailable=1 and replicas=3: if the asts is updated while pod-0 is crashing, a rolling upgrade is still triggered, and 2 pods end up unavailable because pod-2 is updated and restarted.
It can be easily reproduced in this way:
I have tested the official k8s StatefulSet with the MaxUnavailable feature gate enabled; in this scenario it still guarantees that the number of unavailable pods does not exceed maxUnavailable.
Ⅰ. Describe what this PR does
So in this PR, I separate the pod update logic out into a new function, rollingUpdateStatefulsetPods, which counts all unavailable pods before updating any pod.
Ⅱ. Does this pull request fix one issue?
no
Ⅲ. Describe how to verify it
refer to part 0
Ⅳ. Special notes for reviews
In the official k8s StatefulSet controller, if the number of unavailable pods reaches the MaxUnavailable limit, the rolling update makes no progress, even if the first pod in the update sequence is itself unavailable. In my implementation, the rolling update still proceeds in this case, because updating a pod that is already unavailable does not increase the total number of unavailable pods.
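To make the behaviour described above concrete, here is a minimal Go sketch of the idea. It is not the code from this PR: the pod type, the selectPodsToUpdate function, and the choice to skip (rather than stop at) an available pod once the budget is exhausted are all assumptions made for illustration. It only models "count every unavailable pod first, then let already-unavailable pods through without charging the budget".

```go
package main

import "fmt"

// pod is a hypothetical, simplified stand-in for a StatefulSet pod.
type pod struct {
	name      string
	available bool
	updated   bool // already at the target revision
}

// selectPodsToUpdate is an illustrative helper (not the PR's function).
// order lists indexes into pods in update order.
func selectPodsToUpdate(pods []pod, order []int, maxUnavailable int) []string {
	// Count every unavailable pod up front, regardless of ordinal
	// or position in the update order.
	unavailable := 0
	for _, p := range pods {
		if !p.available {
			unavailable++
		}
	}

	var picked []string
	for _, i := range order {
		p := pods[i]
		if p.updated {
			continue
		}
		if !p.available {
			// Already counted above: updating it does not raise the total.
			picked = append(picked, p.name)
			continue
		}
		if unavailable >= maxUnavailable {
			// Budget exhausted: an available pod must not be taken down.
			// Assumption of this sketch: keep scanning for unavailable pods.
			continue
		}
		unavailable++
		picked = append(picked, p.name)
	}
	return picked
}

func main() {
	order := []int{2, 1, 0} // reverse ordinal order

	// Scenario from part 0: pod-0 is crashing, maxUnavailable=1.
	crashed := []pod{{"pod-0", false, false}, {"pod-1", true, false}, {"pod-2", true, false}}
	fmt.Println(selectPodsToUpdate(crashed, order, 1)) // prints [pod-0]

	// All pods healthy: only the first pod in the order is updated.
	healthy := []pod{{"pod-0", true, false}, {"pod-1", true, false}, {"pod-2", true, false}}
	fmt.Println(selectPodsToUpdate(healthy, order, 1)) // prints [pod-2]
}
```

In the crashed scenario, pod-2 and pod-1 are left alone because the single unavailable slot is already taken by pod-0, while pod-0 itself can still be updated. In this sketch, the stricter behaviour attributed to the official controller would correspond to returning nothing once the budget is reached.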