KEP-961: Bump Statefulset maxUnavailable to beta (built on top of #3997) #4474
Conversation
Signed-off-by: kerthcet <kerthcet@gmail.com>
Left several comments
Controller tests in `pkg/controller/statefulset` cover all cases. Since pod statuses
are faked in integration tests, simulating rolling update is difficult. We're opting
for e2e tests instead.
Integration tests should have links to testgrid, see the template (https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md).
not sure I follow 🤔
We need a proof that the tests are stable/not flaky. Can you go over all the added tests and confirm that? An example of this would be: https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/3335-statefulset-slice/README.md#integration-tests
Will do, thanks
So those tests (and actually, before that, the fixes) need to be merged before we merge the KEP I assume?
The KEP is just an indication of the plan for beta, not the actual transition to it. It should be enough to add only the tests that are available now and add the upcoming tests once they are merged.
But we don't currently have relevant integration tests that would have reports to add here, do we?
Thanks @knelasevero
will expect. Implementing Choice 4 using PMP would be the easiest.

#### Implementation

For the alpha release we are going with Choice 4, with support for both PMP=Parallel and PMP=OrderedReady.
For PMP=Parallel, we will use Choice 2.
For PMP=OrderedReady, we will use Choice 3 to ensure we can support ordering guarantees while also
It seems to me we are using Choice 1 with PMP=OrderedReady in our implementation.
The part described in Choice 3 does not happen today; at this time both 2 and 3 are terminating.
Can you please double check the choices and see if they are up to date?
+1
Also, please update whether we're continuing with this choice for beta or changing something.
I went over it and fixed the text. I also prefer the simpler route here, without changing expected behavior. If the behavior were to change, I would keep this in alpha for a couple more releases before coming back to promotion.
I added the fix for the newly found bug to the implementation description.
I do not see big drawbacks to using the simpler Choice 1 instead of Choice 3, but it might be a good idea to discuss this with the SIG.
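For readers following this thread, a minimal manifest sketch showing where the field under discussion sits in the API. The object name, image, and replica counts are illustrative, and the field is only honored when the MaxUnavailableStatefulSet feature gate is enabled:

```yaml
# Illustrative sketch only; name, image, and sizes are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 5
  podManagementPolicy: OrderedReady   # the default PMP whose interaction with maxUnavailable is discussed above
  selector:
    matchLabels:
      app: web
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # honored only with the MaxUnavailableStatefulSet feature gate enabled
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.k8s.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
```

With a spec like this, a rolling update may take down up to two pods at a time; the thread above is about which ordering guarantees are preserved while doing so.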
out of order Terminations of pods.
2. Pods with ordinal 4 and 3 will start Terminating at the same time (because of maxUnavailable). When any of 4 or 3 are running and ready, pods with ordinal 2 will start Terminating. This could violate ordering guarantees, since if 3 is running and ready, then both 4 and 2 are terminating at the same
Ouch. I think this should be `running and available`. It seems to me that we should consider minReadySeconds in PMP=Parallel. We do not care about availability when burst-starting for the first time, but we should care about it when scaling down. I am able to bring the available replicas down to 1 for the following StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-roll
spec:
  replicas: 5
  minReadySeconds: 20
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1
      maxUnavailable: 2
  selector:
    matchLabels:
      app: nginx-roll
  template:
    metadata:
      labels:
        app: nginx-roll
    spec:
      containers:
        - name: nginx
          image: ghcr.io/nginxinc/nginx-unprivileged:latest
          ports:
            - containerPort: 80
              name: web
```
yup; there is a bug for that already: kubernetes/kubernetes#112307
blocker for beta^
Thanks for finding this. I added mentions of this in the implementation part and in the e2e section as a requirement. I changed all mentions of `running and ready` to `running and available` in the text.
/assign as PRR shadow
There are a few references to a metric that isn't introduced anywhere else and seems to have been rejected in a previous revision of this along with some other suggestions/questions from other reviewers. I also think that this would be challenging to actively monitor on large clusters and/or across fleets.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: knelasevero. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Metric Name: statefulset_unavailability_violation

Description: This metric counts the number of times a StatefulSet exceeds its maxUnavailable threshold during a rolling update. This metric increases whenever a StatefulSet is processed and it's observed that spec.replicas - status.readyReplicas > maxUnavailable. The metric is labeled with namespace, statefulset, and a reason label, where reason could be exceededMaxUnavailable. This provides a clear indication of which StatefulSets are not complying with the defined maxUnavailable constraint, allowing for quick identification and remediation.
I don't necessarily agree here, for two reasons:
(1) not everyone is using kube-state-metrics, so depending on that is not ideal
(2) I would expect changes to maxUnavailable in the middle of the rollout and things like that to be a real corner case - and from that perspective the proposed metric actually gives us visibility into whether this feature works.
As I mentioned above, we don't want name/namespace to be on the metric though - just high-level monitoring of the feature.
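Purely as an illustration of how the proposed counter could be consumed if it ships as described in the quoted text, a hypothetical Prometheus alerting rule; the alert name and window are made up, and the label set may shrink per the comment above about dropping name/namespace:

```yaml
# Hypothetical: statefulset_unavailability_violation is only proposed in this KEP and may
# change shape (or be dropped) before beta; the reason label value follows the quoted description.
groups:
  - name: statefulset-maxunavailable
    rules:
      - alert: StatefulSetMaxUnavailableViolated
        # Fires if any violation was counted during the last 15 minutes.
        expr: increase(statefulset_unavailability_violation{reason="exceededMaxUnavailable"}[15m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "A StatefulSet exceeded its maxUnavailable threshold during a rolling update"
```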
Hey, just so you prioritize accordingly (#961 (comment)): this in the end won't make 1.30, in case you want to go through other PRs first that will fit in.
- kube_statefulset_spec_strategy_rollingupdate_max_unavailable: This metric reflects the configured maxUnavailable value for StatefulSets. Significant deviations from expected values during updates, or an actual unavailable pod count that consistently exceeds this configuration, may indicate misconfigurations or issues with feature behavior.
- StatefulSet Update Progress: Observing the stability and speed of StatefulSet rollouts relative to the maxUnavailable setting. Unusually slow updates or increased pod unavailability beyond the configured threshold could signal problems.
- Cluster Stability Metrics: Key indicators of cluster health, such as increased error rates in control plane components or higher pod restart rates after feature adoption, can also guide the decision to roll back.
The last two are not actionable; can we mention specific metrics that should be observed? E.g. which kube-state-metrics and StatefulSet queue metrics?
✅
added the unresolved block to get back to this after the metric discussion
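One way to make the last two bullets concrete, sketched as Prometheus rules. This assumes kube-state-metrics is scraped and that kube-controller-manager exposes the standard workqueue metrics for the StatefulSet controller under name="statefulset"; the queue name, rule names, and thresholds here are assumptions:

```yaml
# Sketch only: concrete signals for "update progress" and "cluster stability".
groups:
  - name: statefulset-rollout-health
    rules:
      # Rollout progress: updated replicas should converge towards the desired replica count.
      - record: namespace_statefulset:rollout_updated:ratio
        expr: kube_statefulset_status_replicas_updated / kube_statefulset_replicas
      # Controller health: a persistently deep StatefulSet workqueue suggests the controller
      # is not keeping up after the feature is enabled.
      - alert: StatefulSetWorkqueueBacklog
        expr: workqueue_depth{name="statefulset"} > 100
        for: 15m
        labels:
          severity: warning
```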
Administrators should monitor the following metrics to assess the need for a rollback:

- kube_statefulset_spec_strategy_rollingupdate_max_unavailable: This metric reflects the configured maxUnavailable value for StatefulSets. Significant deviations from expected values during updates, or an actual unavailable pod count that consistently exceeds this configuration, may indicate misconfigurations or issues with feature behavior.
we could be more concrete and mention the available pod count metric and replicas metric here as well
do you mean metrics like kube_statefulset_status_replicas, kube_statefulset_status_replicas_available, kube_statefulset_status_replicas_updated in ksm?
Yes; also note that `kube_statefulset_replicas` is better than `kube_statefulset_status_replicas` for this.
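Putting that suggestion together, a hedged example comparing unavailable pods against the configured budget using the kube-state-metrics series named in this thread. kube_statefulset_spec_strategy_rollingupdate_max_unavailable is the series referenced by the quoted KEP text; if a given kube-state-metrics version does not export it, the configured value would have to be substituted by hand:

```yaml
# Sketch, assuming the kube-state-metrics series discussed above are available.
groups:
  - name: statefulset-maxunavailable-ksm
    rules:
      - alert: StatefulSetUnavailableAboveBudget
        # Unavailable pods = desired replicas minus available replicas, compared per
        # StatefulSet against the configured maxUnavailable.
        expr: |
          (kube_statefulset_replicas - kube_statefulset_status_replicas_available)
            > on (namespace, statefulset)
          kube_statefulset_spec_strategy_rollingupdate_max_unavailable
        for: 10m
        labels:
          severity: warning
```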
Signed-off-by: Lucas Severo Alves <lseveroa@redhat.com>
04243ba to 63eeffb
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Supersedes #3997
One-line PR description: Bump statefulset maxUnavailable to beta
Issue link: #961
Other comments:
Previous commits: kept Kante's commits here
My commits: went over review notes that had not been addressed and changed a few things related to the proposed metric and some descriptions.