OCPBUGS-12210: Prevent partially filled HPA behaviors from crashing kube-controller-manager #1876
Conversation
@jkyros: This pull request references Jira Issue OCPBUGS-12210, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@jkyros: the contents of this pull request could be automatically validated. The following commits are valid:
/jira refresh
@jkyros: This pull request references Jira Issue OCPBUGS-12210, which is invalid:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/lgtm
/lgtm I am fine with this approach
the referenced upstream doesn't exist: 11358
Force-pushed from a98f50a to 0ef37f6
@jkyros: the contents of this pull request could be automatically validated. The following commits are valid:
Fixed, I'm a clown, I left out the 4, should have been
@@ -449,6 +450,20 @@ func Convert_v1_HorizontalPodAutoscaler_To_autoscaling_HorizontalPodAutoscaler(i
// drop round-tripping annotations after converting to internal
out.Annotations, _ = autoscaling.DropRoundTripHorizontalPodAutoscalerAnnotations(out.Annotations)

// Until kube 1.27 we're still storing autoscaling as v1, but the HPA controller is consuming it as v2. Behaviors are an annotation in v1 and can be partially
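To make "partially filled" concrete: a minimal, self-contained sketch (behaviorFromAnnotation is a hypothetical helper, not the actual conversion code) of how a ScaleDown-only value in the autoscaling.alpha.kubernetes.io/behavior annotation decodes into a behavior whose ScaleUp is nil:

package main

import (
	"encoding/json"
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// behaviorFromAnnotation is a hypothetical stand-in for the conversion step that
// decodes the behavior annotation off a v1 object. A partially-filled value (only
// ScaleDown, say) produces a struct whose ScaleUp pointer is left nil.
func behaviorFromAnnotation(val string) (*autoscalingv2.HorizontalPodAutoscalerBehavior, error) {
	var b autoscalingv2.HorizontalPodAutoscalerBehavior
	if err := json.Unmarshal([]byte(val), &b); err != nil {
		return nil, err
	}
	return &b, nil
}

func main() {
	// Only ScaleDown supplied, as in the crasher manifest later in this thread.
	raw := `{"ScaleDown":{"StabilizationWindowSeconds":600,"SelectPolicy":"Max","Policies":[{"type":"Pods","value":1,"periodSeconds":1}]}}`
	b, err := behaviorFromAnnotation(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("ScaleUp is nil:", b.ScaleUp == nil) // true: nothing has defaulted it yet
}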
what does this do to the annotations in question in a flow like
- write object to API/etcd, scaleup gets set
- read object as v1, do annotations make sense?
They get converted back out properly, but I assume as part of the round-trip to the "internal" version when the controller updates it as v2 (I didn't think it round-tripped v1 -> internal -> v1 on the way in on initial creation, but maybe I'm wrong and it does).
e.g. if I omit ScaleUp:
cat << EOF | oc create -f -
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: crasher
namespace: test
labels:
app: test
annotations:
autoscaling.alpha.kubernetes.io/behavior: '{"ScaleDown":{"StabilizationWindowSeconds":600,"SelectPolicy":"Max","Policies":[{"type":"Pods","value":1,"periodSeconds":1}]}}'
spec:
scaleTargetRef:
kind: Deployment
name: test
apiVersion: apps/v1
minReplicas: 8
maxReplicas: 25
targetCPUUtilizationPercentage: 120
EOF
When I ask for it back, ScaleUp in the autoscaling.alpha.kubernetes.io/behavior annotation has been filled in by the default value.
[jkyros@jkyros-thinkpadp1gen5 ocpbugs-12210]$ oc get hpa.v1.autoscaling -n test crasher -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
annotations:
autoscaling.alpha.kubernetes.io/behavior: '{"ScaleUp":{"StabilizationWindowSeconds":0,"SelectPolicy":"Max","Policies":[{"Type":"Pods","Value":4,"PeriodSeconds":15},{"Type":"Percent","Value":100,"PeriodSeconds":15}]},"ScaleDown":{"StabilizationWindowSeconds":600,"SelectPolicy":"Max","Policies":[{"Type":"Pods","Value":1,"PeriodSeconds":1}]}}'
autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2024-01-31T20:54:22Z","reason":"ScaleDownStabilized","message":"recent
recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2024-01-31T20:54:22Z","reason":"ValidMetricFound","message":"the
HPA was able to successfully calculate a replica count from cpu resource utilization
(percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2024-01-31T20:54:22Z","reason":"DesiredWithinRange","message":"the
desired count is within the acceptable range"}]'
autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
converted.jkyros.io: autoscaling -> v1
creationTimestamp: "2024-01-31T20:54:20Z"
labels:
app: test
name: crasher
namespace: test
resourceVersion: "92141"
uid: 3b0646ec-ed98-4b2f-a989-8bc201d7a56f
spec:
maxReplicas: 25
minReplicas: 8
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: test
targetCPUUtilizationPercentage: 120
status:
currentCPUUtilizationPercentage: 0
currentReplicas: 8
desiredReplicas: 8
TL;DR yep the converters work, the annotations make sense
where does the crash happen? is it in conversion code? if the failure happens in the kube-controller-manager, I'd rather see a kube-controller-manager patch because the blast radius is much smaller and doesn't have any persistent effect in the cluster.
Force-pushed from 0ef37f6 to c302002
@jkyros: the contents of this pull request could be automatically validated. The following commits are valid:
The description in the autoscaling API for the HorizontalPodAutoscaler suggests that HorizontalPodAutoscalerSpec's Behavior field (and its ScaleUp and ScaleDown members) are optional, and that if they are not supplied, defaults will be used. That's true if the entire Behavior is nil, because we go through "normalizeDesiredReplicas" instead of "normalizeDesiredReplicasWithBehaviors", but if the structure is only partially supplied, leaving some members nil, it results in nil dereferences when we end up going through normalizeDesiredReplicasWithBehaviors. So we end up in a situation where:
- If Behavior is entirely absent (nil), we use defaults (good)
- If Behavior is partially specified, we panic (very bad)
- If stabilizationWindowSeconds is nil in either ScaleUp or ScaleDown, we panic (also very bad)
In general, this only happens with pre-v2 HPA objects because v2 does properly fill in the default values. This commit prevents the panic by using the defaulters to ensure that unpopulated fields in the behavior objects get filled in with their defaults before they are used, so they can safely be dereferenced by later code that performs calculations on them.
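As a rough illustration of the defaulting that commit message describes (this is not the actual patch; ensureBehaviorDefaults is a hypothetical helper, the scale-up defaults are copied from the round-tripped annotation shown earlier, and the scale-down stabilization window really comes from controller configuration):

package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// ensureBehaviorDefaults sketches the shape of the fix: fill any nil pieces of a
// partially-specified behavior so that later code can dereference
// StabilizationWindowSeconds and iterate Policies without panicking.
func ensureBehaviorDefaults(b *autoscalingv2.HorizontalPodAutoscalerBehavior) {
	if b == nil {
		return // a fully-absent behavior takes the normalizeDesiredReplicas path and is fine
	}
	selectMax := autoscalingv2.MaxChangePolicySelect
	if b.ScaleUp == nil {
		b.ScaleUp = &autoscalingv2.HPAScalingRules{
			SelectPolicy: &selectMax,
			Policies: []autoscalingv2.HPAScalingPolicy{
				// Default scale-up policies, matching the round-tripped annotation above.
				{Type: autoscalingv2.PodsScalingPolicy, Value: 4, PeriodSeconds: 15},
				{Type: autoscalingv2.PercentScalingPolicy, Value: 100, PeriodSeconds: 15},
			},
		}
	}
	if b.ScaleDown == nil {
		b.ScaleDown = &autoscalingv2.HPAScalingRules{SelectPolicy: &selectMax}
	}
	// Even when a rules block was supplied, StabilizationWindowSeconds may still be nil.
	if b.ScaleUp.StabilizationWindowSeconds == nil {
		zero := int32(0)
		b.ScaleUp.StabilizationWindowSeconds = &zero
	}
	if b.ScaleDown.StabilizationWindowSeconds == nil {
		window := int32(300) // illustrative stand-in for the controller's configured default
		b.ScaleDown.StabilizationWindowSeconds = &window
	}
}

func main() {
	// Partially specified: ScaleDown present but empty, ScaleUp missing entirely.
	b := &autoscalingv2.HorizontalPodAutoscalerBehavior{ScaleDown: &autoscalingv2.HPAScalingRules{}}
	ensureBehaviorDefaults(b)
	fmt.Println(*b.ScaleUp.StabilizationWindowSeconds, *b.ScaleDown.StabilizationWindowSeconds) // safe to dereference now
}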
Force-pushed from c302002 to 0e0f66f
@jkyros: the contents of this pull request could be automatically validated. The following commits are valid:
Eads and I talked over Slack (thanks David). In summary, since the crash is in the controller, not the converter, we'd prefer to fix it in the controller to reduce the blast radius, since this fix is only for two old versions. I've retooled this PR such that:
This does come with the side effect that if the controller modifies the object, it gets written back out with the missing values filled in, but that seems better to me than refusing to scale until someone tediously fills them in?
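To make the controller-side placement concrete, here is a minimal sketch under the assumption that the guard runs right before the behavior-based normalization path (computeDesired and its defaulting are hypothetical stand-ins, not the actual controller code):

package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// computeDesired is a heavily simplified stand-in for the controller step that picks
// between the two normalization paths named in the commit message. Only the nil
// handling matters here; everything else is stubbed out.
func computeDesired(behavior *autoscalingv2.HorizontalPodAutoscalerBehavior, desired int32) int32 {
	if behavior == nil {
		// normalizeDesiredReplicas path: no behavior fields to dereference, always safe.
		return desired
	}
	// normalizeDesiredReplicasWithBehaviors path: default anything it is about to
	// dereference, which is the controller-side guard described above.
	if behavior.ScaleUp == nil {
		behavior.ScaleUp = &autoscalingv2.HPAScalingRules{}
	}
	if behavior.ScaleUp.StabilizationWindowSeconds == nil {
		zero := int32(0)
		behavior.ScaleUp.StabilizationWindowSeconds = &zero
	}
	// ...same treatment for ScaleDown, then the real path does its calculations...
	_ = *behavior.ScaleUp.StabilizationWindowSeconds // would panic here without the guard
	return desired
}

func main() {
	// Only ScaleDown supplied, mirroring the crasher manifest earlier in the thread.
	b := &autoscalingv2.HorizontalPodAutoscalerBehavior{ScaleDown: &autoscalingv2.HPAScalingRules{}}
	fmt.Println("desired replicas:", computeDesired(b, 8))
}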
@jkyros: This pull request references Jira Issue OCPBUGS-12210, which is invalid:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/test e2e-aws-ovn-serial
Agreed, thank you /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aravindhp, deads2k, jkyros, joelsmith. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/label backport-risk-assessed
@joelsmith: Can not set label backport-risk-assessed: Must be member in one of these teams: [openshift-staff-engineers] In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/label backport-risk-assessed
@weinliu I know this isn't an actual "cherry pick" since we're stuffing it straight into 4.13, and it's not in a part of the repo that we "own", but would you be able to take a look here for the
/label cherry-pick-approved
@weinliu: Can not set label cherry-pick-approved: Must be member in one of these teams: [openshift-staff-engineers] In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@sunilcio could you help?
/label cherry-pick-approved
Merged 8f85140 into openshift:release-4.13
@jkyros: Jira Issue OCPBUGS-12210: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-12210 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[ART PR BUILD NOTIFIER] This PR has been included in build openshift-enterprise-pod-container-v4.13.0-202402060538.p0.g8f85140.assembly.stream for distgit openshift-enterprise-pod.
Thanks everyone! I'm only going to take this back one more to 4.12 (since it's EUS):
@jkyros: #1876 failed to apply on top of branch "release-4.12":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Fix included in accepted release 4.13.0-0.nightly-2024-02-06-120750 |
The short version here is that: if the HPA behaviors are only partially filled in (e.g. ScaleUp but not ScaleDown, etc.) in kube < 1.27, it will send the kube-controller-manager into CrashLoopBackOff.
This PR:
- Defaults any nil behaviors when converting from v1 -> internal
Upstream details:
Here is a straightforward crasher (you might have to wait a little bit until the HPA touches it, but you should be able to see kube-controller-manager pods go into CrashLoopBackoff):
Fixes: OCPBUGS-12210