Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.12] OCPBUGS-15527: Prevent partially filled HPA behaviors from crashing kube-controller-manager #1887

Conversation

jkyros
Copy link

@jkyros jkyros commented Feb 14, 2024

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because TestMultipleHPAs doesn't exist in 4.12 so it looks like a collision)

The description in the autoscaling API for the HorizontalPodAutoscaler
suggests that HorizontalPodAutoscalerSpec's Behavior field (and its
ScaleUp and ScaleDown members) are optional. And that if not supplied,
defaults will be used.

That's true if the entire Behavior is nil because, we go
through "normalizeDesiredReplicas" instead of
"normalizeDesiredReplicasWithBehaviors", but if the structure is only
partially supplied, leaving some members nil, it results in nil
dereferences when we end up going though
normalizeDesiredReplicasWithBehaviors.

So we end up in a situation where:
- If Behavior is entirely absent (nil) we use defaults (good)
- If Behavior is partially specified we panic (very bad)
- If stabilizationWindowSeconds is nil in either ScaleUp or Scaledown,
  we panic (also very bad)

In general, this only happens with pre-v2 HPA objects because v2 does
properly fill in the default values.

This commit prevents the panic by using the defaulters to ensure that
unpopulated fields in the behavior objects get filled in with their
defaults before they are used, so they can safely be dereferenced by
later code that performs calculations on them.
@openshift-ci-robot
Copy link

@jkyros: An error was encountered searching for bug OCPBUGS-15527 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. You do not have the permission to see the specified issue.: request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

In response to this:

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because my diff was unlucky enough to collide with the TestMultipleHPAs test from 4.12)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the backports/validated-commits Indicates that all commits come to merged upstream PRs. label Feb 14, 2024
@openshift-ci-robot
Copy link

@jkyros: the contents of this pull request could be automatically validated.

The following commits are valid:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 14, 2024
@openshift-ci-robot
Copy link

@jkyros: This pull request references Jira Issue OCPBUGS-15527, which is invalid:

  • expected Jira Issue OCPBUGS-15527 to depend on a bug targeting a version in 4.13.0, 4.13.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because TestMultipleHPAs doesn't exist in 4.12 so it looks like a collision)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Feb 14, 2024
@jkyros
Copy link
Author

jkyros commented Feb 15, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 15, 2024
@openshift-ci-robot
Copy link

@jkyros: This pull request references Jira Issue OCPBUGS-15527, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-12210 is in the state Closed (Done-Errata), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-12210 targets the "4.13.z" version, which is one of the valid target versions: 4.13.0, 4.13.z
  • bug has dependents

Requesting review from QA contact:
/cc @wangke19

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from wangke19 February 15, 2024 01:49
Copy link

openshift-ci bot commented Feb 15, 2024

@jkyros: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wangke19
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Feb 18, 2024
@openshift-ci-robot
Copy link

@jkyros: An error was encountered searching for dependent bug OCPBUGS-12210 for bug OCPBUGS-15527 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. You do not have the permission to see the specified issue.: request failed. Please analyze the request body for more details. Status code: 401:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

In response to this:

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because TestMultipleHPAs doesn't exist in 4.12 so it looks like a collision)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@joelsmith
Copy link

/lgtm
@deads2k could you please approve this backport?

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2024
@openshift-ci-robot
Copy link

@jkyros: This pull request references Jira Issue OCPBUGS-15527, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-12210 is in the state Closed (Done-Errata), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-12210 targets the "4.13.z" version, which is one of the valid target versions: 4.13.0, 4.13.z
  • bug has dependents

Requesting review from QA contact:
/cc @wangke19

In response to this:

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because TestMultipleHPAs doesn't exist in 4.12 so it looks like a collision)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@deads2k
Copy link

deads2k commented Mar 4, 2024

/approve
/backport-risk-accessed

@deads2k deads2k added approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. labels Mar 4, 2024
@deads2k deads2k added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Mar 4, 2024
@deads2k
Copy link

deads2k commented Mar 4, 2024

/refresh

Copy link

openshift-ci bot commented Mar 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jkyros, joelsmith

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
Copy link

openshift-ci bot commented Mar 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jkyros, joelsmith

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9946c63 into openshift:release-4.12 Mar 4, 2024
18 checks passed
@openshift-ci-robot
Copy link

@jkyros: Jira Issue OCPBUGS-15527: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-15527 has been moved to the MODIFIED state.

In response to this:

This is a manual backport of #1876. It looks like we probably also need to take it back through 4.11 even though we originally weren't intending to.

(Manual backport because TestMultipleHPAs doesn't exist in 4.12 so it looks like a collision)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-pod-container-v4.12.0-202403042037.p0.g9946c63.assembly.stream.el8 for distgit openshift-enterprise-pod.
All builds following this will include this PR.

@openshift-merge-robot
Copy link

Fix included in accepted release 4.12.0-0.nightly-2024-03-05-025545

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. backports/validated-commits Indicates that all commits come to merged upstream PRs. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.