
Continuous HPA updates with CPU Utilization trigger #5821

Closed
uucloud opened this issue May 23, 2024 · 3 comments
Labels
bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

Comments

@uucloud

uucloud commented May 23, 2024

Report

When I submit a ScaledObject that includes both a CPU utilization trigger and other resource triggers, the KEDA operator may continuously update the HPA and never stop.

Expected Behavior

Only one update occurs

Actual Behavior

Continuous HPA updates

Steps to Reproduce the Problem

  1. Create a ScaledObject with both CPU utilization triggers and other resource triggers.
  2. Ensure the CPU utilization trigger is not the last one in the ScaledObject.
  3. Use a Kubernetes cluster with a version below 1.27 (e.g., 1.26).
  4. Observe the continuous triggering of "Found difference in the HPA spec according to ScaledObject" by the KEDA operator.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: uucloud-test-so
  namespace: default
spec:
  scaleTargetRef:
    name: uucloud-test  # your deployment
  minReplicaCount: 1
  maxReplicaCount: 2
  triggers:
    - metadata:
        value: "50"
      metricType: Utilization
      type: cpu
    - metadata:
        value: "50"
      metricType: Utilization
      type: memory
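The update loop can be watched from the command line. A sketch, assuming the example ScaledObject above, a default KEDA install in the `keda` namespace, and KEDA's `keda-hpa-<scaledobject-name>` HPA naming convention:

```shell
# Apply the example ScaledObject (save the manifest above as scaledobject.yaml).
kubectl apply -f scaledobject.yaml

# Watch the managed HPA: its resourceVersion keeps climbing as KEDA rewrites it.
kubectl get hpa keda-hpa-uucloud-test-so -n default -w

# Or follow the operator log for the repeated diff message.
kubectl logs -n keda deploy/keda-operator -f | grep "Found difference in the HPA spec"
```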

Logs from KEDA operator

No response

KEDA Version

2.12.1

Kubernetes Version

< 1.27

Scaler Details

Resource

Anything else?

This issue is fundamentally the same as the one encountered in kubernetes/kubernetes#74099. The root cause is that Kubernetes reorders spec.metrics: the HPA v1 conversion logic changes the position of the CPU utilization metric.

When creating or updating an HPA, the conversion logic in these segments of code (link1 and link2) converts the first CPU utilization metric into the HPA v1 representation (if there are multiple CPU utilization triggers, the others are lost) and stores the remaining metrics in annotations. When converting back from HPA v1 to HPA, it appends the CPU utilization metric to the end (link3).

This results in a situation where, if the ScaledObject has multiple resource triggers and one of them is a CPU utilization trigger, the final HPA will always have the CPU utilization trigger at the end. Additionally, if there are multiple CPU utilization triggers, only one will remain (though having multiple CPU utilization triggers in one HPA configuration might seem to have little practical value...). This causes the KEDA operator to continuously detect differences and persistently update the HPA.
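The round-trip described above can be simulated in a few lines of plain Go (a sketch of the conversion behavior, not KEDA's or Kubernetes' actual code; `metric` and `roundTripV1` are illustrative names):

```go
package main

import "fmt"

// metric stands in for an HPA v2 metric spec; only the name matters here.
type metric struct{ Name string }

// roundTripV1 simulates converting a v2 metrics list to autoscaling/v1
// and back: the first CPU utilization metric becomes the v1
// targetCPUUtilizationPercentage field (extra CPU metrics are dropped),
// the rest are stashed in an annotation, and converting back to v2
// appends the CPU metric at the end of the list.
func roundTripV1(metrics []metric) []metric {
	var cpu *metric
	var rest []metric
	for i, m := range metrics {
		if m.Name == "cpu" {
			if cpu == nil {
				cpu = &metrics[i] // first CPU metric becomes the v1 field
			}
			continue // any additional CPU metrics are lost
		}
		rest = append(rest, m)
	}
	if cpu != nil {
		rest = append(rest, *cpu) // CPU metric comes back at the end
	}
	return rest
}

func main() {
	want := []metric{{"cpu"}, {"memory"}} // order KEDA derives from the ScaledObject
	got := roundTripV1(want)
	fmt.Println(got) // [{memory} {cpu}] -- order differs from want
	// Every reconcile: KEDA writes [cpu memory], the API server stores
	// [memory cpu], KEDA diffs the two and updates again -> endless loop.
}
```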

In Kubernetes 1.27 and later, this issue is resolved because the autoscaling v1 schema is deprioritized behind v2, meaning it no longer defaults to converting to HPA v1. The relevant change is shown below:

diff --git a/pkg/apis/autoscaling/install/install.go b/pkg/apis/autoscaling/install/install.go
index 3740aee3155..424fc5ce85d 100644
--- a/pkg/apis/autoscaling/install/install.go
+++ b/pkg/apis/autoscaling/install/install.go
@@ -40,6 +40,5 @@ func Install(scheme *runtime.Scheme) {
        utilruntime.Must(v2.AddToScheme(scheme))
        utilruntime.Must(v2beta1.AddToScheme(scheme))
        utilruntime.Must(v1.AddToScheme(scheme))
-       // TODO: move v2 to the front of the list in 1.24
-       utilruntime.Must(scheme.SetVersionPriority(v1.SchemeGroupVersion, v2.SchemeGroupVersion, v2beta1.SchemeGroupVersion, v2beta2.SchemeGroupVersion))
+       utilruntime.Must(scheme.SetVersionPriority(v2.SchemeGroupVersion, v1.SchemeGroupVersion, v2beta1.SchemeGroupVersion, v2beta2.SchemeGroupVersion))
@JorTurFer
Member

Hello,
Thanks for reporting this. Just to understand the issue: this affects k8s 1.26 or below, and 1.27 has already fixed it, right? We currently only support >= 1.27 officially. Personally I have no problem with fixing this if it's easy, but I'd like to know @tomkerkhove's and @zroubalik's thoughts


stale bot commented Jul 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label on Jul 26, 2024

stale bot commented Aug 2, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Aug 2, 2024