Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScaledJob: introduce MultipleScalersCalculation #2016

Merged
merged 23 commits into from
Sep 13, 2021

Conversation

TsuyoshiUshio
Copy link
Contributor

@TsuyoshiUshio TsuyoshiUshio commented Aug 6, 2021

This PR fix issue of multi scalers on Scaled Job. And introduce a new option.
The current login assume only one scaler is active and available.
The most of the cases, it is ok for that. That is why we don't get any issues about it.

However @zroubalik pointed out this issue, I come up with the fix and do some refactoring to make it clear.

I'm not writing documentation yet, before that, I'd like to confirm this logic if fine for you guys.
Once It looks, ok, I'll go ahead full testing and write the documentation and make this PR as not draft.

Spec

Introduce a new option MultipleScalersCalculation on ScaledJob.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: {scaled-job-name}
spec:
  jobTargetRef:
         :
  scalingStrategy:
    strategy: "custom"                        # Optional. Default: default. Which Scaling Strategy to use. 
    customScalingQueueLengthDeduction: 1      # Optional. A parameter to optimize custom ScalingStrategy.
    customScalingRunningJobPercentage: "0.5"  # Optional. A parameter to optimize custom ScalingStrategy.
    pendingPodConditions:                     # Optional. A parameter to calculate pending job count per the specified pod conditions
      - "Ready"
      - "PodScheduled"
      - "AnyOtherCustomPodCondition"
    MultipleScalersCalculation: "max" # Optional. Default: max. If multiple scaler is exists, how to calculate the queueLength and maxValue (that is the expected number of pods that will be created)
  triggers:

Available Options

The option is how keda calculate queueLength and maxValue that is the number of pod will be created.

max : default, pick a scaler that has the max number of queueLength.
min: pick a scaler that has the min number of queueLength.
avg: sum up all the active scalers metrics and devided by the number of the active scalers.
sum: sum up all the active scalers metrics

Checklist

  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated

Fixes #

Document #
kedacore/keda-docs#505

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
@TsuyoshiUshio TsuyoshiUshio marked this pull request as ready for review August 9, 2021 03:56
@TsuyoshiUshio
Copy link
Contributor Author

I tested, and I add test case and improve logging for clarifying on the multiple scalers scenario.
One question is, the change of controller-gen.kubebuilder.io/version: v0.5.0 from v0.3.0 Should I keep it to v0.3.0 ? It is auto generated, so that I keep it for now.

config/crd/bases/keda.sh_scaledjobs.yaml Outdated Show resolved Hide resolved
pkg/scaling/scale_handler.go Outdated Show resolved Hide resolved
pkg/scaling/scale_handler.go Outdated Show resolved Hide resolved
pkg/scaling/scale_handler.go Outdated Show resolved Hide resolved
pkg/scaling/scale_handler.go Outdated Show resolved Hide resolved
pkg/scaling/scale_handler.go Outdated Show resolved Hide resolved
TsuyoshiUshio and others added 5 commits September 4, 2021 09:50
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
…lation

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
@TsuyoshiUshio
Copy link
Contributor Author

Hi @zroubalik
I update this PR as you requested. Could you have a look?
I also update the documentation according to the name suggestion.
kedacore/keda-docs#505

CHANGELOG.md Outdated Show resolved Hide resolved
pkg/metrics/scaledjob_prometheus_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
@zroubalik zroubalik changed the title Fix issue of ScaledJob scaling and introduce MultiScalersOption ScaledJob: introduce MultipleScalersCalculation Sep 7, 2021
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
@TsuyoshiUshio
Copy link
Contributor Author

Hi @zroubalik
I updated the PR. Could you have a look?

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I have a few very minor nits. Thanks @TsuyoshiUshio !

pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics_test.go Outdated Show resolved Hide resolved
pkg/scaling/scaledjob/scale_metrics.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@zroubalik zroubalik added this to the v2.5.0 milestone Sep 10, 2021
TsuyoshiUshio and others added 7 commits September 11, 2021 10:43
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
@TsuyoshiUshio
Copy link
Contributor Author

Hi @zroubalik I accept all your suggestion, however, CI / Validate PR looks start to fail. I looked the code change again, however, I just changed ScaledJob related part. The CI fails with the ScaledObjectController. Do you know this issue?

ScaledObjectController
/__w/keda/keda/controllers/keda/scaledobject_controller_test.go:50
  functional tests
  /__w/keda/keda/controllers/keda/scaledobject_controller_test.go:204
    deploys ScaledObject and creates HPA, when IdleReplicaCount, MinReplicaCount and MaxReplicaCount is defined [It]
    /__w/keda/keda/controllers/keda/scaledobject_controller_test.go:269

    Timed out after 20.001s.
    Expected
        <v1.ConditionStatus>: Unknown
    to equal
        <v1.ConditionStatus>: True

@zroubalik
Copy link
Member

Hi @zroubalik I accept all your suggestion, however, CI / Validate PR looks start to fail. I looked the code change again, however, I just changed ScaledJob related part. The CI fails with the ScaledObjectController. Do you know this issue?

ScaledObjectController
/__w/keda/keda/controllers/keda/scaledobject_controller_test.go:50
  functional tests
  /__w/keda/keda/controllers/keda/scaledobject_controller_test.go:204
    deploys ScaledObject and creates HPA, when IdleReplicaCount, MinReplicaCount and MaxReplicaCount is defined [It]
    /__w/keda/keda/controllers/keda/scaledobject_controller_test.go:269

    Timed out after 20.001s.
    Expected
        <v1.ConditionStatus>: Unknown
    to equal
        <v1.ConditionStatus>: True

That is just a flaky test :( I rerun the job and it fine now.

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, great job @TsuyoshiUshio !

@TsuyoshiUshio TsuyoshiUshio merged commit af5981a into main Sep 13, 2021
@TsuyoshiUshio TsuyoshiUshio deleted the tsushi/fixscaledjobmultitrigger branch September 13, 2021 17:15
nilayasiktoprak pushed a commit to nilayasiktoprak/keda that referenced this pull request Oct 23, 2021
* Fix bug of ScaledJob multiple triggers

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* fix pre-commit issue

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update logs on scale handler and add crds

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* update change log

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Add test case for exceeding MaxReplicaCount in sum

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* fix pre-commit

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scale_handler.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scale_handler.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* revert version and refactor scaling logic

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Change option name from MultipleScalersOption to MultipleScalersCalculation

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* update changelog for multiple triggers section

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Error Handling

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Refactor logging

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* remove scaledjob prometheus

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scaledjob/scale_metrics.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scaledjob/scale_metrics.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scaledjob/scale_metrics.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scaledjob/scale_metrics_test.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/scaling/scaledjob/scale_metrics.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update CHANGELOG.md

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Move section to new section

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* fix pre-commit white space issue

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: nilayasiktoprak <nilayasiktoprak@gmail.com>
@taylorchu
Copy link

is there a reason why we use queuelength instead of maxscale in max/min/avg calculation?

For example consider 2 scalers:

  1. queuelength=10, target=4 will produce maxscale=3
  2. queuelength=5, target=1 will produce maxscale=5

Right now max in this implementation, will use maxscale=3. To me it makes more sense to use maxscale=5. Similarly min seems more intuitive to pick the one that will produce maxscale=3.

@taylorchu
Copy link

@TsuyoshiUshio @zroubalik thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants