fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value #3809


toniopelo (Contributor) commented Nov 2, 2022

Until now, the KEDA NATS JetStream scaler only used the num_pending value that the NATS monitoring endpoint returns for a specific consumer. The problem is that messages that should be redelivered (retried) are not included in the num_pending counter. Instead, they are tracked in a separate counter, num_ack_pending, which counts messages that have been delivered but not yet acknowledged.
This PR uses the sum of these two counters as the value KEDA scales on, instead of num_pending alone.

This fixes two problems:

  • KEDA would never scale up a deployment/job based on a consumer if that consumer only had messages waiting to be retried.
  • KEDA would scale down deployments/jobs too early: when a consumer pulls a message from NATS, the num_pending counter is decremented immediately and num_ack_pending is incremented. KEDA would then conclude that there is no work left and scale down deployments/jobs that are still processing messages (after the cooldown period, if one is configured).
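To see why both problems occur, here is a hypothetical model (not the NATS or KEDA implementation) of how a single message moves between the two counters, comparing the old value (num_pending only) with the new one (the sum). It assumes, as described above, that a pull moves a message from num_pending to num_ack_pending and that an ack removes it from num_ack_pending. The counters type and the pull, ack, oldValue, newValue and report names are made up for illustration.

```go
package main

import "fmt"

// counters is a hypothetical model of a JetStream consumer's two counters.
type counters struct {
	numPending    int64 // messages not yet delivered to the consumer
	numAckPending int64 // delivered, but waiting for an ack (or for redelivery)
}

// pull models a worker fetching one message from the consumer.
func (c *counters) pull() {
	c.numPending--
	c.numAckPending++
}

// ack models the worker acknowledging a message it has finished processing.
func (c *counters) ack() {
	c.numAckPending--
}

// oldValue is the metric before the fix: num_pending only.
func oldValue(c counters) int64 { return c.numPending }

// newValue is the metric after the fix: num_pending + num_ack_pending.
func newValue(c counters) int64 { return c.numPending + c.numAckPending }

func report(stage string, c counters) {
	fmt.Printf("%s: old=%d new=%d\n", stage, oldValue(c), newValue(c))
}

func main() {
	c := counters{numPending: 1}
	report("queued", c) // queued: old=1 new=1

	// The message is being processed, or processing failed and it is
	// waiting to be redelivered: the old metric already reports no work.
	c.pull()
	report("in flight", c) // in flight: old=0 new=1

	// Only a successful ack brings the new metric back to zero.
	c.ack()
	report("acknowledged", c) // acknowledged: old=0 new=0
}
```

Under this model, the old metric drops to 0 as soon as the message is pulled, which covers both problems listed above; the summed metric stays at 1 until the message is acknowledged.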

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update our Helm chart (repo) (if applicable, i.e. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Fixes #3787

Relates to #

@toniopelo toniopelo requested a review from a team as a code owner November 2, 2022 21:07
@toniopelo toniopelo force-pushed the fix/nats-jetstream-message-redelivery-are-ignored branch from 0cf7bbc to 1d259f3 Compare November 2, 2022 21:09
@toniopelo toniopelo changed the title fix: count messages that should be retried as pending messages (value used for scaling) fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value Nov 2, 2022
toniopelo (Contributor Author)

@JorTurFer That was a pretty straightforward fix after all :).
I didn't add tests because this line doesn't seem to have been covered by tests before, and I don't have the Go knowledge to set them up. I hope that's not a blocker; otherwise, we would need to find somebody to add tests to this PR, or at least give me some guidance on how to do it :). Since it's a single-line change, it should leave code coverage unchanged.

JorTurFer (Member) commented Nov 4, 2022

/run-e2e nats*
Update: You can check the progress here

toniopelo (Contributor Author) commented Nov 5, 2022

@JorTurFer The e2e tests passed. I checked the last item in the list to make the PR checks happy and updated the branch.
Do you think this can be merged and released?

zroubalik (Member) commented Nov 7, 2022

@toniopelo could you please rebase this PR? I think we can merge it then. Thanks!

…t of pending messages used for scaling

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
@toniopelo toniopelo force-pushed the fix/nats-jetstream-message-redelivery-are-ignored branch from 8735888 to 9ed545e Compare November 7, 2022 18:33
toniopelo (Contributor Author)

@zroubalik Nice! Just rebased it :)

JorTurFer (Member) commented Nov 9, 2022

/run-e2e nats*
Update: You can check the progress here

@JorTurFer JorTurFer enabled auto-merge (squash) November 9, 2022 07:26
@JorTurFer JorTurFer merged commit 971ab94 into kedacore:main Nov 9, 2022
toniopelo (Contributor Author)

When can I expect this to be released, @JorTurFer?
The scaler is unusable in production right now :/

JorTurFer (Member)

Hi @toniopelo,
You can see the expected release dates in the roadmap.md file. Because of KubeCon, we delayed the next release to December. If you need this fix immediately, the only suggestion I can give is to use the main tag directly. The main tag is built on every commit, so it includes these changes, but it may not be stable. If you use it, I'd suggest pulling the main image and pushing it to another registry, to freeze the version and reduce the chance of running into errors.

toniopelo (Contributor Author)

Hi @JorTurFer, thanks for the information, that's crystal clear!
I'll decide what to do. Thanks for everything :)

JorTurFer (Member)

You're welcome, happy to help

@JorTurFer JorTurFer mentioned this pull request Jan 17, 2023
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 18, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

* chore: update changelog

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
@pedro-stanaka pedro-stanaka mentioned this pull request Jan 18, 2023
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 18, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

* chore: update changelog

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 19, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

* chore: update changelog

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
JorTurFer added a commit that referenced this pull request Jan 19, 2023
* fix: CVE-2022-3172 (#3693)

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fix: Respect optional parameter inside envs for ScaledJobs (#3694)

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fix(prometheus scaler): Detect Inf before casting float to int (#3762)

* fix(prometheus scaler): Detect Inf before casting float to int

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* Improve the log message

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value (#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

* chore: update changelog

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>

Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* NewRelic scaler crashes on logging (#3946)

Signed-off-by: Laszlo Kishalmi <laszlo.kishalmi@partech.com>

Signed-off-by: Laszlo Kishalmi <laszlo.kishalmi@partech.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Fix stackdriver client returning 0 for metric types of double (#3788)

* Update stackdriver client to handle metrics of value type double

Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>

* move change log note to below general

Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>

* parse activation value as float64

Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>

* change target value to float64 for GCP pub/sub and stackdriver

Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>

Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Fixing conflicts after cherry-pick

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fix: Close is called twice on PushScaler's deletion (#3599)

Signed-off-by: ytz <1020560484@qq.com>
Signed-off-by: taenyang <1020560484@qq.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fix/datadog-scaler-null-last-point (#3954)

Signed-off-by: Tony Lee <dogzzdogzz@gmail.com>
Signed-off-by: Tony Lee <tony.lee@shopback.com>
Signed-off-by: Zbynek Roubalik <zroubalik@gmail.com>
Co-authored-by: Tony Lee <tony.lee@shopback.com>
Co-authored-by: Zbynek Roubalik <zroubalik@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fix(mongodb): escape username and password (#3989)

Fixes #3992

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Hacking generated files to version CI expects

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Updating aws-sdk and golang packages to fix CVEs

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Updating golang/text package to fix CVE

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Using same version of aws sdk as in main

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Signed-off-by: Antoine Laffargue <antoine.laffargue@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: Laszlo Kishalmi <laszlo.kishalmi@partech.com>
Signed-off-by: Eric Takemoto <24865872+octothorped@users.noreply.github.com>
Signed-off-by: ytz <1020560484@qq.com>
Signed-off-by: taenyang <1020560484@qq.com>
Signed-off-by: Tony Lee <dogzzdogzz@gmail.com>
Signed-off-by: Tony Lee <tony.lee@shopback.com>
Signed-off-by: Zbynek Roubalik <zroubalik@gmail.com>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Antoine LAFFARGUE <antoine.laffargue@gmail.com>
Co-authored-by: Laszlo Kishalmi <laszlo.kishalmi@gmail.com>
Co-authored-by: Eric Takemoto <eric.takemoto@gocrisp.com>
Co-authored-by: taenyang <1020560484@qq.com>
Co-authored-by: Tony Lee <dogzzdogzz@gmail.com>
Co-authored-by: Tony Lee <tony.lee@shopback.com>
Co-authored-by: Zbynek Roubalik <zroubalik@gmail.com>
Development

Successfully merging this pull request may close these issues.

[NATS Jetstream] Do not take message retries into account when scaling
3 participants