
[NATS Jetstream] Do not take message retries into account when scaling #3787

Closed
toniopelo opened this issue Oct 30, 2022 · 3 comments · Fixed by #3809
Labels
bug Something isn't working

Comments

@toniopelo
Contributor
Messages that have already been delivered at least once and are awaiting redelivery (i.e. they have not yet reached the maximum number of delivery attempts) are not counted by the scaler, so the deployment is not scaled up.

Expected Behavior

Messages in a redelivery state should be taken into account when computing the number of pending messages to be processed by a consumer.

Actual Behavior

These messages are not taken into account.

Steps to Reproduce the Problem

  1. Deploy nats along with keda (repo: https://nats-io.github.io/k8s/helm/charts/, chart: nats)
  2. Use natsbox pod (deployed with the nats chart) to create a stream and a pull consumer via nats cli
    • kubectl exec -it natsbox-pod-name -- sh
    • nats stream add (interactive)
    • nats consumer add (interactive) (choose a subject that is covered by the stream you created, and set up the consumer so it can redeliver a message several times; a low AckWait time will also help)
  3. Deploy a basic ScaledObject with a nats-jetstream trigger depending on the stream and consumer we just created.
  4. Wait for the deployment to scale down to zero.
  5. Publish a message on the stream/consumer subject and process it right away without acknowledging it, so that it will be redelivered: nats pub my-subject-name 'test' && nats consumer next my-stream-name my-consumer-name --no-ack
  6. Once the AckWait time you set on the consumer is reached, the message becomes available for redelivery on the consumer's next pull. (You can verify this with nats consumer next my-stream-name my-consumer-name --no-ack again to see it redelivered, but you'll have to wait for the AckWait time once more.)
  7. In this situation, KEDA should see that the consumer has a pending message, but it doesn't, so the deployment is not scaled up. This causes some messages to never be delivered until another unprocessed message triggers the scale-up.

Logs from KEDA operator

Nothing to show here; the KEDA operator simply doesn't detect anything to scale up.

KEDA Version

2.8.1

Kubernetes Version

1.25

Platform

No response

Scaler Details

NATS Jetstream

Anything else?

I'm pretty sure the problem is located on this line: we should probably use a different prop, or combine several props returned by the NATS metrics endpoint, to get the desired message count. I don't know the full set of props returned by the NATS monitoring endpoint, but I might take a look if I have some time.
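To illustrate the suggestion above, here is a minimal Go sketch of what "combining several props" could look like. The struct fields are assumptions modeled on the JetStream consumer info JSON (num_pending, num_ack_pending), not the actual KEDA scaler types, and the helper name totalPending is hypothetical:

```go
package main

import "fmt"

// ConsumerStats mirrors a subset of the fields returned for a consumer by
// the NATS monitoring endpoint. Field names are an assumption based on the
// JetStream consumer info JSON, not the exact KEDA types.
type ConsumerStats struct {
	NumPending    uint64 // messages never delivered to the consumer yet
	NumAckPending uint64 // messages delivered but not yet acknowledged (redelivery candidates)
}

// totalPending sketches the proposed fix: count both never-delivered
// messages and unacknowledged ones awaiting redelivery, instead of
// relying on NumPending alone.
func totalPending(c ConsumerStats) uint64 {
	return c.NumPending + c.NumAckPending
}

func main() {
	// The repro scenario: one message delivered once but unacknowledged,
	// nothing new pending. NumPending alone would report 0 and KEDA
	// would not scale up; the combined count reports 1.
	c := ConsumerStats{NumPending: 0, NumAckPending: 1}
	fmt.Println(totalPending(c)) // prints 1
}
```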

@toniopelo toniopelo added the bug Something isn't working label Oct 30, 2022
@JorTurFer
Copy link
Member

Hi @toniopelo ,
This seems totally legit. There are other fields in the response that could be used. I'm not an expert on NATS, but it's true that NumPending alone doesn't seem to be enough. Maybe we could add a parameter in the metadata to include or exclude other fields.
Are you willing to contribute it?

@toniopelo
Copy link
Contributor Author

Hi @JorTurFer, thanks for your quick response! I'm not an expert on NATS either, but I stumbled upon this issue, so I guess it's a limitation of the current implementation.
I would gladly contribute. I have basically zero experience in Golang (except debugging something once or twice), but this definitely seems feasible, even for a Golang newbie like me :).
I'll give it a try if I manage to get a proper dev environment working (I haven't tried yet) and I'll keep you updated.

@JorTurFer
Copy link
Member

Nice!
You could use devcontainers to get a ready-to-run dev environment; KEDA supports them, so you don't need to battle with that.
The section about debugging could also be useful if you need to debug it.
Last but not least, if you use devcontainers, remember that you are running inside a container, so if you use minikube or any other local cluster, you won't reach it by default; there is another section in the docs about how to solve that.

Of course, you can ask us and we will help you (on GitHub or in the KEDA Slack channels).
