
[NATS Jetstream] Do not take message retries into account when scaling #3787

Closed
toniopelo opened this issue Oct 30, 2022 · 3 comments · Fixed by #3809
Labels
bug Something isn't working

Comments

@toniopelo
Contributor
Messages that have already been delivered at least once and are awaiting redelivery (i.e. they have not yet reached the maximum number of delivery attempts) are not counted by the scaler, so the deployment is not scaled up.

Expected Behavior

Messages in a redelivery state should be taken into account when computing the number of pending messages to be processed by a consumer.

Actual Behavior

These messages are not taken into account.

Steps to Reproduce the Problem

  1. Deploy nats along with keda (repo: https://nats-io.github.io/k8s/helm/charts/, chart: nats)
  2. Use natsbox pod (deployed with the nats chart) to create a stream and a pull consumer via nats cli
    • kubectl exec -it natsbox-pod-name -- sh
    • nats stream add (interactive)
    • nats consumer add (interactive) (choose a subject that is covered by the stream you created, and set up the consumer so it can redeliver a message several times; a low AckWait time will also help)
  3. Deploy a basic ScaledObject with a nats-jetstream trigger depending on the stream and consumer we just created.
  4. Wait for the deployment to scale down to zero.
  5. Publish a message on the stream/consumer subject and process it right away without acknowledging it, so that it will be redelivered: nats pub my-subject-name 'test' && nats consumer next my-stream-name my-consumer-name --no-ack
  6. Once the AckWait time you set on the consumer is reached, the message becomes available for redelivery on the consumer's next pull. (You can verify this with nats consumer next my-stream-name my-consumer-name --no-ack again to see it redelivered, but you'll have to wait for the AckWait time once more.)
  7. In this situation, KEDA should see that the consumer has a pending message, but it doesn't, so the deployment is not scaled up. This causes some messages to never be delivered until another unprocessed message triggers the scale-up.

Logs from KEDA operator

Nothing to show here; the KEDA operator simply doesn't detect anything to scale up.

KEDA Version

2.8.1

Kubernetes Version

1.25

Platform

No response

Scaler Details

NATS Jetstream

Anything else?

I'm pretty sure the problem is located on this line: we should probably use a different prop, or combine several props returned by the NATS metrics endpoint, to get the desired message count. I don't know the full set of props returned by the NATS monitoring endpoint, but I might take a look if I have some time.
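To illustrate the suggestion above, here is a minimal Go sketch of what "combining several props" could look like. The struct fields are assumptions modeled on the JetStream consumer info JSON (num_pending, num_ack_pending), not the actual KEDA scaler types, and the helper name totalPending is hypothetical:

```go
package main

import "fmt"

// ConsumerStats mirrors a subset of the fields returned for a consumer by
// the NATS monitoring endpoint. Field names are an assumption based on the
// JetStream consumer info JSON, not the exact KEDA types.
type ConsumerStats struct {
	NumPending    uint64 // messages never delivered to the consumer yet
	NumAckPending uint64 // messages delivered but not yet acknowledged (redelivery candidates)
}

// totalPending sketches the proposed fix: count both never-delivered
// messages and unacknowledged ones awaiting redelivery, instead of
// relying on NumPending alone.
func totalPending(c ConsumerStats) uint64 {
	return c.NumPending + c.NumAckPending
}

func main() {
	// The repro scenario: one message delivered once but unacknowledged,
	// nothing new pending. NumPending alone would report 0 and KEDA
	// would not scale up; the combined count reports 1.
	c := ConsumerStats{NumPending: 0, NumAckPending: 1}
	fmt.Println(totalPending(c)) // prints 1
}
```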

@toniopelo toniopelo added the bug Something isn't working label Oct 30, 2022
@JorTurFer
Copy link
Member

Hi @toniopelo ,
This seems totally legit. There are other fields in the response that could be used. I'm not an expert on NATS, but it's true that NumPending alone doesn't seem to be enough. Maybe we could add a parameter in the metadata to include or exclude other fields.
Are you willing to contribute it?

@toniopelo
Copy link
Contributor Author

Hi @JorTurFer, thanks for your quick response! I'm not an expert on NATS either, but I stumbled upon this issue, so I guess it's a limitation of the current implementation.
I would gladly contribute. I have basically zero experience in Golang (except debugging something once or twice), but this definitely seems feasible, even for a Golang newbie like me :).
I'll give it a try if I manage to get a proper dev environment working (I haven't tried yet) and I'll keep you updated.

@JorTurFer
Copy link
Member

Nice!
You could use devcontainers to get a ready-to-run dev environment; KEDA supports them, so you don't need to battle with that.
The section about debugging could also be useful if you need to debug it.
Last but not least, if you use devcontainers, remember that you are running inside a container, so if you use minikube or any other local cluster, you won't reach it by default; there is another section in the docs about how to solve that.

Of course, you can ask us and we will help you (on GitHub or in the KEDA Slack channels).
