
Proxy letting through too many requests before additional replicas ready #1038

Open
Tracked by #911
noyoshi opened this issue May 21, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@noyoshi

noyoshi commented May 21, 2024

Report

  • An HTTPScaledObject scaling 0-4 replicas with targetPendingRequests = 40
  • Send 10 requests: the workload scales up to 1 replica and everything works
  • Send 150 requests: all 150 requests go to the single ready replica while the additional replicas are still scaling up

Expected Behavior

The KEDA proxy should know how many replicas of a given Deployment are running and only allow N*X requests through, where N = the number of running replicas of the Deployment and X = the targetPendingRequests value.
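A minimal sketch of that rule, just to make the arithmetic concrete (the function and parameter names below are illustrative, not part of the add-on's actual code):

```go
package main

import "fmt"

// allowedInFlight is the number of requests the proxy would let through under
// the proposed rule: N running replicas * targetPendingRequests each.
func allowedInFlight(runningReplicas, targetPendingRequests int) int {
	return runningReplicas * targetPendingRequests
}

func main() {
	fmt.Println(allowedInFlight(1, 40)) // 40 - not all 150 from the report
	fmt.Println(allowedInFlight(4, 40)) // 160 once all four replicas are running
}
```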

Actual Behavior

Our server cannot physically handle all 150 requests with 1 replica, which causes the requests to fail. KEDA sends all the pending requests to the single replica.

Steps to Reproduce the Problem

  • Create an HTTPScaledObject scaling 0-4 replicas with targetPendingRequests = 40
  • Send 10 requests: the workload scales up to 1 replica and everything works
  • Send 150 requests: all 150 requests go to the single ready replica while the additional replicas are still scaling up

Logs from KEDA HTTP operator

example

HTTP Add-on Version

0.7.0

Kubernetes Version

< 1.27

Platform

Other

Anything else?

No response

@noyoshi added the bug label May 21, 2024
@JorTurFer
Member

Hello @noyoshi
This is an interesting scenario, tbh. I guess something like a rate limiter could be useful here, but I'm not totally sure how to implement it. Do you have any suggestions? Are you willing to contribute this feature?
FYI @zroubalik @wozniakjan

@wozniakjan
Member

What would be the desired response from the interceptor in this case? Should it start responding with 429 when there are too many requests?

Overall stability might be slightly harder to achieve because the interceptor would probably need to get involved in some form of load balancing too. There might be a situation where:

  • 150 requests saturate the single replica
  • a new replica is spinning up but is not ready yet
  • the interceptor rejects new requests with 429
  • the new replica becomes ready
  • the initial replica managed to process 0 requests in that window
  • the interceptor sees the new replica and stops responding with 429
  • the interceptor sends requests to the Service, which round-robin load balances among replicas
  • the first replica is still saturated and failing to process requests
  • the second replica gets only 50% of the new load, resulting in a 50% failure rate

I guess we could introduce some form of per-replica request window and, instead of using the Service for routing, route requests to Endpoints directly based on the size of the internal request windows?
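One possible shape for that idea, assuming the interceptor kept an in-flight counter per endpoint and picked the least-loaded one itself rather than relying on the Service's round-robin (the types and names below are illustrative, not the interceptor's real data structures):

```go
package sketch

import (
	"errors"
	"sync"
)

// endpointWindow tracks how many requests are currently in flight to one pod.
type endpointWindow struct {
	addr     string
	inFlight int
}

// router picks the endpoint with the most remaining window capacity; if every
// endpoint is at the per-replica limit, the request stays held (or is rejected,
// depending on the 429 policy discussed above).
type router struct {
	mu        sync.Mutex
	limit     int // per-replica window, e.g. targetPendingRequests
	endpoints []*endpointWindow
}

func (r *router) pick() (*endpointWindow, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	var best *endpointWindow
	for _, ep := range r.endpoints {
		if ep.inFlight >= r.limit {
			continue // this replica's window is full
		}
		if best == nil || ep.inFlight < best.inFlight {
			best = ep
		}
	}
	if best == nil {
		return nil, errors.New("all endpoint windows are full")
	}
	best.inFlight++ // the caller must call release when the response completes
	return best, nil
}

func (r *router) release(ep *endpointWindow) {
	r.mu.Lock()
	ep.inFlight--
	r.mu.Unlock()
}
```

In the failure scenario above, the saturated first replica would keep a full window, so new requests would flow to the fresh replica instead of being split 50/50.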

@noyoshi
Author

noyoshi commented Jun 6, 2024

@wozniakjan hey! Sorry for the late reply.

The interceptor shouldn't respond with 429 - it should hold onto the requests like normal. I think it should just "release" the requests, allowing N*M requests to go to the service, where N = the autoscaling request count and M = the number of running pods in the deployment.
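A minimal sketch of that hold-and-release behaviour, assuming the interceptor already watches the number of running pods (the gate type and its methods are illustrative, not the interceptor's actual code):

```go
package sketch

import "sync"

// gate holds requests instead of rejecting them: acquire blocks until the
// current budget (runningPods * targetPendingRequests) has a free slot.
type gate struct {
	mu       sync.Mutex
	cond     *sync.Cond
	inFlight int
	budget   int
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// setBudget would be called whenever the watch on the workload sees the
// number of running pods change; waiters are woken so freed slots get used.
func (g *gate) setBudget(runningPods, targetPendingRequests int) {
	g.mu.Lock()
	g.budget = runningPods * targetPendingRequests
	g.mu.Unlock()
	g.cond.Broadcast()
}

// acquire holds the request (no 429) until a slot is available; with zero
// running pods the budget is 0, so everything waits for the first replica.
func (g *gate) acquire() {
	g.mu.Lock()
	for g.inFlight >= g.budget {
		g.cond.Wait()
	}
	g.inFlight++
	g.mu.Unlock()
}

// release frees a slot once the proxied response has completed.
func (g *gate) release() {
	g.mu.Lock()
	g.inFlight--
	g.mu.Unlock()
	g.cond.Signal()
}
```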

@zroubalik zroubalik mentioned this issue Jun 7, 2024
@noyoshi
Author

noyoshi commented Jul 18, 2024

This is all under the assumption that the scale-up threshold is properly configured and represents the maximum number of concurrent requests a replica can handle at once. Otherwise, this would technically leave some throughput on the table in cases where the single replica could handle more than N requests.

So this should probably be something that is left to the user to configure, IMO.

Projects
Status: To Triage
Development

No branches or pull requests

3 participants