
Proxy letting through too many requests before additional replicas ready #1038

Open
Tracked by #911
noyoshi opened this issue May 21, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@noyoshi

noyoshi commented May 21, 2024

Report

  • An HTTPScaledObject scaling 0-4 replicas with targetPendingRequests = 40
  • Send 10 requests: the workload scales up to 1 replica and everything works
  • Send 150 requests: all 150 requests go to the single ready replica while the additional replicas are still scaling up

Expected Behavior

The KEDA proxy should know how many replicas of a given Deployment are running and only allow N*X requests through, where N = the number of running replicas of the Deployment and X = the targetPendingRequests value.
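A minimal sketch of that rule, just to make the arithmetic concrete (the function and parameter names below are illustrative, not part of the add-on's actual code):

```go
package main

import "fmt"

// allowedInFlight is the number of requests the proxy would let through under
// the proposed rule: N running replicas * targetPendingRequests each.
func allowedInFlight(runningReplicas, targetPendingRequests int) int {
	return runningReplicas * targetPendingRequests
}

func main() {
	fmt.Println(allowedInFlight(1, 40)) // 40 - not all 150 from the report
	fmt.Println(allowedInFlight(4, 40)) // 160 once all four replicas are running
}
```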

Actual Behavior

Our server cannot physically handle all 150 requests with 1 replica, which causes the requests to fail. KEDA sends all the pending requests to the single replica.

Steps to Reproduce the Problem

  • Create an HTTPScaledObject scaling 0-4 replicas with targetPendingRequests = 40
  • Send 10 requests: the workload scales up to 1 replica and everything works
  • Send 150 requests: all 150 requests go to the single ready replica while the additional replicas are still scaling up

Logs from KEDA HTTP operator

example

HTTP Add-on Version

0.7.0

Kubernetes Version

< 1.27

Platform

Other

Anything else?

No response

@noyoshi added the bug label May 21, 2024
@JorTurFer
Member

Hello @noyoshi
This is an interesting scenario, tbh. I guess something like a rate limiter could be useful here, but I'm not totally sure how to implement it. Do you have any suggestions? Are you willing to contribute this feature?
FYI @zroubalik @wozniakjan

@wozniakjan
Member

What would be the desired response from the interceptor in this case? Should it start responding with 429 when there are too many requests?

Overall stability might be slightly harder to achieve because the interceptor would probably need to get involved in some form of load balancing too. There might be a situation where:

  • 150 requests saturate the single replica
  • a new replica is spinning up but is not ready yet
  • the interceptor rejects new requests with 429
  • the new replica becomes ready
  • the initial replica managed to process 0 requests in that window
  • the interceptor sees the new replica and stops responding with 429
  • the interceptor sends requests to the Service, which round-robin load balances among replicas
  • the first replica is still saturated and failing to process requests
  • the second replica gets only 50% of the new load, resulting in a 50% failure rate

I guess we could introduce some form of per-replica request window and, instead of using the Service for routing, route requests to Endpoints directly based on the size of the internal request windows?
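One possible shape for that idea, assuming the interceptor kept an in-flight counter per endpoint and picked the least-loaded one itself rather than relying on the Service's round-robin (the types and names below are illustrative, not the interceptor's real data structures):

```go
package sketch

import (
	"errors"
	"sync"
)

// endpointWindow tracks how many requests are currently in flight to one pod.
type endpointWindow struct {
	addr     string
	inFlight int
}

// router picks the endpoint with the most remaining window capacity; if every
// endpoint is at the per-replica limit, the request stays held (or is rejected,
// depending on the 429 policy discussed above).
type router struct {
	mu        sync.Mutex
	limit     int // per-replica window, e.g. targetPendingRequests
	endpoints []*endpointWindow
}

func (r *router) pick() (*endpointWindow, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	var best *endpointWindow
	for _, ep := range r.endpoints {
		if ep.inFlight >= r.limit {
			continue // this replica's window is full
		}
		if best == nil || ep.inFlight < best.inFlight {
			best = ep
		}
	}
	if best == nil {
		return nil, errors.New("all endpoint windows are full")
	}
	best.inFlight++ // the caller must call release when the response completes
	return best, nil
}

func (r *router) release(ep *endpointWindow) {
	r.mu.Lock()
	ep.inFlight--
	r.mu.Unlock()
}
```

In the failure scenario above, the saturated first replica would keep a full window, so new requests would flow to the fresh replica instead of being split 50/50.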

@noyoshi
Author

noyoshi commented Jun 6, 2024

@wozniakjan hey! Sorry for the late reply.

The interceptor shouldn't respond with 429 - it should hold onto the requests like normal. I think it should just "release" the requests, allowing N*M requests to go to the service, where N = the autoscaling request count and M = the number of running pods in the deployment.
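A minimal sketch of that hold-and-release behaviour, assuming the interceptor already watches the number of running pods (the gate type and its methods are illustrative, not the interceptor's actual code):

```go
package sketch

import "sync"

// gate holds requests instead of rejecting them: acquire blocks until the
// current budget (runningPods * targetPendingRequests) has a free slot.
type gate struct {
	mu       sync.Mutex
	cond     *sync.Cond
	inFlight int
	budget   int
}

func newGate() *gate {
	g := &gate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// setBudget would be called whenever the watch on the workload sees the
// number of running pods change; waiters are woken so freed slots get used.
func (g *gate) setBudget(runningPods, targetPendingRequests int) {
	g.mu.Lock()
	g.budget = runningPods * targetPendingRequests
	g.mu.Unlock()
	g.cond.Broadcast()
}

// acquire holds the request (no 429) until a slot is available; with zero
// running pods the budget is 0, so everything waits for the first replica.
func (g *gate) acquire() {
	g.mu.Lock()
	for g.inFlight >= g.budget {
		g.cond.Wait()
	}
	g.inFlight++
	g.mu.Unlock()
}

// release frees a slot once the proxied response has completed.
func (g *gate) release() {
	g.mu.Lock()
	g.inFlight--
	g.mu.Unlock()
	g.cond.Signal()
}
```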

@zroubalik zroubalik mentioned this issue Jun 7, 2024
@noyoshi
Author

noyoshi commented Jul 18, 2024

This is all under the assumption that the scale-up threshold is properly configured and represents the maximum number of concurrent requests a replica can handle at once. Otherwise, this would technically leave some throughput on the table in cases where the single replica could handle more than N requests.

So this should probably be something that is left to the user to configure, IMO.

Projects
Status: To Triage
Development

No branches or pull requests

3 participants