Don't send 429 when max concurrency is reached #19
Conversation
}

// we can specialize another pod
// increment the number of requests waiting for a pod to finish specializing and return not found so that the executor specializes one
<3 this comment for clarifying this causes specialization to happen!
otelUtils.LoggerWithTraceID(req.ctx, c.logger).
    Info("max concurrency reached, sending request to random pod",
        zap.String("function", req.function),
        zap.String("address", svc.val.Address),
        zap.Int("active_requests", svc.activeRequests),
        zap.Int("svc_waiting", funcSvcGroup.svcWaiting),
        zap.Int("queue_len", funcSvcGroup.queue.Len()),
    )
This log feels like it could be fairly noisy for our big apps. Is it intended to be temporary? Should we make it a debug log?
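For reference, moving the snippet above to debug level only changes the level on the logger call. A minimal, self-contained sketch using plain zap (the values passed to the fields are made up; in the real code the call would stay on the otelUtils.LoggerWithTraceID logger, which is assumed to be a *zap.Logger):

```go
package main

import "go.uber.org/zap"

func main() {
	logger, _ := zap.NewDevelopment() // development config enables debug output
	defer logger.Sync()

	// Same shape as the Info call above, but at debug level, so it is only
	// emitted when the logger's level includes debug.
	logger.Debug("max concurrency reached, sending request to random pod",
		zap.String("function", "hello"),
		zap.Int("active_requests", 12),
	)
}
```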
// check if we have any pods that are already specialized
if numPodsSpecialized > 0 {
    // we have specialized pod(s), just pick one at random and use it for this request
    svc := randomSvc(funcSvcGroup.svcs)
Curious what you all think about this scenario:
- One specialized pod
- A massive burst of requests, going well beyond the max requests
- All requests that spill over the max requests go to that one specialized pod because the others are specializing
Could we do more harm than good by having one pod take those requests? Should we consider an upper bound where we start putting these requests on the queue for a specializing pod to become available?
That's a good point, how about this instead:
if numPodsSpecializing > 0 {
    // we still have pods that are specializing
    // increment the number of requests waiting for a pod to finish specializing and push this request onto the queue
} else {
    // we have the maximum amount of specialized pods, just pick one at random and use it for this request
}
☝️ This queues the requests for the specializing pods instead of the specialized one(s). We already have logic below to randomly distribute the queue among all the pods once we've specialized the last one.
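To make that suggestion concrete, here is a fleshed-out sketch. The types below (request, queueStub, funcSvcGroupStub) and the Push method are hypothetical stand-ins for Fission's poolmanager types, not the real implementation:

```go
package main

import (
	"fmt"
	"math/rand"
)

type request struct{ function string }

type svc struct{ address string }

// queueStub stands in for funcSvcGroup.queue; Push is assumed for illustration.
type queueStub struct{ items []*request }

func (q *queueStub) Push(r *request) { q.items = append(q.items, r) }
func (q *queueStub) Len() int        { return len(q.items) }

// funcSvcGroupStub stands in for Fission's funcSvcGroup.
type funcSvcGroupStub struct {
	svcs       []*svc // already-specialized pods
	svcWaiting int    // requests waiting for a pod to finish specializing
	queue      *queueStub
}

// pickOrQueue mirrors the branching suggested above: while any pod is still
// specializing, queue the request for it; only once every pod is specialized
// do we fan requests out to a random specialized pod.
func pickOrQueue(g *funcSvcGroupStub, numPodsSpecializing int, req *request) *svc {
	if numPodsSpecializing > 0 {
		// we still have pods that are specializing: count this request as
		// waiting and push it onto the queue
		g.svcWaiting++
		g.queue.Push(req)
		return nil
	}
	// we have the maximum amount of specialized pods: just pick one at
	// random and use it for this request
	return g.svcs[rand.Intn(len(g.svcs))]
}

func main() {
	g := &funcSvcGroupStub{
		svcs:  []*svc{{address: "10.1.0.4:8888"}, {address: "10.1.0.7:8888"}},
		queue: &queueStub{},
	}
	fmt.Println(pickOrQueue(g, 1, &request{function: "hello"})) // nil: queued while a pod specializes
	fmt.Println(pickOrQueue(g, 0, &request{function: "hello"})) // a random specialized pod
}
```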
Yeah, I think in the absolute worst-case scenario outlined above, I'd rather have those requests, or at least some of them, distributed across the specializing pods so we don't overwhelm the one specialized pod. Some questions that come to my mind:
- Do we have any sense of what would be overwhelming? I know node could definitely take on a ton of requests, but eventually it would topple over.
- Latency-wise, is there anything we can measure? Would it be better to wait for specialization, or to have the specialized pod slowly work through the requests?
- If specialization fails, we keep things in the queue. Does that mean, for now, it would be best to send requests to the specialized pod instead, and follow up with a change where we pop things off the queue when an existing specialized pod can serve more requests, i.e. when markAvailable?
I'm not sure what's best, but I'll let you make that call!
Logic looks good! Really appreciate all the additional comments, which really help with understanding <3
Once @angelini is also on board, 🚢 away 🚀
This LGTM, I'll merge this into our 1.19 branch and rebase that on the final 1.19 version that the Fission team shipped.
Now that max concurrency is being enforced via:
We're beginning to see the following logs in production:
The executor returns that error ☝️ here when the router asks the executor for a function's endpoint and said function has:
- n specialized pods + n specializing pods >= concurrency
- requestsPerPod requests on each specialized pod
- requestsPerPod requests queued for each specializing pod

The router communicates with the executor here via HTTP, and that error ☝️ turns into a 429 Too Many Requests HTTP response.
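To make that condition concrete, here is a minimal sketch in Go. It is illustrative only: the function and parameter names (concurrencyLimitReached, activePerPod, queuedPerPod) are hypothetical, not Fission's actual executor code.

```go
package main

import "fmt"

// concurrencyLimitReached reports whether a function is "full" in the sense
// described above: specialized + specializing pods have reached concurrency,
// every specialized pod is serving requestsPerPod requests, and every
// specializing pod already has requestsPerPod requests queued for it.
func concurrencyLimitReached(numSpecialized, numSpecializing, activePerPod, queuedPerPod, requestsPerPod, concurrency int) bool {
	atPodLimit := numSpecialized+numSpecializing >= concurrency
	specializedFull := activePerPod >= requestsPerPod
	specializingFull := queuedPerPod >= requestsPerPod
	return atPodLimit && specializedFull && specializingFull
}

func main() {
	// 3 specialized + 2 specializing pods with concurrency 5, and every pod
	// already at requestsPerPod -> the executor errors, which the router
	// used to see as a 429.
	fmt.Println(concurrencyLimitReached(3, 2, 80, 80, 80, 5)) // true
}
```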
Since the executor returns 429 Too Many Requests, and Fission uses hashicorp's retryable http client, which automatically retries 429 errors, the request eventually succeeds and the router gets a pod's endpoint.
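For context, a minimal sketch of using hashicorp's go-retryablehttp client; the endpoint URL and retry count below are made up, and the fact that 429s get retried is taken from the description above rather than demonstrated by this snippet:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/go-retryablehttp"
)

func main() {
	// The retryable client retries failed requests with backoff; per the PR
	// description, Fission relies on 429 responses being retried this way.
	client := retryablehttp.NewClient()
	client.RetryMax = 5 // give the executor a few chances to free up or specialize a pod

	// Hypothetical executor URL, for illustration only.
	resp, err := client.Get("http://executor.fission/v2/getServiceForFunction")
	if err != nil {
		log.Fatal(err) // all retries exhausted
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```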
To prove that the request for a function endpoint eventually succeeds, here's how often we're seeing concurrency limit reached:

And here's how often we're seeing error posting to getting service for function:

As you can see, we aren't seeing any error posting to getting service for function logs, but rate limiting our sandbox pods in Fission rather than our API feels unnecessary.

Fission uses a function's requestsPerPod and concurrency to both scale (specialize) and rate limit (return 429). This PR removes the rate limit aspect and returns an endpoint at random instead of a 429.
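A rough sketch of that new behaviour, using stand-in types (the real change lives in the executor and uses the randomSvc / funcSvcGroup.svcs shown in the review snippets above):

```go
package main

import (
	"fmt"
	"math/rand"
)

// funcSvc is a stand-in for a specialized pod's service entry.
type funcSvc struct{ Address string }

// randomSvc mirrors the helper referenced in the review snippets: once max
// concurrency is hit, pick one of the already-specialized pods at random
// instead of returning a 429.
func randomSvc(svcs []*funcSvc) *funcSvc {
	return svcs[rand.Intn(len(svcs))]
}

func main() {
	svcs := []*funcSvc{{Address: "10.1.0.4:8888"}, {Address: "10.1.0.7:8888"}}
	fmt.Println("routing to", randomSvc(svcs).Address)
}
```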
To 🎩, I set our dev sandbox's requestsPerPod to 1 and ran our steady scenario:

Using Fission 1.19.2
Router logs
Using This PR