HTTP idle timeout relay ("resume wait requests") #30

Open
danthegoodman1 opened this issue May 27, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@danthegoodman1
Owner

See https://modal.com/blog/serverless-http#:~:text=Dealing%20with%20HTTP%20idle%20timeouts

From the blog:

Dealing with HTTP idle timeouts

Okay, so if we had been a standard runtime, we would be done with HTTP now. But we’re still not done! There’s one more thing to consider: long-running requests.

If you make an HTTP request and the server doesn’t respond for 300 seconds, then Chrome cancels the request and gives you an error. This is not configurable. Other browsers and pieces of web infrastructure have varying timeouts. Our users often end up running expensive models that take longer than 5 minutes, so we need a way to support long-running requests.

Luckily, there’s a solution. After 150 seconds (2.5 minutes), we send a temporary “303 See Other” redirect to the browser, pointing them to an alternative URL with an ID for this specific request. The browser or HTTP client will follow this redirect, ending their current stream and starting a new one.

Browsers will follow up to 20 redirects for a link, so this effectively increases the idle timeout to 50 minutes. An example of this in action is shown below, with a single redirect.
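The relay described in the quote can be sketched with Go's standard library. This is only an illustration under stated assumptions, not Modal's implementation: `relayHandler`, `effectiveTimeout`, `tryOnce`, and the `/resume/abc123` path are hypothetical names, and the result channel stands in for whatever actually produces the response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// relayHandler waits up to idleLimit for a result. If none arrives in time,
// it answers with "303 See Other" pointing at resumePath, so the client ends
// this stream and starts a new request. All names here are hypothetical.
func relayHandler(result <-chan string, idleLimit time.Duration, resumePath string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		timer := time.NewTimer(idleLimit)
		defer timer.Stop()
		select {
		case body := <-result:
			fmt.Fprint(w, body)
		case <-timer.C:
			http.Redirect(w, r, resumePath, http.StatusSeeOther)
		case <-r.Context().Done():
			// The client disconnected; there is nothing left to send.
		}
	}
}

// effectiveTimeout follows the quoted arithmetic: each followed redirect
// buys one more interval of waiting.
func effectiveTimeout(redirects int, interval time.Duration) time.Duration {
	return time.Duration(redirects) * interval
}

// tryOnce runs the handler against an in-memory request whose result never
// arrives, and reports the status code and Location header it produced.
func tryOnce(idleLimit time.Duration) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/run", nil)
	relayHandler(make(chan string), idleLimit, "/resume/abc123")(rec, req)
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	code, loc := tryOnce(10 * time.Millisecond)
	fmt.Println(code, loc)                             // 303 /resume/abc123
	fmt.Println(effectiveTimeout(20, 150*time.Second)) // 50m0s
}
```

In a real server the idle limit would be the 150 seconds from the quote; the demo uses 10 ms only so it finishes quickly.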

We can handle this with groupcache. When we plan to send a 303 redirect, we register an ID in groupcache. When a request then comes in with that ID, the server that receives it looks the ID up and, if it exists, forwards the request internally to the machine that owns it (just proxy it?). The ID's timeout can be short, and we can generate a new ID just before each subsequent redirect. The request-owning server should track whether an ID has already been used, so we can reject duplicate "resume wait requests".
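A rough sketch of the bookkeeping this implies: a short-lived, single-use mapping from resume ID to owning server. It uses a plain in-memory map as a stand-in and does not use the groupcache API; `ResumeRegistry`, `Register`, and `Claim` are hypothetical names, and a real deployment would need a store that every server can reach.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry records which server owns a pending request and when the ID expires.
type entry struct {
	owner   string
	expires time.Time
}

// ResumeRegistry maps short-lived resume IDs to the owning server.
// It is an in-memory stand-in for a shared store (assumption).
type ResumeRegistry struct {
	mu      sync.Mutex
	entries map[string]entry
}

func NewResumeRegistry() *ResumeRegistry {
	return &ResumeRegistry{entries: make(map[string]entry)}
}

// Register records that owner holds the request behind id for ttl.
func (r *ResumeRegistry) Register(id, owner string, ttl time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[id] = entry{owner: owner, expires: time.Now().Add(ttl)}
}

// Claim returns the owner for id at most once. A repeated claim, an unknown
// id, or an expired id all fail, which rejects duplicate resume requests.
func (r *ResumeRegistry) Claim(id string) (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.entries[id]
	if !ok {
		return "", false
	}
	delete(r.entries, id) // single use: remove the id whether or not it is still valid
	if time.Now().After(e.expires) {
		return "", false
	}
	return e.owner, true
}

func main() {
	reg := NewResumeRegistry()
	reg.Register("req-1", "10.0.0.5:8080", 5*time.Second)
	fmt.Println(reg.Claim("req-1")) // first claim succeeds
	fmt.Println(reg.Claim("req-1")) // duplicate claim is rejected
}
```

Here the registry enforces single use itself; the comment above suggests the request-owning server could do that check instead, which would work equally well with the same `Claim` semantics.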

@danthegoodman1 added the enhancement (New feature or request) label May 27, 2024
@danthegoodman1
Owner Author

The groupcache API is not a good fit for this (and hacking around it would not be ideal to maintain).

Redis might be better anyway, since this is relatively low-volume (unlikely to overload even at scale), not super important (dropping a request isn't the END of the world), and web servers are more likely to restart than a Redis instance.

If dropping is a concern, we could use something like a Redis layer on top of FoundationDB, or AWS MemoryDB.

Another DB could work fine too, but we have to weigh what's easy to add casually against what people are likely already using (e.g. using Cassandra is likely not a good idea, nor is supporting multiple interfaces or a plugin).

@danthegoodman1
Owner Author

Just encode the owning server and the ID of the request to resume right into the redirect URL, so no shared state is needed, and the servers just know about each other via gossip.
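A minimal sketch of that stateless approach, assuming a `/resume/<token>` path where the token is the base64url encoding of the server address and request ID. `ResumeTarget`, `EncodeResumePath`, `DecodeResumePath`, and the `peers` set (a stand-in for gossip membership) are hypothetical names, not part of the project.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

const resumePrefix = "/resume/" // hypothetical route

// ResumeTarget identifies which server owns a pending request.
type ResumeTarget struct {
	Server    string
	RequestID string
}

// EncodeResumePath packs the owner and request ID into the redirect path,
// so no shared store is needed to route the follow-up request.
func EncodeResumePath(t ResumeTarget) string {
	raw := t.Server + "\n" + t.RequestID
	return resumePrefix + base64.RawURLEncoding.EncodeToString([]byte(raw))
}

// DecodeResumePath reverses EncodeResumePath and only accepts servers in
// peers, so a forged token cannot make us proxy to an arbitrary host.
func DecodeResumePath(path string, peers map[string]bool) (ResumeTarget, error) {
	if !strings.HasPrefix(path, resumePrefix) {
		return ResumeTarget{}, errors.New("not a resume path")
	}
	raw, err := base64.RawURLEncoding.DecodeString(strings.TrimPrefix(path, resumePrefix))
	if err != nil {
		return ResumeTarget{}, fmt.Errorf("bad resume token: %w", err)
	}
	parts := strings.SplitN(string(raw), "\n", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return ResumeTarget{}, errors.New("malformed resume token")
	}
	if !peers[parts[0]] {
		return ResumeTarget{}, errors.New("unknown server in resume token")
	}
	return ResumeTarget{Server: parts[0], RequestID: parts[1]}, nil
}

func main() {
	peers := map[string]bool{"10.0.0.5:8080": true} // would come from gossip
	path := EncodeResumePath(ResumeTarget{Server: "10.0.0.5:8080", RequestID: "req-42"})
	target, err := DecodeResumePath(path, peers)
	fmt.Println(path, target, err)
}
```

The token here is readable by anyone who decodes it. Whether to sign or encrypt it, and how the owning server rejects an already-used request ID, are left open in this sketch.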
