HTTP idle timeout relay ("resume wait requests") #30

danthegoodman1 · 2024-05-27T22:12:15Z

See https://modal.com/blog/serverless-http#:~:text=Dealing%20with%20HTTP%20idle%20timeouts

From the blog:

Dealing with HTTP idle timeouts

Okay, so if we had been a standard runtime, we would be done with HTTP now. But we’re still not done! There’s one more thing to consider: long-running requests.

If you make an HTTP request and the server doesn’t respond for 300 seconds, then Chrome cancels the request and gives you an error. This is not configurable. Other browsers and pieces of web infrastructure have varying timeouts. Our users often end up running expensive models that take longer than 5 minutes, so we need a way to support long-running requests.

Luckily, there’s a solution. After 150 seconds (2.5 minutes), we send a temporary “303 See Other” redirect to the browser, pointing them to an alternative URL with an ID for this specific request. The browser or HTTP client will follow this redirect, ending their current stream and starting a new one.

Browsers will follow up to 20 redirects for a link, so this effectively increases the idle timeout to 50 minutes. An example of this in action is shown below, with a single redirect.
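The 50-minute figure is approximately the redirect limit multiplied by the per-hop wait:

```latex
20 \text{ redirects} \times 150\,\text{s} = 3000\,\text{s} = 50\,\text{min}
```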

We could handle this with groupcache. When we are about to send a 303 redirect, we register an ID in groupcache. When a request later arrives carrying that ID, we look it up and, if found, forward it internally to the machine that owns the original request (probably by just proxying it). The ID's TTL can be short, and we mint a new ID just before each subsequent redirect. The request-owning server should track whether an ID has already been used, so it can reject duplicate "resume wait requests".

danthegoodman1 · 2024-05-27T22:17:35Z

The groupcache API isn't a good fit for this, and hacking around it would be a maintenance burden.

Redis might be a better fit anyway. This traffic is relatively low-volume (unlikely to overload it even at scale) and not critical (dropping a resume isn't the END of the world). Also, web servers are more likely to restart than a Redis instance.

If dropped resumes are a concern, we could use something like a Redis layer on top of FoundationDB, or AWS MemoryDB.

Another database could work too. However, we have to weigh what's easy to add casually against what people are likely already using. For example, Cassandra is probably not a good idea, and neither is supporting multiple interfaces or a plugin system.

danthegoodman1 · 2024-05-27T23:12:13Z

Alternatively, just encode the owning server and the ID of the request to resume directly into the redirect URL, so no shared state is needed. The servers know about each other via gossip.
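The stateless variant can be sketched as a pair of Go helpers that pack the node and request ID into the redirect target and unpack them on the follow-up. `EncodeResumeURL`, `DecodeResumeURL`, the `node`/`req` query parameters, and the `members` set (standing in for the gossip membership view) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// EncodeResumeURL packs the owning node and the request ID into the
// redirect target, so whichever server receives the follow-up knows
// where to proxy it without consulting shared state.
func EncodeResumeURL(nodeID, requestID string) string {
	q := url.Values{}
	q.Set("node", nodeID)
	q.Set("req", requestID)
	return "/resume?" + q.Encode()
}

// DecodeResumeURL parses a resume URL and checks the node against the
// current gossip membership, so a client cannot steer the proxy toward
// an arbitrary host.
func DecodeResumeURL(raw string, members map[string]bool) (nodeID, requestID string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	q := u.Query()
	nodeID, requestID = q.Get("node"), q.Get("req")
	if nodeID == "" || requestID == "" {
		return "", "", errors.New("missing node or req parameter")
	}
	if !members[nodeID] {
		return "", "", fmt.Errorf("node %q is not a known member", nodeID)
	}
	return nodeID, requestID, nil
}

func main() {
	members := map[string]bool{"node-a": true, "node-b": true}
	loc := EncodeResumeURL("node-b", "req-42")
	fmt.Println(loc) // /resume?node=node-b&req=req-42
	node, req, err := DecodeResumeURL(loc, members)
	fmt.Println(node, req, err) // node-b req-42 <nil>
}
```

The owning node would still need to reject a second resume of the same request. Because the URL is visible to the client, a deployment might also sign it (for example with an HMAC) so request IDs cannot be forged. That is an addition beyond what this thread proposes.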

danthegoodman1 added the enhancement label (New feature or request) on May 27, 2024