HTTP2 error under load. #1956

Closed
vishvajeet-patil opened this issue Oct 14, 2022 · 3 comments · Fixed by #2006
@vishvajeet-patil

Describe the bug
Under load, the router logs the error {fetch_error:hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })}target:apollo_router::services::subgraph_service}. This implies that the subgraph closed the connection. The router does not retry the request, so the request fails.

To Reproduce
Steps to reproduce the behavior:

  1. Set up any subgraph service with http2 support.
  2. Put the service under load.
  3. The error above should eventually be logged.

Expected behavior
The error should not happen in the first place, and if the http2 connection is closed, the router should retry the request automatically.

Output
Error log -
{fetch_error:hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })}target:apollo_router::services::subgraph_service}

Desktop (please complete the following information):

  • OS: Docker container (Base image Debian:bullseye-slim)

Additional context
The GO_AWAY frame error occurs when the subgraph closes the http2 connection gracefully (why would that happen?). Even when it does, the router should retry the requests that were active on that connection.

@vishvajeet-patil vishvajeet-patil changed the title HTTP2 error in rust gateway HTTP2 error in rust router Oct 14, 2022
@vishvajeet-patil vishvajeet-patil changed the title HTTP2 error in rust router HTTP2 error in rust router under load. Oct 15, 2022
@abernix abernix changed the title HTTP2 error in rust router under load. HTTP2 error under load. Oct 17, 2022
@Geal
Contributor

Geal commented Oct 19, 2022

Could you describe a bit more what the expected behavior would be here? Request retries are a tricky feature, especially under load.
Here, what's probably happening is that the subgraph is sending the GO_AWAY frame because it is overloaded, so retrying requests against that same subgraph while new requests keep piling up would make things worse.

Until now we have avoided implementing some infrastructure-related features in the router, like load balancing, circuit breaking or retries, because they would be better addressed at the service mesh layer. They could still be done in the router, but they require some thought to be done properly.
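One common way to keep retries from amplifying load is a retry budget, where retries are capped at a fraction of the regular request volume. The following is only a minimal sketch of that idea, not something the router implements; `RetryBudget`, `record_request`, `try_retry` and the percentage cap are all hypothetical names chosen for illustration.

```rust
/// Hypothetical retry budget: retries may never exceed `percent` percent
/// of the regular requests recorded so far.
struct RetryBudget {
    requests: u64,
    retries: u64,
    percent: u64,
}

impl RetryBudget {
    fn new(percent: u64) -> Self {
        RetryBudget { requests: 0, retries: 0, percent }
    }

    /// Call once for every regular (non-retry) request sent to the subgraph.
    fn record_request(&mut self) {
        self.requests += 1;
    }

    /// Returns true and consumes one unit of budget if a retry is allowed.
    fn try_retry(&mut self) -> bool {
        // Integer arithmetic avoids floating-point rounding surprises.
        let allowed = self.requests * self.percent / 100;
        if self.retries < allowed {
            self.retries += 1;
            true
        } else {
            false
        }
    }
}
```

With a 10 percent budget and 100 recorded requests, at most 10 retries are granted. Further retries are refused until more regular traffic has been recorded, so an overloaded subgraph sees a bounded amount of extra work.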

@vishvajeet-patil
Author

@Geal But this GO_AWAY NO_ERROR implies a graceful shutdown, doesn't it? Would the subgraph send this error if it were under load?
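For context on what a graceful shutdown allows: per the HTTP/2 specification (RFC 9113, section 6.8), a GOAWAY frame carries a last stream identifier. Streams with a higher identifier were not acted on by the sender and can be retried on a new connection, while streams at or below it may already have been processed. A minimal sketch of that decision follows; `RetryDecision` and `classify` are hypothetical names, not part of hyper or the router.

```rust
/// Decision for a request that was in flight when GOAWAY arrived.
#[derive(Debug, PartialEq, Eq)]
enum RetryDecision {
    /// The peer never acted on this stream, so replaying it on a new
    /// connection cannot cause a duplicate side effect.
    SafeToRetry,
    /// The peer may have processed the stream; retry only if the
    /// operation is idempotent.
    RetryOnlyIfIdempotent,
}

/// `stream_id` is the in-flight request's stream; `last_stream_id` is the
/// value carried by the GOAWAY frame.
fn classify(stream_id: u32, last_stream_id: u32) -> RetryDecision {
    if stream_id > last_stream_id {
        RetryDecision::SafeToRetry
    } else {
        RetryDecision::RetryOnlyIfIdempotent
    }
}
```

For example, if GOAWAY reports a last stream identifier of 5, a request on stream 7 is safe to resend, while a request on stream 5 or 3 needs the idempotency check.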

@oliverjumpertz
Contributor

oliverjumpertz commented Oct 20, 2022

To add to this:

This currently also happens to us.

Especially in a Kubernetes context, where nginx is the primary choice of ingress, this happens regularly.

nginx shuts its http2 connections down every 1000 streams (default) to free memory allocations it couldn't free earlier.

This basically means that for every 1000 requests (Router and also other callers), a connection drops.

In this case, hyper cannot gracefully handle the drop of a connection while a request is in flight, and this error happens.

See here: hyperium/hyper#2500

Additionally see this section of the nginx documentation: https://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_requests
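For a periodic connection recycle like the one described above, a single retry on a fresh connection would typically be enough, since the new connection starts with a fresh request count. A minimal sketch of such a wrapper, under the assumption that the caller's `send` closure opens a new connection on its second attempt and that the request is safe to replay; `SendError` and `send_with_one_retry` are hypothetical names, not hyper or router APIs.

```rust
/// Hypothetical error type for a subgraph request.
#[derive(Debug, PartialEq, Eq)]
enum SendError {
    /// The peer closed the connection gracefully (GOAWAY with NO_ERROR).
    GoAway,
    /// Any other failure; not retried in this sketch.
    Other(String),
}

/// Calls `send` once, and once more only if the first attempt failed
/// with `SendError::GoAway`.
fn send_with_one_retry<T, F>(mut send: F) -> Result<T, SendError>
where
    F: FnMut() -> Result<T, SendError>,
{
    match send() {
        // Assumption: the second call goes out over a new connection.
        Err(SendError::GoAway) => send(),
        other => other,
    }
}
```

Limiting the wrapper to one extra attempt keeps the added load bounded: a second GOAWAY in a row is returned to the caller instead of being retried again.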
