HTTP2 error under load. #1956
Comments
Could you describe in more detail what the expected behavior would be here? Request retries are a tricky feature, especially under load. Until now we have avoided implementing infrastructure features such as load balancing, circuit breaking, or retries in the router, because they are better addressed at the service mesh layer. They could still be done in the router, but doing them properly requires some thought.
@Geal But GO_AWAY with NO_ERROR implies a graceful shutdown, doesn't it? Would the subgraph send it just because it is under load?
To add to this: we are currently seeing this as well. It happens regularly in a Kubernetes context, where nginx is the primary choice of ingress. By default, nginx closes each HTTP/2 connection after 1000 streams to free memory allocations it could not free earlier. This means that after every 1000 requests on a connection (from the router or from any other caller), that connection is dropped. hyper cannot gracefully handle a connection being dropped while a request is in flight, so this error occurs. See hyperium/hyper#2500 and the keepalive_requests section of the nginx documentation: https://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_requests
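To make the in-flight case concrete: per the HTTP/2 specification (RFC 9113, section 6.8), a GOAWAY frame carries a last-stream-id, and any stream with a higher identifier was not processed by the peer, so it can be resent on a new connection. The sketch below only illustrates that rule. The types `Reason`, `GoAway`, `Action` and the function `classify` are hypothetical names made up for this example; they are not hyper's or the router's API.

```rust
/// HTTP/2 error code carried by a GOAWAY frame.
/// Hypothetical type for illustration, not hyper's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    /// NO_ERROR (0x0): graceful shutdown.
    NoError,
    /// Any other error code.
    Other(u32),
}

/// Hypothetical view of a received GOAWAY frame.
#[derive(Debug, Clone, Copy)]
pub struct GoAway {
    /// Highest stream id the peer might have processed.
    pub last_stream_id: u32,
    pub reason: Reason,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// The peer never processed this stream: safe to resend elsewhere.
    RetryOnNewConnection,
    /// Graceful shutdown and the stream was accepted: it may still complete.
    AwaitResponse,
    /// The stream may have been processed and the connection failed:
    /// surface the error instead of retrying blindly.
    Fail,
}

/// Decide what to do with an in-flight stream when a GOAWAY arrives.
pub fn classify(stream_id: u32, frame: GoAway) -> Action {
    if stream_id > frame.last_stream_id {
        Action::RetryOnNewConnection
    } else if frame.reason == Reason::NoError {
        Action::AwaitResponse
    } else {
        Action::Fail
    }
}
```

For example, with `last_stream_id` 5 and NO_ERROR, stream 7 would be classified as `RetryOnNewConnection`, while stream 5 would be left to complete on the old connection.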
Describe the bug
When we put the router under load, we get this error:
{fetch_error:hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })}target:apollo_router::services::subgraph_service}
This implies that the subgraph closed the connection. The router does not retry the request, so the request fails with an error.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The error should not happen in the first place, and if the HTTP/2 connection is closed, the router should retry the request automatically.
Output
Error log:
{fetch_error:hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })}target:apollo_router::services::subgraph_service}
Desktop (please complete the following information):
Additional context
The GO_AWAY error happens when the subgraph closes the HTTP/2 connection gracefully (why would that happen?). Even when it does, the router should retry the requests that were active on that connection.
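A minimal sketch of the retry behavior requested here, under stated assumptions: `FetchError` and `fetch_with_retry` are hypothetical names invented for this example, the code is synchronous for brevity, and it assumes the caller has already mapped the transport error (such as the hyper GoAway error above) to `GracefulGoAway` only when the request is known to be safe to resend. It is not how the router is implemented.

```rust
/// Hypothetical error type for a subgraph fetch.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    /// The peer closed the connection gracefully (GOAWAY with NO_ERROR)
    /// and the request is assumed safe to resend.
    GracefulGoAway,
    /// Any other failure; never retried by this sketch.
    Other(String),
}

/// Call `send` until it succeeds, fails with a non-retryable error,
/// or `max_attempts` is reached. `send` receives the 1-based attempt
/// number and is always called at least once.
pub fn fetch_with_retry<T, F>(max_attempts: u32, mut send: F) -> Result<T, FetchError>
where
    F: FnMut(u32) -> Result<T, FetchError>,
{
    let mut attempt = 1;
    loop {
        match send(attempt) {
            // Retry only graceful connection shutdowns, and only while
            // the attempt budget is not exhausted.
            Err(FetchError::GracefulGoAway) if attempt < max_attempts => attempt += 1,
            // Success, a non-retryable error, or the final attempt.
            other => return other,
        }
    }
}
```

Bounding the number of attempts matters under load: an unbounded retry loop against a subgraph that keeps closing connections would add traffic exactly when the system is least able to absorb it.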