-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calls trigger ERR_HTTP2_SESSION_ERROR with node 20.10.0 #2625
Comments
If your code works in one version of Node, and doesn't work in a later version of Node, my first assumption is that this is a regression in Node itself. Since log says that the session ended with a GOAWAY with error code 2 (internal error), that implies that it received that error from the server, though it is possible that the runtime synthesized that GOAWAY on the client. If it did come from the server, then maybe the client behaves differently in Node 20 vs earlier versions of Node that causes the server error. |
I am facing exactly the same error in my application that involves a high degree of concurrency. I have investigated the traffic with Wireshark and the server sends a HTTP2 GOAWAY with Code 2. The @grpc/grpc-js client then throws the mentioned exception and in my case does not retry the request. My logs are identical to the one of @AlbertLlebaria. Please let me know when there is something else that I can provide to further investigate the issue. Thanks a lot! :) |
I am also facing the same issue in my application and mostly during peak usage. I am getting the same exception which causing context cancellation for other upstream services. I am on node version 20.10.0 and using grpc-js version Please let me know if I can provide any more context or help in investigation of the issue. Thank you. |
Practically speaking, I think the only real impact of this bug is a slightly confusing error description. The logs show that the client received a GOAWAY with code 2. When Node receives that, it also ends any open streams on that connection with code 2. gRPC translates the HTTP2 code 2 into the gRPC status code 13, which is what we are seeing here. The confusing part is that this should be reported as an internal server error, but the gRPC client is interpreting it as an internal client error. |
I can recreate this error fairly easily and the cause of it is that under load when you have many streams close at the same time (especially via RST_STREAM) - session socket would receive UV_EOF, then that session will be terminated with GOAWAY sent from nodejs server. If you are interested - there is a little thread here: https://twitter.com/v_aminev/status/1757476612671492390 Typically I can recreate this on my local machine with ~25000 sessions closing at the same time. My gut tells me it has something to do with ACKs not happening on time or some sort of timeout/but in libuv, where it would raise UV_EOF (technically it should be FIN packet received by the stream, but that is not happening here) |
@AVVS A GOAWAY with a non-zero error code causing open streams to close is normal behavior, at least at the protocol level. We don't need to look for any lower-level failures to explain why this is happening. |
Not what I mean here. Scenario is the following:
The error is UV_EOF on step 5 - stream is perfectly fine, no goaway was received or sent yet |
https://twitter.com/v_aminev/status/1759772087512686736 - a little more context, first link mihgt not have opened the full thread.
many RST_STREAM -> 0.24 (due to grpc-timeout header for server stream, load testing purposes) To sum up - I don't think its a bug in grpc-js, but I do think there is something wrong with libuv handling of the opened socket, or maybe I am simply not aware of some sort of settings that I could change. This doesn't happen if the node process is not overloaded, so maybe there is a timeout or something like that where a stream read is scheduled, callback is not invoked for one reason or another and it throws UV_EOF |
And you're saying that this UV_EOF event causes Node to report that it has received a GOAWAY with error code 2, even though it hasn't? |
It's unlikely that this is directly related (because of GOAWAY code 2), but I thought it might be somewhat related as "peak usage" was mentioned later - should be related to that, rather than the original issue |
If the things you're describing aren't related to the original issue, and you don't think they're a bug in grpc-js, why are you posting it here? |
@murgatroid99 on the off chance that it is related. Even if it's not - I think it might be helpful in debugging for others - specifically they could turn on NODE_DEBUG_NATIVE=HTTP2SESSION,HTTP2STREAM NODE_DEBUG=http,net in env, this will give more logs and/or gather a little bit more with wireshark or similar inspection tools so that the reason for goaway[2] is clear EDIT: if you think whats been posted is not helpful at all - I'll just clean the messages up, sorry |
@murgatroid99 handleTrailers Should here call halfClose() instead of endCall() ? endCall will send RST_STREAM, not END_STREAM |
@bbit8 No. When the client receives trailers on a stream, that indicates that the stream has finished. |
@murgatroid99 When in high load, RST_STREAM will trigger this silly rate limiter in node.js. Is that possible? |
Maybe it could, but it shouldn't. The rate limit was added to prevent the HTTP/2 rapid reset attack, which involves a client creating streams and then immediately closing them, many times. Besides, that code isn't sending the RST_STREAM explicitly. It's just calling |
@murgatroid99 endCall() -> destroyHttp2Stream() -> http2Stream.close() . http2Stream.close() will send RST_STREAM. This is in the Node http2 document. here. The rate limiter will send GOAWAY when receiving excessive amount of RST_STREAM, no matter who sends the RST_STREAM. So when in high load, grpc-js client receives excessive amount of response, then gets excessive amount of RST_STREAM, then triggers the rate limiter in Node. |
I honestly forgot I had that in there.
|
I started noticing this error occasionally on bursts of sequential workloads, 1000 individual calls per second from one client to the same server.
It started occuring after we updated envoy on both server and client from v1.27.2 to v1.27.3, i cannot correlate anything from the diff or changelog back to this, I haven't looked in the dependency upgrades yet. Using a rate limiter was a solution for the short term but i'm hoping the redundant RST_STREAM #2695 PR will prevent envoy from "GOAWAY"ing the client with the burst sequential workload. |
I just published the change in #2695 in grpc-js version 1.10.4. Please try it out. |
@murgatroid99 in order to get this update in Firebase's dependencies, I ran If the error persists, I'll be back. |
We gonna have a problem with NodeJS 18. It uses
|
@murgatroid99 seems like the error still there. I'm getting this after upgrading to
|
We are also seeing this error |
We're seeing a similar issue, happens on both node 18 and 20 using In our setup, Envoy proxy is configured with The problem is Here's the fragment of the client log with
I wonder if there's any workaround or configuration to fix this. |
I think the key line is "error sending frame type 1 on stream 20001, code: 2" Here's what I think is happening in that log: the client trying to start the 10,001st stream races with the incoming GOAWAY, and when the client tries to send the headers for that stream, it causes this error, which Node reports as an INTERNAL error (indistinguishable from an internal server error). I don't think there's much gRPC can do about this: the best fix would be for the http2 module to report a different error code in this situation. |
@murgatroid99 thanks for your response! I reproduced the issue using pure Basically we need
Running the client with
I find the discussions you had with node.js maintainers here and here particularly relevant. |
I don't think those particular linked issues quite fit this. The first is just a general discussion about HTTP2, and the second is about the fact that if you make a request after receiving a GOAWAY, it throws an error. This problem is more about a step after that, where the client sends a GOAWAY with a non-zero code, causing existing requests to be cancelled. I think it's possible that the code path for handling that error in grpc-js is part of the problem, specifically the |
I was referring to the part of that discussion where you raised the issue of unresolved edge cases interfering with |
Problem description
When running multiple gRPC requests asynchronously with the same gRPC client, gRPC client at some point starts cancelling thoe calls with the following error message:
Received RST_STREAM with code 2 triggered by internal client error: Session closed with error code 2
I've enabled the following ENV variables
GRPC_TRACE=all
andGRPC_VERBOSITY=DEBUG
and these are the logs for one of the calls that received the RST_STREAM error.Right before the calls start triggering the
ERR_HTTP2_SESSION_ERROR
, i found the following a log explaining the client got closed by GOAWAY with code 2.0.0.0.0:3000 connection closed by GOAWAY with code 2
.You can find the full logs in the attached file:
grpcLogs.txt
This error happened after upgrading our NodeJS project to version 20.10.0. This problem is only replicable when there are a ''large'' (2361 calls from the logs) amount of gRPC calls triggered. When there is a reduced amout of calls, I couldn't replicate the error.
Moreover, the
ERR_HTTP2_SESSION_ERROR
is not replicable with lower versions Node versions ( issue didn't appear when runnin on node 20.5.1 ).Reproduction steps
Use Nodejs 20.10.0 and install grpc/grpc-js 1.9.12.
Run multiple gRPC calls with the same client asynchornously. Issue is also replicable with other older versions of
grpc/grpc-js such as 1.8.14 .
Environment
The text was updated successfully, but these errors were encountered: