envoy-1.31.2 nghttp2 racing / DownStreamProtocolError / codec_error:The_user_callback_function_failed #36387
Comments
It might be helpful to try this with |
@kyessenov do you know if it requires a special build or something? Otherwise nothing additional is logged when I enable ENVOY_NGHTTP2_TRACE with http2 tracing |
Yeah, it sounds like nghttp2 needs to be built with DEBUGBUILD https://nghttp2.org/documentation/nghttp2_set_debug_vprintf_callback.html so then the question is how to pass it through all of those bazel levels 🤔 |
I think you are right, the trace nghttp2 logs seem to have disappeared on recent builds. |
@kyessenov, I ended up patching v1.31.2
Connection 77: I added a few seconds before and after for nghttp2, as I have no idea how to filter it. Trace log: https://github.com/pgeler/test/blob/main/grpc_connection_trace.txt |
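The manual "few seconds before and after" approach described above can be sketched as a small filter. This is only an illustration under stated assumptions: it assumes each log line starts with an Envoy-style `[YYYY-MM-DD HH:MM:SS.mmm]` timestamp, that connection-scoped lines carry either a `[C77]` prefix or a `"ConnectionId":"77"` tag, and that the nghttp2 trace lines carry no connection tag at all. The function name `filter_connection` and the `pad_seconds` parameter are hypothetical, not part of Envoy or nghttp2.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix, e.g. "[2024-10-01 10:00:10.000]".
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")
# Assumed connection tags: "[C77]" or '"ConnectionId":"77"'.
ANY_CONN_RE = re.compile(r'\[C\d+\]|"ConnectionId":"\d+"')


def filter_connection(lines, conn_id, pad_seconds=2.0):
    """Keep lines tagged with conn_id, plus untagged lines near them in time.

    Lines tagged with a different connection are dropped. Lines without a
    timestamp inherit the timestamp of the previous line.
    """
    target = re.compile(r'\[C%d\]|"ConnectionId":"%d"' % (conn_id, conn_id))

    parsed = []
    last_ts = None
    for line in lines:
        m = TS_RE.match(line)
        if m:
            last_ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        parsed.append((last_ts, line))

    hits = [ts for ts, line in parsed if ts is not None and target.search(line)]
    if not hits:
        return []

    pad = timedelta(seconds=pad_seconds)
    lo, hi = min(hits) - pad, max(hits) + pad

    kept = []
    for ts, line in parsed:
        if target.search(line):
            kept.append(line)            # tagged with the wanted connection
        elif ANY_CONN_RE.search(line):
            continue                     # tagged with some other connection
        elif ts is not None and lo <= ts <= hi:
            kept.append(line)            # untagged (e.g. nghttp2 trace) in window
    return kept
```

With a saved log file this could be used as `filter_connection(open("envoy.log").read().splitlines(), 77)`. On a busy listener the untagged lines kept by the time window may still belong to other connections, so the result narrows the log rather than isolating one connection exactly.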
@kyessenov any thoughts? |
TLS didn't change anything |
Just as an observation, if somebody is actually looking at this: it works pretty stably when the listener is configured when |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
@alyssawilk, @yanavlasov, any thoughts or suggestions where to look with this nghttp2 issue? |
To work around this, we forked envoy and triggered |
@alyssawilk any thoughts on disabling this nghttp2 rate limit, or making it configurable? |
I'll create a patch then... |
This is to reiterate the issue with the nghttp2 implementation and envoy gRPC clusters, #30882, which was closed due to inactivity and on the understanding that oghttp2 would replace it within a release or so. As of 1.31.2, nghttp2 does not seem to be going away any time soon, as oghttp2 has some observable issues with its general HTTP/2 implementation. So we are dead in the water right now and thinking of rolling back to before 1.27.2 or something...
Description:
Overall it seems like a race condition on HTTP/2. Most of the time I was able to reproduce it with a gRPC service. There seems to be a correlation with a larger number of clients; however, I have examples where a similar issue happens with envoy-to-envoy HTTP pass-through connectivity.
This particular example was captured after ~5 min of execution at 1000k RPS with 50+ simultaneous clients. The gRPC service is an ext_proc stream-based service, but I can confirm that the issue is isolated to the newer envoy running as b-sidecar, not to the ext_proc filter (verified by rolling b-sidecar back to envoy-1.27):
client -> envoy(a router - ext_proc) -> envoy(b-sidecar) -> grpc service:
http2 specific clusters configuration:
After a period of time and high-frequency requests:
Requests start failing with DPE (b-sidecar):
All failing requests are isolated to a single connection, and it seems this happens on a per-connection basis:
Trace log for the last couple of seconds of connection 233 (end of life on b-sidecar, where the issue occurred):