long headers from the server hang tonic #1834

Closed

ajwerner opened this issue Aug 1, 2024 · 5 comments

Comments

@ajwerner
Contributor

ajwerner commented Aug 1, 2024

Bug Report

Version

0.12.1

Platform

Linux pc 6.5.0-42-generic #42-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 09:28:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

Long error messages corrupt h2 streams in some way. I noticed this primarily because the upgrade in 9c1f2f9 / #1670 changed the default maximum header list size from 16MiB to 16KiB. For bidirectional streaming, I first get the error "h2 protocol error: http2 error", and later streams on that connection also seem broken. For unary RPCs, even the first request just hangs instead of the client getting an error. When I've debugged this, I've found that the client resets the stream, but then everything just hangs.

I've written a simple test that just hangs to demonstrate the issue:

tests/integration_tests/tests/status.rs

use std::time::Duration;

use tokio::sync::oneshot;
use tonic::{transport::Server, Request, Response, Status};

// The generated types below (Input, Output, test_client, test_server) come from
// the integration_tests test crate; the exact module path is assumed here.
use integration_tests::pb::{test_client, test_server, Input, Output};

#[tokio::test]
async fn long_error_message() {
    struct Svc;

    // 100,000 bytes of error message comfortably exceeds the new 16KiB default
    // max_header_list_size, since gRPC carries the Status message in the
    // grpc-message response header.
    const N: usize = 100_000;

    #[tonic::async_trait]
    impl test_server::Test for Svc {
        async fn unary_call(&self, _: Request<Input>) -> Result<Response<Output>, Status> {
            Err(Status::internal("a".repeat(N)))
        }
    }

    let svc = test_server::TestServer::new(Svc);

    let (tx, rx) = oneshot::channel::<()>();

    let jh = tokio::spawn(async move {
        Server::builder()
            .add_service(svc)
            .serve_with_shutdown("127.0.0.1:1340".parse().unwrap(), async { drop(rx.await) })
            .await
            .unwrap();
    });

    tokio::time::sleep(Duration::from_millis(100)).await;

    let mut channel = test_client::TestClient::connect("http://127.0.0.1:1340")
        .await
        .unwrap();

    // !!! This call never finishes!
    let err = channel
        .unary_call(Request::new(Input {}))
        .await
        .unwrap_err();

    assert_eq!(err.message(), "a".repeat(N));

    tx.send(()).unwrap();

    jh.await.unwrap();
}
@ajwerner
Contributor Author

ajwerner commented Aug 1, 2024

On some level, the problem happens in h2:

[DEBUG h2::proto::streams::recv] stream error REQUEST_HEADER_FIELDS_TOO_LARGE -- recv_headers: frame is over size; stream=StreamId(1)
[TRACE h2::proto::streams::send] send_reset(..., reason=REFUSED_STREAM, initiator=Library, stream=StreamId(1), ..., is_reset=false; is_closed=true; pending_send.is_empty=true; state=State { inner: Closed(EndStream) }
[TRACE h2::proto::streams::send]  -> not sending explicit RST_STREAM (StreamId(1) was closed and send queue was flushed)
[TRACE h2::proto::streams::counts] transition_after; stream=StreamId(1); state=State { inner: Closed(Error(Reset(StreamId(1), REFUSED_STREAM, Library))) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=0; num_send=1
[TRACE h2::proto::streams::counts] dec_num_streams; stream=StreamId(1)

@ajwerner
Contributor Author

ajwerner commented Aug 1, 2024

Related issue: grpc/grpc-go#4265.

Failing with an inscrutable error isn't great, but it would be acceptable. The real problem is that somewhere along the way we're mishandling the lifecycle of these requests; the hangs are what's really not okay.

@ajwerner
Contributor Author

ajwerner commented Aug 1, 2024

The client side closes the stream, and then the request receives nothing at all, not even an error. Perhaps the server should be the one rejecting the oversized header, since the client's size limit is advertised to it? Something is definitely broken here, but I don't think I'll be able to debug it further right now.
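
As a rough illustration of a server-side mitigation (a sketch only; nothing like this was proposed or merged in this thread), a service could cap the length of its Status messages so the encoded grpc-message header stays well under the client's 16KiB default:

use tonic::Status;

// Hypothetical helper, not part of tonic: truncate long error messages so the
// resulting grpc-message header stays well below the 16KiB default limit.
fn capped_internal(msg: impl Into<String>) -> Status {
    // Leave headroom for percent-encoding and the other headers in the block.
    const MAX_MESSAGE_BYTES: usize = 8 * 1024;
    let mut msg = msg.into();
    if msg.len() > MAX_MESSAGE_BYTES {
        // Back up to a char boundary so truncate() cannot panic on UTF-8 input.
        let mut end = MAX_MESSAGE_BYTES;
        while !msg.is_char_boundary(end) {
            end -= 1;
        }
        msg.truncate(end);
        msg.push_str("... [truncated]");
    }
    Status::internal(msg)
}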

ajwerner changed the title from "long error messages corrupt h2 stream after Hyper/Axum upgrade" to "long headers from the server hang tonic" on Aug 2, 2024
ajwerner added a commit to ajwerner/tonic that referenced this issue Aug 2, 2024
There is a bug where, if the server sends a response with a header value
that exceeds the client's max_header_list_size, RPCs just hang
(hyperium#1834). When tonic upgraded to hyper 1, it picked up
[hyper#3622], which changed the default for this configuration value
from 16MiB to 16KiB. Error messages in gRPC are carried in headers,
which means that services that ever send error messages in excess of
16KiB (including in their error details!) will just hang.

This commit adds the ability for the client to configure this value to
something larger (perhaps the old default of 16MiB) to mitigate the
above-referenced bug.

[hyper#3622]: hyperium/hyper#3622
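
For reference, a minimal sketch of what the client-side knob might look like; the builder method name http2_max_header_list_size on Endpoint is an assumption based on the commit description, not a confirmed API:

use tonic::transport::{Channel, Endpoint, Error};

// Sketch: restore something closer to the old 16MiB limit on the client.
// `http2_max_header_list_size` is assumed here; check the tonic docs for the
// exact name and signature of the setting this commit introduces.
async fn connect_with_larger_header_limit() -> Result<Channel, Error> {
    Endpoint::from_static("http://127.0.0.1:1340")
        .http2_max_header_list_size(16 * 1024 * 1024) // back to 16MiB
        .connect()
        .await
}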
@ajwerner
Contributor Author

ajwerner commented Aug 2, 2024

This appears to be a hyper issue. I'll close this and open something over there.

ajwerner closed this as completed Aug 2, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 3, 2024