-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create memhttp package to debug flaky testcases #594
Conversation
Changes httptest.Server to use an in memory network. This makes testing more robust to ephemeral port issues under load. Two flaky test cases were now easily reproducible and fixed. Both occured from races between starting the network request in a go routine and checking for a http2 stream and erroring if not. This can error on Send or Receive depending on how fast the request can complete. See: golang/go#14200
Running many iterations locally to find more, need to fix another one:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far to me. Are you planning to fix that other flake you found in this PR? Or do you want this PR approved now so you can merge it and fix the other issue in a subsequent PR?
@jhump had to step out but hopefully can debug it now. Not sure what is calling close twice. |
I left it as a TODO. I think it's related to |
Benchmarking increase in allocs, making the - BenchmarkConnect/unary-8 5433 1981280 ns/op 5651212 B/op 229 allocs/op
+ BenchmarkConnect/unary-8 5440 1980366 ns/op 5729820 B/op 237 allocs/op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to use local network for bench.
Yeah, I think so, too. The benchmark seems less useful if it's intentionally different from real world conditions 🤷
Direction has changed a little since original review
Test is still flaky 😞, need to relook into this. Two errors:
|
Removes SetError in favour of reporting errors on BlockUntilResponseReady. ensureRequestMade removes sync.Once as the contract of ClientStream state calling Send with CloseRequest is not safe to call concurrently.
@jhump behaviour in duplexHTTPCall has changed in the last 2 commits to address the flaky tests. I've removed |
client_ext_test.go
Outdated
@@ -106,10 +106,8 @@ func TestClientPeer(t *testing.T) { | |||
err = clientStream.Send(&pingv1.SumRequest{}) | |||
assert.Nil(t, err) | |||
// server streaming | |||
serverStream, err := client.CountUp(ctx, connect.NewRequest(&pingv1.CountUpRequest{})) | |||
serverStream, err := client.CountUp(ctx, connect.NewRequest(&pingv1.CountUpRequest{Number: 1})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change relevant? Why would the request data make this flaky or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It causes the server to error with invalid_argument: number must be positive: got 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. But was that not true before? Why wasn't this a problem before this PR? (I'm trying to understand why change it now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is checking in the interceptor that the peer address and peer protocol is set. It's testing each request style by sending an empty but valid request, however server stream is the only one that will cause the server handler to error. This can now report the server error instead of success. I don't think this is intended behaviour of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not setting Number
will cause the following only for grpcweb:
--- FAIL: TestClientPeer (0.00s)
--- FAIL: TestClientPeer/grpcweb (0.07s)
client_ext_test.go:112:
assertion: assert.Nil
got: unknown: http2: response body closed
Close
to fail withhttp2: response body closed
here- Errors on
discard
which usually receives anio.EOF
The grpcweb error response is a header only response, need to investigate why this error is returned on a no body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now flaky with:
client_ext_test.go:113:
assertion: assert.Nil
got: unknown: io: read/write on closed pipe
Discard is reading from a closed response which is causing an io.ErrClosedPipe
. This is probably due to the new memhttp. Need to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now flaky with:
By "now", do you mean after the last commit you pushed, which removed calls to response.Body.Close()
?
Also, in that latest commit, who is the caller that is now responsible for calling Close
? Is this done when StreamClientConn.CloseResponse()
is called? Is there a chance some code path is failing to call that? (In particular, I'm suspicious why this would only happen for gRPC-Web and not others.)
Discard is reading from a closed response which is causing an
io.ErrClosedPipe
.
It seems like discard
should be resilient against io.ErrClosedPipe
and replace it with io.EOF
when observed, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's due to the change in closing the pipe writer on Reader instead of Writer. net.Pipe
will close both halves but the error is given on the opposite end. So closing the Reader made the Writer return nice io.EOF
errors but the Reader
to receive io.ErrClosedPipe
. HTTP2 libraries set the request error to the response here
I'll need to write more testing around closing, and create an issue for testing with different latencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP2 sets the error on the response then here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discard should never see io.ErrClosedPipe
, however I've added context error checks to avoid issues with cancelation errors bubbling up to CloseResponse()
errors. These are tested in TestServer
tests.
Working and passing with -count=100
locally, can't see any more flaky test issues. Could you please re-review.
if _, err := discard(d.response.Body); err != nil { | ||
if _, err := discard(d.response.Body); err != nil && | ||
!errors.Is(err, context.Canceled) && | ||
!errors.Is(err, context.DeadlineExceeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added these error checks to make CloseResponse
behaviour more consistent. Otherwise Close does race with context cancel on errors causing either nil or a context error. Maybe we can revisit the discard behaviour in another PR.
@@ -257,15 +263,15 @@ func (d *duplexHTTPCall) makeRequest() { | |||
err = NewError(CodeUnavailable, err) | |||
} | |||
d.responseErr = err | |||
d.requestBodyReader.CloseWithError(io.EOF) | |||
d.requestBodyWriter.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changed? Since it was using CloseWithError
and not simply Close
, this would not be the source of io.ErrPipeClosed
issues. In fact, I believe this will cause attempts to send on the broken stream to now return io.ErrPipeClosed
instead of io.EOF
.
In order to control the error received by the write half (senders), you must close the read half (which is what it was doing before), and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close
is just calling CloseWithError(nil)
which then gets set to io.EOF
: https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/io/pipe.go;l=100
It's always the opposite side of the pipe that gets the error with the current side getting: io.ErrClosedPipe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetError
called close on read:
connect-go/duplex_http_call.go
Line 236 in d9b61e9
_ = d.requestBodyReader.Close() |
Was my mistake switching to
Writer
, it's now fixed in the latest PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current side getting:
io.ErrClosedPipe
SetError
called close on read:
Exactly. By closing the writer here, we will induce io.ErrClosedPipe
in calls to send (which is what used the writer). The old behavior, closing the reader, is better since it results in EOF errors instead.
Was my mistake switching to
Writer
, it's now fixed in the latest PR.
I don't follow this. I'm looking at refreshed diff and it's still closing the writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I confused myself and switched the ordering. My previous comments don't align.
Closing the write side makes Write
on the pipe error with io.ErrClosedPipe
. Connect checks for this when sending:
connect-go/duplex_http_call.go
Line 106 in d9b61e9
if err != nil && errors.Is(err, io.ErrClosedPipe) { |
However, on the read side we need to return
io.EOF
to avoid HTTP2 setting the io.ErrClosedPipe
as the error for the response body: #594 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CloseWrite
behaviour was different to SetError
:
connect-go/duplex_http_call.go
Line 133 in d9b61e9
return d.requestBodyWriter.Close() |
connect-go/duplex_http_call.go
Line 236 in d9b61e9
_ = d.requestBodyReader.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you for explaining!
Creates two internal packages:
memhttp
andmemhttptest
based on https://github.com/akshayjshah/memhttp . We use this for testing connect's services.Changes
httptest.Server
to use an in memory network. This makes testing more robust to ephemeral port issues under load. Two flaky test cases were now easily reproducible and fixed. Both occurred from races between starting the network request in a go routine and checking for a http2 stream and erroring if not. This can error on Send or Receive depending on how fast the request can complete.On
duplexHTTPCall
I removedSetError
to avoid overwriting the original response error. NowBlockUntilResponseReady
returns an error from the response initialization.See: golang/go#14200
Fixes:
TestServer/http1/*/*/cumsum_cancel_before_send
: send can error on write message.TestBidiRequiresHTTP2
: send canio.EOF
error on write, or succeed.TestBidiOverHTTP1
: same as above.TestClientPeer/grpcweb
: stream request sends an invalidate payload invoking server error, races with interceptor.Two places where errors raced:
connect-go/duplex_http_call.go
Line 290 in 525cf76
connect-go/handler.go
Line 184 in 525cf76