net/http: Re-connect with upgraded HTTP2 connection fails to send Request.body #32441
Comments
As I am not in a position to sign the CLA and submit a fix, it would be good to get some maintainers to have a look at this issue. |
Sorry to bump, but is anyone able to have a look at this ? @bradfitz @FiloSottile |
Thanks for opening this issue @brianfife. Also, thanks for linking to #25009 and #28206, as they appear to be related (even though they were marked as closed as of Go 1.12). I'm on 1.12.7 and I'm seeing this issue with clients connecting to the API I work on. You did a good job of showing the issue in the actual transport code, but I figured I'd add some of my client debug logs to the issue for additional context. As you described, I always see this issue manifest after the server sends a
In the above, we can clearly see that the request contained a 51-byte, form-encoded body but the H/2 framer never sent it, setting the We typically see this happen when the client is using a moderate amount of concurrency, since the chances are increased that the periodic Interestingly, the Go bug manifests when the transport is left to choose H/2 on its own via the connection upgrade codepath (aka So, either of these options will result in a client that does not repro this bug:
|
See #866, but once in a while when using HTTP/2, stripe-go sends an empty request body which then results in a 400 back from the server. We tracked the issue down to this one open on the Go language's HTTP/2 implementation: golang/go#32441 For now, the best thing we can do is disable HTTP/2 by default for new integrations. When the bug is fixed, we can turn it back on again.
To echo @brianfife, I'm also sorry to bump this, but I'm wondering if there's any movement or thoughts on how to move forward with this particular issue? @bradfitz @FiloSottile |
Correct, the body is not rewound after Lines 427 to 434 in cf06b9a
I missed that during the #25009 fix because the trigger condition is less frequent and was not covered by my Go-only test case. This condition triggers when http2Transport.RoundTripOpt fails to get a connection during a retry (e.g. on an attempt to retry after GOAWAY when all other connections in the pool are terminated), which results in http2noCachedConnError, which is then transformed into ErrSkipAltProtocol in http2noDialH2RoundTripper.RoundTrip.
If we change it to something like this, everything should be OK:
if t.useRegisteredProtocol(req) {
	altProto, _ := t.altProto.Load().(map[string]RoundTripper)
	if altRT := altProto[scheme]; altRT != nil {
		if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
			return resp, err
		} else if req.GetBody != nil {
			// Rewind the body after ErrSkipAltProtocol, because it may
			// already have been consumed.
			newReq := *req
			var err error
			newReq.Body, err = req.GetBody()
			if err != nil {
				return nil, err
			}
			req = &newReq
		}
	}
}
I've tested this change with my repro and got zero "400 Bad Request" errors, but I'm not sure that it is the correct solution; we can probably refactor something to make this rewind more explicit. /cc @bradfitz |
the above looks reasonable to me, @ernado do you want to open a PR? |
Change https://golang.org/cl/210123 mentions this issue: |
orchestrator and transcoder. Mitigates issue in the golang's implementation of HTTP2 golang/go#32441
@brianfife, I left a comment on @ernado's CL 210123 indicating what I think is the way forward: namely, we need to add an optional method that a |
/cc Could we fix this issue before the next release? An important feature (GOAWAY) in the Kubernetes API Server relies on it. |
No ETA from me for CL 210123, so please submit another CL if anybody wants to; just check that the comments on my CL are addressed. Also, the current workaround is to use an http2-only client. |
See also https://golang.org/cl/234358 from #39086 |
Thanks, let me see if there is anything I can do to fix this issue ASAP. |
Change https://golang.org/cl/234894 mentions this issue: |
This issue was incorrectly titled (x/net/http instead of net/http) and therefore incorrectly milestoned (Unreleased) and left out of Go 1.15 release planning. I've retitled and remilestoned it. Since this bug is affecting an important existing use case (Kubernetes), I've also marked it a release blocker. I also sent a fix. The code change is fairly straightforward, but net/http itself is quite subtle, so it would be best to get the change into beta1 to get the most possible testing. Therefore I did not add the "okay-after-beta1" label. |
Thanks for correcting the title, and submitting a patch - glad it's finally getting some traction! |
Is it possible to backport this bugfix to older versions such as v1.14.x or v1.13.x? Kubernetes still uses go1.13 to build itself. |
Thanks @rsc for your PR, which is very similar to my idea for fixing this issue. |
Indeed, @answer1991, thanks for that CL as a starting point. |
@gopherbot please backport, as Kubernetes is affected. |
Backport issue(s) opened: #39278 (for 1.13), #39279 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/242117 mentions this issue: |
…tion loss better
In certain cases the HTTP/2 stack needs to resend a request. It obtains a fresh body to send by calling req.GetBody. This call was missing from the path where the HTTP/2 round tripper returns ErrSkipAltProtocol, meaning fall back to HTTP/1.1. The result was that the HTTP/1.1 fallback request was sent with no body at all.
This CL changes that code path to rewind the body before falling back to HTTP/1.1. But rewinding the body is easier said than done. Some requests have no GetBody function, meaning the body can't be rewound. If we need to rewind and can't, that's an error. But if we didn't read anything, we don't need to rewind. So we have to track whether we read anything, with a new ReadCloser wrapper. That in turn requires adding to the couple places that unwrap Body values to look at the underlying implementation.
This CL adds the new rewinding code in the main retry loop as well.
The new rewindBody function also takes care of closing the old body before abandoning it. That was missing in the old rewind code.
Thanks to Aleksandr Razumov for CL 210123 and to Jun Chen for CL 234358, both of which informed this CL.
Updates #32441.
Fixes #39279.
Change-Id: Id183758526c087c6b179ab73cf3b61ed23a2a46a
Reviewed-on: https://go-review.googlesource.com/c/go/+/234894
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit e3491c4)
Reviewed-on: https://go-review.googlesource.com/c/go/+/242117
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
This is a follow up of issues #25009 #28206
We are sending requests as follows:
What did you expect to see?
Good behaviour
With the above client configuration, I was able to determine the following code was sending out the request using the RoundTrip function noted here:
https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L427
In our case, for the first and following requests t.useRegisteredProtocol(req) returns true, so altRT.RoundTrip(req) tries to send via h2_bundle's RoundTrip. As no connection exists at this point in time, this will fail due to error "http2: no cached connection was available", bubbling up ErrSkipAltProtocol "net/http: skip alternate protocol", and the code continues on. n.b. subsequent requests will be sent via altRT.RoundTrip(req) if there are cached connections available, however a GoAway received from the server will trigger the same behaviour of advancing to the next section of code.
https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L447
In the above for loop, for the first request without any connections available, a new transport connection is created, and pconn.alt.RoundTrip(req) or pconn.RoundTrip(req) is then used to submit and fulfill the Request, and this likely completes successfully. So far, so good.
As noted above, subsequent requests from our client continue to be sent from altRT.RoundTrip(req), at which point they are returned, without reaching the for loop.
Bad behaviour
After some time, our load balancer may respond with a GoAway response after a full request is transmitted, including data from the Request.Body, as seen with the following debug:
This results in another transport being created in the for loop shown above, however the Request.Body is not being reset before the request is sent on the newly created transport connection.
The rewind code at https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L497 doesn't get executed if pconn.alt.RoundTrip(req) or pconn.RoundTrip(req) are successful in sending a request after a connection is re-established. From my limited testing, I think the only time the rewind is reached is when TLSNextProto is manually set as it is in TestTransportBodyAltRewind, because this skips setting up the altProto map in transport.go/onceSetNextProtoDefaults(), as altRT is nil.
It looks like moving the rewind code from L497 to L455 would satisfy the case where an existing connection receives a GoAway frame and the Body had been sent to the server (as described above), however it would also execute in the case of the initial request where a transport does not exist and the request hasn't been sent (which makes that execution not necessary).
I believe this test case demonstrates the issue described above:
output
Moving the rewind code to the top of the for loop as described above results in success: