Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: HTTP/2 response body Close method sometimes returns spurious context cancelation error (1.17.3 regression) #49366

Closed
cespare opened this issue Nov 4, 2021 · 14 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@cespare
Copy link
Contributor

cespare commented Nov 4, 2021

What version of Go are you using (go version)?

Go 1.17.3

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

  • Use an http.Client with (any) Timeout set
  • Make an HTTP/2 request
  • Read the body
  • Close the body

Demo code:

package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: time.Minute}
	for i := 0; i < 20; i++ {
		resp, err := client.Get("https://google.com")
		if err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(io.Discard, resp.Body); err != nil {
			log.Fatal(err)
		}
		if err := resp.Body.Close(); err != nil {
			log.Fatalf("[i=%d] body close error: %s", i, err)
		}
	}
}

(I'm pinging google.com as an easy way to get an HTTP/2 request going.)

What did you expect to see?

No error, at least not on the resp.Body.Close() call.

What did you see instead?

For example:

2021/11/04 15:49:08 [i=0] body close error: context canceled

Sometimes it takes a few iterations to hit an error.

This is HTTP/2-specific: with GODEBUG=http2client=0 it does not reproduce.

This does not occur in Go 1.17.2 but it started happening with 1.17.3.

I realize it's weird to be checking the resp.Body.Close error in the first place. This came up in the context of twirp; I'm going to send a PR to them to stop checking the error. Even so, I don't think we should get this error in the client. It's especially confusing because we get the error immediately (so it's not related to the 1 minute timeout specifically).

I bisected on master to 7109323, which is CL 353870. The original x/net/http2 CL is CL 353390 and it was backported for Go 1.17.3 as CL 357683.

/cc @neild @bradfitz @nightlyone @fraenkel

@neild
Copy link
Contributor

neild commented Nov 5, 2021

resp.Body.Close wants to wait until the request stream has been finalized. (This ensures that, among other things, if we immediately send another request, the previous request no longer counts against the connection's concurrency limit.)

Finalizing the stream (in cleanupWriteRequest) can write to the network (sending a stream reset. Network writes can take an arbitrarily long amount of time, so resp.Body.Close waits until the cleanup happens or until the request is canceled, whichever happens first.

A http.Client with a non-zero timeout wraps the response body in a *http.cancelTimerBody. The cancelTimerBody cancels the request context after reading any error, including io.EOF, from the response body.

So if you read the full response, the request context is always canceled before resp.Body.Close is called.

I'm not immediately certain what the right fix is: resp.Body.Close waits for the stream cleanup because we have tests which expect that all stream state is cleaned up after the response body is closed. This seems like a reasonable assumption, and a good property to preserve. But we don't want to block indefinitely, so there needs to be some bound on how long resp.Body.Close will take. The context lifetime seems like the perfect bound, but we apparently can't use it.

@cespare
Copy link
Contributor Author

cespare commented Nov 5, 2021

Naive suggestion from someone without context:

If cancelTimerBody.Close would return context.Canceled and it had previously called stop, then could it know that the latter caused the former and return nil as a special case?

@neild
Copy link
Contributor

neild commented Nov 5, 2021

That fixes the problem where we return context.Canceled instead of nil, but it doesn't fix the problem where we want to wait for the request to be cleaned up before returning from Close.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 5, 2021
@cespare
Copy link
Contributor Author

cespare commented Nov 5, 2021

@neild I'm not sure I'm following you. Altering whether the function returns nil or context.Canceled doesn't change how long the function waits before returning.

Or are you saying that the problem I'm noticing is a symptom of a deeper problem, which is that resp.Body.Close doesn't wait for stream cleanup as intended if there's a timeout set?

@neild
Copy link
Contributor

neild commented Nov 6, 2021

Yes, that's right: resp.Body.Close is supposed to be waiting for stream cleanup, but it's returning immediately when the body has been completely read.

@gburt
Copy link

gburt commented Nov 6, 2021

This just hit us too; sorry for the +1 post. We reverted to the previous Go version because it's not feasible to change every instance where we check such errors (and not clear that's the right thing to do anyway). This feels like a pretty serious regression.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/362354 mentions this issue: http2: avoid spurious context cancelation error from Response.Body.Close

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/361919 mentions this issue: net/http: do not cancel request context on response body read

@chrisntb
Copy link

We also saw this error in our logs when we upgraded to 1.17.3. We downgraded the log to warn level after seeing this issue to remove the noise.

@nicolaasuni
Copy link

nicolaasuni commented Nov 12, 2021

This issue is serious for me as I have a case when the entire body is lost before being processed.
This is happening when using httputil.DumpResponse to log the response body before returning it.
For example: https://github.com/nexmoinc/gosrvlib/blob/main/pkg/httpclient/client.go#L95 but with the body param set to true.
For now I solved by adding a Context Timeout.

gopherbot pushed a commit to golang/net that referenced this issue Nov 12, 2021
When a request has a deadline set, the net/http package can cancel
the request context after reading an error (including io.EOF) from
the response body. Avoid returning a spurious error from
transportResponseBody if the request context is canceled.

For golang/go#49366.

Change-Id: I645466a1d6e405bdf17b3cc087204e4622a140eb
Reviewed-on: https://go-review.googlesource.com/c/net/+/362354
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@neild
Copy link
Contributor

neild commented Nov 12, 2021

@gopherbot Please open backport issues for 1.16 and 1.17

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #49558 (for 1.16), #49559 (for 1.17).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

krasi-georgiev added a commit to cryptoriums/flashbot that referenced this issue Nov 13, 2021
becasue of this bug
golang/go#49366

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/368084 mentions this issue: [release-branch.go1.16] net/http: do not cancel request context on response body read

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/368085 mentions this issue: [release-branch.go1.17] net/http: do not cancel request context on response body read

gopherbot pushed a commit that referenced this issue Dec 1, 2021
…sponse body read

When sending a Request with a non-context deadline, we create a
context with a timeout. This context is canceled when closing the
response body, and also if a read from the response body returns
an error (including io.EOF).

Cancelling the context in Response.Body.Read interferes with the
HTTP/2 client cleaning up after a request is completed, and is
unnecessary: The user should always close the body, the impact
from not canceling the context is minor (the context timer leaks
until it fires).

Fixes #49559.
For #49366.

Change-Id: Ieaed866116916261d9079f71d8fea7a7b303b8fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/361919
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
(cherry picked from commit 76fbd61)
Reviewed-on: https://go-review.googlesource.com/c/go/+/368085
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Dec 1, 2021
…sponse body read

When sending a Request with a non-context deadline, we create a
context with a timeout. This context is canceled when closing the
response body, and also if a read from the response body returns
an error (including io.EOF).

Cancelling the context in Response.Body.Read interferes with the
HTTP/2 client cleaning up after a request is completed, and is
unnecessary: The user should always close the body, the impact
from not canceling the context is minor (the context timer leaks
until it fires).

For #49366.
Fixes #49558.

Change-Id: Ieaed866116916261d9079f71d8fea7a7b303b8fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/361919
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
(cherry picked from commit 76fbd61)
Reviewed-on: https://go-review.googlesource.com/c/go/+/368084
Reviewed-by: Michael Knyszek <mknyszek@google.com>
bep added a commit to bep/hugo that referenced this issue Dec 2, 2021
bep added a commit to bep/hugo that referenced this issue Dec 2, 2021
bep added a commit to bep/hugo that referenced this issue Dec 2, 2021
bep added a commit to gohugoio/hugo that referenced this issue Dec 2, 2021
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
When a request has a deadline set, the net/http package can cancel
the request context after reading an error (including io.EOF) from
the response body. Avoid returning a spurious error from
transportResponseBody if the request context is canceled.

For golang/go#49366.

Change-Id: I645466a1d6e405bdf17b3cc087204e4622a140eb
Reviewed-on: https://go-review.googlesource.com/c/net/+/362354
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Dec 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants