Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http, x/net/http2: http server shutdown doesn't gracefully shut down HTTP2 connections #39776

Closed
p0lyn0mial opened this issue Jun 23, 2020 · 32 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@p0lyn0mial
Copy link

p0lyn0mial commented Jun 23, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.7 darwin/amd64

Does this issue reproduce with the latest release?

Yes, tested on go version go1.14.4 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/lszaszki/Library/Caches/go-build"
GOENV="/Users/lszaszki/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/lszaszki/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4x/d8vbcg5s6lx229_q7l5wbw0h0000gn/T/go-build581125651=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wrote a simple unit test that sets up an HTTP2 connection between a client and the server, sends a few requests and shuts down the backend server as soon as the last request arrives at the server. Since each request waits for 60 seconds in the handler I consider these requests to be active.

I analyzed net/http package and here's what I have found out.
On calling http.server.Shutdown() the server will be closed immediately because the connection is in StateNew (not in StateActive). In general net/http package manages connection states and provides a hook (ConnState) for reporting connections state. For HTTP 1.x it will transition the connection to StateActive as soon as it reads any byte on the wire. For HTTP 2.x h2_bundle.go will transition the connection to StateActive once there is more than one active stream. But the issue here is that ConnState hook is not wired. At least I haven’t found the wiring. So the server doesn’t know the current state of the connection.

I changed net/http so that it provides a default ConnState method. It is used by the HTTP2 implementation to report back the connection state. With this modification, my test takes ~60 seconds. The server waits until all requests complete before shutting down and all requests complete with success (HTTP 200).

What did you expect to see?

According to the documentation, the Shutdown method must wait indefinitely for active connections to return to the idle state and only then shut down the server. In that case, I would expect all requests to complete with success (HTTP 200) and to test take at least 60 seconds because each request simulates work by sleeping for that duration.

What did you see instead?

My test ended after ~6 seconds, all requests failed with Get "https://localhost:52066": http2: server sent GOAWAY and closed the connection; LastStreamID=49, ErrCode=NO_ERROR, debug="".

@p0lyn0mial
Copy link
Author

/cc @tkashem @sttts

@networkimprov
Copy link

cc @fraenkel

@fraenkel
Copy link
Contributor

From what I can tell of the test case, the current behavior is correct.

2020/06/24 08:07:52 handler: start, path=/1
2020/06/24 08:07:52 handler: callback being invoked, path=/1
2020/06/24 08:08:02 http2: Framer 0xc000174460: wrote GOAWAY len=8 LastStreamID=3 ErrCode=NO_ERROR Debug=""
2020/06/24 08:08:02 http2: Transport readFrame error on conn 0xc000082a80: (*errors.errorString) EOF
2020/06/24 08:08:02 secureServer.Serve returned - http: Server closed
2020/06/24 08:08:02 shutdown completed, it took 677.722µs
2020/06/24 08:08:02 RoundTrip failure: unexpected EOF
2020/06/24 08:08:02 client call ended
2020/06/24 08:08:02 server has been shutdown
2020/06/24 08:08:12 handler: done, path=/1
2020/06/24 08:08:12 test end
--- FAIL: TestHTTP2GracefulShutdown (27.01s)
    issue_test.go:125: status code: 200 body 
    issue_test.go:117: Get "https://localhost:34151/1": unexpected EOF

You started a request and called shutdown in the middle. The server reported that this connection will no longer accept any new streams (GOAWAY). The client closed the connection (readFrame EOF).

http2 has a different shutdown behavior which isn't documented anywhere. It is similar but it won't wait forever.

@p0lyn0mial
Copy link
Author

@fraenkel thanks for your response. Could you please check one more time?

I think that my test sends 25 requests, each waits for 60 seconds in the handler (after reaching the server) and only then it starts shutting down the server (not in the middle). I consider these requests to be active, being processed.

Actually net/http2 library promises to not close a connection until all current streams are done.

// startGracefulShutdown gracefully shuts down a connection. This
// sends GOAWAY with ErrCodeNo to tell the client we're gracefully
// shutting down. The connection isn't closed until all current
// streams are done.
//
// startGracefulShutdown returns immediately; it does not wait until
// the connection has shut down.
func (sc *http2serverConn) startGracefulShutdown() {

https://github.com/golang/go/blob/master/src/net/http/h2_bundle.go#L4862

@cagedmantis cagedmantis changed the title http.Server.Shutdown doesn't gracefully shut down HTTP2 connections x/net/http2: http server shutdown doesn't gracefully shut down HTTP2 connections Jun 24, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Jun 24, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 24, 2020
@tkashem
Copy link

tkashem commented Jun 24, 2020

Hi @fraenkel The client does not seem to receive a GOAWAY frame since it gets closed prematurely. I have put some logs below without and with the ConnState wiring. Can you please verify if this makes sense? Thank you for your time!

We designed a test to simulate a scenario where the server has an active request, its handler is executing while the server
is gracefully shutting down. The handler looks like this:

func(w http.ResponseWriter, r *http.Request) {
		<-time.After(10 * time.Second)
		close(waitCh)
		<-time.After(10 * time.Second)
	}

We kick off Graceful shutdown as soon as waitCh is closed

func() {
	<-waitCh

	ctx, cancel := context.WithTimeout(context.Background(), 10 * time.Second )
	server.Shutdown(ctx)
	cancel()
}

The following events take place when we run the test:
T+0s E1: client sends a request to te server

2020-06-24 16:12:48.080517 I | http2: Framer 0xc0001881c0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=3 len=9

T+0s E2: the server receives the request and starts executing the handler.

2020-06-24 16:12:48.080901 I | http2: server read frame HEADERS flags=END_STREAM|END_HEADERS stream=3 len=9

T+10s: the following happens:

  • server.Shutdown(ctx) is invoked
    • E3: kick off http2 graceful termination - go serverInternalState.startGracefulShutdown (it runs asynchronously)
    • E4: close the connection if it is idle, inside closeIdleConns of http.Server.

E4 is where things go wrong, as @p0lyn0mial has explained above, ConnState hook is not wired and so the connection
is always in StateNew state and it gets closed prematurely by closeIdleConns.
the http2 server connection handles graceful termination in the following manner from serverConn.serve:

  • send a GOAWAY frame.
  • wait for all active streams to complete.
  • wait for goAwayTimeout=1s, end server connection loop and close the connection

The graceful termination process in http2 server connection serverConn.serve does not have a chance to complete
gracefully since http Server.Shutdown prematurely closes the connection.

We have added logs to net/http closeIdleConns to understand what's going on. Below we see that Server.Shutdown
prematurely closes the connection.

2020-06-24 16:19:03.144643 I | http server: calling closeIdleConns
2020-06-24 16:19:03.144676 I | setting connection to idle
2020-06-24 16:19:03.144683 I | http server: closing idle connection - state=2 seconds=1593029927
2020-06-24 16:19:03.144705 I | server: starting graceful shutdown to 127.0.0.1:38708 (this log is from `serverConn.serve`)

The client on the other hand encounters an EOF error before it receives a GOAWAY frame

2020-06-24 16:19:03.144747 I | http2: Transport readFrame error on conn 0xc000103980: (*errors.errorString) EOF

If we wire ConnState, the connection state transitions to StateActive and does not get closed prematurely. The http2
graceful termination completes as expected. Here is the log:

Below we can see that the server sends a GOAWAY frame and the client receives it.

2020-06-24 16:45:57.371335 I | http2: Framer 0xc000322000: wrote GOAWAY len=8 LastStreamID=3 ErrCode=NO_ERROR Debug=""
2020-06-24 16:45:57.371528 I | http2: Framer 0xc0004021c0: read GOAWAY len=8 LastStreamID=3 ErrCode=NO_ERROR Debug=""
2020-06-24 16:45:57.371565 I | http2: Transport received GOAWAY len=8 LastStreamID=3 ErrCode=NO_ERROR Debug=""

The active stream is completed, the client gets a HTTP 200 as expected by the test, and the server fires a GOAWAY close timer and the connection gets closed gracefully.

2020-06-24 16:46:08.371636 I | GOAWAY close timer fired; closing conn from 127.0.0.1:60622
2020-06-24 16:46:08.371665 I | http2: server connection loop closing 127.0.0.1:60622 on 0xc0002ea380
2020-06-24 16:46:08.371760 I | http2: Transport readFrame error on conn 0xc000001980: (*errors.errorString) EOF

@fraenkel
Copy link
Contributor

I am using the simplified test above rather than yours from k8s.
Please verify that the ConnState is actually NEW because http2serverConn.serve will set the ConnState to Active then Idle and eventually Active.

@p0lyn0mial
Copy link
Author

@fraenkel I wrote a test that doesn't use k8s, PTAL p0lyn0mial/kubernetes@81314a2

@fraenkel http2serverConn should set the ConnState to Active here https://github.com/golang/go/blob/master/src/net/http/h2_bundle.go#L5539 but our point is that it's not wired and doesn't have any effect. On Shutdown the server doesn't know the current state of the connection and closes it prematurely. The server thinks that the connection is StateNew whereas it should be in StateActive.

@tkashem
Copy link

tkashem commented Jun 25, 2020

Hi @fraenkel @p0lyn0mial I have copy pasted the non k8s test below for reference.

func TestHTTP2GracefulShutdown(t *testing.T) {
	callbacks := map[string]http.HandlerFunc{}
	handler := func(w http.ResponseWriter, req *http.Request) {
		path := req.URL.Path
		callback, ok := callbacks[path]
		if ok && callback != nil {
			callback(w, req)
		}

		w.WriteHeader(http.StatusOK)
	}

	waitCh := make(chan struct{})
	callbacks["/request-1"] = func(w http.ResponseWriter, r *http.Request) {
		<-time.After(10 * time.Second)
		close(waitCh)
		<-time.After(10 * time.Second)
	}

	server := httptest.NewUnstartedServer(http.HandlerFunc(handler))
	server.EnableHTTP2 = true
	server.StartTLS()
	defer server.Close()

	var shutdownComplete = make(chan struct{})
	shutdown := func() {
		<-waitCh
		ctx, cancel := context.WithTimeout(context.Background(), 20 * time.Second )
		server.Config.Shutdown(ctx)

		close(shutdownComplete)
		cancel()
	}

	call := func(wg *sync.WaitGroup, server *httptest.Server, path string) {
		if wg != nil {
			defer wg.Done()
		}

		client := server.Client()
		resp, err := client.Get(fmt.Sprintf("%s/%s", server.URL, path))
		if err != nil {
			t.Fatalf("request failed, path=%s - %v", path, err)
		}

		defer resp.Body.Close()
		body, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			t.Fatalf("failed to read body, path=%s - %v", path, err)
		}
		t.Logf("status code: %v body %s", resp.StatusCode, body)

		if resp.StatusCode != 200 {
			t.Errorf("unexpected HTTP staus for request, path=%s: %v, expected: 200", path, resp.StatusCode)
		}
	}

	defer server.Close()
	go shutdown()

	// make one call just to set things in motion and wait for some time
	call(nil, server, "request-0")
	<-time.After(10 * time.Second)

	wg := sync.WaitGroup{}
	wg.Add(1)
	go call(&wg, server, "request-1")

	wg.Wait()
	<-shutdownComplete
}

@fraenkel
Copy link
Contributor

You are correct. The connection state is not being managed properly. I haven't quite figured out a good way to fix this yet.

@fraenkel
Copy link
Contributor

I do have a proper solution for this but its not backward compatible. This issue is that the http server sees the connection as New or Closed, the http2 server doesn't really care but fires ConnState events. A hook registered on the http server, sees the combined state events.

Shutdown looks for idle connections, and will see http2 connections as new and will pretty much kill them even when active.

One solution is to mark connections Active prior to calling any non-http handler. This would work except the httptest hook, doesn't like Active->Active transitions because the http2 server will fire an Active, Idle transition. Removing the Active event makes things work. This of course breaks backward compatibility given how the code is delivered.

@fraenkel
Copy link
Contributor

I have something that will work but I don't quite care for it. I will send it once I simplify the test case.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/240278 mentions this issue: net/http: Mark http/2 connections active

@p0lyn0mial
Copy link
Author

@fraenkel I took a look at your patch. The solution is to mark the connections as Active before calling the handler. Could you point me to the code that will mark the connections back to Idle state once finished?

@fraenkel
Copy link
Contributor

They are never marked idle. They are marked closed.
Do not confuse the fact that http/2 marks streams as idle and active. This detail flows to the ConnState hook but never to the http/1 side.

@networkimprov
Copy link

CL comment should also reference #36946

@p0lyn0mial
Copy link
Author

@fraenkel I see, it will transition to StateClosed once the graceful shutdown is completed. Many thanks for explaining and providing the fix.

@p0lyn0mial
Copy link
Author

@fraenkel do you know if the fix will be back ported?

@fraenkel
Copy link
Contributor

fraenkel commented Jul 1, 2020

I don't make the calls on back ports but this does qualify since it does not behave as documented and there is no workaround.

@networkimprov
Copy link

networkimprov commented Jul 1, 2020

Is this in stdlib or x/? Things in x/ don't need backports, since you can import the latest version.

EDIT: oh, the CL says "net/http" :-)

@dmitshur this needs milestone 1.15 or 1.16 and should be retitled "net/http: ..."

@fraenkel, anyone can file a backport request!

@networkimprov
Copy link

cc @toothrot this needs milestone 1.15 or 1.16 and should be retitled "net/http: ..."

Also needs backport to 1.13/14.

@dmitshur dmitshur changed the title x/net/http2: http server shutdown doesn't gracefully shut down HTTP2 connections net/http, x/net/http2: http server shutdown doesn't gracefully shut down HTTP2 connections Jul 2, 2020
@dmitshur dmitshur modified the milestones: Backlog, Go1.16 Jul 2, 2020
@networkimprov
Copy link

@gopherbot please backport to 1.13 & 1.14. Per @fraenkel this does not behave as documented and there is no workaround.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #40085 (for 1.13), #40086 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@toothrot
Copy link
Contributor

/cc @bradfitz as the http package owner.

@cagedmantis
Copy link
Contributor

/cc @bcmills @neild

@networkimprov
Copy link

@fraenkel I think this needs backports for impending minor releases of 1.13 & 1.14.

@fraenkel
Copy link
Contributor

@networkimprov the fix has yet to be approved

@networkimprov
Copy link

Any idea why? Is it a "risky" fix?

@ostcar

This comment has been minimized.

@networkimprov

This comment has been minimized.

@networkimprov
Copy link

The backport issues were closed. Once fixed, new backport issues should be manually posted for 1.15 & 1.14.

cc @odeke-em in case he could review the CL.

@odeke-em
Copy link
Member

odeke-em commented Sep 3, 2020 via email

@networkimprov
Copy link

@dmitshur @toothrot pls evaluate for backport :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

10 participants