Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: requests experience excessive time delays when encountering network errors #67665

Open
Taction opened this issue May 27, 2024 · 5 comments · May be fixed by golang/net#210
Open
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@Taction
Copy link

Taction commented May 27, 2024

If the client loses its connection to the server, and multiple requests are sent. There would be only one dial start connecting to the server. If the dial timeout due to network issue, only the first request who creates the dial would be finished, and the other requests would continue trying to connect to the server even if their context has been canceled due to timeout.

For getClientConn multi request would creat one call to dial server.

func (p *clientConnPool) getClientConn(req *http.Request, addr string, dialOnMiss bool) (*ClientConn, error) {
	//...
	for {
                 //....

		// if there is no active connection
		traceGetConn(req, addr)
		// for a specific addr, only one call would be created.
		call := p.getStartDialLocked(req.Context(), addr)
		p.mu.Unlock()
		<-call.done

		// If should retry dial for the request, which do not respect to request's context.
		if shouldRetryDial(call, req) {
			continue
		}
		cc, err := call.res, call.err
		if err != nil {
			return nil, err
		}
		if cc.ReserveNewRequest() {
			return cc, nil
		}
	}
}

For shouldRetryDial function, even if the request's context has been canceled, the result can be true.

func shouldRetryDial(call *dialCall, req *http.Request) bool {
	if call.err == nil {
		// No error, no need to retry
		return false
	}
	if call.ctx == req.Context() {
		// If the call has the same context as the request, the dial
		// should not be retried, since any cancellation will have come
		// from this request.
		return false
	}
	if !errors.Is(call.err, context.Canceled) && !errors.Is(call.err, context.DeadlineExceeded) {
		// If the call error is not because of a context cancellation or a deadline expiry,
		// the dial should not be retried.
		return false
	}
	// Only retry if the error is a context cancellation error or deadline expiry
	// and the context associated with the call was canceled or expired.
	return call.ctx.Err() != nil
}
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/588595 mentions this issue: http2: do not retry dial if request context canceled

@hardikmodha
Copy link

@Taction Do you have a minimal example that reproduces this issue?

@mknyszek
Copy link
Contributor

mknyszek commented Jun 3, 2024

CC @neild

@mknyszek mknyszek added this to the Unreleased milestone Jun 3, 2024
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 3, 2024
@Taction
Copy link
Author

Taction commented Jun 4, 2024

@hardikmodha Sorry for the late reply. The scenario I've encountered is when there is a network issue, some requests consume more time than the request timeout + dial timeout to finish. I created a test to simulate this situation(also committed in my pr).

http2/clientconn_test.go

// TestConnectTimeout tests that a request does not exceed request timeout + dial timeout
func TestConnectTimeout(t *testing.T) {
	tr := &Transport{
		DialTLSContext: func(ctx context.Context, network, addr string, cfg *tls.Config) (net.Conn, error) {
			// mock a net dialler with 1s timeout, encountering network issue
			// keeping dialing until timeout
			var dialer = net.Dialer{Timeout: time.Duration(-1)}
			select {
			case <-time.After(time.Second):
			case <-ctx.Done():
			}
			return dialer.DialContext(ctx, network, addr)
		},
		AllowHTTP: true,
	}

	var sg sync.WaitGroup
	parentCtx, cancel := context.WithCancel(context.Background())
	defer cancel()

	for j := 0; j < 2; j++ {
		sg.Add(1)
		go func() {
			for i := 0; i < 10000; i++ {
				sg.Add(1)
				go func() {
					ctx, _ := context.WithTimeout(parentCtx, time.Second)
					req, err := http.NewRequestWithContext(ctx, "GET", "http://127.0.0.1:80", nil)
					if err != nil {
						t.Errorf("NewRequest: %v", err)
					}

					start := time.Now()
					tr.RoundTrip(req)
					duration := time.Since(start)
					// duration should not exceed request timeout + dial timeout
					if duration > 2*time.Second {
						t.Errorf("RoundTrip took %s; want <2s", duration.String())
					}
					sg.Done()
				}()
				time.Sleep(1 * time.Millisecond)
			}
			sg.Done()
		}()
	}
	sg.Wait()
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/591276 mentions this issue: http2: allow simulating transport dial failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants