Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: apparent TestServerAllowsBlockingRemoteAddr flakes due to hard-coded timeouts #36179

Closed
bcmills opened this issue Dec 17, 2019 · 16 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Dec 17, 2019

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)

2019-12-16T20:38:31-f7f9866/plan9-386-0intro
2019-06-12T14:58:18-65f53da/plan9-amd64-9front

--- FAIL: TestServerAllowsBlockingRemoteAddr (1.08s)
    serve_test.go:1300: Request 1: Get "http://127.0.0.1:55517": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    serve_test.go:1349: response 1 addr = ""; want "RA:21.21.21.21:21"
http.test 1018696: warning: process exceeds 100 file descriptors
FAIL
FAIL	net/http	114.189s

As far as I can tell, the root cause is the hard-coded time.Second here:

go/src/net/http/serve_test.go

Lines 1326 to 1330 in 931fe39

select {
case conn2 = <-conns:
case <-time.After(time.Second):
t.Fatal("Second Accept didn't happen")
}

It's not obvious to me why a timeout is needed there at all. If the test deadlocks, we presumably want a goroutine dump anyway.

CC @bradfitz @0intro @fhs

@bcmills bcmills added Testing An issue that has been verified to require only test changes, not just a test failure. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 17, 2019
@bcmills bcmills added this to the Backlog milestone Dec 17, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Dec 17, 2019

That's the regression test for #12943.

@howjmay
Copy link
Contributor

howjmay commented Feb 29, 2020

Is this one still one still open

@gopherbot

This comment was marked as outdated.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 31, 2023

The failure log linked in #57995 (comment) matches this issue; I'm not sure why watchflakes isn't posting it here.

@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2023-01-17 18:12 dragonfly-amd64-622 go@c0799f70 net/http (log)
1 second passes in backend, proxygone= false
panic: test timed out after 3m0s
running tests:
	TestServerAllowsBlockingRemoteAddr (2m59s)
	TestServerAllowsBlockingRemoteAddr/h1 (2m54s)

goroutine 21276 [running]:
panic({0x848280, 0xc00018a8b0})
	/tmp/workdir/go/src/runtime/panic.go:987 +0x3bb fp=0xc000653f10 sp=0xc000653e50 pc=0x436b7b
testing.(*M).startAlarm.func1()
	/tmp/workdir/go/src/testing/testing.go:2241 +0x3b9 fp=0xc000653fe0 sp=0xc000653f10 pc=0x4fd339
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000653fe8 sp=0xc000653fe0 pc=0x46b201
created by time.goFunc
	/tmp/workdir/go/src/time/sleep.go:176 +0x32

watchflakes

@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2023-02-24 19:16 dragonfly-amd64-622 go@8e5f56a2 net/http (log)
panic: test timed out after 3m0s
running tests:
	TestServerAllowsBlockingRemoteAddr (3m0s)
	TestServerAllowsBlockingRemoteAddr/h1 (2m57s)

goroutine 21822 [running]:
panic({0x821680?, 0xc00108c720?})
	/tmp/workdir/go/src/runtime/panic.go:1015 +0x3ac fp=0xc000627f10 sp=0xc000627e60 pc=0x43564c
testing.(*M).startAlarm.func1()
	/tmp/workdir/go/src/testing/testing.go:2244 +0x3b4 fp=0xc000627fe0 sp=0xc000627f10 pc=0x4f7554
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000627fe8 sp=0xc000627fe0 pc=0x4693c1
created by time.goFunc
	/tmp/workdir/go/src/time/sleep.go:176 +0x2d

watchflakes

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2023

Interesting! Those dragonfly failures look like actual deadlocks, not just spurious timeouts.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 27, 2023

Nope, it is still the same spurious timeout:
https://cs.opensource.google/go/go/+/master:src/net/http/serve_test.go;l=1362;drc=133e0bca0be23980604ee98589ddcacdb1262afd

It's just manifesting as a deadlock due to some synchronous cleanup code.

@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2023-02-27 22:37 darwin-amd64-13 go@5cded8b3 net/http (log)
panic: test timed out after 3m0s
running tests:
	TestServerAllowsBlockingRemoteAddr (2m59s)
	TestServerAllowsBlockingRemoteAddr/h1 (2m42s)

goroutine 21340 [running]:
panic({0x1421c00?, 0xc0012fd4d0?})
	/tmp/buildlet/go/src/runtime/panic.go:1015 +0x3ac fp=0xc00038ef10 sp=0xc00038ee60 pc=0x10347ac
testing.(*M).startAlarm.func1()
	/tmp/buildlet/go/src/testing/testing.go:2244 +0x3b4 fp=0xc00038efe0 sp=0xc00038ef10 pc=0x10f6e74
runtime.goexit()
	/tmp/buildlet/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00038efe8 sp=0xc00038efe0 pc=0x1069981
created by time.goFunc
	/tmp/buildlet/go/src/time/sleep.go:176 +0x2d

watchflakes

@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2023-03-01 21:23 freebsd-arm-paulzhol go@09852e75 net/http.TestServerAllowsBlockingRemoteAddr (log)
--- FAIL: TestServerAllowsBlockingRemoteAddr (0.00s)
    --- FAIL: TestServerAllowsBlockingRemoteAddr/h1 (1.15s)
        serve_test.go:1333: Request 1: Get "http://127.0.0.1:53309": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        serve_test.go:1382: response 1 addr = ""; want "RA:21.21.21.21:21"

watchflakes

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/474582 mentions this issue: net/http: remove arbitrary timeout in TestServerAllowsBlockingRemoteAddr

@bcmills bcmills self-assigned this Mar 9, 2023
@bcmills bcmills modified the milestones: Backlog, Go1.21 Mar 9, 2023
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2023-03-13 18:55 dragonfly-amd64-622 go@778627f3 net/http.TestServerAllowsBlockingRemoteAddr (log)
--- FAIL: TestServerAllowsBlockingRemoteAddr (0.00s)
    --- FAIL: TestServerAllowsBlockingRemoteAddr/h1 (1.01s)
        serve_test.go:1333: Request 1: Get "http://127.0.0.1:64872": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        serve_test.go:1376: response 1 addr = ""; want "RA:21.21.21.21:21"

watchflakes

@gopherbot gopherbot reopened this Mar 13, 2023
@bcmills
Copy link
Contributor Author

bcmills commented Mar 13, 2023

Oops, missed another arbitrary timeout.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/476035 mentions this issue: net/http: remove more arbitrary timeouts from server tests

gopherbot pushed a commit that referenced this issue Mar 13, 2023
This change eliminates the easy, arbitrary timouts that should
never happen. It leaves in place a couple of more complicated ones
that will probably need retry loops for robustness.

For #49336.
For #36179.

Change-Id: I657ef223a66461413a915da5ce9150f49acec04a
Reviewed-on: https://go-review.googlesource.com/c/go/+/476035
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
@bcmills
Copy link
Contributor Author

bcmills commented Mar 13, 2023

Hopefully completely fixed this time, by CL 476035.

@bcmills bcmills closed this as completed Mar 13, 2023
@golang golang locked and limited conversation to collaborators Mar 12, 2024
@gopherbot gopherbot reopened this Apr 12, 2024
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && (test == "TestServerAllowsBlockingRemoteAddr" || `^\s+TestServerAllowsBlockingRemoteAddr \(\d+m`) && (`panic: test timed out` || `context deadline exceeded`)
2024-04-03 15:35 android-386-emu go@d8392e69 net/http (log)
panic: test timed out after 3m0s
running tests:
	Test304Responses (2m58s)
	TestBadResponseAfterReadingBody (2m58s)
	TestBidiStreamReverseProxy (2m58s)
	TestCancelRequestMidBody (2m58s)
	TestCancelRequestWhenSharingConnection (2m59s)
	TestCancelRequestWithChannel (2m59s)
	TestCancelRequestWithChannelBeforeDo_Cancel (2m59s)
	TestCancelRequestWithChannelBeforeDo_Context (2m59s)
...
	TestWriteHeader0 (2m58s)
	TestWriteHeaderNoCodeCheck (2m58s)
	TestWriteHeaderSwitchingProtocols (2m59s)
	TestZeroLengthPostAndResponse (2m59s)

goroutine 24228 [running]:
testing.(*M).startAlarm.func1()
	/workdir/go/src/testing/testing.go:2259 +0x3e1
created by time.goFunc
	/workdir/go/src/time/sleep.go:176 +0x35
2024-04-03 18:01 android-386-emu go@74501172 net/http (log)
panic: test timed out after 3m0s
running tests:
	Test304Responses (2m58s)
	TestBadResponseAfterReadingBody (2m58s)
	TestBidiStreamReverseProxy (2m58s)
	TestCancelRequestMidBody (2m58s)
	TestChunkedResponseHeaders (2m58s)
	TestClient (2m58s)
	TestClientAltersCookiesOnRedirect (2m58s)
	TestClientCallsCloseOnlyOnce (2m58s)
...
	TestWriteDeadlineExtendedOnNewRequest (2m58s)
	TestWriteHeader0 (2m58s)
	TestWriteHeaderNoCodeCheck (2m58s)
	TestZeroLengthPostAndResponse (2m58s)

goroutine 31241 [running]:
testing.(*M).startAlarm.func1()
	/workdir/go/src/testing/testing.go:2259 +0x3e1
created by time.goFunc
	/workdir/go/src/time/sleep.go:176 +0x35

watchflakes

@dmitshur dmitshur removed this from Test Flakes May 30, 2024
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants