
net: Panics in TCP/UDP with Go 1.23 due to IPv6 link-local zone mishandling #69397

Open
hdm opened this issue Sep 11, 2024 · 12 comments

Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@hdm

hdm commented Sep 11, 2024

Go version

go 1.23.1 linux/amd64

Output of go env in your module/workspace:

So far we have only seen this on Linux, but the bug appears to be present on macOS as well. The builds that crash use CGO_ENABLED='0'.

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/user/Library/Caches/go-build'
GOENV='/Users/user/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/user/go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/user/go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/user/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='0'
GOMOD='/Users/user/go/platform/go.mod'
GOWORK='/Users/user/go/platform/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/yv/hdswp_fs15x1tlgz6fzw0wyh0000gn/T/go-build2162826536=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

In an application that makes many TCP connections and sends many UDP packets, some of which target link-local IPv6 destinations, two panics started to appear with Go 1.23.

What did you see happen?

UDP sockets:

panic({0x269cdc0?, 0x576cff0?})
    /opt/hostedtoolcache/go/1.23.1/x64/src/runtime/panic.go:785 +0x132
net.dtoi(...)
    /opt/hostedtoolcache/go/1.23.1/x64/src/net/parse.go:132
net.(*ipv6ZoneCache).index(0xc0032c5a50?, {0x28, 0x29})
    /opt/hostedtoolcache/go/1.23.1/x64/src/net/interface.go:267 +0x1b3
net.ipToSockaddrInet6({0xc0032c5a50?, 0x10?, 0x10?}, 0x35, {0x28, 0x29})
    /opt/hostedtoolcache/go/1.23.1/x64/src/net/ipsock_posix.go:203 +0x1e8
net.(*UDPConn).writeTo(0xc00be18138, {0xc00d29e800, 0x38, 0x38}, 0xc002b3acb0?)
    /opt/hostedtoolcache/go/1.23.1/x64/src/net/udpsock_posix.go:129 +0x90
net.(*UDPConn).WriteTo(0xc00be18138, {0xc00d29e800?, 0x5b67560?, 0x5b67560?}, {0x3616e00?, 0xc006bfce40})
    /opt/hostedtoolcache/go/1.23.1/x64/src/net/udpsock.go:243 +0x56

The net.dtoi() code:

// Bigger than we need, not too big to worry about overflow
const big = 0xFFFFFF

// Decimal to integer.
// Returns number, characters consumed, success.
func dtoi(s string) (n int, i int, ok bool) {
	n = 0
	for i = 0; i < len(s) && '0' <= s[i] && s[i] <= '9'; i++ { // parse.go:132, the line reported in the UDP panic
		n = n*10 + int(s[i]-'0')
		if n >= big {
			return big, i, false
		}
	}
	if i == 0 {
		return 0, 0, false
	}
	return n, i, true
}
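In the UDP trace, ipToSockaddrInet6 calls ipv6ZoneCache.index to turn the zone string into the numeric scope ID the socket address needs, and dtoi is used as a last resort when the zone is not a known interface name. The sketch below approximates that lookup order with public APIs only. zoneToIndex is a hypothetical name; the real cache refreshes itself and differs in details (for example, dtoi accepts a numeric prefix such as "3abc", while strconv.Atoi does not).

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// zoneToIndex is a hypothetical, simplified version of the zone lookup:
// try the zone as an interface name first, then fall back to treating it
// as a decimal interface index. Unknown zones map to 0.
func zoneToIndex(zone string) int {
	if zone == "" {
		return 0 // no zone: scope ID 0
	}
	if ifi, err := net.InterfaceByName(zone); err == nil {
		return ifi.Index
	}
	// Unlike the real dtoi, strconv.Atoi rejects trailing non-digits.
	if n, err := strconv.Atoi(zone); err == nil && n >= 0 {
		return n
	}
	return 0
}

func main() {
	for _, z := range []string{"", "3", "no-such-interface0"} {
		fmt.Printf("%q -> %d\n", z, zoneToIndex(z))
	}
}
```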

TCP sockets:

panic({0x269cdc0?, 0x576cff0?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/runtime/panic.go:785 +0x132
net.(*TCPAddr).String(0xc007760810)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock.go:50 +0xc5
net.(*netFD).dial(0xc00362c300, {0x3628468, 0xc0011284d0}, {0x36311c0, 0x0}, {0x36311c0, 0xc007760810}, 0xc00aee6038)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/sock_posix.go:98 +0x119
net.socket({0x3628468, 0xc0011284d0}, {0x2d38e59, 0x3}, 0xa, 0x1, 0x4151eb?, 0x0, {0x36311c0, 0x0}, ...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/sock_posix.go:70 +0x29b
net.internetSocket({0x3628468, 0xc0011284d0}, {0x2d38e59, 0x3}, {0x36311c0, 0x0}, {0x36311c0, 0xc007760810}, 0x1, 0x0, ...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/ipsock_posix.go:167 +0xf8
net.(*sysDialer).doDialTCPProto(0xc0059a29c0, {0x3628468, 0xc0011284d0}, 0x0, 0xc007760810, 0x0)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:85 +0xec
net.(*sysDialer).doDialTCP(...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:75
net.(*sysDialer).dialTCP(0xc00aee60c0?, {0x3628468?, 0xc0011284d0?}, 0x25f4820?, 0xc00aee6130?)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:71 +0x69
net.(*sysDialer).dialSingle(0xc0059a29c0, {0x3628468, 0xc0011284d0}, {0x3616fb8, 0xc007760810})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:670 +0x27d
net.(*sysDialer).dialSerial(0xc0059a29c0, {0x3628468, 0xc0011284d0}, {0xc00c60faf0?, 0x1, 0xc0077607e0?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:635 +0x24e
net.(*sysDialer).dialParallel(0xc00c60fae0?, {0x3628468?, 0xc0011284d0?}, {0xc00c60faf0?, 0xc0011284d0?, 0x2d3b265?}, {0x0?, 0x2d38e59?, 0x698a0fdae4d?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:536 +0x3a7
net.(*Dialer).DialContext(0xc00aee6740, {0x3628238, 0x60b40e0}, {0x2d38e59, 0x3}, {0xc01361bef0, 0x24})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:527 +0x6a5
net.(*Dialer).Dial(...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:453

Both panics occur infrequently (2-3 times a day across millions of connections per day) and appear to be related to the IPv6 link-local zone cache.

What did you expect to see?

No panics.

@mateusz834 mateusz834 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 11, 2024
@mateusz834
Member

CC @ianlancetaylor

@mateusz834
Member

I wonder whether this might be related to the unique package; since Go 1.23, the net/netip package uses it for Zone string interning.

@ianlancetaylor
Contributor

Please show us the full panic message, so that we can see the addresses involved. Thanks.

CC @mknyszek

@hdm
Author

hdm commented Sep 11, 2024

Sure thing. The sanitized version is roughly this:

The target address was specified as [fe80::6666:66ff:fe66:6666%interfaceName]:443. The zone (interfaceName) is not shown by our panic handler; it is retrieved, prior to making the Dial, from an application cache that tracks which zone goes with which destination.

panic processing fe80::6666:66ff:fe66:6666:443: runtime error: invalid memory address or nil pointer dereference : goroutine 231565 [running]:
runtime/debug.Stack()
	/opt/hostedtoolcache/go/1.23.1/x64/src/runtime/debug/stack.go:26 +0x5e
github.com/company/platform/product.(*Processor).ProcessSingleTarget.func1()
	/home/runner/work/platform/platform/product/tcp.go:542 +0x1a6
panic({0x269cdc0?, 0x576cff0?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/runtime/panic.go:785 +0x132
net.(*TCPAddr).String(0xc007760810)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock.go:50 +0xc5
net.(*netFD).dial(0xc00362c300, {0x3628468, 0xc0011284d0}, {0x36311c0, 0x0}, {0x36311c0, 0xc007760810}, 0xc00aee6038)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/sock_posix.go:98 +0x119
net.socket({0x3628468, 0xc0011284d0}, {0x2d38e59, 0x3}, 0xa, 0x1, 0x4151eb?, 0x0, {0x36311c0, 0x0}, ...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/sock_posix.go:70 +0x29b
net.internetSocket({0x3628468, 0xc0011284d0}, {0x2d38e59, 0x3}, {0x36311c0, 0x0}, {0x36311c0, 0xc007760810}, 0x1, 0x0, ...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/ipsock_posix.go:167 +0xf8
net.(*sysDialer).doDialTCPProto(0xc0059a29c0, {0x3628468, 0xc0011284d0}, 0x0, 0xc007760810, 0x0)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:85 +0xec
net.(*sysDialer).doDialTCP(...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:75
net.(*sysDialer).dialTCP(0xc00aee60c0?, {0x3628468?, 0xc0011284d0?}, 0x25f4820?, 0xc00aee6130?)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/tcpsock_posix.go:71 +0x69
net.(*sysDialer).dialSingle(0xc0059a29c0, {0x3628468, 0xc0011284d0}, {0x3616fb8, 0xc007760810})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:670 +0x27d
net.(*sysDialer).dialSerial(0xc0059a29c0, {0x3628468, 0xc0011284d0}, {0xc00c60faf0?, 0x1, 0xc0077607e0?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:635 +0x24e
net.(*sysDialer).dialParallel(0xc00c60fae0?, {0x3628468?, 0xc0011284d0?}, {0xc00c60faf0?, 0xc0011284d0?, 0x2d3b265?}, {0x0?, 0x2d38e59?, 0x698a0fdae4d?})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:536 +0x3a7
net.(*Dialer).DialContext(0xc00aee6740, {0x3628238, 0x60b40e0}, {0x2d38e59, 0x3}, {0xc01361bef0, 0x24})
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:527 +0x6a5
net.(*Dialer).Dial(...)
	/opt/hostedtoolcache/go/1.23.1/x64/src/net/dial.go:453
github.com/company/platform/product.(*Processor).ConnectToTarget(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp.go:602 +0x35b
github.com/company/platform/product.(*Processor).ReconnectToTargetWithLimit(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp.go:652 +0x9f
github.com/company/platform/product.(*Processor).ReconnectToTarget(...)
	/home/runner/work/platform/platform/product/tcp.go:645
github.com/company/platform/product.(*Processor).tlsFuncTarget(0x0?, 0xc00aee7048, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp_tls_func.go:751 +0xdf
github.com/company/platform/product.(*Processor).tlsFuncMaxFragments(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp_tls_func.go:557 +0x1a5
github.com/company/platform/product.(*Processor).ProbeTLSRunFuncs(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp_tls_func.go:456 +0xa7
github.com/company/platform/product.(*Processor).ProbeTLSFunc(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp_tls_func.go:201 +0x24e
github.com/company/platform/product.(*Processor).ProbeTLS(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp_tls.go:12 +0x12b
github.com/company/platform/product.(*Processor).GetInformationMoreTLS(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp.go:3325 +0xa48
github.com/company/platform/product.(*Processor).GatherInformation(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...}, ...)
	/home/runner/work/platform/platform/product/tcp.go:3198 +0x4268
github.com/company/platform/product.(*Processor).ProcessTarget(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...})
	/home/runner/work/platform/platform/product/tcp.go:722 +0x5bb
github.com/company/platform/product.(*Processor).ProcessSingleTarget(0xc0022cf1e0, {{0xc00d70a7d0, 0x10, 0x10}, 0x1bb, {0x0, 0x0}, {0x0, 0x0}, {0x2d38a81, ...}, ...})
	/home/runner/work/platform/platform/product/tcp.go:545 +0x298
github.com/company/platform/product.(*Processor).ProcessTargets(0xc0022cf1e0)
	/home/runner/work/platform/platform/product/tcp.go:507 +0x126
created by github.com/company/platform/product.NewProcessor in goroutine 231049
	/home/runner/work/platform/platform/product/tcp.go:184 +0x353

@ianlancetaylor
Contributor

Thanks. I was really hoping to see the address. I would expect the output to start with something like:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x466c67]

Do you see anything like that "signal SIGSEGV" line?

@hdm
Author

hdm commented Sep 11, 2024

@ianlancetaylor Ah, sorry, this panic was caught by our goroutine's recover() and did not crash the process (unlike an unrecovered runtime panic). As a result there is no SIGSEGV line or similar; the trace above comes from the recover() error print with a debug.Stack() dump.
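The reporter's handler is not shown; this is a minimal sketch of the pattern described, using a hypothetical helper runRecovered. A recovered nil-pointer dereference surfaces as a runtime error value ("invalid memory address or nil pointer dereference"), and debug.Stack supplies the trace, which is why the "[signal SIGSEGV ...]" header printed for an unrecovered crash does not appear.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runRecovered runs fn and converts a panic into an error containing the
// panic value and a stack dump, instead of crashing the process.
func runRecovered(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v\n%s", r, debug.Stack())
		}
	}()
	fn()
	return nil
}

// sink keeps the dereference below from being optimized away.
var sink int

func main() {
	var p *int
	err := runRecovered(func() { sink = *p }) // nil pointer dereference
	fmt.Println(err != nil)                   // true
}
```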

@ianlancetaylor
Copy link
Contributor

Ah, OK, thanks.

@ianlancetaylor ianlancetaylor added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 11, 2024
@ianlancetaylor
Contributor

I agree that this must be somehow related to the way that net/netip uses the unique package in 1.23.

@mknyszek mknyszek self-assigned this Sep 18, 2024
@mknyszek
Copy link
Contributor

I agree with @ianlancetaylor's assessment. I suspect this is related to #69210 whose fix will be in Go 1.23.2. Is there any chance you could try at tip-of-tree? If you're still seeing failures even so, I'm happy to help narrow them down.

@mknyszek mknyszek added this to the Go1.24 milestone Sep 18, 2024
@hdm
Copy link
Author

hdm commented Sep 18, 2024

@mknyszek Thanks! So far we are only seeing this in a very small percentage of cases in customer environments. We may not be able to use a tip build, but I will keep trying to come up with a minimal reproducer.

@mknyszek
Copy link
Contributor

Thanks. For the record, I think the failure mode in #69210 is a direct result of using the unique package very heavily, directly. I think it's much less likely in an application doing a whole bunch of other things too. I'd expect to see just semi-random failures on an attempt to access unique-ified data as you see here. Apologies for the breakage!
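Since the suspected failure mode involves heavy use of the unique package, a starting point for a reproducer attempt might be a stress loop that interns zone-like strings from several goroutines while forcing GC, then checks that each handle still returns its original value. This is a hypothetical sketch (stressUnique is an illustrative name), not a confirmed reproducer for this issue; on a release that includes the #69210 fix it is expected to report no mismatches.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"unique"
)

// stressUnique interns zone-like strings concurrently, forcing periodic GC,
// and returns the first mismatch between a handle's Value and the input.
func stressUnique(workers, iters int) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // each worker sends at most one error
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				want := fmt.Sprintf("eth%d-%d", w, i%16)
				h := unique.Make(want)
				if i%100 == 0 {
					runtime.GC() // encourage the runtime to reclaim unused handles
				}
				if got := h.Value(); got != want {
					errs <- fmt.Errorf("worker %d: got %q, want %q", w, got, want)
					return
				}
			}
		}(w)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if the channel is empty
}

func main() {
	if err := stressUnique(8, 2000); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("ok")
}
```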
