Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGSEGV via go test with -race -cover and -tags netgo on alpine linux/arm64 #59369

Open
GeorgeMac opened this issue Apr 1, 2023 · 16 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@GeorgeMac
Copy link

unexpected fault address 0x1412000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1412000 pc=0xa7e50]

goroutine 7 [running]:
runtime.throw({0x28990c?, 0x0?})
	/usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0xc000204b40 sp=0xc000204b10 pc=0x72540
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:851 +0x1a8 fp=0xc000204b80 sp=0xc000204b40 pc=0x8a638
racecallatomic()
	/usr/local/go/src/runtime/race_arm64.s:351 +0x10 fp=0xc000204ba0 sp=0xc000204b90 pc=0xa7e50
sync/atomic.AddInt32()
	/usr/local/go/src/runtime/race_arm64.s:286 +0x18 fp=0xc000204bb0 sp=0xc000204ba0 pc=0xa8248
sync/atomic.AddUint32(0x83d91a7?, 0xdbba11c8?)
	<autogenerated>:1 +0x18 fp=0xc000204be0 sp=0xc000204bb0 pc=0xa96d8
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00004b0b0, {0xc000204de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:43 +0x280 fp=0xc000204d80 sp=0xc000204be0 pc=0x22b5b0
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000204f20 sp=0xc000204d80 pc=0x22b200
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000204fd0 sp=0xc000204f20 pc=0x22a450
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000204fd0 sp=0xc000204fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108

What version of Go are you using (go version)?

$ go version
go version go1.20.2 linux/arm64

Does this issue reproduce with the latest release?

Yes. Infact I can't reproduce it on go 1.20.1

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/src/go.mod"
GOWORK="/src/go.work"
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3327561268=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I can comfortably reproduce it in this particular open source package:
https://github.com/flipt-io/flipt/tree/main/internal/storage/oplock/memory

I can consistently reproduce a SIGSEV when I perform go test -cover -race -tags netgo ..

It requires all three of these flags to be present to trigger it.

What did you expect to see?

No segfault. Works for golang:1.20.1-alpine3.16 docker image. However, golang:1.20.2-alpine3:16 fails consistently with the error seen above.

What did you see instead?

A SIGSEGV fatal error.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 1, 2023
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 3, 2023
@mknyszek mknyszek added this to the Backlog milestone Apr 3, 2023
@ianlancetaylor
Copy link
Member

CC @thanm

@thanm
Copy link
Contributor

thanm commented Apr 4, 2023

Thanks for the report.

I tried reproducing this on our linux/arm64 builder, but so far no luck.

It is a bit surprising that you're seeing this only with Go 1.20.2 and not Go 1.20.1. There is a coverage related bugfix in that sequence of commits (202a1a5706 ... aee9a19c55) but it is in the data reader, has nothing to do with the runtime.

I looked at the assembly we're generating for the function in question (go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire)and I don't see any immediate red flags.

Not sure what to do next to debug this, without a way to repro the crash.

@GeorgeMac
Copy link
Author

Hmmm intruiging. The only other thing of note was that I could only repro this on alpine in particular.
Im definitely way out of my depth, but could the whole Alpine uses libc-musl be a part of the equation?
Could that be a difference to the linux/arm64 builder you used?

@bcmills
Copy link
Contributor

bcmills commented Apr 4, 2023

We do have a linux-amd64-alpine builder. @thanm, you might try a gomote on that and see if it reproduces there.

@bcmills
Copy link
Contributor

bcmills commented Apr 4, 2023

(Probably not, though, what with it being amd64 instead of arm64. 🤷‍♂️)

@bcmills
Copy link
Contributor

bcmills commented Apr 4, 2023

@GeorgeMac, I suspect that this could be a bad interaction between coverage flushing and a race in your test.

The test function can exit the goroutine here without ever waiting for the errgroup result:
https://github.com/flipt-io/flipt/blob/f04ddb10cdd98cec122ca3242a67743bec557eef/internal/storage/oplock/testing/testing.go#L77

If it takes that exit path, then it seems plausible that the goroutine in the errgroup may race with the test's main completing and flushing coverage results. Maybe that triggers some other bug in coverage reporting?

@bcmills
Copy link
Contributor

bcmills commented Apr 4, 2023

It would be helpful to see what other goroutines are running at the time of the crash.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 4, 2023
@GeorgeMac
Copy link
Author

Here are all the goroutine stacks from the crash (below), is there something more I can do to get you information on other goroutines at the time of the crash?

I just changed the require to an assert too (to avoid exiting before the Wait) and I get the same crash:

unexpected fault address 0x1412000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1412000 pc=0xa7e50]

goroutine 36 [running]:
runtime.throw({0x28990c?, 0x0?})
	/usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0xc000041b40 sp=0xc000041b10 pc=0x72540
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:851 +0x1a8 fp=0xc000041b80 sp=0xc000041b40 pc=0x8a638
racecallatomic()
	/usr/local/go/src/runtime/race_arm64.s:351 +0x10 fp=0xc000041ba0 sp=0xc000041b90 pc=0xa7e50
sync/atomic.AddInt32()
	/usr/local/go/src/runtime/race_arm64.s:286 +0x18 fp=0xc000041bb0 sp=0xc000041ba0 pc=0xa8248
sync/atomic.AddUint32(0x6fa63f5?, 0xdbbe4017?)
	<autogenerated>:1 +0x18 fp=0xc000041be0 sp=0xc000041bb0 pc=0xa96d8
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00011b080, {0xc000041de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:43 +0x280 fp=0xc000041d80 sp=0xc000041be0 pc=0x22b480
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000041f20 sp=0xc000041d80 pc=0x22b0d0
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000041fd0 sp=0xc000041f20 pc=0x22a320
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000041fd0 sp=0xc000041fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108

goroutine 1 [chan receive]:
runtime.gopark(0x522e70?, 0xffffffff?, 0x8?, 0x77?, 0x49a2c?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000187670 sp=0xc000187650 pc=0x74fa0
runtime.chanrecv(0xc0001763f0, 0xc00018778e, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x408 fp=0xc000187710 sp=0xc000187670 pc=0x42b48
runtime.chanrecv1(0x461060?, 0x2466a0?)
	/usr/local/go/src/runtime/chan.go:442 +0x14 fp=0xc000187740 sp=0xc000187710 pc=0x42704
testing.(*T).Run(0xc00010ed00, {0x28b812, 0xc}, 0x2b4b08)
	/usr/local/go/src/testing/testing.go:1630 +0x5d4 fp=0xc000187880 sp=0xc000187740 pc=0x180b64
testing.runTests.func1(0x0?)
	/usr/local/go/src/testing/testing.go:2036 +0x84 fp=0xc0001878e0 sp=0xc000187880 pc=0x183ab4
testing.tRunner(0xc00010ed00, 0xc000187b38)
	/usr/local/go/src/testing/testing.go:1576 +0x184 fp=0xc0001879f0 sp=0xc0001878e0 pc=0x17fa14
testing.runTests(0xc000131180?, {0x456330, 0x1, 0x1}, {0xc000187c08?, 0x49a2c?, 0x461420?})
	/usr/local/go/src/testing/testing.go:2034 +0x6ac fp=0xc000187b60 sp=0xc0001879f0 pc=0x18394c
testing.(*M).Run(0xc000131180)
	/usr/local/go/src/testing/testing.go:1906 +0x8e4 fp=0xc000187ec0 sp=0xc000187b60 pc=0x1819c4
main.main()
	_testmain.go:78 +0x2d4 fp=0xc000187f70 sp=0xc000187ec0 pc=0x22bfa4
runtime.main()
	/usr/local/go/src/runtime/proc.go:250 +0x200 fp=0xc000187fd0 sp=0xc000187f70 pc=0x74b80
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000187fd0 sp=0xc000187fd0 pc=0xa7134

goroutine 2 [force gc (idle)]:
runtime.gopark(0x461c40?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc0000327a0 sp=0xc000032780 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc0000327d0 sp=0xc0000327a0 pc=0x74df0
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc0000327d0 sp=0xc0000327d0 pc=0xa7134
created by runtime.init.6
	/usr/local/go/src/runtime/proc.go:293 +0x24

goroutine 18 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000040f60 sp=0xc000040f40 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:278 +0x98 fp=0xc000040fb0 sp=0xc000040f60 pc=0x5fca8
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:178 +0x28 fp=0xc000040fd0 sp=0xc000040fb0 pc=0x54d88
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000040fd0 sp=0xc000040fd0 pc=0xa7134
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:178 +0x6c

goroutine 19 [GC scavenge wait]:
runtime.gopark(0xc00008e000?, 0x2f2b30?, 0x1?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000104f50 sp=0xc000104f30 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x4614a0)
	/usr/local/go/src/runtime/mgcscavenge.go:400 +0x5c fp=0xc000104f80 sp=0xc000104f50 pc=0x5dc5c
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:628 +0x44 fp=0xc000104fb0 sp=0xc000104f80 pc=0x5e174
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:179 +0x28 fp=0xc000104fd0 sp=0xc000104fb0 pc=0x54d28
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000104fd0 sp=0xc000104fd0 pc=0xa7134
created by runtime.gcenable
	/usr/local/go/src/runtime/mgc.go:179 +0xac

goroutine 34 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000046d80 sp=0xc000046d60 pc=0x74fa0
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x124 fp=0xc000046fd0 sp=0xc000046d80 pc=0x53e54
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000046fd0 sp=0xc000046fd0 pc=0xa7134
created by runtime.createfing
	/usr/local/go/src/runtime/mfinal.go:163 +0x80

goroutine 35 [chan receive]:
runtime.gopark(0x0?, 0xc000202030?, 0x18?, 0x0?, 0x4718?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000045b40 sp=0xc000045b20 pc=0x74fa0
runtime.chanrecv(0xc000134240, 0xc000045e00, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x408 fp=0xc000045be0 sp=0xc000045b40 pc=0x42b48
runtime.chanrecv2(0x0?, 0x1?)
	/usr/local/go/src/runtime/chan.go:447 +0x14 fp=0xc000045c10 sp=0xc000045be0 pc=0x42724
go.flipt.io/flipt/internal/storage/oplock/testing.Harness(0xc00010eea0, {0x2f4f50?, 0xc00011b080})
	/src/internal/storage/oplock/testing/testing.go:66 +0x5f0 fp=0xc000045e40 sp=0xc000045c10 pc=0x22aab0
go.flipt.io/flipt/internal/storage/oplock/memory.Test_Harness(0x0?)
	/src/internal/storage/oplock/memory/memory_test.go:10 +0xf8 fp=0xc000045e90 sp=0xc000045e40 pc=0x22b918
testing.tRunner(0xc00010eea0, 0x2b4b08)
	/usr/local/go/src/testing/testing.go:1576 +0x184 fp=0xc000045fa0 sp=0xc000045e90 pc=0x17fa14
testing.(*T).Run.func1()
	/usr/local/go/src/testing/testing.go:1629 +0x44 fp=0xc000045fd0 sp=0xc000045fa0 pc=0x180d54
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000045fd0 sp=0xc000045fd0 pc=0xa7134
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1629 +0x5b8

goroutine 37 [sync.Mutex.Lock]:
runtime.gopark(0x496b40?, 0xffffaf070f40?, 0x68?, 0xb?, 0x4f150?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000100ad0 sp=0xc000100ab0 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.semacquire1(0xc00011b084, 0x80?, 0x3, 0x1, 0xbc?)
	/usr/local/go/src/runtime/sema.go:160 +0x20c fp=0xc000100b30 sp=0xc000100ad0 pc=0x878ac
sync.runtime_SemacquireMutex(0xc000100b88?, 0x3c?, 0xc000100bb8?)
	/usr/local/go/src/runtime/sema.go:77 +0x28 fp=0xc000100b70 sp=0xc000100b30 pc=0xa3328
sync.(*Mutex).lockSlow(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:171 +0x1e8 fp=0xc000100bc0 sp=0xc000100b70 pc=0xc1ce8
sync.(*Mutex).Lock(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:90 +0x60 fp=0xc000100be0 sp=0xc000100bc0 pc=0xc1ad0
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00011b080, {0xc000100de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:29 +0xa4 fp=0xc000100d80 sp=0xc000100be0 pc=0x22b2a4
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000100f20 sp=0xc000100d80 pc=0x22b0d0
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000100fd0 sp=0xc000100f20 pc=0x22a320
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000100fd0 sp=0xc000100fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108

goroutine 38 [sync.Mutex.Lock]:
runtime.gopark(0x465cc0?, 0x10?, 0x60?, 0xc0?, 0x49144?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000047ad0 sp=0xc000047ab0 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.semacquire1(0xc00011b084, 0x80?, 0x3, 0x1, 0x4c?)
	/usr/local/go/src/runtime/sema.go:160 +0x20c fp=0xc000047b30 sp=0xc000047ad0 pc=0x878ac
sync.runtime_SemacquireMutex(0xc000047b88?, 0x3c?, 0xc000047bb8?)
	/usr/local/go/src/runtime/sema.go:77 +0x28 fp=0xc000047b70 sp=0xc000047b30 pc=0xa3328
sync.(*Mutex).lockSlow(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:171 +0x1e8 fp=0xc000047bc0 sp=0xc000047b70 pc=0xc1ce8
sync.(*Mutex).Lock(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:90 +0x60 fp=0xc000047be0 sp=0xc000047bc0 pc=0xc1ad0
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00011b080, {0xc000047de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:29 +0xa4 fp=0xc000047d80 sp=0xc000047be0 pc=0x22b2a4
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000047f20 sp=0xc000047d80 pc=0x22b0d0
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000047fd0 sp=0xc000047f20 pc=0x22a320
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000047fd0 sp=0xc000047fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108

goroutine 39 [sync.Mutex.Lock]:
runtime.gopark(0x0?, 0x0?, 0x18?, 0x5b?, 0x49144?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000105ad0 sp=0xc000105ab0 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.semacquire1(0xc00011b084, 0x80?, 0x3, 0x1, 0xcc?)
	/usr/local/go/src/runtime/sema.go:160 +0x20c fp=0xc000105b30 sp=0xc000105ad0 pc=0x878ac
sync.runtime_SemacquireMutex(0xc000105b88?, 0x3c?, 0xc000105bb8?)
	/usr/local/go/src/runtime/sema.go:77 +0x28 fp=0xc000105b70 sp=0xc000105b30 pc=0xa3328
sync.(*Mutex).lockSlow(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:171 +0x1e8 fp=0xc000105bc0 sp=0xc000105b70 pc=0xc1ce8
sync.(*Mutex).Lock(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:90 +0x60 fp=0xc000105be0 sp=0xc000105bc0 pc=0xc1ad0
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00011b080, {0xc000105de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:29 +0xa4 fp=0xc000105d80 sp=0xc000105be0 pc=0x22b2a4
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000105f20 sp=0xc000105d80 pc=0x22b0d0
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000105fd0 sp=0xc000105f20 pc=0x22a320
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000105fd0 sp=0xc000105fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108

goroutine 40 [sync.Mutex.Lock]:
runtime.gopark(0xc000101b18?, 0x48f18?, 0x18?, 0x1b?, 0x49144?)
	/usr/local/go/src/runtime/proc.go:381 +0xe0 fp=0xc000101ad0 sp=0xc000101ab0 pc=0x74fa0
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:387
runtime.semacquire1(0xc00011b084, 0x80?, 0x3, 0x1, 0x6e?)
	/usr/local/go/src/runtime/sema.go:160 +0x20c fp=0xc000101b30 sp=0xc000101ad0 pc=0x878ac
sync.runtime_SemacquireMutex(0xc000101b88?, 0x3c?, 0xc000101bb8?)
	/usr/local/go/src/runtime/sema.go:77 +0x28 fp=0xc000101b70 sp=0xc000101b30 pc=0xa3328
sync.(*Mutex).lockSlow(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:171 +0x1e8 fp=0xc000101bc0 sp=0xc000101b70 pc=0xc1ce8
sync.(*Mutex).Lock(0xc00011b080)
	/usr/local/go/src/sync/mutex.go:90 +0x60 fp=0xc000101be0 sp=0xc000101bc0 pc=0xc1ad0
go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire(0xc00011b080, {0xc000101de8?, 0x0?}, {0x289720, 0x4}, 0x1?)
	/src/internal/storage/oplock/memory/memory.go:29 +0xa4 fp=0xc000101d80 sp=0xc000101be0 pc=0x22b2a4
go.flipt.io/flipt/internal/storage/oplock/testing.Harness.func1()
	/src/internal/storage/oplock/testing/testing.go:50 +0x150 fp=0xc000101f20 sp=0xc000101d80 pc=0x22b0d0
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x70 fp=0xc000101fd0 sp=0xc000101f20 pc=0x22a320
runtime.goexit()
	/usr/local/go/src/runtime/asm_arm64.s:1172 +0x4 fp=0xc000101fd0 sp=0xc000101fd0 pc=0xa7134
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x108
FAIL	go.flipt.io/flipt/internal/storage/oplock/memory	2.048s
FAIL

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 4, 2023
@mknyszek
Copy link
Contributor

mknyszek commented Apr 5, 2023

This may be related to #59435?

EDIT: See the duplicate issue for a short explanation.

@mknyszek
Copy link
Contributor

mknyszek commented Apr 5, 2023

In triage, we think that it might be worth running the program under strace to see if any mappings that the race detector makes is getting clobbered by libc or something else.

@GeorgeMac
Copy link
Author

GeorgeMac commented Apr 6, 2023

Unsure if you wanted me to do the strace or not. But I took a crack at it (Im not super familiar with handling strace).

I could run strace fine if I didn't try to write its output (stderr) to a file for some reason (using both -o and redirection).
It just seemed to end up spewing errors like this into the output file:

--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=4112, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 1
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=4112, si_uid=0} ---
rt_sigreturn({mask=[]})

Bit of googling suggested some flags for strace (-D -DDD and -ff).
I got it to work and produce output files with both -DDD and -ff. However, that causes it to split each subprocesses strace output into its own file.

So I have attached a tarball of that output to this issue.
I understand if you're not going to trust a suspicious tarball though. Any strace advice appreciated 😂 .

strace-output.tar.gz

@thanm
Copy link
Contributor

thanm commented Apr 6, 2023

Thanks. Kind of a messy strace, since it include the Go command and all the stuff it invokes prior to the test executable, but I think I can pick out the necessary bits.

In test.trace.4105 I can see the execve("/tmp/go-build1710159565/b001/memory.test"), and in that same trace I see the mmap calls that look like the ones being made by the tsan runtime (see https://github.com/llvm/llvm-project/blob/5c0c2bc1c8d84d01e7a07e1aab9f468b36d071db/compiler-rt/lib/tsan/rtl/tsan_platform.h#L532 that shows a layout of the tsan regions):

mmap(0x200000840000, 33443840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x200000840000
madvise(0x200000840000, 33443840, MADV_NOHUGEPAGE) = 0
mmap(0x300000210000, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x300000210000
madvise(0x300000210000, 8388608, MADV_NOHUGEPAGE) = 0

but I don't see anything suspicious or that might conflict with Go memory so far.

More worked needed.

@thanm
Copy link
Contributor

thanm commented Apr 6, 2023

Hi, one more request please, which is to build a test executable and dump out the section headers, e.g.

$ cd lipt/internal/storage/oplock/memory
$ go test -c -o t.exe -race -cover 
$ objdump -x --wide t.exe

In my linux-arm64 test binary what is interesting about the offending store is that it is right at the very end of the .noptrbss section. Here's the section:

 21 .noptrbss     00f7c540  0000000000495b00  0000000000495b00  00485b00  2**6
                  ALLOC

and here's the counter variable symbol being written:

0000000001412020 g     O .noptrbss	0000000000000020              go.flipt.io/flipt/internal/storage/oplock/memory.goCover_f4028d57fe93__1

which is literally the last thing there. I'm wondering if maybe we have an off-by-one somewhere that is producing an address that is slightly outside the section (in combination with slightly different stuff on alpine maybe).

Another thing that might conceivably help is the objdump -tldr output for just the function in question (e.g. go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire. Thanks.

@GeorgeMac
Copy link
Author

GeorgeMac commented Apr 7, 2023

I've attached both outputs below:

  • objdump -x --wide : dump.txt
  • objdump -tlr -d="go.flipt.io/flipt/internal/storage/oplock/memory.(*Service).TryAcquire": dump.tldr.txt

@thanm
Copy link
Contributor

thanm commented Apr 7, 2023

Thanks. From my analysis it looks about the same on your system as on my linux/arm64-- faulting address is way at the end of the .noptrbss section, but does seem to be legal (unless I messed up my math).

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

http://go.dev/cl/c/go/+/503937 may also fix this issue -- very similar failure modes, I am willing to bet this is the same bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

6 participants