runtime: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?) #37688
Comments
@peterbourgon, does a binary built with |
Frustratingly, I've been unable to reproduce, even with the same binary on the same host. I'm still working at it... |
I'm seeing the same issue with a package I maintain. Fails with 1.14.0 without To reproduce:
If I skip all the tests that involve panic recovery, then it no longer reproduces:
|
Possibly related to #37664 ? @danscales |
Though it's always possible, this doesn't look likely to be related to #37664 . In that bug, I would expect either an invalid pc during a traceback (gentraceback()) from a stale defer or a problem during adjustdefers() or tracebackdefers() in scanstack(). It would be easy to confirm if we had a repro case (but that's hard to come by, I know). |
@danscales the repro is in #37688 (comment), but I've applied https://go-review.googlesource.com/c/go/+/222420/ locally and it doesn't resolve the issue. |
Small update: just confirmed that this issue persists with 1.14.1. I ran a |
Thanks. That is https://golang.org/cl/190098 which added low-cost defers. @danscales |
Yes, thanks for bisecting it! I have reproduced it on my system, so I will start debugging it. |
I understand the cause and have a simple fix. Working to see if I can create a much simpler test that can reproduce the problem. Not sure if that will be doable, because of the complex interactions and non-determinism involving the GC, etc. |
@gopherbot Please open a backport to 1.14. This problem causes program crashes with no workaround. |
Backport issue(s) opened: #37968 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/224581 mentions this issue: |
Change https://golang.org/cl/225279 mentions this issue: |
…d to g0 defer list during panic

newdefer() actually adds the new defer to the current g's defer chain. That happens even if we are on the system stack, in which case the g will be the g0 stack. For open-coded defers, we call newdefer() (only during panic processing) while on the system stack, so the new defer is unintentionally added to the g0._defer defer list. The code later correctly adds the defer to the user g's defer list.

The g0._defer list is never used. However, that pointer on the g0._defer list can keep a defer struct alive that is intended to be garbage-collected (smaller defers use a defer pool, but larger-sized defer records are just GC'ed). freedefer() does not zero out pointers when it intends that a defer become garbage-collected. So, we can have the pointers in a defer that is held alive by g0._defer become invalid (in particular d.link). This is the cause of the bad pointer bug in this issue.

The fix is to change newdefer (only used in two places) to not add the new defer to the gp._defer list. We just do it after the call with the correct gp pointer. (As mentioned above, this code was already there after the newdefer in addOneOpenDeferFrame.) That ensures that defers will be correctly garbage-collected and eliminates the bad pointer.

This fix definitely fixes the original repro. I added a test and tried hard to reproduce the bug (based on the original repro code), but wasn't actually able to cause the bug. However, the test is still an interesting mix of heap-allocated, stack-allocated, and open-coded defers.
For #37688
Fixes #37968
Fixes #37688

Change-Id: I1a481b9d9e9b9ba4e8726ef718a1f4512a2d6faf
Reviewed-on: https://go-review.googlesource.com/c/go/+/224581
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 825ae71)
Reviewed-on: https://go-review.googlesource.com/c/go/+/225279
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
Hello, I'm seeing panics such as the one this ticket shows with go1.14.4 but NOT with go1.14.2. #37968 says these are addressed with go1.14.2, so have they sneaked back in somehow?
|
I see the panics with 1.14.3 as well ..
|
@abhinavdangeti, please open a new issue and include details on what you tried that failed, with a reproducer program or directions on how to replicate your situation. |
See comments: golang/go#37688 (comment)
See those panics with 1.14.3, 1.14.4.
And with 1.14.2:
runtime: nelems=5 nalloc=3 previous allocCount=1 nfreed=65534
fatal error: sweep increased allocation count
runtime stack:
runtime.throw(0x1c80007, 0x20)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/panic.go:1116 +0x72
runtime.(*mspan).sweep(0x2f5ec68, 0x2f5ec00, 0x7000049eee01)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgcsweep.go:328 +0x8b8
runtime.(*mcentral).uncacheSpan(0x2727458, 0x2f5ec68)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcentral.go:197 +0x79
runtime.(*mcache).releaseAll(0x2943560)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcache.go:158 +0x6b
runtime.(*mcache).prepareForSweep(0x2943560)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcache.go:185 +0x46
runtime.gcMarkTermination.func4.1(0xc00004f800)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgc.go:1755 +0x2f
runtime.forEachP(0x1cb56c0)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/proc.go:1260 +0x119
runtime.gcMarkTermination.func4()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgc.go:1754 +0x2d
runtime.systemstack(0xa500000)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/proc.go:1041

Change-Id: I11b95706a42db5665f3269f97adcfc82c394a293
Reviewed-on: http://review.couchbase.org/c/cbft/+/130933
Well-Formed: Build Bot <build@couchbase.com>
Reviewed-by: Chris Hillery <ceej@couchbase.com>
Tested-by: Abhinav Dangeti <abhinav@couchbase.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I built a version of my program with a minor code change and, for the first time, with the Go 1.14 compiler instead of Go 1.13. I then deployed that version of the program to my staging environment, all linux/amd64 machines.
What did you expect to see?
The program to start up normally on all hosts.
What did you see instead?
On about half of the hosts, the program crashed on startup with slight variations on this message:
I don't use unsafe or cgo in my program directly, though some of my transitive dependencies might; I can't say for sure.