runtime: "fatal: morestack on g0" on Linux #23360
Comments
My Linux kernel is "Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux".
Could this be a bug in the Linux kernel?
Hard to tell w/o a reproduction case.
@cznic Thanks! I am trying to reproduce it, but I am not sure whether I will manage to.
I'll also page some other folks to be aware of this as we wait for more information.
I upgraded my gdb to 8.0 and found that there are 74 threads, but all of them are waiting here: runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:439
This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector.
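For reference, the race detector is enabled with a single flag at build or test time (a minimal sketch; the package paths are placeholders):

```
$ go build -race ./...   # rebuild the binary with race detection compiled in
$ go test -race ./...    # or run the test suite under the detector
```

A race-enabled binary runs noticeably slower and uses more memory, which is why it is usually exercised in a staging environment rather than production.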
@davecheney Thanks!
We just figured out how to print … All …
The deadlock could be caused by a kernel bug, since our binary is running against Linux 3.10, which has a known kernel bug that can cause deadlocks. As for the second point, we are confused about why every thread's scheduler crashed. Per my understanding, each thread has its own g0 …
It's normal for an inactive goroutine to be sitting in runtime.futex. Your original report … You suggest that a …
@shenli, it would be useful if you could dump all of the thread stacks from gdb using thread apply all bt. Unfortunately, I wouldn't expect the …
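For anyone following along, dumping every thread's stack looks roughly like this (the pid and binary name are placeholders):

```
$ gdb -p <pid>              # attach to the live process, or:
$ gdb ./mybinary core       # open a core dump instead
(gdb) thread apply all bt   # backtrace of every thread
```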
@aclements The output of thread apply all bt is: …
@ianlancetaylor In our scenario, g0 is gone but the whole process is still there.
I don't know what you mean when you say that g0 is gone. Note that g0 is not printed by …
@ianlancetaylor I got "fatal: morestack on g0" …
That is a fatal error message that should crash the entire program. It is not intended to leave the program running without g0. Thanks for the stack trace. It does show most threads waiting for a lock on the heap.

This shows that the program got signal 5 (SIGTRAP) … here:

```go
} else if _g_.m.mcache == nil { // can happen if called from signal handler or throw
	_g_.m.mcache = allocmcache()
}
```

This makes me wonder whether the signal occurred while holding the lock. I don't know if that can happen while … I don't understand how to relate this to … Unfortunately nothing in the stack trace tells me where the …
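As an aside, the deadlock shape being hypothesized here — a panic path trying to take a lock that the interrupted code on the same thread already holds — has the same shape as re-locking a mutex from a single goroutine. A minimal analogy in plain Go (not runtime code):

```go
package main

import "sync"

// mu stands in for the runtime's heap lock in this analogy.
var mu sync.Mutex

func main() {
	mu.Lock()
	// Imagine a signal arriving here whose handling path also
	// needs mu: the holder is the interrupted code on this very
	// thread, so the second acquisition can never succeed.
	mu.Lock() // fatal error: all goroutines are asleep - deadlock!
}
```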
@ianlancetaylor We used gdb to get the thread info, so I think that may be what caused the signal 5 (SIGTRAP).
If gdb triggers the SIGTRAP, then … In any case, I now see that this is something of a distraction. After the program gets the fatal error, … That brings me back to thinking about memory corruption. Have you been able to run your test program compiled with the race detector?
It does mean that whatever went wrong went wrong on thread 2. I'm surprised gdb couldn't walk over the signal stack switch, but we might be able to get enough from the signal context to unwind the stack ourselves. @shenli, I assume you're doing this all from a core file? Or do you still have the process around? Either way, could you try running:
[gdb commands not preserved]
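The exact commands from this comment were not preserved; a plausible sketch of this kind of inspection (the thread number 2 comes from the comment above, everything else is generic gdb):

```
(gdb) info threads            # list all threads
(gdb) thread 2                # switch to the suspect thread
(gdb) bt                      # its (possibly truncated) backtrace
(gdb) info registers rip rsp  # where it stopped
(gdb) x/32gx $rsp             # raw words near the stack pointer
```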
@aclements The result is here:
[gdb output not preserved]
@ianlancetaylor This happened in a production environment, so we could not enable race detection there. I built a test environment and tried to produce the same workload, but have not hit the error yet.
We have met this issue a few times. We decided to build two binaries, one with Golang 1.7 and another with Golang 1.10, run both in the production environment, and check whether the fatal error is still there.
We built a new binary with Golang 1.10 and put it into the production environment. It has been running for more than two days without the fatal problem. I'm keeping my eyes on it.
I took another look at the thread stack dump and there's something I can't figure out. What's holding the heap lock?
@shenli, thanks for the gdb output. If you still have the crashed process or core dump, could you run the following in gdb?
[gdb commands not preserved]
Thanks. You'll need to tweak the instructions for a different crash instance. To find the right thread number, do info threads …
@aclements Thanks! Got it!
Change https://golang.org/cl/88835 mentions this issue.
Currently, startpanic_m (which prepares for an unrecoverable panic) goes out of its way to make it possible to allocate during panic handling by allocating an mcache if there isn't one. However, this is both potentially dangerous and unnecessary.

Allocating an mcache is a generally complex thing to do in an already precarious situation. Specifically, it requires obtaining the heap lock, and there's evidence that this may be able to deadlock (#23360). However, it's also unnecessary, because we never allocate from the unrecoverable panic path.

This didn't use to be the case. The call to allocmcache was introduced long ago, in CL 7388043, where it was in preparation for separating Ms and Ps and potentially running an M without an mcache. At the time, after calling startpanic, the runtime could call String and Error methods on panicked values, which could do anything, including allocating. That was generally unsafe even at the time, and CL 19792 fixed this by pre-printing panic messages before calling startpanic. As a result, we now no longer allocate after calling startpanic.

This CL not only removes the allocmcache call, but goes a step further to explicitly disallow any allocation during unrecoverable panic handling, even in situations where it might be safe. This way, if panic handling ever does an allocation that would be unsafe in unusual circumstances, we'll know even if it happens during normal circumstances. This would help with debugging #23360, since the deadlock in allocmcache is currently masking the real failure.

Beyond all.bash, I manually tested this change by adding panics at various points in early runtime init, signal handling, and the scheduler to check unusual panic situations.

Change-Id: I85df21e2b4b20c6faf1f13fae266c9339eebc061
Reviewed-on: https://go-review.googlesource.com/88835
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
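The commit message describes replacing the allocation with an explicit prohibition. A minimal sketch of that guard pattern in ordinary Go — hypothetical names, not the actual runtime change — is:

```go
package main

import "fmt"

// mState mimics per-thread state; the field name echoes the runtime's
// malloc bookkeeping, but this code is purely illustrative.
type mState struct{ mallocing int }

func (m *mState) alloc() {
	if m.mallocing != 0 {
		// Fail fast and loudly instead of deadlocking on a lock.
		panic("allocation during unrecoverable panic handling")
	}
	// ... a normal allocation would happen here ...
}

func (m *mState) startPanic() {
	m.mallocing++ // from here on, any allocation attempt is an error
}

func main() {
	m := &mState{}
	m.startPanic()
	defer func() { fmt.Println("caught:", recover()) }()
	m.alloc() // trips the guard immediately
}
```

The point of the pattern is the one the commit message makes: an unsafe operation on a fragile path should surface as an immediate, diagnosable failure rather than be masked by a deadlock.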
@aclements The related change has been merged, so I think we will not face the deadlock issue anymore. I will wait for the new release.
@ALTree Yes, I will update this issue when we figure out the real problem. I'm waiting for the new Golang release.
I don't know if this is related or not, but I can reproduce this quite reliably on Mac OS 10.12.6 by … and then quitting less (q). Should I open a separate ticket?
@shenli, have you been able to reproduce this with Go 1.10?
@aclements It never happens with Go 1.10. I guess it is fixed. But I will keep watching it. Thanks!
Thanks for checking! I'll go ahead and close this, but let us know if you're able to reproduce it.
I see this occasionally on some Gentoo servers with Go 1.10.1, kernel 4.8.17-hardened, and prometheus-node-exporter. I cannot reproduce it, and it happens only after some weeks of running.
I also met the issue "fatal: morestack on g0", and it happens only after some weeks of running. Go version is 1.11.4. Kernel is 4.9.0-040900-generic.
I also met the same issue:
[log output not preserved]
@liuyanhit @ahjdzx I'm sorry to hear you are having issues. This issue has been closed for over six months; please file a new issue including as much detail as possible. Thank you.
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.9.1 linux/amd64

Does this issue reproduce with the latest release?
I do not know how to reproduce it, but it has occurred twice.

What operating system and processor architecture are you using (go env)?
[go env output not preserved]

What did you do?
Ran a binary as a daemon service.

What did you expect to see?
The process runs forever.

What did you see instead?
I got "fatal: morestack on g0" from stderr. The process is still there but does not respond anymore. When I use curl http://ip:port/debug/pprof/goroutine?debug=1 to check the stacks, it halts. There is nothing useful in stderr or dmesg. I tried pstack and got the following result; it is very strange that there is only one thread alive. I have googled it but found nothing useful.
[pstack output not preserved]
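For context, the /debug/pprof/goroutine endpoint queried above is the one net/http/pprof registers; a minimal sketch of how such an endpoint is typically exposed (the listen address is a placeholder):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Query with: curl http://localhost:6060/debug/pprof/goroutine?debug=1
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

When the process wedges the way described here, even this handler can hang, since serving the request needs a working scheduler — consistent with the curl call halting.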