Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nomad panic when capturing threadcreate pprof on version 0.11.0 to 0.11.2 #12804

Closed
davemay99 opened this issue Apr 27, 2022 · 4 comments · Fixed by #12807
Closed

Nomad panic when capturing threadcreate pprof on version 0.11.0 to 0.11.2 #12804

davemay99 opened this issue Apr 27, 2022 · 4 comments · Fixed by #12807

Comments

@davemay99
Copy link
Contributor

Nomad version

  • 0.11.0
  • 0.11.1
  • 0.11.2

Issue

When running nomad operator debug against Nomad versions 0.11.0 to 0.11.2, the Nomad server agent crashes.

Reproduction steps

Window 1 - dev agent

curl https://releases.hashicorp.com/nomad/0.11.2/nomad_0.11.2_linux_amd64.zip -o nomad-0.11.2.zip
unzip nomad-0.11.2.zip && mv nomad nomad-0.11.2
./nomad-0.11.2 agent -dev

Window 2 - debug

curl https://releases.hashicorp.com/nomad/1.2.6/nomad_1.2.6_linux_amd64.zip -o nomad-1.2.6.zip
unzip nomad-1.2.6.zip && mv nomad nomad-1.2.6
./nomad-1.2.6 operator debug -duration=1s -interval=1s

Expected Result

No panic

Actual Result

panic

Nomad logs with top of goroutine dump
2022-04-27T16:58:26.010-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/profile?debug=0&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=1.008276789s
2022-04-27T16:58:26.018-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/goroutine?debug=1&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=3.814216ms
2022-04-27T16:58:26.025-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/goroutine?debug=2&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=3.336164ms
2022-04-27T16:58:26.031-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/goroutine?debug=0&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=2.904533ms
2022-04-27T16:58:27.034-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/trace?debug=0&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=1.002318087s
2022-04-27T16:58:27.046-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/heap?debug=0&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=8.222104ms
2022-04-27T16:58:27.051-0400 [DEBUG] http: request complete: method=GET path=/v1/agent/pprof/allocs?debug=0&gc=0&node_id=f7bf7777-b4b4-fa4b-cd73-078ed13e1069&seconds=1&server_id= duration=3.447258ms

runtime: pcdata is -2 and 4 locals stack map entries for runtime/pprof.runtime_expandFinalInlineFrame (targetpc=0x4578d9)
fatal error: bad symbol table

runtime stack:
runtime.throw(0x3073148, 0x10)
runtime/panic.go:1114 +0x72
runtime.getStackMap(0x7f93e2ffcad0, 0x7f93e2ffcbd8, 0x457801, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
runtime/stack.go:1246 +0x741
runtime.adjustframe(0x7f93e2ffcad0, 0x7f93e2ffcbc0, 0x56362c0)
runtime/stack.go:636 +0x84
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000cdbe00, 0x0, 0x0, 0x7fffffff, 0x3145ef0, 0x7f93e2ffcbc0, 0x0, ...)
runtime/traceback.go:334 +0x110e
runtime.copystack(0xc000cdbe00, 0x4000)
runtime/stack.go:888 +0x291
runtime.newstack()
runtime/stack.go:1043 +0x20b
runtime.morestack()
runtime/asm_amd64.s:449 +0x8f

goroutine 2141 [copystack]:
time.Time.AppendFormat(0xc092897cc3348755, 0x24bf167875, 0x577acc0, 0xc000d1a4c8, 0x0, 0x40, 0x30a66d9, 0x1c, 0xc000b047a8, 0x1c, ...)
time/format.go:511 +0x20c4 fp=0xc000d1a468 sp=0xc000d1a460 pc=0x4cf084
time.Time.Format(0xc092897cc3348755, 0x24bf167875, 0x577acc0, 0x30a66d9, 0x1c, 0x160000001b, 0x500000002)
time/format.go:505 +0xbc fp=0xc000d1a518 sp=0xc000d1a468 pc=0x4ccf1c
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).logPlain(0xc00010eea0, 0xc092897cc3348755, 0x24bf167875, 0x577acc0, 0x30567d5, 0x4, 0x2, 0x3073ed8, 0x10, 0xc00079a600, ...)
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:175 +0x83 fp=0xc000d1a680 sp=0xc000d1a518 pc=0x8c1863
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).log(0xc00010eea0, 0x30567d5, 0x4, 0x2, 0x3073ed8, 0x10, 0xc00079a600, 0x6, 0x6)
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:136 +0x201 fp=0xc000d1a720 sp=0xc000d1a680 pc=0x8c1661
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).Debug(0xc00010eea0, 0x3073ed8, 0x10, 0xc00079a600, 0x6, 0x6)
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:441 +0x7a fp=0xc000d1a778 sp=0xc000d1a720 pc=0x8c3f8a
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*interceptLogger).Debug(0xc000675cb0, 0x3073ed8, 0x10, 0xc00079a600, 0x6, 0x6)
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/interceptlogger.go:62 +0x99 fp=0xc000d1a868 sp=0xc000d1a778 pc=0x8bfd29
github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrapNonJSON.func1.1(0xc0000a2780, 0xc000e74f00, 0xc00064e2a0, 0x6b, 0xc092897cc32e3f24, 0x24bf10305c, 0x577acc0)
github.com/hashicorp/nomad/command/agent/http.go:514 +0x199 fp=0xc000d1a8d0 sp=0xc000d1a868 pc=0x247ca99
runtime.call64(0x0, 0x313fdb0, 0xc0002b8648, 0x3800000038)
runtime/asm_amd64.s:540 +0x3b fp=0xc000d1a920 sp=0xc000d1a8d0 pc=0x46adeb
runtime.reflectcallSave(0xc000d1aa50, 0x313fdb0, 0xc0002b8648, 0xc000000038)
runtime/panic.go:879 +0x58 fp=0xc000d1a950 sp=0xc000d1a920 pc=0x438f38
runtime.runOpenDeferFrame(0xc000cdbe00, 0xc0002b8600, 0x0)
runtime/panic.go:853 +0x2b9 fp=0xc000d1a9e0 sp=0xc000d1a950 pc=0x438df9
panic(0x2e0b6e0, 0xc0010a8920)
runtime/panic.go:967 +0x166 fp=0xc000d1aa98 sp=0xc000d1a9e0 pc=0x439146
runtime.goPanicIndex(0xffffffffffffffff, 0x0)
runtime/panic.go:88 +0xa3 fp=0xc000d1aae0 sp=0xc000d1aa98 pc=0x4371d3
runtime/pprof.runtime_expandFinalInlineFrame(0xc000dba000, 0x0, 0x20, 0x0, 0x0, 0x0)
runtime/symtab.go:156 +0x3aa fp=0xc000d1ac78 sp=0xc000d1aae0 pc=0x4578da
runtime/pprof.(*profileBuilder).appendLocsForStack(0xc000db26e0, 0x0, 0x0, 0x0, 0xc000dba000, 0x0, 0x20, 0x2, 0x800, 0x203000)
runtime/pprof/proto.go:389 +0x7f fp=0xc000d1ad38 sp=0xc000d1ac78 pc=0xbd0c8f
runtime/pprof.printCountProfile(0x376fa80, 0xc0003da0f0, 0x0, 0x3067615, 0xc, 0x37a9f20, 0xc0010c6860, 0x4070c9, 0xc000b05260)
runtime/pprof/pprof.go:449 +0x9ab fp=0xc000d1af70 sp=0xc000d1ad38 pc=0xbcb04b
runtime/pprof.writeRuntimeProfile(0x376fa80, 0xc0003da0f0, 0x0, 0x3067615, 0xc, 0x3145ea8, 0x7f93e84b1e30, 0x3056419)
runtime/pprof/pprof.go:702 +0x119 fp=0xc000d1afd8 sp=0xc000d1af70 pc=0xbcd6e9
runtime/pprof.writeThreadCreate(0x376fa80, 0xc0003da0f0, 0x0, 0x203000, 0x406a9a)
runtime/pprof/pprof.go:643 +0x60 fp=0xc000d1b028 sp=0xc000d1afd8 pc=0xbcd370
runtime/pprof.(*Profile).WriteTo(0x55e20e0, 0x376fa80, 0xc0003da0f0, 0x0, 0x2dfd0c0, 0xc0009c66f0)
runtime/pprof/pprof.go:329 +0x3da fp=0xc000d1b150 sp=0xc000d1b028 pc=0xbca21a
github.com/hashicorp/nomad/command/agent/pprof.Profile(0xc000cdc942, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc000b052c8, 0x411476, 0xc0003da0c0)
github.com/hashicorp/nomad/command/agent/pprof/pprof.go:69 +0xd9 fp=0xc000d1b1f0 sp=0xc000d1b150 pc=0xbd57c9
github.com/hashicorp/nomad/client.(*Agent).Profile(0xc0000116f8, 0xc000136f00, 0xc0003da090, 0x0, 0x0)
github.com/hashicorp/nomad/client/agent_endpoint.go:59 +0x4f7 fp=0xc000d1b2b8 sp=0xc000d1b1f0 pc=0x17a7f07
runtime.call64(0xc0009c6690, 0xc0000118b8, 0xc0003da0c0, 0x1800000028)
runtime/asm_amd64.s:540 +0x3b fp=0xc000d1b308 sp=0xc000d1b2b8 pc=0x46adeb
reflect.Value.call(0xc000404660, 0xc0000118b8, 0x13, 0x305611d, 0x4, 0xc000b05600, 0x3, 0x3, 0xc0003da090, 0xc000b05658, ...)
reflect/value.go:460 +0x8ab fp=0xc000d1b520 sp=0xc000d1b308 pc=0x4a45bb
reflect.Value.Call(0xc000404660, 0xc0000118b8, 0x13, 0xc000b05600, 0x3, 0x3, 0x8, 0x16, 0x16)
reflect/value.go:321 +0xb4 fp=0xc000d1b5a0 sp=0xc000d1b520 pc=0x4a3ac4
net/rpc.(*service).call(0xc000383dc0, 0xc000123db0, 0xc000d45420, 0x0, 0xc00022bb00, 0xc0010b14a0, 0x2e0e860, 0xc000136f00, 0x16, 0x2b6d520, ...)
net/rpc/server.go:377 +0x17f fp=0xc000d1b668 sp=0xc000d1b5a0 pc=0x901f1f
net/rpc.(*Server).ServeRequest(0xc000123db0, 0x37cea20, 0xc00107da00, 0x2c834c0, 0x57abf01)
net/rpc/server.go:498 +0x23e fp=0xc000d1b710 sp=0xc000d1b668 pc=0x902dfe
github.com/hashicorp/nomad/client.(*Client).ClientRPC(0xc000018c80, 0x3067df0, 0xd, 0x26d4f80, 0xc0003a4278, 0x2b6d520, 0xc0003da060, 0x6, 0x6)
github.com/hashicorp/nomad/client/rpc.go:37 +0xb9 fp=0xc000d1b750 sp=0xc000d1b710 pc=0x17d12e9
github.com/hashicorp/nomad/command/agent.(*HTTPServer).agentPprof(0xc0000a2780, 0x3059712, 0x6, 0x37b36a0, 0xc000405860, 0xc000e74f00, 0x0, 0x1, 0x1d, 0x70, ...)
github.com/hashicorp/nomad/command/agent/agent_endpoint.go:398 +0x6e0 fp=0xc000d1b8b0 sp=0xc000d1b750 pc=0x2434230
github.com/hashicorp/nomad/command/agent.(*HTTPServer).AgentPprofRequest(0xc0000a2780, 0x37b36a0, 0xc000405860, 0xc000e74f00, 0xc000928990, 0x4d6f96, 0x6269ae73, 0x32e3f24, 0x6d31b4861795c2)
github.com/hashicorp/nomad/command/agent/agent_endpoint.go:356 +0x1ad fp=0xc000d1b948 sp=0xc000d1b8b0 pc=0x243376d
github.com/hashicorp/nomad/command/agent.(*HTTPServer).AgentPprofRequest-fm(0x37b36a0, 0xc000405860, 0xc000e74f00, 0x3070bf8, 0x10, 0x4183e1, 0xd0, 0x8)
github.com/hashicorp/nomad/command/agent/agent_endpoint.go:337 +0x48 fp=0xc000d1b9a0 sp=0xc000d1b948 pc=0x2481a68
github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrapNonJSON.func1(0x37b36a0, 0xc000405860, 0xc000e74f00)
github.com/hashicorp/nomad/command/agent/http.go:516 +0x167 fp=0xc000d1baa0 sp=0xc000d1b9a0 pc=0x247cc67
net/http.HandlerFunc.ServeHTTP(0xc000424500, 0x37b36a0, 0xc000405860, 0xc000e74f00)
net/http/server.go:2012 +0x44 fp=0xc000d1bac8 sp=0xc000d1baa0 pc=0x77f9a4
net/http.(*ServeMux).ServeHTTP(0xc0005b2200, 0x37b36a0, 0xc000405860, 0xc000e74f00)
net/http/server.go:2387 +0x1a5 fp=0xc000d1bb28 sp=0xc000d1bac8 pc=0x7818e5
github.com/hashicorp/nomad/vendor/github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x37b7720, 0xc000d4ab60, 0xc000e74f00)
github.com/hashicorp/nomad/vendor/github.com/NYTimes/gziphandler/gzip.go:269 +0x1e6 fp=0xc000d1bbc0 sp=0xc000d1bb28 pc=0x240f846
net/http.HandlerFunc.ServeHTTP(0xc0007911a0, 0x37b7720, 0xc000d4ab60, 0xc000e74f00)
net/http/server.go:2012 +0x44 fp=0xc000d1bbe8 sp=0xc000d1bbc0 pc=0x77f9a4
net/http.serverHandler.ServeHTTP(0xc000146460, 0x37b7720, 0xc000d4ab60, 0xc000e74f00)
net/http/server.go:2807 +0xa3 fp=0xc000d1bc18 sp=0xc000d1bbe8 pc=0x782e33
net/http.(*conn).serve(0xc00037f540, 0x37ce760, 0xc0012a4680)
net/http/server.go:1895 +0x86c fp=0xc000d1bfc8 sp=0xc000d1bc18 pc=0x77e7ac
runtime.goexit()
runtime/asm_amd64.s:1373 +0x1 fp=0xc000d1bfd0 sp=0xc000d1bfc8 pc=0x46cac1
created by net/http.(*Server).Serve
net/http/server.go:2933 +0x35c

@davemay99
Copy link
Contributor Author

Further testing shows that the trace profile was the last to successfully complete, not the last to run. The pprof profile which actually caused the panic is threadcreate.

@tgross
Copy link
Member

tgross commented Apr 27, 2022

A quick test shows that /debug/pprof/threadcreate will cause the panic in Nomad 0.11.2, so this isn't specific to the operator debug command. (Note that /debug/pprof/threadcreate?debug=1 which is linked from the pprof web page is fine.)

If you're going to push a PR with a fix for operator debug, that seems reasonable. Nomad 0.11 is out of support so I don't see much point in adding docs to the enable_debug config docs warning against it. I'm going to change the issue title to help searchability.

@tgross tgross changed the title Nomad panic when capturing pprof on version 0.11.0 to 0.11.2 Nomad panic when capturing threadcreate pprof on version 0.11.0 to 0.11.2 Apr 27, 2022
@davemay99 davemay99 self-assigned this Apr 28, 2022
@tgross
Copy link
Member

tgross commented Apr 28, 2022

I took a crack at replicating this behavior in a standalone application with pprof and it looks like it's not even a stdlib issue but some interaction between the stdlib and the version of hclog in that version of Nomad. But this is all pre-go-modules and the effort of trying to get builds of old versions of libraries is more than we should time box for this 😀 .

@github-actions
Copy link

github-actions bot commented Oct 8, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants