Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix nil pointer exception when running status with failed allocations #2604

Closed
wants to merge 1 commit into from

Conversation

clinta
Copy link
Contributor

@clinta clinta commented May 2, 2017

When a job has an allocation which failed to be created alloc.Job is nil, causing a nil pointer exception and an empty file to be returned to nomad status.

@dadgar
Copy link
Contributor

dadgar commented May 2, 2017

@clinta What version of Nomad are you on? There shouldn't ever be an allocation without a job so it is really a bug that this would be masking.

@clinta
Copy link
Contributor Author

clinta commented May 2, 2017

I'm running from master, because I need the feature in #2535 which has not yet made it into a release.

Here's some logs that provide context of the allocation that got into this state. It was for the task group hdfs-namenode2. This container wouldn't start due to an issue with a docker volume driver. It appears that after it failed to start, nomad did not remove the container and tried to create it again, the subsequent create operations failing because the old container already existed.

May 02 11:30:52 dc0-cls02 nomad[15432]:     2017/05/02 11:30:52.104417 [ERR] driver.docker: Failed to create container: Post http://unix.sock/containers/create?name=hdfs-namenode2-399ea264-2d63-7b83-9484-be35b6dda4ad: EOF
May 02 11:30:52 dc0-cls02 nomad[15432]:     2017/05/02 11:30:52.107207 [WARN] client: failed to start task "hdfs-namenode2" for alloc "399ea264-2d63-7b83-9484-be35b6dda4ad": Failed to create container: Post http://unix.sock/containers/create?name=hdfs-namenode2-399ea264-2d63-7b83-9484-be35b6dda4ad: EOF
May 02 11:30:52 dc0-cls02 nomad[15432]:     2017/05/02 11:30:52.108120 [INFO] client: Restarting task "hdfs-namenode2" for alloc "399ea264-2d63-7b83-9484-be35b6dda4ad" in 15.146291303s
May 02 11:31:12 dc0-cls02 nomad[15432]:     2017/05/02 11:31:12.340198 [ERR] driver.docker: Failed to create container: container already exists
May 02 11:31:12 dc0-cls02 nomad[15432]:     2017/05/02 11:31:12.343062 [WARN] client: failed to start task "hdfs-namenode2" for alloc "399ea264-2d63-7b83-9484-be35b6dda4ad": Failed to create container: container already exists
May 02 11:31:12 dc0-cls02 nomad[15432]:     2017/05/02 11:31:12.343665 [INFO] client: Not restarting task: hdfs-namenode2 for alloc: 399ea264-2d63-7b83-9484-be35b6dda4ad
May 02 11:31:12 dc0-cls02 nomad[15432]:     2017/05/02 11:31:12.345574 [INFO] client: marking allocation 399ea264-2d63-7b83-9484-be35b6dda4ad for GC
May 02 11:31:12 dc0-cls02 nomad[15432]:     2017/05/02 11:31:12.348991 [INFO] client: marking allocation 399ea264-2d63-7b83-9484-be35b6dda4ad for GC
May 02 11:33:38 dc0-cls02 nomad[15432]:     2017/05/02 11:33:38 http: panic serving 10.1.6.184:34408: runtime error: invalid memory address or nil pointer dereference
May 02 11:33:38 dc0-cls02 nomad[15432]: goroutine 1037521 [running]:
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.(*conn).serve.func1(0xc4201d9900)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:1721 +0xd0
May 02 11:33:38 dc0-cls02 nomad[15432]: panic(0xfbb1e0, 0x18ac2b0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/runtime/panic.go:489 +0x2cf
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/nomad/state.(*StateStore).AllocsByJob(0xc4222a8c80, 0x0, 0xc4207cd41c, 0x4, 0xc421909100, 0xc4222a8c60, 0xc4219091c0, 0xc4219091c0, 0x411cd8, 0x20)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1164 +0x3ed
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/nomad.(*Job).Allocations.func1(0x0, 0xc4222a8c80, 0x0, 0x0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/nomad/job_endpoint.go:546 +0x7c
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/nomad.(*Server).blockingRPC(0xc4200811e0, 0xc421909318, 0x0, 0x0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/nomad/rpc.go:389 +0x161
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/nomad.(*Job).Allocations(0xc420460230, 0xc4239a42d0, 0xc4207cde30, 0x0, 0x0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/nomad/job_endpoint.go:571 +0x279
May 02 11:33:38 dc0-cls02 nomad[15432]: reflect.Value.call(0xc42016c6c0, 0xc420460380, 0x13, 0x113ffbc, 0x4, 0xc421909798, 0x3, 0x3, 0x199, 0xc4207cde30, ...)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/reflect/value.go:434 +0x91f
May 02 11:33:38 dc0-cls02 nomad[15432]: reflect.Value.Call(0xc42016c6c0, 0xc420460380, 0x13, 0xc421f1d798, 0x3, 0x3, 0xc421883920, 0xc4204bd401, 0x0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/reflect/value.go:302 +0xa4
May 02 11:33:38 dc0-cls02 nomad[15432]: net/rpc.(*service).call(0xc42045c9c0, 0xc42045c8c0, 0xc4228d0fe0, 0xc42046d180, 0xc421883920, 0xfd9ac0, 0xc4239a42d0, 0x16, 0xf08b60, 0xc4207cde30, ...)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/rpc/server.go:387 +0x144
May 02 11:33:38 dc0-cls02 nomad[15432]: net/rpc.(*Server).ServeRequest(0xc42045c8c0, 0x18cc0c0, 0xc42233b740, 0xc4207cdcb0, 0xc4204bd400)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/rpc/server.go:502 +0x267
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/nomad.(*Server).RPC(0xc4200811e0, 0x114e810, 0xf, 0xfd9ac0, 0xc4239a4280, 0xf08b60, 0xc4207cdd70, 0xc4239a4201, 0xc4207cdd70)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/nomad/server.go:1021 +0xc4
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0xc42046ec30, 0x114e810, 0xf, 0xfd9ac0, 0xc4239a4280, 0xf08b60, 0xc4207cdd70, 0xc42051b8c0, 0x10000c4205051d0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/command/agent/agent.go:628 +0x7a
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobAllocations(0xc420849140, 0x18ca800, 0xc4220e17a0, 0xc421299000, 0xc4207cd41c, 0x4, 0xc42233f1e0, 0xc420ac96e0, 0xc420026600, 0x18)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/command/agent/job_endpoint.go:192 +0x2ed
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0xc420849140, 0x18ca800, 0xc4220e17a0, 0xc421299000, 0xe3c583, 0xc421f1db20, 0xc421f1db18, 0xc420ac96e0)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/command/agent/job_endpoint.go:50 +0x2a2
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest-fm(0x18ca800, 0xc4220e17a0, 0xc421299000, 0xc4222a8bc0, 0x18, 0xed09a9dd2, 0x10f0461)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/command/agent/http.go:150 +0x48
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrap.func1(0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/command/agent/http.go:226 +0x12e
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.HandlerFunc.ServeHTTP(0xc420232360, 0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:1942 +0x44
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.(*ServeMux).ServeHTTP(0xc42051b890, 0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:2238 +0x130
May 02 11:33:38 dc0-cls02 nomad[15432]: github.com/hashicorp/nomad/vendor/github.com/NYTimes/gziphandler.GzipHandler.func1(0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /home/clint/go/src/github.com/hashicorp/nomad/vendor/github.com/NYTimes/gziphandler/gzip.go:73 +0x2c8
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.HandlerFunc.ServeHTTP(0xc4202329a0, 0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:1942 +0x44
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.serverHandler.ServeHTTP(0xc420498000, 0x18ca800, 0xc4220e17a0, 0xc421299000)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:2568 +0x92
May 02 11:33:38 dc0-cls02 nomad[15432]: net/http.(*conn).serve(0xc4201d9900, 0x18cbd00, 0xc42233b540)
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:1825 +0x612
May 02 11:33:38 dc0-cls02 nomad[15432]: created by net/http.(*Server).Serve
May 02 11:33:38 dc0-cls02 nomad[15432]:         /usr/lib/go/src/net/http/server.go:2668 +0x2ce

@dadgar
Copy link
Contributor

dadgar commented May 2, 2017

@clinta And you were running in dev mode? I am going to close the PR and open an issue.

@dadgar dadgar closed this May 2, 2017
@dadgar
Copy link
Contributor

dadgar commented May 2, 2017

#2605

@github-actions
Copy link

github-actions bot commented Apr 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants