Allocation fails with a panic #5935

Closed
denysvitali opened this issue Jul 7, 2019 · 6 comments

Comments

@denysvitali

Nomad version

Nomad v0.9.2 (028326684b9da489e0371247a223ef3ae4755d87)

Operating system and Environment details

CentOS Linux release 7.6.1810 (Core)

Issue

After a job is submitted to Nomad (this happened randomly today), Nomad crashes with a nil pointer dereference panic and prints the following stack trace:

Jul 07 15:08:24 ded1 nomad[15292]: panic: runtime error: invalid memory address or nil pointer dereference
Jul 07 15:08:24 ded1 nomad[15292]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x113437a]
Jul 07 15:08:24 ded1 nomad[15292]: goroutine 48816937 [running]:
Jul 07 15:08:24 ded1 nomad[15292]: github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).clientAlloc(0xc000c1a180, 0xc00241daa0, 0x0)
Jul 07 15:08:24 ded1 nomad[15292]: /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:574 +0x24a
Jul 07 15:08:24 ded1 nomad[15292]: github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).handleTaskStateUpdates(0xc000c1a180)
Jul 07 15:08:25 ded1 nomad[15292]: /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:471 +0x66b
Jul 07 15:08:25 ded1 nomad[15292]: created by github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).Run
Jul 07 15:08:25 ded1 nomad[15292]: /opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:239 +0x71

Reproduction steps

Happens randomly

Job file (if appropriate)

I'll provide it in the following hours

@nickethier
Member

Hey @denysvitali, thanks for reporting this. Could you try this against 0.9.3 to see if you experience the same issue?

@denysvitali
Author

It happened randomly a couple of seconds ago. I'm upgrading to 0.9.3 as we speak. It might be an edge case, or something that popped up randomly for whatever reason (so it might also not be reproducible).

@notnoop
Contributor

notnoop commented Jul 8, 2019

@denysvitali I believe this was fixed in 0.9.3 as part of #5805. Nomad 0.9.2 clients panic when an operator renames a TaskGroup of a running job. Renaming TaskGroups isn't a very common or predictable operation, which makes the panic appear sporadic. Please inspect the logs and job updates to confirm this hypothesis, or check how your cluster fares after upgrading to 0.9.3, and let us know! Thanks.

@denysvitali
Author

Yeah, that seems to be the case. Now that I recall my actions I actually removed a task group (commented it out) while the job was running. I've updated to Nomad 0.9.3 now, I'll try again in the next couple of days to see if it works again. Thank you!

@preetapan
Contributor

Closing this based on the comments above

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022