Nomad 0.6.0 nomad crashes #3008

Closed
discobean opened this issue Aug 11, 2017 · 5 comments · Fixed by #3023
Comments

discobean commented Aug 11, 2017

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

0.6.0 linux

Operating system and Environment details

linux, cluster with 3 masters in quorum

Issue

Nomad servers crashed. With a single master started, Nomad would keep running. Once a second master was started to achieve quorum, the cluster would fail and both would crash.

I tried restarting with a peers.json to recover, but the issue does not seem to be with the servers finding each other; it may be an invalid or broken job specification somewhere.

Reproduction steps

I'm not able to reproduce

Nomad Server logs (if appropriate)

Aug 11 05:50:47 ip-10-123-11-208 systemd[1]: Started Nomad.
Aug 11 05:50:47 ip-10-123-11-208 nomad.sh[22778]:     Loaded configuration from /etc/nomad.conf, /etc/nomad.d/advertise.json
Aug 11 05:50:47 ip-10-123-11-208 nomad.sh[22778]: ==> Starting Nomad agent...
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: panic: runtime error: index out of range
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: goroutine 1 [running]:
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.bitmapFrom(0xc42048a900, 0x18, 0x0, 0x0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/reconcile_util.go:297 +0x263
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.newAllocNameIndex(0xc4206a4b80, 0x18, 0xc4204dbb80, 0x3, 0x18, 0xc42048a900, 0xc9d251)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/reconcile_util.go:259 +0x3c
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*allocReconciler).computeGroup(0xc42011d420, 0xc4204dbb80, 0x3, 0xc42048a7e0, 0xc42048a7b0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/reconcile.go:306 +0x23b
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*allocReconciler).Compute(0xc42011d420, 0xc42067f3c0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/reconcile.go:169 +0x1f1
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*GenericScheduler).computeJobAllocs(0xc4204c62d0, 0xc42066f800, 0xc42067eb00)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:391 +0x3cc
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*GenericScheduler).process(0xc4204c62d0, 0x24, 0x24, 0xc420690a58)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:230 +0x53e
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*GenericScheduler).(github.com/hashicorp/nomad/scheduler.process)-fm(0xc4202607e0, 0xc420690a58, 0x4c421e)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:131 +0x2a
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.retryMax(0x5, 0xc420690bb0, 0xc420690bc0, 0xc, 0xffffffffffffff01)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/util.go:268 +0x42
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/scheduler.(*GenericScheduler).Process(0xc4204c62d0, 0xc420261320, 0xc420013cc0, 0x1cea660)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/scheduler/generic_sched.go:131 +0x2a0
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/nomad.(*nomadFSM).reconcileQueuedAllocations(0xc420417560, 0x8e2a, 0x0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:919 +0x3e5
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/nomad.(*nomadFSM).applyReconcileSummaries(0xc420417560, 0xc4201ccfc1, 0x8, 0x8, 0x8e2a, 0x0, 0x117fb20)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:526 +0x7e
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/nomad.(*nomadFSM).Apply(0xc420417560, 0xc4202dafc0, 0xc4202dafc0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/fsm.go:153 +0x6b3
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft.RecoverCluster(0xc4200e25a0, 0x1ce1c80, 0xc420417560, 0x1ce9940, 0xc4201c9a40, 0x1ce5380, 0xc4201dc060, 0x1ce2bc0, 0xc4
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/raft/api.go:313 +0x4ef
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/nomad.(*Server).setupRaft(0xc420001a00, 0x0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/server.go:878 +0x11bc
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/nomad.NewServer(0xc42020c300, 0x1cddd40, 0xc42000e1c0, 0xc4203ffe00, 0xc4204bd758, 0xc4204bd760, 0xc4204bd750)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/nomad/server.go:263 +0xe0e
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/command/agent.(*Agent).setupServer(0xc4203f62a0, 0xc4200c0f00, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/command/agent/agent.go:362 +0x1fb
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/command/agent.NewAgent(0xc42034d760, 0x1cd9540, 0xc4200b2b40, 0x0, 0x0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/command/agent/agent.go:83 +0x1fb
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/command/agent.(*Command).setupAgent(0xc42039f9a0, 0xc42034d760, 0x1cd9540, 0xc4200b2b40, 0x1, 0xc4203ffc70)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/command/agent/command.go:339 +0x89
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/command/agent.(*Command).Run(0xc42039f9a0, 0xc4200101a0, 0x2, 0x2, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/command/agent/command.go:461 +0x3ec
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli.(*CLI).Run(0xc4203d4000, 0xc4203d4000, 0x29, 0x147d6b0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli/cli.go:235 +0x2d1
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: main.RunCustom(0xc420010190, 0x3, 0x3, 0xc420383aa0, 0x0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/main.go:53 +0xed6
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: main.Run(0xc420010190, 0x3, 0x3, 0xc4200001a0)
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/main.go:23 +0x56
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]: main.main()
Aug 11 05:50:48 ip-10-123-11-208 nomad.sh[22778]:         /opt/gopath/src/github.com/hashicorp/nomad/main.go:19 +0x64
Aug 11 05:50:48 ip-10-123-11-208 systemd[1]: nomad.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 11 05:50:48 ip-10-123-11-208 systemd[1]: nomad.service: Unit entered failed state.
Aug 11 05:50:48 ip-10-123-11-208 systemd[1]: nomad.service: Failed with result 'exit-code'.

Nomad Client logs (if appropriate)

Job file (if appropriate)

N/A

Contributor

dadgar commented Aug 11, 2017

Hey, sorry you ran into this. Were you updating from an older version of Nomad? What was the state of the previous cluster?

Author

discobean commented Aug 11, 2017

Hi, I did upgrade from 0.5.5. I went to 0.6.0rc2, but that was some time ago and it seemed stable. Then the crash happened. I tried upgrading to 0.6.0, but the same issue continued. I ended up starting from scratch and resubmitting all the old jobs.

dadgar added a commit that referenced this issue Aug 12, 2017
This PR fixes an alignment calculation when determining the bitmap
size.

Fixes #3008
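
For readers following along, here is a minimal sketch of the kind of sizing mistake the commit message describes: a byte-backed bitmap whose bit count must be rounded up to a multiple of 8 before it is allocated. This is not Nomad's actual code. `Bitmap`, `alignToByte`, and `newBitmapFor` are hypothetical names made up for illustration, and the value 24 is an arbitrary example. If the size were computed without accounting for the highest zero-based index, setting that index would touch a byte past the end of the slice and panic with `index out of range`.

```go
package main

import "fmt"

// Bitmap is a simplified, hypothetical stand-in for a byte-backed bitmap.
type Bitmap []byte

// alignToByte rounds a bit count up to the next multiple of 8.
// A count of 0 becomes 8 so the bitmap is never empty.
func alignToByte(bits uint) uint {
	if bits == 0 {
		return 8
	}
	if rem := bits % 8; rem != 0 {
		bits += 8 - rem
	}
	return bits
}

// newBitmapFor returns a bitmap large enough to hold every index in
// [0, maxIndex]. Indexes are zero-based, so maxIndex+1 bits are needed.
func newBitmapFor(maxIndex uint) Bitmap {
	return make(Bitmap, alignToByte(maxIndex+1)/8)
}

// Set marks the bit at idx. It panics if idx is beyond the bitmap,
// which is the failure mode an undersized bitmap would trigger.
func (b Bitmap) Set(idx uint) {
	b[idx>>3] |= 1 << (idx & 7)
}

// Check reports whether the bit at idx is set.
func (b Bitmap) Check(idx uint) bool {
	return b[idx>>3]&(1<<(idx&7)) != 0
}

func main() {
	// Sizing by "24 bits" alone would give 3 bytes, and Set(24) would
	// index byte 3 and panic. Sizing by maxIndex+1 gives 4 bytes.
	b := newBitmapFor(24)
	b.Set(24)
	fmt.Println(len(b), b.Check(24)) // prints "4 true"
}
```

The design point is only that the rounding has to be applied to the number of bits actually required, not to a count that is already one short.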

discobean commented Aug 25, 2017

We ran into this again, and can confirm that the current 0.6.1-dev brought the servers back into quorum. Thanks so much!

dadgar commented Aug 25, 2017

@discobean Sorry you ran into it again :( 0.6.1 will be out soon!

github-actions bot commented Dec 8, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2022