SEGFAULT on nomad status #2918

Closed
hynek opened this issue Jul 27, 2017 · 7 comments

Comments

@hynek
Contributor

hynek commented Jul 27, 2017

Nomad version

Remote CLI client: Nomad v0.6.0
Cluster servers and clients: Nomad v0.5.6

Operating system and Environment details

macOS 10.12.6, nomad installed by homebrew
Cluster running on Xenial

Issue

Nomad status crashes:

$ nomad status cdn
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4b7f3d7]

goroutine 1 [running]:
github.com/hashicorp/nomad/command.(*StatusCommand).Run(0xc420408510, 0xc420010260, 0x1, 0x1, 0xc420408000)
	/private/tmp/nomad-20170727-88530-1ke9jy6/nomad-0.6.0/src/github.com/hashicorp/nomad/command/status.go:141 +0x927
github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli.(*CLI).Run(0xc42040a000, 0xc42040a000, 0x29, 0x4eff470)
	/private/tmp/nomad-20170727-88530-1ke9jy6/nomad-0.6.0/src/github.com/hashicorp/nomad/vendor/github.com/mitchellh/cli/cli.go:235 +0x2d1
main.RunCustom(0xc420010250, 0x2, 0x2, 0xc4203b5620, 0x0)
	/private/tmp/nomad-20170727-88530-1ke9jy6/nomad-0.6.0/src/github.com/hashicorp/nomad/main.go:53 +0xed6
main.Run(0xc420010250, 0x2, 0x2, 0xc4200001a0)
	/private/tmp/nomad-20170727-88530-1ke9jy6/nomad-0.6.0/src/github.com/hashicorp/nomad/main.go:23 +0x56
main.main()
	/private/tmp/nomad-20170727-88530-1ke9jy6/nomad-0.6.0/src/github.com/hashicorp/nomad/main.go:19 +0x64

Reproduction steps

Have a 0.5.6 cluster running and try to ask for a job status with the 0.6.0 client.

I don’t know if there’s anything special about that combination, but for hopefully understandable reasons, I’m very reluctant to update the cluster itself. :)

@lovwal

lovwal commented Jul 27, 2017

This is an incompatibility between the 0.6 client and the 0.5.6 server in the status command.
In 0.6, the field SubmitTime was added to the Job struct (commit 3935656). Your client dereferences this field in the server's response, but a 0.5.6 server does not send it, so the field is nil.

@shantanugadgil
Contributor

@hynek I believe the documented method is to update the servers first.
Same for Consul as well!

@dadgar
Copy link
Contributor

dadgar commented Jul 27, 2017

Hey,

As others have mentioned, you should update your servers, then your clients. That avoids incompatibilities like this. Check out the upgrade guide for more info: https://www.nomadproject.io/docs/upgrade/index.html

@dadgar dadgar closed this as completed Jul 27, 2017
@hynek
Contributor Author

hynek commented Jul 27, 2017

OK, I think I miswrote because I fell into the trap of Nomad’s (IMHO unfortunate) nomenclature of “client = cluster node”.

When I wrote “client”, I meant the CLI client on my desktop that I use to query the cluster occasionally, and that Homebrew updated under my butt. It isn’t mentioned in the guide, I’m sure I’m not the only one, and I would expect more people to be burned by that. In any case, a SEGFAULT seems an odd backward-compatibility behavior…

@hynek hynek changed the title SEFAULT on nomad status SEGFAULT on nomad status Jul 27, 2017
@shantanugadgil
Contributor

I agree with @hynek; a SEGFAULT does make things look bad. A cleaner exit would be better, I think!

@dadgar
Contributor

dadgar commented Jul 28, 2017

@hynek Sorry you ran into this. The reason for the segfault was that the CLI was operating on a newer version of the API structs than was returned by the server because of the version mismatch.

As such, the CLI dereferenced a pointer that should never be nil in 0.6.0. However, this isn't to say that a segfault is an acceptable exit code. In the future, we may want to detect the CLI and server versions and exit early if there are known incompatibilities, to avoid issues like this.
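The early-exit idea could look something like the following Go sketch. `checkCompat` and `parseMajorMinor` are hypothetical helpers, not part of Nomad; how the CLI would obtain the server's version is left out, and the policy (refuse when the server's major.minor release is older than the CLI's) is an assumption chosen to match this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a version
// string such as "0.6.0" or "v0.5.6" (hypothetical helper).
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %v", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %v", v, err)
	}
	return major, minor, nil
}

// checkCompat returns an error when the server runs an older major.minor
// release than the CLI (assumed policy: newer CLI vs. older server is refused).
func checkCompat(cliVersion, serverVersion string) error {
	cMaj, cMin, err := parseMajorMinor(cliVersion)
	if err != nil {
		return err
	}
	sMaj, sMin, err := parseMajorMinor(serverVersion)
	if err != nil {
		return err
	}
	if sMaj < cMaj || (sMaj == cMaj && sMin < cMin) {
		return fmt.Errorf("CLI %s is newer than server %s; use a CLI matching the server version",
			cliVersion, serverVersion)
	}
	return nil
}

func main() {
	// The combination from this issue: a 0.6.0 CLI talking to 0.5.6 servers.
	if err := checkCompat("0.6.0", "0.5.6"); err != nil {
		// A real CLI would print this and exit non-zero before touching the response.
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("versions compatible")
}
```

Checking up front trades a little extra work per command for a clear error message instead of a panic deep inside a command's output formatting.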

Sorry if my original message to you came across as dismissive. Bug reports like this are great because they get us thinking about improvements to the product. Hopefully in future versions we will be able to detect these cases and print a nice error 😄

Thanks,
Alex

@dadgar dadgar reopened this Jul 28, 2017
dadgar added a commit that referenced this issue Jul 28, 2017
This PR goes through the CLI commands and ensures that a 0.6.X cli
gracefully handles interacting with a 0.5.X Nomad Agent.

Fixes #2918
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022

4 participants