Broken CSI plugin - panic querying volume #8730

Closed
urjitbhatia opened this issue Aug 25, 2020 · 3 comments · Fixed by #8735
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/crash theme/storage type/bug

Comments

@urjitbhatia

Nomad version

Output from nomad version
Nomad v0.12.3 (2db8abd9620dd41cb7bfe399551ba0f7824b3f61)

Operating system and Environment details

Linux 4.14.186-146.268.amzn2.x86_64 #1 SMP Tue Jul 14 18:16:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Issue

A newly registered volume wasn't available to the job, and trying to view its storage details in the UI causes the server to log a panic. We can't deregister the volume either. We also have some previously stuck volumes that we hoped would be fixed by upgrading to 0.12.3, but they're still hanging around, claiming that all node allocations are in use.

Previous issue: #8683

Nomad Server logs (if appropriate)

Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: 2020-08-25T01:19:00.732Z [ERROR] http: http: panic serving 10.2.0.183:39848: runtime error: invalid memory address or nil pointer dereference
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: goroutine 88359 [running]:
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.(*conn).serve.func1(0xc0005ee640)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:1800 +0x139
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: panic(0x2b7e5c0, 0x533d310)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: runtime/panic.go:975 +0x3e3
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/nomad/structs.(*Allocation).Stub(...)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/nomad/structs/structs.go:9019
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent.structsCSIVolumeToApi(0xc000df71e0, 0xc00114bf20)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:344 +0x508
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).csiVolumeGet(0xc00030a2d0, 0xc0013965e3, 0xe, 0x38eb080, 0xc00114bf20, 0xc000548a00, 0xc0007ac440, 0x1, 0x1, 0xc00065d970)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:108 +0x186
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).CSIVolumeSpecificRequest(0xc00030a2d0, 0x38eb080, 0xc00114bf20, 0xc000548a00, 0x4d78e6, 0x5f446704, 0x2ba5af9f, 0x227efcd5e4d)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:67 +0x211
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrap.func1(0x38eb080, 0xc00114bf20, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/hashicorp/nomad/command/agent/http.go:448 +0x176
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.HandlerFunc.ServeHTTP(0xc00046dd80, 0x38eb080, 0xc00114bf20, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:2041 +0x44
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.(*ServeMux).ServeHTTP(0xc0001445c0, 0x38eb080, 0xc00114bf20, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:2416 +0x1a5
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x38f5100, 0xc0001c7ea0, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: github.com/NYTimes/gziphandler@v1.0.1/gzip.go:277 +0x1e6
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.HandlerFunc.ServeHTTP(0xc0001a9d10, 0x38f5100, 0xc0001c7ea0, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:2041 +0x44
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.serverHandler.ServeHTTP(0xc00027c1c0, 0x38f5100, 0xc0001c7ea0, 0xc000548a00)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:2836 +0xa3
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http.(*conn).serve(0xc0005ee640, 0x3907000, 0xc000cf0a80)
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:1924 +0x86c
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: created by net/http.(*Server).Serve
Aug 25 01:19:00 ip-10-2-1-251.us-east-2.compute.internal nomad[4547]: net/http/server.go:2962 +0x35c
@tgross
Member

tgross commented Aug 25, 2020

Hi @urjitbhatia! Sorry to hear about this. It looks like the panic comes from here: https://github.com/hashicorp/nomad/blob/v0.12.3/command/agent/csi_endpoint.go#L344, where the allocation is nil. We'll get right on a fix for this.
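The trace above shows `structsCSIVolumeToApi` calling `(*Allocation).Stub` on a nil allocation. The Go sketch below reproduces that failure mode and shows a nil guard that skips claims whose allocation is missing. It assumes heavily simplified, hypothetical types: `Allocation`, `AllocListStub`, and `volumeAllocStubs` are illustrative stand-ins, not Nomad's real definitions, and this is not the code from #8735.

```go
package main

import "fmt"

// Allocation is a hypothetical, simplified stand-in for Nomad's
// structs.Allocation; only the fields this sketch needs are included.
type Allocation struct {
	ID     string
	NodeID string
}

// AllocListStub is a hypothetical stand-in for the API-facing stub.
type AllocListStub struct {
	ID     string
	NodeID string
}

// Stub reads fields through its receiver, so calling it on a nil
// *Allocation panics with "invalid memory address or nil pointer
// dereference", the same error as in the trace above.
func (a *Allocation) Stub() *AllocListStub {
	return &AllocListStub{ID: a.ID, NodeID: a.NodeID}
}

// volumeAllocStubs (hypothetical helper) converts a volume's claim map
// to stubs, skipping entries whose allocation is nil, for example
// because the allocation was garbage collected while the claim remained.
func volumeAllocStubs(allocs map[string]*Allocation) []*AllocListStub {
	stubs := make([]*AllocListStub, 0, len(allocs))
	for _, a := range allocs {
		if a == nil {
			// Skip the dangling claim instead of panicking.
			continue
		}
		stubs = append(stubs, a.Stub())
	}
	return stubs
}

func main() {
	readAllocs := map[string]*Allocation{
		"a1": {ID: "a1", NodeID: "n1"},
		"a2": nil, // a claim whose allocation no longer exists
	}
	fmt.Println(len(volumeAllocStubs(readAllocs))) // prints 1
}
```

Without the `a == nil` check, the `"a2"` entry would reach `Stub()` and panic inside the HTTP handler, which is what the server log shows.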

@tgross
Member

tgross commented Aug 25, 2020

@urjitbhatia I've opened #8735 to fix the bug, which we'll ship in 0.12.4 shortly. I've also opened #8734 to look into whether we can eliminate this kind of bug entirely from the code base.
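One general Go pattern for ruling out this class of panic, not necessarily the approach explored in #8734, is a nil-safe method: a pointer-receiver method may be called on a nil pointer and can check for that itself. A minimal sketch, again with hypothetical simplified types rather than Nomad's own:

```go
package main

import "fmt"

// Allocation and AllocListStub are hypothetical, simplified types.
type Allocation struct {
	ID string
}

type AllocListStub struct {
	ID string
}

// Stub is nil-safe: a nil receiver yields a nil stub instead of
// dereferencing the pointer and panicking.
func (a *Allocation) Stub() *AllocListStub {
	if a == nil {
		return nil
	}
	return &AllocListStub{ID: a.ID}
}

func main() {
	var missing *Allocation
	fmt.Println(missing.Stub() == nil)             // prints true
	fmt.Println((&Allocation{ID: "a1"}).Stub().ID) // prints a1
}
```

The tradeoff is that callers now receive a nil stub and must handle it, so the check moves to the caller rather than disappearing.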

@tgross added this to the 0.12.4 milestone Aug 25, 2020
@tgross added the stage/accepted label Aug 25, 2020
@tgross self-assigned this Aug 25, 2020
@github-actions

github-actions bot commented Nov 2, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Nov 2, 2022