panic in volume watcher #15095

Closed
tgross opened this issue Nov 1, 2022 · 3 comments · Fixed by #15101

Comments

tgross commented Nov 1, 2022

Reported through our support org:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x110 pc=0x1d647f3]

goroutine 22850374 [running]:
github.com/hashicorp/nomad/nomad/volumewatcher.(*volumeWatcher).volumeReapImpl(0x0?, 0x0)
    github.com/hashicorp/nomad/nomad/volumewatcher/volume_watcher.go:208 +0x33
github.com/hashicorp/nomad/nomad/volumewatcher.(*volumeWatcher).volumeReap(0xc002c84f00, 0xc0020ea1a0?)
    github.com/hashicorp/nomad/nomad/volumewatcher/volume_watcher.go:193 +0x58
github.com/hashicorp/nomad/nomad/volumewatcher.(*volumeWatcher).watch(0xc002c84f00)
    github.com/hashicorp/nomad/nomad/volumewatcher/volume_watcher.go:121 +0x5f
created by github.com/hashicorp/nomad/nomad/volumewatcher.(*volumeWatcher).Start
    github.com/hashicorp/nomad/nomad/volumewatcher/volume_watcher.go:89 +0x125

This is bubbling up from volume_watcher.go#L208, which suggests the volume is nil there. A quick look at the code shows we're missing a nil-check at volume_watcher.go#L120-L121.
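To illustrate the failure mode in the trace above, here is a minimal sketch and not the actual Nomad code. The `Volume` type, its `PastClaims` field, and the `reapUnguarded`, `reapGuarded`, and `didPanic` functions are all hypothetical names made up for this example. It only shows how reading a field through a nil pointer panics, and how an early nil-check avoids that.

```go
package main

import (
	"errors"
	"fmt"
)

// Volume is a hypothetical stand-in for a volume record; it is not Nomad's type.
type Volume struct {
	ID         string
	PastClaims map[string]string // claim ID -> claim state (illustrative only)
}

var errNilVolume = errors.New("volume is nil")

// reapUnguarded reads a field through vol without checking it first,
// so it panics with a nil pointer dereference when vol is nil.
func reapUnguarded(vol *Volume) int {
	return len(vol.PastClaims)
}

// reapGuarded returns an error instead of dereferencing a nil volume.
func reapGuarded(vol *Volume) (int, error) {
	if vol == nil {
		return 0, errNilVolume
	}
	return len(vol.PastClaims), nil
}

// didPanic reports whether calling f caused a panic.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unguarded panicked:", didPanic(func() { reapUnguarded(nil) }))

	_, err := reapGuarded(nil)
	fmt.Println("guarded error:", err)
}
```

Whether the guard belongs in the caller or in the reap function itself is a design choice; this sketch puts it in the callee only to keep the example short.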

tgross self-assigned this Nov 1, 2022
tgross added this to the 1.4.3 milestone Nov 1, 2022

tgross commented Nov 1, 2022

I've got a patch worked up for this, including a test showing that we can hit the panic if the volumewatcher restarts its goroutine just as a delete happens. For example, if a GC claim is written and as a result the volume is deleted before the volumewatcher can enter the loop in the run function, it'll panic. There's also a race condition around shutting down the volumewatcher goroutine that I need to solve before I can get that fix up, though.
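The ordering described above can be sketched as follows. This is an assumption-laden illustration, not the patch in #15101: the `store` type, its `get` and `delete` methods, and `watchOnce` are hypothetical names, and the delete is forced to land before the watcher goroutine runs so the outcome is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// Volume is a hypothetical stand-in for a volume record.
type Volume struct{ ID string }

// store is a hypothetical in-memory state store guarded by a mutex.
type store struct {
	mu   sync.Mutex
	vols map[string]*Volume
}

// get returns the volume for id, or nil if it has been deleted.
func (s *store) get(id string) *Volume {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.vols[id]
}

// delete removes the volume for id from the store.
func (s *store) delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.vols, id)
}

// watchOnce models a single pass of a watcher loop. It looks the volume up
// and skips reaping when the lookup returns nil, which is what happens if
// the delete lands before the watcher goroutine reaches this point.
func watchOnce(s *store, id string, reap func(*Volume)) bool {
	vol := s.get(id)
	if vol == nil {
		return false // volume already gone; nothing to reap
	}
	reap(vol)
	return true
}

func main() {
	s := &store{vols: map[string]*Volume{"vol-1": &Volume{ID: "vol-1"}}}

	// The delete happens before the watcher goroutine starts its pass.
	s.delete("vol-1")

	done := make(chan bool)
	go func() {
		done <- watchOnce(s, "vol-1", func(v *Volume) {
			fmt.Println("reaping", v.ID)
		})
	}()
	fmt.Println("reaped:", <-done)
}
```

Without the nil-check in `watchOnce`, the reap callback would receive a nil volume in exactly this ordering.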

tgross commented Nov 1, 2022

#15101 has been merged and will ship in the next release of Nomad, with backports to the 1.3.x and 1.2.x series.

github-actions bot commented Mar 2, 2023

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2023