Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix panic while deleting CSI plugins for missing job #7758

Merged
merged 1 commit into from
Apr 20, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions nomad/state/state_store.go
Original file line number Diff line number Diff line change
Expand Up @@ -1187,15 +1187,14 @@ func (s *StateStore) deleteJobFromPlugin(index uint64, txn *memdb.Txn, job *stru
plugins := map[string]*structs.CSIPlugin{}

for _, a := range allocs {
tg := job.LookupTaskGroup(a.TaskGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @michaeldwan I think the error we made here is that this line should have read:

tg := a.job.LookupTaskGroup(a.TaskGroup)

It seemed strange to me that a job can not have the taskgroup for one of its own allocs, but my colleague @notnoop has pointed out to me that the alloc can have a view of the job that's stale relative to the job we've pulled from the state store, and that's why we're running into this panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's roughly what we think happened too. We were actually working on a network issue that impacted consul but nomad seemed healthy through it.

I can make that change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, I typoed my example. It's a.Job (capital letter) 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made the change. There's a few other places LookupTaskGroup is being called but they appear to get both the job and alloc from the same snapshot and should be okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the same typo and pushed before waiting for tests in vagrant to run :)

// if its nil, we can just panic
tg := a.Job.LookupTaskGroup(a.TaskGroup)
for _, t := range tg.Tasks {
if t.CSIPluginConfig != nil {
plugAllocs = append(plugAllocs, &pair{
pluginID: t.CSIPluginConfig.ID,
alloc: a,
})

}
}
}
Expand Down