Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

template: protect use of template manager with a lock #15192

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Nov 9, 2022

This PR protects access to templateHook.templateManager with its lock. So
far we have not been able to reproduce the panic - but it seems either Poststart
is running without a Prestart being run first (should be impossible), or the
Update hook is running concurrently with Poststart, nil-ing out the templateManager
in a race with Poststart.

Fixes #15189

Backport only to 1.4.x and 1.3.x

@shoenig shoenig force-pushed the b-template-hook-manager-panic branch from 2eb3293 to 81e27ae Compare November 9, 2022 15:46
@shoenig shoenig changed the title [no ci] template: protect use of template manager with a lock template: protect use of template manager with a lock Nov 9, 2022
@shoenig shoenig marked this pull request as ready for review November 9, 2022 16:38
Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but nothing major. I also think this needs to be backported to 1.3.x since that's when we introduced change_mode = "script".

Comment on lines 177 to 182
// just quit if we are being killed anyway
select {
case <-ctx.Done():
return nil
default:
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this leak resources like the templateManager? Or is the expectation that the Stop hook is responsible for any clean-up (in which case it may be good to document this expectation in the comment)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, actually this is unrelated to the bug fix at hand so I'll just eliminate it

Comment on lines 119 to 122
h.managerLock.Lock()
defer h.managerLock.Unlock()

if req.DriverExec != nil && h.templateManager != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the root cause was the missing lock acquisition?

Update hooks run concurrently with others and so if an update happens while the Poststart hook is running the SetDriverHandle call may happen between these two lines, causing h.templateManager to be nil.

Apart from this I'm not sure how else templateManager could be nil, should we just no-op like in Update?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just no-op like [in Update]

I think so, yeah - as in lets check & exit on nil before entering this if/else.

This PR protects access to `templateHook.templateManager` with its lock. So
far we have not been able to reproduce the panic - but it seems either Poststart
is running without a Prestart being run first (should be impossible), or the
Update hook is running concurrently with Poststart, nil-ing out the templateManager
in a race with Poststart.

Fixes #15189
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/1.3.x backport to 1.3.x release line backport/1.4.x backport to 1.4.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad 1.4.2 - panic: runtime error: invalid memory address or nil pointer dereference
2 participants