Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tr: Fetch Wait channel before killTask in restart #5889

Merged
merged 1 commit into from
Jun 26, 2019

Conversation

endocrimes
Copy link
Contributor

Currently, if killTask results in the termination of a process before
calling WaitTask, Restart() will incorrectly return a TaskNotFound
error when using the raw_exec driver on Windows.

This fetches the WaitCh before killing the process to avoid this race
condition.

Currently, if killTask results in the termination of a process before
calling WaitTask, Restart() will incorrectly return a TaskNotFound
error when using the raw_exec driver on Windows.
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Should kill have similar treatment or doesn't it matter since kill always transitions to a terminal state?

@endocrimes
Copy link
Contributor Author

I’m debugging a similar issue during executor shutdown right now which might end up leading to something similar, but not fully sure yet.

@endocrimes endocrimes merged commit 520cd90 into master Jun 26, 2019
@endocrimes endocrimes deleted the dani/b-task-restart branch June 26, 2019 14:18
@notnoop
Copy link
Contributor

notnoop commented Jun 27, 2019

Good catch! Is this something we can test? Feels like a subtle thing that we might miss with enough refactoring.

Also, I'm very confused how the ordering causes TaskNotFound error, mind if you elaborate on how it gets to that state? In raw_exec, Seeing handle.Wait() only returns TaskNotFound if task was removed from d.tasks [1], which seems to only happen in driver.DestroyTask()[2]. Was there another place where task was deleted/destroyed?

[1]

func (d *Driver) WaitTask(ctx context.Context, taskID string) (<-chan *drivers.ExitResult, error) {
handle, ok := d.tasks.Get(taskID)
if !ok {
return nil, drivers.ErrTaskNotFound
}

[2]
d.tasks.Delete(taskID)

@endocrimes
Copy link
Contributor Author

@notnoop DestroyTask gets called as part of terminating a tasks Run loop here: https://github.com/hashicorp/nomad/blob/dani/win-killer/client/allocrunner/taskrunner/task_runner.go#L482

testing this reliably seems pretty hard right now, because it's very much integration test-y and would need a full driver matrix - might be dooable when we invest in more test infra soon though.

@notnoop
Copy link
Contributor

notnoop commented Jun 27, 2019

Ah - makes sense - I missed that call - thanks for the clarification!

@github-actions
Copy link

github-actions bot commented Feb 7, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants