Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::process::Child::kill should not fail if process is terminated on Windows #112423

Closed
stepancheg opened this issue Jun 8, 2023 · 6 comments · Fixed by #112594
Closed

std::process::Child::kill should not fail if process is terminated on Windows #112423

stepancheg opened this issue Jun 8, 2023 · 6 comments · Fixed by #112594
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@stepancheg
Copy link
Contributor

Child::kill calls kill on Unix, and TerminateProcess on Windows.

If process is terminated, but handle is alive yet, kill is successful on Unix, but fails on Windows.

kill is successful on Unix because the process is zombie until waited (it fails after process waited, that's probably OK)

According to documentation of TerminateProcess

After a process has terminated, call to TerminateProcess with open handles to the process fails with ERROR_ACCESS_DENIED (5) error code

So if someone else (not us) terminated the process, kill would fail.

There are problems with this behavior:

  • API is different between Unix and Windows
  • API to reliably terminate the process is complicated
    • call kill
    • if kill failed, need to try_wait. And if try_wait is successful, ignore the error of kill

I could reproduce it. I don't have easy access to Windows machine to create completely isolated test case, but repro looks like this:

let mut command = std::process::Command::new("powershell");
command.args(["-c", "Start-Sleep -Seconds 10000"]);
let mut child = command.spawn().unwrap();
let pid = child.id().try_into().unwrap();

// Terminate by process id, NOT by our process handle. Like if some other process killed it.
// This function calls `OpenProcess(pid)` followed by `TerminateProcess`.
kill(pid).unwrap();

thread::sleep(Duration::from_secs(5));

child.kill().expect("Failed to kill child process");

This fails with:

Failed to kill child process: Os { code: 5, kind: PermissionDenied, message: "Access is denied." }

exactly as described in WinAPI docs.

I think proper kill implementation should look like this:

  • if process has been waited already, fail explicitly, to be consistent with Unix
  • call TerminateProcess
  • if TerminateProcess failed, call GetExitCodeProcess
  • if GetExitCodeProcess is successful and exit code is not STILL_ACTIVE, consider kill successful

This is how kill is implemented in our project:

https://github.com/facebook/buck2/blob/b6fb1364e1caf6dac821767b27bda65726e38895/app/buck2_wrapper_common/src/winapi_process.rs#L58-L72

Meta

rustc --version --verbose:

rustc 1.70.0-nightly

@stepancheg stepancheg added the C-bug Category: This is a bug. label Jun 8, 2023
@Noratrieb Noratrieb added T-libs Relevant to the library team, which will review and decide on the PR/issue. O-windows Operating system: Windows labels Jun 9, 2023
@ChrisDenton
Copy link
Member

It does make sense to return success if the process has already exited. That said...

if process has been waited already, fail explicitly, to be consistent with Unix

I'm not sure this makes any sense when put in Windows terms. Wait is more like poll. There's no way to tell if something has been "waited". The best you can do is test if the process has exited already, which would make this step redundant given that the next steps test exactly that.

Also it's important to note that STILL_ACTIVE is equal to 259 which is also a perfectly valid exit code. We should use try_wait() instead.

If we want to change this then my proposal would be to simply try to terminate and if that fails (with ERROR_ACCESS_DENIED?) then call try_wait() to determine if the process has already exited. Also the docs should be updated to note the platform differences. It's very much just documenting POSIX atm.

@stepancheg
Copy link
Contributor Author

stepancheg commented Jun 9, 2023

There's no way to tell if something has been "waited"

Store a flag in Child, as Rust does for Unix.

Also it's important to note that STILL_ACTIVE is equal to 259 which is also a perfectly valid exit code. We should use try_wait() instead.

Good point — I don't know WinAPI well, but seems like this is how it is supposed to be implemented, thanks!

@ChrisDenton
Copy link
Member

ChrisDenton commented Jun 9, 2023

Store a flag in Child, as Rust does for Unix.

Perhaps. I admit I'm not terribly keen on trying to emulate the POSIX implementation exactly. This has lead to problems in the past, especially when it leads to inconsistent behaviour within the platform (for example, people using their own wait function(s) now have to know to also implement their own kill function).

What is the benefit of erroring here? How would that error be handled?

@stepancheg
Copy link
Contributor Author

What is the benefit of erroring here?

Simplicity of writing cross-platform code. Write something on one OS, and it will behave the same on the other OS. Writing cross-platform code is hard because such corner cases needs to be handled properly, and which are not very easy to test (and even figure out what scenarios needs to be tested).

That said, I'd prefer to change API on Unix instead to not return error on kill after the process has been waited. Because yes, how would that error be handled? What is benefit or erroring here?

@ChrisDenton ChrisDenton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 9, 2023
@ChrisDenton
Copy link
Member

ChrisDenton commented Jun 9, 2023

Ok, I'm going to tag this for libs-api discussion. I'm personally uncertain here but I'd love to be convinced. I'll try to summarise below but also see the above discussion.

Discussion

The semantics of Windows and posix implementations are different. It is advantageous to smooth over those differences in the std. However, how to do that is the question.

I think the current documentation is wrong here. It says:

If the child has already exited, an InvalidInput error is returned.

The Windows implementation does not return InvalidInput and the posix implementation only does so if Child::wait (or maybe Child::try_wait) is called first. Aside from being inaccurately documented, I'm not really convinced this is useful behaviour. It is necessary for posix to attempt to avoid using a PID that may have been recycled, but to me this feels like a platform implementation detail ather than something for users to handle.

@ChrisDenton
Copy link
Member

This was discussed in a libs-api meeting. The feeling was that it makes sense to simply return Ok(()) in the cases where the process has already exited. It will need a T-libs-api FCP though as this is a breaking change to the documented behaviour (even if the docs aren't quite accurate),

@dtolnay dtolnay removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 27, 2023
@bors bors closed this as completed in dfe0683 Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants