Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os/exec: add option to send Interrupt instead of Kill on Context cancellation #22757

Closed
andreynering opened this issue Nov 16, 2017 · 14 comments

Comments

@andreynering
Copy link

The reason is that some executables do cleanup on Interrupt, like deleting temporary files.

Ideally it should send Interrupt but force a Kill after a timeout (of 2 seconds?). Also, it should be no-op on Windows that don't support sending Interrupt (and just send Kill instead).

It's already possible to manually listen to the Context and send the signals, but go test -race indicates a race when doing so, which I think it's not fixable because the internal mutex is unexported.

Background:

@odeke-em
Copy link
Member

odeke-em commented Nov 16, 2017

@andreynering this issue looks like a subset of the closed proposal #21135 which was closed due to lack of demand since no one responded to @rsc's request for comments so an indication of less demand.

@bcmills
Copy link
Contributor

bcmills commented Nov 16, 2017

go test -race indicates a race when doing so

Please include the actual code you tried (ideally as a Playground link) and the actual error you got. If the obvious way to do it is racy, perhaps we can just fix the race.

@andreynering
Copy link
Author

@bcmills

package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"time"
)

func main() {
	fmt.Println("Starting...")

	ctx, _ := context.WithTimeout(context.Background(), 2*time.Second)
	cmd := exec.CommandContext(ctx, "sleep", "5")

	go func() {
		<-ctx.Done()

		// Windows doesn't support Interrupt
		if runtime.GOOS == "windows" {
			_ = cmd.Process.Signal(os.Kill)
			return
		}

		go func() {
			time.Sleep(2 * time.Second)
			_ = cmd.Process.Signal(os.Kill)
		}()
		cmd.Process.Signal(os.Interrupt)
	}()

	if err := cmd.Run(); err != nil {
		panic(err)
	}

	fmt.Println("Finishing...")
}

@rsc
Copy link
Contributor

rsc commented Nov 16, 2017

@andreynering, sorry but can you also post the output you get from running that program with -race? I am not getting any race failures when I use "go run -race x.go" (where x.go is your program).

It's possible that the race you are seeing is on cmd.Process, which cmd.Run initializes and your goroutine uses. If so the solution would be to run cmd.Start then kick off the goroutine, then use cmd.Wait. Start is what initializes cmd.Process, so the goroutine does need to run only after Start has done that.

@andreynering
Copy link
Author

@rsc Sorry, there was a mistake in that script. Just change this line:

- cmd := exec.CommandContext(ctx, "sleep", "5")
+ cmd := exec.Command("sleep", "5")

And the output:

$ go run -race signal.go
Starting...
==================
WARNING: DATA RACE
Read at 0x00c42007a200 by goroutine 7:
  main.main.func1()
      /home/andrey/signal.go:31 +0x90

Previous write at 0x00c42007a200 by main goroutine:
  os/exec.(*Cmd).Start()
      /home/andrey/GOROOT/src/os/exec/exec.go:363 +0x8d6
  os/exec.(*Cmd).Run()
      /home/andrey/GOROOT/src/os/exec/exec.go:286 +0x3c
  main.main()
      /home/andrey/signal.go:34 +0x190

Goroutine 7 (running) created at:
  main.main()
      /home/andrey/signal.go:18 +0x182
==================
==================
WARNING: DATA RACE
Read at 0x00c4200800c0 by goroutine 7:
  os.(*Process).signal()
      /home/andrey/GOROOT/src/os/exec_unix.go:56 +0x51
  os.(*Process).Signal()
      /home/andrey/GOROOT/src/os/exec.go:121 +0x4c
  main.main.func1()
      /home/andrey/signal.go:31 +0xcd

Previous write at 0x00c4200800c0 by main goroutine:
  os.newProcess()
      /home/andrey/GOROOT/src/os/exec.go:24 +0x5d
  os.startProcess()
      /home/andrey/GOROOT/src/os/exec_posix.go:49 +0x542
  os.StartProcess()
      /home/andrey/GOROOT/src/os/exec.go:94 +0x7a
  os/exec.(*Cmd).Start()
      /home/andrey/GOROOT/src/os/exec/exec.go:363 +0x89b
  os/exec.(*Cmd).Run()
      /home/andrey/GOROOT/src/os/exec/exec.go:286 +0x3c
  main.main()
      /home/andrey/signal.go:34 +0x190

Goroutine 7 (running) created at:
  main.main()
      /home/andrey/signal.go:18 +0x182
==================
==================
WARNING: DATA RACE
Read at 0x00c4200800d0 by goroutine 7:
  sync/atomic.LoadInt32()
      /home/andrey/GOROOT/src/runtime/race_amd64.s:206 +0xb
  os.(*Process).done()
      /home/andrey/GOROOT/src/os/exec.go:34 +0x3e
  os.(*Process).signal()
      /home/andrey/GOROOT/src/os/exec_unix.go:64 +0x1dc
  os.(*Process).Signal()
      /home/andrey/GOROOT/src/os/exec.go:121 +0x4c
  main.main.func1()
      /home/andrey/signal.go:31 +0xcd

Previous write at 0x00c4200800d0 by main goroutine:
  os.newProcess()
      /home/andrey/GOROOT/src/os/exec.go:24 +0x5d
  os.startProcess()
      /home/andrey/GOROOT/src/os/exec_posix.go:49 +0x542
  os.StartProcess()
      /home/andrey/GOROOT/src/os/exec.go:94 +0x7a
  os/exec.(*Cmd).Start()
      /home/andrey/GOROOT/src/os/exec/exec.go:363 +0x89b
  os/exec.(*Cmd).Run()
      /home/andrey/GOROOT/src/os/exec/exec.go:286 +0x3c
  main.main()
      /home/andrey/signal.go:34 +0x190

Goroutine 7 (running) created at:
  main.main()
      /home/andrey/signal.go:18 +0x182
==================
panic: signal: interrupt

goroutine 1 [running]:
main.main()
        /home/andrey/signal.go:35 +0x248
exit status 2

@andreynering
Copy link
Author

andreynering commented Nov 16, 2017

It's possible that the race you are seeing is on cmd.Process, which cmd.Run initializes and your goroutine uses. If so the solution would be to run cmd.Start then kick off the goroutine, then use cmd.Wait. Start is what initializes cmd.Process, so the goroutine does need to run only after Start has done that.

You're right! Using Start() and Wait() fixes the race. (Updated script below).

Anyway, I still think the proposal should be considered:

  • It's convenient to have that in the stdlib;
  • It's a common use case;
  • It's too easy to someone commit this mistake and have a racy code.

package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"time"
)

func main() {
	fmt.Println("Starting...")

	ctx, _ := context.WithTimeout(context.Background(), 2*time.Second)
	cmd := exec.Command("sleep", "5")
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	go func() {
		<-ctx.Done()

		// Windows doesn't support Interrupt
		if runtime.GOOS == "windows" {
			_ = cmd.Process.Signal(os.Kill)
			return
		}

		go func() {
			time.Sleep(2 * time.Second)
			_ = cmd.Process.Signal(os.Kill)
		}()
		cmd.Process.Signal(os.Interrupt)
	}()

	if err := cmd.Wait(); err != nil {
		panic(err)
	}

	fmt.Println("Finishing...")
}

@rsc
Copy link
Contributor

rsc commented Nov 16, 2017

As noted earlier, we did consider it at length in #21135 but decided not to. You can see the discussion there. We're not going to revisit this now, but maybe in a year if more evidence of demand accumulates. Sorry.

@marco-m
Copy link

marco-m commented Apr 26, 2020

My 2 cents to indicate additional demand to reconsider the option of being able to send SIGINT (+ timeout + SIGKILL) instead of SIGKILL on Context cancellation.

@cpuguy83
Copy link

I threw together a wrapper to handle this: https://github.com/cpuguy83/execctx

It indeed seems iffy to include something like this in the stdlib.

@bcmills
Copy link
Contributor

bcmills commented Jul 1, 2020

@cpuguy83, we've had a bit more experience with gracefully terminating subprocesses in various parts of the Go project, and the pattern that is emerging seems to be a variant of (*exec.Cmd).Wait rather than exec.Command:

go/src/cmd/go/script_test.go

Lines 1091 to 1098 in cacac8b

// waitOrStop waits for the already-started command cmd by calling its Wait method.
//
// If cmd does not return before ctx is done, waitOrStop sends it the given interrupt signal.
// If killDelay is positive, waitOrStop waits that additional period for Wait to return before sending os.Kill.
//
// This function is copied from the one added to x/playground/internal in
// http://golang.org/cl/228438.
func waitOrStop(ctx context.Context, cmd *exec.Cmd, interrupt os.Signal, killDelay time.Duration) error {

@marco-m
Copy link

marco-m commented Jul 1, 2020

@bcmills thanks! You showed a treasure I was not aware of.

MattWindsor91 added a commit to c4-project/c4t that referenced this issue Dec 15, 2020
This is important to make the backend runner able to support partial traces; the
default CommandContext setup always sigkills, and we need to sigterm things like
rmem to get them to dump a partial result.

Seems to work; could do with some testing, but how?, and refactoring, but how?.

This isn't yet supported in the machine node, mainly because that doesn't quite
support the service.Runner idiom yet.

See golang/go#22757.
@mvdan
Copy link
Member

mvdan commented Feb 4, 2021

@bcmills thanks for the link above. I'm a bit curious - isn't using that function for "graceful shutdown" on windows largely unhelpful, since os.Interrupt is not supported there? In other words, waitOrStop on Windows seems to kill the process immediately after the context is done, meaning waitOrStop isn't really improving things for Windows.

I'm once again wanting to have a "graceful shutdown of an exec.Cmd with a timeout for an os.Kill", but I just keep finding snippets which don't support Windows, or which only support Windows :) I ask here and now because you say there's a pattern emerging in various parts of the Go project, so I wonder if another version of this pattern does have Windows support like what's here: https://github.com/golang/go/blob/master/src/os/signal/signal_windows_test.go

@bcmills
Copy link
Contributor

bcmills commented Feb 5, 2021

@mvdan, I honestly haven't thought much about Windows since I'm mostly using the function to get better test output in case of deadlocks, and (in the programs I'm working with) deadlocks that occur on Windows also tend to occur elsewhere.

That said, I could easily imagine a variant of the above waitOrStop that instead invokes GenerateConsoleCtrlEvent on Windows if the interrupt signal is equal to os.Interrupt.

@bronger
Copy link

bronger commented Aug 28, 2021

@cpuguy83, we've had a bit more experience with gracefully terminating subprocesses in various parts of the Go project, and the pattern that is emerging seems to be a variant of (*exec.Cmd).Wait rather than exec.Command:

go/src/cmd/go/script_test.go

Lines 1091 to 1098 in cacac8b

// waitOrStop waits for the already-started command cmd by calling its Wait method.
//
// If cmd does not return before ctx is done, waitOrStop sends it the given interrupt signal.
// If killDelay is positive, waitOrStop waits that additional period for Wait to return before sending os.Kill.
//
// This function is copied from the one added to x/playground/internal in
// http://golang.org/cl/228438.
func waitOrStop(ctx context.Context, cmd *exec.Cmd, interrupt os.Signal, killDelay time.Duration) error {

I’m not competent enough in UNIX, so may I ask whether it makes sense to leave killDelay to 0? I know that this default value being 0 makes sense in either case, so my question is only about how to use this function properly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants