Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: Current requirement for all reads to be completed before calling cmd.Wait can block indefinitely #60309

Closed
mitar opened this issue May 19, 2023 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@mitar
Copy link
Contributor

mitar commented May 19, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mitar/.cache/go-build"
GOENV="/home/mitar/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/mitar/.go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/mitar/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.4"
GCCGO="/usr/bin/gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/git/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build4256193346=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I tried to run the following program:

package main

import (
	"io"
	"log"
	"os"
	"os/exec"
	"sync"
	"syscall"
	"time"
)

func execute() error {
	cmd := exec.Command("bash", "-c", "(sleep infinity)")

	stdout, _ := cmd.StdoutPipe()
	stderr, _ := cmd.StderrPipe()

	if err := cmd.Start(); err != nil {
		log.Printf("Error executing command: %s......\n", err.Error())
		return err
	}

	go func() {
		<-time.After(10 * time.Second)
		log.Printf("sending SIGTERM")
		cmd.Process.Signal(syscall.SIGTERM)
	}()

	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		io.Copy(os.Stdout, stdout)
	}()

	wg.Add(1)
	go func() {
		defer wg.Done()
		io.Copy(os.Stderr, stderr)
	}()

	wg.Wait()

	if err := cmd.Wait(); err != nil {
		log.Printf("Error waiting for command execution: %s......\n", err.Error())
		return err
	}

	log.Printf("end")

	return nil
}

func main() {
	execute()
}

This is a variation of an example from #38268. I constructed this example but in my real app I do not just copy subprocess stdout to main stdout, but I do a transformation on it. Anyway, the issue is that it seems when bash subprocess gets terminated, reader does not get EOF and so io.Copy does not end, so wg.Wait never finishes, so we never get to cmd.Wait. If I comment out wg.Wait, then cmd.Wait correctly reaps the subprocess (but of course it could mean not everything gets copied).

Currently, documentation of StdoutPipe and StderrPipe requires:

It is thus incorrect to call Wait before all reads from the pipe have completed.

But it seems it is not possible to always know when all reads from the pipe have completed?

What did you expect to see?

2023/05/19 19:44:58 sending SIGTERM
2023/05/19 19:44:58 Error waiting for command execution: signal: terminated......

And programs terminates.

What did you see instead?

2023/05/19 19:44:58 sending SIGTERM

And program never terminates.

@heschi
Copy link
Contributor

heschi commented May 19, 2023

See #23019, perhaps. cc @ianlancetaylor @bcmills

@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 19, 2023
@heschi heschi added this to the Backlog milestone May 19, 2023
@bcmills
Copy link
Contributor

bcmills commented May 19, 2023

Here is a Playground variation on that program that exhibits the same behavior:
https://go.dev/play/p/34d4n3sTlDq

@bcmills
Copy link
Contributor

bcmills commented May 19, 2023

Actually, the problem on the Playground seems to come from the Playground-specific faketime hook somehow interfering with the network poller. Adding a time.Sleep loop to it seems to fix it to give the expected behavior:
https://go.dev/play/p/OodMdJxLEJf

@bcmills
Copy link
Contributor

bcmills commented May 19, 2023

it seems when bash subprocess gets terminated, reader does not get EOF and so io.Copy does not end

@mitar, I think the problem you observe comes from sending the signal to the bash process rather than its sleep subprocess, which causes both processes to remain blocked. (In particular, the sleep subprocess holds open stdout.)

Adjusting the program to create a process group and send the signal to the whole process group also gives the expected behavior.

package main

import (
	"io"
	"log"
	"os"
	"os/exec"
	"sync"
	"syscall"
	"time"
)

func execute() error {
	cmd := exec.Command("bash", "-c", "(sleep infinity)")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid: true,
	}

	stdout, _ := cmd.StdoutPipe()
	stderr, _ := cmd.StderrPipe()

	if err := cmd.Start(); err != nil {
		log.Printf("Error executing command: %s......\n", err.Error())
		return err
	}

	go func() {
		<-time.After(1 * time.Second)
		log.Printf("sending SIGTERM")
		syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
	}()

	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		io.Copy(os.Stdout, stdout)
	}()

	wg.Add(1)
	go func() {
		defer wg.Done()
		io.Copy(os.Stderr, stderr)
	}()

	wg.Wait()

	if err := cmd.Wait(); err != nil {
		log.Printf("Error waiting for command execution: %s......\n", err.Error())
		return err
	}

	log.Printf("end")

	return nil
}

func main() {
	execute()
}

@bcmills
Copy link
Contributor

bcmills commented May 19, 2023

As far as I can tell, this is more of a bash usability problem than a Go os/exec problem.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 19, 2023
@mitar
Copy link
Contributor Author

mitar commented May 19, 2023

Thanks. A process group approach is nice and is a workaround when you know that your child process is misbehaving like bash. But the issue is that in my particular case I cannot know that in advance. I am writing an init process in Go and some services might behave well and expect only the top-level process to get the signal and some might need the process group approach. For example, this could be a case where process does double fork to get to daemonize. In that case I want to reap the direct child and if it happens that stdin/stdout got inherited by the second daemonized process should not prevent me from doing this.

To me the surprising thing is Go API requirement here. Without this API requirement, I could wait for the process, see that it terminated, decide to end processing of stdout and stderr (if that has not yet finished), and leave to the stray process to get reparented to PID 1. Even more, I could even decide that I want to continue processing stdout and stderr, but that I also want to reap the now dead children.

@bcmills
Copy link
Contributor

bcmills commented May 19, 2023

In that case, don't use StdoutPipe and StderrPipe. Instead, you can call os.Pipe to obtain pipes that won't be closed when Wait returns. (I took such an approach in https://go.dev/cl/484741.)

It's kind of an unfortunate workaround, but it avoids the close-on-Wait race without having to make a backward-incompatible change to the interaction between StdoutPipe and Wait.

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 19, 2023
@ianlancetaylor
Copy link
Member

I don't think there is anything to be done here, so closing. Please comment if you disagree.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale May 19, 2023
@mitar
Copy link
Contributor Author

mitar commented May 20, 2023

I am confused by looking at https://go.dev/cl/484741, because from what I understand from it, it creates a pipe, calls run, calls wait, and when the overall function returns, it closes the pipe without waiting for everything to be read from it. Isn't this exactly the same as what would happen if I just used StdoutPipe and called Wait without waiting for everything to be read from it? It would first wait on the process and then close parentIOPipes.

So my question is: could documentation for StdoutPipe be relaxed and instead of saying:

It is thus incorrect to call Wait before all reads from the pipe have completed.

To something like (paraphrasing):

If you call Wait before all reads from the pipe have completed, you might miss some of the data wrote to the pipe and if the pipe got inherited by some other process, that process might be killed if it continues to write into a closed pipe?

Am I missing something here? Maybe I do not see how is your "unfortunate workaround" doing anything different with the close-on-Wait race?

@ianlancetaylor
Copy link
Member

It seems to me that the current documentation is reasonable for almost all users: it says what happens and makes a recommendation. You clearly understand in detail what is happening. If StdoutPipe works for you, go ahead and use it.

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2023

when the overall function returns, it closes the pipe without waiting for everything to be read from it. Isn't this exactly the same as what would happen if I just used StdoutPipe and called Wait without waiting for everything to be read from it?

The difference is that with your own pipe, you can continue to read from the pipe (and wait for that read to complete) even after Wait has returned. (In the linked CL, the defer closure waits until ctx.Done() is closed, which may be an arbitrarily long time after Wait.)

In contrast, with StdoutPipe the pipe is always closed (and the read canceled) as soon as Wait unblocks.

@mitar
Copy link
Contributor Author

mitar commented Jun 21, 2023

Could StdoutPipe be made so that if it is used, Wait only returns after everything from StdoutPipe has been read?

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2023

No. That is exactly the opposite of the current documented behavior, and could introduce deadlocks in existing programs that (accidentally or intentionally) rely on that behavior.

@mitar
Copy link
Contributor Author

mitar commented Jun 21, 2023

I am just thinking that this is a common use case. One wanting to wait for all data and for program to finish. Maybe then another Wait-like method should be introduced? Like PipeWait?

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2023

Again, you can fairly easily use os.Pipe and start your own copying goroutine. Just substituting os.Pipe for Cmd.StdoutPipe is really not that much extra code.

(Most of the extra complexity in the linked CL is because it handles arbitrary writers for stdout and stderr, not just pipes.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants