Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: manual instrumentation of KeepAlive is fraught #34810

Open
beoran opened this issue Oct 10, 2019 · 28 comments
Open

runtime: manual instrumentation of KeepAlive is fraught #34810

beoran opened this issue Oct 10, 2019 · 28 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@beoran
Copy link

beoran commented Oct 10, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Likely, yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/bjorn/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bjorn/src/go"
GOPROXY="http://claudette.applied-maths.local:13909"
GORACE=""
GOROOT="/home/bjorn/opt/go"
GOTMPDIR=""
GOTOOLDIR="/home/bjorn/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/bjorn/work/src/i41healthapp/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build701296847=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am currently writing a game library in Go that avoids cgo and calls the
OS directly, certainly on Linux thanks to it's stable kernel API. (See:
https://gitlab.com/beoran/galago https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go
)

I have an input Device like this:

// Device models an input device
type Device struct {
    FileName string 
    *os.File
    // Cached device information
    Info struct { 
        Events          []SupportedEvent
        Keys            []SupportedKey
        Axes            []AbsoluteAxis
        Rolls           []RelativeAxis
        Name            string
        ID              string
    }
}

// Keeps the device's file from being garbage collected.
func (d * Device) KeepAlive() {
    runtime.KeepAlive(d.File)
}

// Icotl performs an ioctl on the given device
func (d * Device) Ioctl(code uint32, pointer unsafe.Pointer) error {
    fmt.Printf("ioctl: %d %d %d\n", uintptr(d.Fd()), uintptr(code),
uintptr(pointer))
    _, _, errno := syscall.Syscall(
        syscall.SYS_IOCTL,
        uintptr(d.Fd()),
        uintptr(code),
        uintptr(pointer))
    if (errno != 0) {
        return errno
    }
    d.KeepAlive()
    return nil
}

Notice the KeepAlive? If I leave that out the program crashes, because the device 's io.File gets garbage collected. This took me quite some time to figure out and it is not obvious that that would happen, and that runitme.KeepAlive() is needed here. It would be great if I didn't have to manually insert runtime.KeepAlive calls when using os.File.Fd when using system calls. Or if there was at least vet/lint tooling to suggest when I would probably need it.

I posted this as a new issue to split it off from #34684, where it was a tangential issue.

@acln0
Copy link
Contributor

acln0 commented Oct 10, 2019

Consider using https://golang.org/pkg/os/#File.SyscallConn over the the raw file descriptor returned by Fd. The SyscallConn API is perhaps a little more verbose, but it deals with lifetime issues correctly, and wrappers like the Ioctl method in your snippet can hide it from callers entirely.

@beoran
Copy link
Author

beoran commented Oct 10, 2019

I took a look at SyscallConn and it looks like it will work, at the hopefully small overhead of having to use a closure.

But TIL that File.Fd() sets the file to blocking mode, which I explains why I was unable to do nonblocking IO on the devices. I feel that should have been in the documentation of File.Fd(). Furthermore, it would be a good idea to refer from the documentation of File.FD to that of File.SyscallConn().

@acln0
Copy link
Contributor

acln0 commented Oct 10, 2019

I feel that should have been in the documentation of File.Fd().

It is, sort of. See #22934 and 4153495.

@andybons andybons changed the title Garbage collector should keep os.File alive when using os.File.Fd runtime: garbage collector should keep os.File alive when using os.File.Fd Oct 10, 2019
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 10, 2019
@andybons andybons added this to the Unplanned milestone Oct 10, 2019
@andybons
Copy link
Member

@beoran is this still an issue for you or would you like to re-purpose this to a documentation fix?

@andybons andybons added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 10, 2019
@beoran
Copy link
Author

beoran commented Oct 10, 2019 via email

@andybons andybons removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 10, 2019
@andybons andybons changed the title runtime: garbage collector should keep os.File alive when using os.File.Fd os: document that File.Fd() sets the file to blocking mode Oct 10, 2019
@andybons
Copy link
Member

@ianlancetaylor in case there are objections.

@mdempsky
Copy link
Contributor

I think this issue should be kept open until we've discussed it further.

I'm not convinced the Go compiler / runtime necessarily need to do anything differently, but I do think there's opportunity for vet or lint or something should probably warn users when they need to use runtime.KeepAlive.

For example, I just happened to be looking at github.com/google/gopacket, and I noticed that its pcap library suffers from this same issue: they call file.Fd() but don't have any runtime.KeepAlive(file) call afterwards to ensure that file isn't GC'd.

@andybons andybons added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 10, 2019
@mdempsky mdempsky changed the title os: document that File.Fd() sets the file to blocking mode runtime: manual instrumentation of KeepAlive is fraught Oct 10, 2019
@mdempsky mdempsky added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 10, 2019
@beoran
Copy link
Author

beoran commented Oct 10, 2019

https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go now uses File.SyscallConn , and as a bonus, reading Linux input events has non-blocking as well. Yay ^_^. But, if @acln0 hadn't kindly informed me of the existence this function, then I would still be struggling.

So yes, both better documentation and a go vet or go lint check would be welcome to help others avoid this issue.

@andybons
Copy link
Member

Sounds good to me. Thanks for chiming in, @mdempsky.

@beoran
Copy link
Author

beoran commented Oct 11, 2019

A very related problem I am having now, with my library is that the SyscallConn can't be used reliably in conjunction with unix.Poll(fds []PollFd, timeout int) (n int, err error). I would either have to make the closure passed to Control() call Control() recursively over all Files I want to poll on , or I have to cheat and store the FD somewhere and hope that it will stay valid for the file. I could of course use File.Fd() again, but that makes the file blocking, which is what I want to avoid in the first place. What should I do to poll on several files without using File.Fd()?

@eliasnaur
Copy link
Contributor

eliasnaur commented Oct 11, 2019 via email

@beoran
Copy link
Author

beoran commented Oct 11, 2019

That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true) has no effect on this, or am I mistaken?

@eliasnaur
Copy link
Contributor

eliasnaur commented Oct 11, 2019 via email

@beoran
Copy link
Author

beoran commented Oct 11, 2019

Well, yes, but that was exactly the problem that SyscallConn was supposed to solve. With it you can get access to the FD while the os.File() still works. Since this is platform specific code anyway, I might as well then open the file with unix.Openin stead, and just handle everything myself, although I fear this might interfere with the way goroutines are supposed to work.

@ianlancetaylor
Copy link
Member

Yes, if you want to manage I/O yourself, I recommend that you use unix.Open. That won't interfere with goroutines. It just means that I/O on those descriptors won't use the runtime poller, but it seems that you want to call unix.Poll yourself anyhow. There is no good way to both use your own poller and the runtime poller, and there doesn't seem to be much point to doing so.

@beoran
Copy link
Author

beoran commented Oct 11, 2019

Well, i'm calling unix.Poll myself because I don't know how I could use the runtime poller in my situation. Would it even be possible, and how? If not I'll stick to managing everything myself.

The top issue remains though, this is all rather fraught and not well documented. Certainly x/sys/unix could do with more documentation and examples.

@ianlancetaylor
Copy link
Member

@beoran I don't really know what you are doing so I don't know why you can't use the runtime poller. In general it's fine to call ioctl on os.File or net.Conn values using SyscallConn and to also use the ordinary Read and Write methods which will use the runtime poller.

@mdempsky
Copy link
Contributor

It seems like this discussion is drifting a bit from the initial report, which was that os.File.Fd is dangerous because it (in general) requires using runtime.KeepAlive, and we don't have any tooling to help check for that.

Some questions:

  1. Is the transformation fn(file.Fd()); runtime.KeepAlive(file) => file.SyscallConn().Control(fn) (or maybe Read or Write instead of Control) safe?

  2. If so, should we just deprecate os.File.Fd and encourage users to switch to os.File.SyscallConn?

  3. If not (or maybe regardless), should we implement a lint/vet warning that file.Fd() should always be used in conjunction with runtime.KeepAlive(file)?

@ianlancetaylor
Copy link
Member

  1. Yes, that is safe provided that fn does not preserve its fd argument in any way.
  2. Well, no, because some people really do need to hang on to the file descriptor.
  3. I'm not sure. Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.

@mdempsky
Copy link
Contributor

  1. Well, no, because some people really do need to hang on to the file descriptor.

Do you have any examples of this?

For this to be safe, the user has to also keep the os.File alive. And if they already have to do that, it seems easier to just hang onto the *os.File or *syscall.RawConn, and convert it to uintptr via RawConn.Control as needed.

It's also an option to dup the FD, so they can manage its lifetime itself.

Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.

We have room to make a vet/lint check arbitrarily sophisticated to try to avoid false positives; but in general, I'm pretty skeptical of the ability to use it safely.

E.g., I just started a random audit of os.File.Fd call sites, and I already found that package os itself doesn't use it safely:

go/src/os/exec_posix.go

Lines 46 to 49 in 54abb5f

sysattr.Files = make([]uintptr, 0, len(attr.Files))
for _, f := range attr.Files {
sysattr.Files = append(sysattr.Files, f.Fd())
}

(Admittedly this case would be very awkward to rewrite using SyscallConn, but not impossible.)

After looking at about 100 call sites, cmd/compile is the only code that I found that uses os.File.Fd and then explicitly keeps the os.File alive. (And I'm not 100% confident that if I wrote that code today that I'd remember to have done that.)

I'd estimate I saw 10% of cases that were bad; 20% of cases that were okay (i.e., the os.File was obviously still alive after the Fd call); 20% of cases that used os.Std{in,out,err} (so also okay); but a huge chunk of remaining cases where the *os.File is dead after the Fd call (at least locally within the function).

It would be interesting to instrument os.File.Fd() with a call to runtime.GC() immediately before returning. I wouldn't be surprised if some programs start failing.

@ianlancetaylor
Copy link
Member

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

The use in package os that you mention is safe because the file descriptors only have to live as long as the fork, and the fork code will be using the SysProcAttr that points to the os.File values. Not that I would be opposed to adding some KeepAlive calls.

I agree that some sort of instrumentation would be interesting. It could perhaps be done via GODEBUG.

@mdempsky
Copy link
Contributor

mdempsky commented Oct 12, 2019

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

Can't you do that inside the RawConn.Control?

the fork code will be using the SysProcAttr that points to the os.File values.

I assume you mean ProcAttr, and ProcAttr takes the files as uintptr, not os.File: https://golang.org/pkg/syscall/#ProcAttr

Am I missing something?

Edit: I don't think I'm missing anything: #34858

@beoran
Copy link
Author

beoran commented Oct 12, 2019 via email

@mdempsky
Copy link
Contributor

In situations where you need a set of fds, such as for ProcAttr, RawConn.Control is impractical because for N files you would have to call it N times it recursively from the closure.

I'll admit it's a bit tedious, but I don't think it's impractical. The recursion can be easily abstracted away behind a helper like so: (caveat: untested)

func MultiControl(conns []syscall.RawConn, f func(fds []uintptr)) error {
	if len(conns) == 0 {
		f(nil)
		return nil
	}

	var err error
	fds := make([]uintptr, len(conns))
	i := 0

	var do func(fd uintptr)
	do = func(fd uintptr) {
		fds[i] = fd
		i++
		if i == len(conns) {
			f(fds)
			return
		}
		err0 := conns[i].Control(do)
		if err == nil {
			err = err0
		}
	}
	err0 := conns[i].Control(do)
	if err == nil {
		err = err0
	}
	return err
}

However, while writing this, I did notice a somewhat thornier issue: Plan 9 doesn't support SyscallConn. It just returns EPLAN9.

@ericlagergren
Copy link
Contributor

To add another data point: at work we've a (rather large) package that wraps a certain DLL. Pretty much every constructed object needs to be closed in order to not leak memory, etc. As a safety feature, we've added finalizers. Just like os.File.

Most DLL functions require a handle, which is just a number. This leaves us open to the GC collecting the object out from under us if we forget to use runtime.KeepAlive. Just like os.(*File).Fd.

Even though we try to do due diligence, we've run into the issue a couple of times (thankfully just in testing). It's fairly easy to remember to use runtime.KeepAlive for os.File, because it's more or less the only part of the stdlib that you use with any frequency that has this problem.

I don't know what the valid criteria are for sniffing this out, but whenever I've run into this problem there's been two similarities:

(a) passing a uintptr (or a type whose underlying type is uintptr) to another function, and
(b) accessing that uintptr via some method call

@ianlancetaylor
Copy link
Member

@mdempsky You're right, I was thinking of the ExtraFiles field in os/exec.Cmd. That is used to set the Files field in syscall.ProcAttr. The use of Files in os.ProcAttr does seem problematic. It seems to need a runtime.KeepAlive(attr).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/201198 mentions this issue: os: keep attr.Files alive when calling StartProcess

gopherbot pushed a commit that referenced this issue Oct 15, 2019
Updates #34810
Fixes #34858

Change-Id: Ie934861e51eeafe8a7fd6653c4223a5f5d45efe8
Reviewed-on: https://go-review.googlesource.com/c/go/+/201198
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@beoran
Copy link
Author

beoran commented Dec 17, 2019

Well, while I am happy to see that some bugs in the Go standard library were fixed because of my observation, this issue seems a bit stalled. How could we fix the original issue? RawConn.Control is not supported on all platforms and hard to use for a set of FD's.

petemoore added a commit to taskcluster/taskcluster that referenced this issue Feb 3, 2022
…ntptrescapes

The unofficial documentation for this pragma can be found in the source code:
  * https://github.com/golang/go/blob/go1.17.6/src/cmd/compile/internal/noder/lex.go#L71-L81

Instead of relying on this pragma, generic-worker now calls
syscall.Syscall<x> directly. This change has been made since I suspect
that nested go:uintptrescapes functions may not work as expected, but
furthmore, since this pragma is not an official feature, it presumably
may change over time, so we probably should not assume it will always be
supported.

There have also been several issues raised against functions and methods
that rely on its behaviour, so it is not clear whether it is entirely
safe to use:

  * golang/go#16035
  * golang/go#23045
  * golang/go#34474
  * golang/go#34642
  * golang/go#34684
  * golang/go#34810
  * golang/go#42680

Note, in future an option might be to generate the function bodies using mkwinsyscall:
  * https://github.com/golang/go/blob/go1.17.6/src/internal/syscall/windows/mksyscall.go#L9
  * https://github.com/golang/go/blob/go1.17.6/src/syscall/mksyscall_windows.go#L47
  * https://pkg.go.dev/golang.org/x/sys/windows/mkwinsyscall
  * https://github.com/golang/sys/blob/master/windows/mkwinsyscall/mkwinsyscall.go

One advantage of this approach is that we can probably log the syscalls
again using:

  * https://github.com/golang/sys/blob/50617c2ba19781ae46f34bb505064996b8fa32e8/windows/mkwinsyscall/mkwinsyscall.go#L43-L44

Alternatively if tracing doesn't do what we expect, we could write our own generator.
DmitriyMV added a commit to DmitriyMV/go-blockdevice that referenced this issue Jun 10, 2024
Without them, when we pass the result of Fd() into unix.Syscall, Go runtime is free to call finalizer set in os.newFile.
More info [here](golang/go#34810).

The proper fix is to either:

1. Use unix.Open/unix.Close as descriptors (ints) everywhere in the Device code. This should hide fd's from Go runtime.
2. Use os.File.SyscallConn().Control which guarantees that descriptor survives. This will also do not put fd's into the blocking mode.

Otherwise, even with os.File.Close it's not guaranteed that runtime.SetFinalizer will not come for `os.File.file`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants