runtime: manual instrumentation of KeepAlive is fraught #34810
Comments
Consider using https://golang.org/pkg/os/#File.SyscallConn instead of the raw file descriptor returned by Fd. The SyscallConn API is perhaps a little more verbose, but it deals with lifetime issues correctly, and wrappers like the Ioctl method in your snippet can hide it from callers entirely. |
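For illustration, a minimal sketch of the SyscallConn pattern suggested above. It assumes a Unix-like system and uses fstat(2) from package syscall as a stand-in for the ioctl in the original snippet; fileSize and demoFileSize are hypothetical names, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fileSize (hypothetical helper) returns the size that fstat(2) reports for f.
// The descriptor is only touched inside the Control callback, so the caller
// never handles a raw fd and needs no runtime.KeepAlive.
func fileSize(f *os.File) (int64, error) {
	rc, err := f.SyscallConn()
	if err != nil {
		return 0, err
	}
	var st syscall.Stat_t
	var statErr error
	// Control invokes the callback with the descriptor and keeps it
	// valid until the callback returns.
	if err := rc.Control(func(fd uintptr) {
		statErr = syscall.Fstat(int(fd), &st)
	}); err != nil {
		return 0, err
	}
	if statErr != nil {
		return 0, statErr
	}
	return int64(st.Size), nil
}

// demoFileSize writes five bytes to a temporary file and asks fileSize for its size.
func demoFileSize() (int64, error) {
	f, err := os.CreateTemp("", "syscallconn-demo")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("hello"); err != nil {
		return 0, err
	}
	return fileSize(f)
}

func main() {
	n, err := demoFileSize()
	fmt.Println(n, err)
}
```

The closure is the only extra ceremony; a wrapper method like this keeps it out of sight for callers.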
I took a look at SyscallConn and it looks like it will work, at the hopefully small overhead of having to use a closure. But TIL that File.Fd() sets the file to blocking mode, which explains why I was unable to do nonblocking I/O on the devices. I feel that should have been in the documentation of File.Fd(). Furthermore, it would be a good idea to refer from the documentation of File.Fd() to that of File.SyscallConn(). |
@beoran is this still an issue for you or would you like to re-purpose this to a documentation fix? |
This issue may be repurposed as a documentation issue. Also, the RawConn
documentation really could use an example on how to use it.
On Thu, 10 Oct 2019 at 17:30, Andrew Bonventre <notifications@github.com> wrote:
@beoran <https://github.com/beoran> is this still an issue for you or
would you like to re-purpose this to a documentation fix?
|
@ianlancetaylor in case there are objections. |
I think this issue should be kept open until we've discussed it further. I'm not convinced the Go compiler / runtime necessarily need to do anything differently, but I do think there's an opportunity for vet or lint or something similar to warn users when they need to use runtime.KeepAlive. For example, I just happened to be looking at github.com/google/gopacket, and I noticed that its pcap library suffers from this same issue: they call |
https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go now uses File.SyscallConn, and as a bonus, reading Linux input events is non-blocking as well. Yay ^_^. But if @acln0 hadn't kindly informed me of the existence of this function, I would still be struggling. So yes, both better documentation and a go vet or go lint check would be welcome to help others avoid this issue. |
Sounds good to me. Thanks for chiming in, @mdempsky. |
A closely related problem I am having now with my library is that `SyscallConn` can't be used reliably in conjunction with `unix.Poll(fds []PollFd, timeout int) (n int, err error)`. I would either have to make the closure passed to Control() call Control() recursively over all the files I want to poll on, or I have to cheat and store the FD somewhere and hope that it will stay valid for the file. I could of course use File.Fd() again, but that makes the file blocking, which is what I want to avoid in the first place. What should I do to poll on several files without using File.Fd()?
Is there a reason you can't mark the file descriptor non-blocking after
acquiring it from File.Fd()?
|
That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true)` has no effect on this, or am I mistaken? |
On Fri Oct 11, 2019 at 1:59 AM Beoran wrote:
That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true)` has no effect on this, or am I mistaken?
I consider Fd() to mean that I take over full responsibility of the descriptor,
which also means I can't call os.File methods and expect them to work the same,
or at all.
-- elias
|
Well, yes, but that was exactly the problem that SyscallConn was supposed to solve. With it you can get access to the FD while the os.File still works. Since this is platform-specific code anyway, I might as well then open the file with |
Yes, if you want to manage I/O yourself, I recommend that you use |
Well, I'm calling unix.Poll myself because I don't know how I could use the runtime poller in my situation. Would it even be possible, and how? If not, I'll stick to managing everything myself. The top issue remains though: this is all rather fraught and not well documented. Certainly x/sys/unix could do with more documentation and examples. |
@beoran I don't really know what you are doing so I don't know why you can't use the runtime poller. In general it's fine to call |
It seems like this discussion is drifting a bit from the initial report, which was that Some questions:
|
|
Do you have any examples of this? For this to be safe, the user has to also keep the os.File alive. And if they already have to do that, it seems easier to just hang onto the *os.File or *syscall.RawConn, and convert it to uintptr via It's also an option to
We have room to make a vet/lint check arbitrarily sophisticated to try to avoid false positives; but in general, I'm pretty skeptical of the ability to use it safely. E.g., I just started a random audit of os.File.Fd call sites, and I already found that package os itself doesn't use it safely: Lines 46 to 49 in 54abb5f
(Admittedly this case would be very awkward to rewrite using After looking at about 100 call sites, cmd/compile is the only code that I found that uses os.File.Fd and then explicitly keeps the os.File alive. (And I'm not 100% confident that if I wrote that code today that I'd remember to have done that.) I'd estimate I saw 10% of cases that were bad; 20% of cases that were okay (i.e., the os.File was obviously still alive after the Fd call); 20% of cases that used os.Std{in,out,err} (so also okay); but a huge chunk of remaining cases where the *os.File is dead after the Fd call (at least locally within the function). It would be interesting to instrument os.File.Fd() with a call to runtime.GC() immediately before returning. I wouldn't be surprised if some programs start failing. |
An example where you need the file descriptor is when you are passing a set of file descriptors to another process via The use in package os that you mention is safe because the file descriptors only have to live as long as the fork, and the fork code will be using the I agree that some sort of instrumentation would be interesting. It could perhaps be done via |
Can't you do that inside the RawConn.Control?
I assume you mean ProcAttr, and ProcAttr takes the files as uintptr, not os.File: https://golang.org/pkg/syscall/#ProcAttr Am I missing something? Edit: I don't think I'm missing anything: #34858 |
As I discovered during this discussion, RawConn.Control is only useful for
operations on a single file. In situations where you need a set of fds,
such as for ProcAttr, RawConn.Control is impractical because for N files
you would have to call it N times, recursively, from the closure. I would
say we need some functionality that works like RawConn, but then for
multiple files. RawConnSet, maybe?
|
I'll admit it's a bit tedious, but I don't think it's impractical. The recursion can be easily abstracted away behind a helper like so: (caveat: untested)
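A minimal sketch of such a helper, for illustration only; it is not the snippet originally posted here. The names withFds and controlAll are hypothetical, and the demo uses os.Pipe merely to obtain two files.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// withFds (hypothetical helper) calls fn with the descriptors of all files.
// Each descriptor is pinned by its own RawConn.Control call, and the calls
// are nested so that every descriptor stays valid while fn runs.
func withFds(files []*os.File, fn func(fds []uintptr) error) error {
	conns := make([]syscall.RawConn, len(files))
	for i, f := range files {
		rc, err := f.SyscallConn()
		if err != nil {
			return err
		}
		conns[i] = rc
	}
	return controlAll(conns, make([]uintptr, 0, len(files)), fn)
}

// controlAll enters Control on the first connection, then recurses on the
// rest; fn runs at the innermost level, inside every Control callback.
func controlAll(conns []syscall.RawConn, fds []uintptr, fn func([]uintptr) error) error {
	if len(conns) == 0 {
		return fn(fds)
	}
	var inner error
	err := conns[0].Control(func(fd uintptr) {
		inner = controlAll(conns[1:], append(fds, fd), fn)
	})
	if err != nil {
		return err
	}
	return inner
}

// demoWithFds reports how many descriptors the callback received.
func demoWithFds() (int, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, err
	}
	defer r.Close()
	defer w.Close()
	n := 0
	err = withFds([]*os.File{r, w}, func(fds []uintptr) error {
		n = len(fds) // a real caller might build a poll set from fds here
		return nil
	})
	return n, err
}

func main() {
	n, err := demoWithFds()
	fmt.Println(n, err)
}
```

The descriptors passed to fn should not be retained after fn returns, since the Control calls no longer pin them at that point.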
However, while writing this, I did notice a somewhat thornier issue: Plan 9 doesn't support SyscallConn. It just returns EPLAN9. |
To add another data point: at work we've a (rather large) package that wraps a certain DLL. Pretty much every constructed object needs to be closed in order to not leak memory, etc. As a safety feature, we've added finalizers. Just like Most DLL functions require a handle, which is just a number. This leaves us open to the GC collecting the object out from under us if we forget to use Even though we try to do due diligence, we've run into the issue a couple of times (thankfully just in testing). It's fairly easy to remember to use I don't know what the valid criteria are for sniffing this out, but whenever I've run into this problem there's been two similarities: (a) passing a |
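A minimal sketch of the finalizer-plus-handle situation described above. Everything here is hypothetical: libOpen, libUse and libClose stand in for the DLL, and Object for the wrapper type. It only illustrates where runtime.KeepAlive goes, under the assumption that the library functions take a bare numeric handle.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// A fake "library" that hands out numeric handles (stand-in for a DLL).
var (
	mu          sync.Mutex
	nextHandle  uintptr
	openHandles = map[uintptr]bool{}
)

func libOpen() uintptr {
	mu.Lock()
	defer mu.Unlock()
	nextHandle++
	openHandles[nextHandle] = true
	return nextHandle
}

func libClose(h uintptr) {
	mu.Lock()
	defer mu.Unlock()
	delete(openHandles, h)
}

func libUse(h uintptr) error {
	mu.Lock()
	defer mu.Unlock()
	if !openHandles[h] {
		return errors.New("use of closed handle")
	}
	return nil
}

// Object wraps a handle and closes it from a finalizer as a safety net.
type Object struct{ handle uintptr }

func NewObject() *Object {
	o := &Object{handle: libOpen()}
	runtime.SetFinalizer(o, (*Object).Close)
	return o
}

// Close releases the handle and clears the finalizer.
func (o *Object) Close() {
	libClose(o.handle)
	runtime.SetFinalizer(o, nil)
}

// Use passes only the number to the library. Without the KeepAlive, o may
// be considered dead once o.handle has been read, so the finalizer could
// close the handle while libUse is still running.
func (o *Object) Use() error {
	err := libUse(o.handle)
	runtime.KeepAlive(o)
	return err
}

func main() {
	o := NewObject()
	fmt.Println(o.Use())
	o.Close()
	fmt.Println(o.Use())
}
```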
@mdempsky You're right, I was thinking of the |
Change https://golang.org/cl/201198 mentions this issue: |
Updates #34810 Fixes #34858 Change-Id: Ie934861e51eeafe8a7fd6653c4223a5f5d45efe8 Reviewed-on: https://go-review.googlesource.com/c/go/+/201198 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Well, while I am happy to see that some bugs in the Go standard library were fixed because of my observation, this issue seems a bit stalled. How could we fix the original issue? RawConn.Control is not supported on all platforms and hard to use for a set of FD's. |
…ntptrescapes
The unofficial documentation for this pragma can be found in the source code:
* https://github.com/golang/go/blob/go1.17.6/src/cmd/compile/internal/noder/lex.go#L71-L81
Instead of relying on this pragma, generic-worker now calls syscall.Syscall<x> directly. This change has been made since I suspect that nested go:uintptrescapes functions may not work as expected. Furthermore, since this pragma is not an official feature, it may change over time, so we probably should not assume it will always be supported.
There have also been several issues raised against functions and methods that rely on its behaviour, so it is not clear whether it is entirely safe to use:
* golang/go#16035
* golang/go#23045
* golang/go#34474
* golang/go#34642
* golang/go#34684
* golang/go#34810
* golang/go#42680
Note, in future an option might be to generate the function bodies using mkwinsyscall:
* https://github.com/golang/go/blob/go1.17.6/src/internal/syscall/windows/mksyscall.go#L9
* https://github.com/golang/go/blob/go1.17.6/src/syscall/mksyscall_windows.go#L47
* https://pkg.go.dev/golang.org/x/sys/windows/mkwinsyscall
* https://github.com/golang/sys/blob/master/windows/mkwinsyscall/mkwinsyscall.go
One advantage of this approach is that we can probably log the syscalls again using:
* https://github.com/golang/sys/blob/50617c2ba19781ae46f34bb505064996b8fa32e8/windows/mkwinsyscall/mkwinsyscall.go#L43-L44
Alternatively, if tracing doesn't do what we expect, we could write our own generator.
Without them, when we pass the result of Fd() into unix.Syscall, the Go runtime is free to run the finalizer set in os.newFile. More info [here](golang/go#34810).
The proper fix is to either:
1. Use unix.Open/unix.Close with plain descriptors (ints) everywhere in the Device code. This should hide the fds from the Go runtime.
2. Use os.File.SyscallConn().Control, which guarantees that the descriptor survives. This also does not put the fds into blocking mode.
Otherwise, even with os.File.Close it's not guaranteed that the finalizer set with runtime.SetFinalizer will not come for `os.File.file`.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Likely, yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I am currently writing a game library in Go that avoids cgo and calls the
OS directly, certainly on Linux thanks to its stable kernel API. (See:
https://gitlab.com/beoran/galago and https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go)
I have an input Device like this:
Notice the KeepAlive? If I leave that out, the program crashes, because the device's os.File gets garbage collected. This took me quite some time to figure out; it is not obvious that this would happen, or that runtime.KeepAlive() is needed here. It would be great if I didn't have to manually insert runtime.KeepAlive calls when passing the result of os.File.Fd to system calls. Or if there were at least vet/lint tooling to suggest when I would probably need it.
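The pattern described above can be sketched roughly as follows. This is a simplified stand-in, not the library's actual code: it assumes a Unix-like system, uses fstat(2) in place of the ioctl, and the names Device, Size and demoDevice are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// Device (hypothetical) wraps the os.File of an opened device node.
type Device struct {
	File *os.File
}

// Size passes the raw descriptor to a system call. Once Fd has returned,
// nothing else references d.File, so without the KeepAlive the garbage
// collector may finalize (and close) the file while the call is in flight.
func (d *Device) Size() (int64, error) {
	fd := d.File.Fd() // per the thread, Fd also switches the file to blocking mode
	var st syscall.Stat_t
	err := syscall.Fstat(int(fd), &st)
	runtime.KeepAlive(d.File) // keep the file reachable until the call has returned
	if err != nil {
		return 0, err
	}
	return int64(st.Size), nil
}

// demoDevice uses a temporary file in place of a real device node.
func demoDevice() (int64, error) {
	f, err := os.CreateTemp("", "keepalive-demo")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.WriteString("abc"); err != nil {
		return 0, err
	}
	d := &Device{File: f}
	return d.Size()
}

func main() {
	n, err := demoDevice()
	fmt.Println(n, err)
}
```

In this demo the deferred f.Close already keeps the file alive, so the KeepAlive is redundant here; it matters when the *os.File has no later use in the caller.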
I posted this as a new issue to split it off from #34684, where it was a tangential issue.