Does I/O safety forbid `dup`ing arbitrary file descriptors?
#114167
Comments
This is @rust-lang/libs-api I guess (certainly the part about potentially deprecating safe |
Even without This might suggest adding a rule to Further, this kind of exclusivity is undermined by the fact that explicit And, on some platforms, including macOS, it's not even possible to set This is all in addition to the fact that files can be opened multiple times, potentially even by different names, from the same program or different programs. There are so many ways to undermine this kind of exclusivity that it would be tricky and limiting to maintain, even without
From another angle, it also happens that this kind of exclusivity is much less useful for fds than it is for memory. In Rust we're all accustomed to thinking about exclusive ownership. However, when working with fds, this kind of exclusivity is almost never what matters for avoiding UB. Among other things, OSes implicitly perform thread synchronization under the covers, so there's never a data race on an fd. There are a few situations where this kind of exclusivity would matter. For example, if one does Consequently, this kind of exclusivity guarantee at the
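One place where duplicates are observable is state kept in the open file description: POSIX specifies that `dup`ed descriptors share one file offset. The minimal sketch below (the helper name `shared_offset_demo` and the temp-file path are assumptions for illustration) reads through two duplicated handles and shows that they interfere with each other, yet the outcome is a well-defined, if surprising, result rather than a data race.

```rust
use std::fs::{self, File};
use std::io::{self, Read};

/// Hypothetical helper: reads 3 bytes through each of two handles that share
/// one open file description and returns what each handle saw.
fn shared_offset_demo() -> io::Result<(String, String)> {
    // Assumption: the temp directory is writable; the name is unique per process.
    let path = std::env::temp_dir().join(format!("io-safety-offset-{}.txt", std::process::id()));
    fs::write(&path, b"abcdef")?;

    let mut first = File::open(&path)?;
    // `try_clone` duplicates the descriptor. Both handles now refer to the
    // same open file description, so they share a single file offset.
    let mut second = first.try_clone()?;

    let mut buf = [0u8; 3];
    first.read_exact(&mut buf)?; // reads "abc"; the shared offset is now 3
    let seen_by_first = String::from_utf8_lossy(&buf).into_owned();
    second.read_exact(&mut buf)?; // continues at offset 3 and reads "def"
    let seen_by_second = String::from_utf8_lossy(&buf).into_owned();

    fs::remove_file(&path)?;
    Ok((seen_by_first, seen_by_second))
}
```

Neither handle had exclusive use of the offset, but nothing here is UB: the kernel serializes the two reads.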
I agree. And yes, you can create |
I think exclusivity over the FD itself is important so nobody else can close it and |
|
Yes, I would have expected that to be unsound.
Fork is terribly unsafe already anyway. But the others are valid points... I guess you would want to still get "close-on-drop" behavior for them.
Okay, so "exclusively owning a non-duped FD" is a thing we accept one can express, but we don't provide a type for that. I think I can live with that. It needs documentation though. |
Confirming that @sunfishcode's description is what I'd always expected from I/O safety as well, since it was first proposed. The safety there is about not using closed FDs or random numbers as FDs, and about preventing double-close errors, not about preventing you from calling |
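What `OwnedFd` does guarantee under this reading is close-on-drop ownership without double closes. A minimal Unix-only sketch (the helper `close_one_keep_other` is hypothetical; it uses a `UnixStream` pair so the result can be checked) showing that a duplicate from `OwnedFd::try_clone` has its own, independent ownership:

```rust
use std::io::{self, Read, Write};
use std::os::fd::OwnedFd;
use std::os::unix::net::UnixStream;

/// Hypothetical helper: closes the original descriptor, keeps using the
/// duplicate, and returns what the peer received.
fn close_one_keep_other() -> io::Result<Vec<u8>> {
    let (a, mut peer) = UnixStream::pair()?;
    let original = OwnedFd::from(a);
    let duplicate = original.try_clone()?; // a second, independently owned fd

    // Dropping an `OwnedFd` closes exactly the descriptor it owns, once.
    drop(original);

    // The duplicate is still open: no use-after-close, no double close.
    let mut writer = UnixStream::from(duplicate);
    writer.write_all(b"still open")?;
    drop(writer); // last descriptor for this end closed, so the peer sees EOF

    let mut received = Vec::new();
    peer.read_to_end(&mut received)?;
    Ok(received)
}
```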
I renamed the issue to reflect that indeed I'd very much like us to say that yes this is possible. However, I see two problems with this. The first is a spec/doc problem: we cannot just say " The second problem is a practical one:
I don't see how that solves anything. How am I supposed to do that global reasoning? It's not possible for
The former is impractical, since it's not like the caller is in any better position than |
It's like building a safe API around something like I don't think the environment telling you |
Yes, that is a good analogy. The main difference is that for the FD case, we don't have to accept that this may lead to UB. We have the alternative of saying that no code may rely on its file descriptor remaining encapsulated. |
Agreed.
Yes it is, but we already have a problem here, whether we understand it in these terms or not. Even before we bring I/O safety into the picture, can we rigorously characterize the behavior of So I propose that we do accept that this may lead to UB. |
If we say that no code must rely on FD encapsulation, then we can at least rigorously characterize this case as "not UB". That's worth a lot -- it is all that the Rust type system ever guarantees. Put differently, in such a world we could offer a safe function that turns any integer into |
Well, that'd just be shifting the burden to any code that trusts the environment to not corrupt data that is temporarily moved from memory into IO. It doesn't even require mmap. Just writing a If we assume that any IO can get corrupted then all IO code must be written as if we were talking to an untrusted TCP stream. The necessary validation can come with significant overhead because it can require looking at every single byte (e.g. validating all UTF-8 sequences) or checking more complex invariants. |
Parsing in Rust usually happens with crates that guarantee non-UB even for malformed inputs. If you trust your inputs so far that you are okay with causing UB, then clearly you should be using But someone using a library such as |
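The validation cost mentioned above can be made concrete with UTF-8. In this sketch (the helper `round_trip` and its temp-file corruption step are hypothetical stand-ins for "another party with access to the file"), data round-trips through a file and is re-validated on the way back in, so corruption surfaces as an `Err` instead of UB:

```rust
use std::fs;
use std::io;
use std::str::Utf8Error;

/// Hypothetical helper: stores `text` in a temp file, optionally overwrites its
/// first byte (simulating corruption by someone else), reads it back, and
/// validates the bytes instead of trusting them.
fn round_trip(text: &str, corrupt: bool) -> io::Result<Result<String, Utf8Error>> {
    let path = std::env::temp_dir().join(format!(
        "io-safety-utf8-{}-{}.bin",
        std::process::id(),
        corrupt
    ));
    fs::write(&path, text.as_bytes())?;
    if corrupt {
        let mut bytes = fs::read(&path)?;
        if let Some(first) = bytes.first_mut() {
            *first = 0xFF; // 0xFF never appears in valid UTF-8
        }
        fs::write(&path, &bytes)?;
    }
    let bytes = fs::read(&path)?;
    fs::remove_file(&path)?;
    // Checked conversion: scans every byte, but corrupted input becomes an
    // `Err` rather than UB. `std::str::from_utf8_unchecked` skips the scan
    // and is `unsafe` precisely because it trusts the bytes.
    Ok(std::str::from_utf8(&bytes).map(|s| s.to_owned()))
}
```

The trade-off discussed in the thread is whether code may skip this scan for data it wrote itself.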
Here we do some sanity check but don't revalidate all the bytes: rust/compiler/rustc_serialize/src/serialize.rs Lines 122 to 128 in e845910
ser_raw seems even less defensive. I've only glanced over the docs and looked at the types that implement it. It supports The general pattern here is that a process is trusting another instance of itself. It may do a version check to guard against format changes but after that the data isn't treated as coming from an external entity that could be malicious. And then there's mmap. E.g. io_uring can communicate pointers through its shared memory map. If something can steal the uring file descriptor and establish another map then it can of course mess with the pointers you get back from the kernel. But normally you'd trust those values because you put them in the queue yourself in the first place. |
Fair, so we even do this in rustc. I wasn't aware. Thanks for pointing this out.
So I guess that's another way in which a program with a bad jobserver env var can go wrong, if What can I say, I am torn. I want to have my cake and eat it, too, but I cannot. ;) Deciding between "a bad jobserver env var can cause UB" and "code can rely on FD encapsulation for soundness" is not trivial. |
A jobserver seems to be a special case tho. An unfortunate one to be sure but if a choice has to be made then I think it's better to document OS specific weirdness than to make fds nearly impossible to reason about. |
Jobserver isn't that special. After all std claims FDs 0/1/2 by convention (instead of configuration). And systemd can pass additional descriptors in its socket activation protocol which is also configured through environment variables. Though there's a safer way to use the jobserver protocol, I don't know why we're not using that by default. |
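The socket-activation pattern mentioned above can be sketched as follows. This is a simplified take on the environment-variable half of systemd's protocol as documented for `sd_listen_fds` (inherited fds start at 3, `LISTEN_PID` must match the current process); it ignores `LISTEN_FDNAMES` and unsetting the variables, and both helper names are hypothetical. The point is where the trust lives: the parse is safe, but claiming the fds needs `unsafe` because it relies on the environment telling the truth.

```rust
use std::ops::Range;
use std::os::fd::{FromRawFd, OwnedFd, RawFd};

/// First inherited descriptor in systemd socket activation (SD_LISTEN_FDS_START).
const LISTEN_FDS_START: RawFd = 3;

/// Hypothetical helper: turns `LISTEN_PID` / `LISTEN_FDS` values into the fd
/// numbers the environment says were passed to this process. This is only a
/// parse; it cannot check that those fds are open or unowned.
fn listen_fd_range(
    listen_pid: Option<&str>,
    listen_fds: Option<&str>,
    my_pid: u32,
) -> Option<Range<RawFd>> {
    let pid: u32 = listen_pid?.parse().ok()?;
    if pid != my_pid {
        return None; // the variables were meant for another process
    }
    let count: RawFd = listen_fds?.parse().ok()?;
    if count <= 0 {
        return None;
    }
    Some(LISTEN_FDS_START..LISTEN_FDS_START.checked_add(count)?)
}

/// Hypothetical helper: claims the inherited descriptors as owned.
///
/// # Safety
/// Every fd in `range` must be open and not owned by anything else in this
/// process. That is a promise about the environment Rust cannot verify.
unsafe fn claim_listen_fds(range: Range<RawFd>) -> Vec<OwnedFd> {
    range
        .map(|fd| unsafe { OwnedFd::from_raw_fd(fd) })
        .collect()
}
```

In a real program the inputs would come from `std::env::var("LISTEN_PID").ok()` and `std::env::var("LISTEN_FDS").ok()` (passed via `.as_deref()`) together with `std::process::id()`.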
I still think that the right path forward that is invariant over cake fate is to declare these things outside of Rust's model: we attempt to handle file descriptor ownership (who can close an fd) but in general doing weird things to arbitrary fds is an OS problem and still a potential source of unsoundness, but not one Rust attempts to reason about. |
The libstd initialization code ensures that 0/1/2 exist and if not opens them as |
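Because std treats descriptor 1 as stdout by convention and exposes it through `AsFd`, safe code can already obtain an owned duplicate of a descriptor it never created. A minimal Unix-only sketch (the helper `dup_stdout` is hypothetical) using only stable std APIs:

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::fd::{AsFd, AsRawFd, RawFd};

/// Hypothetical helper: duplicates stdout from safe code and writes through
/// the copy. Returns (fd std uses for stdout, fd of the duplicate).
fn dup_stdout() -> io::Result<(RawFd, RawFd)> {
    let stdout = io::stdout();
    // std borrows descriptor 1 as stdout by convention.
    let borrowed = stdout.as_fd();
    // A safe call turns that borrow into an independently owned duplicate.
    let owned = borrowed.try_clone_to_owned()?;
    let fds = (borrowed.as_raw_fd(), owned.as_raw_fd());

    let mut copy = File::from(owned);
    copy.write_all(b"written through a dup of stdout\n")?;
    Ok(fds) // `copy` is dropped on return, closing only the duplicate
}
```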
I was just showing that inheriting and claiming resources from the environment is an existing pattern just like creating and maintaining exclusive access to resources is also a pattern and we'll have to deal with both of those uses. |
add more explicit I/O safety documentation Fixes rust-lang/unsafe-code-guidelines#434 Cc rust-lang#114167 Cc `@Manishearth` `@sunfishcode` `@joshtriplett`
#114780 has landed, so the docs now state fairly clearly that it's not okay to |
I/O safety is summarized as
I would argue then that if a function is written `foo(fd: OwnedFd)`, then it should be guaranteed that it is the only one with read/write access to the underlying resource managed by this FD. (Of course if it is a file, others might be able to open the file again, but not all FDs are for files that have a name in the file system.) And this guarantee almost holds, except for `OwnedFd::try_clone` and `BorrowedFd::try_clone_to_owned`. These will take a mere reference and return a fully owned `OwnedFd` (via `dup`), so our `foo` function has been foo'led and it is not actually exclusively reading/writing the underlying resource.
I am somewhat surprised to discover this now; these functions were not mentioned in the original RFC. They fundamentally change what it means to hold an `OwnedFd`, and I think not for the better. They are incompatible with the mental model I used when helping design and defend I/O safety.
So, what shall we do about this?
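To make the concern concrete, here is a minimal Unix-only sketch. Only the parameter type of `foo` comes from the text; its body and return type, and the helper `foo_is_not_exclusive`, are hypothetical. The caller makes duplicates from a mere `&OwnedFd` and a `BorrowedFd`, hands its `OwnedFd` to `foo`, and keeps writing to the same socket.

```rust
use std::io::{self, Read, Write};
use std::os::fd::{AsFd, AsRawFd, OwnedFd};
use std::os::unix::net::UnixStream;

/// Hypothetical `foo`: under the "exclusive ownership" reading, receiving this
/// `OwnedFd` would mean nobody else can use the socket any more.
fn foo(fd: OwnedFd) -> UnixStream {
    UnixStream::from(fd)
}

/// Hypothetical helper: returns everything the peer received.
fn foo_is_not_exclusive() -> io::Result<Vec<u8>> {
    let (a, mut peer) = UnixStream::pair()?;
    let owned = OwnedFd::from(a);

    // Both calls need only a reference, yet each returns a fully owned `OwnedFd`.
    let dup1 = owned.try_clone()?; // from `&OwnedFd`
    let dup2 = owned.as_fd().try_clone_to_owned()?; // from a `BorrowedFd`
    assert_ne!(owned.as_raw_fd(), dup1.as_raw_fd());

    let mut inside_foo = foo(owned);
    inside_foo.write_all(b"foo ")?;

    // The caller still reaches the same socket through its duplicates.
    UnixStream::from(dup1).write_all(b"dup1 ")?;
    UnixStream::from(dup2).write_all(b"dup2")?;
    drop(inside_foo); // last descriptor for this end closed, so the peer sees EOF

    let mut received = Vec::new();
    peer.read_to_end(&mut received)?;
    Ok(received)
}
```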
`OwnedFd` carefully. They are very weak (basically just "nobody will close this FD while you have it").
Also, we should clarify what this means for I/O safety overall. Would it be sound to do `dup(rand())` and return the result as an `OwnedFd`? I think the answer should be "absolutely not"; if I create an FD and don't share it with anyone, I should be guaranteed that no duplicates exist. This would at least make it possible for a user crate to define `NonDupOwnedFd`/`NonDupBorrowedFd` types that guarantee an absence of duplicates.
Cc @sunfishcode
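A user-crate `NonDupOwnedFd` along these lines might look like the sketch below. It is a hypothetical design, not from std: the key choice is that it never hands out a `BorrowedFd`, since `BorrowedFd::try_clone_to_owned` would let safe code `dup` it. Its guarantee only holds if the answer to the `dup(rand())` question is indeed "absolutely not".

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::os::fd::OwnedFd;
use std::os::unix::net::UnixStream;

/// Hypothetical type: an owned fd that this API never lets anyone borrow.
/// It deliberately implements neither `AsFd`, `AsRawFd` nor `Clone` and has
/// no `try_clone`, because a `BorrowedFd` alone is enough for safe code to
/// obtain a duplicate via `try_clone_to_owned`.
pub struct NonDupOwnedFd(File);

impl NonDupOwnedFd {
    /// Creates both ends of a fresh socket pair. The fds are created here and
    /// never shared, so no duplicates exist at construction time.
    pub fn socket_pair() -> io::Result<(Self, Self)> {
        let (a, b) = UnixStream::pair()?;
        Ok((
            Self(File::from(OwnedFd::from(a))),
            Self(File::from(OwnedFd::from(b))),
        ))
    }

    /// Writes through the private handle (plain `write(2)` works on sockets).
    pub fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        self.0.write_all(data)
    }

    /// Reads through the private handle.
    pub fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.0.read_exact(buf)
    }
}
```

The cost of this design is that the type cannot be passed to any API taking `impl AsFd`, and it still cannot rule out duplicates created by mechanisms outside its control, such as `fork`, which the comments raise.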