Determine and document the distinction between I/O safety and regular safety on unsafe stdlib APIs #434
The API that triggered this question is the following (constructing a file from an fd from an env var; reasonable to do for IPC cases, but a violation of the API requirements, currently). |
Violating I/O safety can trigger UB. For example if you have some code that has a userfaultfd open and then some other code violates I/O safety to close the userfaultfd fd (eg by creating a |
I think "I/O safety" is a red herring here. It has nothing to do with the reasons that |
@bjorn3 Okay, but isn't that analogous to the |
Not quite, opening |
Right but ... I can do that by opening the memory-mapped file directly with safe File APIs, too. That's kinda what I'm getting at: we do not seem to have a principled approach to this, it seems strange to single this out as potentially creating undefined behavior. The annoying thing here is not just that this method has preconditions, it has preconditions that cannot be checked. |
You can hide your memory-mapped files, unlink them (on unix) or make the directory non-traversible if you're paranoid. But again, it's basically reaching out to a system that has insufficient sandboxing. The OS offers you footguns, we can't do anything about that without shutting down all IO. Inside Rust we do care. |
You kind of can if you're |
Yeah, I was kind of opposed to codifying I/O safety as a type of safety for these reasons. It is subtle, as has been discussed here, because there is real UB that can occur if I/O safety is violated. I agree that it's not great that we have APIs with preconditions that cannot be checked (in practice anyway -- running early in main is not an option most of the time). |
And then you need to worry about other crates doing stuff; which you can control by not using other crates but it's still an action-at-a-distance thing. (And yeah, even if you whittle this down to "you only call trusted APIs before doing this", your dependencies are free to link in life-before-main C code)
I think now that this can of worms has been opened we need to either fully specify it or decide it is an additional library invariant that callers are not required to follow; the status quo does not make much sense. |
Well sure, but C is never safe. We can't do anything about that. C programs would also break if other threads start mucking around with random file descriptors they don't semantically own. |
Yeah, I think I lean towards the "decide it is an additional library invariant that callers are not required to follow", but with a long list of caveats, which definitely do need to be worked through. I think it's worth CCing @sunfishcode, who was the author of the RFC, to see if they have a more nuanced take here, I suspect that they will. |
Note that to the extent that "I/O safety" is documented, it's documented here: https://doc.rust-lang.org/std/os/unix/io/index.html One problem with |
That's still a library invariant though, not UB |
We could have put the The only difference is that there are some Fds which are harmless to use with some syscalls. But I guess you could do the same with pointers, like calling madvise on them or something. |
Here's how I understand the situation: I/O safety is not about the type of the

I/O safety was added as part of a retroactive explanation for why /proc/self/mem is equivalent to a computer having a USB robotic arm with a soldering iron that can reach the computer's own motherboard. It can Do Things. And one can reasonably question the wisdom of such a device. But, it does not affect memory safety or I/O safety.

The safety of double mapping is addressed by

I/O safety hazards can almost always be reduced to memory safety hazards. For example, a double-

Now, looking at the code in question here, it's using

Unfortunately, I don't yet know what to recommend for the code in question. Ideally, there should be more ergonomic and safer ways to pass file descriptors between processes, though that's not a simple fix. |
The |
I mean, I'd rather not focus on the code in question too much. There are a bunch of options there, including just using
Yeah, but I guess my position is that all of that is outside of Rust's model, much like the "/proc/self/mem is when your computer has a robotic arm with a soldering iron". It's the same thing with linker safety. |
/proc/self/fd/<fd> does avoid the
👍
With /proc/self/mem and linker safety, the boundaries where they go outside Rust's model are clearer. Link-time intrigue comes from special linker flags. But with fd numbers coming in from the outside, if I do something like:

fn foo() {
    // SAFETY: We're supposing we don't need to propagate this `unsafe` to our callers.
    let file = unsafe { File::from_raw_fd(var("THE_FD").unwrap().parse().unwrap()) };
}

and I happen to call |
Yes, I understand, I was illustrating how the same operation is possible in safe rust anyway without jumping through hoops.
Yes, and you have the same problem with taking in a filepath argument and someone passing in a special file. My core point is that there is a line being drawn here between "things that are not rust's problem" and "things that are rust's problem" and I think it is arbitrary and I'm not sure it holds up to scrutiny. My request to the unsafe-code-guidelines group, in part, is to figure out how to make this less arbitrary, and more importantly, useful.
I kinda feel like "dealing with weird OS stuff" falls in the same bucket as this list. Or, at least, it does not clearly not fall in that bucket. It's very much similar to "linking to non-Rust crates" in effect in my mind.

This thread started because the standard library is documented as having an extra safety invariant it chooses to uphold, one that is called out as not being a part of the Rust language's memory safety model. If the standard library chooses to do this, it should document the difference thoroughly in safety invariants. However, I am being repeatedly informed that no, this is a part of the Rust language's memory safety model. Ok, fine, in that case we need to settle on and document that.

Like yes, I very clearly see the sources of unsafety. I'm just not clear on the line being drawn, and that is a problem for people who write or review unsafe code, and also probably needs to be balanced against valid usecases in the IPC/etc world to ensure they're not left with no correct path forward. I'm asking the group of people in charge of formally specifying this stuff to formally specify this stuff. (Yes, I understand that they can't do this right now and don't mind being patient, but there's a reason I filed the issue on this particular repo)

Also, I'll point out that this is the current safety comment on that API:
Here is code that fully satisfies that documentation, and exhibits exactly the memory safety problems that people have been mostly focusing on in this thread:

let fd = rand(); // or get it from `env()`, or from `args()`
unsafe {
    let status = fcntl(fd, F_GETFD); // just gets fd status, always safe
    if status == -1 || errno.is_error() { // invalid fd
        return;
    }
}
let file = unsafe { File::from_raw_fd(fd) };

As far as I can tell, (this is much like how segmentation faults are the safest thing that can happen to your program; it's better than not getting the segfault and instead having actual UAF/"mis"compiles/etc) |
The reason why it is important to differentiate between a library invariant and language UB in this case is because only the latter is our (t-opsem's) jurisdiction. Having determined that this is a library UB issue, the onus for documenting this goes to t-libs-api, if I understand correctly, and is off-topic for this repo (or at least, off topic for t-opsem, I think we are still discovering what we want this repo to be about).
I assume your complaint is that the documentation says nothing about there not being any other extant |
I agree. You've reported an interesting problem which we hadn't considered before. It's not yet clear what the answer should be. My understanding of all of the ideas suggested so far is that they either don't fully solve the problem, or they introduce new problems. We have more work to do.
The same problem doesn't occur with filenames. With fds, you could get memory corruption from code patterns elsewhere in the program. With filenames, you could have a bug, but it won't cause memory corruption, unless some other boundary gets crossed.
That example would be the I/O safety analog of doing this:

let addr = rand(); // or whatevs
unsafe {
    let status = madvise(addr as _, 1024, MADV_NORMAL); // just gets addr status, "always safe"
    if status == -1 || errno.is_error() { // invalid addr
        return;
    }
}
let buf = unsafe { slice::from_raw_parts_mut(addr as *mut u8, 1024) };

This may avoid a segfault, but it still invokes UB. In the same way, under I/O safety, using
Creating a |
Given that you can always pass in
Yes, I understand. I explicitly said "exhibits exactly the memory safety problems that people have been mostly focusing on in this thread", clearly I know there is memory unsafety there. What I'm saying is that the documentation is not correct. "valid" fd is a specific preexisting concept, and here we are effectively inventing a new kind of validity around ownership; which we should have proper terminology for (some introduced by your RFC) and use in the documentation. The current documentation and semantics; the thing this issue is about,

A thing that's a bit frustrating to me here is that people keep explaining the UB potential here to me: Yes, I understand the kinds of things you can do with such APIs. I understand that you can break an mmap implementation, or that you can break the allocator (if it uses mmap). This is not a discussion I am attempting to have at the level where I'm curious what the UB potential is of this API (besides pinning down if we care about niche optimizations and such).

I made the very reasonable assumption by believing the text of the RFC that the standard library has been making a distinction between "safety" and "I/O safety". It's a distinction that makes a lot of sense to attempt to make in the context of the proc/self/mem issue: it means "memory unsafety" can be very clearly bounded as a concept, and "I/O unsafety" becomes a nebulously bounded thing that has a clear line with "memory unsafety" but a less clear line between itself and "safety", which might be an ok state of affairs. This distinction is also one that's incredibly useful for practical applications dealing with I/O. This is the context in which I opened the issue.

I do still think that it's reasonable to try and slice this space that way1. I can see that a lot of people think we should slice it a different way; where "io unsafety" is a subset of memory unsafety.
That's fine, that's reasonable as well, though we need to work on that fuzzy boundary between "memory unsafety" and "safety" then (and "io safety" may still work as a useful tool in that work!). Footnotes
|
I do not see how the conclusion of this discussion is that it's a library invariant, perhaps I am using the term differently from you? It is absolutely an invariant upheld by the library, but the reasons it is upholding it seem to be language reasons. In my view a library invariant is something another library doing the same thing may choose to not have. For example, is it kosher for a different library to expose this same type of API (or,

But from that discussion it does not seem like that is actually the case; and people consider such APIs to be within Rust's safety model, in which case it becomes an opsem thing because the answer is relevant for people writing and reviewing unsafe code regardless of whether they use the standard library.
That's not correct either, this code can still be unsound if, say, the allocator was using that fd. It's not just
one which would be dropped, which would cause But as I said this is already UB because it can mess with the allocator, so you don't need "both sides" anyway. |
It seems like there's two different things tangled up here. The issues of special files like The other thing is single ownership and lifetimes for fds to uphold invariants we rely on. Obviously, given it's easy in Rust, we use this kind of model for a lot of library things. I'm not really clear what this has to do with opsem. Better guidelines around fd use are of course very welcome but this seems very clearly in the realm of being a libs issue. |
Concrete question for opsem: would opsem consider it a violation of Rust's rules for unsafe code for a different library that has nothing to do with the stdlib to expose marked-safe APIs that perform the same logical operation that

Because the libs team does not maintain all crates that deal with IO, just one (well, two, if you count

Either it is an issue for all crates, in which case I think opsem needs to define what IO skullduggery it considers in scope vs out of scope for safety (since /proc/self/mem is clearly out of scope, and in this case we are arguing that file descriptor crimes are in scope). Or it is not an issue for all crates, and then I shall ask the libs team to document what is a library vs language invariant in their API. Until that question is answered here I can't quite ask that question of the libs team.

It does seem from this discussion that a lot of people believe this to be in scope for what is considered unsafe across Rust (not just as a stdlib invariant).
To be clear, I know that's been a long-standing issue and I don't actually care too much about it being documented, it got brought up as an example of the very fuzzy boundary we're dealing with here. |
It'll take me some time to read and reply here, but I do want to quickly point out that Rust does already have documentation about /proc/self/mem. Suggestions for improving it are welcome. |
Indeed it is, I think this has already been covered in the zulip discussion and it would be great if we could not rehash it again. I'm not sure if there's still any open question regarding that?
Hrrm. I think it's not a clear subset. You can get memory unsafety as a consequence of IO unsafety, sure. But you can also get other misbehaviors like clobbering files that should be in the exclusive ownership of another struct. It's a bit like memory safety, but it's not about pointer-addressable memory, it can also cover other bags of bytes (files in a filesystem). And you cannot necessarily cause those kinds of corruptions on platforms which use capabilities-based filesystem access. Only the thing that has a handle to a directory should be allowed to access files in that directory tree. Accessing the wrong file descriptors violates that isolation.
That's basically the old composing-levels-of-safety question, how different features may be safe in isolation but become unsafe if combined and then a decision needs to be made who is responsible. Mucking around with random file descriptors in a process is "fine" (assuming a sandboxed process which only has harmless file descriptors lying around). mmap is "fine" (perhaps with additional assumptions such as using a tempfile or proc not being mounted or whatever) if properly encapsulated inside rust. Once you have both you need to make at least one of them
I think this is also a platform issue. Unixes (and especially linux) are just bad in this regard due to the dense ID space for file descriptors and the magic proc files. There If
then you'd only be able to violate the library safety conditions of various FD-wrapper-types by using

And yes the |
Yep, I've seen that 😄. I think it's a decent amount of documentation defining the boundary between that and "actually safe", but I'm less satisfied about the boundary between that and "considered unsafe".
I do not think that zulip discussion has produced a clear definition of "outside the model", so, no? I'm not trying to rehash it; but I do not think we have a clear distinction here that is useful to people writing and reviewing unsafe code. (I also don't expect this is something we can hash out in a day, and that's fine, but I don't want to consider that "closed")
Sorry, when I said it was a subset I was contrasting with it being non-overlapping. Pretend I said overlapping instead of "subset".
Yeah and I'm open to documenting this as a per-platform thing too. I said this already at some point, but
I'm coming at this from the perspective of libraries, since I both author unsafe code in libraries, and, more often, I review unsafe code in libraries. From that perspective, "how you use it" is not something you have an answer to, but it is something you must instruct your users about after marking an API as safe. As noted already the stdlib does not sufficiently do this. But it is also a question whether the pan-language definition of unsafe should be doing this, as choices made by the stdlib io module do not affect third-party io crates that are doing their own io. (this is the "concrete question for opsem" that I posted above) |
@Manishearth I think maybe what you are looking for is documentation saying that as long as you don't give out the I think part of the confusion here is that people think of the "library invariant" as being a pure mathematical statement about the data that is stored at a particular value. That is the wrong image. The Rust type system has ownership powers, and invariants can (and do) express ownership. The Basically, library invariants are written in separation logic, not just "pure" propositional logic. (Now I see @digama0 has made the same point while I was reading and writing.) |
will respond in full later but I will highlight that yes, I understand that callers are typically expected to follow library invariants, and I understand how that works for (highlighting this now because I'm worried that once again the conversation has shifted to a different level and I am having basic stuff explained to me, and I would rather it not veer too far down that path because it represents a misunderstanding of what I am trying to tease apart here. I have already spent some effort trying to clarify this before and am happy to do it again but it may feel like rehashing to others) |
Yes, that is true. There is a global decision that has been made by IO safety: that file descriptors are an "ownable resource" (in formal terms: that the separation logic that the Rust type system embeds into has a notion of "ownership of an FD"). This is not a decision that has anything to do with operational semantics or memory safety, it is a decision about the reasoning power of the ambient framework that the type system lives in. Clearly that logic has "ownership of a range of memory" as a concept (in various forms, such as immutable, read/write, read/write/dealloc, ...), and it can be freely extended with "ghost ownership" (e.g. to give meaning to things like

This decision can only be made globally: if any part of the program wants a guarantee that an FD that it freshly created is not messed with by anyone else (i.e., nobody is just calling

A consequence of this decision is that all syscalls that work on FDs have appropriate ownership as a precondition. (Without this, "ownership of an FD" would be meaningless as people could still call

Basically there is a global decision Rust has to make: does unsafe code have to be robust against a thread that does
I'm sorry if I am rehashing pre-treaded ground. Is there a particular one of your clarifying replies that I should look at in lieu of reading this entire enormous thread? |
I disagree with this, it is about ownership of the FD. Calling

EDIT: Ah, this is unfortunately less clear-cut than I had hoped. But still, I/O safety I think should allow users to define a type of "exclusively owned FD" that is guaranteed to not have duplicates, even if |
There's a very major difference between this API and

(As mentioned before I don't care about a formal definition here: I just want a useful one)

"borrowed" fd is also a tricky concept; what does it mean for an fd you plan to use being already borrowed, and what does it mean for you to borrow an fd? In general I think the relationship can be specified with basically the "lifetime" of

Note that some FromRawFd implementations borrow, and are only useful when borrowing (the stdio ones, mostly). That's another thing that needs documenting.

And that's another can of worms: you can't even say that it's just about "it's fine to borrow an fd as long as you know it will be closed after you borrow it" because that's perfectly compatible with a type that opens random fds and uses

So it's really something like
At the end of the day, I need to be able to look at a usage of this API and convince myself that it is sound, especially when it involves another syscall. This set of rules is checkable! It basically disallows the "get an fd from the environment" pattern in general purpose code, but that's fine, because it's clear about it, and with some more clarity (I expand on this below) on what kinds of violations cause what could still enable such patterns to be written with a clear conscience when needed without impacting the crate environment.

I keep bringing up

The rules above seem quite nice when talking about ownership and very weird when talking about memory mappings; it would be nice if the actual rules were just the ownership ones and the memory mapping stuff was Bonus Rules.

Given that in general people are of the opinion here that "a different crate choosing to allow safe
Nah, I'm not worried about reexplaining my stance to you; that was more just a message to the other people in this thread that I may end up rehashing a bit when explaining things to you. I'm not really sure where I can find individual comments explaining the relevant bits of my stance without myself doing a bunch of digging.
Generally, yes, callers are required to follow library invariants. However in many specific cases, like you note for

We have specified that the
Yes, in part, but "FD ownership discipline" also needs to be sketched out in the docs. As I mentioned I'd also somewhat want language that notes that other libraries and applications may choose to make the call differently. Ideally, it is written in such a way that libraries using

Also ideally a guarantee about what niche optimizations are present, if any. Really, just a list of what is absolutely an invalid fd for these APIs; saying "negative numbers are invalid on linux and windows" would be totally fine and easily checked; and leaves plenty of room for niches in the future.
Yeah, that's part of what I'm getting at here; I was trying to figure out if this was a global decision made for the ecosystem at large, or a stdlib-specific thing. (There might be some things I have missed but hopefully this gives a clear idea of where I'm coming from) Footnotes |
I have to push back here a little, or I might be misunderstanding. The following two things are not sound:
The first one is a direct violation of what std's
There is an ecosystem-wide decision involved.
That nix API can fundamentally not soundly coexist with any part of the program that uses I/O safety to justify its correctness. For any part of the program to use ownership-based reasoning all parts of the program need to respect the ownership discipline.

Regarding the broader questions... I am honestly not sure how to best answer them. The short, "low-entropy" answer is in terms of program logics (specifically separation logic) and their notion of ownership; it is probably not very useful to people that don't already have intuition for these things. The long answer tries to use examples to convey the same intuition but I don't know which examples are best suited to build that intuition. Honestly I think a real-time meeting would be much more effective here than a long text thread.

The short summary is: we already have an idea of "memory that is exclusively owned". When thinking about Rust programs, not only can we make statements of the form "this location in memory has that value", we can also make statements of the form "this location in memory has that value and nobody else is making any kind of assertion about this location" -- we can say that we own some location in memory. To make statements of this form possible, a new obligation is imposed on all code: you may only access memory that you own! If there is a chance that someone else is claiming ownership of a piece of memory, then you may not access it. (The full version distinguishes between read and write accesses, I am simplifying.) Rust then provides types that conveniently wrap this kind of reasoning so it can all be checked by the compiler.

I/O safety adds two new kinds of statements that can be made:
This also creates new obligations that all code must respect:
This is the underlying reason why the two kinds of APIs I listed at the top of this post are unsound.

Note that this notion of ownership is completely a compile-time fiction. It has nothing to do with whether

So far this is all just establishing the terminology and discipline that we use to reason about programs. In a next step, we can then explain the meaning of Rust types in terms of that terminology:

When reviewing code that works on FDs, fundamentally one needs to ensure that each call to an FD-consuming operation is performed by a caller that has ownership of that FD -- borrowed ownership is sufficient for non-destructive operations, full ownership is required to close the FD. This ownership will usually be carried in by Rust types that have such ownership as their invariant, such as

A library can totally decide that it doesn't care much to protect the FDs it creates from actions being taken on them by other parts of the program -- but it must still respect the fact that other parts of the program might want to protect their FDs. Nix is currently failing to do that (Cc nix-rust/nix#1750).

The situation of an FD being passed in via an env var is an interesting one. In an ideal world, where the environment is immutable, this is how I'd think about it: whoever spawns this program has an obligation to actually set up some FD at the number indicated by the env var. (Weak) Ownership of that FD is then passed to the program and made available in a globally shared way -- this means everyone can read or write the FD but nobody is allowed to close it.
Effectively its type is

In the real world,

One potentially problematic consequence of this entire setup is that if whoever spawns this program sets the env var to an FD that doesn't actually exist, that causes library UB, which can balloon into language UB (namely, some part of the code could rely on an FD being private, but sadly it got exactly the FD that the env var was set to, and some other part of the code calls

TL;DR if we want a guarantee that |
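One way to encode that "everyone may read/write, nobody may close" status in std's types is `BorrowedFd<'static>`, which never closes on drop. A hedged sketch, where the env var name `THE_FD` and the helper are hypothetical:

```rust
use std::env;
use std::os::fd::{BorrowedFd, RawFd};

// Hypothetical helper: treat an fd number inherited via the (assumed) THE_FD
// env var as borrowed for the entire program. `BorrowedFd` has no Drop glue,
// so this wrapper can read and write the fd but can never close it.
fn inherited_fd() -> Option<BorrowedFd<'static>> {
    let fd: RawFd = env::var("THE_FD").ok()?.parse().ok()?;
    if fd < 0 {
        return None; // -1 is never a valid descriptor
    }
    // SAFETY: assumes whoever spawned us really did leave this fd open for our
    // whole lifetime -- exactly the unverifiable precondition discussed here.
    Some(unsafe { BorrowedFd::borrow_raw(fd) })
}

fn main() {
    println!("inherited fd present: {}", inherited_fd().is_some());
}
```

This sidesteps the `from_raw_fd` double-close footgun, but as noted above it cannot sidestep the case where the spawner lied and the fd doesn't exist at all.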
It's complicated. The immediate level below the file descriptor is the file description (this gets shared by calling |
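The descriptor/description split is observable from safe Rust: `File::try_clone` calls dup(2), producing two descriptors over one shared open file description, including one shared seek offset. A small sketch (the path is an arbitrary scratch file):

```rust
use std::fs::File;
use std::io::{Read, Write};

// Two descriptors from try_clone() (which dup()s) share one open file
// description, and therefore one seek offset. Returns what each fd reads.
fn shared_offset_demo() -> std::io::Result<([u8; 3], [u8; 3])> {
    let path = "/tmp/fd_description_demo.txt"; // arbitrary scratch path
    File::create(path)?.write_all(b"abcdef")?;

    let mut a = File::open(path)?;
    let mut b = a.try_clone()?; // dup(2): same description, shared offset

    let (mut x, mut y) = ([0u8; 3], [0u8; 3]);
    a.read_exact(&mut x)?; // moves the shared offset to 3
    b.read_exact(&mut y)?; // continues at 3, not 0
    Ok((x, y))
}

fn main() {
    let (x, y) = shared_offset_demo().unwrap();
    assert_eq!(&x, b"abc");
    assert_eq!(&y, b"def"); // evidence the offset lives in the description
}
```

So "owning a descriptor" and "having exclusive use of the underlying description" are genuinely different claims, which is part of why the ownership story here is layered.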
Well, yes, it's an ecosystem-wide decision; I more meant that was the decision made in a way that constrains the entire ecosystem in how they can use such operations, or just stdlib implementations. (I don't think everyone in this thread falls neatly on one side of this)
This hits at the heart of my contention: if the people doing the most IO in the ecosystem don't feel like they can work with this, perhaps we've decided wrong?
One of the questions being discussed here is does the stdlib get to do that and invent new operations that are UB for everyone (not just people directly interfacing with the stdlib). You can kinda make the same point about the rules of UB around allocation ("the existence of

(and yes, before everyone re-re-explains to me how you can cause memory unsafety with these APIs regardless of the stdlib; that goes back into my point about "can we declare this outside of what Rust's model attempts to handle", much like

The answer I'm getting here is that "yes, the stdlib does get to do that", but this was why I brought it up in the UCG repo first.
It's unclear to me how useful this is when weighed against the necessities of heavy IO-using crates. It seems like the target audience from the ecosystem POV has chosen differently, which is a strong signal to me.
"file descriptor equivalence class" is my mental model. But as @the8472 mentions there's a lot of nuance as to what an fd references. From the pov of "FD ownership" treating them as an equivalence class and talking about whose responsibility it is to close it is sufficient IMO. |
Is this actually the case? The only example that has been brought up so far of such a library is Besides, if you are a "heavy IO-using crate" you always have an out: just declare these functions as
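One version of that "out", surfacing the unverifiable obligation in the signature instead of pretending it away, might look like the following sketch (the function name is hypothetical):

```rust
use std::fs::File;
use std::os::fd::{FromRawFd, RawFd};

/// Hypothetical "heavy IO-using crate" API: rather than marking the operation
/// safe, pass the I/O-safety precondition on to the caller.
///
/// # Safety
/// `fd` must be an open descriptor exclusively owned by the caller; the
/// returned `File` assumes that ownership and closes it on drop.
pub unsafe fn file_from_inherited_fd(fd: RawFd) -> File {
    unsafe { File::from_raw_fd(fd) }
}

fn main() {
    use std::os::fd::IntoRawFd;
    // Sound use: we really do own this fd, having just detached it from a
    // File we created ourselves (arbitrary scratch path).
    let fd = File::create("/tmp/unsafe_out_demo.txt").unwrap().into_raw_fd();
    let _file = unsafe { file_from_inherited_fd(fd) }; // closes fd on drop
}
```

The `unsafe` keyword here doesn't make anything checkable, but it does move the proof burden to the call site, where the caller may actually know where the fd came from.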
I wouldn't really say that "std invented a new operation that is UB for everyone", this was an RFC and not restricted in scope to std - the RFC indicates that it requires rolling out across the whole ecosystem. |
I'm very wary of one line of reasoning, which can be re-casted as this:
The whole point of Rust is to be able to craft safe APIs on top of unsafe components. People can write a lot of programs in Rust using a lot of libraries and never have to write |
I/O safety was in part an attempt to actually specify the ownership semantics we already had. E.g. why I guess you could say our model was always wrong. A |
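Those pre-existing ownership semantics are visible in std's fd conversion APIs: a `File` owns its descriptor, `as_fd` borrows it, and converting into `OwnedFd` transfers it, so close(2) happens exactly once. A sketch (arbitrary scratch path):

```rust
use std::fs::File;
use std::os::fd::{AsFd, AsRawFd, OwnedFd};

fn main() -> std::io::Result<()> {
    let file = File::create("/tmp/ownedfd_demo.txt")?;

    // Borrow: a BorrowedFd tied to `file`'s lifetime; it cannot outlive or
    // close the descriptor it refers to.
    let borrowed = file.as_fd();
    assert_eq!(borrowed.as_raw_fd(), file.as_raw_fd());

    // Transfer: `file` is consumed; the OwnedFd alone now closes on drop,
    // which is why double-close can't arise in this safe code.
    let owned: OwnedFd = file.into();
    drop(owned); // exactly one close(2)
    Ok(())
}
```

The RFC's contribution was largely to name this discipline and give it the `OwnedFd`/`BorrowedFd` vocabulary, rather than to invent the ownership itself.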
Even with that rewrite, I think the analogy still holds: a classic example of a "heavy memory-using crate" which uses Of course, we should strive to minimize the number of cases where |
Yes, we never close the file descriptor. More generally, I like this formulation, but yeah:
This is true, and it's hard to imagine
This doesn't concern me because the person spawning the program has an arbitrary number of ways they can cause UB when spawning the program. |
I don't think it needs to be that specific, though. Instead, being more general gets the point across perfectly well; something along the lines of "the environment acts as an open set of global variables in a flat namespace; setting variables used elsewhere in the code can lead to arbitrary misbehavior of that code, potentially even including unsound behavior if that code makes unsafe assumptions which no longer hold." It's not just about IO safety, but also e.g.

In a way, w.r.t. safety, setting environment variables is no different from choosing random bytes in the global variable section of memory and writing to them. The one way that it's actually different is that there's a greater expectation of trustable isolation between standard globals than there is of the

Any scheme which passes fds via envvar is putting a global proof burden that no code abuses the relevant envvar names, as the compiler doesn't offer any assistance in isolation there. It's equivalent to name mangling and name collisions of |
Yes, it is intended to constrain the ecosystem to not do
Note that the RFC didn't introduce any new language UB. The operational semantics is unaffected. Whether a concrete program has UB or not is unaffected. It introduced a new reasoning principle, and it changed/clarified which APIs are sound. It introduced new terminology that libraries can use to define whether something is or is not library UB. (That's probably what you meant but I feel precise terminology can help us tease out the core questions here.) That probably means this decision should have involved T-lang, not just T-libs-api. And yes if T-lang were to do this arbitrarily, that would be a backwards compatibility hazard. Was this particular RFC a breaking change? I don't know. Was
Do they feel that? I'd like to learn more about that. nix predates the RFC, and from the nix issue nix-rust/nix#1750 it so far looks to me like it is totally possible to make nix IO-safe. I don't see nix as having actively decided against IO safety; they just didn't have But yes if
It's not great, yeah. We could say that, as @CAD97 suggests, changing any variable requires knowing all invariants associated with that variable and proving they are being maintained -- but then code still has to argue why it can know that a given variable doesn't have an invariant associated with it in another part of the program. It's not great. We could also give up on what I called "strong ownership" of an FD, and say that read/write/etc. are fair game on arbitrary FDs, and only |
yeah I think that's part of the source of my confusion
While digging for this I did seem to find discussions amongst the nix maintainers that seemed to indicate otherwise, but I can't find them now; I could just have misread. I do think it is worth talking to the tokio and other async library maintainers about this, though. I don't think I saw much from them on that RFC. |
If major I/O stakeholders feel like I/O safety (even the weaker form that is just about not closing other file descriptors) gets in their way, that would absolutely be a good reason to re-evaluate this RFC, yes. I would expect that to take the form of a new RFC that describes the usecases blocked by the I/O safety RFC and suggests to abolish the concept and describes how that impacts standard library APIs. Meanwhile I don't see a ton for us on the UCG side to do here. I/O safety is clearly an intended reasoning principle right now, and some hints of it go back all the way to Rust 1.0 (with Is this a documentation issue? I don't see any mention of I/O safety in the standard library docs, so that seems like something that should definitely be improved. |
rust-lang/rust#114780 should fix the documentation issue. |
Yes, it's mostly a documentation issue. |
I also made a list of things I plan to improve about the documentation, based on the discussion in this thread. |
add more explicit I/O safety documentation Fixes rust-lang/unsafe-code-guidelines#434 Cc rust-lang#114167 Cc `@Manishearth` `@sunfishcode` `@joshtriplett`
The most pressing docs issue should hopefully be fixed by rust-lang/rust#114780, though if @sunfishcode gets around to improving them further based on their list of course that would be even better. :) Still, the issue here can be closed now I think. |
For the question of how |
Previously: rust-lang/rust#72175, rust-lang/rust#93562
The stdlib has a bunch of `unsafe` APIs (e.g. `FromRawFd`) that are primarily `unsafe` because they care about "I/O safety".

Rust libraries are free to expand the scope of what they consider `unsafe`, but this is typically a crate-local decision. The stdlib, due to its special status, risks imposing this on other crates.

Basically, if a crate or project wishes to not consider I/O safety a problem (which is often necessary in more complicated I/O code! Exactly the kind of code that would wish to use these APIs), these APIs are not useful to them: it is currently unclear as to what usages of these APIs are undefined behavior vs a violation of I/O safety.
There is a valid optimization that could be performed here in the future, which would be to use niches for `-1` fds on Unix (etc), so there is a potential for this API being Actually UB, but that's actually something that can be checked (unlike "is a real owned FD").

It would be nice if we could settle on what is Actually UB in these APIs, and what is "a violation of I/O safety", and document it.
cc @thomcc