Clarify that Rust std library unsafe tricks can't always be used by others
#58582
Comments
cc @RalfJung I suspect that the answer to this is essentially "yes" but because the standard library is packaged with a particular compiler we ignore it. It could pose a problem for making sure miri doesn't error out on the standard library. |
One problem is that people who inspect std code may learn from it and assume it is safe to do the same without checking the actual semantics. I think std should still try to avoid exploiting undefined behavior as much as possible, or at least have a comment that explicitly warns people reading the code. |
Isn't the fact that it is enclosed by an |
As I mentioned in that comment, if std has to do this, there should probably at least be some comments warning readers that this is not considered a safe usage of |
That specific example needn’t be non-repr(C), so a PR that adds |
It's not really UB but it is relying on unspecified details about the layout of unions -- details that can change any time. This is equivalent (in spirit) to what methods like
That is a good suggestion. PRs adding such comments would be welcome! I opened rust-lang/unsafe-code-guidelines#90 to collect instances of us exploiting unspecified layout in libstd; if you know or find more cases like this, please add them there. |
Note that this is a safe usage of A normal Rust crate can also make use of this knowledge in a safe and reliable way if, for example, it tests all its releases against all past and present Rust toolchains to ensure its correctness on all of them.
|
@gnzlbg That's a good argument. In that case, I think we should remove all the “[src]” links from the std documentation, since they strongly encourage readers to inspect the implementation. Actually, I noticed this issue because a novice read that code and pasted it in our group after being told to use that method for what they wanted to do. |
I’m with @upsuper on this one. I also know many people (myself included!) that have either used There’s also something to be said here in terms of Rust’s claims around performance, security, stability, and the like. If only “privileged” code can tap into those things (at least stable-y so), it detracts from those things. Also, nobody wants to look The goal should be to make Ok, I’m done with my $.02 :) |
I personally think that being able to see the "[src]" of the standard library is a great feature of the docs. What I think we should clarify is that the source code of the standard library isn't a resource intended for learning Rust (those are listed in https://www.rust-lang.org/learn). The source code of the standard library and its comments are written with specific goals and a specific audience in mind. The goal is often to explain why the standard library implementation is correct, and the audience is people who could potentially modify that source code and need to understand the tricky things that must be considered when doing so. This audience is not "people learning Rust", but often people with a good grasp of not only Rust basics, but also unsafe Rust, undefined behavior, and often also compiler implementation details. So if you are in this audience, and have this particular goal, then reading the comments is a must and these comments are for you. But if you are learning Rust, these comments will often not make any sense (e.g. https://github.com/rust-lang/rust/blob/master/src/liballoc/boxed.rs#L206 or https://github.com/rust-lang/rust/blob/master/src/liballoc/collections/vec_deque.rs#L1016 require knowing what Stacked Borrows is).
This knowledge is often documented in the Nomicon, the UCG book, and the doc comments in the compiler or miri themselves. The problem is that to understand why certain unsafe code is correct or not, the amount of background required is large, and the comments are not aimed at an audience lacking this knowledge. If anything, we should make it clear what the audience for these comments is, and document where the prerequisite knowledge can be obtained, and in which order, to help new developers get up to speed. We can also discuss how much knowledge can be assumed from the reader of these comments. But removing any pre-requisite knowledge from the comments would basically require explaining the whole Rust language every time something happens.
This has nothing to do with the standard library being able to do privileged things. While the standard library uses many unstable features that might be stabilized some day (and which are documented in the unstable book), it also uses many features that are only relevant if the only problem you are trying to solve is implementing Rust's standard library. Like in this case, implementing the The standard library is actually pretty big, and there might be parts of it that could be useful as a learning resource for programmers not working on the standard library itself. However, at some point those parts end and the APIs into the compiler begin, and being able to tell these apart requires a certain amount of judgement. I don't think it is possible to dumb the standard library down to the point that it can be a helpful learning resource for Rust. OTOH, documenting things like when implementation-specific behavior is used is useful even for standard library developers / maintainers. In this particular case, the audience this code is intended for knows that all fields of non- |
While I agree that libstd is generally not a great resource for learning Rust (also because some of it is old and would be written differently in modern Rust, I think), I still think it'd make sense to be more explicit about where libstd exploits its special status, and document that in comments in the code when needed. In many cases this is obvious, such as when implementing a lang item, but in other cases it is not, and we sometimes document that but sometimes we do not, and we should (and you seem to agree). |
I don’t think people just read But ignoring the “casual reader” angle, I think subtle leverage of unspecified-but-known impl details warrants at least a code comment for future maintainers, particularly around things that could lead to UB if things change. Relying on code coverage via tests to find this is highly optimistic IMO. I think some standardized macro/attribute would be nice to standardize how such comments/remarks are left in code; it would certainly make it easier to grep/search for such locations. |
@RalfJung From a T-lang perspective I find your idea quite important. Particularly, if people read the standard library and use unspecified ABI details, we end up with e.g. https://github.com/reem/rust-traitobject/blob/master/src/lib.rs#L11-L13 which assumes the underlying representation of trait objects (and so our hands become ~tied). |
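To make the concern concrete, here is a minimal sketch (not taken from the linked crate) of one way to get at the data address of a trait object without assuming how the wide pointer is laid out. It relies only on the language rule that casting a wide raw pointer to a thin one keeps the address and drops the metadata. The names `Shape`, `Square`, and `data_ptr` are hypothetical and exist only for this illustration.

```rust
use std::ptr;

// Hypothetical trait and type, used only to have a trait object at hand.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Returns the address of the value behind a trait object.
///
/// This does not transmute the reference into a (data, vtable) pair, so it
/// makes no assumption about the order or size of the wide pointer's parts.
fn data_ptr(obj: &dyn Shape) -> *const () {
    // `ptr::from_ref` yields a wide raw pointer; `cast` discards the metadata.
    ptr::from_ref(obj).cast::<()>()
}
```

The vtable half has no equally simple stable accessor, which is part of why code in the wild ends up guessing at the representation.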
I do agree that these comments add value for some particular audiences. For example, I'd find this comment here useful:
// This relies on all repr(rust) union fields being located at offset 0,
// which is currently not guaranteed, see:
// * [link to unions RFC 1.2]
// * https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13
I don't think such a comment adds much value to a beginner. I'd hope that they would at least ask, file an issue, etc., but if we really want to address this audience, we'd need to add something else, like: "DO NOT USE THIS IN YOUR OWN CODE".
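For readers outside libstd, a minimal sketch of what not depending on the unspecified layout could look like: opting into `#[repr(C)]`, which does place every union field at offset 0. The union `Bits` and the function `word_to_bytes` are hypothetical names for this illustration, and the field types are deliberately simpler than the ones in the std code under discussion.

```rust
// Hypothetical union; `#[repr(C)]` fixes the layout so that both fields
// start at offset 0, instead of relying on the default representation.
#[repr(C)]
union Bits {
    word: u32,
    bytes: [u8; 4],
}

/// Reinterprets a `u32` as its four bytes in native byte order.
fn word_to_bytes(word: u32) -> [u8; 4] {
    let bits = Bits { word };
    // SAFETY: with `#[repr(C)]` both fields share offset 0 and have the
    // same size, and every bit pattern is a valid `[u8; 4]`.
    unsafe { bits.bytes }
}
```

In real code `u32::to_ne_bytes` does the same job safely; the union is shown only because it mirrors the pattern discussed in this thread.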
Rust source code might end up relying on unspecified behavior for a variety of reasons. Anecdotal evidence, but many of the issues in Actix related to unspecified behavior were due to "I tried it and it works for me", and that's what I expect to be the common case in the wild (happens with If the intent is to prevent people from accidentally relying on unspecified behavior, inline comments in the standard library are a very poor tool for that (e.g. warnings would be much better, and we could warn about this issue on all So I'm all in for comments documenting these, I think they would be useful to me, and many others, but @upsuper's issue was that a beginner went through that code and picked up the idiom. I am not sure what kind of comments would be required to address that, but I have the feeling that a warning would be a better tool to fix this particular issue. |
Yes, that seems useful.
Adding such a note would be useful as well... but I'm not primarily worried about the beginners. I'm worried about the senior rustaceans who use snippets from the standard library that e.g. assume unspecified ABIs about
In this case that might work, but it's hard to deal with things involving |
Bingo! It's the intermediate/advanced users (i.e. the ones that know just enough "to be dangerous" :)) that will refer to We can debate the merits of looking at |
@nagisa showed (#51294 (comment)) that, back then, erroring on transmutes that rely on unspecified behavior broke ~7000 crates. I agree that we should add comments to the standard library, and I share all of @Centril's concerns, but the argument that comments help even a little bit with this problem sounds too optimistic to me. Given the scale of the problem, I would be surprised if comments make a difference at all. I do think this is a problem worth solving, but I'm much more optimistic about pro-active warnings that help users write code that does not rely on unspecified implementation details. |
I'm intrigued about the prospect of having explicit lints to detect common undefined behavior exploits that are used in |
The union RFC says:
However, there are several cases which explicitly read from a field different from the one written to, for example, str.as_bytes(), which was introduced in #50863 to make many things const.
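For code outside the standard library, a hedged sketch of the alternative to copying that internal pattern: call the public `str::as_bytes` API, which is a `const fn` on current stable Rust, and leave the conversion mechanism to std. The names `GREETING`, `GREETING_BYTES`, and `first_byte` are made up for this illustration.

```rust
// Illustrative names only; none of these items are part of std.
const GREETING: &str = "hello";

// Evaluated at compile time through the public API, with no union or
// transmute in user code.
const GREETING_BYTES: &[u8] = GREETING.as_bytes();

/// Returns the first byte of `s`, or `None` for an empty string.
const fn first_byte(s: &str) -> Option<u8> {
    let bytes = s.as_bytes();
    if bytes.is_empty() {
        None
    } else {
        Some(bytes[0])
    }
}
```

Whatever std does internally to make the conversion `const` can then change between compiler releases without affecting callers.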