Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify that Rust std library unsafe tricks can't always be used by others #58582

Open
upsuper opened this issue Feb 20, 2019 · 18 comments
Open
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@upsuper
Copy link
Contributor

upsuper commented Feb 20, 2019

The union RFC says:

In addition, since a union declared without #[repr(C)] uses an unspecified binary layout, code reading fields of such a union or pattern-matching such a union must not read from a field other than the one written to.

However, there are several cases which explicitly read from a field different from the one written to, for example, str.as_bytes() which was introduced in #50863 to make many of the stuff const.

@jonas-schievink jonas-schievink added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Feb 20, 2019
@Mark-Simulacrum
Copy link
Member

cc @RalfJung

I suspect that the answer to this is essentially "yes" but because the standard library is packaged with a particular compiler we ignore it. It could pose a problem for making sure miri doesn't error out on the standard library.

@upsuper
Copy link
Contributor Author

upsuper commented Feb 20, 2019

One problem is that people inspect std code may learn from that and assume it is safe to do so without checking its actual semantics. I think std should still try to avoid exploiting undefined behavior as much as possible, or at least have some comment to explicitly warn people reading the code.

@estebank
Copy link
Contributor

One problem is that people inspect std code may learn from that and assume it is safe to do so without checking its actual semantics.

Isn't the fact that it is enclosed by an unsafe block enough of a warning of "here be dragons"?

@upsuper
Copy link
Contributor Author

upsuper commented Feb 20, 2019

Isn't the fact that it is enclosed by an unsafe block enough of a warning of "here be dragons"?

unsafe usages can be safe if used correctly, and unsafe is created to do that. Rust std (among other popular crates) can be a good source of learning common pattern of safe usage of unsafe. So I don't think having unsafe block itself is enough an excuse for exploiting undefined behavior.

As I mentioned in that comment, if std has to do this, there should probably at least be some comments warning reader that this is not considered a safe usage of unsafe, and it's fine inside std only because std is always shipped with the same version of compiler, and changing the behavior is not considered a breaking change to the language.

@nagisa
Copy link
Member

nagisa commented Feb 20, 2019

That specific example needn’t to be non-repr(C), so a PR that adds repr(C) there would be welcome.

@RalfJung
Copy link
Member

It's not really UB but it is relying on unspecified details about the layout of unions -- details that can change any time. This is equivalent (in spirit) to what methods like from_raw_parts do to create a slice: that kind of code is fine in libstd because it is tied closely to the compiler and will be updated together with the compiler when the layout changes. That same code in a user library is exploiting unspecified layout behavior, and hence can break at any time.

As I mentioned in that comment, if std has to do this, there should probably at least be some comments warning reader that this is not considered a safe usage of unsafe, and it's fine inside std only because std is always shipped with the same version of compiler, and changing the behavior is not considered a breaking change to the language.

That is a good suggestions. PRs adding such comments would be welcome!

I opened rust-lang/unsafe-code-guidelines#90 to collect instances of us exploiting unspecified layout in libstd; if you know or find more cases like this, please add them there.

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 20, 2019

warning reader that this is not considered a safe usage of unsafe, and it's fine inside std only because std is always shipped with the same version of compiler, and changing the behavior is not considered a breaking change to the language.

Note that this is a safe usage of unsafe because that code knows how unspecified behavior is specified.

A normal Rust crate can also make use of this knowledge in a safe way and reliable way, if, for example, it would test all its releases against all past and present Rust toolchains to ensure its correctness on all of them.

One problem is that people inspect std code may learn from that and assume it is safe to do so without checking its actual semantics.

libstd uses a lot of not only unstable Rust features, but both documented and undocumented perma-unstable Rust features. Learning Rust from libstd is probably as much of a good idea as learning C++ from reading libc++'s code or learning C from reading glibc. If anything we should strongly discourage learning Rust from reading libstd unless you are trying to specifically learn either how libstd works internally, or how those perma-unstable features can be used on a particular toolchain.

@upsuper
Copy link
Contributor Author

upsuper commented Feb 20, 2019

@gnzlbg That's a good argument. In that case, I think we should remove all the “[src]” links from document of std, since that's strongly encouraging readers to inspect the implementation.

Actually, I noticed this issue because there was a novice read that code and paste it in our group after being told to use that method for what they wanted to do.

@vitalyd
Copy link

vitalyd commented Feb 21, 2019

I’m with @upsuper on this one. I also know many people (myself included!) that have either used std as an example of something or told others to use it as such. Now, yes, it’s true that it can do things “ordinary” crates can’t, such as using unstable features/APIs, but I don’t think that should be a blanket excuse for more subtle things nor should the decision to use unspecified aspects be made lightly (not saying it is, though).

There’s also something to be said here in terms of Rust’s claims around performance, security, stability, and the like. If only “privileged” code can tap into those things (at least stable-y so), it detracts from those things.

Also, nobody wants to look libc++ or glibc :) - it’s not a measuring stick (or reference point) that Rust should lean on.

The goal should be to make std approachable and sensible, both for casual readers and would-be contributors. For the latter, a bunch of tribal knowledge shared by just a few isn’t going to help.

Ok, I’m done with my $.02 :)

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 21, 2019

I personally think that being able to see the "[src]" of standard is a great feature of the docs. What I think we should clarify is that the source code of the standard library isn't a resource intended for learning Rust (those are listed in https://www.rust-lang.org/learn).

The source code of the standard library and its comments are written with specific goals and audience in mind. The goal is often to explain why the standard library implementation is correct, and the audience is people that could potentially modify that source code and needs to understand the tricky things that must be considered when doing so. This audience is not "people learning Rust", but often people with a good grasp of not only Rust basics, but also unsafe Rust, undefined behavior, and often also compiler implementation details.

So if you are in this audience, and have this particular goal, then reading the comments is a must and these comments are for you. But if you are learning Rust, these comments will often not make any sense (e.g. https://github.com/rust-lang/rust/blob/master/src/liballoc/boxed.rs#L206 or https://github.com/rust-lang/rust/blob/master/src/liballoc/collections/vec_deque.rs#L1016 requires knowing what Stacked Borrows is).

The goal should be to make std approachable and sensible, both for casual readers and would-be contributors. For the latter, a bunch of tribal knowledge shared by just a few isn’t going to help.

This knowledge is often documented in the Nomicon, the UCG book, and the doc comments in the compiler or miri themselves. The problem is that to understand why certain unsafe code is correct or not, the amount of background required is large, and the comments are not aimed at an audience lacking this knowledge.

If anything, we should make it clear what the audience for these comments is, and document where the prerequisite knowledge can be obtained, and in which order, to help new developers get up to speed. We can also discuss how much knowledge can be assumed from the reader of these comments.

But removing any pre-requisite knowledge from the comments would basically require explaining the whole Rust language every time something happens.

If only “privileged” code can tap into those things (at least stable-y so), it detracts from those things.

This has nothing to do with the standard library being able to do privileged things. While the standard library uses many unstable features that might be stabilized some day (and which are documented in the unstable book), it also uses many features that are only relevant if the only problem you are trying to solve is implementing Rust's standard library. Like in this case, implementing the str type: impl str { ... }. Because str is a built-in type, you have to implement stuff for it in the crate that defines it, but built-in types by definition aren't defined in any crate, they are part of the language, so the standard library uses many compiler internals to be able to do these things. These are more a necessity, than a privilege. Without these, as_bytes would need to be implemented inside the compiler itself.


The standard library is actually pretty big, and there might be parts of it that could be useful as a learning resource for programmers not working on the standard library itself. However, at some points those parts end, and the APIs into the compiler being, and being able to tell these appart require a certain amount of judgement.

I don't think it is possible to numb the standard library down to the point that it can be a helpful learning resource for Rust. OTOH, documenting things like when is implementation specific behavior used is useful even for standard library developers / maintainers.

In this particular case, the audience this code is intended for knows that all fields of non-repr(C) unions are put at offset 0 by the compiler even though this is not guaranteed. They might even know that whether doing something different could enable some optimizations is something that's being currently explored. And they know that if Rust starts doing something different here, all uses of unions in the whole library will need to be fixed.

@RalfJung
Copy link
Member

RalfJung commented Feb 21, 2019

While I agree that libstd is generally not a great resource for learning Rust (also because some of it is old and would be written differently in modern Rust, I think), I still think it'd make sense to be more explicit about where libstd exploits its special status, and document that in comments in the code when needed. In many cases this is obvious, such as when implementing a lang item, but in other cases it is not, and we sometimes document that but sometimes we do not, and we should (and you seem to agree).

@vitalyd
Copy link

vitalyd commented Feb 21, 2019

I don’t think people just read libstd to “learn Rust”. I’m sure some do, but what I was talking about is referring to certain idioms/constructs, particularly the trickier/more subtle ones, used by it. This is particularly so with topics such as unsafety which, despite people’s past and current efforts, aren’t all that well specified in any existing resource that I know. Nomicon is great but isn’t complete. We can debate the merits of people looking to std as examples of things, but I’m pretty certain it’ll continue to be the case; it’s also a very common thing to do in other languages (pretty much all mainstream ones other than C and C++).

But ignoring the “casual reader” angle, I think subtle leverage of unspecified-but-known impl details warrants at least a code comment for future maintainers, particularly around things that could lead to UB if things change. Relying on code coverage via tests to find this is highly optimistic IMO.

I think some standardized macro/attribute would be nice to standardize how such comments/remarks are left in code; it would certainly make it easier to grep/search for such locations.

@Centril
Copy link
Contributor

Centril commented Feb 21, 2019

@RalfJung From a T-lang perspective I find your idea quite important. Particularly, if people read the standard library and use unspecified ABI details, we end up with e.g. https://github.com/reem/rust-traitobject/blob/master/src/lib.rs#L11-L13 which assumes the underlying representation of trait objects (and so our hands become ~tied).

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 21, 2019

@Centril

I do agree that these comments add value for some particular audiences. For example, I'd find this comment here useful:

// This relies on all repr(rust) union fields being located at offset 0,
// which is currently not guaranteed, see:
// * [link to unions RFC 1.2]
// * https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13

I don't think such a comment adds much value to a beginner. I'd hope that they would at least ask, fill an issue, etc., but if we really want to address this audience, we'd need to add something else, like: "DO NOT USE THIS IN YOUR OWN CODE".

Particularly, if people read the standard library and use unspecified ABI details, we end up with e.g. https://github.com/reem/rust-traitobject/blob/master/src/lib.rs#L11-L13

Rust source code might end up relying on unspecified behavior for a variety of reasons. Anecdotal evidence, but many of the issues in Actix related to unspecified behavior where due to: "I tried it and it works for me", and that's what I expect to be the common case in the wild (happens with &str, slices, etc., all the time, so much that we can't even enable warnings for these).

If the intent is to prevent people from accidentally relying on unspecified behavior, inline comments in the standard library are a very poor tool for that (e.g. warnings would be much better, and we could warn about this issue on all repr(rust) unions telling people to use repr(C), but we currently do not).

So I'm all in for comments documenting these, I think they would be useful to me, and many others, but @upsuper issue was that a beginner went through that code, and picked up the idiom. I am not sure what kind of comments would be required to address that, but I have the feeling that a warning would be a better tool to fix this particular issue.

@Centril
Copy link
Contributor

Centril commented Feb 21, 2019

For example, I'd find this comment here useful:

Yes, that seems useful.

I don't think such a comment adds much value to a beginner. I'd hope that they would at least ask, fill an issue, etc., but if we really want to address this audience, we'd need to add something else, like: "DO NOT USE THIS IN YOUR OWN CODE".

Adding such a note would be useful as well... but I'm not primarily worried about the beginners. I'm worried about the senior rustaceans who use snippets from the standard library that e.g. assume unspecified ABIs about repr(Rust), use that in popular crates, and then force the language team to make that the specification.

I am not sure what kind of comments would be required to address that, but I have the feeling that a warning would be a better tool to fix this particular issue.

In this case that might work, but its hard to deal with things involving transmutes and stuff. I also think it's not an either-or proposition, you can do both if possible.

@vitalyd
Copy link

vitalyd commented Feb 21, 2019

but I'm not primarily worried about the beginners. I'm worried about the senior rustaceans who use snippets from the standard library that e.g. assume unspecified ABIs about repr(Rust), use that in popular crates, and then force the language team to make that the specification.

Bingo! It's the intermediate/advanced users (i.e. the ones that know just enough "to be dangerous" :)) that will refer to libstd in hopes of "clarifying" something they may not have found in the Nomicon (or similar).

We can debate the merits of looking at libstd till the end of time, but IMO, it's safe to assume that people will continue to do so until, at least, there's a complete language reference, including fully specified memory/unsafety model (i.e. what @RalfJung & co are working on, basically). libstd doesn't need to pretend that every line of code is meant to demonstrate Rust to the world, but leaving remarks for tricky constructs/snippets seems prudent (and again, not just for casual readers but for future maintainers of the code as well).

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 22, 2019

@nagisa showed (#51294 (comment)) that, back then, erroring on transmutes that rely on unspecified behavior broke ~7000 crates.

I agree that we should add comments to the standard library, and I share all of @Centril's concerns, but the argument that comments help even a little bit with this problem sounds too optimistic to me. Given the scale of the problem, I would be surprised if comments make a difference at all.

I do think this is a problem worth solving, but I'm much more optimistic about pro-active warnings that help users write code that does not rely on unspecified implementation details.

@estebank
Copy link
Contributor

I'm intrigued about the prospect of having explicit lints to detect common undefined behavior exploits that are used in std. This would let people learn about the constructs from std, and rustc would kindly remind prospective "exploiters" that there be dragons with a lengthy enough explanation of the assumptions that need to hold for the current use to work correctly.

@Enselic Enselic added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-bug Category: This is a bug. labels Nov 18, 2023
@Enselic Enselic changed the title Rust std library exploits undefined behavior of union? Clarify that Rust std library unsafe tricks can't always be used by others Nov 18, 2023
@Enselic Enselic added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants