Add two new pointer-sized integer types; uptr and iptr #1635
It would be a lot of churn, but I think it's worth it. Even on "normal" architectures, it allows types to better explain intent, so I don't think this adds an extra mental burden there at all but rather does the opposite. Furthermore, they make some UB rules about pointers vs indexes a bit more intuitive when even in their "rawest form" they are different types. Ultimately, just as Rust makes regular systems programming a lot less scary, I'd hope this would make embedded systems programming less scary. |
I'd be fine with this if it doesn't change any public APIs. |
@eddyb In |
@Ericson2314 |
Ah whew, good point. |
Given that As for point (2), new lints aren't breaking changes unless they are going to be upgraded to errors eventually. Point (7) sounds like needless complexity and a source of gotchas (literals being different from variables). Can we just rely on Finally, it would be good to see more code examples of current code that would become wrong, and the good code that would replace it using these new types. I do not understand the one example given. Why should the length of a slice be a pointer-sized-integer? |
You cannot write portable code that converts from pointer types to integer types. It's unfortunate but true. Going back is even less portable. The new lint isn't the breaking change. The breaking change is that usize is currently a pointer-sized type. The lint is in order to inform people that there is a breaking change, and also to catch people who are using the "wrong" integer types. No, we can't. First is in consts: you can't use Because that's currently what |
We want to support the embedded space, and new CPUs like the CHERI. These CPUs
do not support `usize` == `uptr`, and, in the case of the CHERI, don't support
`uptr` at all. Most CPUs don't actually support the idea that `uptr == usize`:
Most CPUs don't actually support the idea that `uptr == usize`: just the currently popular ones.
[citation needed]
I don't have any, it was just a nice phrase I heard; I'll take it out.
That's a non-issue, since it will be stable some day
by calling whatever intrinsic std::ptr::null() will use. I don't really get why anyone wants to convert pointers to numbers... Can't you just work with |
Yes, I know. That's the problem this RFC is trying to solve, I think. But you propose to introduce some new types that aren't even defined on some platforms. By "portable" I meant "compiles on multiple platforms", not necessarily "does the same thing on multiple platforms". It seems to need a cfg that tells you whether the types even exist, unless you have an exhaustive list so you can do
OK, it would be good to make it clearer in the detailed design what the breaking change is, since as written it seems to say that the warning on casts is the change in question. I guess the breaking change is actually point (1) more than point (2)?
These are not really answering the substance of my objection :) which is that adding complexity and subtle differences between
So as I said, the example is confusing. Please explain better which one currently generates right and wrong code, and what the function signature would look like using your new types. Thanks for responding to all my questions so soon :D |
Any concrete and practical plans to support platforms with |
The title is "Add two new pointer types..." -- should this maybe be "Add two new pointer-sized integer types..." ? |
I don't see the point of this RFC:
So the RFC won't benefit CHERI at all?? If the point is to prevent the pointer ↔ integer cast, one could just add a lint, without introducing any new types. If the motivation was FFI semantics of usize on platforms where |
An amusing RFC. I brought this topic up back in 2013 but was told that much of the stdlib relied on the fact that size_t == uintptr_t. But I can't imagine that a platform that is fundamentally incompatible with the C standard is relevant in any way. After all, the standard says that uintptr_t exists on all platforms and supports round-tripping of pointers. (Unless I'm mistaken.) |
That being said, the linux kernel uses multiple address spaces (that is, pointers that carry compile time information so that they don't accidentally get mixed.) That might (in some limited sense) justify the idea that pointers are not just integers (in the sense of bi-directional conversion.) This is not quite the same as segmented memory but does (in some sense) justify the claim of the RFC that not all programs use only one flat address space. |
@mahkoh ah good point. There is the Mill's proposed ABI (and I assume probably others do this too) where virtual memory is done with one address space but per-process permissions, so the max object size in a "conceptual per-process address space" is indeed far smaller. |
This document describes a version of the GCC features used by the linux kernel: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1275.pdf (p 28 ff) In particular it describes the case where pointers to different address spaces don't necessarily have the same size. This might or might not be relevant to this RFC but it describes some of the thoughts on pointers of people who work in embedded software. |
If one considers the case of multiple address spaces of different sizes, then one might also consider the case where the size of size_t depends on the address space. This seems to go beyond this RFC. Since the document posted above (2007) is not part of C11, it's not clear how useful this idea is and whether it should be considered in the overall design of usize vs. uptr. |
@mahkoh If there are multiple address spaces the |
That is clear since one can simply choose usize = max(size_t, uintptr_t) on every platform. It is already the case that, in general, not all usize values are valid pointers or object sizes. One interesting thing mentioned in the RFC is that sizeof(size_t) is not necessarily sizeof(void *). Maybe that is the one thing where the documentation should be updated. I do not know if there is a case currently where size_t > uintptr_t (or even size_t != uintptr_t). But the guarantee that usize has the same size as ordinary pointers is certainly used in some contexts (e.g. transmute.) It seems that the previous sentence already implies that any change that makes the size of usize unequal to the size of ordinary pointers is a breaking change. Another interesting idea from the C standard is that function pointers are not necessarily related to ordinary pointers. I don't know if there are any platforms where this is the case. But if one considers the case where a platform has multiple address spaces, then one might also consider a case where code and data don't live in the same address spaces. This might also be something to incorporate into the RFC. |
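To make the size assumption in the comment above concrete, here is a minimal sketch in Rust. The function names are hypothetical and only serve the illustration; the checks reflect mainstream targets such as x86-64, not the segmented or capability platforms discussed in this thread.

```rust
use std::mem::size_of;

/// True when `usize` has the same size as a thin data pointer.
pub fn usize_matches_data_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// True when a function pointer has the same size as a thin data pointer.
/// This holds on mainstream targets; it is the assumption questioned above.
pub fn fn_pointer_matches_data_pointer() -> bool {
    size_of::<fn()>() == size_of::<*const u8>()
}

/// Reinterprets a thin data pointer as a `usize` with `transmute`.
/// This only compiles because both types have the same size on the target.
pub fn pointer_bits(p: *const u8) -> usize {
    // SAFETY: assumes `usize` and `*const u8` have identical size and that
    // every pointer bit pattern is a valid `usize` (true on common targets).
    unsafe { std::mem::transmute::<*const u8, usize>(p) }
}
```

If `usize` ever stopped being pointer-sized, `pointer_bits` would fail to compile, which is the kind of breakage the comment describes.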
I recall that there was some discussion related to function pointers recently where it was recommended that functions be cast to usize before being transmuted. I believe that @nikomatsakis was the one talking about this. That is certainly problematic when usize != uintptr_t. |
Alright, I'm going to respond to different people in multiple comments. @durka The idea is that they're just completely unsupported, which means they're just not defined, on platforms like CHERI. Having some kind of Right, the breaking change is that "an integer which has the same number of bits as a pointer" isn't how So, the issue there is that, with
This also gives an example of an architecture with 48-bit pointers, which wouldn't be supported by our current model, unless we want to use a 48-bit usize. Note that two of these companies are defunct, although Bull is still making servers, supercomputers, and smartphones. It's unlikely to matter in practice; we could also guarantee a Neither of them are correct. The most correct, would, in fact, be something like void takes_slice_from_c(int const* ptr, void* len); because a The correct signature, assuming rustc chooses to make void takes_slice_from_c(int const* ptr, size_t len); (Rust doesn't change at all) |
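As a rough sketch of the Rust side of the `(int const* ptr, size_t len)` signature quoted above: the function name `sum_slice_from_c` is hypothetical, and the code assumes that `usize` corresponds to C's `size_t` on the target, which is the very point under discussion.

```rust
use std::os::raw::c_int;
use std::slice;

/// Sums a C array passed as a pointer plus an element count.
///
/// # Safety
/// `ptr` must point to `len` readable, initialized `c_int` values,
/// or be null (in which case 0 is returned).
pub unsafe extern "C" fn sum_slice_from_c(ptr: *const c_int, len: usize) -> i64 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller promises `ptr` is valid for `len` elements.
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    values.iter().map(|&v| i64::from(v)).sum()
}
```

Here the length is an object size, not an address, so it never needs to round-trip through a pointer.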
No, there are no concrete plans at this time. Those are examples to show how broken our current model is, not really intended to be the driving force behind the change. The main driving force is that The only thing I don't see Rust supporting is architectures where different pointer types are differently sized, and even then, you could make everything everywhere. |
@kennytm That's really not the major motivation. The motivation is threefold: 1) segmented architectures, where object size isn't equal to pointer size (which we want to support, because replacing C is 👍), 2) on architectures where converting from pointer to integer and back isn't supported, error, instead of silently failing, and 3) (and most important) currently in the language, there's no way to semantically differentiate between a Also, our definition of |
Right, making Function pointers are already not guaranteed to be the size of regular pointers. I'm... confused what you mean by "functions be cast to usize before being transmuted". |
If we can't make |
@durka Okay, if you think so. Now that I'm actually looking, I think it'd be fine to say that "bits 0 is NULL", so we may just allow |
Would anyone mind a "How do we teach this" section? I think #1636 is an awesome idea, and I'd love to implement it in my own RFC. |
Such a section would be great, especially since this is adding complexity
[citation needed]
transmute(f as usize) |
Never specified otherwise, therefore it's not a guarantee (unless I'm completely wrong, but I can't find any guarantees about it). You shouldn't be transmuting like that. It won't break code, but you can just do |
Clearly this implies that there is a single platform pointer type including function pointers.
Why? And more importantly: Why does @nikomatsakis say that one should transmute like that?
That makes no sense since you just said that function pointers and data pointers are incompatible. If anything, this cast has undefined behavior. |
Look, what's behind this? Of course function and data pointers are incompatible. It's a cast, however, it can happen in safe code, it's not UB. I very much doubt he ever said what you think he said, and if he did, he's wrong. Nobody is perfect. That doesn't mean anything. Implications and reality are very, very different. Edit: To be clear, what he said was
fn f(a: A);
let f: fn(B) = std::mem::transmute(f as usize);
instead of using
fn f(a: A);
let f: fn(B) = std::mem::transmute(f as fn(A));
in case you wanted to be lazy, due to the fn-types-are-zero-sized thing. |
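For readers unfamiliar with the "fn-types-are-zero-sized thing", here is a small sketch. The function `double` and the helper names are hypothetical, and unlike the snippet quoted above it keeps the same signature on both sides of the transmute. It assumes a target where function pointers and `usize` have the same size.

```rust
use std::mem::{size_of_val, transmute};

fn double(x: u32) -> u32 {
    x * 2
}

/// Size of the function item itself: each function has its own
/// zero-sized type, so nothing needs to be stored at runtime.
pub fn fn_item_size() -> usize {
    size_of_val(&double)
}

/// Casts the item to a real function pointer, then to `usize`, then back.
pub fn round_trip_through_usize() -> fn(u32) -> u32 {
    let as_pointer: fn(u32) -> u32 = double; // coercion to a fn pointer
    let address = as_pointer as usize;
    // SAFETY: assumes fn pointers and `usize` have the same size and that the
    // integer still designates the same function (true on mainstream targets).
    unsafe { transmute::<usize, fn(u32) -> u32>(address) }
}
```

Transmuting the item directly would fail to compile because a zero-sized value cannot be reinterpreted as a pointer-sized one, hence the intermediate cast.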
I literally quoted a part of the reference that talks about the ONE pointer type of a platform. And this is the natural way of looking at things since all relevant platforms do not distinguish between the two types. And it is also often recommended to use usize in place of function pointers since rust lacks function pointers (in the sense of not being a reference.) Of course all of this is common knowledge so I'm surprised that you think the opposite is common knowledge and call it "of course." Maybe you could point out where you get this from.
I'm quite sure he did but it's best to simply wait for him to comment on this. (But if you insist then I can also search for the comment. It should not take much time.) In the meantime, since you've already acknowledged that there is no such thing as one pointer type (at present we have "ordinary data pointers", "function pointers", and possibly "data pointers that point into a different address space and possibly have a different pointer size"). I don't quite see why you would want to add types called "uptr" to the language when they clearly don't correspond to pointers. It is not clear if such a type would have to be compatible with only data pointers, only function pointers, or possibly both. Would a platform where data pointers and function pointers have a different size have this type? Or is it strictly for usage with data pointers? If so then the name seems quite confusing. |
Since rust doesn't have function pointers it's probably impossible to use it on platforms where function pointers are incompatible because the concept of a function pointer (not reference) simply cannot be expressed. |
https://internals.rust-lang.org/t/tootsie-pop-model-for-unsafe-code/3522/39 It supports round-tripping, but only for a very, very narrow (and honestly useless) definition of round-tripping. The only operation that needs to behave sensibly on a round-tripped pointer is comparing as equal to the original, which permits the round-trip result being a pointer to a zero-length subobject at the same address as the beginning of the original pointer's pointee. Such a pointer cannot be meaningfully dereferenced; my understanding is that CHERI takes advantage of this. |
Wait, what? Are |
That interpretation seems far removed from practice: https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html See question 5.
They can be dereferenced without an unsafe block so they are certainly not pointers. |
In what way do they differ from pointers, other than not requiring unsafe? Are you implying that they inherit (data) references' requirement to always point to valid memory at the cost of UB? I guess that could be an issue in some cases if true, but I wouldn't take it as a given, especially since on the vast majority of architectures there is no benefit to making such an assumption... By the way, for the record: one platform with differently sized data and function pointers is 16-bit x86, in some variants; on the other hand, POSIX forbids it. |
If that were not the case then you could cause UB from safe code by calling such an invalid reference. Since that goes against the intention of rust, it seems reasonable to assume that creating such an object is UB.
The numeric values of function pointers are significant in some posix interfaces so it seems to require even more than just them being the same size. (Edit: Or maybe not since SIG_* are not necessarily defined to be numeric values. But I haven't bothered to look it up.) |
So, there's some confusion on the exact meaning of the different terms used here: Pointer: A pointer type. Includes references ( Reference: A type of pointer. Always points to valid data, guaranteed to never be null, not allowed to alias mutably (excluding Raw Pointer: Not guaranteed to point to valid data, not guaranteed to not be null, not guaranteed to not alias: Function pointers: Guaranteed to point to a valid callable function (probably, not actually sure on this), guaranteed to never be null, may be a pointer, may just have "pointer" in the name due to legacy reasons. It's not like they actually act or look like pointers. |
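The distinctions in the list above can be observed from safe Rust. The following is a minimal sketch with hypothetical function names; it relies on the documented guarantee that `Option` around a reference or a function pointer uses the null value as its `None` representation.

```rust
use std::mem::size_of;

/// References are never null, so `Option<&T>` needs no extra space.
pub fn reference_has_null_niche() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// Function pointers are never null either, so the same layout trick applies.
pub fn fn_pointer_has_null_niche() -> bool {
    size_of::<Option<fn()>>() == size_of::<fn()>()
}

/// Raw pointers carry no such promise: null is an ordinary value for them.
pub fn raw_pointer_may_be_null() -> bool {
    let p: *const u8 = std::ptr::null();
    p.is_null()
}
```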
Honestly, my biggest concern here is that I am very wary of turning out like C -- basically, my feeling in C is that there are a bazillion integer types and nobody knows how to use them (e.g., Moreover, whenever I've tried (in C) to be very precise about which of those different integer types I'm using, I wind up in a bind, because I find that (for some reason or another) I have some value that (e.g.) started out as a In contrast, I've basically found working in Rust to be a breath of fresh air. I guess the price I am paying for this is that my code is less portable, but I'm not sure how much this matters in practice. (There are some places where the same problems arise in Rust; typically when converting between (Note that I feel basically the same about keeping a sharp distinction between "fn pointers", "other pointers", and "pointer-sized integers" -- it seems theoretically good, but in practice kind of a hassle, and doesn't seem to matter much in practice.) That said, I think the portability thing is real. It'd be great to be able to target more platforms. To me, this all seems pretty related to @aturon's "pre-RFC" on how to handle platform support in the standard library. In particular, we may want some way to let people indicate that they intend to target more esoteric platforms, and thus opt-in to a certain amount of pain in the form of lints and the like, while still keeping the defaults relatively lax. |
Note that this would preclude any code relying on such behavior from supporting the RV128 variant of RISC-V in the future. |
@eternaleye I spoke a bit loosely. I should have said "...on all platforms where that makes sense". But yes, this would make it more natural to target 32-bit or 64-bit than 128-bit. Targeting 128-bit would be an active choice (and I could imagine that in the future we might alter our lints or defaults once 128-bit is better established). The key point here is that we might have widening transforms or implicit integer conversions that vary from platform to platform. I think we should try to ensure that the defaults ensure portability across "common" architectures but not "all" architectures past and future. |
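As an illustration of how platform-dependent conversions look in Rust today, here is a sketch with hypothetical helper names. The standard library provides an infallible `From<u16>` conversion into `usize` because it is lossless on every supported target, while wider integers have to go through the fallible `TryFrom`.

```rust
use std::convert::TryFrom;

/// Always lossless: `usize` is at least 16 bits wide on every target.
pub fn index_from_u16(i: u16) -> usize {
    usize::from(i)
}

/// May fail on targets where `usize` is narrower than 64 bits,
/// so the conversion is fallible everywhere.
pub fn index_from_u64(i: u64) -> Option<usize> {
    usize::try_from(i).ok()
}
```

A widening conversion that exists only on "common" targets would make code like `index_from_u64` infallible there and unavailable elsewhere, which is the trade-off being debated.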
@nikomatsakis "Widening" transforms that make real Rust code not work in 20 years when many have switched to 128-bit are not a good idea. RV128 is only the first platform to support 128 bit. As to The CHERI is in a different situation; its pointers are 192 bits wide. |
Quoting the definition of
Why does CHERI define an |
The phrasing @ubsan used is subtly incorrect, and likely comes from me being incautious in my phrasing when using CHERI to argue about Specifically, on CHERI, both pointers and intptr_t are 64-bit - CHERI is a set of capability extensions on top of regular BERI MIPS. However, capabilities are 192 bits (or in CHERIv5, 256-bit with a 128-bit compressed form), and the interactions between pointers and capabilities influence dereferenceability, which was the question at hand in that discussion. In particular, a pointer round-tripped through uintptr_t only has one behavior guaranteed by the spec: Compare as equal to the original pointer. Equality includes pointers to prefix subobjects, and that means a zero-length (non-dereferenceable) subobject can be used in order to prevent the possibility of capability violations by way of pointer arithmetic. |
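The narrow round-trip guarantee described above can be sketched in Rust, using `usize` as a stand-in for `uintptr_t` (an assumption; this thread is precisely about whether that equivalence should hold). The function name is hypothetical. Note that the sketch only compares the restored pointer and never dereferences it.

```rust
/// Converts a pointer to an integer and back, then checks only the one
/// property described above: the result compares equal to the original.
pub fn round_trip_compares_equal(value: &i32) -> bool {
    let original: *const i32 = value;
    let address = original as usize; // pointer to integer
    let restored = address as *const i32; // integer back to pointer
    restored == original
}
```

On a capability platform as described above, equality would be the only thing such code could rely on; reading through `restored` would not be covered.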
@eternaleye Actually, on further reading of the C standard, I would disagree with that assessment. I think the CHERI implementation is buggy, and they shouldn't implement |
This is unlikely to ever happen. I'm going to close it. |
`usize` and `isize` serve dual purposes; this RFC splits the purposes apart into two types.