Sizeof, alignof, offsetof, typeof #591
My attempt at implementing this RFC (implemented sizeof, alignof and offsetof for regular structs) |
- Beef up rustc's constant folder, so it can see through \<raw-pointer-deref\>-\<field-select\>-\<get-address-of\> combos. It would then be possible to express `sizeof` and `offsetof` in terms of raw pointer operations. For example: `macro_rules! offsetof(($f:ident in $t:ty) => (unsafe{ &(*(0 as *const $t)).$f as usize }));`
Cons: The implementations of these operators via pointer arithmetic use dubious tricks like dereferencing a null pointer, and may become broken by future improvements in LLVM optimizations. Also, this approach seems to be falling out of fashion even in C/C++ compilers (I guess for the same reason): `alignof` is now a standard operator in C++11; `offsetof` hasn't been standardized yet, but both GCC and clang implement it as a custom built-in.
- Implement a limited form of compile-time function evaluation (CTFE) by hard-coding knowledge of intrinsics such as `size_of<T>()` and `align_of<T>()` into the constant folder. |
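For comparison with the null-pointer macro quoted above, here is a minimal sketch of how a field offset can be measured in current Rust without dereferencing a null pointer. It is not part of the RFC; the `Example` struct and the `offset_of_b` function are hypothetical names, and `#[repr(C)]` is assumed so that the layout is predictable.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of;

// Hypothetical example type; `repr(C)` pins the field order and padding.
#[repr(C)]
pub struct Example {
    pub a: u8,
    pub b: u32,
    pub c: u16,
}

/// Byte offset of `Example::b`, computed without a null pointer.
pub fn offset_of_b() -> usize {
    // Uninitialized storage provides a real, properly aligned address to
    // measure from; no field is ever read.
    let storage = MaybeUninit::<Example>::uninit();
    let base = storage.as_ptr();
    // SAFETY: `base` points into `storage`, and `addr_of!` only computes the
    // field address without creating a reference or reading memory.
    let field = unsafe { addr_of!((*base).b) };
    field as usize - base as usize
}
```

On targets where `u32` has 4-byte alignment, `offset_of_b()` should return 4, since `b` is padded up from the single byte occupied by `a`.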
The compiler has to know about them, as intrinsics.
We already special-case other function calls, like tuple struct and enum variant constructors, so this wouldn't be much of an issue, implementation wise.
The hard part, as with any other form of `sizeof` at the type level, no matter how it is exposed, is to compute the size of a type before trans even creates the LLVM context.
I toyed with the idea of moving ADT optimizations/representation logic higher up and allowing sizes/alignments to be computed without depending on LLVM at all.
This would also allow us to "lower" some higher level type optimizations into some sort of MIR (Mid-level Intermediate Representation, like the Control Flow Graph that gets mentioned from time to time).
I haven't actually worked on it, but it seems feasible - maybe a little trickier to define platform-specific data layouts and encode the C ABI.
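As a point of reference for this comment: in current Rust, `std::mem::size_of` and `std::mem::align_of` are `const fn`s, so their results can be used in constant items and array lengths. The sketch below only illustrates that; the names `WORD_SIZE`, `WORD_ALIGN`, `RawWord` and `raw_word_len` are hypothetical.

```rust
use std::mem::{align_of, size_of};

// Both calls are evaluated at compile time because `size_of` and
// `align_of` are `const fn`s in today's standard library.
pub const WORD_SIZE: usize = size_of::<u64>();
pub const WORD_ALIGN: usize = align_of::<u64>();

// Hypothetical buffer type whose length is derived from another type's size.
pub struct RawWord {
    pub bytes: [u8; size_of::<u64>()],
}

/// Size of the byte buffer type, which matches the size of `u64`.
pub fn raw_word_len() -> usize {
    size_of::<RawWord>()
}
```

Note that `WORD_ALIGN` is target-dependent (for example, `u64` may be 4-byte aligned on some 32-bit targets), so code should not assume a particular value.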
So there has been this long-standing debate about whether to add these as keywords or some kind of intrinsics. I think I've decided I prefer the keywords for the following reasons (from least to most important):
That said, I might prefer to separate I haven't looked at the impl but definitely having working code makes me feel better about this RFC -- mostly because I like the idea but don't think we would have the bandwidth to make it happen before 1.0. |
I've introduced Regarding implementation, it turns out that my code only covers declaration of statics and consts. You still wouldn't be able to use |
On Tue, Feb 03, 2015 at 03:21:06PM -0800, Vadim Chugunov wrote:
I would only support
Yes, without reading your code, I assumed this was the case. I
#![allow(dead_code)]
struct Foo { x: usize }
const X: &'static Foo = &Foo { x: 3 };
const Y: usize = X.x; // No error here
struct Bar {
f: [u32; X.x] // ERROR here
}
fn main() {
}
I believe that is one possible solution but not the only one. In |
|---------------------|------------------|
|`sizeof(T)` |The size of type T.|
|`alignof(T)` |The minimal alignment of type T.|
|`offsetof(F in S)` |Offset of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.| |
do we need preferred alignment as well as minimal alignment?
@vadimcn it seems the biggest open question here is what to do with typeof. It seems that sizeof etc. should only take a type. I see a few options:
I suppose the first option is only viable if we are happy to not support sizeof etc. for expressions. That seems to be the state of the implementation - is it useful as it stands? It seems that the proposed syntax is:
Where This is effectively adding three keywords to the language. I think that is OK, but I'd like to hear others chime in about that (@brson, @huonw, @alexcrichton, @aturon, @pcwalton, @wycats, @steveklabnik, anyone else...). It would be good to add a staging section to the RFC - discuss what can be implemented first (or that you've already implemented) and what can be left till post-1.0 (typeof/some of the things discussed above). Also, I presume these should be feature gated, you should propose a feature name. |
|`sizeof(T)` |The size of type T.|
|`alignof(T)` |The minimal alignment of type T.|
|`offsetof(F in S)` |Offset of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.|
|`typeof(F in S)` |The type of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.| |
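To make the "identifier or integer index" wording in the table concrete, here is a rough sketch of a user-level macro that accepts both forms. It is not the RFC's implementation: the macro name `offset_in!`, the types `Named` and `Pair`, and the two wrapper functions are hypothetical, and `#[repr(C)]` is assumed so the offsets are predictable.

```rust
// Hypothetical macro mirroring the `offsetof(F in S)` spelling from the table.
// `$field:tt` accepts both a field name and a tuple index.
macro_rules! offset_in {
    ($field:tt in $ty:ty) => {{
        let storage = ::std::mem::MaybeUninit::<$ty>::uninit();
        let base = storage.as_ptr();
        // SAFETY: `base` points into `storage`; only an address is computed,
        // no memory is read and no reference is created.
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        field as usize - base as usize
    }};
}

#[repr(C)]
pub struct Named {
    pub a: u8,
    pub b: u32,
}

#[repr(C)]
pub struct Pair(pub u16, pub u64);

/// Offset of a named field in a regular struct.
pub fn named_b_offset() -> usize {
    offset_in!(b in Named)
}

/// Offset of a positional field in a tuple struct.
pub fn pair_1_offset() -> usize {
    offset_in!(1 in Pair)
}
```

Unlike the operators proposed in the table, this macro computes its result at run time, which is one of the limitations the RFC set out to remove.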
I suspect this may stop `x in y` ever becoming an expression (I don't know if we want to do this at all, but...).
Quick note: In C I often find it useful to do |
re
IMO,
These are already reserved keywords.
|
@vadimcn I would just remove type_of for now, we can deal with it later in a different RFC. With sizeof, etc. as keywords, is there anything stopping them being used in types? If you're happy to update the RFC, I'll try and get it approved and merged this week. |
I am against any keyword-based solutions for anything other than |
Was @vadimcn involved in the discussion? In any case, IRC is transient, providing a summary comment of discussions (at least linking to logs) is a good way to make sure everyone is on board and on the same page. |
@eddyb, IMO, your proposal is sufficiently different from this one to become its own RFC. |
@nick29581: Removed |
Hear ye, hear ye. This RFC is now in final comment period until June 2nd. |
Any thoughts about my suggestion above to allow multiple field/index accesses in offsetof? (i.e. Edit: I don't object to this RFC, but now that 1.0 is out I'd love to see CTFE improvements sooner rather than later. |
The above syntax looks very similar to regular function calls. Some alternatives:
- Macro-like syntax: `sizeof!(MyStruct); alignof!(f64); offsetof!(field2 in MyStruct)`.
Pros: distinct from regular functions; "non-standard" syntax is par for the course in macros; it's immediately obvious that these are evaluated at compile time. |
We have some compiler-builtin macros already, so adding a few new ones would not be too bad IMO. And they can still be expanded to their literal value.
That being said, I think I’d prefer the macro version.
@comex with the right field "reflection" primitives, you could do that in a macro. Sadly, a full specification of such primitives may not materialize before June 2nd. |
Just a thought on the syntax bikeshed: wouldn't introducing these as keywords break backward compatibility? I mean, what if there's code that contains `macro_rules! sizeof { ($t:ty) => (__internal_sizeof($t)) }` |
@DanielKeep see #1122, which covers the addition of new keywords in an opt-in manner. |
All of these are reserved words anyway. |
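The macro-style spelling discussed in this part of the thread can be approximated in current Rust by thin wrappers around the existing `std::mem` functions. This is only a sketch of the idea, not the RFC's design; the names `my_sizeof!`, `my_alignof!`, `F64_SIZE`, `U8_ALIGN` and `Bytes` are hypothetical and chosen to avoid clashing with any keyword.

```rust
// Hypothetical macro names; each expands to a call to an existing
// `std::mem` function, so the result is still a compile-time constant.
macro_rules! my_sizeof {
    ($t:ty) => {
        ::std::mem::size_of::<$t>()
    };
}

macro_rules! my_alignof {
    ($t:ty) => {
        ::std::mem::align_of::<$t>()
    };
}

// The expansions are usable in constant items and in array lengths.
pub const F64_SIZE: usize = my_sizeof!(f64);
pub const U8_ALIGN: usize = my_alignof!(u8);
pub type Bytes = [u8; my_sizeof!(u32)];
```

Because the macros expand to ordinary `const fn` calls, they need no special support from the compiler beyond what the functions themselves already have.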
Here follows one version of the proposal I kept mentioning; this particular instance uses associated items for both field lookup and the offset constant:
extern "rust-intrinsic" {
const fn size_of<T>() -> usize;
const fn align_of<T>() -> usize;
}
// if T has a field foo: F, then FieldsOf<T>::foo: Field<Source=T, Type=F>.
enum FieldsOf<T> {}
// Implemented by the compiler on the anonymous types of FieldsOf<T>'s
// artificial fields.
trait Field {
type Source; // T in the example above.
type Type; // F in the example above.
const OFFSET: usize;
}
// Implemented by the compiler on all [T; N] types.
trait Array {
type Element; // T
const LEN: usize; // N
}
macro_rules! sizeof { ($ty:ty) => (intrinsics::size_of::<$ty>()) }
macro_rules! alignof { ($ty:ty) => (intrinsics::align_of::<$ty>()) }
macro_rules! offsetof { ($field:ident in $ty:ty) => (real_offsetof!($ty .$field)) }
macro_rules! fieldof { ($ty:ty .$field:ident) => (intrinsics::FieldsOf<$ty>::$field) }
macro_rules! real_offsetof {
($ty:ty .$field:ident $($rest:tt)*) => (
<fieldof!($ty .$field) as Field>::OFFSET +
real_offsetof!(<fieldof!($ty .$field) as Field>::Type $($rest)*)
);
($ty:ty [$idx:expr] $($rest:tt)*) => (
$idx * sizeof!(<$ty as Array>::Element) +
real_offsetof!(<$ty as Array>::Element $($rest)*)
);
($ty:ty) => (0)
} |
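Since the compiler does not provide `FieldsOf` or `Field`, the chaining idea behind `real_offsetof!` can only be approximated by hand. The sketch below assumes `#[repr(C)]` layout and writes the `Field` impls manually; `Inner`, `Outer`, the marker types `InnerValue` and `OuterInner`, and the helpers `align_up` and `chained_offset` are all hypothetical stand-ins.

```rust
use std::mem::{align_of, size_of};

// Hand-written stand-in for the compiler-implemented trait in the proposal.
pub trait Field {
    type Source;
    type Type;
    const OFFSET: usize;
}

#[repr(C)]
pub struct Inner {
    pub tag: u8,
    pub value: u32,
}

#[repr(C)]
pub struct Outer {
    pub id: u16,
    pub inner: Inner,
}

// Round `offset` up to a multiple of `align` (which must be a power of two).
const fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

// Marker types standing in for `FieldsOf<Inner>::value` and `FieldsOf<Outer>::inner`.
pub enum InnerValue {}
pub enum OuterInner {}

impl Field for InnerValue {
    type Source = Inner;
    type Type = u32;
    // repr(C): the second field follows the first, padded to its own alignment.
    const OFFSET: usize = align_up(size_of::<u8>(), align_of::<u32>());
}

impl Field for OuterInner {
    type Source = Outer;
    type Type = Inner;
    const OFFSET: usize = align_up(size_of::<u16>(), align_of::<Inner>());
}

/// Offset of field `B` inside field `A`'s source, as in `outer.inner.value`.
/// The bound checks that `B` really is a field of `A`'s type.
pub fn chained_offset<A: Field, B: Field<Source = A::Type>>() -> usize {
    A::OFFSET + B::OFFSET
}
```

The `Source = A::Type` bound is what keeps a chain such as `chained_offset::<OuterInner, InnerValue>()` well-typed; swapping the two type arguments would be rejected at compile time.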
wouldn't it make more sense to add compiler-built associated constants for |
@oli-obk That's actually slightly more work (after making the actual values available earlier during compilation), but it's possible to make |
Point taken. So it's more of a question of style, whether the standard library and compiler stuff should prefer associated constants over
trait SizeOf {
const SIZE_OF: usize;
}
impl<T> SizeOf for T {
const SIZE_OF: usize = size_of::<T>();
} |
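For what it is worth, the blanket-impl pattern above can be exercised on current Rust, where `size_of` is a `const fn`. The sketch below repeats the trait so that it is self-contained and adds a few hypothetical usage items (`U32_SIZE`, `U32Bytes`, `size_via_trait`) that are not from the thread.

```rust
use std::mem::size_of;

pub trait SizeOf {
    const SIZE_OF: usize;
}

// Blanket impl: every sized type gets the associated constant.
impl<T> SizeOf for T {
    const SIZE_OF: usize = size_of::<T>();
}

// The constant can be used in constant items and array lengths
// when the type is concrete.
pub const U32_SIZE: usize = <u32 as SizeOf>::SIZE_OF;
pub type U32Bytes = [u8; <u32 as SizeOf>::SIZE_OF];

/// Reads the associated constant for an arbitrary sized type.
pub fn size_via_trait<T>() -> usize {
    <T as SizeOf>::SIZE_OF
}
```

Whether an associated constant or a `const fn` reads better is, as the comment says, largely a question of style; both yield the same value.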
I'm +1 for @eddyb's proposal: I like the idea of having each field in a structure have an anonymous type attached to it. My specific interest is that this concept would bring us closer to type-safe intrusive structure support: it makes it easier to make it a compile-time error to try to obtain the container pointer from the wrong field pointer, if there is an automatic way to enforce that different intrusive fields in a container must be of different types:
struct Container {
intrusive_field!(pub link1: Container.Link),
intrusive_field!(pub link2: Container.Link),
}
where
struct Container {
pub link1: Link<FieldsOf<Container>::link1>,
pub link2: Link<FieldsOf<Container>::link2>,
}
(I know this won't work in anything proposed so far, this would need further design, it's written just to demonstrate an idea.) I know intrusive structures have not been a priority for the language so far (they interact awkwardly at best with Rust's model of enforcing memory safety), but they can be a very useful tool in systems engineering, helping to avoid use of the heap. (Note that, for example, the MISRA standards preclude using dynamic memory allocation, but do allow some forms of intrusive structure.) |
For me macros follow the "principle of least astonishment":
So for all this, +1 for @eddyb |
@eddyb's Array trait is very exciting, I wonder though if it is redundant as soon as we have integer parameters in generics. I'd prefer to avoid macros & avoid keywords if functions suffice. |
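With integer parameters in generics (const generics, as they exist in current Rust), the `Array` trait from the proposal can be written as an ordinary library trait rather than a compiler-implemented one. The following is a minimal sketch under that assumption; the helper `byte_len` is a hypothetical addition.

```rust
// Library-level version of the proposed `Array` trait.
pub trait Array {
    type Element; // T
    const LEN: usize; // N
}

// A single const-generic impl covers every array type `[T; N]`.
impl<T, const N: usize> Array for [T; N] {
    type Element = T;
    const LEN: usize = N;
}

/// Total size in bytes of an array type, computed from the trait alone.
pub fn byte_len<A: Array>() -> usize {
    A::LEN * std::mem::size_of::<A::Element>()
}
```

For example, `<[u8; 16] as Array>::LEN` is 16 and `byte_len::<[u32; 5]>()` is 20.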
@bluss having Macros are only required if you want some kind of chaining |
Hmm. So I agree that @eddyb's approach is potentially very cool. I like that it seems to have the potential for enabling other kinds of "meta-reasoning". I'd like to explore this direction but am a bit nervous about "fumbling" our way into it -- I'd love to have a broader look at the requisite use cases and so forth. Put another way, this is clearly a kind of reflection system for working with types, and I think there's a lot of related work to investigate in that direction (with which I am not all that familiar). But that's all kind of neither here nor there, because what must be decided here is the fate of this RFC. In other words, would we want to decline this RFC's more standard keyword-based approach in order to have time to explore the idea of using traits and so forth. (As a middle ground, it's plausible we could stabilize macros in some form.) |
I realize I left my question unanswered. At the moment, I lean against accepting this RFC, because I think that @eddyb's alternative is worth investigating. |
One thing to bring up is that this kind of type-level reflection may raise parametricity concerns. I don't immediately see any problems, but it's always something to keep in mind with reflection. |
One thing to note is that
// if T has a field foo: F, then FieldsOf::<T>.foo: Field<Source=T, Type=F>.
// if T = [E; N], then FieldsOf::<T>[i]: Field<Source=T, Type=E>.
struct FieldsOf<T>;
// Implemented by the compiler on the anonymous types of FieldsOf<T>'s
// artificial fields.
trait Field {
type Source; // T in the example above.
type Type; // F in the example above.
const OFFSET: usize;
}
macro_rules! offsetof { ($field:ident in $ty:ty) => (real_offsetof!($ty .$field)) }
macro_rules! fieldof {
($ty:ty $($f:tt)+) => (typeof(intrinsics::FieldsOf::<$ty> $($f)+))
}
macro_rules! real_offsetof {
($ty:ty .$field:tt $($rest:tt)*) => (
<fieldof!($ty .$field) as Field>::OFFSET +
real_offsetof!(<fieldof!($ty .$field) as Field>::Type $($rest)*)
);
($ty:ty [$idx:expr] $($rest:tt)*) => (
<fieldof!($ty [$idx]) as Field>::OFFSET +
real_offsetof!(<fieldof!($ty [$idx]) as Field>::Type $($rest)*)
);
($ty:ty) => (0)
}
@nikomatsakis as for parametricity concerns, this would only work on concrete types, in the same contexts where a direct field access on a value of that type would be allowed. A bonus (offtopic) example:
impl<T, F: Field<Source=T>> Index<F> for T {
type Output = F::Type;
fn index(&self, _: F) -> &F::Type {
unsafe { &*((self as *const _ as *const u8).offset(F::OFFSET as isize) as *const F::Type) }
}
}
impl<T, F: Field<Source=T>> IndexMut<F> for T {
fn index_mut(&mut self, _: F) -> &mut F::Type {
unsafe { &mut *((self as *mut _ as *mut u8).offset(F::OFFSET as isize) as *mut F::Type) }
}
} |
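The blanket `Index` impls above would not pass today's coherence rules, but the underlying projection can be sketched as a free function. Everything below is an assumption made for illustration: `Field` is declared as an `unsafe trait` because nothing checks the hand-written `OFFSET`, and `Point`, `PointY` and `project` are hypothetical names.

```rust
// Unsafe to implement: `OFFSET` must be the real byte offset of a field of
// type `Type` inside `Source`, since `project` trusts it blindly.
pub unsafe trait Field {
    type Source;
    type Type;
    const OFFSET: usize;
}

#[repr(C)]
pub struct Point {
    pub x: u32,
    pub y: u32,
}

// Marker type standing in for the proposal's `FieldsOf<Point>::y`.
pub enum PointY {}

// SAFETY: with `repr(C)`, `y` directly follows `x`, with no padding between
// two `u32` fields.
unsafe impl Field for PointY {
    type Source = Point;
    type Type = u32;
    const OFFSET: usize = std::mem::size_of::<u32>();
}

/// Borrow the field described by `F` from a borrowed `F::Source`.
pub fn project<F: Field>(source: &F::Source) -> &F::Type {
    let base = source as *const F::Source as *const u8;
    // SAFETY: the `Field` contract guarantees that `OFFSET` stays inside
    // `source` and that a valid, aligned `F::Type` lives at that address.
    unsafe { &*(base.add(F::OFFSET) as *const F::Type) }
}
```

With a value such as `Point { x: 1, y: 7 }`, calling `project::<PointY>(&p)` yields a reference to the `y` field.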
@nikomatsakis As another middle ground, defer deciding this RFC's fate until @eddyb can put together a more formal proposal. (In other words, abort the "final comment period" process for now, and leave the RFC open.) It seems unfair to close this RFC in favor of something that hasn't been more formally proposed, and it seems inappropriate to accept this RFC while there's a potentially preferred approach identified that we think will deserve consideration. This RFC and @eddyb's proposal deserve an even footing for a serious debate, IMO. |
@aidancully how about splitting away |
@eddyb I agree that
struct Opaque<T> {
aligner: [align_of::<T>; 0],
sizer: [u8; size_of::<T>]
}
since |
It seems clear that while this RFC proposes a reasonable solution to a known shortcoming, there is also a desire to explore @eddyb's proposal more thoroughly before making a final decision. Therefore, I'm going to close this RFC as postponed under issue #1144. Thanks to @vadimcn very much for the submission. |
Summary
Add `sizeof`, `alignof` and `offsetof` operators similar to the ones found in C/C++.
Add a `typeof` operator for finding out the type of an expression or a struct field.