Sizeof, alignof, offsetof, typeof #591

Closed


vadimcn
Contributor

@vadimcn vadimcn commented Jan 18, 2015

Summary

Add sizeof, alignof and offsetof operators similar to the ones found in C/C++.
Add typeof operator for finding out the type of an expression or a struct field.

Rendered

@vadimcn
Contributor Author

vadimcn commented Jan 18, 2015

My attempt at implementing this RFC (implemented sizeof, alignof and offsetof for regular structs)

- Beef up rustc's constant folder, so it can see through \<raw-pointer-deref\>-\<field-select\>-\<get-address-of\> combos. It would then be possible to express `sizeof` and `offsetof` in terms of raw pointer operations. For example: `macro_rules! offsetof(($f:ident in $t:ty) => (unsafe{ &(*(0 as *const $t)).$f as usize }));`
Cons: The implementations of these operators via pointer arithmetic use dubious tricks like dereferencing a null pointer, and may become broken by future improvements in LLVM optimizations. Also, this approach seems to be falling out of fashion even in C/C++ compilers (I guess for the same reason): `alignof` is now a standard operator in C++11; `offsetof` hasn't been standardized yet, but both GCC and clang implement it as a custom built-in.

- Implement a limited form of compile-time function evaluation (CTFE) by hard-coding knowledge of intrinsics such as `size_of<T>()` and `align_of<T>()` into the constant folder.
Member

The compiler has to know about them, as intrinsics.
We already special-case other function calls, like tuple struct and enum variant constructors, so this wouldn't be much of an issue, implementation wise.
The hard part, as with any other form of sizeof at the type level, no matter how it is exposed, is to compute the size of a type before trans even creates the LLVM context.
I toyed with the idea of moving ADT optimizations/representation logic higher up and allow computing sizes/alignments without depending on LLVM at all.
This would also allow us to "lower" some higher level type optimizations into some sort of MIR (Mid-level Intermediate Representation, like the Control Flow Graph that gets mentioned from time to time).
I haven't actually worked on it, but it seems feasible - maybe a little trickier to define platform-specific data layouts and encode the C ABI.
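
As a rough illustration of computing sizes and alignments without a code generator, the sketch below lays out a struct C-style from per-field sizes and alignments. The names `FieldLayout`, `StructLayout`, `round_up` and `layout_repr_c` are hypothetical and do not come from rustc; the platform-specific part is assumed to arrive as input data, and field reordering or enum optimizations are not modelled.

```rust
/// Size and alignment of one field, in bytes (hypothetical helper type).
#[derive(Clone, Copy)]
struct FieldLayout {
    size: usize,
    align: usize,
}

/// Result of laying out a struct: per-field offsets plus total size and alignment.
#[derive(Debug, PartialEq)]
struct StructLayout {
    offsets: Vec<usize>,
    size: usize,
    align: usize,
}

/// Rounds `offset` up to the next multiple of `align` (assumed to be a power of two).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Lays fields out in declaration order, padding each to its own alignment.
fn layout_repr_c(fields: &[FieldLayout]) -> StructLayout {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut cursor = 0;
    let mut align = 1;
    for field in fields {
        cursor = round_up(cursor, field.align);
        offsets.push(cursor);
        cursor += field.size;
        align = align.max(field.align);
    }
    // The total size is padded so that arrays of the struct stay aligned.
    StructLayout { offsets, size: round_up(cursor, align), align }
}

fn main() {
    // Sizes and alignments of u8, u32, u16 as found on common 64-bit targets.
    let fields = [
        FieldLayout { size: 1, align: 1 },
        FieldLayout { size: 4, align: 4 },
        FieldLayout { size: 2, align: 2 },
    ];
    println!("{:?}", layout_repr_c(&fields));
}
```

Because the inputs are plain numbers, a table of per-target sizes and alignments could feed the same routine; encoding the full C ABI would need considerably more than this.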

@nrc nrc self-assigned this Jan 22, 2015
@nikomatsakis
Contributor

So there has been this long-standing debate about whether to add these as keywords or some kind of intrinsics. I think I've decided I prefer the keywords for the following reasons (from least to most important):

  1. It allows us to put off the notion of CTFE a little longer. We'll want to get a story here at some point but it doesn't feel necessary just for these basic concepts.
  2. The keywords have a lot of precedent and feel familiar.
  3. I don't see how offsetof can be done as an intrinsic really, beyond the "cast a null pointer" trick. If we have offsetof as a keyword, then it seems consistent to have sizeof and alignof as keywords. (I believe the RFC made a similar argument.)

That said, I might prefer to separate typeof from the other constructs. It involves potentially deeper changes to the front-end and I'm not sure I want to go there. (For example there are questions about where typeof can be permitted and so forth -- in struct declarations? only in fn bodies?)

I haven't looked at the impl but definitely having working code makes me feel better about this RFC -- mostly because I like the idea but don't think we would have the bandwidth to make it happen before 1.0.

@vadimcn
Contributor Author

vadimcn commented Feb 3, 2015

I've introduced typeof because, for parity with C/C++, sizeof and alignof would have to accept either a type or an expression, which would be a parsing ambiguity in Rust. We could, however, implement them only for types at this time, and defer the typeof part for later.

Regarding implementation, it turns out that my code only covers declaration of statics and consts. You still wouldn't be able to use sizeof as a part of array size expression, unfortunately. I am afraid that @eddyb was correct in that a full implementation will require adding the concepts of platform-specific type layout and alignment into the ADT layer.

@nikomatsakis
Contributor

On Tue, Feb 03, 2015 at 03:21:06PM -0800, Vadim Chugunov wrote:

> I've introduced typeof because, for parity with C/C++, sizeof and alignof would have to accept either a type or an expression, which would be a parsing ambiguity in Rust. We could, however, implement them only for types at this time, and defer the typeof part for later.

I would only support sizeof type, not sizeof expr. And yes we can bring typeof in later.

> Regarding implementation, it turns out that my code only covers
> declaration of statics and consts. You still wouldn't be able to
> use sizeof as a part of array size expression, unfortunately.

Yes, without reading your code, I assumed this was the case. I
consider it an orthogonal issue. In particular, I believe there are other
kinds of integral constant expressions that are not currently legal
in type position. Here is an example program that demonstrates what I mean:

#![allow(dead_code)]

struct Foo { x: usize }

const X: &'static Foo = &Foo { x: 3 };
const Y: usize = X.x; // No error here

struct Bar {
    f: [u32; X.x] // ERROR here
}

fn main() {

}

> I am afraid that @eddyb was correct in that a full implementation
> will require adding the concepts of platform-specific type layout
> and alignment into the ADT layer.

I believe that is one possible solution but not the only one. In
particular we could just consider constant expressions that we can't
evaluate to a fixed integer to be a kind of "abstract number" that is
not necessarily equal to other abstract numbers. This means that we
would know that sizeof(Foo) == sizeof(Foo) but not that sizeof(Foo) == sizeof(Bar), even if Foo and Bar contain the same data
types. We do need to add something like this to support associated
constants anyway. I don't know which I consider a better approach, but
anyway I think it can be left undecided for now.
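
For readers on a current toolchain, the array-length case discussed in this comment can be sketched as follows. This assumes a Rust release in which `std::mem::size_of` is a `const fn`, which was not the case when the thread was written; `FOO_SIZE` and the field types are illustrative only.

```rust
use std::mem::size_of;

struct Foo {
    x: usize,
}

// With `size_of` as a `const fn`, the value is available in a constant...
const FOO_SIZE: usize = size_of::<Foo>();

// ...and, for a concrete type, directly in type position as an array length.
struct Bar {
    f: [u8; size_of::<Foo>()],
}

fn main() {
    let foo = Foo { x: 3 };
    // Both length expressions evaluate to the same integer, so the types agree.
    let bar = Bar { f: [0; FOO_SIZE] };
    println!("{} {}", foo.x, bar.f.len());
}
```

Here both lengths are fully evaluated integers because `Foo` is concrete. The "abstract number" question raised above concerns cases where the compiler cannot reduce the expression, such as a generic `T`.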

|---------------------|------------------|
|`sizeof(T)` |The size of type T.|
|`alignof(T)` |The minimal alignment of type T.|
|`offsetof(F in S)` |Offset of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.|
Member

do we need preferred alignment as well as minimal alignment?

@nrc
Member

nrc commented Feb 9, 2015

@vadimcn it seems the biggest open question here is what to do with typeof. It seems that sizeof etc. should only take a type. I see a few options:

  • remove typeof from this RFC and mention it as future work
  • fully specify typeof here, in particular where it can be used
  • specify a minimal version of typeof here, with future work to allow it in other positions in the future. This minimal version might be only in expression position or only in function bodies, etc.

I suppose the first option is only viable if we are happy to not support sizeof etc. for expressions. That seems to be the state of the implementation - is it useful as it stands?

It seems that the proposed syntax would be:

op ::= sizeof | offsetof | alignof
const_int_expr ::= int_literal | op ( type )

Where const_int_expr can be used anywhere an integer literal can be used today. Is that correct?

This is effectively adding three keywords to the language. I think that is OK, but I'd like to hear others chime in about that (@brson, @huonw, @alexcrichton, @aturon, @pcwalton, @wycats, @steveklabnik, anyone else...).

It would be good to add a staging section to the RFC - discuss what can be implemented first (or that you've already implemented) and what can be left till post-1.0 (typeof/some of the things discussed above). Also, I presume these should be feature gated, you should propose a feature name.

|`sizeof(T)` |The size of type T.|
|`alignof(T)` |The minimal alignment of type T.|
|`offsetof(F in S)` |Offset of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.|
|`typeof(F in S)` |The type of field F in struct S. F may be either an identifier for regular structs or an integer index for tuples and tuple structs.|
Member

I suspect this may stop x in y ever becoming an expression (I don't know if we want to do this at all, but...).

@comex

comex commented Feb 11, 2015

Quick note: In C I often find it useful to do offsetof(x, y.z) or offsetof(x, some_array_field[n]) (where n may or may not be constant), although allowing this is a GNU extension. Not sure how difficult that would be to implement for Rust, but I see no reason to disallow it long term.
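
A minimal sketch of what nested and indexed offsets amount to, written with `MaybeUninit` and `std::ptr::addr_of!` rather than the null-pointer cast. The structs and the helper functions `offset_of_inner_z` and `offset_of_sample` are hypothetical, and `#[repr(C)]` is assumed so that the layout is predictable.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr::addr_of;

#[repr(C)]
struct Inner {
    y: u16,
    z: u32,
}

#[repr(C)]
struct Outer {
    tag: u8,
    inner: Inner,
    samples: [u16; 8],
}

/// Byte offset of `inner.z` within `Outer`, the analogue of `offsetof(Outer, inner.z)`.
fn offset_of_inner_z() -> usize {
    let storage = MaybeUninit::<Outer>::uninit();
    let base = storage.as_ptr();
    // SAFETY: `addr_of!` only computes an address inside `storage`;
    // nothing is read and no reference to uninitialized data is created.
    let field = unsafe { addr_of!((*base).inner.z) };
    (field as usize) - (base as usize)
}

/// Byte offset of `samples[n]` within `Outer`; `n` may be a run-time value.
fn offset_of_sample(n: usize) -> usize {
    let storage = MaybeUninit::<Outer>::uninit();
    let base = storage.as_ptr();
    // SAFETY: as above, only the address of the array field is computed.
    let array = unsafe { addr_of!((*base).samples) };
    // Array elements are contiguous, so the element offset is plain arithmetic.
    (array as usize) - (base as usize) + n * size_of::<u16>()
}

fn main() {
    println!("{} {}", offset_of_inner_z(), offset_of_sample(3));
}
```

These helpers run at run time; making the same values available in constant expressions is the part that needs compiler support.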

@vadimcn
Contributor Author

vadimcn commented Feb 11, 2015

re typeof: I am afraid I do not understand how type inference in rustc works well enough. Can someone experienced in the area recommend the course of action? Would it be easy to implement if we do it for expression positions only?

> I suppose the first option is only viable if we are happy to not support sizeof etc. for expressions. That seems to be the state of the implementation - is it useful as it stands?

IMO, sizeof(expr) is needed way less often than sizeof(type), so that is probably fine.
I am actually more concerned about the inability to use them in type position (e.g. in array size expression). Without that, the only value sizeof and alignof will add on top of existing intrinsics, is the use in static init expressions. On the other hand, offsetof is a feature genuinely missing from the language, so maybe that makes all this worth it?

> This is effectively adding three keywords to the language.

These are already reserved keywords.

> Also, I presume these should be feature gated, you should propose a feature name.

#![feature(size_operators)] ?

@nrc
Member

nrc commented Feb 24, 2015

@vadimcn I would just remove typeof for now, we can deal with it later in a different RFC.

With sizeof, etc. as keywords, is there anything stopping them being used in types?

If you're happy to update the RFC, I'll try and get it approved and merged this week.

@eddyb
Member

eddyb commented Feb 24, 2015

I am against any keyword-based solutions for anything other than typeof.
We can handle size_of::<T>(), align_of::<T>() and OffsetsOf::<T>.field.
Someone on IRC confirmed it would be possible to support all offsetof forms using the last one.
I was hoping this RFC to be updated after that discussion, but I see that it was not.

@huonw
Member

huonw commented Feb 24, 2015

> I was hoping this RFC to be updated after that discussion, but I see that it was not.

Was @vadimcn involved in the discussion? In any case, IRC is transient, providing a summary comment of discussions (at least linking to logs) is a good way to make sure everyone is on board and on the same page.

@vadimcn
Contributor Author

vadimcn commented Feb 24, 2015

@eddyb, IMO, your proposal is sufficiently different from this one to become its own RFC.
Besides, I don't quite understand how offsetof::<T>.field is supposed to work...

@eddyb
Member

eddyb commented Feb 24, 2015

@huonw I will check my logs whenever I get the chance to do so.
@vadimcn a struct OffsetsOf<T>; lang item that allows field access (I guess indexing would be fine, too) based on T's fields, the result being a usize value equal to the offset of that field in T.

@vadimcn
Contributor Author

vadimcn commented Feb 26, 2015

@nick29581: Removed typeof from the main RFC text, as you asked.
Though in light of @eddyb's recent submission, perhaps we should wait till that plays out? Personally, I am not opposed to implementing these as const functions / whatever we'll call them.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 15, 2015
@nikomatsakis
Contributor

Hear ye, hear ye. This RFC is now in final comment period until June 2nd.

@nikomatsakis nikomatsakis added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. label May 26, 2015
@comex

comex commented May 26, 2015

Any thoughts about my suggestion above to allow multiple field/index accesses in offsetof? (i.e. offsetof(bar.baz in Foo) or offsetof(baz[5] in Foo))

Edit: I don't object to this RFC, but now that 1.0 is out I'd love to see CTFE improvements sooner rather than later.


The above syntax looks very similar to regular function calls. Some alternatives:
- Macro-like syntax: `sizeof!(MyStruct); alignof!(f64); offsetof!(field2 in MyStruct)`.
Pros: distinct from regular functions; "non-standard" syntax is par for the course in macros; it's immediately obvious that these are evaluated at compile time.
Member

We have some compiler-builtin macros already, so adding a few new ones would not be too bad IMO. And they can still be expanded to their literal value.

That being said, I think I’d prefer the macro version.

@eddyb
Member

eddyb commented May 27, 2015

@comex with the right field "reflection" primitives, you could do that in a macro. Sadly, a full specification of such primitives may not materialize before June 2nd.

@DanielKeep

Just a thought on the syntax bikeshed: wouldn't introducing these as keywords be breaking backward compatibility? I mean, what if there's code that contains fn sizeof() { ... }? My first thought on seeing the syntax was that they could be surfaced as macros which expand to bog standard "whack a bunch of underscores out the front" magic keywords; i.e.

macro_rules! sizeof { ($t:ty) => (__internal_sizeof($t)) }
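
A minimal sketch of the macro-as-surface-syntax idea, assuming the macros expand to the existing library functions instead of a hypothetical internal keyword such as `__internal_sizeof`. The names `size_of_ty!` and `align_of_ty!` are made up for this example, chosen so that it does not depend on whether `sizeof` is a reserved word on a given toolchain.

```rust
// Hypothetical macro front-ends that forward to the library functions.
macro_rules! size_of_ty {
    ($t:ty) => {
        ::std::mem::size_of::<$t>()
    };
}

macro_rules! align_of_ty {
    ($t:ty) => {
        ::std::mem::align_of::<$t>()
    };
}

// On toolchains where the underlying functions are `const fn`,
// the macros can also be used in constant items.
const WORD: usize = size_of_ty!(u64);

fn main() {
    println!("{} {}", WORD, align_of_ty!(u16));
}
```

User code only ever sees the macro names, so what they expand to can change without affecting callers, which is the point of the suggestion above.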

@sfackler
Member

@DanielKeep see #1122, which covers the addition of new keywords in an opt-in manner.

@huonw
Member

huonw commented May 27, 2015

All of these are reserved words anyway.

@eddyb
Member

eddyb commented May 27, 2015

Here follows one version of the proposal I kept mentioning; this particular instance uses associated items for both field lookup and the offset constant:

extern "rust-intrinsic" {
    const fn size_of<T>() -> usize;
    const fn align_of<T>() -> usize;
}

// if T has a field foo: F, then FieldsOf<T>::foo: Field<Source=T, Type=F>.
enum FieldsOf<T> {}

// Implemented by the compiler on the anonymous types of FieldsOf<T>'s
// artificial fields.
trait Field {
    type Source; // T in the example above.
    type Type; // F in the example above.
    const OFFSET: usize;
}

// Implemented by the compiler on all [T; N] types.
trait Array {
    type Element; // T
    const LEN: usize; // N
}

macro_rules! sizeof { ($ty:ty) => (intrinsics::size_of::<$ty>()) }
macro_rules! alignof { ($ty:ty) => (intrinsics::align_of::<$ty>()) }
macro_rules! offsetof { ($field:ident in $ty:ty) => (real_offsetof!($ty .$field)) }

macro_rules! fieldof { ($ty:ty .$field:ident) => (intrinsics::FieldsOf<$ty>::$field) }
macro_rules! real_offsetof {
    ($ty:ty .$field:ident $($rest:tt)*) => (
         <fieldof!($ty .$field) as Field>::OFFSET +
         real_offsetof!(<fieldof!($ty .$field) as Field>::Type $($rest)*)
    );
    ($ty:ty [$idx:expr] $($rest:tt)*) => (
         $idx * sizeof!(<$ty as Array>::Element) +
         real_offsetof!(<$ty as Array>::Element $($rest)*)
    );
    ($ty:ty) => (0)
}
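
To make the shape of the proposal concrete, here is a hand-written emulation that compiles on current Rust. Nothing in it is generated by the compiler: the marker types `OuterInner` and `InnerY`, the `round_up` helper and the constant `INNER_Y_IN_OUTER` are hypothetical, and the offsets are derived from `#[repr(C)]` layout rules instead of being supplied by an intrinsic.

```rust
use std::mem::{align_of, size_of};

/// Mirrors the `Field` trait from the proposal; implemented by hand here.
trait Field {
    type Source;
    type Type;
    const OFFSET: usize;
}

#[repr(C)]
struct Inner {
    x: u16,
    y: u64,
}

#[repr(C)]
struct Outer {
    a: u8,
    inner: Inner,
}

/// Rounds `offset` up to a multiple of `align`.
const fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

// Hypothetical marker types standing in for the compiler-generated field types.
struct OuterInner;
struct InnerY;

impl Field for OuterInner {
    type Source = Outer;
    type Type = Inner;
    // `inner` follows `a: u8`, padded to `Inner`'s alignment (holds for `repr(C)` only).
    const OFFSET: usize = round_up(size_of::<u8>(), align_of::<Inner>());
}

impl Field for InnerY {
    type Source = Inner;
    type Type = u64;
    // `y` follows `x: u16`, padded to the alignment of `u64`.
    const OFFSET: usize = round_up(size_of::<u16>(), align_of::<u64>());
}

/// Offset of `inner.y` in `Outer`: the sum a chained `real_offsetof!` would build.
const INNER_Y_IN_OUTER: usize =
    <OuterInner as Field>::OFFSET + <InnerY as Field>::OFFSET;

fn main() {
    let value = Outer { a: 1, inner: Inner { x: 2, y: 3 } };
    println!("{} {} {}", value.a, value.inner.x, INNER_Y_IN_OUTER);
}
```

The chaining works because each `Field::Type` is the `Source` of the next step, which is what lets the macro recurse over a path such as `.inner.y`.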

@oli-obk
Contributor

oli-obk commented May 27, 2015

wouldn't it make more sense to add compiler-built associated constants for size_of and align_of to all concrete types?

@eddyb
Member

eddyb commented May 27, 2015

@oli-obk That's actually slightly more work (after making the actual values available earlier during compilation), but it's possible to make size_of::<T>() use <T as Sized>::SIZE_OF or the other way around, it's just an implementation choice.

@oli-obk
Contributor

oli-obk commented May 27, 2015

point taken. So it's more of a question of style, whether the standard library and compiler stuff should prefer associated constants over const fn when the const fn has no arguments.

trait SizeOf {
    const SIZE_OF: usize;
}
impl<T> SizeOf for T {
    const SIZE_OF: usize = size_of::<T>();
}
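
The trait above can be tried on a current toolchain, assuming `std::mem::size_of` is a `const fn` and associated constants are available. The `Buffer` struct is a hypothetical addition showing the constant used in type position for a concrete type.

```rust
use std::mem::size_of;

trait SizeOf {
    const SIZE_OF: usize;
}

// Blanket impl; `T` carries the implicit `Sized` bound.
impl<T> SizeOf for T {
    const SIZE_OF: usize = size_of::<T>();
}

// For a concrete type the associated constant works as an array length.
struct Buffer {
    bytes: [u8; <u64 as SizeOf>::SIZE_OF],
}

fn main() {
    let buffer = Buffer { bytes: [0; 8] };
    println!("{} {}", buffer.bytes.len(), <(u8, u8) as SizeOf>::SIZE_OF);
}
```

Either spelling yields the same value, which fits the observation that choosing between an associated constant and an argument-less `const fn` is mostly a matter of style.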

@aidancully

I'm +1 for @eddyb's proposal: I like the idea of having each field in a structure have an anonymous type attached to it. My specific interest is that this concept would bring us closer to type-safe intrusive structure support: it makes it easier to make it a compile-time error to try to obtain the container pointer from the wrong field pointer, if there is an automatic way to enforce that different intrusive fields in a container must be of different types:

struct Container {
  intrusive_field!(pub link1: Container.Link),
  intrusive_field!(pub link2: Container.Link),
}

where intrusive_field! would expand to something like:

struct Container {
  pub link1: Link<FieldsOf<Container>::link1>,
  pub link2: Link<FieldsOf<Container>::link2>,
}

(I know this won't work in anything proposed so far, this would need further design, it's written just to demonstrate an idea.) I know intrusive structures have not been a priority for the language so far (they interact awkwardly at best with Rust's model of enforcing memory safety), but they can be a very useful tool in systems engineering, helping to avoid use of the heap. (Note that, for example, the MISRA standards preclude using dynamic memory allocation, but do allow some forms of intrusive structure.)

@tomaka

tomaka commented Jun 1, 2015

For me macros follow the "principle of least astonishment":

  • As written in the RFC the keyword(param) syntax makes it look similar to a function. Except that it has several differences with regular functions, which makes it confusing.
  • It would introduce a new syntax, which is ident(Type). On the other hand, ident!(Type) already exists and is widespread.
  • For something that expands at compile-time it would be preferable to look like a macro and not like a function, as currently the only things that expand at compile-time are macros.
  • There are already macros in the stdlib whose implementations are hidden in the compiler's internals, like line! or include!, so that's not a problem.
  • When CTFE is introduced, people will ask why the sizeof(T) and alignof(T) syntax exist at all while you will just be able to use mem::size_of::<T>() and mem::align_of::<T>() for the same result.
  • More keywords means a more complex language, which is always bad. Using macros avoids this.

So for all this, +1 for @eddyb

@bluss
Member

bluss commented Jun 1, 2015

@eddyb's Array trait is very exciting, I wonder though if it is redundant as soon as we have integer parameters in generics.

I'd prefer to avoid macros & avoid keywords if functions suffice. size_of::<T>() should be enough with CTFE.

@eddyb
Member

eddyb commented Jun 1, 2015

@bluss having size_of::<T>() work in constants is kind of a pre-requisite for this.
Or the simplest implementation strategy, in any case.

Macros are only required if you want some kind of chaining offsetof syntax (like what real_offsetof! in my example provides), size_of and align_of can be const fn or associated constants in the Sized trait, but most likely both.

@nikomatsakis
Contributor

Hmm. So I agree that @eddyb's approach is potentially very cool. I like that it seems to have the potential for enabling other kinds of "meta-reasoning". I'd like to explore this direction but am a bit nervous about "fumbling" our way into it -- I'd love to have a broader look at the requisite use cases and so forth. Put another way, this is clearly a kind of reflection system for working with types, and I think there's a lot of related work to investigate in that direction (with which I am not all that familiar).

But that's all kind of neither here nor there, because what must be decided here is the fate of this RFC. In other words, would we want to decline this RFC's more standard keyword-based approach in order to have time to explore the idea of using traits and so forth. (As a middle ground, it's plausible we could stabilize macros in some form.)

@nikomatsakis
Contributor

I realize I left my question unanswered. At the moment, I lean against accepting this RFC, because I think that @eddyb's alternative is worth investigating.

@nikomatsakis
Contributor

One thing to bring up is that this kind of type-level reflection may raise parametricity concerns. I don't immediately see any problems, but it's always something to keep in mind with reflection.

@eddyb
Member

eddyb commented Jun 1, 2015

One thing to note is that typeof would enable more approaches, for example here is one supporting unnamed fields and indexes directly:

// if T has a field foo: F, then FieldsOf::<T>.foo: Field<Source=T, Type=F>.
// if T = [E; N], then FieldsOf::<T>[i]: Field<Source=T, Type=E>.
struct FieldsOf<T>;

// Implemented by the compiler on the anonymous types of FieldsOf<T>'s
// artificial fields.
trait Field {
    type Source; // T in the example above.
    type Type; // F in the example above.
    const OFFSET: usize;
}

macro_rules! offsetof { ($field:ident in $ty:ty) => (real_offsetof!($ty .$field)) }

macro_rules! fieldof {
    ($ty:ty $($f:tt)+) => (typeof(intrinsics::FieldsOf::<$ty> $($f)+))
}
macro_rules! real_offsetof {
    ($ty:ty .$field:tt $($rest:tt)*) => (
         <fieldof!($ty .$field) as Field>::OFFSET +
         real_offsetof!(<fieldof!($ty .$field) as Field>::Type $($rest)*)
    );
    ($ty:ty [$idx:expr] $($rest:tt)*) => (
         <fieldof!($ty [$idx]) as Field>::OFFSET +
         real_offsetof!(<fieldof!($ty [$idx]) as Field>::Type $($rest)*)
    );
    ($ty:ty) => (0)
}

@nikomatsakis as for parametricity concerns, this would only work on concrete types, in the same contexts where a direct field access on a value of that type would be allowed.
It can be extended to a HasField trait, allowing a form of row polymorphism.
Or it could be used in conjunction with associated constants/types in traits to get associated fields.

A bonus (offtopic) example:

impl<T, F: Field<Source=T>> Index<F> for T {
    type Output = F::Type;
    fn index(&self, _: F) -> &F::Type {
        // Dereferencing a raw pointer needs `unsafe`; `offset` takes an `isize`.
        unsafe { &*((self as *const _ as *const u8).offset(F::OFFSET as isize) as *const F::Type) }
    }
}
impl<T, F: Field<Source=T>> IndexMut<F> for T {
    fn index_mut(&mut self, _: F) -> &mut F::Type {
        // Same projection as above, through a mutable pointer.
        unsafe { &mut *((self as *mut _ as *mut u8).offset(F::OFFSET as isize) as *mut F::Type) }
    }
}
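
The blanket `Index` impls above would not be accepted in ordinary user code because of the coherence rules, so the sketch below uses a free function to show the same offset-based projection. The `Field` impl is written by hand, and `Packet`, `PacketLength` and `project` are hypothetical names; a `#[repr(C)]` layout is assumed.

```rust
/// Hand-written stand-in for the proposed compiler-implemented trait.
trait Field {
    type Source;
    type Type;
    const OFFSET: usize;
}

#[repr(C)]
struct Packet {
    kind: u8,
    length: u32,
}

// Hypothetical marker type describing the field `Packet::length`.
struct PacketLength;

impl Field for PacketLength {
    type Source = Packet;
    type Type = u32;
    // With `repr(C)`, `length` sits at the first multiple of its alignment after `kind`.
    const OFFSET: usize = std::mem::align_of::<u32>();
}

/// Projects a reference to the container onto the field described by `F`.
fn project<F: Field>(source: &F::Source) -> &F::Type {
    let base = source as *const F::Source as *const u8;
    // SAFETY: sound only if `F::OFFSET` really is the offset of an `F::Type`
    // field inside `F::Source`; the hand-written impl above must uphold that.
    unsafe { &*(base.add(F::OFFSET) as *const F::Type) }
}

fn main() {
    let packet = Packet { kind: 1, length: 1500 };
    println!("{} {}", packet.kind, project::<PacketLength>(&packet));
}
```

In the proposal the compiler would guarantee that `OFFSET` and `Type` match the real field, which is what would allow a safe wrapper around this unsafe projection.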

@aidancully

@nikomatsakis As another middle ground, defer deciding this RFC's fate until @eddyb can put together a more formal proposal. (In other words, abort the "final comment period" process for now, and leave the RFC open.) It seems unfair to close this RFC in favor of something that hasn't been more formally proposed, and it seems inappropriate to accept this RFC while there's a potentially preferred approach identified that we think will deserve consideration. This RFC and @eddyb's proposal deserve an even footing for a serious debate, IMO.

@eddyb
Member

eddyb commented Jun 1, 2015

@aidancully how about splitting away offsetof and implementing just size_of and align_of for now? (presumably by making the intrinsics const fn).

@aidancully

@eddyb I agree that size_of and align_of should be const fn, but it doesn't seem that that's what this RFC suggests? For what it's worth, I'm 👎 on this RFC unless the motivation section can be expanded. In particular, I'd like to see some discussion on how a compile-time size_of and align_of could be used in Rust. (For example, I can imagine these concepts being useful to define an Opaque<T> type, which would be guaranteed to be transmutable to / from T, but I don't yet see how they would be applied. You can't do this:

struct Opaque<T> {
  aligner: [align_of::<T>; 0],
  sizer: [u8; size_of::<T>]
}

since align_of returns a usize, rather than a type.)
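
One way to get storage with the size and alignment of `T` on a current toolchain swaps in a different technique: instead of spelling out `size_of` and `align_of` in array types, wrap `std::mem::MaybeUninit<T>`, which is documented to have the same size and alignment as `T`. This is not what the RFC proposes, and the `new` and `into_inner` methods are hypothetical.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Storage with the size and alignment of `T` that does not expose `T`'s fields.
#[repr(transparent)]
struct Opaque<T>(MaybeUninit<T>);

impl<T> Opaque<T> {
    fn new(value: T) -> Self {
        Opaque(MaybeUninit::new(value))
    }

    /// Recovers the value. Dropping an `Opaque<T>` without calling this
    /// does not run `T`'s destructor.
    fn into_inner(self) -> T {
        // SAFETY: the only constructor, `new`, always initializes the storage.
        unsafe { self.0.assume_init() }
    }
}

fn main() {
    let sizes = (size_of::<Opaque<u64>>(), align_of::<Opaque<u64>>());
    let value = Opaque::new(42u64).into_inner();
    println!("{:?} {}", sizes, value);
}
```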

@nikomatsakis
Contributor

It seems clear that while this RFC proposes a reasonable solution to a known shortcoming, there is also a desire to explore @eddyb's proposal more thoroughly before making a final decision. Therefore, I'm going to close this RFC as postponed under issue #1144. Thanks to @vadimcn very much for the submission.
