
Zero Page Optimization #2400

Closed
wants to merge 8 commits into from

Conversation

ishitatsuyuki
Contributor

@ishitatsuyuki ishitatsuyuki commented Apr 12, 2018

@skade

This comment has been minimized.

@mark-i-m
Member

We will add a target-specific constant to determine the availability and size of the zero page.

How is this constant set? Is it a language item? Is it usable on stable? Is it set in the linker config script? Does it default to just the null pointer?

I'm ok with the principle behind this RFC, but for any embedded/bare metal/kernel development, this needs to be configurable on stable. Otherwise, it will be impossible to write things like bootloaders or microcontrollers on stable rust in some cases, since it is often necessary to use the lower bytes in (for example) 8- or 16-bit modes.

Also, I think this should be independent of page size entirely. For mainstream OSes like Linux, the "null range" happens to be the first page because most MMUs cannot enforce finer-grain controls, but there is no fundamental reason why the compiler should be tied to the same constraint.


@hanna-kruppe hanna-kruppe left a comment


Before I say anything on whether Rust ought to do this, I want to register my confusion about the claim that the zero page is already effectively assumed to exist by today's Rust.


Inside Rust std, we rely on the assumption that zero page exists:

https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238


I don't understand how this link supports the claim that std assumes a zero page. It links to ZST allocation, but ZST allocation can hand out whatever pointers it wants. It could return something as ridiculous as the address of main, if that is suitably aligned!

And besides, alignment can be much larger than the page size, so the linked line can create pointers not on the zero page.
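To illustrate that point, here is a quick sketch (a hypothetical over-aligned type, using today's stable `NonNull::dangling`, which hands out a well-aligned non-null address):

```rust
use std::mem::align_of;
use std::ptr::NonNull;

// Hypothetical type whose alignment exceeds a 4 KiB page.
#[repr(align(8192))]
struct PageAligned([u8; 8192]);

fn main() {
    // A dangling pointer is well-aligned and non-null; with 8192-byte
    // alignment the smallest such address is 8192, past the first page.
    let addr = NonNull::<PageAligned>::dangling().as_ptr() as usize;
    assert_ne!(addr, 0);
    assert_eq!(addr % align_of::<PageAligned>(), 0);
    assert!(addr > 4095); // not on the "zero page"
}
```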

To make things worse, such usage is also seen outside std, on crates that compile
on stable Rust:

https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169


This code, too, doesn't seem to me like it assumes that a zero page exists. It stores an address in an AtomicUsize and assumes 1 can't be such an address, but that assumption is true because of alignment (it's storing the address of a Waker, which contains a pointer[1]), not because of anything about the zero page.

[1] it is theoretically conceivable to have a platform where pointers are just one byte or where pointers can be unaligned, but Rust doesn't support any such targets, and even if it did futures-util could simply add #[repr(align(2))] to Waker.

Contributor Author


It's true that we can use a few bits for embedding data when alignment is involved. However, we can't call the current BiLock code sound; as you mentioned, it assumes pointers are aligned, without any comment indicating that.

always true. For instance, microcontrollers without an MMU don't implement such
guards at all, and `0` is a valid address where the entry point lies. See
[Cortex-M4](https://developer.arm.com/docs/ddi0439/latest/programmers-model/system-address-map)'s
design as one such example.
Contributor


It is true that address 0 is valid in Cortex-M and that you can't validly create a Rust reference &T to it, but it's not like arbitrary data can end up there by chance. That address is reserved for some early boot detail that most applications don't deal with directly. In the cortex-m-rt crate there is not even a corresponding Rust item; it is dealt with entirely in the linker script.

So I don’t think there is a problem here in practice.

Member


@SimonSapin Are you suggesting that access to 0 should be strictly unsafe?

Contributor


No, I’m only saying that the ARM Cortex case is not really relevant to the "Rust makes bad assumptions" argument. But then what do you mean by "access to 0"?

Member


i.e. do I have to use *mut _ or *const _ if I want to access part of the "null range"?

@Centril Centril added the T-compiler Relevant to the compiler team, which will review and decide on the RFC. label Apr 12, 2018
@joshtriplett
Member

Personally, I would like to see the possibility of this optimization (automatically hiding enum variants or small values in the low bits of a pointer), but we also need to make sure people don't rely on non-portable assumptions.

targeted at people dealing with FFI or unsafe.

The recently stabilized `NonNull` type will have more strict requirements:
the pointer must be not in the null page, and it must be valid to dereference.

Can you clarify "valid to dereference?" Surely it is not meant that the pointer must point to valid data, as dangling is also stable...

Contributor


Right, this part of the RFC sounds incorrect. ptr::NonNull is a pointer that is not null. It makes no guarantee beyond that.

Contributor Author

@ishitatsuyuki ishitatsuyuki Apr 12, 2018


Seems it needs to be changed to have a new type for this, then, as this conflicts with how NonNull works currently. The new type would have similar semantics to a reference, where it always points to valid data.


I understand this might be a consequence of terribly unfortunate timing, but I think it is wise to tread carefully given how recently NonNull was stabilized, and how it was clearly intended to be the defacto way to receive Rust's zero-discriminant optimizations. In particular, any of the following:

  • introducing a replacement for NonNull<T>
  • removing the promise that Option<NonNull<T>> is the same size as NonNull<T>
  • deprecating NonNull<T>

so early after its stabilization will send a poor message about what it means for something to become stable in Rust.

@comex

comex commented Apr 12, 2018

On embedded systems where 0 is a valid address, 0 itself is usually some form of interrupt vector or entry point, unlikely to be a valid code or data pointer. But the same can’t be said for the entire first page – indeed, you can get tiny ARM microcontrollers with as little as 4KB of flash total, and it’s mapped at 0! So we’d have to make sure to turn this off for (even potentially) freestanding targets.

On the other end, 64-bit macOS and iOS by default reserve a whole 4GB of memory starting at 0.

@ishitatsuyuki
Contributor Author

The code mentioned here is not strictly sound, but in practice no way to exploit such unsoundness exists. This RFC is just proposing a better way to present those enumerations; I'll update the wording.

@nagisa
Member

nagisa commented Apr 13, 2018 via email

can exploit this for ~12 bits of storage for secondary variants.

[Inside Rust std](https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238),
we use a "dangling" pointer for ZST allocations; this involves a somewhat


I still don't understand how this relates at all to the motivation for this RFC.


The recently stabilized `NonNull` type will have more strict requirements:
the pointer must be not in the null page. `NonNull::dangling` will be
deprecated in favor of this optimization.


If it's deprecated, what's the replacement?

Contributor Author


In favour of the zero page optimization. That is, using an enumeration instead.


@hanna-kruppe hanna-kruppe Apr 14, 2018


I don't understand. You propose that the current way to get a NonNull that is non-null and aligned is deprecated. What non-deprecated thing can current users of that method do instead to get a NonNull with the same properties, one that is similarly valid under the new invariant?

(Leaving aside the question of whether it's OK to change the meaning of NonZero like this after stabilization.)

Contributor Author


The issue is that NonNull::dangling() seems to be just a hack where Option<NonNull<T>> should be used. NonNull::dangling() encourages less idiomatic coding, and Option<NonNull<T>> should be a perfect fit as a replacement.


That is a big claim that requires a fair bit of support given that the API was accepted and stabilized.

Furthermore, using Option is not equivalent to using a dangling pointer since it "uses up" the null value: e.g. Vec<T> contains a NonNull<T> and this makes Option<Vec<T>> the same size as Vec<T>, if it used Option<NonNull<T>> instead, Option<Vec<T>> would be bigger.
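This size relationship can be checked on stable Rust today (a quick sketch, not from the thread):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // Option<NonNull<T>> is guaranteed pointer-sized: None is represented
    // by the null value, which NonNull itself can never hold.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());
    // Because Vec keeps a NonNull internally, the same niche makes
    // Option<Vec<T>> free. If Vec stored Option<NonNull<T>> instead, the
    // niche would already be spent and Option<Vec<T>> would need a tag.
    assert_eq!(size_of::<Option<Vec<u8>>>(), size_of::<Vec<u8>>());
}
```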

Contributor


@ishitatsuyuki Sorry, I don’t see how NonNull::dangling is related to Option<NonNull<_>> at all. dangling is for creating an arbitrary pointer that is correctly aligned without being null. It is used for zero-size allocations, for example in Vec: https://github.com/rust-lang/rust/blob/fb730d75d4c1c05c90419841758300b6fbf01250/src/liballoc/raw_vec.rs#L93
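A minimal sketch of that pattern (a hypothetical `RawBuf`, not the real `RawVec`):

```rust
use std::ptr::NonNull;

// Hypothetical simplified buffer: the zero-capacity state hands out
// NonNull::dangling() instead of asking the allocator for memory.
struct RawBuf<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> RawBuf<T> {
    fn new() -> Self {
        // No allocation yet: a well-aligned, non-null dangling pointer
        // stands in, and cap == 0 ensures it is never dereferenced.
        RawBuf { ptr: NonNull::dangling(), cap: 0 }
    }
}

fn main() {
    let buf: RawBuf<u64> = RawBuf::new();
    assert_eq!(buf.cap, 0);
    assert!(!buf.ptr.as_ptr().is_null());
}
```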

Contributor Author


@rkruppe A code search only showed usage for "optional allocation", either for a ZST or for an absent node in a linked data structure. Also, the original intent of this addition seems to be "we need this to interact with the allocator": rust-lang/rust#45527

@SimonSapin Using NonNull::dangling is a convention inside the alloc-related functions, but it's not expressed through types. Using an enum makes it less error-prone, catching the cases where we may pass a dangling pointer to the underlying allocator.


@ExpHP ExpHP Apr 14, 2018


@ishitatsuyuki If I understand correctly, you are saying that if the following occurred:

  • Vec<T> instead stored Option<NonNull<T>>
  • NonNull<T> was changed to forbid pointers in the null page

Then Option<Vec<T>> could still receive optimization? If this is the case, it might help to demonstrate this explicitly.

That said, I think part of the concern here is that there are places where Vec<T> benefits specifically from the fact that dangling() is aligned. e.g., a slice can be constructed directly from the pointer without having to branch on None. ISTM that would be impossible when using Option<NonNull<T>> as it must remain possible to take a reference to the option.

Edit: Or wait... maybe it is possible. The pointer for Some(vec![]) would be null, and the representation of None::<Vec<T>> would begin with 1 where the Option<NonNull<T>> is stored. Hm...

Edit 2: but then what about Vec<Option<T>>? We end up with an Option<NonNull<Option<T>>> whose None representation is 1, which is not aligned when interpreted as a pointer. Or something like that. My brain hurts.
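For reference, what rustc does today can be checked on stable: only the single null value of `&T` is used as a niche, so one `Option` layer is free and a second is not.

```rust
use std::mem::size_of;

fn main() {
    // The null-pointer optimization absorbs one Option layer...
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    // ...but a second layer needs its own discriminant, since no address
    // other than null is currently treated as invalid for &u8.
    assert!(size_of::<Option<Option<&u8>>>() > size_of::<&u8>());
}
```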

@hanna-kruppe

The motivation for this RFC currently seems weak to me. None of the code cited in the motivation section actually needs a zero page reserved to be valid (if it even relates to low addresses being valid or not). More enum layout optimizations become possible (though non-portably), but how commonly will that apply?

Consider that the compiler is already allowed to exploit alignment to get a few extra values for discriminants (this is not currently implemented but it's been discussed a lot), so it would only kick in if you have both a lot of field-less variants and a pointer to a type of low alignment AFAICT.
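Concretely, the current state is observable on stable rustc (hypothetical enums for illustration): the null niche absorbs one field-less variant, but with two such variants the alignment-based niches are not exploited and the enum grows.

```rust
use std::mem::size_of;

// One field-less variant fits in the null niche of the reference.
enum OneSpare { Ptr(&'static u16), A }
// A second would need e.g. address 1 (invalid by alignment) as a niche,
// which rustc does not currently use, so this enum gets a separate tag.
enum TwoSpare { Ptr(&'static u16), A, B }

fn main() {
    assert_eq!(size_of::<OneSpare>(), size_of::<&u16>());
    assert!(size_of::<TwoSpare>() > size_of::<&u16>());
}
```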

Combine that with the very real concerns of breaking embedded use cases (or at least being incompatible with them, which would mean small microcontrollers that could get the most value out of saving some memory don't get those optimizations) and the other smaller concerns around NonZero already mentioned and the prospect of exploiting the zero page is not very appealing.

@joshtriplett
Member

Personally, I'd like to see ways for Rust to more naturally and automatically handle things that C code often does manually, which includes stunts like reusing the low bits of aligned pointers, or knowing that valid pointers can never point into a particular range. I'd much rather have those things handled automatically and consistently by the compiler.

I also think, ideally, that we should not require explicit declaration of valid pointer ranges within individual structures containing pointers, nor require every use of a pointer type to include such a declaration. Having some special kind of pointer that excludes the first 4k or 1M or similar would require changing substantial amounts of code to take advantage of an optimization like this. We don't have any similar requirement to take advantage of the null-pointer optimization for things like Option<&T>.

I don't, however, think we should do this so aggressively that we break embedded use cases. I'd like to see people able to write Rust code for platforms where you can have valid data at addresses 4 and 8. And even, with enough care, valid data at address 0, though I don't mind if that requires some special accessors known to not mind null pointers.

Given that, I have a question for the people currently objecting to this RFC: would your objections be fully addressed by a feature that was under the full control of the person invoking the compiler, such as via command-line options or optimization options (that would affect the Rust ABI) to specify the range of invalid pointers? (With some careful target-specific defaults, such as for x86 Linux or x86 Windows versus x86 ELF.)

That shouldn't break any use case, embedded or otherwise. People creating a new target can determine the correct default values, with the default default being "just the null pointer". People using an embedded target should find that this optimization doesn't apply unless they specifically enable it. And people targeting a platform like x86 Linux but wanting to write code that uses pointers near or at 0 (requiring a change to /proc/sys/vm/mmap_min_addr for instance) could disable this optimization easily enough.

@mark-i-m
Member

@joshtriplett I think that would be good. It's a bit annoying that this information is often already in linker scripts and configs, though...

@joshtriplett
Member

@mark-i-m For embedded applications, kernels, and similar, yes. Standard applications, on the other hand, don't typically have such linker scripts or configs.

@nox
Contributor

nox commented Apr 14, 2018

Personally, I would like to see the possibility of this optimization (automatically hiding enum variants or small values in the low bits of a pointer), but we also need to make sure people don't rely on non-portable assumptions.

I have code for this, somewhat, where I use the alignment of T in &T to teach rustc that 1, 2 and 3 will never be a valid representation of &usize, for example. That breaks transmute (and other things) because of how layout is computed for generic types currently. I plan to work on that later this year, but it's definitely not an easy task. I can write some more about this and link to IRC discussions with people (where people is "just Eddy" as you could have guessed) if there is interest about this.

🥖 @rust-lang/wg-codegen

@Manishearth
Member

cc @ticki @steveklabnik @phil-opp @SergioBenitez

(adding folks doing OS work in Rust)

@Manishearth
Member

During the migration, we should measure the impact with a crater run. If changing the behavior directly is unacceptable, then we'll have to create a new type instead.

I'm iffy on this; a lot of the problems that could be caused by this may not be immediately obvious in a crater run. Especially for FFI-using applications that don't get tested via crater, since crater doesn't know how to build them (even if they're on crates.io).

I'd rather just do a new type period.

To take advantage of zero page optimization, use transmute from and to usize. This will cause compilation to fail if such optimization is not permitted on the target.

Not applicable: Null pointer optimization is Rust specific, and this enhancement is Rust specific too.

This optimization is pretty common in C++ codebases, manually done.
This is a hack. We could use target_feature for this, probably.

@oli-obk
Contributor

oli-obk commented Apr 16, 2018

I'd rather just do a new type period.

A new "optimized" reference type could have many cool benefits, not just this:

  • reuse bits that are zero due to alignment for storing discriminants
  • references to ZSTs are ZSTs
  • references to uninhabited types are uninhabited

wrt embedded needing pointers to low integer addresses:

The zero page size could be target-specific information, just like pointer sizes or endianness

Introduce a new type that can benefit from more optimizations.
@ishitatsuyuki
Contributor Author

Based on what @oli-obk suggested, I've revamped this RFC. Basically, this now also acts as groundwork toward more optimizations we can do in the future.

of an enumeration in a way similar to before, except that we will allow
discriminants of up to the zero page size (typically 4095).
- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
the inner raw pointer. `0` is a good candidate here because we don't actually
Contributor


ZST pointer addresses are their alignment, not 0.

Contributor Author


Are you saying that because we used to assign such a value? I think we no longer have to do that complicated thing; 0 makes the logic simpler.

Contributor


Well.. I would have assumed that we should separate the actual value from the memory representation. While the representation is (), the value should still be meaningful (not 0, because that has the "invalid" meaning for pointers)

- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
the inner raw pointer. `0` is a good candidate here because we don't actually
store it, we don't have to worry about it conflicting with the optimization.
- These types will be inhabitable if `T` is inhabitable.
Contributor


That can only be done for Shared; the other types can't have these optimizations, as that would break code.

Contributor Author


Can you elaborate on how this can break code?

Contributor


These things have been discussed in detail in #2040

We should refactor the allocation related code to prefer enumerations over
`NonNull::dangling`. Taking `RawVec` code as an example, we would use
`Option<Shared<T>>` to store the internal pointer. For ZST, we initialize
with an arbitrary value (as we don't store it); for zero-length vector, we make
Contributor


see the comment above about zst pointers

@CAD97

CAD97 commented Apr 18, 2018

Isn't it impossible to have Option<Option<&_>> be flat? Since Option::as_ref exists, there needs to exist a valid Option<&_> to be pointed at.

As I understand how niche filling works today, enum discriminants can be snuggled into padding, because the contents of those bits are undefined when you just have the T.

I guess since Option+&_ has magic on it already for the null pointer optimization, it might be possible to change Option::<&_>::is_some from (ptr) != 0 to (ptr) > ZERO_PAGE, and then you could say Option::<...<Option<&_>...>::is_some is (ptr) > (1 << nesting level), effectively treating the end of a pointer into the zero page as padding which can be filled as a niche.

Never mind me, I convinced myself that this is (probably) sound. Just treat the least significant bits of a pointer as a niche that can be filled until setting that bit might make it a valid pointer.

It needs to be noted, though: there's a lot of 1 as {ptr} out there. This'll probably open all of those up to soundness holes, even though it intends to make them unnecessary.

@Manishearth
Member

@CAD97 fwiw your comment seems to assume this is specifically implemented in Option's code; it's not, it's a generic optimization. is_some is just a match.

Anyway, this is exactly what the proposal is talking about. The double option isn't a problem because the invalid states only occur when there's a None; i.e. when there's nothing to point at.

@CAD97

CAD97 commented Apr 18, 2018

I was just talking about is_some to make it easier to talk about, it's of course a match. Though the check is emitted at some level. Obviously though my understanding was a bit off, and now it's a bit less off.

Is there a concrete reason ptr::NonNull doesn't just get the added guarantee of more niche space for the optimizer to work with? ptr::NonNull::dangling could then be the first aligned pointer after the niche space. Though I suppose ptr::NonNull::new_unchecked only requires the argument is non-zero, so I've answered my own question.

Definitely extending the niche space on &_ is useful, though, as that is fully under Rust's control.

I'm going to go back to not pretending I know how pointers work now

@nox
Contributor

nox commented Apr 18, 2018

@CAD97 As I mentioned earlier, using alignment to increase the niche space of &_ is not as trivial as it seems.

@Amanieu
Member

Amanieu commented Apr 18, 2018

I don't like the idea of crates being able to change the "null range" through an attribute. This is an ABI-breaking change and should only be configurable at the target level using a field in the target json file.

@oli-obk
Contributor

oli-obk commented Apr 18, 2018

One thing about microcontrollers is that they tend to have little memory. So on embedded we could use the high addresses instead of the low ones. This would of course not be target specific but specific to the actual physical controller you are targeting.

@ishitatsuyuki
Contributor Author

So @Amanieu said that the null range shouldn't be changed by crates. I also noticed that altering the size via an attribute only affects that crate, which isn't the thing we want to do on microcontrollers. @mark-i-m Can you elaborate on what options we have for stable microcontroller runtimes? Or, is it bad to hard-code the value per target inside rustc?


@oli-obk Yeah, using the high address is good, except it breaks the None is NULL convention. Do you think we should make this breaking change, or on the other hand use 0 plus a high range for the optimization?


Also, it seems that ZST references/pointers have their own troubles. I'm going to remove them from this RFC for the meanwhile (this can be discussed in a further RFC).

@mark-i-m
Member

@oli-obk

This would of course not be target specific but specific to the actual physical controller you are targeting.

Sorry, I didn't quite understand this. "specific to the actual physical controller you are targeting" isn't "target specific"?

@ishitatsuyuki

@mark-i-m Can you elaborate on what options we have for stable microcontroller runtimes? Or, is it bad to hard-code the value per target inside rustc?

TBH, my knowledge of pure embedded systems specifically is limited, but in terms of other bare-metal software (e.g. OS kernels), it seems that the "null range" is a property of the target platform itself. For example, an OS kernel could choose where it wants to place the "null range" in virtual address spaces (currently most choose the first page, such as 0-4096 on x86).

At first glance, the target .json file seems like the ideal place for that:

{
    "llvm-target": "i686-unknown-none-gnu",
    "data-layout": "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128",
    "target-endian": "little",
    "target-pointer-width": "32",
    "target-c-int-width": "32",
    "os": "none",
    "arch": "x86",
    "target-env": "gnu",
    "pre-link-args": [ "-m32" ],
    "features": "",
    "disable-redzone": true,
    "eliminate-frame-pointer": false,
    "linker-is-gnu": true,
    "no-compiler-rt": true,
    "archive-format": "gnu",
    "linker-flavor": "ld",

    // Add a new option
    "null-range": "0x0-0x1000"
    // or alternately
    "null-range": "none"
}

The compiler can then choose to take into account this range when laying out structures. I don't think anything about the null pointer optimization is specific to the value 0x0, right?

Of course, we could also add a -C flag or something, but I think this has the same problems as a per-crate attribute, right?

@oli-obk
Contributor

oli-obk commented Apr 19, 2018

Sorry, I didn't quite understand this. "specific to the actual physical controller you are targeting" isn't "target specific"?

Well... I always see the targets as a specific processor, not the entire board. But for the address space, everything connected to the memory bus needs to be known.

@nagisa
Member

nagisa commented Apr 20, 2018 via email

@pythonesque
Contributor

pythonesque commented May 5, 2018

tl;dr I think this RFC is pretty useless as written, because of the existence of unexploited alignment bits and the nonexistence of more general layout optimizations like custom ranges, which could be exploited very cheaply for most of the same use cases this RFC is designed to cover. It could be made useful in combination with extremely aggressive size optimizations of aligned bits, but the performance cost in those cases is high enough that you'd probably need to use a new kind of representation in order to benefit from them.

Long version:

People have talked about this only being useful if you only had pointers to "low alignment" types, and while that's true I want to more explicitly point out that it's only useful if you have pointers to byte aligned types. That's because using alignment bits for just carrying variants of an enum T without data gives you usize::MAX - usize::MAX / align_of::<T>() extra variants to play with, which for every type but u8 is much higher than what you get from just using the zero page on most architectures (especially those architectures where the zero page optimization would actually apply, since architectures without much memory probably wouldn't reserve the zero page this way). That is, even with 16-bit alignment you have half of all addresses available to use as tags. That should presumably apply to NonNull and Unique pointers as well. Next to that, any optimizations from using the zero page aren't exactly compelling--even if you assume the whole first 4GB is reserved on some 64-bit systems, that only gives you 2^32 variants, while exploiting tag bits on 16-bit aligned data gives you 2^63. I am pretty sure anyone with an enum with more than 2^63 variants wouldn't be satisfied with a measly 2^31 more :P.
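The arithmetic in that comparison can be sketched directly (using u64 to model a 64-bit address space):

```rust
// Number of 64-bit addresses that can never be a valid pointer to a type
// with the given alignment, because they are misaligned.
fn misaligned_addresses(align: u64) -> u64 {
    u64::MAX - u64::MAX / align
}

fn main() {
    // With 2-byte alignment, half of the address space is misaligned...
    assert_eq!(misaligned_addresses(2), 1 << 63);
    // ...which dwarfs the 4096 values a reserved zero page would offer.
    assert!(misaligned_addresses(2) > 4096);
}
```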

Therefore, this RFC only makes sense in three contexts:

(1) more than one single variant with one pointer variant, where you need/want the single variants' numerical values to be tightly packed for some reason. I'm not really sure whether you'd really get much out of this though--in the alignment based solution, even though the pointers are spread out, they're spread out at aligned intervals, so it's pretty cheap to turn the variants into packed versions for the purpose of using a jump table or something: packed_variant = raw_variant - raw_variant & ~alignment_mask (there may even be much cheaper ways to do it). For most operations, like copying or storing data, the fact that they weren't packed would not really be relevant. Detecting the non-variant case is also easy--just check whether packed_variant & alignment_mask is zero--and is at least as cheap as doing so in the "numeric values are packed" case (since the latter requires a comparison). So I think this use case doesn't really justify including this.
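The packing and detection steps described in (1) can be sketched as follows (assuming 8-byte alignment, so the mask is 0b111; names are illustrative):

```rust
const ALIGNMENT_MASK: usize = 0b111; // low bits free under 8-byte alignment

// Turn a raw discriminant value into a packed one by stripping the
// aligned part; equivalent to raw & ALIGNMENT_MASK.
fn packed_variant(raw: usize) -> usize {
    raw - (raw & !ALIGNMENT_MASK)
}

// The pointer case is exactly the one where the low bits are all zero.
fn is_pointer(raw: usize) -> bool {
    raw & ALIGNMENT_MASK == 0
}

fn main() {
    assert!(is_pointer(0x1000));           // aligned address: pointer variant
    assert!(!is_pointer(0x1003));          // misaligned: a field-less variant
    assert_eq!(packed_variant(0x1003), 3); // its packed tag
}
```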

(2) more than one single variant with one pointer variant to a type with 8-bit alignment (let's say an `&[u8]`). It's true that byte slices are pretty common in Rust; however, I'm not convinced that there are a lot of use cases where this specific pattern matters for byte slices.

First, this wouldn't help almost any of the cases where a &[u8] or Vec<u8> is returned as part of an io::Result, because it has a non-nullary error variant. For the same reason, it wouldn't help cases like Cow<str> which frustratingly takes up 32 bytes instead of 24, or frankly most of the situations I've wanted to use byte slices or Vec<u8> in enums. There are probably some where it would be beneficial, but I'm not sure there are enough that it'd be worth the effort.

Fortunately, there is an approach that covers many such use cases. Rust guarantees that byte slices allocated from Vecs only have sizes up to isize::MAX, so your first use case should (with sufficient cleverness on Vec's part) be able to tell the type system that any larger values for length are free game (this is an example where a type system that can [unsafely] opt out of ranges on a type by type basis, working in tandem with a compiler that knows how to exploit them, can enable cleverer optimizations than either could do on its own). That would enable both Result types returning Vecs and Cow types to store the variant tag in the Vec's length field for everything but the variant that actually contained the Vec, which would greatly improve the memory consumption of these types.

To give another example of where being able to opt out of particular ranges for a type is useful: I have run into situations where I had an enum with three variants: a nullary one, and two that carried integers I knew would never exceed i32::MAX. This presents an obvious encoding into an i32, with one of the variants taking the negative range, another 0, and the third the positive range. However, because Rust doesn't provide any way to explicitly opt out of particular ranges, I couldn't do that even if I was willing to manually make the values positive, and had to resort to a manual encoding on the i32. Such an optimization would never be applied by the compiler unless it was asked to do so, because it's only safe due to the semantics of the code using the integer ranges, and it wouldn't be helped by the zero page optimization you're proposing.
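
The manual encoding described above can be sketched like this (the enum and helper names are made up for illustration; payloads are assumed non-negative and strictly below i32::MAX):

```rust
// None -> 0, A(n) -> -(n + 1) in the negative range,
// B(n) -> n + 1 in the positive range.
#[derive(Debug, PartialEq)]
enum Small {
    None,
    A(i32), // payload in 0..i32::MAX
    B(i32), // payload in 0..i32::MAX
}

fn encode(v: &Small) -> i32 {
    match *v {
        Small::None => 0,
        Small::A(n) => { debug_assert!(n >= 0 && n < i32::MAX); -(n + 1) }
        Small::B(n) => { debug_assert!(n >= 0 && n < i32::MAX); n + 1 }
    }
}

fn decode(raw: i32) -> Small {
    if raw == 0 {
        Small::None
    } else if raw < 0 {
        Small::A(-raw - 1)
    } else {
        Small::B(raw - 1)
    }
}

fn main() {
    assert_eq!(encode(&Small::None), 0);
    assert_eq!(decode(encode(&Small::A(7))), Small::A(7));
    assert_eq!(decode(encode(&Small::B(123))), Small::B(123));
}
```

A range-aware compiler could derive exactly this layout automatically, given an annotation ruling out the negative range on the payloads.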

It is true that such range-based solutions don't obviously help with byte slices, since their lengths are in general (I think) allowed to exceed isize::MAX. If you often work with enums that have lots of variants and just one byte slice, or have large nested option sequences with byte slices, then your proposal is worthwhile; but the wins there seem low priority to me compared to properly exploiting alignment and being able to specify legal ranges explicitly.

(3) In conjunction with using alignment bits for tags with data.

To me, this is by far the most interesting use case. The reason is that needing to shrink the size of enums with large, but not too large, numbers of data-carrying variants comes up a lot. In Rust right now, the best you can hope for without copious amounts of unsafe code is to create a single enum with a variant for each kind of node, and box or reference the contents of each variant, which usually eats at least one word (and in practice at least two in many cases, since you often want to align AST nodes from an arena).

With alignment bits used for variants, though, the size can go down to a single word, as long as the values the type points to have enough spare alignment bits to store all the variants. For instance, with 8-byte alignment (the usual alignment of a Box on a 64-bit system, which is already sort of mandated by the existence of a Box in one of the variants), you can hold up to 8 tagged variants containing pointers to one or more values of the same type--an incredibly common case for ASTs! That would essentially let you pack AST nodes as tightly as possible outside of succinct implementations, and still keep them a single word. Not only that, but as long as you were willing to align all but 7 of the values (or whatever) at more than 8-byte boundaries (which in practice is often fine, since jemalloc likes to allocate at 16-byte boundaries), at a performance cost you could use a variable-length encoding and use different numbers of alignment bits per variant.
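
A rough illustration of the single-word tagged-pointer scheme (all names are hypothetical; this sketch leaks the Box and glosses over Drop, aliasing, and variance, which a real implementation would have to handle):

```rust
// Pack a 3-bit tag into the low bits of an 8-byte-aligned Box pointer,
// keeping the whole "enum" a single word.
struct TaggedBox {
    raw: usize,
}

const TAG_BITS: usize = 3;                 // 8-byte alignment -> 8 tags
const TAG_MASK: usize = (1 << TAG_BITS) - 1;

impl TaggedBox {
    fn new(value: Box<u64>, tag: usize) -> Self {
        assert!(tag <= TAG_MASK);
        let addr = Box::into_raw(value) as usize;
        debug_assert_eq!(addr & TAG_MASK, 0); // Box<u64> is 8-aligned
        TaggedBox { raw: addr | tag }
    }

    fn tag(&self) -> usize {
        self.raw & TAG_MASK
    }

    fn get(&self) -> &u64 {
        // Every access must mask the tag off before dereferencing.
        unsafe { &*((self.raw & !TAG_MASK) as *const u64) }
    }
}

fn main() {
    let t = TaggedBox::new(Box::new(99), 5);
    assert_eq!(t.tag(), 5);
    assert_eq!(*t.get(), 99);
    // Leaked here; real code would reconstruct and drop the Box.
}
```

Note that the masking in `get` is exactly the per-access cost discussed below: every pointer "derived" from the tagged word has to be cleaned up before use.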

However, unlike the case where you're using alignment bits for nullary variants, it's quite possible that you would run out of alignment bits long before you ran out of nullary ones. Even with a cache-aligned encoding (to 64-byte boundaries, say) you'd only have at most 64 variants to use, and it would be fewer if you had to use a variable length encoding. In this case being able to use known illegal values for nullary variants would be quite compelling, I think!

Unfortunately, using alignment bits as tags for variants with data isn't free, since [at least in most of the cases I can think of?] it would be hard to avoid having to always mask any pointer "derived" from such a type before using it. Besides the operation itself taking time, that seems like it would lead to much more register pressure, since you have to leave the original pointer untouched. Even worse, I'm not sure how "tainting" pointers from the type would actually work with mutable references, since you'd need to be careful to avoid disturbing the alignment bits--since Rust doesn't tell you whether a reference to a type is part of a structure exploiting this optimization, it would be really hard to avoid having to change all writes to such pointers into explicitly masked | assignments. Even if the operation itself is cheap, I imagine not using direct assignments and loads breaks a lot of optimization passes and confuses the branch predictor. Maybe I'm wrong and these are not really issues nowadays, since many modern runtimes, like JavaScript engines, use tagged pointers pervasively, but it certainly seems like a lot of mandatory overhead on pointer writes of the same sort that GCs tend to induce, and that Rust has strenuously tried to avoid adding by default.
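
The "masked | assignment" problem can be shown in isolation (names and addresses here are purely illustrative): a plain store would clobber the tag, so every write becomes a read-modify-write.

```rust
const TAG_MASK: usize = 0b111; // low 3 bits carry the variant tag

// Storing a new pointer into a tagged slot cannot be `*slot = new_addr`;
// it has to re-read the old word just to preserve the tag bits.
fn store_ptr(slot: &mut usize, new_addr: usize) {
    debug_assert_eq!(new_addr & TAG_MASK, 0, "pointer must be aligned");
    let tag = *slot & TAG_MASK; // extra load purely to keep the tag
    *slot = new_addr | tag;     // the masked `|` assignment
}

fn main() {
    let mut slot: usize = 0x1000 | 0b101; // fake address, tag 5
    store_ptr(&mut slot, 0x2000);
    assert_eq!(slot & TAG_MASK, 0b101);   // tag survived the write
    assert_eq!(slot & !TAG_MASK, 0x2000); // pointer was updated
}
```

This is the overhead the comment is worried about: a one-instruction store becomes a load, a mask, an or, and a store, on every write through the type.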

So, I conclude that you would probably only want to perform optimizations around using alignment bits for variants with data by (1) explicitly opting in, and (2) having them apply only to special pointer types (that always had their alignment bits masked out before an assignment). That way you could give them different codegen and/or semantics (for instance, you could disallow taking general references into the interiors of enums with repr(packed), which would avoid having to worry about tracking mutation through them, and change the codegen to mask out the alignment bits when reading from variants). But if you need to have distinct pointer types [or at least representations] anyway in order for the alignment optimizations to kick in, why not incorporate the optimization you're describing only into that type, instead of applying it to all pointers? I think this is what @oli-obk was getting at, but I think I'm willing to make the stronger statement that a zero-page optimization isn't even useful without a "size optimized pointer type", as long as it's remotely feasible to exploit alignment bits.

@ishitatsuyuki
Contributor Author

I agree that exploiting alignment bits may be a more powerful solution. But as you said, it won't work on &[u8], which means it's not a universal solution either. Plus, it adds implementation complexity compared to the null page, because we already have some range semantics but no framework for alignment bits.

Also, embedding data inside a pointer would violate the type system's contract that you can always take the address of a value. This means that (3) in your comment is basically not achievable without a completely new mechanism for such representations.

Please also note that although the name of this RFC primarily proposes to use the "zero page", it also provides various additions, which is why this is written as an RFC (if we just wanted to compress enums, we could just implement it in the compiler without RFC discussion). The motivation, as well, is to expose a more type-based API for embedding tag values inside a pointer.

@pythonesque
Contributor

pythonesque commented May 5, 2018

Yes, (3) requires brand new representations. But my argument is that in its absence, the only major use this RFC would have is for &[u8] slices (in theory it also covers things like bool slices, but most people who care about space would be using packed representations for those already). I don't think that's that compelling a use case for an optimization so fragile (in the sense that it relies on extremely implementation-specific details of the target platform to work, which are liable to change without warning). If we're going to perform dubious optimizations like that, we might as well take advantage of the fact that the upper 16 bits of 64-bit pointers can't point to valid addresses on most Intel systems (and I think the most any mainstream system allows right now is 52 bits?), which if anything is actually less likely to change, since JavaScript engines exploit it. That would limit the utility of the zero page optimization to enums with only one non-nullary variant (holding a &[u8] slice) and multiple nullary ones, on 32-bit platforms, I think... it feels very niche to me.
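
The upper-bits trick mentioned here can be sketched as follows (a deliberately non-portable illustration, assuming 48-bit user-space virtual addresses with zeroed top bits, as on typical x86-64 systems; constants and names are mine):

```rust
// Pack a 16-bit tag into the top bits of a 64-bit pointer-sized word.
// This is roughly what pointer-tagging JavaScript engines rely on.
const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

fn pack(addr: u64, tag: u16) -> u64 {
    debug_assert_eq!(addr >> ADDR_BITS, 0, "address must fit in 48 bits");
    ((tag as u64) << ADDR_BITS) | addr
}

fn unpack(word: u64) -> (u64, u16) {
    (word & ADDR_MASK, (word >> ADDR_BITS) as u16)
}

fn main() {
    let word = pack(0x7fff_dead_beef, 0xABCD);
    let (addr, tag) = unpack(word);
    assert_eq!(addr, 0x7fff_dead_beef);
    assert_eq!(tag, 0xABCD);
}
```

The fragility is the same as with the zero page: hardware with wider virtual addresses (e.g. 5-level paging) silently shrinks or removes this niche.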

The original null pointer optimization was important mostly because Option<&T> lets you match C on space in the very common case where people use NULL to signal an error, and the main reason people felt confident in it is that in practice all modern architectures (essentially) don't use 0 for anything. But all the subsequent work around making the null pointer optimization more robust (by giving the compiler a general framework for understanding how to exploit unused values) feels much more powerful to me, and I'd rather we exploit that to the hilt before we start looking at more stuff that takes advantage of anything nonportable.

I especially think that any optimization like this should do way more than just extend the null guarantee a bit further; if you want to give platforms a way to opt out of bit ranges for pointers, why not go further? Generally speaking, on any platform that supports memory mapping, you should be able to guarantee that certain ranges are always unmapped, making them usable for variant data. The mapping could be either implicit (from the OS) or explicit (from the running program), and would certainly be unsafe, but at least it would be a general framework for doing this sort of optimization.

You can be even more precise if you have control over the allocator. Postgres does a cute thing where the data allocated by each process lives in a disjoint memory address space of the same size; in Rust, if you set things up such that thread-local types were only ever allocated in the thread's local pool, you would have extra unused values whose number was statically known, but whose mapping from runtime value to compile-time value was determined dynamically based on the thread id. I don't realistically expect Rust to ever support anything insane like that automatically--I'm just pointing out that if you want to allow ruling out address values based on combining implementation details about the runtime with type system knowledge, there are far more possibilities than what this RFC proposes.

Please also note that although the name of this RFC primarily proposes to use the "zero page", it also provides various additions which is why this is written into a RFC (if we just wanted to compress enum, we can just implement it in the compiler without RFC discussion). The motivation, as well, is to expose a more type based API for embedding tag values inside a pointer.

The thing is, I don't really see how this RFC actually does that. It talks about some optimizations to & and &mut and wants to bring back the Shared pointer, but it all seems to be for the purpose of making the zero page optimization work. In particular, isn't the issue with NonNull that it's too specific about the fact that it's ruling out null, instead of insisting on something stronger? That's why I wouldn't want to add a new type that makes the same mistake, just extending from 0 to a larger range.

A much more powerful invariant (that would actually make a meaningful difference in the kinds of uses it would have) would be something like "shared pointers may dangle, but must be assigned to an address that was a legal instance of T at some point previously in the program; failure to respect this turns their values into poison". I think that's both a much better interpretation of what Shared was actually supposed to be (a shared pointer without a lifetime), and provides clear semantics for what messing up entails; in particular, if their values only turn into poison if a write is invalid, it should be fine to set them to an illegal value provided that (1) poison spreads from the variants of an enum to the enum itself, and (2) you overwrite all poisoned data before it's read again.

Finally: I don't think there's actually that much value in extending the zero range from a semantics perspective. The proposed solution (transmute to usize) means that programs could still fail depending on the target platform, since the optimization wouldn't apply universally. Moreover, because the layout optimization is supposed to be "composable" in the way that Rust's enum layout optimizations are today, you couldn't necessarily switch on whether the zero page optimization was available (or even how many values it had) in order to provide a fallback path. You'd probably just end up either failing on those architectures, or providing a custom flag that users could specify if they wanted to use a platform that didn't have the optimization. The semantic guarantee would benefit the current implementation of BiLock (since it explicitly uses address values, so it could just switch on whether zero_page_optimization was enabled) but I thought the whole point was to not have to do low level implementations like that.

For that reason, even if this optimization were implemented, I think it would work better as an optimization than an actual guarantee (vs. something like alignment, which is actually guaranteed). Making it just an optimization discourages people from trying to do unnecessary low level bit hacking (which, as you note, isn't always portable) while at the same time practically addressing the actual issue here (enums take up more space than they need to).

@shepmaster
Member

and 0 and 1 is a valid address where the entrypoint lies

This is also true for WASM, as far as I understand.

@Centril Centril added A-optimization Optimization related proposals & ideas A-repr #[repr(...)] related proposals & ideas labels Nov 22, 2018
@nikomatsakis
Contributor

Hello! We discussed this RFC in the compiler team's backlog today, and we decided to close this RFC. The RFC itself seems to have a few different ideas (e.g., Shared<T> as a way to get a &T-like type without a lifetime, optimizing nested enums, etc.) combined, but they need more discussion, and we think the appropriate venue would be as part of the unsafe code guidelines work. As the thread has shown, there has already been discussion about a number of these goals and the ways we could achieve them. I would encourage folks in the thread to pursue these ideas, by opening issues on https://github.com/rust-lang/unsafe-code-guidelines (if one doesn't already exist) or -- perhaps -- by experimenting with the implementation.

@joshtriplett
Member

I would love to see a separated version of just the notion of reserved pointer values and using those as a niche.

@glandium

The notion of reserved values (not only pointers) for use as niche would be great.
