
Optimize enums by niche-filling even when the smaller variants also have data. #46213

Closed
eddyb opened this issue Nov 23, 2017 · 35 comments · Fixed by #94075
Labels
A-codegen — Area: Code generation
A-layout — Area: Memory layout of types
C-enhancement — Category: An issue proposing an enhancement or a PR with one.
T-compiler — Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Member

eddyb commented Nov 23, 2017

The current implementation only handles enums of roughly the form:

enum E {
    A(ALeft..., Niche, ARight...),
    B,
    C
}

This can be seen as a special-case, where B and C occupy 0 bytes, of:

enum E {
    A(ALeft..., Niche, ARight...),
    B(B...),
    C(C...)
}

As long as B and C can fit before or after A's Niche, we can still apply the optimization.
Also see rust-lang/rfcs#1230 (comment) for the initial description.

@kennytm kennytm added the C-enhancement label Nov 23, 2017

@est31
Member

est31 commented Nov 23, 2017

what is "niche" in this case?

@Gankra
Contributor

Gankra commented Nov 23, 2017

A niche is a location in the type where some bit patterns aren't valid.

For instance struct Foo(u32, bool, u32) has a niche where the bool is, because only 0 and 1 are valid bools. Option<Foo> can therefore use Foo.1 == 2 to represent None.
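A minimal demonstration of this, using the struct from above (sizes assume the usual `u32` alignment of 4):

```rust
use std::mem::size_of;

// The bool in the middle occupies a whole byte but only uses bit
// patterns 0 and 1, leaving 254 invalid patterns -- a niche.
struct Foo(u32, bool, u32);

fn main() {
    // u32 (4) + bool (1) + padding (3) + u32 (4)
    assert_eq!(size_of::<Foo>(), 12);
    // Option<Foo> reuses an invalid bool pattern (e.g. 2) to encode
    // None, so no extra tag byte is needed.
    assert_eq!(size_of::<Option<Foo>>(), size_of::<Foo>());
}
```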

@est31
Member

est31 commented Nov 23, 2017

@gankro thanks, that makes sense.

@eddyb
Member Author

eddyb commented Nov 23, 2017

We probably need better terminology, "invalid" values and "invalid reuse" optimization?

@Gankra
Contributor

Gankra commented Nov 23, 2017

Here is a test case we currently fail to optimize, but this issue would fix:

Result<Result<u32, u32>, u32>

The compiler sees:

enum Result<Result<u32, u32>, u32>  {
  Ok({ tag: 0u32 | 1u32, payload: u32 }),
  Err(u32),
}

which should lower to:

{ tag: 0u32 | 1u32 | 2u32, payload: u32 }

because the Err payload fits after the niche (Ok.tag).
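This can be checked with `size_of`; the outer size is printed rather than asserted, since it depends on whether the compiler performs this optimization:

```rust
use std::mem::size_of;

fn main() {
    // The inner Result packs a u32 tag next to a u32 payload: 8 bytes.
    assert_eq!(size_of::<Result<u32, u32>>(), 8);
    // Without niche-filling, the outer enum needs its own tag (12 bytes);
    // with it, the Err payload reuses the slot next to the inner tag,
    // and the whole thing fits in 8 bytes.
    println!("{}", size_of::<Result<Result<u32, u32>, u32>>());
}
```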

@leonardo-m

Before merging this PR I think we should see benchmarks of both memory saved and run-time saved for some real program, like the rust compiler and Servo.

@leonardo-m

I have yet to see similar benchmarks for #45225.

@Gankra
Contributor

Gankra commented Nov 23, 2017

Some slightly more complex cases from webrender that would also benefit from this:

// Use the Opacity.0.tag niche
pub enum FilterOp {
    Blur(f32),
    Brightness(f32),
    Contrast(f32),
    Grayscale(f32),
    HueRotate(f32),
    Invert(f32),
    Opacity(PropertyBinding<f32>, f32),
    Saturate(f32),
    Sepia(f32),
}

pub enum PropertyBinding<T> {
    Value(T),
    Binding(PropertyBindingKey<T>),
}

pub struct PropertyBindingKey<T> {
    pub id: PropertyBindingId,
    _phantom: PhantomData<T>,
}

pub struct PropertyBindingId {
    namespace: IdNamespace,
    uid: u32,
}

pub struct IdNamespace(pub u32);
// use the RoundedRect.1.mode.tag niche
pub enum LocalClip {
    Rect(LayoutRect),
    RoundedRect(LayoutRect, ComplexClipRegion),
}

#[repr(C)]
pub struct ComplexClipRegion {
    pub rect: LayoutRect,
    pub radii: BorderRadius,
    pub mode: ClipMode,
}

#[repr(C)]
pub struct BorderRadius {
    pub top_left: LayoutSize,
    pub top_right: LayoutSize,
    pub bottom_left: LayoutSize,
    pub bottom_right: LayoutSize,
}

#[repr(C)]
pub enum ClipMode {
    Clip,
    ClipOut,
}

#[repr(C)]
struct LayoutRect(f32, f32, f32, f32);

#[repr(C)]
struct LayoutSize(f32, f32);

@abonander
Contributor

abonander commented Dec 17, 2017

A simpler case that I think fits this optimization (or a specialization of it as a starting point):

// my use-case, a single value is most common
enum OccupiedSmallVec {
     Single(Foo),
     Multiple(Vec<Foo>),
}

If Foo is 2 usizes or smaller, then OccupiedSmallVec should be the same size as Vec, as if it were implemented via this union:

#![feature(untagged_unions)]
union OccupiedSmallVec {
    // where `single.0 == 0`
    single: (usize, Foo),
    multiple: Vec<Foo>,
}

However, it currently requires a separate tag and padding to align the pointers, wasting basically another usize.
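A sketch of the size comparison, using `[usize; 2]` as a hypothetical stand-in for `Foo` since its definition isn't given:

```rust
use std::mem::size_of;

type Foo = [usize; 2]; // stand-in for the unspecified `Foo`: "2 usizes or smaller"

#[allow(dead_code)]
enum OccupiedSmallVec {
    Single(Foo),
    Multiple(Vec<Foo>),
}

fn main() {
    // Vec is always (ptr, capacity, length): three words.
    assert_eq!(size_of::<Vec<Foo>>(), 3 * size_of::<usize>());
    // Without the optimization this is four words: a separate tag plus
    // pointer-alignment padding. With it, the tag could live in Vec's
    // non-null pointer niche, shrinking the enum to three words.
    println!("{}", size_of::<OccupiedSmallVec>());
}
```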

@eddyb
Member Author

eddyb commented Dec 17, 2017

@abonander Not really, that's the entire optimization, assuming Vec<Foo> starts with a non-null pointer, and you're packing Foo in the leftover space after that pointer.

@eddyb
Member Author

eddyb commented Mar 20, 2018

From IRC (cc @nox): we could extend this to always compute the size of the tagged layout (which would likely be larger than A itself) and allow using more space around A's fields up to that size, to fit other variants, if we generally want to avoid overlapping variant data where possible.

Then Result<&T, E> and similar would have the layout of (Option<&T>, E).
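The building block here is the guaranteed null-pointer niche of references; a quick check (the Result size is printed rather than asserted, since it depends on the layout the compiler picks):

```rust
use std::mem::size_of;

fn main() {
    // Guaranteed: None is encoded as the null pointer pattern, so the
    // Option adds no size.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    // With the proposed extension, Result<&u32, u32> would be laid out
    // like (Option<&u32>, u32): the tag lives in the pointer's niche
    // and the Err payload is stored after it, not overlapping it.
    println!("{}", size_of::<Result<&u32, u32>>());
}
```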

@nox
Contributor

nox commented Mar 20, 2018

@killercup
Member

IIUC, this optimization could also work to make Cow<str> and Cow<[T]> 3 words, right? (cf. discussion on reddit)

@nox
Contributor

nox commented Jun 4, 2018

Yes.
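For reference, the component sizes that make a 3-word `Cow` plausible; the `Cow` size itself is printed, since it depends on whether this optimization has landed:

```rust
use std::borrow::Cow;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    assert_eq!(size_of::<&str>(), 2 * word);   // fat pointer: (ptr, len)
    assert_eq!(size_of::<String>(), 3 * word); // (ptr, capacity, len)
    // Four words with a separate tag; three if the tag can live in
    // String's non-null pointer niche.
    println!("{}", size_of::<Cow<str>>());
}
```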

@newpavlov
Contributor

Another example which can be optimized is:

pub enum Bar {
    A = 1,
    B = 2,
    C = 3,
}

pub enum Baz {
    D = 4,
    E = 5,
    F = 6,
}

pub enum Foo {
    Bar(Bar),
    Baz(Baz),
}

Currently Foo takes 2 bytes, while it can be trivially represented as 1.
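Both payload enums are one byte, and their discriminant ranges (1..=3 and 4..=6) are disjoint, so the combined enum could in principle be one byte too; a sketch to check the sizes (Foo's current size is printed):

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub enum Bar { A = 1, B = 2, C = 3 }
#[allow(dead_code)]
pub enum Baz { D = 4, E = 5, F = 6 }
#[allow(dead_code)]
pub enum Foo { Bar(Bar), Baz(Baz) }

fn main() {
    assert_eq!(size_of::<Bar>(), 1);
    assert_eq!(size_of::<Baz>(), 1);
    // 2 today (a separate tag byte); could be 1, since the payload
    // values themselves already identify the variant.
    println!("{}", size_of::<Foo>());
}
```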

@nox
Contributor

nox commented Aug 24, 2018 via email

@nagisa
Member

nagisa commented Aug 24, 2018

Another example which can be optimized is:

This cannot happen, unless we make Bar and Baz types with a size of less than a byte, which is not something we can do (for a variety of reasons). Consider that it is possible to take an internal reference to data stored within some variant of Foo (i.e. either &Bar or &Baz), and this reference, when dereferenced, should return the original data. The only way to achieve that would be to use some of the "unused" bits to store the active variant of the enum, but that would then require bit twiddling when dereferencing any &Bar or &Baz, even if those weren't stored within the enum.

@RalfJung
Member

@nagisa Bar and Baz have disjoint sets of valid values, though. So it could be done.

It doesn't fit the "niche" scheme, though.

@newpavlov
Contributor

newpavlov commented Aug 24, 2018

@nox I think it can be handled as a special case: if all enum variants are field-less enums, check that their tags do not overlap and fit into a minimal repr. Yes, making it more generic will be significantly more involved, but even with the field-less constraint this optimization would help a lot for FFI and parsing cases (currently you often have to keep a huge number of consts, use unwieldy enums, or sacrifice performance).

@nagisa Ah, I should've explicitly mentioned that I want it foremost for field-less enums (see the linked issue before my first message). More generic handling would be nice, but less important.

@nox
Contributor

nox commented Aug 26, 2018

It makes mem::discriminant way more complex. Currently, computing the discriminant of any enum value is at most a comparison and a subtraction. Such optimisations would make the computation more complex; for example, what would if let Foo::Bar(_) = foo { ... } compile to?

@alercah
Contributor

alercah commented Sep 11, 2018

Optimizing for discriminant extraction seems, to me, far less valuable on average than optimizing for space. Optimizing for the ease of writing an intrinsic for it seems even less valuable. In the provided example, if let Foo::Bar(_) = foo { .. } is if *std::mem::transmute::<_, &u8>(&foo) <= 3 { ... }, which doesn't even have a subtraction.

@alercah
Contributor

alercah commented Sep 11, 2018

I just now realized that std::mem::discriminant is, in general, not required to correspond to the specified discriminant. The Discriminant type does not directly expose the actual value of the discriminant, and like all intrinsics, the underlying intrinsic is unstable. Only the Debug and Hash instances expose it indirectly, and I don't think those should be considered guaranteed to be stable.

In fact, I believe that the only way you can observe the discriminant of a variant of a #[repr(Rust)] enum is via numeric cast, and you can't do that for an enum with fields. So I would propose that, at least for now, explicit discriminants only be permitted on enums with other representations.

@upsuper
Contributor

upsuper commented Sep 3, 2022

IIUC, this optimization could also work to make Cow<str> and Cow<[T]> 3 words, right? (cf. discussion on reddit)

So this could potentially regress accessing the data of Cow<str> / Cow<[T]>.

Currently, for code like

pub fn read_len(x: &std::borrow::Cow<str>) -> usize {
    x.len()
}

the compiler generates optimized code:

example::read_len:
        mov     rax, qword ptr [rdi]
        mov     rax, qword ptr [rdi + 8*rax + 16]
        ret

i.e. without check of the tag.

But if we optimize Cow to take advantage of the non-nullness of ptr, the ptr and len fields would no longer align across the two variants, and it would need to always do an extra tag check for accessing data.

@SimonSapin
Contributor

SimonSapin commented Sep 3, 2022

The code above does check the tag since the length field is at different offsets in the two variants, though the 8*rax multiplication makes the offset computation branchless.

With a more compact Cow<str> representation the length will be at the same offset so the code accessing it will be even simpler. However accessing the pointer field will regress and require a branch:

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=1b23594473567cc429a98932b9f8592d

playground::read_len:
	mov	rax, qword ptr [rdi + 16]
	ret

playground::read_ptr:
	mov	rax, qword ptr [rdi]
	test	rax, rax
	jne	.LBB1_2
	mov	rax, qword ptr [rdi + 8]

.LBB1_2:
	ret
pub union Cow {
    borrowed: (usize, &'static str),
    owned: std::mem::ManuallyDrop<String>,
    owned_repr: (*mut u8, usize, usize), // Hack for demo outside of std
}

impl std::ops::Deref for Cow {
    type Target = str;
    
    #[inline]
    fn deref(&self) -> &str {
        unsafe {
            if self.owned_repr.0.is_null() {
                self.borrowed.1
            } else {
                &**self.owned
            }
        }
    }
}

pub fn read_len(x: &Cow) -> usize {
    x.len()
}

pub fn read_ptr(x: &Cow) -> *const u8 {
    x.as_bytes().as_ptr()
}

@oxalica
Contributor

oxalica commented Sep 3, 2022

However accessing the pointer field will regress and require a branch:

playground::read_ptr:
	mov	rax, qword ptr [rdi]
	test	rax, rax
	jne	.LBB1_2
	mov	rax, qword ptr [rdi + 8]

.LBB1_2:
	ret

It's possible to eliminate that branch via CMOVE. Not sure why LLVM refuses to do that currently.
But it's definitely not blocked by niche-filling.

EDIT: The LLVM IR does optimize the code into a select. Thus I'd assume LLVM thinks a CMOV here would perform no better than JNE + MOV, considering it always reads the memory.
