Optimize enums by niche-filling even when the smaller variants also have data. #46213
Comments
what is "niche" in this case? |
A niche is a location in the type where some bit patterns aren't valid. For instance |
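To make the definition concrete, here is a minimal sketch that compares the size of a few types with the size of `Option` around them. The helper name `sizes` is made up for this illustration; the size equalities for references and `NonZeroU32` are ones the standard library documents.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Returns (size of T, size of Option<T>) so the two can be compared.
/// `sizes` is a hypothetical helper for this sketch only.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

fn main() {
    // A reference can never be null, so the all-zero bit pattern is a niche
    // that `None` can occupy without a separate tag.
    println!("&u8:        {:?}", sizes::<&u8>());
    // NonZeroU32 forbids 0, which leaves exactly one niche value.
    println!("NonZeroU32: {:?}", sizes::<NonZeroU32>());
    // Every bit pattern of u32 is valid, so Option<u32> needs extra space.
    println!("u32:        {:?}", sizes::<u32>());
}
```

When the wrapped type has a niche the two sizes match; when it has none, the `Option` grows to hold a tag.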
@gankro thanks, that makes sense. |
We probably need better terminology, "invalid" values and "invalid reuse" optimization? |
Here is a test case we currently fail to optimize, but this issue would fix:
The compiler sees:
which should lower to:
because the Err payload fits after the niche ( |
Before merging this PR I think we should see benchmarks of both memory saved and run-time saved for some real program, like the rust compiler and Servo. |
I have yet to see similar benchmarks for #45225. |
Some slightly more complex cases from webrender that would also benefit from this:
// Use the Opacity.0.tag niche
pub enum FilterOp {
Blur(f32),
Brightness(f32),
Contrast(f32),
Grayscale(f32),
HueRotate(f32),
Invert(f32),
Opacity(PropertyBinding<f32>, f32),
Saturate(f32),
Sepia(f32),
}
pub enum PropertyBinding<T> {
Value(T),
Binding(PropertyBindingKey<T>),
}
pub struct PropertyBindingKey<T> {
pub id: PropertyBindingId,
_phantom: PhantomData<T>,
}
pub struct PropertyBindingId {
namespace: IdNamespace,
uid: u32,
}
pub struct IdNamespace(pub u32);
// Use the RoundedRect.1.mode.tag niche
pub enum LocalClip {
Rect(LayoutRect),
RoundedRect(LayoutRect, ComplexClipRegion),
}
#[repr(C)]
pub struct ComplexClipRegion {
pub rect: LayoutRect,
pub radii: BorderRadius,
pub mode: ClipMode,
}
#[repr(C)]
pub struct BorderRadius {
pub top_left: LayoutSize,
pub top_right: LayoutSize,
pub bottom_left: LayoutSize,
pub bottom_right: LayoutSize,
}
#[repr(C)]
pub enum ClipMode {
Clip,
ClipOut,
}
#[repr(C)]
struct LayoutRect(f32, f32, f32, f32);
#[repr(C)]
struct LayoutSize(f32, f32); |
A simpler case that I think fits this optimization (or a specialization of it as a starting point):
// my use-case, a single value is most common
enum OccupiedSmallVec {
Single(Foo),
Multiple(Vec<Foo>),
}
If
#![feature(untagged_unions)]
union OccupiedSmallVec {
// where `single.0 == 0`
single: (usize, Foo),
multiple: Vec<Foo>,
} However, it currently requires a separate tag and padding to align the pointers, wasting basically another |
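One way to check how much space the enum form costs on a given compiler is to measure it. This is a rough sketch: `Foo` is a stand-in (the comment above does not define it), `measure` is a hypothetical helper, and the result depends on the compiler version, since the default enum layout is not guaranteed.

```rust
use std::mem::size_of;

// `Foo` is a stand-in type for this sketch; the original does not define it.
#[allow(dead_code)]
struct Foo(u32);

#[allow(dead_code)]
enum OccupiedSmallVec {
    Single(Foo),
    Multiple(Vec<Foo>),
}

/// Returns (size of the enum, size of its largest payload).
fn measure() -> (usize, usize) {
    (size_of::<OccupiedSmallVec>(), size_of::<Vec<Foo>>())
}

fn main() {
    let (enum_size, vec_size) = measure();
    println!("OccupiedSmallVec: {enum_size} bytes, Vec<Foo>: {vec_size} bytes");
    // An overhead of zero means the tag was folded into a niche of the Vec.
    println!("overhead: {} bytes", enum_size - vec_size);
}
```

An overhead of one full word indicates a separate tag plus alignment padding, which is the situation this comment describes.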
@abonander Not really, that's the entire optimization, assuming |
From IRC (cc @nox): we could extend this to always compute the size of the tagged layout (which would likely be larger than Then |
IIUC, this optimization could also work to make |
Yes. |
Another example which can be optimized is:
Currently |
@newpavlov That’s way harder to optimise because it becomes more complicated to compute the discriminant of a Foo value.
|
This cannot happen, unless we make |
@nagisa It doesn't fit the "niche" scheme, though. |
@nox I think it can be handled as a special case, i.e. if all enum elements are field-less enums, check if tags do not overlap and fit into a minimal repr. Yes, making it more generic will be significantly more involved, but even with field-less constraint this optimization will help a lot for FFI and parsing cases. (currently you often have to keep huge amount of
@nagisa Ah, I should've explicitly mentioned, I want it foremost for field-less enums. (see linked issue before my first message) More generic handling would be nice, but less important. |
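The field-less special case can be illustrated by doing the flattening by hand. The types `A`, `B` and `Foo` and the functions `encode` and `decode` below are hypothetical, chosen only to show how disjoint inner tags let one byte identify both the outer and the inner variant; this is not what the compiler does today.

```rust
// Hypothetical field-less enums with deliberately non-overlapping tags.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum A { X = 0, Y = 1 }

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum B { Z = 2, W = 3 }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Foo { A(A), B(B) }

/// Encodes `Foo` in one byte by reusing the inner tag as the outer one.
fn encode(foo: Foo) -> u8 {
    match foo {
        Foo::A(a) => a as u8,
        Foo::B(b) => b as u8,
    }
}

/// Decodes the byte; the tag ranges are disjoint, so the value alone
/// determines which outer variant it belongs to.
fn decode(byte: u8) -> Option<Foo> {
    match byte {
        0 => Some(Foo::A(A::X)),
        1 => Some(Foo::A(A::Y)),
        2 => Some(Foo::B(B::Z)),
        3 => Some(Foo::B(B::W)),
        _ => None,
    }
}

fn main() {
    for foo in [Foo::A(A::X), Foo::A(A::Y), Foo::B(B::Z), Foo::B(B::W)] {
        let byte = encode(foo);
        println!("{foo:?} -> {byte} -> {:?}", decode(byte));
    }
}
```

Note that recovering the outer variant now takes a range check on the byte rather than a direct tag read, which is the extra discriminant cost raised earlier in the thread.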
It makes |
Optimizing for discriminant extraction seems, to me, far less valuable on average than optimizing for space. Optimizing for the ease of writing an intrinsic for it seems even less valuable. In the provided example, |
I just now realized that In fact, I believe that the only way you can observe the discriminant of a variant of a |
So this could potentially regress accessing data of
Currently, for code like
pub fn read_len(x: &std::borrow::Cow<str>) -> usize {
    x.len()
}
the compiler generates optimized code:
example::read_len:
    mov rax, qword ptr [rdi]
    mov rax, qword ptr [rdi + 8*rax + 16]
    ret
i.e. without check of the tag. But if we optimize |
The code above does check the tag since the length field is at different offsets in the two variants, though the
With a more compact
playground::read_len:
mov rax, qword ptr [rdi + 16]
ret
playground::read_ptr:
mov rax, qword ptr [rdi]
test rax, rax
jne .LBB1_2
mov rax, qword ptr [rdi + 8]
.LBB1_2:
ret
pub union Cow {
borrowed: (usize, &'static str),
owned: std::mem::ManuallyDrop<String>,
owned_repr: (*mut u8, usize, usize), // Hack for demo outside of std
}
impl std::ops::Deref for Cow {
type Target = str;
#[inline]
fn deref(&self) -> &str {
unsafe {
if self.owned_repr.0.is_null() {
self.borrowed.1
} else {
&**self.owned
}
}
}
}
pub fn read_len(x: &Cow) -> usize {
x.len()
}
pub fn read_ptr(x: &Cow) -> *const u8 {
x.as_bytes().as_ptr()
} |
It's possible to eliminate that branch via EDIT: The LLVM IR does optimize the code into |
Use niche-filling optimization even when multiple variants have data. Fixes rust-lang#46213
The current implementation only handles enums of roughly the form:
This can be seen as a special-case, where B and C occupy 0 bytes, of:
As long as B and C can fit before or after A's Niche, we can still apply the optimization.
Also see rust-lang/rfcs#1230 (comment) for the initial description.
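As a rough illustration of the generalized case, the sketch below defines an enum whose variants all carry data and compares it with an explicitly tagged layout. The names `Compact`, `Tagged` and `sizes` are hypothetical. The size of `Tagged` follows from the documented `#[repr(u8)]` layout, while the size of `Compact` is whatever the compiler in use chooses and is not guaranteed.

```rust
use std::mem::size_of;

// Hypothetical enum: `A` carries a `bool`, which has spare bit patterns
// (a niche), while `B` and `C` each carry one byte that could be placed
// next to that niche instead of behind a separate tag.
#[allow(dead_code)]
enum Compact { A(bool, u8), B(u8), C(u8) }

// Same variants with an explicit one-byte tag in front of every payload.
#[allow(dead_code)]
#[repr(u8)]
enum Tagged { A(bool, u8), B(u8), C(u8) }

/// Returns (size of the default layout, size of the tagged layout).
fn sizes() -> (usize, usize) {
    (size_of::<Compact>(), size_of::<Tagged>())
}

fn main() {
    let (compact, tagged) = sizes();
    println!("default layout: {compact} bytes, repr(u8) layout: {tagged} bytes");
}
```

If the default layout comes out smaller than the tagged one, the compiler has stored the variant in the `bool` niche; if the two are equal, it fell back to a separate tag.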