Attempt to improve codegen for arrays of repeated enums #104384

clubby789 · 2022-11-13T23:27:19Z

For enums where the tag and value are the same type, we can get better codegen for [Enum; N] by emitting a vector store instruction.

1024 as a limit was picked randomly and is probably too low - I'm unsure how to construct a vector of alternating elements without it being O(n)

rustbot · 2022-11-13T23:27:26Z

r? @wesleywiser

(rustbot has picked a reviewer for you, use r? to override)

scottmcm · 2022-11-14T07:18:34Z

src/test/codegen/enum-repeat.rs

+pub fn some_repeat() -> [Option<u8>; 64] {
+    // CHECK: store <128 x i8>
+    // CHECK-NEXT: ret void
+    [Some(0); 64]


It's interesting to me that some_repeat seems to do 16 bytes at a time on x64, but none_repeat doesn't. Might be interesting to look at why LLVM is treating them differently.

Also, out of curiosity, what's the assembly difference before/after this PR?

godbolt - This is none_repeat before and after LLVM opts - it looks like LLVM is optimising out the undefined write and forgetting about it. This happened even if I explicitly added store undef.

The current assembly output is

None; 64 - 64 single byte movs to the tag

Some(0); 64 - A 16 byte constant vector (alternating 0/1) is stored into the array with movups

Some(1); 64 - The same as Some(0), but with an all-1's pattern

With this patch

None; 64 - 16 byte vector created with xorps, then movups

Some(0); 64/Some(1); 64 - Same as before

scottmcm · 2022-11-14T07:28:16Z

I'm unsure how to construct a vector of alternating elements without it being O(n)

No idea if this is a good way, but the way that jumps to mind is to shufflevector together shorter ones double the length each time. Though that's still O(n) in the mask vector, so maybe using memory operations to do it would be better.

That said, the other thing that comes to mind is that it might be easier to lower the [x; N] to a call to some lang-item core::array::repeat::<_, N>(x), so these optimizations could be done in Rust code instead of codegen. Then other backends would get them too, and it's not obvious to me that making 1024-element vectors is the best way here. Is more than 64 bytes at once ever better, on current machines? In rust code it could, for example, loop over as_chunks_mut to write chunks at a time and let LLVM then optimize that.

scottmcm · 2022-11-14T07:37:13Z

compiler/rustc_codegen_llvm/src/builder.rs

+                let vec = unsafe { llvm::LLVMGetUndef(self.type_vector(ty, count as u64)) };
+                let vec = (0..count as usize).fold(vec, |acc, x| {
+                    let elt = [v1, v2][x % 2];
+                    self.insert_element(acc, elt, self.cx.const_i32(x as i32))


No idea if this is better, but you could do it in O(1) LLVM instructions by making a <2 x _> (with two insert_elements) and then repeating that one with a shuffle like

shufflevector <2 x i8> %x, <2 x i8> undef, <64 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>

https://llvm.godbolt.org/z/5feKaEo5z

clubby789 · 2022-11-14T10:20:17Z

I've implemented it using shufflevector now, but I've noticed that this means that cases like Some(1); 1024 (which used to be a memset) are now a long series of vector stores. This seems like something LLVM should be handling but isn't - I'll see if I can get better emitted code.

@rustbot author

clubby789 · 2022-11-14T19:02:07Z

My original approach worked at the rustc_codegen_ssa level and did a memset with the tag value - that improved the None case, but the Some(1) case got much worse. Something along those lines might help

bors · 2022-11-17T05:43:54Z

☔ The latest upstream changes (presumably #103138) made this pull request unmergeable. Please resolve the merge conflicts.

clubby789 · 2022-11-17T10:54:22Z

I'm going to close this since it only targets LLVM, is a hacky way of solving the problem, and actually causes regressions 😅

rustbot assigned wesleywiser Nov 13, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 13, 2022

Improve codegen for arrays of repeated enums

c1b1d73

clubby789 force-pushed the improve-enum-array branch from 8035543 to c1b1d73 Compare November 14, 2022 00:19

clubby789 marked this pull request as ready for review November 14, 2022 00:19

clubby789 changed the title ~~WIP: Attempt to improve codegen for arrays of repeated enums~~ Attempt to improve codegen for arrays of repeated enums Nov 14, 2022

scottmcm reviewed Nov 14, 2022

View reviewed changes

Repeat elements with shuffle_vector

4c12ecf

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 14, 2022

clubby789 mentioned this pull request Nov 15, 2022

Special case repeating fieldless enum variants #104452

Closed

clubby789 closed this Nov 17, 2022

clubby789 deleted the improve-enum-array branch February 11, 2023 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attempt to improve codegen for arrays of repeated enums #104384

Attempt to improve codegen for arrays of repeated enums #104384

clubby789 commented Nov 13, 2022 •

edited

Loading

rustbot commented Nov 13, 2022

scottmcm Nov 14, 2022

clubby789 Nov 14, 2022

scottmcm commented Nov 14, 2022

scottmcm Nov 14, 2022

clubby789 commented Nov 14, 2022

clubby789 commented Nov 14, 2022

bors commented Nov 17, 2022

clubby789 commented Nov 17, 2022

Attempt to improve codegen for arrays of repeated enums #104384

Attempt to improve codegen for arrays of repeated enums #104384

Conversation

clubby789 commented Nov 13, 2022 • edited Loading

rustbot commented Nov 13, 2022

scottmcm Nov 14, 2022

Choose a reason for hiding this comment

clubby789 Nov 14, 2022

Choose a reason for hiding this comment

scottmcm commented Nov 14, 2022

scottmcm Nov 14, 2022

Choose a reason for hiding this comment

clubby789 commented Nov 14, 2022

clubby789 commented Nov 14, 2022

bors commented Nov 17, 2022

clubby789 commented Nov 17, 2022

clubby789 commented Nov 13, 2022 •

edited

Loading