Attempt to improve codegen for arrays of repeated enums #104384
r? @wesleywiser (rustbot has picked a reviewer for you, use r? to override)
Force-pushed from 8035543 to c1b1d73.
```rust
pub fn some_repeat() -> [Option<u8>; 64] {
    // CHECK: store <128 x i8>
    // CHECK-NEXT: ret void
    [Some(0); 64]
}
```
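The `<128 x i8>` in the CHECK line follows from the array's size: with current rustc, `Option<u8>` occupies two bytes (a one-byte tag plus the one-byte payload), so 64 of them take 128 bytes. A small sketch to confirm this on your own toolchain. The language does not guarantee the layout of `Option<u8>`, so treat the 2-byte figure as an observation about rustc, not a spec. The function names are illustrative only.

```rust
use std::mem::{align_of, size_of};

/// Bytes occupied by `[Option<u8>; N]`; for N = 64 this is the width the
/// `store <128 x i8>` CHECK line expects.
/// Note: `Option<u8>`'s layout is unspecified; this reports what the
/// current compiler does.
pub fn option_u8_array_bytes<const N: usize>() -> usize {
    size_of::<[Option<u8>; N]>()
}

/// Per-element (size, alignment) of `Option<u8>`: tag byte + payload byte.
pub fn option_u8_layout() -> (usize, usize) {
    (size_of::<Option<u8>>(), align_of::<Option<u8>>())
}
```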
It's interesting to me that `some_repeat` seems to do 16 bytes at a time on x64, but `none_repeat` doesn't. It might be worth looking at why LLVM treats them differently.

Also, out of curiosity, what's the assembly difference before/after this PR?
Godbolt: this is `none_repeat` before and after LLVM optimisations. It looks like LLVM optimises out the undefined write and then forgets about it. This happened even when I explicitly added a `store undef`.
The current assembly output is:
- `[None; 64]`: 64 single-byte `mov`s to the tag
- `[Some(0); 64]`: a 16-byte constant vector (alternating 0/1) is stored into the array with `movups`
- `[Some(1); 64]`: the same as `Some(0)`, but with an all-1s pattern

With this patch:
- `[None; 64]`: a 16-byte vector created with `xorps`, then stored with `movups`
- `[Some(0); 64]` / `[Some(1); 64]`: same as before
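To make the list above concrete, here is a sketch of what those stores write. It uses a stand-in enum `OptU8` (hypothetical, not from the PR) marked `#[repr(u8)]`, so its layout is defined as a tag byte followed by the payload byte; this is an assumption standing in for rustc's unspecified `Option<u8>` layout. `Some(v)` repeats the two-byte pattern `[1, v]` (alternating tag and payload bytes), which one 16-byte store can write eight elements at a time. For `None` only the tag matters, so filling the whole array with zeros, as `xorps` followed by `movups` does, produces valid `None` values.

```rust
use std::mem::transmute;

/// Stand-in for `Option<u8>` with a defined layout: `#[repr(u8)]` on an
/// enum with fields lays out each variant as a `u8` tag followed by its fields.
/// (Assumption: close enough to rustc's real `Option<u8>` for illustration.)
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OptU8 {
    None,     // tag 0; the payload byte is padding (uninitialized)
    Some(u8), // tag 1; the payload byte holds the value
}

/// Raw bytes of a `Some` value: `[tag, payload]`.
/// Only `Some` is converted, because `None` leaves its payload byte uninitialized.
pub fn some_bytes(v: u8) -> [u8; 2] {
    // SAFETY: both bytes of a `Some` value are initialized and sizes match.
    unsafe { transmute::<OptU8, [u8; 2]>(OptU8::Some(v)) }
}

/// Fill 128 bytes (64 elements) with a repeated 2-byte element pattern,
/// 16 bytes per store, mimicking one `movups` per chunk.
pub fn fill_with_16_byte_stores(pair: [u8; 2]) -> [u8; 128] {
    // The 16-byte "constant vector": the element pattern repeated 8 times.
    let mut chunk = [0u8; 16];
    for (i, b) in chunk.iter_mut().enumerate() {
        *b = pair[i % 2];
    }
    let mut buf = [0u8; 128];
    for dst in buf.chunks_exact_mut(16) {
        dst.copy_from_slice(&chunk); // one 16-byte store
    }
    buf
}

/// Reinterpret 128 bytes as 64 `OptU8` values after validating every tag.
pub fn as_elements(buf: [u8; 128]) -> [OptU8; 64] {
    // Every even byte is a tag and must be 0 (None) or 1 (Some).
    assert!(buf.chunks_exact(2).all(|p| p[0] <= 1), "invalid tag byte");
    // SAFETY: sizes match (64 * 2 = 128) and every tag was checked above.
    unsafe { transmute::<[u8; 128], [OptU8; 64]>(buf) }
}
```

Zero-filling is only one valid choice for `None`, since any payload byte is acceptable there. It is, however, the choice that lets `None` share the same 16-byte store path as `Some`.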
No idea if this is a good way, but the way that jumps to mind is to … That said, the other thing that comes to mind is that it might be easier to lower the …
```rust
let vec = unsafe { llvm::LLVMGetUndef(self.type_vector(ty, count as u64)) };
let vec = (0..count as usize).fold(vec, |acc, x| {
    // Alternate between the two element values, one insert per lane.
    let elt = [v1, v2][x % 2];
    self.insert_element(acc, elt, self.cx.const_i32(x as i32))
});
```
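The fold above starts from an undefined vector and emits one `insert_element` per lane, so the number of LLVM instructions grows linearly with `count` (the O(n) concern in the PR description). Below is a plain-Rust model of the same loop. A `Vec<u8>` stands in for the LLVM vector, zeros stand in for undef, and each assignment stands in for one `insert_element`. The function name is hypothetical, not a rustc API.

```rust
/// Model of the fold: start from an "undefined" vector (zeros, since Rust
/// has no undef) and write `[v1, v2][x % 2]` into lane `x`.
/// Returns the lanes and how many simulated `insert_element`s were emitted.
pub fn build_alternating_by_insertion(v1: u8, v2: u8, count: usize) -> (Vec<u8>, usize) {
    let mut inserts = 0;
    let lanes = (0..count).fold(vec![0u8; count], |mut acc, x| {
        acc[x] = [v1, v2][x % 2]; // one insert_element per lane
        inserts += 1;
        acc
    });
    (lanes, inserts)
}
```

For `[Some(0); 64]` this means 128 inserts, one per byte lane. That linear cost is what the 1024 cap bounds.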
No idea if this is better, but you could do it in O(1) LLVM instructions by making a <2 x _>
(with two insert_elements) and then repeating that one with a shuffle like
shufflevector <2 x i8> %x, <2 x i8> undef, <64 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
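The suggested shuffle can be modelled the same way. Two insertions build a `<2 x i8>`, then one `shufflevector` with the mask `0, 1, 0, 1, …` replicates it to full width, so the instruction count no longer depends on the length. The sketch below is plain Rust with made-up function names. It simulates the shuffle semantics and also generates the mask text, which is tedious to write by hand.

```rust
/// Simulate `shufflevector <2 x i8> %x, <2 x i8> undef, <N x i32> mask`:
/// output lane `i` is `src[mask[i]]`. Indices 0 and 1 select from the
/// first operand, so the undef second operand is never read.
pub fn shuffle_from_pair(src: [u8; 2], mask: &[usize]) -> Vec<u8> {
    mask.iter().map(|&i| src[i]).collect()
}

/// The alternating mask `0, 1, 0, 1, ...` of length `n`.
pub fn alternating_mask(n: usize) -> Vec<usize> {
    (0..n).map(|i| i % 2).collect()
}

/// Render the mask as an LLVM IR constant,
/// e.g. `<4 x i32> <i32 0, i32 1, i32 0, i32 1>` for `n = 4`.
pub fn mask_ir(n: usize) -> String {
    let elems: Vec<String> = alternating_mask(n)
        .iter()
        .map(|i| format!("i32 {i}"))
        .collect();
    format!("<{n} x i32> <{}>", elems.join(", "))
}
```

Calling `mask_ir(64)` produces a mask of the same shape as the one in the comment above.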
I've implemented it using …

@rustbot author

My original approach worked at the …
☔ The latest upstream changes (presumably #103138) made this pull request unmergeable. Please resolve the merge conflicts.

I'm going to close this since it only targets LLVM, is a hacky way of solving the problem, and actually causes regressions 😅
Fixes #101685

For enums where the tag and the payload have the same type (e.g. `Option<u8>`), we can get better codegen for `[Enum; N]` by emitting a vector store instruction.

The limit of 1024 was picked arbitrarily and is probably too low; I'm unsure how to construct a vector of alternating elements without it being O(n).
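As a rough illustration of the two conditions in the description, here is a hypothetical eligibility check. The tag and payload must share a type, and the vector length is capped at the (admittedly arbitrary) 1024 so the O(n) construction stays bounded. The description does not say whether the cap applies to the array length or to the vector lane count. This sketch assumes the lane count (the `count` passed to `type_vector` in the diff), and it also assumes exactly two lanes per element with no padding. None of these names come from rustc.

```rust
/// Hypothetical cap mirroring the PR description's arbitrary 1024.
pub const MAX_VECTOR_LANES: u64 = 1024;

/// Sketch of the gating condition for `[Enum; N]`.
/// `tag_bits`/`payload_bits`: widths of the tag and payload scalars.
/// `array_len`: `N`. Returns the lane count to emit, or `None` to fall back
/// to the ordinary element-by-element lowering.
pub fn vector_store_lanes(tag_bits: u32, payload_bits: u32, array_len: u64) -> Option<u64> {
    if tag_bits != payload_bits {
        return None; // all lanes must share one vector element type
    }
    // Assumption: two lanes (tag, payload) per enum element, no padding.
    let lanes = array_len.checked_mul(2)?;
    (lanes <= MAX_VECTOR_LANES).then_some(lanes)
}
```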