Return values up to 128 bits in registers #76986
r? @estebank (rust_highfive has picked a reviewer for you, use r? to override)
Is there a reason not to do this on all x86_64 architectures? @bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 1be289d9c76441135b72186b133e949411657024 with merge 90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25...
☀️ Try build successful - checks-actions, checks-azure
Queued 90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25 with parent 1fd5b9d, future comparison URL.
Finished benchmarking try commit (90ed0d04e2f640ecd65d3b39ffd37f7fd9b3cd25): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying … Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Hey that's neat, the improvements are in the compiler itself, so this seems to have a practical impact! It looks like most regressions are in LLVM, which is somewhat expected, since it might deal differently with the code now.
The generated assembly here doesn't seem to match the code. It looks like you've used …

```asm
example::sum_c:
        mov     eax, dword ptr [rsi]
        add     eax, dword ptr [rdi]
        mov     ecx, dword ptr [rsi + 4]
        add     ecx, dword ptr [rdi + 4]
        mov     edx, dword ptr [rsi + 8]
        add     edx, dword ptr [rdi + 8]
        shl     rcx, 32
        or      rax, rcx
        ret
```
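For reference, the listing reads three 32-bit fields (offsets 0, 4 and 8) from each argument, adds them, and returns the result packed into RAX (first two fields) and EDX (third field). A function of roughly the following shape would produce it. This is a sketch only: the struct name `Triple`, its field names, and the choice of `u32` are assumptions, since the listing does not show the source. `sum_rust` is the same computation with the default Rust ABI, which is the case the PR description says this change targets.

```rust
/// Hypothetical 12-byte struct: three 32-bit fields at offsets 0, 4 and 8,
/// matching the loads in the listing. Names are illustrative only.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Triple {
    pub a: u32,
    pub b: u32,
    pub c: u32,
}

/// Field-wise sum using the C ABI. On x86_64 System V, a 12-byte struct of
/// integers is returned in RAX (fields `a` and `b`, packed) and RDX (field `c`).
pub extern "C" fn sum_c(x: &Triple, y: &Triple) -> Triple {
    Triple {
        // The 32-bit `add` instruction wraps on overflow, so use wrapping
        // arithmetic to match it (plain `+` would panic in debug builds).
        a: x.a.wrapping_add(y.a),
        b: x.b.wrapping_add(y.b),
        c: x.c.wrapping_add(y.c),
    }
}

/// The same computation with the default Rust ABI.
pub fn sum_rust(x: &Triple, y: &Triple) -> Triple {
    Triple {
        a: x.a.wrapping_add(y.a),
        b: x.b.wrapping_add(y.b),
        c: x.c.wrapping_add(y.c),
    }
}
```

Comparing the two functions on Compiler Explorer with optimizations enabled is one way to see whether the Rust-ABI version returns in registers or through an out-pointer.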
@calebsander ah, sorry, my mistake. Updated the snippet. |
Why not make this change for all 64-bit platforms? All major ones do this in C: https://godbolt.org/z/ds1ezh
Because I only know x86_64 |
Can you add a codegen test for this that verifies we don't materialize the return value into stack/memory? r=me after that. |
@bors r=nagisa |
📌 Commit cc2ba3b has been approved by |
☀️ Test successful - checks-actions, checks-azure |
The final perf results for this PR are in. Instruction counts have increased on most benchmarks, and task-clock shows no improvement. This is a bit disappointing, as I expected this PR to be a pretty clear win. The try run also showed small losses across the board (never trust the emoji), although stress tests of the trait resolution code (…
Oh, that's disappointing. But this PR was mostly aimed at improving the generated code, not speeding up rustc. Seems likely that #77041 has resulted in the same improvements I saw here. |
It looks to me like the only heavy operation here is a string comparison for … Can it be removed somehow? For example, replace it with a check for pointer size == 64.
The regressions are in LLVM from what I can tell, not the code I added |
Oh I see, that's too bad. |
Ah that's true. It seems like
```rust
/// Returns the maximum size of return values to be passed by value in the Rust ABI.
///
/// Return values beyond this size will use an implicit out-pointer instead.
pub fn max_ret_by_val<C: HasTargetSpec + HasDataLayout>(spec: &C) -> Size {
    match spec.target_spec().arch.as_str() {
        // System-V will pass return values up to 128 bits in RAX/RDX.
        "x86_64" => Size::from_bits(128),

        _ => spec.data_layout().pointer_size,
    }
}
```
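To make the rule in `max_ret_by_val` concrete, here is a self-contained sketch that can run outside rustc. It is a hypothetical stand-in, not rustc code: the `HasTargetSpec`/`HasDataLayout` context is replaced by an arch name and a pointer width passed in directly, sizes are plain bit counts instead of rustc's `Size`, and only the size comparison is modelled (any other checks rustc performs before returning a value by value are not).

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `max_ret_by_val`: takes the target arch name and
/// pointer width in bits, and returns the by-value limit in bits.
fn max_ret_by_val_bits(arch: &str, pointer_size_bits: u64) -> u64 {
    match arch {
        // System V returns values up to 128 bits in RAX/RDX.
        "x86_64" => 128,
        // Every other target: at most one pointer's worth.
        _ => pointer_size_bits,
    }
}

/// Whether a return value of type `T` is within the by-value limit, rather
/// than going through an implicit out-pointer. Size check only.
fn fits_ret_by_val<T>(arch: &str, pointer_size_bits: u64) -> bool {
    (size_of::<T>() as u64) * 8 <= max_ret_by_val_bits(arch, pointer_size_bits)
}
```

Under this rule a 12-byte `[u32; 3]` fits on `x86_64` (96 ≤ 128 bits) but not on another 64-bit target such as `aarch64` (96 > 64 bits), which is the asymmetry the reviewers question.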
Is there anything wrong with `2 * pointer_size`? IIRC we already return pairs in two registers.
That is, we already do `2 * pointer_size` on all architectures, just not for arbitrary data, only pairs of scalars.
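A minimal sketch of the suggested `2 * pointer_size` rule, again as a hypothetical stand-in with bit counts rather than rustc's `Size`. On 64-bit targets it gives 128 bits, the same limit as the `x86_64` special case, so x86_64 behaviour would be unchanged; other 64-bit targets get the same limit, and 32-bit targets get 64 bits. It also replaces the arch string comparison with arithmetic on the pointer size.

```rust
/// Hypothetical sketch of the suggested rule: every target may return up to
/// two pointers' worth of data by value, with no match on the arch string.
fn max_ret_by_val_bits(pointer_size_bits: u64) -> u64 {
    2 * pointer_size_bits
}

/// The limit for the target this code is compiled for. In Rust, `usize` is
/// pointer-sized, so `usize::BITS` is the pointer width in bits.
fn host_max_ret_by_val_bits() -> u64 {
    max_ret_by_val_bits(usize::BITS as u64)
}
```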
Yeah, that makes sense. Opened #77434.
…-boogalo, r=nagisa

Returns values up to 2*usize by value

Addresses rust-lang#76986 (comment) and rust-lang#76986 (comment) by doing the optimization on all targets. This matches what we do for functions returning `&[T]` and other fat pointers, so it should be Harmless™
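The comparison with fat pointers holds because `&[T]`, `&str`, `&dyn Trait` and `Box<[T]>` are a data pointer plus a length or vtable pointer, i.e. exactly two `usize` words. The small helper below (`words` is a hypothetical name, not a std API) is one way to check this with `std::mem::size_of`.

```rust
use std::mem::size_of;

/// Number of pointer-sized (`usize`) words occupied by a value of type `T`.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}
```

For example, `words::<&[u8]>()` and `words::<&dyn std::fmt::Debug>()` are both 2, while a thin `&u8` is 1.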
This fixes #26494 (comment) by making Rust's default ABI pass return values up to 128 bits in size in registers, just like the System V ABI.
The result is that these methods from the comment linked above now generate the same code, making the Rust ABI as efficient as the `"C"` ABI: