Improve `core::intrinsics::black_box` output. #99899
Labels
A-codegen (Area: Code generation)
A-intrinsics (Area: Intrinsics)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
On Discord, the user kangalioo (unsure of GitHub name) shared a custom version of the `black_box` (#64102) function they're using to improve the asm output of `black_box` and reduce the overhead of its use. It does this by passing small things in registers instead of by pointers.

Which produces the following output:
In comparison, the current `black_box` spills the output in basically all cases. The equivalent output with the current `black_box` is as follows (Godbolt for all this is available here https://godbolt.org/z/a7evcEP6x):

I believe this is basically because we just lower the intrinsic as passing a pointer to the value into an inline asm block, which forces the spilling.
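To make the spilling argument concrete, here is a minimal sketch of what a pointer-based barrier looks like when written by hand. This is not the compiler's actual lowering; `black_box_via_pointer` is a hypothetical name, and the sketch assumes a target whose inline asm supports the generic `reg` class (for example x86_64 or aarch64).

```rust
use core::arch::asm;

/// Hypothetical helper (not the real intrinsic): hides `value` from the
/// optimizer by handing its *address* to a comment-only inline asm block.
pub fn black_box_via_pointer<T>(mut value: T) -> T {
    // Thin, untyped pointer to the local variable.
    let ptr = &mut value as *mut T as *mut u8;
    // SAFETY: the template is only an assembler comment, so no instruction
    // is executed. `nomem` is deliberately NOT specified, so the compiler
    // must assume the asm may read or write through `ptr`.
    unsafe {
        asm!("/* {0} */", in(reg) ptr, options(nostack, preserves_flags));
    }
    value
}
```

Because the asm block receives an address, `value` needs a memory location for that address to point at, which is the stack round trip described above.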
I don't believe this can be fixed by libs changes, as we are just calling into an intrinsic and need to remain that way to support all targets (and cases like Miri). Additionally, the version posted on Discord has a soundness hole, and is considered UB if `T` contains padding bytes (and can't be fixed at the moment, as passing `MaybeUninit` via registers isn't currently possible).

However, because we just pass the argument to an intrinsic, it seems likely that the compiler can lower it in a more optimal way, which seems to be a less error-prone way of handling this anyway.
Improving this output seems beneficial, since the whole point of this intrinsic is to have as close to 0 cost as possible while still providing an optimization barrier. I think the basic idea behind the `black_box` provided above is a reasonable starting point of what would be good, but it's obviously not a requirement that it's lowered in that manner.
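For readers without the Discord context, the register-passing idea can be sketched roughly as follows. This is an illustrative reconstruction, not the code that was shared; `black_box_u64` is a hypothetical name, and the sketch assumes a 64-bit target where a `u64` fits in the generic `reg` class. It is restricted to `u64`, which has no padding bytes, so it sidesteps the soundness problem mentioned for generic `T`.

```rust
use core::arch::asm;

/// Hypothetical register-based barrier for a single concrete integer type.
pub fn black_box_u64(mut value: u64) -> u64 {
    // SAFETY: the template is only an assembler comment, so no instruction
    // is executed. `inout(reg)` tells the compiler the asm both reads and
    // may overwrite `value`, so it cannot assume the result is unchanged.
    unsafe {
        asm!("/* {0} */", inout(reg) value, options(nostack, preserves_flags));
    }
    value
}
```

Since the operand is the value itself rather than its address, the value can stay in a register across the barrier.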