Cranelift: Optimize op+splat into splat+op in the mid-end #6828
Comments
@afonso360 I'd like to pick this one up. Will be doing some reading on ISLE first.
@afonso360 How do I see the generated code after all the mid-end opts are applied to a func? Some of my tests are failing for some operators. For example for
the following test fails:
with the following error:
So I just wanted to debug and see how the generated code differs from the test expectation.
Oh! Right I forgot that we have a few special opcodes like We should be able to ignore those since the mid-end will never see them (I think!) before they are expanded into multiple operations. With everything that you have on that PR, I think you are just missing the transform for But to check the optimized code you can use the slightly complicated command:
This runs the whole compile pipeline, and
Thanks for the info @afonso360. The test for What about the family of shift and rotate ops such as
Yeah, they should be! However, note that So the optimization is slightly different. We only need to ensure that the lhs is splatted. So it would look something like this It's also probably worth adding a comment with that reasoning in the code! I didn't remember it myself until I tried it. The verifier shouldn't even let you build a
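To make the shift case from the comment above concrete, here is a minimal Rust sketch on a toy expression tree. It assumes, as that comment implies, that the shift amount is always a scalar operand, so only the shifted value has to be a splat. The names `Node` and `hoist_splat_from_shift` are hypothetical and are not part of Cranelift; the real change would be an ISLE rule.

```rust
/// Toy expression tree. These names are illustrative, not Cranelift types.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// An opaque value (scalar or vector), identified by name.
    Value(String),
    /// Broadcast a scalar to every lane of a vector.
    Splat(Box<Node>),
    /// Shift left: the first operand may be a vector; the second operand
    /// (the shift amount) is assumed to always be a scalar.
    Ishl(Box<Node>, Box<Node>),
}

/// Rewrite `(ishl (splat x) amt)` into `(splat (ishl x amt))`.
/// Only the lhs is checked for a splat because the amount is already scalar.
fn hoist_splat_from_shift(node: Node) -> Node {
    match node {
        Node::Ishl(value, amount) => match *value {
            // lhs is a splat: do the shift on the scalar, then splat the result.
            Node::Splat(x) => Node::Splat(Box::new(Node::Ishl(x, amount))),
            // lhs is not a splat: rebuild the node unchanged.
            other => Node::Ishl(Box::new(other), amount),
        },
        other => other,
    }
}
```

With this model, a shift whose lhs is `Splat(x)` becomes a splat of a scalar shift, while a shift of any other value is left untouched.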
👋 Hey,
Feature
This was pointed out by @jameysharp in #6815 (review)!
We should try to transform
(op (splat x) (splat y) ...)
into
(splat (op x y ...))
for operations that support this.
Benefit
This transforms SIMD operations into their scalar counterparts, which should be beneficial. We also have better constant propagation on scalars, so this opens up more opportunities for that as well.
RISC-V specifically really benefits from this optimization since we have opcodes that can eat the splat on the transformed version.
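As an illustration of the transformation described in the Feature section, here is a minimal Rust sketch on a toy expression tree. It is not Cranelift code: `Expr`, `Op`, `can_hoist_splat`, and `hoist_splat` are hypothetical names, and the actual optimization would be written as ISLE rules in the mid-end. The sketch also assumes that division is excluded, in line with the note on trapping arithmetic in the Implementation section.

```rust
/// Toy expression tree. These names are illustrative, not Cranelift types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A scalar value, identified by name.
    Scalar(String),
    /// Broadcast a scalar expression to every lane of a vector.
    Splat(Box<Expr>),
    /// A binary operation; lane-wise when its operands are vectors.
    Binary(Op, Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Iadd,
    Imul,
    Udiv,
}

impl Op {
    /// Whether this toy rewrite is allowed for the op.
    /// Assumption: trapping ops such as division are left alone.
    fn can_hoist_splat(self) -> bool {
        !matches!(self, Op::Udiv)
    }
}

/// Rewrite `(op (splat x) (splat y))` into `(splat (op x y))`, bottom-up.
fn hoist_splat(expr: Expr) -> Expr {
    match expr {
        Expr::Binary(op, lhs, rhs) => {
            // Rewrite the operands first so nested splats can bubble up.
            let lhs = hoist_splat(*lhs);
            let rhs = hoist_splat(*rhs);
            match (lhs, rhs) {
                (Expr::Splat(x), Expr::Splat(y)) if op.can_hoist_splat() => {
                    Expr::Splat(Box::new(Expr::Binary(op, x, y)))
                }
                (lhs, rhs) => Expr::Binary(op, Box::new(lhs), Box::new(rhs)),
            }
        }
        Expr::Splat(inner) => Expr::Splat(Box::new(hoist_splat(*inner))),
        scalar => scalar,
    }
}
```

Because operands are rewritten first, a chain such as an `Iadd` of an `Imul` of two splats and another splat collapses into a single splat of scalar arithmetic, which is where scalar constant propagation would then get a chance to apply.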
Implementation
The e-graphs mid end is awesome for stuff like this! Here's one rule:
Pasting this into cranelift/codegen/src/opts/vector.isle makes this transform work on imul!
You can try running this test to verify that it works:
(Run this with cargo run -- test ./the-above.clif from the /cranelift directory)
This is also a good test to add to our testsuite in cranelift/filetests/filetests/egraph/...
There are so many opcodes that this optimization works for that I actually can't list them all. Here are a few:
iadd, isub, imul, ineg, iabs, umulhi, smulhi, ...
Just look at our opcode list and a lot of them will work out.
Where it won't work:
idiv / urem / srem. This doesn't work because we currently don't perform optimizations on these operations (see cranelift/egraphs: allow simplifying trapping arithmetic #5908).
Where it may not work:
Alternatives
There are so many opcodes for which this rule can be implemented that it may be worth considering auto-generating it. However, I doubt it would be worth the effort and the added maintenance complexity. Copy-pasting a bunch of times is sometimes better!