Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift(Aarch64): Optimize lowering of icmps with immediates #5252

Conversation

fitzgen
Copy link
Member

@fitzgen fitzgen commented Nov 10, 2022

We can encode more constants into 12-bit immediates if we do the following rewrite for comparisons with odd constants:

    A >= B + 1
==> A - 1 >= B
==> A > B

(And similar for less-than-or-equals.)

@fitzgen fitzgen requested a review from elliottt November 10, 2022 23:58
Comment on lines +96 to +102
; movz w10, #65512
; subs xzr, x9, x10
; b.ls label1 ; b label2
; block1:
; add x11, x0, x1, UXTW
; add x11, x11, #16
; movz x10, #65512
; movz w10, #65512
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why these changed here, but also I think it is benign since movz is zero-extending the value?

@fitzgen
Copy link
Member Author

fitzgen commented Nov 11, 2022

Here is an example of what we used to lower these to (i.e. what happens when I comment out the rules that implement these rewrites):

--- expected
+++ actual
@@ -1,4 +1,6 @@
 block0:
-  subs wzr, w0, #1118208
-  cset x0, hi
+  movz w2, #4097
+  movk w2, w2, #17, LSL #16
+  subs wzr, w0, w2
+  cset x0, hs
   ret

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Nov 11, 2022
Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

We should investigate adding something like FlagsAndCC to the prelude, as there are similar types defined in both the x64 and s390x backends.

We can encode more constants into 12-bit immediates if we do the following
rewrite for comparisons with odd constants:

        A >= B + 1
    ==> A - 1 >= B
    ==> A > B

(And similar for less-than-or-equals.)
@fitzgen fitzgen force-pushed the optimize-comparisons-with-immediates-on-aarch64 branch from d1f5fef to 5e03b65 Compare November 15, 2022 16:36
@fitzgen
Copy link
Member Author

fitzgen commented Nov 15, 2022

For those following along at home: the test failure from the previous iteration of this PR was because the rules previously implemented for this "same" optimization for turning less-than-or-equal into less-than was incorrect (it would apply if the constant was on the LHS, but was firing when the constant was on the RHS).

In the future, we could try and "rotate" const <= value into value >= const and then go through this same optimization path so that we don't need to duplicate the rules but with tiny tweaks. I opted not to do that in this PR and removed the buggy optimizations to just get this land-able.

Comment on lines +132 to +135
fn make_imm12_from_u64(&mut self, n: u64) -> Option<Imm12> {
Imm12::maybe_from_u64(n)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is make_imm12_from_u64 separate from imm12_from_u64? They look identical to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is a pure constructor and the other is a fallible extractor. They are identical but can't replace one another in ISLE's type system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't it be an infallible extractor? Then there'd be no Option in the return type, and the same rust function could be reused for the extractor and constructor in isle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that -- I just realized that we're relying on the failure behavior in the extractor case :)

Copy link
Member Author

@fitzgen fitzgen Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be infallible because not all u64s can be represented with Imm12s

edit: yes, exactly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. But then you could have used the extractor instead of a constructor, right?

(if-let (imm12_from_u64 imm) (u64_sub b 1))

@fitzgen fitzgen merged commit 9967782 into bytecodealliance:main Nov 15, 2022
@fitzgen fitzgen deleted the optimize-comparisons-with-immediates-on-aarch64 branch November 15, 2022 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants