Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' #81908

Merged
merged 1 commit into from
Apr 5, 2024

Conversation

AZero13
Copy link
Contributor

@AZero13 AZero13 commented Feb 15, 2024

Consider the following:

    ldr     r0, [r4]
    ldr     r7, [r0, #4]
    cmp     r7, r3
    bhi     .LBB0_6
    cmp     r0, r2
    push    {r0}
    pop     {r4}
    bne     .LBB0_3
    movs    r0, r6
    pop     {r4, r5, r6, r7}
    pop     {r1}
    bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to.

push {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.

@llvmbot
Copy link
Member

llvmbot commented Feb 15, 2024

@llvm/pr-subscribers-backend-arm

Author: AtariDreams (AtariDreams)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/81908.diff

1 Files Affected:

  • (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.cpp (+29-3)
diff --git a/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp b/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
index 85eabdb17ad190..01b8c76674d207 100644
--- a/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
+++ b/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
@@ -53,9 +53,6 @@ void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
         .addReg(SrcReg, getKillRegState(KillSrc))
         .add(predOps(ARMCC::AL));
   else {
-    // FIXME: Can also use 'mov hi, $src; mov $dst, hi',
-    // with hi as either r10 or r11.
-
     const TargetRegisterInfo *RegInfo = st.getRegisterInfo();
     if (MBB.computeRegisterLiveness(RegInfo, ARM::CPSR, I)
         == MachineBasicBlock::LQR_Dead) {
@@ -65,6 +62,35 @@ void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
       return;
     }
 
+    // Can also use 'mov hi, $src; mov $dst, hi',
+    // with hi as either r10 or r11.
+    if (MBB.computeRegisterLiveness(RegInfo, ARM::R10, I) ==
+        MachineBasicBlock::LQR_Dead) {
+      // Use high register to move source to destination
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), ARM::R10)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+          .addReg(ARM::R10, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL))
+          ->addRegisterDead(ARM::R10, RegInfo);
+      return;
+    }
+
+    if (MBB.computeRegisterLiveness(RegInfo, ARM::R11, I) ==
+        MachineBasicBlock::LQR_Dead) {
+      // Use high register to move source to destination
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), ARM::R11)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+          .addReg(ARM::R11, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL))
+          ->addRegisterDead(ARM::R11, RegInfo);
+      ;
+      return;
+    }
+
     // 'MOV lo, lo' is unpredictable on < v6, so use the stack to do it
     BuildMI(MBB, I, DL, get(ARM::tPUSH))
         .add(predOps(ARMCC::AL))

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@AZero13 AZero13 force-pushed the fixme branch 2 times, most recently from b34cf5b to 3382508 Compare February 15, 2024 21:30
@AZero13 AZero13 changed the title [Thumb1] Resolve Fixme: use 'mov hi, $src; mov $dst, hi' [Thumb1] Resolve FIXME: use 'mov hi, $src; mov $dst, hi' Feb 15, 2024
@AZero13
Copy link
Contributor Author

AZero13 commented Feb 15, 2024

@nikic

@AZero13
Copy link
Contributor Author

AZero13 commented Feb 15, 2024

Full source if anyone is curious:

https://godbolt.org/z/h7Pq3s6dM

@efriedma-quic
Copy link
Collaborator

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

@AZero13
Copy link
Contributor Author

AZero13 commented Feb 15, 2024

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

Pretty sure only r12 is, hence why this is not an option.

@AZero13
Copy link
Contributor Author

AZero13 commented Feb 15, 2024

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

r10 - Caller save (in ARM mode, unused in Thumb mode)
r11 Caller save (in ARM mode, unused in Thumb mode)

@AZero13
Copy link
Contributor Author

AZero13 commented Feb 15, 2024

r10 and r11 are callee-save, so you'd need to make sure they get spilled

r10 and r11 are callee-save, so you'd need to make sure they get spilled

I only think this is an issue if the compiler were to randomly switch between thumb and arm mode and vice versa mid function, which is a thing that only really happens in handwritten code anyway.

@efriedma-quic
Copy link
Collaborator

Pretty sure that's just wrong (or referring to some other calling convention). From the specification (https://github.com/ARM-software/abi-aa/releases):

A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).

@AZero13 AZero13 marked this pull request as draft February 15, 2024 22:53
@AZero13 AZero13 marked this pull request as ready for review February 16, 2024 16:26
@AZero13 AZero13 force-pushed the fixme branch 7 times, most recently from 2ddc5ab to fbd4a85 Compare February 16, 2024 17:59
@AZero13
Copy link
Contributor Author

AZero13 commented Feb 16, 2024

@efriedma-quic Okay, I fixed the issues!

@AZero13 AZero13 force-pushed the fixme branch 2 times, most recently from ed0251c to f3cdd9c Compare February 16, 2024 18:04
@efriedma-quic
Copy link
Collaborator

For r10 and r11, as the tests show, they're almost never available; the compiler won't naturally spill them. You can force them to spill using inline asm, but that's so rare it's not really relevant. So I don't really see the point of checking, unless you're planning some sort of followup.

For r12... this is probably okay? We don't use r12 in the scavenger anymore (see ab1d73e). We still use R12 as a scratch register in a few different places, but not in a way that would conflict, I think. It would be helpful if you could take some time to verify that.

If there's some way r12 can actually be live (inline asm?), we should make sure there's at least one test that actually still exercises the code to generate push/pop.

@AZero13 AZero13 force-pushed the fixme branch 10 times, most recently from 1fb69eb to 7b6cca2 Compare March 12, 2024 19:57
@AZero13
Copy link
Contributor Author

AZero13 commented Mar 14, 2024

@efriedma-quic Ping?

@AZero13
Copy link
Contributor Author

AZero13 commented Mar 28, 2024

@topperc Thoughts?

Copy link
Contributor

@TNorthover TNorthover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks pretty sensible. Couple of comments.

llvm/lib/Target/ARM/Thumb1InstrInfo.cpp Show resolved Hide resolved
llvm/lib/Target/ARM/Thumb1InstrInfo.cpp Outdated Show resolved Hide resolved
@AZero13
Copy link
Contributor Author

AZero13 commented Apr 2, 2024

@TNorthover Addressed!

Copy link
Contributor

@TNorthover TNorthover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this looks good now.

@AZero13 AZero13 force-pushed the fixme branch 2 times, most recently from ad1712f to 9ab332a Compare April 2, 2024 15:45
@AZero13 AZero13 changed the title [Thumb1] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' [Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' Apr 2, 2024
Consider the following:

        ldr     r0, [r4]
        ldr     r7, [r0, #4]
        cmp     r7, r3
        bhi     .LBB0_6
        cmp     r0, r2
        push    {r0}
        pop     {r4}
        bne     .LBB0_3
        movs    r0, r6
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to.

push    {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.
@AZero13 AZero13 requested a review from TNorthover April 3, 2024 14:29
@TNorthover TNorthover merged commit c5d000b into llvm:main Apr 5, 2024
4 checks passed
@AZero13 AZero13 deleted the fixme branch April 5, 2024 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants