Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' #81908

Merged
merged 1 commit into from
Apr 5, 2024

Conversation

AreaZR
Copy link
Contributor

@AreaZR AreaZR commented Feb 15, 2024

Consider the following:

    ldr     r0, [r4]
    ldr     r7, [r0, #4]
    cmp     r7, r3
    bhi     .LBB0_6
    cmp     r0, r2
    push    {r0}
    pop     {r4}
    bne     .LBB0_3
    movs    r0, r6
    pop     {r4, r5, r6, r7}
    pop     {r1}
    bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to.

push {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 15, 2024

@llvm/pr-subscribers-backend-arm

Author: AtariDreams (AtariDreams)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/81908.diff

1 Files Affected:

  • (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.cpp (+29-3)
diff --git a/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp b/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
index 85eabdb17ad190..01b8c76674d207 100644
--- a/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
+++ b/llvm/lib/Target/ARM/Thumb1InstrInfo.cpp
@@ -53,9 +53,6 @@ void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
         .addReg(SrcReg, getKillRegState(KillSrc))
         .add(predOps(ARMCC::AL));
   else {
-    // FIXME: Can also use 'mov hi, $src; mov $dst, hi',
-    // with hi as either r10 or r11.
-
     const TargetRegisterInfo *RegInfo = st.getRegisterInfo();
     if (MBB.computeRegisterLiveness(RegInfo, ARM::CPSR, I)
         == MachineBasicBlock::LQR_Dead) {
@@ -65,6 +62,35 @@ void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
       return;
     }
 
+    // Can also use 'mov hi, $src; mov $dst, hi',
+    // with hi as either r10 or r11.
+    if (MBB.computeRegisterLiveness(RegInfo, ARM::R10, I) ==
+        MachineBasicBlock::LQR_Dead) {
+      // Use high register to move source to destination
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), ARM::R10)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+          .addReg(ARM::R10, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL))
+          ->addRegisterDead(ARM::R10, RegInfo);
+      return;
+    }
+
+    if (MBB.computeRegisterLiveness(RegInfo, ARM::R11, I) ==
+        MachineBasicBlock::LQR_Dead) {
+      // Use high register to move source to destination
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), ARM::R11)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+          .addReg(ARM::R11, getKillRegState(KillSrc))
+          .add(predOps(ARMCC::AL))
+          ->addRegisterDead(ARM::R11, RegInfo);
+      ;
+      return;
+    }
+
     // 'MOV lo, lo' is unpredictable on < v6, so use the stack to do it
     BuildMI(MBB, I, DL, get(ARM::tPUSH))
         .add(predOps(ARMCC::AL))

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@AreaZR AreaZR force-pushed the fixme branch 2 times, most recently from b34cf5b to 3382508 Compare February 15, 2024 21:30
@AreaZR AreaZR changed the title [Thumb1] Resolve Fixme: use 'mov hi, $src; mov $dst, hi' [Thumb1] Resolve FIXME: use 'mov hi, $src; mov $dst, hi' Feb 15, 2024
@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 15, 2024

@nikic

@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 15, 2024

Full source if anyone is curious:

https://godbolt.org/z/h7Pq3s6dM

@efriedma-quic
Copy link
Collaborator

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 15, 2024

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

Pretty sure only r12 is, hence why this is not an option.

@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 15, 2024

(See https://lists.llvm.org/pipermail/llvm-dev/2014-August/075967.html for additional context.)

r10 and r11 are callee-save, so you'd need to make sure they get spilled. We have code to do the spilling, but it's not triggering here. I guess this code runs too late for that to trigger?

r10 - Caller save (in ARM mode, unused in Thumb mode)
r11 Caller save (in ARM mode, unused in Thumb mode)

@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 15, 2024

r10 and r11 are callee-save, so you'd need to make sure they get spilled

r10 and r11 are callee-save, so you'd need to make sure they get spilled

I only think this is an issue if the compiler were to randomly switch between thumb and arm mode and vice versa mid function, which is a thing that only really happens in handwritten code anyway.

@efriedma-quic
Copy link
Collaborator

Pretty sure that's just wrong (or referring to some other calling convention). From the specification (https://github.com/ARM-software/abi-aa/releases):

A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).

@AreaZR AreaZR marked this pull request as draft February 15, 2024 22:53
@AreaZR AreaZR marked this pull request as ready for review February 16, 2024 16:26
@AreaZR AreaZR force-pushed the fixme branch 7 times, most recently from 2ddc5ab to fbd4a85 Compare February 16, 2024 17:59
@AreaZR
Copy link
Contributor Author

AreaZR commented Feb 16, 2024

@efriedma-quic Okay, I fixed the issues!

@AreaZR AreaZR force-pushed the fixme branch 2 times, most recently from ed0251c to f3cdd9c Compare February 16, 2024 18:04
@efriedma-quic
Copy link
Collaborator

For r10 and r11, as the tests show, they're almost never available; the compiler won't naturally spill them. You can force them to spill using inline asm, but that's so rare it's not really relevant. So I don't really see the point of checking, unless you're planning some sort of followup.

For r12... this is probably okay? We don't use r12 in the scavenger anymore (see ab1d73e). We still use R12 as a scratch register in a few different places, but not in a way that would conflict, I think. It would be helpful if you could take some time to verify that.

If there's some way r12 can actually be live (inline asm?), we should make sure there's at least one test that actually still exercises the code to generate push/pop.

@AreaZR AreaZR force-pushed the fixme branch 10 times, most recently from 1fb69eb to 7b6cca2 Compare March 12, 2024 19:57
@AreaZR
Copy link
Contributor Author

AreaZR commented Mar 14, 2024

@efriedma-quic Ping?

@AreaZR
Copy link
Contributor Author

AreaZR commented Mar 28, 2024

@topperc Thoughts?

Copy link
Contributor

@TNorthover TNorthover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks pretty sensible. Couple of comments.

llvm/lib/Target/ARM/Thumb1InstrInfo.cpp Show resolved Hide resolved
llvm/lib/Target/ARM/Thumb1InstrInfo.cpp Outdated Show resolved Hide resolved
@AreaZR
Copy link
Contributor Author

AreaZR commented Apr 2, 2024

@TNorthover Addressed!

Copy link
Contributor

@TNorthover TNorthover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this looks good now.

@AreaZR AreaZR force-pushed the fixme branch 2 times, most recently from ad1712f to 9ab332a Compare April 2, 2024 15:45
@AreaZR AreaZR changed the title [Thumb1] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' [Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' Apr 2, 2024
Consider the following:

        ldr     r0, [r4]
        ldr     r7, [r0, #4]
        cmp     r7, r3
        bhi     .LBB0_6
        cmp     r0, r2
        push    {r0}
        pop     {r4}
        bne     .LBB0_3
        movs    r0, r6
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to.

push    {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.
@AreaZR AreaZR requested a review from TNorthover April 3, 2024 14:29
@TNorthover TNorthover merged commit c5d000b into llvm:main Apr 5, 2024
4 checks passed
@AreaZR AreaZR deleted the fixme branch April 5, 2024 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants