[RISCV] fix SP recovery in a function epilogue #110809
@llvm/pr-subscribers-backend-risc-v
Author: None (dlav-sc)
Changes
This patch fixes SP register recovery in the function epilogue.
Full diff: https://github.com/llvm/llvm-project/pull/110809.diff
8 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
index b0c525ea8c2996..7cbd1a35b25839 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
@@ -755,6 +755,19 @@ void RISCVFrameLowering::emitPrologue(MachineFunction &MF,
}
}
+void RISCVFrameLowering::deallocateStack(MachineFunction &MF,
+ MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI,
+ const DebugLoc &DL, uint64_t StackSize,
+ int64_t CFAOffset) const {
+ const RISCVRegisterInfo *RI = STI.getRegisterInfo();
+
+ Register SPReg = getSPReg(STI);
+
+ RI->adjustReg(MBB, MBBI, DL, SPReg, SPReg, StackOffset::getFixed(StackSize),
+ MachineInstr::FrameDestroy, getStackAlign());
+}
+
void RISCVFrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const {
const RISCVRegisterInfo *RI = STI.getRegisterInfo();
@@ -786,59 +799,70 @@ void RISCVFrameLowering::emitEpilogue(MachineFunction &MF,
--MBBI;
}
- const auto &CSI = getUnmanagedCSI(MF, MFI.getCalleeSavedInfo());
+ const auto &CSI = MFI.getCalleeSavedInfo();
// Skip to before the restores of scalar callee-saved registers
// FIXME: assumes exactly one instruction is used to restore each
// callee-saved register.
- auto LastFrameDestroy = MBBI;
- if (!CSI.empty())
- LastFrameDestroy = std::prev(MBBI, CSI.size());
+ auto LastFrameDestroy = std::prev(MBBI, getUnmanagedCSI(MF, CSI).size());
- uint64_t RealStackSize = getStackSizeWithRVVPadding(MF);
- uint64_t StackSize = RealStackSize - RVFI->getReservedSpillsSize();
- uint64_t FPOffset = RealStackSize - RVFI->getVarArgsSaveSize();
+ uint64_t FirstSPAdjustAmount = getFirstSPAdjustAmount(MF);
+ uint64_t RealStackSize = FirstSPAdjustAmount ? FirstSPAdjustAmount
+ : getStackSizeWithRVVPadding(MF);
+ uint64_t StackSize = FirstSPAdjustAmount ? FirstSPAdjustAmount
+ : getStackSizeWithRVVPadding(MF) -
+ RVFI->getReservedSpillsSize();
+ uint64_t FPOffset = FirstSPAdjustAmount ? FirstSPAdjustAmount
+ : getStackSizeWithRVVPadding(MF) -
+ RVFI->getVarArgsSaveSize();
uint64_t RVVStackSize = RVFI->getRVVStackSize();
- // Restore the stack pointer using the value of the frame pointer. Only
- // necessary if the stack pointer was modified, meaning the stack size is
- // unknown.
- //
- // In order to make sure the stack point is right through the EH region,
- // we also need to restore stack pointer from the frame pointer if we
- // don't preserve stack space within prologue/epilogue for outgoing variables,
- // normally it's just checking the variable sized object is present or not
- // is enough, but we also don't preserve that at prologue/epilogue when
- // have vector objects in stack.
- if (RI->hasStackRealignment(MF) || MFI.hasVarSizedObjects() ||
- !hasReservedCallFrame(MF)) {
- assert(hasFP(MF) && "frame pointer should not have been eliminated");
- RI->adjustReg(MBB, LastFrameDestroy, DL, SPReg, FPReg,
- StackOffset::getFixed(-FPOffset),
- MachineInstr::FrameDestroy, getStackAlign());
- } else {
- if (RVVStackSize)
+ bool RestoreFP = RI->hasStackRealignment(MF) || MFI.hasVarSizedObjects() ||
+ !hasReservedCallFrame(MF);
+
+ if (RVVStackSize) {
+ // If restoreFP the stack pointer will be restored using the frame pointer
+ // value.
+ if (!RestoreFP) {
adjustStackForRVV(MF, MBB, LastFrameDestroy, DL, RVVStackSize,
MachineInstr::FrameDestroy);
+ }
}
- uint64_t FirstSPAdjustAmount = getFirstSPAdjustAmount(MF);
if (FirstSPAdjustAmount) {
uint64_t SecondSPAdjustAmount =
getStackSizeWithRVVPadding(MF) - FirstSPAdjustAmount;
assert(SecondSPAdjustAmount > 0 &&
"SecondSPAdjustAmount should be greater than zero");
- RI->adjustReg(MBB, LastFrameDestroy, DL, SPReg, SPReg,
- StackOffset::getFixed(SecondSPAdjustAmount),
- MachineInstr::FrameDestroy, getStackAlign());
+ // If restoreFP the stack pointer will be restored using the frame pointer
+ // value.
+ if (!RestoreFP) {
+ RI->adjustReg(MBB, LastFrameDestroy, DL, SPReg, SPReg,
+ StackOffset::getFixed(SecondSPAdjustAmount),
+ MachineInstr::FrameDestroy, getStackAlign());
+ }
}
- if (FirstSPAdjustAmount)
- StackSize = FirstSPAdjustAmount;
+ // Restore the stack pointer using the value of the frame pointer. Only
+ // necessary if the stack pointer was modified, meaning the stack size is
+ // unknown.
+ //
+ // In order to make sure the stack point is right through the EH region,
+ // we also need to restore stack pointer from the frame pointer if we
+ // don't preserve stack space within prologue/epilogue for outgoing variables,
+ // normally it's just checking the variable sized object is present or not
+ // is enough, but we also don't preserve that at prologue/epilogue when
+ // have vector objects in stack.
+ if (RestoreFP) {
+ RI->adjustReg(MBB, LastFrameDestroy, DL, SPReg, FPReg,
+ StackOffset::getFixed(-FPOffset), MachineInstr::FrameDestroy,
+ getStackAlign());
+ }
- if (RVFI->isPushable(MF) && MBBI != MBB.end() &&
- MBBI->getOpcode() == RISCV::CM_POP) {
+ bool ApplyPop = RVFI->isPushable(MF) && MBBI != MBB.end() &&
+ MBBI->getOpcode() == RISCV::CM_POP;
+ if (ApplyPop) {
// Use available stack adjustment in pop instruction to deallocate stack
// space. Align the stack size down to a multiple of 16. This is needed for
// RVE.
@@ -846,14 +870,19 @@ void RISCVFrameLowering::emitEpilogue(MachineFunction &MF,
uint64_t Spimm = std::min(alignDown(StackSize, 16), (uint64_t)48);
MBBI->getOperand(1).setImm(Spimm);
StackSize -= Spimm;
- }
- // Deallocate stack
- if (StackSize != 0) {
- RI->adjustReg(MBB, MBBI, DL, SPReg, SPReg, StackOffset::getFixed(StackSize),
- MachineInstr::FrameDestroy, getStackAlign());
+ if (StackSize != 0)
+ deallocateStack(MF, MBB, MBBI, DL, StackSize,
+ /*stack_adj of cm.pop instr*/ RealStackSize - StackSize);
+
+ MBBI = std::next(MBBI);
}
+ // Deallocate stack if StackSize isn't a zero and if we didn't already do it
+ // during cm.pop handling.
+ if (StackSize != 0 && !ApplyPop)
+ deallocateStack(MF, MBB, MBBI, DL, StackSize, 0);
+
// Emit epilogue for shadow call stack.
emitSCSEpilogue(MF, MBB, MBBI, DL);
}
@@ -1566,6 +1595,7 @@ void RISCVFrameLowering::emitCalleeSavedRVVPrologCFI(
int FI = CS.getFrameIdx();
if (FI >= 0 && MFI.getStackID(FI) == TargetStackID::ScalableVector) {
MCRegister BaseReg = TRI.getSubReg(CS.getReg(), RISCV::sub_vrm1_0);
+
// If it's not a grouped vector register, it doesn't have subregister, so
// the base register is just itself.
if (BaseReg == RISCV::NoRegister)
diff --git a/llvm/lib/Target/RISCV/RISCVFrameLowering.h b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
index 28ab4aff3b9d51..89f95f2aa04aa6 100644
--- a/llvm/lib/Target/RISCV/RISCVFrameLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVFrameLowering.h
@@ -88,9 +88,15 @@ class RISCVFrameLowering : public TargetFrameLowering {
void adjustStackForRVV(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
int64_t Amount, MachineInstr::MIFlag Flag) const;
+
void emitCalleeSavedRVVPrologCFI(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
bool HasFP) const;
+
+ void deallocateStack(MachineFunction &MF, MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
+ uint64_t StackSize, int64_t CFAOffset) const;
+
std::pair<int64_t, Align>
assignRVVStackObjectOffsets(MachineFunction &MF) const;
};
diff --git a/llvm/test/CodeGen/RISCV/branch-relaxation.ll b/llvm/test/CodeGen/RISCV/branch-relaxation.ll
index 3d48dc9637eaed..ec77d54da116d3 100644
--- a/llvm/test/CodeGen/RISCV/branch-relaxation.ll
+++ b/llvm/test/CodeGen/RISCV/branch-relaxation.ll
@@ -824,10 +824,7 @@ define void @relax_jal_spill_32_adjust_spill_slot() {
; CHECK-RV32-NEXT: #APP
; CHECK-RV32-NEXT: # reg use t6
; CHECK-RV32-NEXT: #NO_APP
-; CHECK-RV32-NEXT: lui a0, 2
-; CHECK-RV32-NEXT: sub sp, s0, a0
-; CHECK-RV32-NEXT: addi a0, a0, -2032
-; CHECK-RV32-NEXT: add sp, sp, a0
+; CHECK-RV32-NEXT: addi sp, s0, -2032
; CHECK-RV32-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; CHECK-RV32-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; CHECK-RV32-NEXT: lw s1, 2020(sp) # 4-byte Folded Reload
@@ -1073,10 +1070,7 @@ define void @relax_jal_spill_32_adjust_spill_slot() {
; CHECK-RV64-NEXT: #APP
; CHECK-RV64-NEXT: # reg use t6
; CHECK-RV64-NEXT: #NO_APP
-; CHECK-RV64-NEXT: lui a0, 2
-; CHECK-RV64-NEXT: sub sp, s0, a0
-; CHECK-RV64-NEXT: addiw a0, a0, -2032
-; CHECK-RV64-NEXT: add sp, sp, a0
+; CHECK-RV64-NEXT: addi sp, s0, -2032
; CHECK-RV64-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; CHECK-RV64-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; CHECK-RV64-NEXT: ld s1, 2008(sp) # 8-byte Folded Reload
@@ -2323,10 +2317,7 @@ define void @relax_jal_spill_64_adjust_spill_slot() {
; CHECK-RV32-NEXT: #APP
; CHECK-RV32-NEXT: # reg use t6
; CHECK-RV32-NEXT: #NO_APP
-; CHECK-RV32-NEXT: lui a0, 2
-; CHECK-RV32-NEXT: sub sp, s0, a0
-; CHECK-RV32-NEXT: addi a0, a0, -2032
-; CHECK-RV32-NEXT: add sp, sp, a0
+; CHECK-RV32-NEXT: addi sp, s0, -2032
; CHECK-RV32-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; CHECK-RV32-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; CHECK-RV32-NEXT: lw s1, 2020(sp) # 4-byte Folded Reload
@@ -2560,10 +2551,7 @@ define void @relax_jal_spill_64_adjust_spill_slot() {
; CHECK-RV64-NEXT: #APP
; CHECK-RV64-NEXT: # reg use t6
; CHECK-RV64-NEXT: #NO_APP
-; CHECK-RV64-NEXT: lui a0, 2
-; CHECK-RV64-NEXT: sub sp, s0, a0
-; CHECK-RV64-NEXT: addiw a0, a0, -2032
-; CHECK-RV64-NEXT: add sp, sp, a0
+; CHECK-RV64-NEXT: addi sp, s0, -2032
; CHECK-RV64-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; CHECK-RV64-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; CHECK-RV64-NEXT: ld s1, 2008(sp) # 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/RISCV/rvv/addi-scalable-offset.mir b/llvm/test/CodeGen/RISCV/rvv/addi-scalable-offset.mir
index 5f0e1a9b9aa24c..43fb0c10ca46f6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/addi-scalable-offset.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/addi-scalable-offset.mir
@@ -46,9 +46,7 @@ body: |
; CHECK-NEXT: $x10 = ADDI killed $x10, -2048
; CHECK-NEXT: $x10 = ADDI killed $x10, -224
; CHECK-NEXT: VS1R_V killed renamable $v8, killed renamable $x10
- ; CHECK-NEXT: $x2 = frame-destroy ADDI $x8, -2048
- ; CHECK-NEXT: $x2 = frame-destroy ADDI killed $x2, -224
- ; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 240
+ ; CHECK-NEXT: $x2 = frame-destroy ADDI $x8, -2032
; CHECK-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.3)
; CHECK-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.4)
; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 2032
diff --git a/llvm/test/CodeGen/RISCV/rvv/callee-saved-regs.ll b/llvm/test/CodeGen/RISCV/rvv/callee-saved-regs.ll
index c1ce2e988fc511..c0b10be847d1ff 100644
--- a/llvm/test/CodeGen/RISCV/rvv/callee-saved-regs.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/callee-saved-regs.ll
@@ -109,9 +109,7 @@ define riscv_vector_cc void @local_stack_allocation_frame_pointer() "frame-point
; SPILL-O2-NEXT: addi sp, sp, -480
; SPILL-O2-NEXT: lbu a0, -1912(s0)
; SPILL-O2-NEXT: sb a0, -1912(s0)
-; SPILL-O2-NEXT: addi sp, s0, -2048
-; SPILL-O2-NEXT: addi sp, sp, -464
-; SPILL-O2-NEXT: addi sp, sp, 480
+; SPILL-O2-NEXT: addi sp, s0, -2032
; SPILL-O2-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; SPILL-O2-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; SPILL-O2-NEXT: addi sp, sp, 2032
diff --git a/llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir b/llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
index c4bc794b8aeb38..1b9ce12af01f96 100644
--- a/llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
@@ -150,9 +150,7 @@ body: |
; CHECK-NEXT: PseudoBR %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
- ; CHECK-NEXT: $x2 = frame-destroy ADDI $x8, -2048
- ; CHECK-NEXT: $x2 = frame-destroy ADDI killed $x2, -256
- ; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 272
+ ; CHECK-NEXT: $x2 = frame-destroy ADDI $x8, -2032
; CHECK-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.3)
; CHECK-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.4)
; CHECK-NEXT: $x18 = LD $x2, 2008 :: (load (s64) from %stack.5)
diff --git a/llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir b/llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir
index b4d8805b65bd8f..22a7425bf98b8e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/large-rvv-stack-size.mir
@@ -33,9 +33,7 @@
; CHECK-NEXT: vs1r.v v25, (a0) # Unknown-size Folded Spill
; CHECK-NEXT: ld a0, 8(sp)
; CHECK-NEXT: call spillslot
- ; CHECK-NEXT: addi sp, s0, -2048
- ; CHECK-NEXT: addi sp, sp, -256
- ; CHECK-NEXT: addi sp, sp, 272
+ ; CHECK-NEXT: addi sp, s0, -2032
; CHECK-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; CHECK-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; CHECK-NEXT: addi sp, sp, 2032
diff --git a/llvm/test/CodeGen/RISCV/stack-realignment.ll b/llvm/test/CodeGen/RISCV/stack-realignment.ll
index 034ebadc76af26..58876e4888a913 100644
--- a/llvm/test/CodeGen/RISCV/stack-realignment.ll
+++ b/llvm/test/CodeGen/RISCV/stack-realignment.ll
@@ -815,8 +815,7 @@ define void @caller1024() {
; RV32I-NEXT: andi sp, sp, -1024
; RV32I-NEXT: addi a0, sp, 1024
; RV32I-NEXT: call callee
-; RV32I-NEXT: addi sp, s0, -2048
-; RV32I-NEXT: addi sp, sp, 16
+; RV32I-NEXT: addi sp, s0, -2032
; RV32I-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 2032
@@ -836,8 +835,7 @@ define void @caller1024() {
; RV32I-ILP32E-NEXT: andi sp, sp, -1024
; RV32I-ILP32E-NEXT: addi a0, sp, 1024
; RV32I-ILP32E-NEXT: call callee
-; RV32I-ILP32E-NEXT: addi sp, s0, -2048
-; RV32I-ILP32E-NEXT: addi sp, sp, 4
+; RV32I-ILP32E-NEXT: addi sp, s0, -2044
; RV32I-ILP32E-NEXT: lw ra, 2040(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: lw s0, 2036(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: addi sp, sp, 2044
@@ -857,8 +855,7 @@ define void @caller1024() {
; RV64I-NEXT: andi sp, sp, -1024
; RV64I-NEXT: addi a0, sp, 1024
; RV64I-NEXT: call callee
-; RV64I-NEXT: addi sp, s0, -2048
-; RV64I-NEXT: addi sp, sp, 16
+; RV64I-NEXT: addi sp, s0, -2032
; RV64I-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; RV64I-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 2032
@@ -878,8 +875,7 @@ define void @caller1024() {
; RV64I-LP64E-NEXT: andi sp, sp, -1024
; RV64I-LP64E-NEXT: addi a0, sp, 1024
; RV64I-LP64E-NEXT: call callee
-; RV64I-LP64E-NEXT: addi sp, s0, -2048
-; RV64I-LP64E-NEXT: addi sp, sp, 8
+; RV64I-LP64E-NEXT: addi sp, s0, -2040
; RV64I-LP64E-NEXT: ld ra, 2032(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: ld s0, 2024(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: addi sp, sp, 2040
@@ -959,10 +955,7 @@ define void @caller2048() {
; RV32I-NEXT: addi a0, sp, 2047
; RV32I-NEXT: addi a0, a0, 1
; RV32I-NEXT: call callee
-; RV32I-NEXT: lui a0, 1
-; RV32I-NEXT: sub sp, s0, a0
-; RV32I-NEXT: addi sp, sp, 2032
-; RV32I-NEXT: addi sp, sp, 32
+; RV32I-NEXT: addi sp, s0, -2032
; RV32I-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 2032
@@ -984,10 +977,7 @@ define void @caller2048() {
; RV32I-ILP32E-NEXT: addi a0, sp, 2047
; RV32I-ILP32E-NEXT: addi a0, a0, 1
; RV32I-ILP32E-NEXT: call callee
-; RV32I-ILP32E-NEXT: lui a0, 1
-; RV32I-ILP32E-NEXT: sub sp, s0, a0
-; RV32I-ILP32E-NEXT: addi sp, sp, 2044
-; RV32I-ILP32E-NEXT: addi sp, sp, 8
+; RV32I-ILP32E-NEXT: addi sp, s0, -2044
; RV32I-ILP32E-NEXT: lw ra, 2040(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: lw s0, 2036(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: addi sp, sp, 2044
@@ -1009,10 +999,7 @@ define void @caller2048() {
; RV64I-NEXT: addi a0, sp, 2047
; RV64I-NEXT: addi a0, a0, 1
; RV64I-NEXT: call callee
-; RV64I-NEXT: lui a0, 1
-; RV64I-NEXT: sub sp, s0, a0
-; RV64I-NEXT: addi sp, sp, 2032
-; RV64I-NEXT: addi sp, sp, 32
+; RV64I-NEXT: addi sp, s0, -2032
; RV64I-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; RV64I-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 2032
@@ -1034,10 +1021,7 @@ define void @caller2048() {
; RV64I-LP64E-NEXT: addi a0, sp, 2047
; RV64I-LP64E-NEXT: addi a0, a0, 1
; RV64I-LP64E-NEXT: call callee
-; RV64I-LP64E-NEXT: lui a0, 1
-; RV64I-LP64E-NEXT: sub sp, s0, a0
-; RV64I-LP64E-NEXT: addi sp, sp, 2040
-; RV64I-LP64E-NEXT: addi sp, sp, 16
+; RV64I-LP64E-NEXT: addi sp, s0, -2040
; RV64I-LP64E-NEXT: ld ra, 2032(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: ld s0, 2024(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: addi sp, sp, 2040
@@ -1119,10 +1103,7 @@ define void @caller4096() {
; RV32I-NEXT: lui a0, 1
; RV32I-NEXT: add a0, sp, a0
; RV32I-NEXT: call callee
-; RV32I-NEXT: lui a0, 2
-; RV32I-NEXT: sub sp, s0, a0
-; RV32I-NEXT: addi a0, a0, -2032
-; RV32I-NEXT: add sp, sp, a0
+; RV32I-NEXT: addi sp, s0, -2032
; RV32I-NEXT: lw ra, 2028(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 2024(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 2032
@@ -1146,10 +1127,7 @@ define void @caller4096() {
; RV32I-ILP32E-NEXT: lui a0, 1
; RV32I-ILP32E-NEXT: add a0, sp, a0
; RV32I-ILP32E-NEXT: call callee
-; RV32I-ILP32E-NEXT: lui a0, 2
-; RV32I-ILP32E-NEXT: sub sp, s0, a0
-; RV32I-ILP32E-NEXT: addi a0, a0, -2044
-; RV32I-ILP32E-NEXT: add sp, sp, a0
+; RV32I-ILP32E-NEXT: addi sp, s0, -2044
; RV32I-ILP32E-NEXT: lw ra, 2040(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: lw s0, 2036(sp) # 4-byte Folded Reload
; RV32I-ILP32E-NEXT: addi sp, sp, 2044
@@ -1173,10 +1151,7 @@ define void @caller4096() {
; RV64I-NEXT: lui a0, 1
; RV64I-NEXT: add a0, sp, a0
; RV64I-NEXT: call callee
-; RV64I-NEXT: lui a0, 2
-; RV64I-NEXT: sub sp, s0, a0
-; RV64I-NEXT: addiw a0, a0, -2032
-; RV64I-NEXT: add sp, sp, a0
+; RV64I-NEXT: addi sp, s0, -2032
; RV64I-NEXT: ld ra, 2024(sp) # 8-byte Folded Reload
; RV64I-NEXT: ld s0, 2016(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 2032
@@ -1200,10 +1175,7 @@ define void @caller4096() {
; RV64I-LP64E-NEXT: lui a0, 1
; RV64I-LP64E-NEXT: add a0, sp, a0
; RV64I-LP64E-NEXT: call callee
-; RV64I-LP64E-NEXT: lui a0, 2
-; RV64I-LP64E-NEXT: sub sp, s0, a0
-; RV64I-LP64E-NEXT: addiw a0, a0, -2040
-; RV64I-LP64E-NEXT: add sp, sp, a0
+; RV64I-LP64E-NEXT: addi sp, s0, -2040
; RV64I-LP64E-NEXT: ld ra, 2032(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: ld s0, 2024(sp) # 8-byte Folded Reload
; RV64I-LP64E-NEXT: addi sp, sp, 2040
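The `cm.pop` branch of `emitEpilogue` in the diff above splits the remaining stack size between the adjustment folded into `cm.pop` and a separate SP adjustment. The following is a minimal standalone sketch of that arithmetic only, not LLVM code: `alignDownTo`, `PopSplit`, `splitForPop`, and `checkSplit` are hypothetical names, and `alignDownTo` merely stands in for `alignDown` as it is used in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for alignDown as used in the diff:
// round Value down to the nearest multiple of Align.
constexpr uint64_t alignDownTo(uint64_t Value, uint64_t Align) {
  return Value / Align * Align;
}

// Hypothetical result type for illustration.
struct PopSplit {
  uint64_t Spimm;     // stack adjustment folded into the cm.pop operand
  uint64_t Remaining; // bytes left for a separate SP adjustment
};

// Mirrors the arithmetic in the diff:
//   Spimm = std::min(alignDown(StackSize, 16), (uint64_t)48);
//   StackSize -= Spimm;
constexpr PopSplit splitForPop(uint64_t StackSize) {
  uint64_t Spimm = std::min(alignDownTo(StackSize, 16), uint64_t{48});
  return {Spimm, StackSize - Spimm};
}

// Small usage check: a 40-byte frame folds 32 bytes into cm.pop
// and leaves 8 bytes for a separate adjustment.
inline void checkSplit() {
  assert(splitForPop(40).Spimm == 32);
  assert(splitForPop(40).Remaining == 8);
}
```

When `Remaining` is nonzero, the patch hands that amount to the new `deallocateStack` helper inside the `ApplyPop` branch; otherwise no extra SP adjustment is emitted.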
@topperc @kito-cheng FYI
a58cf3f to ac123f9
LGTM :)
This patch needs a better description. Is it fixing a correctness issue or just making code more optimal?
I've updated the description.
!hasReservedCallFrame(MF);

if (RVVStackSize) {
  // If restoreFP the stack pointer will be restored using the frame pointer
restoreFP -> RestoreFP
addressed
assert(SecondSPAdjustAmount > 0 &&
       "SecondSPAdjustAmount should be greater than zero");

// If restoreFP the stack pointer will be restored using the frame pointer
restoreFP -> RestoreFP
addressed
if (RVVStackSize) {
  // If restoreFP the stack pointer will be restored using the frame pointer
  // value.
  if (!RestoreFP) {
Drop curly braces
addressed
// If restoreFP the stack pointer will be restored using the frame pointer
// value.
if (!RestoreFP) {
Drop curly braces
addressed
ac123f9 to 536cbea
This patch fixes SP register recovery in the function epilogue.
LGTM
Thanks for taking a look at the patch.
Currently, when the fp register is present and sp is adjusted a second time, sp recovery in the function epilogue is not performed in the best way, for example:
```
lui a0, 2
sub sp, s0, a0
addi a0, a0, -2044
add sp, sp, a0
```
This patch improves sp register recovery in such cases, and the code snippet above becomes:
```
addi sp, s0, -2044
```
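The equivalence of the two sequences can be checked with plain integer arithmetic. The sketch below is not LLVM code: `oldRecovery`, `newRecovery`, `fitsInSImm12`, and `checkAt` are hypothetical names, and registers are modeled as 64-bit integers. It relies on two facts about the ISA: `lui a0, 2` loads `2 << 12 = 8192`, and `addi` accepts a 12-bit signed immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the old epilogue sequence from the description:
//   lui  a0, 2
//   sub  sp, s0, a0
//   addi a0, a0, -2044
//   add  sp, sp, a0
constexpr int64_t oldRecovery(int64_t s0) {
  int64_t a0 = int64_t{2} << 12; // lui places its immediate in bits 31..12
  int64_t sp = s0 - a0;          // sp = s0 - 8192
  a0 += -2044;                   // a0 = 6148
  sp += a0;                      // sp = s0 - 2044
  return sp;
}

// Hypothetical model of the new single-instruction sequence:
//   addi sp, s0, -2044
constexpr int64_t newRecovery(int64_t s0) { return s0 - 2044; }

// addi takes a 12-bit signed immediate, so offsets must lie in [-2048, 2047].
constexpr bool fitsInSImm12(int64_t imm) { return imm >= -2048 && imm <= 2047; }

// The final offset fits in one addi; the intermediate -8192 would not.
static_assert(fitsInSImm12(-2044), "new offset fits a single addi");
static_assert(!fitsInSImm12(-8192), "full frame offset needs more instructions");

// Runtime check for an arbitrary frame pointer value.
inline void checkAt(int64_t s0) { assert(oldRecovery(s0) == newRecovery(s0)); }
```

Both sequences leave sp at `s0 - 2044`; the old one also clobbers `a0` as a scratch register, which the single `addi` avoids.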
I believe this change is the cause of a miscompilation I'm seeing (automatically linked above). Please take a look, and kindly revert the change until that issue is resolved.
Hello,
@bscarlet, @joanahalili Here is what I've tried to do:
It does not look like my patch is the reason for the failure. Besides, could you please elaborate on how I can reproduce the issue using the existing codebase?
Could you please include the complete flags you use to build LLVM so that I can reproduce your entire procedure? Did you use the flags I included in my reproduction case in your steps 3 & 5?
Oh, I'm sorry, I compiled your example with the wrong clang, my bad. Using the right version this time, I got almost identical snippets: with_patch.txt, without_patch.txt. In fact, they differ only in the clang commit hash, so you can confirm that the first one was compiled with my patch and the second one without it.
Yep
We can now confirm that despite initial appearances, this patch does not seem to be the culprit. I've closed the corresponding issue. Apologies for the false alarm, and thank you for helping investigate.
uint64_t StackSize = FirstSPAdjustAmount ? FirstSPAdjustAmount
                                         : getStackSizeWithRVVPadding(MF) -
                                               RVFI->getReservedSpillsSize();
uint64_t FPOffset = FirstSPAdjustAmount ? FirstSPAdjustAmount
Can you please double-check that this line is correct?
When I change it to
uint64_t FPOffset = RealStackSize - RVFI->getVarArgsSaveSize();
our tests pass.
(Looks like the FP offset should always depend on getVarArgsSaveSize(), even if FirstSPAdjustAmount != 0?)
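A small model illustrates the concern. It rests on an assumption that is implied by the suggested formula but not shown in this diff: that the prologue sets `fp = sp + RealStackSize - getVarArgsSaveSize()` right after the first SP adjustment. Under that assumption, restoring sp with an offset of `FirstSPAdjustAmount` alone overshoots by the varargs save size. All names below (`Frame`, `fpAfterPrologue`, `restoreSpPatch`, `restoreSpSuggested`, `checkVarargsFrame`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical frame description for illustration.
struct Frame {
  int64_t FirstSPAdjust;   // size of the first SP adjustment
  int64_t VarArgsSaveSize; // bytes reserved for the varargs save area
};

// ASSUMPTION (not shown in the diff): the prologue computes fp from sp
// after the first adjustment, excluding the varargs save area.
constexpr int64_t fpAfterPrologue(int64_t SpAfterFirstAdjust, Frame F) {
  return SpAfterFirstAdjust + F.FirstSPAdjust - F.VarArgsSaveSize;
}

// FPOffset as written in the patch when FirstSPAdjustAmount != 0.
constexpr int64_t restoreSpPatch(int64_t Fp, Frame F) {
  return Fp - F.FirstSPAdjust;
}

// FPOffset as suggested in the review: RealStackSize - getVarArgsSaveSize().
constexpr int64_t restoreSpSuggested(int64_t Fp, Frame F) {
  return Fp - (F.FirstSPAdjust - F.VarArgsSaveSize);
}

// Usage check with made-up numbers: a 2032-byte first adjustment and a
// 64-byte varargs save area.
inline void checkVarargsFrame() {
  constexpr Frame F{2032, 64};
  constexpr int64_t Sp = 10000;
  constexpr int64_t Fp = fpAfterPrologue(Sp, F);
  assert(restoreSpSuggested(Fp, F) == Sp);                  // sp recovered
  assert(restoreSpPatch(Fp, F) == Sp - F.VarArgsSaveSize);  // off by 64
}
```

In this model the two formulas agree whenever `VarArgsSaveSize` is zero, which would be consistent with the problem showing up only in varargs functions.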
Could you provide a reproduction, please?
Could you follow the steps from my comment #110809 (comment) and confirm that the first executable crashes and the second one doesn't?
I'm not sure about crashing, but looking at the disassembly at https://godbolt.org/z/z8Kehr5ob: The new code for function foo has prologue (line 68):
and epilogue (line 91):
@dlav-sc is the above sufficient? Can you please take a look soon? This is blocking our internal testing.
Yes, I've reproduced your problem and am preparing a fix now. I hope to open a PR within an hour. Thank you for your example, I'll make a test from it.
This patch fixes sp recovery in the epilogue of varargs functions when the fp register is present and a second sp adjustment is applied. Source of the issue: #110809