Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

Closed
JonPsson1 opened this issue Jan 22, 2024 · 4 comments

Comments

@JonPsson1
Copy link
Contributor

tc_regscav.tar.gz

llc -mtriple=s390x-linux-gnu -mcpu=z16 ./tc_regscav.ll -misched=shuffle

LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: ./bin/llc -mtriple=s390x-linux-gnu -mcpu=z16 ./tc_regscav.ll -misched=shuffle

  1. Running pass 'Function Pass Manager' on module './tc_regscav.ll'.
  2. Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@fun'

@uweigand This test case has i128 registers, but it is an ADDR64 that needs scavenging...

@uweigand
Copy link
Member

This function has two objects in the frame:

  fi#0: size=196, align=4, at location [SP]
  fi#1: size=3360, align=8, at location [SP]

where fi#0 ends up at SP + 164, and fi#1 ends up at SP + 360.

The problem occurs when finalizing the following instruction:

  VST renamable $v0, %stack.1.A1, 3872, renamable $r1d :: (store (s128) into %ir.scevgep7, align 8)

where %stack.1.A1 + 3872 gets resolved as SP + 4232, where the offset is >= 4096 and thus out of range for the VST instruction.

That's when LLVM is now looking to scavenge an address register to hold SP + 4232 (actually, SP + 4096, and keep the extra 136 as displacement), and does not find any. The reason for this is that back-end does not allocate any emergency scavenge slots, because it assumes they cannot be needed as the total stack frame size of this function is < 4096 bytes.

Now, where does the large offset come frome? In fact, even %stack.1.A1 + 3872 looks invalid at first glance as the A1 object has a total size of only 3360 bytes. However, there is an extra offset in the $r1d register, and this is set up to hold a negative value of -3360, which makes the overall target address valid again. But we still get the displacement overflow in the interim ...

These offsets are set up as a result of the Loop Strength Reduction pass - not sure why it thinks this transformation useful:

; Loop: 
for.body:                                         ; preds = %for.body, %entry
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ -3360, %entry ]
  %IV = phi i64 [ 0, %entry ], [ %IV.next, %for.body ]
  %ArrIdx0 = getelementptr inbounds [3 x [7 x [10 x i128]]], ptr %A1, i64 0, i64 %IV, i64 0, i64 6
  %scevgep6 = getelementptr i8, ptr %A1, i64 %lsr.iv 
  %scevgep7 = getelementptr i8, ptr %scevgep6, i64 3872
  store i128 17174966165894859678, ptr %scevgep7, align 8
  %scevgep4 = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep5 = getelementptr i8, ptr %scevgep4, i64 4144
  store i128 17174966165894859678, ptr %scevgep5, align 8
  %scevgep2 = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep3 = getelementptr i8, ptr %scevgep2, i64 4416
  store i128 17174966165894859678, ptr %scevgep3, align 8
  %scevgep = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep1 = getelementptr i8, ptr %scevgep, i64 4448
  store i128 17174966165894859678, ptr %scevgep1, align 8
  store i128 17174966165894859678, ptr poison, align 8
  %IV.next = add nuw nsw i64 %IV, 1
  %lsr.iv.next = add nsw i64 %lsr.iv, 1120
  %exitcond.not.i.i = icmp eq i64 %lsr.iv.next, 0 
  br i1 %exitcond.not.i.i, label %exit, label %for.body

It's unclear how to fix this - all other targets also seem to never allocate an emergency scavenge slot for functions with small total frame size, and it has always seemed to work so far.

@JonPsson1
Copy link
Contributor Author

I see this in the debug output from LSR:


After filtering out undesirable candidates:
LSR is examining the following uses:
  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i64
    reg({3,+,-1}<nsw><%for.body>)
    -4448 + reg({1088,+,1120}<nuw><nsw><%for.body>)
    -3360 + reg({0,+,1120}<nuw><nsw><%for.body>)
    reg({-3360,+,1120}<nuw><nsw><%for.body>)
  LSR Use: Kind=Address of i128 in addrspace(0), Offsets={1088,1056,784,512}, widest fixup type: ptr
    reg({%A1,+,1120}<%for.body>)
    reg(%A1) + 1*reg({0,+,1120}<nuw><nsw><%for.body>)
    -1088 + reg(%A1) + 1*reg({1088,+,1120}<nuw><nsw><%for.body>)
    3360 + reg(%A1) + 1*reg({-3360,+,1120}<nuw><nsw><%for.body>)
New best at 2 instructions 2 regs, with addrec cost 2, plus 46 imm cost, plus 2 setup cost.
Regs:
- {3,+,-1}<nsw><%for.body>
- {%A1,+,1120}<%for.body>

New best at 2 instructions 2 regs, with addrec cost 1, plus 41 imm cost, plus 2 setup cost.
Regs:
- {1088,+,1120}<nuw><nsw><%for.body>
- %A1

New best at 1 instruction 2 regs, with addrec cost 1, plus 55 imm cost, plus 2 setup cost.
Regs:
- {-3360,+,1120}<nuw><nsw><%for.body>
- %A1

It seems it is deciding on the IV that will end on 0 which is good for the ICmpZero and that also works with the Address uses, all with a single IV increment. I guess that is useful generally, but in this test case it is no real improvement. The fact that the Address uses end going out of range seems to be due to SystemZTargetLowering::isLegalAddressingMode() not requesting a 12-bit displacement for i128 as it does for vector types. So LSR thinks that the VST will be in range which is not correct.

Patch for isLegalAddressingMode(): #79221. This seems to fix this in a way, but the backend should have probably still have been able to cope with this regardless.

@JonPsson1
Copy link
Contributor Author

I am not sure if #79221 is a full handling of this or not. In one way, the backend should/could be able to handle this, but on the other hand the TargetLowering isLegalAddressingMode() should reflect if the addressing is legal or not, so one could perhaps argue that all addressing must conform to this after lowering.

A further handling in the backend looks like doing a full scan of instructions after frame objects have been ordered in calculateFrameObjectOffsets(). This could be done in processFunctionBeforeFrameIndicesReplaced(). The immediate offset of e.g. a VST + the offset of the FI could be checked then.

@uweigand
Copy link
Member

I am not sure if #79221 is a full handling of this or not. In one way, the backend should/could be able to handle this, but on the other hand the TargetLowering isLegalAddressingMode() should reflect if the addressing is legal or not, so one could perhaps argue that all addressing must conform to this after lowering.

A further handling in the backend looks like doing a full scan of instructions after frame objects have been ordered in calculateFrameObjectOffsets(). This could be done in processFunctionBeforeFrameIndicesReplaced(). The immediate offset of e.g. a VST + the offset of the FI could be checked then.

I'm a bit reluctant to add additional compile time overhead, in particular given that no other target seems to do anything along those lines. Let's add the fix in #79221 (which is necessary anyway), and then see if there's any further problems ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants