[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

JonPsson1 · 2024-01-22T20:03:18Z

tc_regscav.tar.gz

llc -mtriple=s390x-linux-gnu -mcpu=z16 ./tc_regscav.ll -misched=shuffle

LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: ./bin/llc -mtriple=s390x-linux-gnu -mcpu=z16 ./tc_regscav.ll -misched=shuffle

Running pass 'Function Pass Manager' on module './tc_regscav.ll'.
Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@fun'

@uweigand This test case has i128 registers, but it is an ADDR64 that needs scavenging...

uweigand · 2024-01-23T15:59:59Z

This function has two objects in the frame:

  fi#0: size=196, align=4, at location [SP]
  fi#1: size=3360, align=8, at location [SP]

where fi#0 ends up at SP + 164, and fi#1 ends up at SP + 360.

The problem occurs when finalizing the following instruction:

  VST renamable $v0, %stack.1.A1, 3872, renamable $r1d :: (store (s128) into %ir.scevgep7, align 8)

where %stack.1.A1 + 3872 gets resolved as SP + 4232, where the offset is >= 4096 and thus out of range for the VST instruction.

That's when LLVM is now looking to scavenge an address register to hold SP + 4232 (actually, SP + 4096, and keep the extra 136 as displacement), and does not find any. The reason for this is that back-end does not allocate any emergency scavenge slots, because it assumes they cannot be needed as the total stack frame size of this function is < 4096 bytes.

Now, where does the large offset come frome? In fact, even %stack.1.A1 + 3872 looks invalid at first glance as the A1 object has a total size of only 3360 bytes. However, there is an extra offset in the $r1d register, and this is set up to hold a negative value of -3360, which makes the overall target address valid again. But we still get the displacement overflow in the interim ...

These offsets are set up as a result of the Loop Strength Reduction pass - not sure why it thinks this transformation useful:

; Loop: 
for.body:                                         ; preds = %for.body, %entry
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ -3360, %entry ]
  %IV = phi i64 [ 0, %entry ], [ %IV.next, %for.body ]
  %ArrIdx0 = getelementptr inbounds [3 x [7 x [10 x i128]]], ptr %A1, i64 0, i64 %IV, i64 0, i64 6
  %scevgep6 = getelementptr i8, ptr %A1, i64 %lsr.iv 
  %scevgep7 = getelementptr i8, ptr %scevgep6, i64 3872
  store i128 17174966165894859678, ptr %scevgep7, align 8
  %scevgep4 = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep5 = getelementptr i8, ptr %scevgep4, i64 4144
  store i128 17174966165894859678, ptr %scevgep5, align 8
  %scevgep2 = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep3 = getelementptr i8, ptr %scevgep2, i64 4416
  store i128 17174966165894859678, ptr %scevgep3, align 8
  %scevgep = getelementptr i8, ptr %A1, i64 %lsr.iv
  %scevgep1 = getelementptr i8, ptr %scevgep, i64 4448
  store i128 17174966165894859678, ptr %scevgep1, align 8
  store i128 17174966165894859678, ptr poison, align 8
  %IV.next = add nuw nsw i64 %IV, 1
  %lsr.iv.next = add nsw i64 %lsr.iv, 1120
  %exitcond.not.i.i = icmp eq i64 %lsr.iv.next, 0 
  br i1 %exitcond.not.i.i, label %exit, label %for.body

It's unclear how to fix this - all other targets also seem to never allocate an emergency scavenge slot for functions with small total frame size, and it has always seemed to work so far.

JonPsson1 · 2024-01-23T23:13:35Z

I see this in the debug output from LSR:


After filtering out undesirable candidates:
LSR is examining the following uses:
  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i64
    reg({3,+,-1}<nsw><%for.body>)
    -4448 + reg({1088,+,1120}<nuw><nsw><%for.body>)
    -3360 + reg({0,+,1120}<nuw><nsw><%for.body>)
    reg({-3360,+,1120}<nuw><nsw><%for.body>)
  LSR Use: Kind=Address of i128 in addrspace(0), Offsets={1088,1056,784,512}, widest fixup type: ptr
    reg({%A1,+,1120}<%for.body>)
    reg(%A1) + 1*reg({0,+,1120}<nuw><nsw><%for.body>)
    -1088 + reg(%A1) + 1*reg({1088,+,1120}<nuw><nsw><%for.body>)
    3360 + reg(%A1) + 1*reg({-3360,+,1120}<nuw><nsw><%for.body>)
New best at 2 instructions 2 regs, with addrec cost 2, plus 46 imm cost, plus 2 setup cost.
Regs:
- {3,+,-1}<nsw><%for.body>
- {%A1,+,1120}<%for.body>

New best at 2 instructions 2 regs, with addrec cost 1, plus 41 imm cost, plus 2 setup cost.
Regs:
- {1088,+,1120}<nuw><nsw><%for.body>
- %A1

New best at 1 instruction 2 regs, with addrec cost 1, plus 55 imm cost, plus 2 setup cost.
Regs:
- {-3360,+,1120}<nuw><nsw><%for.body>
- %A1

It seems it is deciding on the IV that will end on 0 which is good for the ICmpZero and that also works with the Address uses, all with a single IV increment. I guess that is useful generally, but in this test case it is no real improvement. The fact that the Address uses end going out of range seems to be due to SystemZTargetLowering::isLegalAddressingMode() not requesting a 12-bit displacement for i128 as it does for vector types. So LSR thinks that the VST will be in range which is not correct.

Patch for isLegalAddressingMode(): #79221. This seems to fix this in a way, but the backend should have probably still have been able to cope with this regardless.

JonPsson1 · 2024-01-24T00:40:39Z

I am not sure if #79221 is a full handling of this or not. In one way, the backend should/could be able to handle this, but on the other hand the TargetLowering isLegalAddressingMode() should reflect if the addressing is legal or not, so one could perhaps argue that all addressing must conform to this after lowering.

A further handling in the backend looks like doing a full scan of instructions after frame objects have been ordered in calculateFrameObjectOffsets(). This could be done in processFunctionBeforeFrameIndicesReplaced(). The immediate offset of e.g. a VST + the offset of the FI could be checked then.

uweigand · 2024-01-24T08:19:51Z

I am not sure if #79221 is a full handling of this or not. In one way, the backend should/could be able to handle this, but on the other hand the TargetLowering isLegalAddressingMode() should reflect if the addressing is legal or not, so one could perhaps argue that all addressing must conform to this after lowering.

A further handling in the backend looks like doing a full scan of instructions after frame objects have been ordered in calculateFrameObjectOffsets(). This could be done in processFunctionBeforeFrameIndicesReplaced(). The immediate offset of e.g. a VST + the offset of the FI could be checked then.

I'm a bit reluctant to add additional compile time overhead, in particular given that no other target seems to do anything along those lines. Let's add the fix in #79221 (which is necessary anyway), and then see if there's any further problems ...

github-actions bot added the new issue label Jan 22, 2024

EugeneZelenko added backend:SystemZ and removed new issue labels Jan 22, 2024

JonPsson1 closed this as completed Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

JonPsson1 commented Jan 22, 2024

uweigand commented Jan 23, 2024

JonPsson1 commented Jan 23, 2024

JonPsson1 commented Jan 24, 2024

uweigand commented Jan 24, 2024

[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

[SystemZ] LLVM ERROR: Error while trying to spill R14D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! #79046

Comments

JonPsson1 commented Jan 22, 2024

uweigand commented Jan 23, 2024

JonPsson1 commented Jan 23, 2024

JonPsson1 commented Jan 24, 2024

uweigand commented Jan 24, 2024