incorrect coalescing of stores by AArch64 global isel backend #90242

regehr · 2024-04-26T18:00:30Z

This function is being lowered incorrectly by GlobalISel for AArch64:

@G = external global [10 x i32]

define void @f(i64 %0) {
  %2 = getelementptr [10 x i32], ptr @G, i64 0, i64 %0
  store i32 0, ptr %2, align 4
  store i32 0, ptr getelementptr inbounds ([10 x i32], ptr @G, i64 0, i64 1), align 4
  ret void
}

It should store 0 to both index 1 and index %0, but GlobalISel incorrectly coalesces these into a single 64-bit store:

_f:                                     ; @f
	adrp	x8, _G@GOTPAGE
	lsl	x9, x0, #2
	ldr	x8, [x8, _G@GOTPAGEOFF]
	str	xzr, [x8, x9]
	ret

cc @Hatsunespica
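A small C++ model makes the memory effects concrete. This is a sketch: `Array`, `intended`, and `emitted` are hypothetical names used only for illustration, not LLVM code. The IR performs two independent 4-byte stores. The emitted `lsl x9, x0, #2; str xzr, [x8, x9]` instead performs one 8-byte store at `G + 4*%0`, which zeroes elements `%0` and `%0 + 1`:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of @G = external global [10 x i32].
using Array = std::array<int32_t, 10>;

// What the IR asks for: two independent 4-byte stores of zero.
Array intended(Array g, int64_t idx) {
  assert(idx >= 0 && idx < 10);
  g[idx] = 0;  // store i32 0, ptr %2   (G + 4*%0)
  g[1] = 0;    // store i32 0 to G[1]   (G + 4)
  return g;
}

// What the emitted code does: a single 8-byte store of xzr at G + 4*idx,
// which covers elements idx and idx + 1.
Array emitted(Array g, int64_t idx) {
  assert(idx >= 0 && idx + 1 < 10);
  g[idx] = 0;
  g[idx + 1] = 0;
  return g;
}
```

For `idx == 5`, `intended` zeroes elements 1 and 5 while `emitted` zeroes 5 and 6. The two agree only when `%0 == 0`, which is consistent with the merge having treated the unknown offset of the first address as zero (see the fix description further down).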


llvmbot · 2024-04-26T18:00:46Z

@llvm/issue-subscribers-backend-aarch64

Author: John Regehr (regehr)


tschuett · 2024-04-26T18:10:26Z

After ir-translator:

    %0:_(s64) = COPY $x0
    %1:_(p0) = G_GLOBAL_VALUE @G
    %6:_(s32) = G_CONSTANT i32 0
    %8:_(s64) = G_CONSTANT i64 4
    %7:_(p0) = nuw G_PTR_ADD %1, %8(s64)
    %2:_(s64) = G_CONSTANT i64 4
    %3:_(s64) = G_MUL %0, %2
    %4:_(p0) = G_PTR_ADD %1, %3(s64)
    %5:_(p0) = COPY %4(p0)
    G_STORE %6(s32), %5(p0) :: (store (s32) into %ir.2)
    G_STORE %6(s32), %7(p0) :: (store (s32) into `ptr getelementptr inbounds ([10 x i32], ptr @G, i64 0, i64 1)`)
    RET_ReallyLR

tschuett · 2024-04-26T18:21:12Z

GlobalISel has a load/store opt pass (LoadStoreOpt).

%5 depends on the global value and %0.
%7 depends on the global value.

aemerson · 2024-04-27T05:53:05Z

Well well well...if it isn't the consequences of my own actions...

v01dXYZ · 2024-04-28T16:42:15Z

@aemerson Hi,
I'm looking at the GlobalISel LoadStoreOpt pass because it's quite a small pass and a good starting point for getting into LLVM. However, the pass seems quite sensitive to the order of the loads/stores.

For example, of the two functions below, the optimisation is triggered only for the second one, g, which stores in ascending address order. (The emitted code ends up the same for both, though, because an AArch64-specific optimisation later detects and coalesces the stores in f.)

@G = external global [10 x i16]

;; 1 then 0
define void @f(i64 %0) {
  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4

  %ptr =     getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4

  ret void
}

;; 0 then 1
define void @g(i64 %0) {
  %ptr =     getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4

  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4

  ret void
}

The partial debug output:

// No optimisation for f

# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function f: IsSSA, TracksLiveness
Function Live Ins: $x0

bb.1 (%ir-block.1):
  liveins: $x0
  %2:_(s64) = G_CONSTANT i64 2
  %1:_(p0) = G_GLOBAL_VALUE @G
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
  %4:_(s16) = G_CONSTANT i16 0
  G_STORE %4:_(s16), %3:_(p0) :: (store (s16) into %ir.inc.ptr, align 4)
  G_STORE %4:_(s16), %1:_(p0) :: (store (s16) into %ir.ptr1, align 4)
  RET_ReallyLR

# End machine code for function f.

[...]

// Optimisation by a later combiner

Find match for:   STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
Analysing 2nd insn:   STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
Checking, can combine 2nd into 1st insn:
Reg '$wzr' not modified: true
Reg '$wzr' not used: true
No aliases found
Creating wider store. Replacing instructions:
    STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
    STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
  with instruction:
    STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)

Find match for:   STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)

[...]

// Optimisation for g

# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function g: IsSSA, TracksLiveness
Function Live Ins: $x0

bb.1 (%ir-block.1):
  liveins: $x0
  %1:_(p0) = G_GLOBAL_VALUE @G
  %5:_(s32) = G_CONSTANT i32 0
  G_STORE %5:_(s32), %1:_(p0) :: (store (s32) into %ir.ptr1)
  RET_ReallyLR

# End machine code for function g.

Do you know why this pass is so sensitive to the order of the stores when merging them at the target-independent stage?

aemerson · 2024-04-28T22:53:18Z

The reason is that it was written to catch the most common case: stores to ascending addresses as you go down the block. That pattern is by far the most common, but I think the pass could be extended to consider other patterns too. However, one of the goals of this pass was to be fast and linear-time, so we should be careful about how we modify the algorithm.
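The ordering restriction can be illustrated with a deliberately simplified, hypothetical model. It is not the actual LoadStoreOpt implementation, which also checks aliasing, legality, and size constraints; `Store` and `countAfterAscendingMerge` are illustrative names. It is a single forward scan that extends a run only when the next store begins exactly where the previous one ended, so each store is visited once:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified store descriptor: a symbolic base pointer,
// a known constant byte offset from it, and the store size in bytes.
struct Store {
  std::string base;
  int64_t offset;
  int64_t size;
};

// One linear forward scan in program order. A store joins the current run
// only if it uses the same base and starts exactly where the previous store
// in the run ended, so only ascending-address sequences are merged.
// Returns how many stores remain after each run is collapsed into one.
std::size_t countAfterAscendingMerge(const std::vector<Store>& stores) {
  std::size_t remaining = 0;
  std::size_t i = 0;
  while (i < stores.size()) {
    assert(stores[i].size > 0);
    std::size_t j = i + 1;
    while (j < stores.size() && stores[j].base == stores[i].base &&
           stores[j].offset == stores[j - 1].offset + stores[j - 1].size) {
      ++j;
    }
    ++remaining;  // stores[i..j) become a single (possibly wider) store
    i = j;
  }
  return remaining;
}
```

With the `i16` stores from `f` (byte offsets 2 then 0) this returns 2, i.e. nothing is merged; with those from `g` (offsets 0 then 2) it returns 1.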

…ex expr as 0. (#90375) During analysis, we incorrectly leave the offset part of an address info struct as zero, when in actual fact we failed to decompose it into base + offset. This results in incorrectly assuming that the address is adjacent to another store addr. To fix this we wrap the offset in an optional<> so we can distinguish between real zero and unknown. Fixes issue #90242

…ex expr as 0. (llvm#90375) During analysis, we incorrectly leave the offset part of an address info struct as zero, when in actual fact we failed to decompose it into base + offset. This results in incorrectly assuming that the address is adjacent to another store addr. To fix this we wrap the offset in an optional<> so we can distinguish between real zero and unknown. Fixes issue llvm#90242 (cherry picked from commit 19f4d68)
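The essence of the fix described in the commit message can be sketched as follows. This is a hypothetical, simplified model: `AddrInfo`, `adjacentBuggy`, and `adjacentFixed` are illustrative names, not the real LLVM types or functions. It shows only the point the commit message makes: once the offset is an optional, an address that could not be decomposed into base + constant offset can no longer be mistaken for offset 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the pass's address-info struct: a base plus an
// offset that is std::nullopt when the address could not be decomposed.
struct AddrInfo {
  std::string base;
  std::optional<int64_t> offset;
};

// Models the old behaviour: an undecomposed address reports offset 0.
bool adjacentBuggy(const AddrInfo& lo, const AddrInfo& hi, int64_t loSize) {
  assert(loSize > 0);
  return lo.base == hi.base &&
         lo.offset.value_or(0) + loSize == hi.offset.value_or(0);
}

// Models the fix: adjacency is only claimed when both offsets are known.
bool adjacentFixed(const AddrInfo& lo, const AddrInfo& hi, int64_t loSize) {
  assert(loSize > 0);
  if (!lo.offset || !hi.offset) return false;
  return lo.base == hi.base && *lo.offset + loSize == *hi.offset;
}
```

In the issue's terms, an address like `%5` (which depends on `@G` and `%0`) has an unknown offset, while `%7` is `@G + 4`. The old behaviour treats them as adjacent 4-byte stores; the fixed check does not.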

regehr added backend:AArch64 llvm:codegen miscompilation llvm:globalisel labels Apr 26, 2024

aemerson self-assigned this Apr 27, 2024

aemerson mentioned this issue Apr 28, 2024

[GlobalISel] Fix store merging incorrectly classifying an unknown index expr as 0. #90375

Merged

aemerson linked a pull request Apr 28, 2024 that will close this issue


aemerson closed this as completed in #90375 Apr 30, 2024
