Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect coalescing of stores by AArch64 global isel backend #90242

Closed
regehr opened this issue Apr 26, 2024 · 6 comments · Fixed by #90375
Closed

incorrect coalescing of stores by AArch64 global isel backend #90242

regehr opened this issue Apr 26, 2024 · 6 comments · Fixed by #90375

Comments

@regehr
Copy link
Contributor

regehr commented Apr 26, 2024

this function is getting lowered incorrectly by global isel for AArch64:

@G = external global [10 x i32]

define void @f(i64 %0) {
  %2 = getelementptr [10 x i32], ptr @G, i64 0, i64 %0
  store i32 0, ptr %2, align 4
  store i32 0, ptr getelementptr inbounds ([10 x i32], ptr @G, i64 0, i64 1), align 4
  ret void
}

it should be storing 0 to index 1 and also index %0 but global isel is incorrectly coalescing these into a single store:

_f:                                     ; @f
	adrp	x8, _G@GOTPAGE
	lsl	x9, x0, #2
	ldr	x8, [x8, _G@GOTPAGEOFF]
	str	xzr, [x8, x9]
	ret

cc @Hatsunespica

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2024

@llvm/issue-subscribers-backend-aarch64

Author: John Regehr (regehr)

this function is getting lowered incorrectly by global isel for AArch64: ```llvm @G = external global [10 x i32]

define void @f(i64 %0) {
%2 = getelementptr [10 x i32], ptr @G, i64 0, i64 %0
store i32 0, ptr %2, align 4
store i32 0, ptr getelementptr inbounds ([10 x i32], ptr @G, i64 0, i64 1), align 4
ret void
}

it should be storing 0 to index 1 and also index `%0` but global isel is incorrectly coalescing these into a single store:

_f: ; @f
adrp x8, _G@GOTPAGE
lsl x9, x0, #2
ldr x8, [x8, _G@GOTPAGEOFF]
str xzr, [x8, x9]
ret


cc @<!-- -->hatsunespica

</details>

@tschuett
Copy link
Member

After ir-translator:

  %0:_(s64) = COPY $x0
    %1:_(p0) = G_GLOBAL_VALUE @G
    %6:_(s32) = G_CONSTANT i32 0
    %8:_(s64) = G_CONSTANT i64 4
    %7:_(p0) = nuw G_PTR_ADD %1, %8(s64)
    %2:_(s64) = G_CONSTANT i64 4
    %3:_(s64) = G_MUL %0, %2
    %4:_(p0) = G_PTR_ADD %1, %3(s64)
    %5:_(p0) = COPY %4(p0)
    G_STORE %6(s32), %5(p0) :: (store (s32) into %ir.2)
    G_STORE %6(s32), %7(p0) :: (store (s32) into `ptr getelementptr inbounds ([10 x i32], ptr @G, i64 0, i64 1)`)
    RET_ReallyLR

@tschuett
Copy link
Member

GlobalIsel has a load stop opt pass.

%5 depends on the global value and %0.
%7 depends on the global value.

@aemerson
Copy link
Contributor

Well well well...if it isn't the consequences of my own actions...

@v01dXYZ
Copy link
Contributor

v01dXYZ commented Apr 28, 2024

@aemerson Hi,
I'm looking at the GlobalISelect.LoadStoreOpt pass because it's quite a small pass and it's good starting point to get into LLVM. Nevertheless, it seems this pass is quite sensible to the order of the load/stores.

For example, swapping the order would trigger the optimisation only for the second one g, (though the emitted code is the same as there is an AArch64-specific optimisation that will detect and coalesce the stores).

@G = external global [10 x i16]

;; 1 then 0
define void @f(i64 %0) {
  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4

  %ptr =     getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4

  ret void
}

;; 0 then 1
define void @g(i64 %0) {
  %ptr =     getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4

  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4

  ret void
}

The partial debug output:

// No optimisation for f

# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function f: IsSSA, TracksLiveness
Function Live Ins: $x0

bb.1 (%ir-block.1):
  liveins: $x0
  %2:_(s64) = G_CONSTANT i64 2
  %1:_(p0) = G_GLOBAL_VALUE @G
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
  %4:_(s16) = G_CONSTANT i16 0
  G_STORE %4:_(s16), %3:_(p0) :: (store (s16) into %ir.inc.ptr, align 4)
  G_STORE %4:_(s16), %1:_(p0) :: (store (s16) into %ir.ptr1, align 4)
  RET_ReallyLR

# End machine code for function f.

[...]

// Optimisation by a later combiner

Find match for:   STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
Analysing 2nd insn:   STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
Checking, can combine 2nd into 1st insn:
Reg '$wzr' not modified: true
Reg '$wzr' not used: true
No aliases found
Creating wider store. Replacing instructions:
    STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
    STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
  with instruction:
    STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)

Find match for:   STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)

[...]

// Optimisation for g

# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function g: IsSSA, TracksLiveness
Function Live Ins: $x0

bb.1 (%ir-block.1):
  liveins: $x0
  %1:_(p0) = G_GLOBAL_VALUE @G
  %5:_(s32) = G_CONSTANT i32 0
  G_STORE %5:_(s32), %1:_(p0) :: (store (s32) into %ir.ptr1)
  RET_ReallyLR

# End machine code for function g.

Do you know why this pass is so sensible to the order of stores to merge them at a target-independent stage ?

@aemerson
Copy link
Contributor

The reason is that it was written to catch the most common cases where stores are storing in ascending addresses as you go down the block. This pattern for example is the most common but I think it could be extended to also consider other patterns too. However one of the goals of this pass was to be fast and linear time so we should be careful in how we modify the algorithm.

aemerson added a commit that referenced this issue Apr 30, 2024
…ex expr as 0. (#90375)

During analysis, we incorrectly leave the offset part of an address info
struct
as zero, when in actual fact we failed to decompose it into base +
offset.
This results in incorrectly assuming that the address is adjacent to
another store
addr. To fix this we wrap the offset in an optional<> so we can
distinguish between
real zero and unknown.

Fixes issue #90242
llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Apr 30, 2024
…ex expr as 0. (llvm#90375)

During analysis, we incorrectly leave the offset part of an address info
struct
as zero, when in actual fact we failed to decompose it into base +
offset.
This results in incorrectly assuming that the address is adjacent to
another store
addr. To fix this we wrap the offset in an optional<> so we can
distinguish between
real zero and unknown.

Fixes issue llvm#90242

(cherry picked from commit 19f4d68)
tstellar pushed a commit to llvmbot/llvm-project that referenced this issue May 1, 2024
…ex expr as 0. (llvm#90375)

During analysis, we incorrectly leave the offset part of an address info
struct
as zero, when in actual fact we failed to decompose it into base +
offset.
This results in incorrectly assuming that the address is adjacent to
another store
addr. To fix this we wrap the offset in an optional<> so we can
distinguish between
real zero and unknown.

Fixes issue llvm#90242

(cherry picked from commit 19f4d68)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants