incorrect coalescing of stores by AArch64 global isel backend #90242
@llvm/issue-subscribers-backend-aarch64 Author: John Regehr (regehr)
this function is getting lowered incorrectly by global isel for AArch64:
```llvm
@G = external global [10 x i32]
define void @f(i64 %0) {
[...]
_f: ; @f
[...]
```
|
After ir-translator:
|
GlobalIsel has a load store opt pass.
|
Well well well...if it isn't the consequences of my own actions... |
@aemerson Hi. For example, swapping the order of the stores triggers the optimisation only for the second function:

```llvm
@G = external global [10 x i16]

;; 1 then 0
define void @f(i64 %0) {
  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4
  %ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4
  ret void
}

;; 0 then 1
define void @g(i64 %0) {
  %ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 0
  store i16 0, ptr %ptr, align 4
  %inc.ptr = getelementptr [10 x i16], ptr @G, i64 0, i64 1
  store i16 0, ptr %inc.ptr, align 4
  ret void
}
```

The partial debug output:

```
// No optimisation for f
# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function f: IsSSA, TracksLiveness
Function Live Ins: $x0
bb.1 (%ir-block.1):
  liveins: $x0
  %2:_(s64) = G_CONSTANT i64 2
  %1:_(p0) = G_GLOBAL_VALUE @G
  %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64)
  %4:_(s16) = G_CONSTANT i16 0
  G_STORE %4:_(s16), %3:_(p0) :: (store (s16) into %ir.inc.ptr, align 4)
  G_STORE %4:_(s16), %1:_(p0) :: (store (s16) into %ir.ptr1, align 4)
  RET_ReallyLR
# End machine code for function f.
[...]
// Optimisation by a later combiner
Find match for: STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
Analysing 2nd insn: STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
Checking, can combine 2nd into 1st insn:
Reg '$wzr' not modified: true
Reg '$wzr' not used: true
No aliases found
Creating wider store. Replacing instructions:
  STRHHui $wzr, renamable $x8, 1 :: (store (s16) into %ir.inc.ptr, align 4)
  STRHHui $wzr, killed renamable $x8, 0 :: (store (s16) into %ir.ptr1, align 4)
with instruction:
  STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)
Find match for: STRWui $wzr, renamable $x8, 0 :: (store (s16) into %ir.inc.ptr, align 4), (store (s16) into %ir.ptr1, align 4)
[...]
// Optimisation for g
# *** IR Dump After LoadStoreOpt (loadstore-opt) ***:
# Machine code for function g: IsSSA, TracksLiveness
Function Live Ins: $x0
bb.1 (%ir-block.1):
  liveins: $x0
  %1:_(p0) = G_GLOBAL_VALUE @G
  %5:_(s32) = G_CONSTANT i32 0
  G_STORE %5:_(s32), %1:_(p0) :: (store (s32) into %ir.ptr1)
  RET_ReallyLR
# End machine code for function g.
```

Do you know why this pass is so sensitive to the order of the stores when merging them at a target-independent stage? |
The reason is that it was written to catch the most common case, where stores write to ascending addresses as you go down the block. This pattern, for example, is the most common, but I think the pass could be extended to consider other patterns too. However, one of the goals of this pass was to be fast and linear time, so we should be careful in how we modify the algorithm. |
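To illustrate the ordering behaviour discussed above, here is a minimal sketch of a single forward pass that only widens a candidate when the next store begins exactly where the previous one ends. It is not the LoadStoreOpt implementation: `Store` and `mergeAscending` are hypothetical names, all stores are assumed to share one base pointer, and the legality, alignment and aliasing checks a real pass needs are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a store. Every store is assumed to use
// the same base pointer, so only the byte offset and the width matter here.
struct Store {
  int64_t Offset; // byte offset from the shared base
  unsigned Size;  // store width in bytes
};

// One forward pass over the stores in block order (linear time). A store
// extends the current candidate only if it starts exactly where the
// candidate ends, i.e. the addresses are adjacent and ascending.
std::vector<Store> mergeAscending(const std::vector<Store> &Stores) {
  std::vector<Store> Out;
  for (const Store &S : Stores) {
    assert(S.Size > 0 && "a store must write at least one byte");
    if (!Out.empty() &&
        Out.back().Offset + static_cast<int64_t>(Out.back().Size) == S.Offset) {
      Out.back().Size += S.Size; // adjacent and ascending: widen the candidate
    } else {
      Out.push_back(S); // anything else starts a new candidate
    }
  }
  return Out;
}
```

Under these assumptions, two 2-byte stores at offsets 0 then 2 (the order in `@g`) collapse into one 4-byte store, while the same stores at offsets 2 then 0 (the order in `@f`) stay separate, which matches the shape of the dumps quoted in the thread.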
…ex expr as 0. (#90375) During analysis, we incorrectly leave the offset part of an address info struct as zero, when in actual fact we failed to decompose it into base + offset. This results in incorrectly assuming that the address is adjacent to another store addr. To fix this we wrap the offset in an optional<> so we can distinguish between real zero and unknown. Fixes issue #90242
…ex expr as 0. (llvm#90375) During analysis, we incorrectly leave the offset part of an address info struct as zero, when in actual fact we failed to decompose it into base + offset. This results in incorrectly assuming that the address is adjacent to another store addr. To fix this we wrap the offset in an optional<> so we can distinguish between real zero and unknown. Fixes issue llvm#90242 (cherry picked from commit 19f4d68)
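The idea in the commit message can be sketched as follows. This is only an illustration of "optional offset versus zero offset", not the code from the patch: `AddrInfo` and `isAdjacent` are hypothetical names, and the base pointer is modelled as a plain integer id.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the pass's address info; not the real LLVM type.
struct AddrInfo {
  int Base;                      // id of the base pointer
  std::optional<int64_t> Offset; // std::nullopt = could not be decomposed
};

// Store B is adjacent to store A only when both offsets are known, the
// bases match, and B starts exactly where A (SizeA bytes wide) ends.
bool isAdjacent(const AddrInfo &A, unsigned SizeA, const AddrInfo &B) {
  assert(SizeA > 0 && "a store must write at least one byte");
  if (!A.Offset || !B.Offset)
    return false; // unknown offset: never assume adjacency
  return A.Base == B.Base &&
         *A.Offset + static_cast<int64_t>(SizeA) == *B.Offset;
}
```

With the reported test case in mind, an address indexed by `%0` would carry `std::nullopt`, so it can no longer be mistaken for offset 0 and paired with the 4-byte store at byte offset 4 (index 1 of the `i32` array). A store whose offset really is 0 still compares as adjacent.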
this function is getting lowered incorrectly by global isel for AArch64:
it should be storing 0 to index 1 and also index %0, but global isel is incorrectly coalescing these into a single store.

cc @Hatsunespica