[MSSA] Don't check non-AA memory effects when using template access #76142
Conversation
Sometimes, we create a MemoryAccess for an instruction, which is later simplified (e.g. via devirtualization) such that the new instruction has no memory effects anymore. If we later clone the instruction (e.g. during unswitching), then MSSA will not create a MemoryAccess for the new instruction, breaking MSSAU assumptions.

There are principally two ways to solve this. The first is to remove the assumption that a MemoryAccess will be created from MSSAU. I initially pursued this approach, but it is not as simple as it sounds. The core problem is that if there is another later access in the block, we'll have to find a new defining access for it. This is easy if there is another MemoryDef before the MemoryDef we're no longer creating, but becomes complicated if it was the first access in the block. In that case we might have to insert a MemoryPhi and perform the necessary rewrites, which sounds like all too much hassle for this edge-case.

The alternative implemented here is to make sure that the template access is always used if it is specified -- even if we have obtained better information in the meantime. This was already the case wrt AA queries, but the separate non-AA check was performed unconditionally beforehand. Move this check into the non-template branch only.
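For concreteness, here is a minimal standalone reproducer in the same shape as the first test added in this PR (the pipeline is the one the new tests use; before this patch, running it in an assertions build trips the MSSAU assumption described above):

; RUN: opt -S -passes="loop-mssa(loop-instsimplify,simple-loop-unswitch<nontrivial>)" < %s

@vtable = constant ptr @foo

; loop-instsimplify devirtualizes the indirect call to @foo, which is
; memory(none), but the MemoryDef created for the original indirect call stays
; attached to it. simple-loop-unswitch then clones the loop body, and MSSA
; declines to create an access for the cloned call.
declare void @foo() memory(none)

define void @repro(i1 %c) {
  br label %loop

loop:
  %fn = load ptr, ptr @vtable, align 8
  call void %fn()
  br i1 %c, label %exit, label %loop

exit:
  ret void
}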
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)

Changes: (same as the PR description above.) Full diff: https://github.com/llvm/llvm-project/pull/76142.diff, 2 Files Affected:
diff --git a/llvm/lib/Analysis/MemorySSA.cpp b/llvm/lib/Analysis/MemorySSA.cpp
index 2cf92ceba01032..ce4942ea1477af 100644
--- a/llvm/lib/Analysis/MemorySSA.cpp
+++ b/llvm/lib/Analysis/MemorySSA.cpp
@@ -1732,12 +1732,6 @@ MemoryUseOrDef *MemorySSA::createNewAccess(Instruction *I,
}
}
- // Using a nonstandard AA pipelines might leave us with unexpected modref
- // results for I, so add a check to not model instructions that may not read
- // from or write to memory. This is necessary for correctness.
- if (!I->mayReadFromMemory() && !I->mayWriteToMemory())
- return nullptr;
-
bool Def, Use;
if (Template) {
Def = isa<MemoryDef>(Template);
@@ -1757,6 +1751,12 @@ MemoryUseOrDef *MemorySSA::createNewAccess(Instruction *I,
}
#endif
} else {
+ // Using a nonstandard AA pipelines might leave us with unexpected modref
+ // results for I, so add a check to not model instructions that may not read
+ // from or write to memory. This is necessary for correctness.
+ if (!I->mayReadFromMemory() && !I->mayWriteToMemory())
+ return nullptr;
+
// Find out what affect this instruction has on memory.
ModRefInfo ModRef = AAP->getModRefInfo(I, std::nullopt);
// The isOrdered check is used to ensure that volatiles end up as defs
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll
new file mode 100644
index 00000000000000..2aaf777683e116
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/memssa-readnone-access.ll
@@ -0,0 +1,117 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes="loop-mssa(loop-instsimplify,simple-loop-unswitch<nontrivial>)" < %s | FileCheck %s
+
+@vtable = constant ptr @foo
+
+declare void @foo() memory(none)
+declare void @bar()
+
+; The call becomes known readnone after simplification, but still have a
+; MemoryAccess. Make sure this does not lead to an assertion failure.
+define void @test(i1 %c) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: i1 [[C:%.*]]) {
+; CHECK-NEXT: [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT: br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK: .split.us:
+; CHECK-NEXT: br label [[LOOP_US:%.*]]
+; CHECK: loop.us:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: br label [[EXIT_SPLIT_US:%.*]]
+; CHECK: exit.split.us:
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: .split:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: br label [[LOOP]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+ br label %loop
+
+loop:
+ %fn = load ptr, ptr @vtable, align 8
+ call void %fn()
+ br i1 %c, label %exit, label %loop
+
+exit:
+ ret void
+}
+
+; Variant with another access after the call.
+define void @test2(i1 %c, ptr %p) {
+; CHECK-LABEL: define void @test2(
+; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT: [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT: br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK: .split.us:
+; CHECK-NEXT: br label [[LOOP_US:%.*]]
+; CHECK: loop.us:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[EXIT_SPLIT_US:%.*]]
+; CHECK: exit.split.us:
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: .split:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[LOOP]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+ br label %loop
+
+loop:
+ %fn = load ptr, ptr @vtable, align 8
+ call void %fn()
+ call void @bar()
+ br i1 %c, label %exit, label %loop
+
+exit:
+ ret void
+}
+
+; Variant with another access after the call and no access before the call.
+define void @test3(i1 %c, ptr %p) {
+; CHECK-LABEL: define void @test3(
+; CHECK-SAME: i1 [[C:%.*]], ptr [[P:%.*]]) {
+; CHECK-NEXT: [[C_FR:%.*]] = freeze i1 [[C]]
+; CHECK-NEXT: br i1 [[C_FR]], label [[DOTSPLIT_US:%.*]], label [[DOTSPLIT:%.*]]
+; CHECK: .split.us:
+; CHECK-NEXT: br label [[LOOP_US:%.*]]
+; CHECK: loop.us:
+; CHECK-NEXT: br label [[SPLIT_US:%.*]]
+; CHECK: split.us:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[EXIT_SPLIT_US:%.*]]
+; CHECK: exit.split.us:
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: .split:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: br label [[SPLIT:%.*]]
+; CHECK: split:
+; CHECK-NEXT: call void @foo()
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[LOOP]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+ br label %loop
+
+loop:
+ %fn = load ptr, ptr @vtable, align 8
+ br label %split
+
+split:
+ call void %fn()
+ call void @bar()
+ br i1 %c, label %exit, label %loop
+
+exit:
+ ret void
+}
I think this makes sense as the simple way to address this.
There's also the option to generalize what to do in the case of accesses that are optimized further.
I think that's possible but I may be missing edge cases not covered by current tests. Something like this resolves the testcases in this PR:
diff --git llvm/lib/Analysis/MemorySSAUpdater.cpp llvm/lib/Analysis/MemorySSAUpdater.cpp
index 9ad60f774e9f..0e533cb54f15 100644
--- llvm/lib/Analysis/MemorySSAUpdater.cpp
+++ llvm/lib/Analysis/MemorySSAUpdater.cpp
@@ -578,18 +578,9 @@ static MemoryAccess *getNewDefiningAccessForClone(MemoryAccess *MA,
if (Instruction *NewDefMUDI =
cast_or_null<Instruction>(VMap.lookup(DefMUDI))) {
InsnDefining = MSSA->getMemoryAccess(NewDefMUDI);
- if (!CloneWasSimplified)
- assert(InsnDefining && "Defining instruction cannot be nullptr.");
- else if (!InsnDefining || isa<MemoryUse>(InsnDefining)) {
+ if (!InsnDefining || isa<MemoryUse>(InsnDefining)) {
// The clone was simplified, it's no longer a MemoryDef, look up.
- auto DefIt = DefMUD->getDefsIterator();
- // Since simplified clones only occur in single block cloning, a
- // previous definition must exist, otherwise NewDefMUDI would not
- // have been found in VMap.
- assert(DefIt != MSSA->getBlockDefs(DefMUD->getBlock())->begin() &&
- "Previous def must exist");
- InsnDefining = getNewDefiningAccessForClone(
- &*(--DefIt), VMap, MPhiMap, CloneWasSimplified, MSSA);
+ InsnDefining = getNewDefiningAccessForClone(DefMUD->getDefiningAccess(), VMap, MPhiMap, CloneWasSimplified, MSSA);
}
}
}
@@ -626,7 +617,7 @@ void MemorySSAUpdater::cloneUsesAndDefs(BasicBlock *BB, BasicBlock *NewBB,
getNewDefiningAccessForClone(MUD->getDefiningAccess(), VMap,
MPhiMap, CloneWasSimplified, MSSA),
/*Template=*/CloneWasSimplified ? nullptr : MUD,
- /*CreationMustSucceed=*/CloneWasSimplified ? false : true);
+ /*CreationMustSucceed=*/false);
if (NewUseOrDef)
MSSA->insertIntoListsForBlock(NewUseOrDef, NewBB, MemorySSA::End);
}
But that then needs a cleanup of CloneWasSimplified, which loses its purpose.
If you move forward with creating accesses for non-memory-affecting instructions, a follow-up optimization could be to remember the accesses that were still created using a template, then call removeMemoryAccess (leaving OptimizePhis at false) on those after cloning and new phi assignment are complete. This should redirect the defining accesses properly.
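A rough sketch of how that follow-up could look (this is only my reading of the suggestion, not code from this PR; the helper name and the way the cloned instructions are collected are made up for illustration):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Hypothetical helper: after cloning and MemoryPhi assignment are complete,
// drop the accesses that only exist because a template forced their creation,
// even though the cloned instruction no longer touches memory.
static void removeTemplateOnlyAccesses(MemorySSA &MSSA, MemorySSAUpdater &MSSAU,
                                       ArrayRef<Instruction *> ClonedInsts) {
  SmallVector<MemoryAccess *, 8> ToRemove;
  for (Instruction *I : ClonedInsts)
    if (!I->mayReadFromMemory() && !I->mayWriteToMemory())
      if (MemoryAccess *MA = MSSA.getMemoryAccess(I))
        ToRemove.push_back(MA);

  // Leave OptimizePhis at false, as suggested above; removeMemoryAccess still
  // rewires users of the removed access to its defining access.
  for (MemoryAccess *MA : ToRemove)
    MSSAU.removeMemoryAccess(MA, /*OptimizePhis=*/false);
}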
I think this may choose an incorrect defining access in some cases. Let's say we have the following initial structure (the MemorySSA example from the original comment is not reproduced here); the important part is the defining access of the access that reads y. I think your new code would set the defining access of clone(reads y) to clone(writes x), even though the correct one would be clone(writes y).
Something we could also do is to already clean up such accesses inside LoopInstSimplify. That is, if we see that a callee has been replaced by a constant, check whether the access can be removed. This has the benefit that we'll simplify the original access as well, not just the cloned access. (Though I don't think we should rely on this happening, as there are just too many ways in which AA results can change.)
ping
Given your example, I'm OK with moving forward with this solution to resolve the asserts. I haven't found a counterexample for the generalization yet, but I need more time to check that it's safe; it's still possible it misses cases where it's not correct.
Ah yes, good point.
Hm, could it happen that the new defining access ends up in a different block, for which we haven't cloned accesses yet?
I don't think that's possible. The cloning logic works in RPO, so the clones of all defining accesses are already complete.
I went ahead and put up another PR with your variant: #76819. You convinced me that this approach is correct, and I think it's a bit cleaner overall.
Sometimes, we create a MemoryAccess for an instruction, which is later simplified (e.g. via devirtualization) such that the new instruction has no memory effects anymore. If we later clone the instruction (e.g. during unswitching), then MSSA will not create a MemoryAccess for the new instruction, triggering an assert. Disable the assertion (by passing CreationMustSucceed=false) and adjust getDefiningAccessForClone() to work correctly in that case. This PR implements the alternative suggestion by alinas from #76142.
Closing in favor of #76819.