Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JumpThreading] Missing optimization: thread over a nearly empty basicblock #76609

Closed
XChy opened this issue Dec 30, 2023 · 5 comments · Fixed by #86312
Closed

[JumpThreading] Missing optimization: thread over a nearly empty basicblock #76609

XChy opened this issue Dec 30, 2023 · 5 comments · Fixed by #86312

Comments

@XChy
Copy link
Member

XChy commented Dec 30, 2023

Alive2 proof: https://alive2.llvm.org/ce/z/Lv9e4r

Description:

define i1 @src(i1 %c1) {
entry:                                      
  %c = call i1 @cond()
  br i1 %c, label %then, label %else

then:                                        
  call void @dummy()
  br i1 %c1, label %else, label %return

else:                                         
  br label %return

return:                                           
  %retval.0 = phi i1 [ true, %else ], [ false, %then ]
  ret i1 %retval.0
}

can be folded to:

define i1 @tgt(i1 %c1) {
entry:
  %c = call i1 @cond()
  br i1 %c, label %then, label %return

then:                                        
  call void @dummy()
  br i1 %c1, label %else, label %return

else:                                         
  br label %return

return:                                           
  %retval.0 = phi i1 [ true, %entry ], [ false, %then ], [true, %else]
  ret i1 %retval.0
}

The conditional branch in BB entry can thread directly through nearly empty BB else to return.

Real-world motivation

This snippet of IR is derived from auto-generated qemu/xxx/qapi-visit-dump.c (after inline and O3 pipeline).
The example above is a reduced version. If you're interested in the original suboptimal IR and optimal IR, contact me to get it(Too big to upload here).

Let me know if you can confirm that it's an optimization opportunity, thanks.

@elhewaty
Copy link
Member

Hi, I find this one interesting. Can you mention a good starting point?

@XChy
Copy link
Member Author

XChy commented Jan 16, 2024

@elhewaty , I once worked on similar things in SimplifyCFG and JumpThreading pass (#67275). A possible solution is to relax the non-empty phis constraint at

if (BB->phis().empty() || Succ->phis().empty())

, and then handle the empty basicblock without phi correctly.

@elhewaty
Copy link
Member

@XChy why can't the branch in BB then folded directly to return instead of else, as else only goes to return, So shouldn't every jump to else goes to return?

@XChy
Copy link
Member Author

XChy commented Jan 29, 2024

@elhewaty , the phi in return constraints it. When coming from then, the phi is false. But when coming from else, the phi is true. If we thread then directly, we need to remove then from phi and the semantics changes.
Jumping from entry demands modification on the phi, as the motivating example shows.

@XChy
Copy link
Member Author

XChy commented Feb 28, 2024

This missed optimization opportunity also exists in CPython/_ctypes/_ctypes.c.

@XChy XChy self-assigned this Mar 22, 2024
XChy added a commit that referenced this issue Apr 16, 2024
)

Fixes #76609
This patch does:
- relax the phis constraint in `CanRedirectPredsOfEmptyBBToSucc`
- guarantee the BB has multiple different predecessors to redirect, so
that we can handle the case without phis in BB. Without this change and
phi constraint, we may redirect the CommonPred.

The motivation is consistent with JumpThreading. We always want the
branch to jump more direct to the destination, without passing the
middle block. In this way, we can expose more other optimization
opportunities.

An obivous example proposed by @dtcxzyw is like:
```llvm
define i32 @test(...) {
entry:
   br i1 %c, label %do.end, label %if.then

if.then:                                          ; preds = %entry
   %call2 = call i32 @dummy()
   %tobool3.not = icmp eq i32 %call2, 0
   br i1 %tobool3.not, label %do.end, label %return

do.end:                                           ; preds = %entry, %if.then
   br label %return

return:                                           ; preds = %if.then, %do.end
   %retval.0 = phi i32 [ 0, %do.end ], [ %call2, %if.then ]
   ret i32 %retval.0
}
```
`entry` can directly jump to return, without passing `do.end`, and then
the if-else pattern can be simplified further:
```llvm
define i32 @test(...) {
entry:
   br i1 %c, label %return, label %if.then

if.then:                                          ; preds = %entry
   %call2 = call i32 @dummy()
   br label %return

return:                                           ; preds = %if.then
   %retval.0 = phi i32 [ 0, %entry ], [ %call2, %if.then ]
   ret i32 %retval.0
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants