Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Do greedy 4-opt for backward jumps in 3-opt layout #110277

Merged
merged 3 commits into from
Dec 3, 2024

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Nov 29, 2024

Part of #107749. Follow-up to #103450. Greedy 3-opt (i.e. an implementation that requires each move to be profitable on its own) is not well-suited for discovering profitable moves for backward jumps, as such movement requires an unrelated move to first place the source block lexically behind the destination block. Thus, the 3-opt implementation added in #103450 incorporates a 4-opt move for backward jumps, where we partition 1) before the destination block, 2) before the source block, and 3) directly after the source block. This 4-opt implementation can be expanded to search for the best cut point between the destination and source blocks to maximize its efficacy. Since we can compute the distance between the blocks, we can skip this linear search for large distances if it proves to be too expensive.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 29, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show this has more PerfScore improvements than regressions across platforms, for what it's worth. TP regressions seem to be inflated by some outlier in realworld, though I'm having a tough time narrowing it down without a working pin binary locally...

@AndyAyersMS
Copy link
Member

I have a machine that can still run pin. If you want, I can try and find the problematic case or cases.

@amanasifkhalid
Copy link
Member Author

I have a machine that can still run pin. If you want, I can try and find the problematic case or cases.

Thank you for the offer! I might be able to hunt it down manually -- I'll let you know how that goes.

@amanasifkhalid
Copy link
Member Author

Also, the code looks uglier, but pre-computing part of the partition cost improved TP across the board by quite a bit.

@amanasifkhalid
Copy link
Member Author

Thank you for the offer! I might be able to hunt it down manually -- I'll let you know how that goes.

The pathological case is the same as the one in #109521: We have a method with over a thousand basic blocks, and some blocks that are interesting to 3/4-opt have hundreds of predecessors. To compute the costs of potential cut points, we have to iterate up to every single predecessor edge into each block to the right of a cut point; with the previous implementation of this PR, this meant iterating up to 780 predecessor edges, dozens of times. With the new logic for precomputing some parts of the cost, the TP cost for this particular method drops from over 160% to 3.9%, hence why the TP diffs look far less dramatic overall.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large pred lists are a frequent source of trouble. Glad you were able to track down the problematic case.

@amanasifkhalid
Copy link
Member Author

/ba-g Build analysis blocked by #110173

@amanasifkhalid amanasifkhalid merged commit 6d12a30 into dotnet:main Dec 3, 2024
102 of 108 checks passed
@amanasifkhalid amanasifkhalid deleted the greedy-4-opt branch December 3, 2024 21:25
eduardo-vp pushed a commit to eduardo-vp/runtime that referenced this pull request Dec 5, 2024
Part of dotnet#107749. Follow-up to dotnet#103450. Greedy 3-opt (i.e. an implementation that requires each move to be profitable on its own) is not well-suited for discovering profitable moves for backward jumps, as such movement requires an unrelated move to first place the source block lexically behind the destination block. Thus, the 3-opt implementation added in dotnet#103450 incorporates a 4-opt move for backward jumps, where we partition 1) before the destination block, 2) before the source block, and 3) directly after the source block. This 4-opt implementation can be expanded to search for the best cut point between the destination and source blocks to maximize its efficacy.
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
Part of dotnet#107749. Follow-up to dotnet#103450. Greedy 3-opt (i.e. an implementation that requires each move to be profitable on its own) is not well-suited for discovering profitable moves for backward jumps, as such movement requires an unrelated move to first place the source block lexically behind the destination block. Thus, the 3-opt implementation added in dotnet#103450 incorporates a 4-opt move for backward jumps, where we partition 1) before the destination block, 2) before the source block, and 3) directly after the source block. This 4-opt implementation can be expanded to search for the best cut point between the destination and source blocks to maximize its efficacy.
amanasifkhalid added a commit that referenced this pull request Dec 24, 2024
Follow-up to #110277. Fixes #110756. Don't consider 4-opt cut points that would move the entry block of a try/handler region below other blocks in the region. Previously, either future moves would put the entry block back at the top of the region, or we would get unlucky in the rare case and hit asserts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants