JIT: Do greedy 4-opt for backward jumps in 3-opt layout #110277
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. For what it's worth, the diffs show more PerfScore improvements than regressions across platforms. The TP regressions seem to be inflated by some outlier in …
I have a machine that can still run Pin. If you want, I can try to find the problematic case or cases.
Thank you for the offer! I might be able to hunt it down manually; I'll let you know how that goes.
Also, the code looks uglier, but precomputing part of the partition cost improved TP across the board by quite a bit.
The pathological case is the same as the one in #109521: a method with over a thousand basic blocks, where some blocks that are interesting to 3/4-opt have hundreds of predecessors. To compute the cost of each potential cut point, we may have to iterate over every predecessor edge into each block to the right of that cut point; with the previous implementation of this PR, this meant iterating over as many as 780 predecessor edges, dozens of times. With the new logic for precomputing parts of the cost, the TP cost for this particular method drops from over 160% to 3.9%, which is why the TP diffs look far less dramatic overall.
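The comment above does not show which terms are precomputed, so the following is only a minimal C++ sketch under stated assumptions, not the PR's code. It assumes a flat layout `order`, edge weights stored on predecessor lists (so looking up a weight means scanning the target's preds, the expensive part for blocks with hundreds of predecessors), and a cost model where only fall-through edges are free. Reordering A B C D into A C B D, with B = [d, c) and C = [c, s], only changes the three boundary adjacencies, so the cost delta splits into terms that depend on the cut point `c` and terms that do not; the latter can be computed once per backward jump. `PredEdge`, `EdgeWeight`, `CutChoice`, and `FindBestCut` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: preds[b] lists the incoming edges of block b with their profile weights.
struct PredEdge {
    int pred;
    double weight;
};
using PredLists = std::vector<std::vector<PredEdge>>;

// Looking up an edge weight scans the target's predecessor list; this is linear in the pred count.
double EdgeWeight(const PredLists& preds, int from, int to) {
    double w = 0;
    for (const PredEdge& e : preds[to]) {
        if (e.pred == from) {
            w += e.weight;
        }
    }
    return w;
}

struct CutChoice {
    int cut;      // chosen cut point c in (d, s]
    double delta; // change in layout cost; negative means the move is profitable
};

// For a backward jump from order[s] to order[d], evaluate moving C = [c, s] ahead of B = [d, c).
CutChoice FindBestCut(const std::vector<int>& order, const PredLists& preds, int d, int s) {
    const int n = static_cast<int>(order.size());
    // Block at a layout position, or -1 when the position is past either end.
    auto at = [&](int pos) { return (pos >= 0 && pos < n) ? order[pos] : -1; };
    auto w = [&](int from, int to) { return (from < 0 || to < 0) ? 0.0 : EdgeWeight(preds, from, to); };

    // Cut-independent terms, computed once: lose d-1 -> d and s -> s+1, gain s -> d.
    const double fixedDelta = w(at(d - 1), at(d)) + w(at(s), at(s + 1)) - w(at(s), at(d));

    CutChoice best{-1, 0.0};
    for (int c = d + 1; c <= s; c++) {
        // Cut-dependent terms: lose c-1 -> c, gain d-1 -> c and c-1 -> s+1.
        const double delta = fixedDelta + w(at(c - 1), at(c)) - w(at(d - 1), at(c)) - w(at(c - 1), at(s + 1));
        if (best.cut < 0 || delta < best.delta) {
            best = {c, delta};
        }
    }
    return best;
}
```

In this sketch each candidate cut needs three predecessor scans instead of six, which only helps by a constant factor; the real implementation may precompute different terms.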
Large pred lists are a frequent source of trouble. Glad you were able to track down the problematic case.
/ba-g Build analysis blocked by #110173
Follow-up to #110277. Fixes #110756. Don't consider 4-opt cut points that would move the entry block of a try/handler region below other blocks in that region. Previously, either a later move would put the entry block back at the top of the region or, in rare unlucky cases, we would hit asserts.
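The legality rule above can be illustrated with a small predicate. This is a hypothetical C++ sketch, not the JIT's code: it assumes a flat model where `regionOf[b]` is the innermost try/handler region of block `b` (or -1) and `isRegionEntry[b]` marks the first block of a region, and it ignores region nesting. Moving C = [c, s] ahead of B = [d, c) is rejected when B holds a region's entry block and C holds another block of the same region, since that block would then precede the entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical check: returns false if moving C = [c, s] ahead of B = [d, c)
// would place some region block before that region's entry block.
// Flat model only: nested regions are not considered.
bool CutKeepsRegionEntriesFirst(const std::vector<int>& order, const std::vector<int>& regionOf,
                                const std::vector<bool>& isRegionEntry, int d, int c, int s) {
    for (int i = d; i < c; i++) {
        const int entry = order[i];
        if (!isRegionEntry[entry]) {
            continue;
        }
        // Any block of the same region in C would end up above this entry block.
        for (int j = c; j <= s; j++) {
            if (regionOf[order[j]] == regionOf[entry]) {
                return false;
            }
        }
    }
    return true;
}
```

For example, with region 0 made of blocks 1 (entry) and 2, cutting before block 2 for a jump from block 3 to block 1 is rejected, while cutting directly before block 3 is allowed.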
Part of #107749. Follow-up to #103450. Greedy 3-opt (i.e., an implementation that requires each move to be profitable on its own) is not well-suited to discovering profitable moves for backward jumps, because such a move first requires an unrelated move to place the source block lexically behind the destination block. Thus, the 3-opt implementation added in #103450 incorporates a 4-opt move for backward jumps, where we partition 1) before the destination block, 2) before the source block, and 3) directly after the source block. This 4-opt implementation can be expanded to search for the best cut point between the destination and source blocks, maximizing its efficacy. Since we can compute the distance between the blocks, we can skip this linear search for large distances if it proves too expensive.
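To make the partitioning concrete, here is a minimal C++ sketch under simplifying assumptions; it is not the JIT's implementation. Blocks are numbered 0..n-1, `order` is the current layout, and a layout's cost is the total weight of edges that do not fall through. For a backward jump from `order[srcPos]` to `order[dstPos]`, a cut point `c` in (dstPos, srcPos] yields A = [0, dstPos), B = [dstPos, c), C = [c, srcPos], D = (srcPos, end), and the candidate layout A C B D puts the source block directly before the destination block. `c == srcPos` is the original move with a cut directly before the source block. `Edge`, `LayoutCost`, `BestBackwardJumpLayout`, and the `maxDistance` cap are hypothetical; falling back to the single original cut when the distance is too large is an assumption about how skipping the search might behave.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow-graph edge with a profile weight.
struct Edge {
    int from;
    int to;
    double weight;
};

// Cost of a layout: total weight of edges whose target is not placed right after their source.
double LayoutCost(const std::vector<int>& order, const std::vector<Edge>& edges) {
    std::vector<int> pos(order.size());
    for (std::size_t i = 0; i < order.size(); i++) {
        pos[order[i]] = static_cast<int>(i);
    }
    double cost = 0;
    for (const Edge& e : edges) {
        if (pos[e.to] != pos[e.from] + 1) {
            cost += e.weight;
        }
    }
    return cost;
}

// Greedy 4-opt for one backward jump: try each cut point and keep the cheapest layout,
// accepting it only if it is strictly better than the current one.
std::vector<int> BestBackwardJumpLayout(const std::vector<int>& order, const std::vector<Edge>& edges,
                                        int dstPos, int srcPos, int maxDistance) {
    std::vector<int> best = order;
    double bestCost = LayoutCost(order, edges);
    // Hypothetical cap: for distant blocks, only try the original cut directly before the source.
    const int firstCut = (srcPos - dstPos > maxDistance) ? srcPos : dstPos + 1;
    for (int c = firstCut; c <= srcPos; c++) {
        std::vector<int> candidate(order.begin(), order.begin() + dstPos);                  // A
        candidate.insert(candidate.end(), order.begin() + c, order.begin() + srcPos + 1);   // C
        candidate.insert(candidate.end(), order.begin() + dstPos, order.begin() + c);       // B
        candidate.insert(candidate.end(), order.begin() + srcPos + 1, order.end());         // D
        const double cost = LayoutCost(candidate, edges);
        if (cost < bestCost) {
            bestCost = cost;
            best = candidate;
        }
    }
    return best;
}
```

With layout 0 1 2 3, edges 0→2, 2→3, 3→1 (weight 10 each) and 1→2 (weight 1), the original cut directly before the source leaves the cost at 20, while cutting before block 2 produces layout 0 2 3 1 with cost 1. This is the kind of move that searching over cut points can find and a fixed cut cannot. The sketch re-evaluates the whole layout per candidate for clarity; a real implementation would compute only the cost change at the segment boundaries.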