[Grammar] Update 10-3 Replace branches with predication.md
dendibakh authored Sep 23, 2024
1 parent 3abe0f8 commit f4e7304
Showing 1 changed file with 2 additions and 2 deletions.
@@ -14,7 +14,7 @@ if (cond) { /* frequently mispredicted */ => int y = computeY();
foo(a);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- For the code on the right, the compiler can replace the branch that comes from the ternary operator, and generate a `CMOV` x86 instruction instead. A `CMOVcc` instruction checks the state of one or more of the status flags in the `EFLAGS` register (`CF, OF, PF, SF` and `ZF`) and performs a move operation if the flags are in a specified state or condition. A similar transformation can be done for floating-point numbers with `FCMOVcc,VMAXSS/VMINSS` instructions. In the ARM ISA, there is `CSEL` (conditional selection) instruction, but also `CSINC` (select and increment), `CSNEG` (select and negate), and a few other conditional instructions.
+ For the code on the right, the compiler can replace the branch that comes from the ternary operator, and generate a `CMOV` x86 instruction instead. A `CMOVcc` instruction checks the state of one or more of the status flags in the `EFLAGS` register (`CF, OF, PF, SF` and `ZF`) and performs a move operation if the flags are in a specified state or condition. A similar transformation can be done for floating-point numbers with `FCMOVcc, VMAXSS/VMINSS` instructions. In the ARM ISA, there is `CSEL` (conditional selection) instruction, but also `CSINC` (select and increment), `CSNEG` (select and negate), and a few other conditional instructions.
Listing: Replacing Branches with Selection - x86 assembly code.
@@ -33,7 +33,7 @@ Listing: Replacing Branches with Selection - x86 assembly code.
[@lst:ReplaceBranchesWithSelectionAsm] shows assembly listings for the original and the branchless version. In contrast with the original version, the branchless version doesn't have jump instructions. However, the branchless version calculates both `x` and `y` independently, and then selects one of the values and discards the other. While this transformation eliminates the penalty of a branch misprediction, it is doing more work than the original code.
- We already know that the branch in the original version on the left is hard to predict. This is what motivates us to try a branchless version in the first place. In this example, the performance gain of this change depends on the characteristics of `computeX` and `computeY` functions. If the functions are small[^1] and the compiler can inline them, then selection might bring noticeable performance benefits. If the functions are big[^2], it might be cheaper to take the cost of a branch mispredict than to execute both `computeX` and `computeY` functions. Ultimately, performance measurements always decide which version is better.
+ We already know that the branch in the original version on the left is hard to predict. This is what motivates us to try a branchless version in the first place. In this example, the performance gain of this change depends on the characteristics of the `computeX` and `computeY` functions. If the functions are small[^1] and the compiler can inline them, then selection might bring noticeable performance benefits. If the functions are big[^2], it might be cheaper to take the cost of a branch mispredict than to execute both `computeX` and `computeY` functions. Ultimately, performance measurements always decide which version is better.
Take a look at [@lst:ReplaceBranchesWithSelectionAsm] one more time. On the left, a processor can predict, for example, that the `je 400514` branch will be taken, speculatively call `computeY`, and start running code from the function `foo`. Remember, branch prediction usually happens many cycles before we know the actual outcome of the branch. By the time we start resolving the branch, we could be already halfway through the `foo` function, despite it is still speculative. If we are correct, we've saved a lot of cycles. If we are wrong, we have to take the penalty and start over from the correct path. In the latter case, we don't gain anything from the fact that we have already completed a portion of `foo`, it all must be thrown away. If the mispredictions occur too often, the recovering penalty outweighs the gains from speculative execution.
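For readers of this diff who do not have the full chapter open, the two source forms that the changed paragraphs compare can be sketched roughly as follows. This is a minimal illustration, not the listing from the book: the bodies of `computeX` and `computeY`, their `int v` parameter, and the wrapper names `selectWithBranch` and `selectWithTernary` are all hypothetical. Whether a compiler actually emits `CMOV` (or `CSEL` on ARM) for the second form depends on the compiler, flags, and target, so the generated assembly should be checked.

```cpp
// Hypothetical stand-ins for the computeX/computeY functions named in the text.
// They are deliberately tiny so that a compiler has a chance to inline them.
static inline int computeX(int v) { return v * 3 + 1; }
static inline int computeY(int v) { return v - 7; }

// Branchy form: only one of the two functions runs, chosen by a conditional
// jump. If `cond` is hard to predict, every misprediction costs a flush.
int selectWithBranch(bool cond, int v) {
  int a;
  if (cond) {
    a = computeX(v);
  } else {
    a = computeY(v);
  }
  return a;
}

// Selection form: both values are computed unconditionally, then one is
// picked. A compiler may lower the ternary to a conditional move instead
// of a jump, but it is not required to.
int selectWithTernary(bool cond, int v) {
  int x = computeX(v);
  int y = computeY(v);
  return cond ? x : y;
}
```

Both functions return the same value for every input; the difference is only in how much work is done and whether a conditional jump is needed. As the text says, which one is faster has to be measured.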
