-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] cmov conversion hurts binary search performance #39374
Comments
For reference, the relevant line in the original Rust code is: https://github.com/rust-lang/rust/blob/059e6a6f57f4e80d527a3cd8a8afe7f51f01af8e/src/libcore/slice/mod.rs#L1423 The interesting part of the IR:
And assembly with cmov (what we want):
And with branch (what we get):
|
I remember this problem from a while ago, and I wonder if the !unpredictable metadata attachment could solve this problem: I see a few instances of it in the test suite which gives me hope:
|
Just gave !unpredictable a try, and unfortunately it didn't help. This does sound like something that the pass should take into account -- not sure if this information is still available at that point though. |
Looks like a similar optimization in CodeGenPrepare can be disabled via !unpredictable: https://github.com/llvm-mirror/llvm/blob/f2511abd84e3aa730ed68ac30881b0895d03b6d4/lib/CodeGen/CodeGenPrepare.cpp#L5756 But this metadata is not preserved in SelectionDAG. |
Yes, 'unpredictable' was created for situations like this, but I never looked at getting metadata to survive to a machine IR pass. Taming the heuristic cost calc might be the easier fix. Which CPU(s) are showing the 2-3x slowdown? It's possible that a particular sched model and/or generic defaults should be adjusted to avoid this problem. |
@spatel: I'm seeing 2.8x slowdown on i5-4690 CPU (Haswell), with both default and native target CPU. On the Rust bug report two other people report similar numbers, though I don't know which CPUs those are. I don't think that changing scheduling models will help for this particular case (where the transform should be pretty much universally disadvantageous, as the branch is unpredictable). |
Yes, I agree - and apparently so did the author of this pass. :) |
So that test case has a structure like %Left.sink = select i1 %tobool, %struct.Node** %Left, %struct.Node** %Right which I think is guarded by https://github.com/llvm-mirror/llvm/blob/0bbe50f2fb47207ac8093bb700edcb75616b7dd5/lib/Target/X86/X86CmovConversion.cpp#L536 in the cmov pass. Notably, and as the comment there indicates, this is not actually a binary search, but rather a binary tree search, and the code uses "load depending on select" to detect that case. |
This case seems pretty tricky to me, because on the surface, the pattern is exactly one where cmov conversion should be profitable (both select values are already computed, we're only waiting on the condition) -- under the assumption that the branch is predictable. The cmov conversion cost modeling currently only models whether the conversion is profitable under the assumption of a reasonably well-predicted branch, but (apart from the tree-search case mentioned above) doesn't make an effort to determine whether or not the branch is expected to be predictable. And I'm not sure how it could really do that. My best idea would be that branch predictability correlates with access locality, and that branches depending on random-access loads are more likely to be unpredictable, while branches depending on loads from a linear induction variable are likely to be predictable. But that's something that doesn't sound easy to check at the point where cmov conversion operates. |
Looks like this issue has previously been reported at #36492 . |
This commit added a magic-number (why 12.5%?) restriction: ...so there's still a possibility that we could adjust the code to avoid this case. If not (or even if we do that), we should figure out how to run the plumbing to get 'unpredictable' down to this layer and have this pass use that info. |
For reference, the computed values in the cost modelling (for generic target) are: Diff[0]: 1 |
*** Bug llvm/llvm-bugzilla-archive#40646 has been marked as a duplicate of this bug. *** |
I was wondering if there has been any progress on this. One case very similar to binary search where this hits us is binary heap primitives. See for example this code: https://godbolt.org/z/h6hFxg The condition highlighted is unpredictable, and the cmov conversion has a significant negative impact. We have a workaround that uses inline asm to force a cmov, but supporting that is quite annoying (the real implementation is a template and we have an extension point to define per-comparator workarounds). It also seems that branch probabilities from FDO data are not taken into account by the optimization. As in the godbolt link, disabling the optimization globally restores the cmov, but that may have other unwanted side effects. Is there any way to disable cmov conversion on a per-function basis? Ideally, we'd use the "unpredictable" annotation, but I'm not sure whether it's possible to express that in Clang (is there a corresponding builtin?) and it also looks like the optimization is still not honoring that either. |
Yes, there is a clang builtin: if (__builtin_unpredictable(x > 0)) { https://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions But no, it does not make a difference on your code example because that information is lost before the problematic pass. AFAIK, there have been no improvements on this since this bug was filed. |
We did make the cost modelling a bit more conservative to avoid some additional regressions in LLVM 10. But nothing that would affect the binary search related cases. We probably should just make this fully conservative and assume worst-case probabilities for everything, until such a time as accurate profiling information is available this deep in the backend. |
mentioned in issue llvm/llvm-bugzilla-archive#40646 |
mentioned in issue llvm/llvm-bugzilla-archive#44539 |
2 similar comments
mentioned in issue llvm/llvm-bugzilla-archive#44539 |
mentioned in issue llvm/llvm-bugzilla-archive#44539 |
After https://reviews.llvm.org/D118118 this flag / metadata is propagated to the X86 backend and with builtin __builtin_unpredictable or attached unpredictable metadata on selects you can force LLVM to codegen CMovs. |
It's nice to have the opt-out, but we should probably still make this optimize well by default. We now have the SelectOptimize pass that is based on branch weights and sensible cost modeling. We should probably schedule it for X86 and delete the cmov pass. |
Right, optimizing this correctly by default would ve great but this can be very trickly as history of gcc/clang shows SelectOptimize may catch this but may regress other cases. Not sure if it works without PGO profile atleast as good as CMov pass. |
Extended Description
Originally reported at: rust-lang/rust#53823
The cmov conversion pass converts a cmov in Rust's binary search implementation into a branch. As binary search branches are badly predicted, this significantly hurts performance. For small data sets, which are not dominated by cache performance, the slowdown is around 2-3x.
https://gist.github.com/nikic/13e907d7097f74a92d082fcc61bc212c has the LLVM IR for the function, as well as generated assembly with cmov conversion enabled (default) and disabled (-x86-cmov-converter=false).
It would be great if the cmov conversion heuristics could be adjusted in some way to avoid converting this case.
The text was updated successfully, but these errors were encountered: