don't short-circuit chained comparisons #16088
I wasn't even aware that these short-circuit, but I see that it is documented in the manual. (I guess it comes from Python?) Somehow I doubt that many programs rely on this, but I don't know how to go about doing a survey. |
The manual says "However, the order of evaluations in a chained comparison is undefined". This makes the short-circuit behavior useless anyway, so you might as well get rid of it. Any code that relies on the short-circuit behavior is probably broken anyway, thanks to the undefined order. |
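To make the behavior under discussion concrete, here is a minimal sketch that counts how many operands get evaluated under the current lazy chain versus an eager `&` form. The names `traced`, `calls`, and `eager_chain` are hypothetical helpers introduced only for this illustration; it counts evaluations rather than printing them, because the order of evaluation is documented as undefined.

```julia
# Counter for how many operands have been evaluated (hypothetical helper).
calls = Ref(0)

# Hypothetical helper: returns x unchanged and records that it was evaluated.
function traced(x)
    calls[] += 1
    return x
end

# Eager form: all three operands are evaluated before the call,
# and both comparisons are always performed.
eager_chain(a, b, c) = (a < b) & (b < c)

# Lazy chained comparison: the first comparison (3 < 1) is false,
# so the last operand does not need to be evaluated.
calls[] = 0
lazy_result = traced(3) < traced(1) < traced(2)
lazy_calls = calls[]

calls[] = 0
eager_result = eager_chain(traced(3), traced(1), traced(2))
eager_calls = calls[]

println("lazy:  result = ", lazy_result, ", operands evaluated = ", lazy_calls)
println("eager: result = ", eager_result, ", operands evaluated = ", eager_calls)
```

On a Julia version with the documented short-circuit behavior, the lazy chain should skip its last operand while the eager form evaluates all three; both give the same result.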
I'm not sure about the efficiency trade-off. Assuming short-circuit (lazy) semantics, several cases come to mind for
For the last case, there is also a flow-analysis loss for lazy evaluation: with evaluation of Maybe we could add a switch to control lazy versus eager evaluation of chained comparisons and see what the impact is on current benchmarks of interest? |
With respect to "order of evaluations in a chained comparison is undefined", I'd be in favor of nailing down the order, regardless of whether we decide on eager or lazy evaluation, because having the order differ between implementations can cause a lot of pain. The optimization advantages of undefined order, useful on ye olde PDP-11, seem to have evaporated with modern compilers. Even the C++ committee is seriously considering nailing down the order. |
@ArchRobison: if LLVM was better at doing this, I would agree with your analysis, but it seems to either never do this or be very bad at it. Perhaps that's effectively agreement since LLVM could be improved to make this argument go through. Similarly for the second subcase. I wholeheartedly agree about making the order of evaluation defined. |
I would think that a lot of chained comparisons in practice are safe to evaluate unnecessarily, e.g. In the case where one of the operands is very expensive to evaluate unnecessarily, one can always write |
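One way to keep an expensive operand lazy, however chained comparisons end up being lowered, is to spell out the short-circuit with `&&`. This is only a sketch: `expensive_upper_bound` and `in_range_lazy` are hypothetical names, and the sum of squares merely stands in for a costly computation.

```julia
# Hypothetical stand-in for an operand that is costly to compute.
function expensive_upper_bound(n)
    return sum(abs2, 1:n)
end

# Explicit && evaluates left to right and skips the right-hand side
# whenever `lo < x` is already false.
function in_range_lazy(lo, x, n)
    return lo < x && x < expensive_upper_bound(n)
end

println(in_range_lazy(0, 5, 3))   # the expensive bound is computed here
println(in_range_lazy(10, 5, 3))  # the expensive bound is skipped here
```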
I just did a quick grep through an archive of all registered Julia packages for chained Of course, I could have missed some; this is hard to grep for, and there were a lot of false positives to sift through. But I think I at least got a representative sampling. |
Thanks for looking. Array lookups can be surprisingly expensive, depending on whether they hit L1 cache or not, and branches can be surprisingly cheap, depending on whether the branch predictor bets right. (Modern CPUs have become a casino.) I tried the
For 64-bit x86, clang -O3 and icc -O3 leave the first as lazy and the second as eager. gcc 5.2 changed both to the lazy form, suggesting that the gcc developers bet that lazy is faster than eager on 64-bit x86. The story on other processors may differ. I think timings of the benchmarks of interest are the data we need to make a decision based on performance considerations, and performance seems to be one of the motivations for this issue. |
I looked at this back when I was introducing the array indexing infrastructure. At the time, it was surprisingly cheaper to have the short-circuiting semantics (#10525 (comment)) — those branches are extremely predictable. In this specific case, it'd be interesting to try |
A quick benchmark:

    function foo(a)
        s = 0
        for i in 2:length(a)
            s += a[i-1] < i < a[i]
        end
    end
    function bar(a)
        s = 0
        for i in 2:length(a)
            s += (a[i-1] < i) & (i < a[i])
        end
    end
    a = rand(Int, 10^4)
    @elapsed(foo(a)) / @elapsed(bar(a))

gives a 5x slowdown from short-circuiting. This seems like a pretty huge penalty, especially considering that the comparisons per se are only part of the loop body. Surprisingly, if I use |
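A variant of the benchmark above can be sketched as follows. It is not the code that produced the 5x figure: it assumes two changes, namely that each function returns its sum so the results can be checked against each other, and that each function is called once before timing so compilation is excluded. The names `count_lazy` and `count_eager` are hypothetical.

```julia
# Lazy version: the chained comparison may short-circuit.
function count_lazy(a)
    s = 0
    for i in 2:length(a)
        s += a[i-1] < i < a[i]
    end
    return s
end

# Eager version: both comparisons are always evaluated and combined with &.
function count_eager(a)
    s = 0
    for i in 2:length(a)
        s += (a[i-1] < i) & (i < a[i])
    end
    return s
end

a = rand(Int, 10^4)

# Warm-up calls so that compilation time is excluded from the timings below.
count_lazy(a)
count_eager(a)

ratio = @elapsed(count_lazy(a)) / @elapsed(count_eager(a))
println("lazy / eager time ratio: ", ratio)
```

A single `@elapsed` sample on an array this small is noisy, so the ratio is only indicative; repeating the measurement or using a larger array would give steadier numbers.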
Nice example. One improvement: make |
The slowdown is surely a branch misprediction issue, since using
|
A thought that crossed my mind is that the quick benchmark uses a chain that does not direct control flow. It might be useful to understand how often that happens compared to the control-flow case. I personally love the chained form for assertions, but in that scenario the branches are highly predictable unless I'm having a bad day. |
Triage: resolved that this doesn't matter much and changing it now would be pointless churn. There are corner cases where the current behavior could be more efficient. In cases where it's less efficient, future optimization work can address that gap, whereas laziness is semantically significant and therefore cannot be avoided. |
It's harder to lower, more surprising and usually less efficient. We should just lower a < b < c to (a < b) & (b < c).
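The proposed rewrite can be illustrated at the expression level. The sketch below is not Julia's actual lowering pass; `eager_lowering` is a hypothetical function that takes the `:comparison` expression produced by the parser and builds the eager `&` form. It assumes every operand is a plain symbol or literal, so repeating the middle operand does not evaluate anything twice; a real lowering would bind operands to temporaries first.

```julia
# Rewrite a parsed chained comparison into the eager form (a < b) & (b < c).
function eager_lowering(ex::Expr)
    # Anything that is not a chained comparison is returned unchanged.
    ex.head === :comparison || return ex
    args = ex.args
    # args alternate operand, operator, operand, operator, operand, ...
    terms = [Expr(:call, args[i+1], args[i], args[i+2]) for i in 1:2:length(args)-2]
    # Combine the pairwise comparisons with the non-short-circuiting & operator.
    return foldl((l, r) -> Expr(:call, :&, l, r), terms)
end

chain = Meta.parse("a < b < c")
lowered = eager_lowering(chain)
println(lowered)
```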