Improve optimization of harmless call cycles #38231

Open · Keno wants to merge 1 commit into master

Conversation

@Keno (Member) commented Oct 29, 2020

When inference detects a call cycle, one of two things can happen:

  1. It determines that in order for inference to converge, it needs
     to limit the signatures of some methods to something more general, or
  2. The cycle is determined to be harmless at the inference level (because,
     though there is a cycle in the CFG, there is no dependency cycle of
     type information).

In the first case, we simply disable optimizations, which is sensible,
because we're likely to have to recompute some of that information anyway
when we actually get there dynamically.

In the second case, however, we do run optimizations, but it's a bit
of an unusual case. In particular, inference usually delivers
methods to inlining in postorder (meaning callees get optimized
before their callers), such that a caller can always inline a callee.
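
As a minimal, made-up illustration of that contract (neither function
exists anywhere; they are purely for exposition):

```julia
# Acyclic case: the optimizer finishes `callee` before it ever visits
# `caller`, so `caller` always has a fully optimized body it can inline.
callee(x) = x + 1
caller(x) = callee(x) * 2   # optimized second, in postorder
```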

However, if there is a cycle, there is of course no unique postorder
of the functions, since by definition we're looking at a local strongly
connected component. In this case, we essentially just pick an
arbitrary order (and moreover, the order may differ depending on the
point at which we enter the cycle, and is subsequently cached, leading
to potential performance instabilities depending on function order).
However, the arbitrary order is quite possibly
suboptimal. For example, in #36414 we have the cycle
`randn` -> `_randn` -> `randn_unlikely` -> `randn`. In this cycle, the author
of the code expected both `randn` and `_randn` to inline and
annotated the functions as such. However, in 1.5+ the order we happened
to pick would have inlined `randn_unlikely` into `_randn` (had it not
been marked `@noinline`), with a hard call break between `randn` and
`_randn`, which is problematic from a performance standpoint.
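
For concreteness, here is a simplified sketch of that cycle's shape. The
function names match the Random stdlib code discussed in #36414, but the
bodies are placeholder stand-ins, not the real implementations:

```julia
module RandnCycleSketch

# Rare slow path; marked @noinline, and it re-enters the cycle.
@noinline randn_unlikely(rng) = randn(rng)

# Both of the following were annotated by the code's author to inline.
@inline _randn(rng, r::UInt64) =
    iszero(r & 0xff) ? randn_unlikely(rng) : reinterpret(Float64, r >> 12)

@inline function randn(rng)
    r = rand(rng, UInt64)   # fast path: draw raw bits
    _randn(rng, r)          # expected to inline into randn
end

end # module
```

It can be exercised with, e.g., `RandnCycleSketch.randn(MersenneTwister(1))`
after `using Random`. The failure mode described above is that `randn` gets
handed to the optimizer before `_randn` is finished, leaving a hard call
break between them.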

This PR aims to address this by introducing a heuristic: if some
functions in a cycle are marked `@noinline`, we want to make
sure to optimize these last (since they won't ever be inlined anyway).
To make this guarantee, while also restoring postorder if this happens
to break the cycle, we perform a DFS traversal rooted at the `@noinline`
functions and then optimize the functions in the cycle in DFS-postorder,
as sketched below. This may still not be a true postorder in the inlining
graph (if the `@noinline` functions don't break the cycle), but
even in that case, it should be no worse than the default order.
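
To make the traversal concrete, here is a self-contained sketch of the
ordering heuristic on a toy call graph. The `Dict`-of-`Symbol`s
representation and the function name are made up for illustration; the
actual PR operates on inference frames inside `_typeinf`:

```julia
# DFS rooted at the cycle's @noinline members; emitting functions in
# postorder puts callees before their callers and the roots themselves last.
function cycle_optimization_order(callees::Dict{Symbol,Vector{Symbol}},
                                  noinline::Set{Symbol})
    order = Symbol[]
    seen = Set{Symbol}()
    function dfs(f)
        f in seen && return
        push!(seen, f)
        foreach(dfs, callees[f])
        push!(order, f)             # postorder: after all of f's callees
    end
    foreach(dfs, intersect(keys(callees), noinline))  # @noinline roots first
    foreach(dfs, keys(callees))     # pick up anything the roots didn't reach
    return order
end

callees = Dict(:randn => [:_randn],
               :_randn => [:randn_unlikely],
               :randn_unlikely => [:randn])
cycle_optimization_order(callees, Set([:randn_unlikely]))
# => [:_randn, :randn, :randn_unlikely]
```

Here `_randn` is optimized before `randn` (so `randn` can still inline it),
and the `@noinline` function comes last, which is exactly the guarantee the
heuristic is after.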

Fixes #36414
Closes #37234

@Keno requested review from vtjnash and JeffBezanson on October 29, 2020 21:48
@vtjnash (Member) commented Oct 30, 2020

This seems somewhat complicated, and seems to be trying to work around the design intent of this code without actually updating it. This function was intended to produce stable results by batching the <inference>, <optimization>, and <publishing> steps (thereby preserving them as unordered). While not critical (since `code_typed` will always opt to lie to you about this anyway), it seemed preferable for a predictable run time. This was known to be suboptimal (particularly in the case of rand, which I've come across many times, since it has poor inference performance due to this code structure), but we didn't have any DFS traversal algorithms at the time either to pick a different strategy.

@Keno (Member, Author) commented Oct 30, 2020

I think your comment got garbled. I don't really know what you're saying.

@vtjnash (Member) commented Oct 30, 2020

The structure of the `_typeinf` function is supposed to prevent this inlining optimization for functions within a cycle. I think we should alter the structure of that function directly, rather than trying to work around it everywhere else.

@Keno (Member, Author) commented Oct 30, 2020

I don't really know what you're saying. This is entirely within the `_typeinf` function and preserves exactly the batching that your updated comment requests.

@vtjnash (Member) commented Oct 30, 2020

Yeah, I don't want it to preserve batching there. If there's a better order, it should define and use it.

@Keno (Member, Author) commented Oct 30, 2020

Alright, then I'm just confused at this point. Could you elaborate on what you're proposing as an alternative?

@Keno (Member, Author) commented Nov 5, 2020

@vtjnash and I discussed this on a call, and the primary objection is to the new cycle caching scheme this PR introduces. The reason for that scheme was that we currently batch inference frames for optimization such that we can intersect their world ages. However, @vtjnash argued that such batching is unnecessary and doesn't make much sense. I'll be submitting a separate pull request to remove the batching, which will make the caching scheme in this PR unnecessary.

@Keno (Member, Author) commented Nov 7, 2020

I've tried that, and not batching the world-age computation seems to cause about a 10-15% inference slowdown, presumably because it leads to unnecessary re-inference of some cycles. Given that, should we just go with this instead (which doesn't have the slowdown)?

@StefanKarpinski (Member) commented
Bump: @vtjnash?

Successfully merging this pull request may close: Potential Performance regression between Julia 1.4.2 and 1.5 Beta 1