Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change how MIR inlining handles cycles #43542

Closed
nikomatsakis opened this issue Jul 29, 2017 · 11 comments
Closed

change how MIR inlining handles cycles #43542

nikomatsakis opened this issue Jul 29, 2017 · 11 comments
Labels
A-incr-comp Area: Incremental compilation A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

When doing the "query-ification" of MIR, we handled MIR inlining by doing a "try-get", meaning that we attempt to inline all callees but ignore if a cycle arises. This is OK but leads to nondeterministic results where the order of queries in a SCC can affect what gets optimized into what. On #42633, I wrote:

MIR inlining uses try_get to handle cycles in the call graph. This was a clever idea but I think ultimately not the right way -- it makes things nondeterministic. Imagine that you have foo() which calls bar() which calls foo(). If we start by asking for the optimized MIR for foo, it can inline bar, but bar will not inline foo. However, if we start by asking for the optimized MIR for bar, you get the opposite result. I think you should get the same result whichever optimized MIR you ask for first.

The workaround that I think we should be doing is to have a separate query (let's call it MIR-call-graph or something) and then having the inlining passes all request this call-graph. The call-graph would just walk the MIR for everything. This isn't great for responsiveness, but this call-graph would only be used when we are doing inlining and MIR optimization, so that's probably not an IDE use case, nor is it a prime incremental use-case. (You can make incremental do better by projecting out the SCC for particular def-ids, and then red-green can observe that they have not changed.)

We could probably do better here but I think for now this would be fine, and it's not clear that we need to do better ever.

And @michaelwoerister later wrote:

As a side-note regarding inlining: With #43183, the trans-collector builds up a fairly accurate, global "call graph" between translation items. This could be useful for inlining too. It is post-monomorphization though, which is good for the quality of information it can provide, but which also means that it is a bit late in the pipeline.

This issue is intending to track the proposed refactoring of the MIR inlining pass to not use try-get.

@nikomatsakis nikomatsakis added A-incr-comp Area: Incremental compilation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 29, 2017
@nikomatsakis
Copy link
Contributor Author

cc @matthewhammer, who may take this on

@eddyb
Copy link
Member

eddyb commented Jul 29, 2017

What I came up with previously was not caching the result of queries which observed a cycle via try_get, although I'm not sure how many recomputations that would result in or if it's correct.

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Aug 3, 2017
@matthewhammer
Copy link

matthewhammer commented Aug 25, 2017

Above, @nikomatsakis outlines two problems (the way that I view it now, at least)

  1. Inlining decisions are determined by incidental query-order details
  2. Inlining decisions are, under the hood, using the query framework's cycle detection mechanism

I've talked to @nikomatsakis a couple of times about this since he created this issue. In my view, the first problem and the second problem can be decoupled, where we solve the first problem without necessarily eschewing the query framework's cycle detection mechanism. (Either way would probably work, I think, and I don't have strong feelings about the choice).

In particular, I recently proposed a solution to the first problem that would require knowing some SCC information, and would use that to determine inline decisions at callsites, in place of (solely) using the cycle detection mechanism.

Restating the problem

I'd rephrase the first problem above as follows:

The current inlining logic is "asymmetric", in that as it makes inlining decisions, it favors one root for each SCC, rather than favoring each SCC root equally.

My proposal for inlining assumes that we want to specialize each SCC of "inline" nodes for each root of this SCC. This may or may not be what we actually want, since it will lead to lots of code specialization (maybe good, maybe not).

Tiny example

To illustrate the idea, suppose we have an SCC with two inline functions, f and g, which mutually call each other, are each roots of the SCC. Specifically, they are SCC roots in the sense that f and g are each called by non-inline functions that are outside of the SCC (f and g do not call these inter-SCC functions, directly or transitively).

The current approach has the issue that either f or g will be processed first, making a single choice of "root" for the SCC, which will not reflect the fact that f and g are both roots of the SCC. One may call this "non-deterministic" (as @nikomatsakis does above); I think term makes sense.

I'd also offer the term "asymmetric", in the sense that every root of the SCC should get its own "view" of the SCC, rather than choosing a single root (as determined by the larger compilation context, with its various queries and their dependencies).

Specialize MIR for each SCC root

My proposal addresses this asymmetry issue by assuming that we want to specialize each SCC of "inline" nodes for each root of this SCC.

In our running example, we'd generate two versions of the single SCC involving f and g, where the versions differ in that one usesf as the root, and the other uses g as the root. In the version where f is the root, the calls from g to f would not be inlined, but the calls from f to g would be inlined. In the version where g is the root, the situation is reversed.

And in general, the per-SCC-root approach is as follows: For each SCC, for each root, we generate one inlined MIR; so during inlining any MIR, have a root function that gives the call edges of the SCC a "direction" (e.g., via a topological sort from this root), allowing us to distinguish the “backward” call edges which are not inlined from the (inlined) “forward” call edges.

In general, for an SCC of n nodes marked as inline, with m roots, we'd generate nm nodes of output from the inlining these nodes.

Include SCC root among query key

My current thinking is that the inlining approach above (the "per-SCC-root approach") can be implemented in two different ways. In both ways, we key the queries for inlined output by at least two DefIDs:

  • the “source” MIR being transformed (e.g., containing callsites to be inlined) and
  • the “root” MIR that is marked as “inline”, if any (could be None).

That is to say, we compute inlined output in reference to a "root" inline function of the inlined code. This additional root parameter addresses the asymmetry issue, as explained above in the example. In sum, for each root we get a different view of the SCC.

Given this definition for the (rooted) queries, I see two implementation strategies:

  1. Use the SCC graph to determine the SCCs, their roots, and how to inline each call. No need to use cycle detection any longer to make decisions; all decisions stem from the SCC graph.
  2. Do not use or compute an SCC graph; The additional "root" DefID of each query addresses the asymmetry issue, so it is okay to use cycle detection (not an SCC graph) to avoid inlining cyclic calls.

@matthewhammer
Copy link

My main questions for @nikomatsakis and others:

  • Does this per-SCC-root approach make sense?
  • If so, which implementation approach makes the most sense in the near term?

@matthewhammer
Copy link

matthewhammer commented Aug 25, 2017

Example from above, as a figure:

image

@hanna-kruppe
Copy link
Contributor

This is an interesting strategy. It certainly avoids the non-determinism that is the main problem. I don't know if it's a good optimization strategy, though. If inlining potentially duplicates the functions in an SCC once per "root", then that should probably be reflected in the cost metric for inlining. However, I don't see how this can be done in the "go depth-first, use cycle recovery, but include the root in the query key" implementation (since the SCC is never explicitly computed).

Furthermore, even if the SCC is available, it seems somewhat tricky to predict what inlining decisions other, separate runs of the inlining pass (starting from a different root) will perform. (This information would presumably be needed for the cost metric.) Of course, each query could run the same analyses that those other invocations would already perform, but that would duplicate a ton of work. Perhaps this is just my lack of imagination speaking, but I fear that going through SCC multiple times, from different starting points, will just make it harder to make good inlining decisions.

(Besides, if the SCC is available, there's no particular need to innovate on this. There are other strategies for dealing with SCCs that are more battle-tested.)

bors added a commit that referenced this issue Aug 26, 2017
rustc: Start moving toward "try_get is a bug" for incremental

This PR is an effort to burn down some of the work items on #42633. The basic change here was to leave the `try_get` function exposed but have it return a `DiagnosticBuilder` instead of a `CycleError`. This means that it should be a compiler bug to *not* handle the error as dropping a diagnostic should result in a complier panic.

After that change it was then necessary to update the compiler's callsites of `try_get` to handle the error coming out. These were handled as:

* The `sized_constraint` and `needs_drop_raw` checks take the diagnostic and defer it as a compiler bug. This was a new piece of functionality added to the error handling infrastructure, and the idea is that for both these checks a "real" compiler error should be emitted elsewhere, so it's only a bug if we don't actually emit the complier error elsewhere.
* MIR inlining was updated to just ignore the diagnostic. This is being tracked by #43542 which sounded like it either already had some work underway or was planning to change regardless.
* The final case, `item_path`, is still sort of up for debate. At the time of this writing this PR simply removes the invocations of `try_get` there, assuming that the query will always succeed. This turns out to be true for the test suite anyway! It sounds like, though, that this logic was intended to assist in "weird" situations like `RUST_LOG` where debug implementations can trigger at any time. This PR would therefore, however, break those implementations.

I'm unfortunately sort of out of ideas on how to handle `item_path`, but other thoughts would be welcome!

Closes #42633
@nikomatsakis
Copy link
Contributor Author

@rkruppe

However, I don't see how this can be done in the "go depth-first, use cycle recovery, but include the root in the query key" implementation (since the SCC is never explicitly computed).

The whole goal is to move away from that strategy -- it is nondeterministic, and cycle recovery is problematic anyhow (we would like the invariant that "cycle detected => compile fails eventually"). My current plan is to do the following:

  • compute the SCC as a query, starting from the "pre-inlining" MIR (this will wind up forcing the MIR for every function in the crate, unfortunately, but I don't see much of an alternative -- we could possibly have queries for computing the "SCC as seen from X", but it'd be complicated to share results then);
  • when doing inlining, we check that the callee is not part of our SCC, and inline them only in that case (this gives a tree structure).

I think that @matthewhammer's scheme fits into this by adding, essentially, another layering of inlining that happens afterwards. That is, what I described above yields a result that is effectively the sort of "bottom-up" inlining that we are trying to do now. If MIR0 is the pre-inlining representation (from which the SCC is computed), then we have now built MIR1. We can then imagine inlining within an SCC -- that would be inlining MIR1 into MIR1, to yield MIR2.

Furthermore, even if the SCC is available, it seems somewhat tricky to predict what inlining decisions other, separate runs of the inlining pass (starting from a different root) will perform.

This, however, may well be true!

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Sep 1, 2017

@nikomatsakis re: cycle recovery, I was referring to this part (one of two possible implementation strategies) of @matthewhammer's post:

Do not use or compute an SCC graph; The additional "root" DefID of each query addresses the asymmetry issue, so it is okay to use cycle detection (not an SCC graph) to avoid inlining cyclic calls.

This would be deterministic, but would still use cycle recovery (in a bening way). There might be other reasons to remove cycle recovery entirely, then this implementation strategy is undesirable anyway. I was just saying, even if this implementation strategy remains viable, it may not give very good results.

when doing inlining, we check that the callee is not part of our SCC, and inline them only in that case (this gives a tree structure).

Aside to be clear: it is absolutely possible (and profitable) to inline within an SCC. LLVM does this sometimes. It may not be as easy (or at all possible) to achieve with the "query optimized MIR for other functions" model, and it may not be necessary or very useful for MIR optimizations, but in theory it's possible.

@jonas-schievink jonas-schievink added the A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Aug 19, 2019
@jonas-schievink jonas-schievink added the A-mir-opt Area: MIR optimizations label Mar 29, 2020
@wesleywiser wesleywiser added the A-mir-opt-inlining Area: MIR inlining label Apr 22, 2022
@pnkfelix
Copy link
Member

Note that PR #68828 resolved the issue of inlining cycles in a more "pessimistic" way, as noted to me by @wesleywiser

@pnkfelix
Copy link
Member

I do not think this is a tracking issue. It does track a potential work item, but I'm not convinced its broken down into subparts that warrant the C-tracking-issue label.

@rustbot label: -C-tracking-issue

@rustbot rustbot removed the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Apr 22, 2022
@pnkfelix
Copy link
Member

After discussing with @wesleywiser , we're not sure there's value in leaving this open.

If someone sees this and realizes its not done and wants to pick it up and do it, great. But its mostly just noise in our issue repository the way things stand now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants