Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incr.comp.: Improve caching efficiency by handling spans in a more robust way #47389

Open
michaelwoerister opened this issue Jan 12, 2018 · 23 comments
Labels
A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

The Problem

Source location information (which, for simplicity, I'll just call "spans" from here on out) are the bane of incremental compilation's existence. Why is that? Unlike most other kinds of frequent changes done to source code, changing spans has (seemingly) non-local effects.

As an example, let's first consider a "regular" change of a program, like turning a + in an expression into a -. This change means that the function containing the expression has to be re-analyzed and the object file it was instantiated in has to be re-compiled. So far, so expected. Now, in contrast, consider a change that affects the span of a function, like adding a line with a comment to it. At first glance, it looks like we haven't really changed anything -- we just added a comment after all -- but that's not true. Spans are part of the HIR, the MIR, ScopeTrees, and (via debuginfo and panic messages) even LLVM IR and object files. So, adding a comment to a function will legitimately cause the function and its containing object file to be re-compiled. That's a bit unexpected and sad, but how is it "non-local"?

Remember that we added a new line with a comment to a function, thus changing the span of the function. What I didn't explicitly mention was that by adding this line, we shifted down everything following that line in the same source file, thus changing not only one function but potentially dozens of functions and type definitions. That's what I described as "non-local" effects (or rather "seemingly" non-local because shifting everything by a line is a legitimate, real change to everything that has been shifted, it's just easy to overlook).

"That's horrific!", you say, "We have to do something about it!" Well, I'm glad you think so too.

What can we do?

As stated above, the changes and the invalidation throughout the incr.comp. cache that they cause are legitimate. They are not false positives, since changing the source location of a function really changes (for example) the MIR of that function. So we cannot just be smarter about tracking spans or ignore them altogether. However, what we can do is refactoring the representation of HIR, MIR, etc, so that they don't actually contain spans anymore. The span information has to be somewhere, and we still have to be able to map various things in HIR, MIR, etc to spans, but spans can be split out into separate tables. As a consequence, HIR, MIR, and ScopeTrees will be represented in a way that is impervious to changes that don't affect their structure.

One way to achieve this (and the only way I know) is to introduce the concept of "abstract spans". An abstract span does not directly contain a source location, but it identifies a source location uniquely. For example, if we store all spans in a side table then the abstract span would be the key to this table. For this to bring any improvement over the current situation, an abstract span must be stable across changes that don't affect the structure of the thing containing it. E.g. shifting down a function by a line can change the thing the abstract span points to, but the value of the abstract span itself must not change. (This is simple to achieve by using a scheme that is similar to what we already do for HirId. Implementing it without increasing memory requirements is harder).

Implementation Strategies

There are a few prerequisites for the implementation:

  • Span information must be tracked by a different DepNode than Hir, HirBody, OptimizedMir, etc, which implies that it must not be directly accessible from any of the data covered by these DepNodes.
  • Span information must still be tracked, but in contrast to the current situation, we only want to depend on it when the information is actually used.
  • Abstract spans must be stable.

These goals can be achieved by:

  • Splitting out span information during HIR lowering and storing it separately. I imagine having one table per HirId::owner that then corresponds to one DepNode, making spans be tracked at the same granularity as HIR items.
  • Replacing Span fields with abstract spans in HIR, MIR, etc. This will mean quite a bit of refactoring everywhere these spans are used (as opposed to just being copied around)

Alternatively, this could also be achieved by:

  • Making the CodeMap inaccessible from queries and generating a map from Span value to DepNode during HIR lowering, thus effectively making the existing Span type abstract.
  • Providing a query that allows to decode a Span to its contents.
  • Making sure that none of the error reporting APIs take Spans directly.

I lean a bit towards alternative (1) but it's hard to gauge which one will lead to cleaner, more robust code in the end. Solution (1) would have a risk of false positives (too much invalidation), while solution (2) has the risk of false negatives (changes not detected) because existing APIs present tracking holes. Not detecting changes seems like the worse problem.

Regardless of the implementation, we will have to store additional tables in crate metadata that allow mapping from abstract spans to regular spans for upstream crates.

Abstract Span Representation

Ideally, an abstract span would not take up more space than one u32, which is how much space a Span takes up. One way to achieve this would be by making abstract spans be struct SpanId(ast::NodeId). Mapping from SpanId to Span would then involve mapping from NodeId to HirId, taking the HirId::owner to identify the correct side table and DepNode, and then the HirId::local_id as key into the side table. However, this only works for the current crate. In order for this to work across crates, we would either have to make SpanId also contain a CrateNum (thus doubling its size to 8 bytes), or implement a NodeId remapping scheme, similar to what we do for imported FileMaps and formerly already had for AST "inlining". With the latter in place we might be able to remove the HirId from some of the HIR structs again, which would help amortize its implementation effort.

NodeId-based abstract spans have the restriction of only being able to represent things that have a NodeId. However, that should be easily solved by assigning NodeIds to things that at the moment have a Span but no NodeId.

NodeId-based abstract spans have the advantage that HIR structs would not have to store a separate span field. The SpanId could be generated from the already available NodeId.

Abstract spans could be implemented completely separately from NodeId and HirId but there's probably little advantage to doing so while quite a bit of new infrastructure would have to be put into place.

Guarding against Regressions

After putting so much effort into using abstract spans, we'll want to avoid that vanilla Span values make their way into query results again. Luckily this should be easily achievable by adding an assertion to the HashStable implementation for Span that makes sure we don't encounter unexpected invocations.

Abstract Spans Level 2

The first goal would be to use abstract spans in everything up until and including MIR. An even more ambitious goal would be to also use abstract spans in cached LLVM IR and/or object files. That might allow us to skip re-optimizing code and just patch up source locations (if it's really just spans that have changed -- detecting that is another challenge).

Call for feedback

Since this will probably result in quite a few changes, I'd like to get some feedback before jumping into an implementation. Here are some guiding questions:

  • Did I explain the problem properly?
  • Do you know an alternative to span abstraction for solving the problem?
  • Which of the two implementation approaches would you choose?
  • Is there a better way of implementing abstract spans?

Any kind of feedback is welcome!

cc @rust-lang/compiler

@michaelwoerister michaelwoerister added the A-incr-comp Area: Incremental compilation label Jan 12, 2018
@estebank
Copy link
Contributor

Would it make sense to introduce the concept of scoped spans? Given

fn foo() {
    let y = "";
}
fn main() {
    let x = 4;
}

x would have a span that belongs within the main span, storing only offsets from the parent. This way, changes outside of a scope, or moving scopes around would require the inner spans to be modified. Moving foo after main would only require recalculating the spans for the methods, everything inside of them remains the same, similarly to how NodeId/HirIds work now.

@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 23, 2018
@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

@estebank IMO the first step would be to make spans file-local (including treating macro expansions as "files"), which would already get rid of a lot of the span-related churn.

For intra-file deltas, I'd rather adjust all the spans when decoding them (taking into account whether some things were accessing the line/column information from spans), but this scheme only works really good if you do get a delta from e.g. the IDE that is using the compiler.

@michaelwoerister
Copy link
Member Author

Note though that file-local spans are not enough to make a difference here. In fact, we are already hashing spans as (filename, line, col), so as far as incr. comp. is concerned they already are file-local. @estebank's version, on the other hand, would solve the problem.

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

@michaelwoerister I have my own agenda here, which is to drive incremental recompilation from incremental reparsing (with a GLL parser, hopefully, not a hand-written one), instead of HIR.
So there would be no "parent item" to scope things to, just an affected region (the delta), and everything outside of that can stay "unchanged", aside from debuginfo.

@michaelwoerister
Copy link
Member Author

I don't quite see how that would help here. Debuginfo (and the various datastructures that thread span info to it) is what this issue is about.

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

@michaelwoerister I might be getting confused. Is the proposal to bypass most of the transformations that would depend on the spans in question, by specifically keeping the Span representation the same across them, and only redo the e.g. codegen that actually requests the information pertaining to spans?
If so, why not add a query for getting filename/line/column (i.e. "source") information from spans?
When not using that query directly, Spans don't matter, and we can do whatever translation we need to, the same way we'd translate the DefId for a DefId-relative span.
This doesn't work as well with hashing, which would prefer a "persistent collection" scheme where we "allocate" new ranges instead of shifting the existing ones, but it seems doable that we could hash spans which are outside the known input delta, the same way they were before the delta.

EDIT: if we can provide enough operations to not require observing lo and hi values, we should probably go for the "persistent" route, where we have some stitched together ranges and some way to map them to the current input.

@michaelwoerister
Copy link
Member Author

Is the proposal to bypass most of the transformations that would depend on the spans in question, by specifically keeping the Span representation the same across them, and only redo the e.g. codegen that actually requests the information pertaining to spans?

Yes.

If so, why not add a query for getting filename/line/column (i.e. "source") information from spans?

This is basically what the approach about proposes, only that the above renames Span to SpanId to make it clearer that it's opaque and can only act as a lookup key. The caveat is that we would really have to make the span information inaccessible except for via the query.

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

Okay I found the approach I prefer in the original post:

Alternatively, this could also be achieved by:

  • Making the CodeMap inaccessible from queries and generating a map from Span value to DepNode during HIR lowering, thus effectively making the existing Span type abstract.
  • Providing a query that allows to decode a Span to its contents.
  • Making sure that none of the error reporting APIs take Spans directly.

However, I think we can do this with the current Span, without a DepNode hacking around it, just hiding the ability to interact with Spans behind some object (CodeMap?) that from some point in the compilation is owned by the query engine.

@michaelwoerister
Copy link
Member Author

However, I think we can do this with the current Span, without a DepNode hacking around it

How would you detect changed spans with a dependency edge to somewhere?

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

@michaelwoerister I'd expect all of the observable Span properties to be behind queries, but I may be missing some low-level detail of how red-green query incremental integration works.

@michaelwoerister
Copy link
Member Author

The span information is part of the input to the compilation process, so we need to always hash it (in order to detect changes) and then the query unpacking the span must add an edge to the DepNode representing that input. Otherwise, if you check if you need to re-execute the unpack_span query, you wouldn't know how to find out.

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

Don't we have queries that only take Ty as input, how do those work?

@michaelwoerister
Copy link
Member Author

What do you mean exactly?

@eddyb
Copy link
Member

eddyb commented Jul 13, 2018

Nevermind, I think I see, DepNodes are an internal representation of the fact that the query extracts "hidden information" from the ambient context, as opposed to being fully determinstic on the input query key. I guess we could get better at this over time, but currently it's how it works.

michaelwoerister added a commit to michaelwoerister/rust that referenced this issue Nov 27, 2018
bors added a commit that referenced this issue Nov 27, 2018
[do not merge] Benchmark ignoring span hashes during incr. comp. for getting a lower bound for #47389

I figured that ignoring changes to span values might give a useful lower bound on compile times with the optimizations described in #47389 applied. Keep in mind that any improvements shown with this PR could only be achieved if we could update object files in place, which would be the "stretch goal" for #47389.
michaelwoerister added a commit to michaelwoerister/rust that referenced this issue Nov 28, 2018
@michaelwoerister
Copy link
Member Author

In #56287 I did a preliminary benchmark that shows what kind of improvements can be expected at best from the optimization/refactoring described here:
https://perf.rust-lang.org/compare.html?start=b68fc18c45350e1cdcd83cecf0f12e294e55af56&end=54d813038fd30724af29a1c438cda175a67ee4dc

How to read the results: Since the benchmark was done by completely ignoring spans during incr. comp. hashing, the numbers show the best case scenario (and then some). This best case scenario could only be achieved by the Abstract Spans Level 2 optimization described above, i.e. finding a way to patch LLVM IR and machine code in place. Which is probably hard to do in a robust and efficient way.

The patched incremental cases of the *-check benchmarks, however, should give an indication of what implementing only the first part of the optimization could achieve (since these benchmarks don't do any of the LLVM passes). Here we see rather big improvements in some cases:

style-servo          -34.1%
tokio-webpush-simple -24.7%
clap-rs              -24.4%
regex                -17.7%
sentry-cli           -12.3%

As always with incr. comp., improvements depend entirely on the actual changeset, so take with a grain of salt. But, in my opinion, the numbers still show that there's quite a bit of potential in an optimization like this.

Also note: The baseline incremental benchmarks should be completely ignored. They just show the improvement of not hashing spans. Yet spans would still need to be hashed. The work would just move from one query to another.

@johnbcoughlin
Copy link

This is a great, well-written summary issue. Is this still considered the right path forward? I'm interested in taking a crack at it.

@michaelwoerister
Copy link
Member Author

Thanks for your interest, @johnbcoughlin. I think there is consensus that absolute source location information should be moved out of as many internal data structures as possible. But it's not clear how to exactly go about it. I know @eddyb has ideas and @matklad already goes about the problem in a smarter way in rust-analyzer. I personally don't have the bandwidth to drive the needed design effort at the moment, unfortunately.

@est31
Copy link
Member

est31 commented Apr 27, 2020

@michaelwoerister could you explain this smarter way that @matklad is going? How does rust-analyzer approach it?

@rpjohnst
Copy link
Contributor

@est31 rust-analyzer uses the same approach as C#'s Roslyn compiler- syntax tree nodes do not store positions (absolute or otherwise), but only their size. Positions are only reconstructed on-demand as part of walking the tree, by accumulating the sizes of previous nodes. This makes the actual data stored in the nodes position-invariant.

@est31
Copy link
Member

est31 commented Apr 27, 2020

Oh wow that's an incredibly smart solution. Thanks!

@est31
Copy link
Member

est31 commented May 1, 2020

Hmmm now I'm not so sure any more. Because this only works for spans embedded into the AST and while you are running a visitor on that AST that keeps record on the current position whether there is an error or not (you can't start keeping the record when you know you want yield the error, you need it from the very beginning, as you can't re-iterate over the AST without a way to reidentify the span). It also doesn't work when you store stuff in tables, e.g. you want to print "Defined here" things because those spans are embedded in tables instead of the AST and you get the spans by lookups instead of AST walks. Rust's elaborate errors contain a lot of those "third party" spans.

Maybe some ID system could help, where each span contains an unique id that AST walking code can recognize when encountering, but it can't be purely content based (say hash of content) because otherwise duplicating the same code say in different functions of the same file would yield the same spans. This happens more often than you think, after all how often do you have the same identifier occur again and again?

@cjgillot
Copy link
Contributor

I tried implementing both approaches in two draft PRs.

#72015 attempts to move the Span information out of the HIR tree, and access it by HirId through a dedicated query.
Since there are so many spans, everywhere, this implies
a modification of the entire code base, just to access spans differently,
with very few benefits until the very last Span is removed from the HIR.
The presence of spans inside Idents is an additional drag:
some HIR nodes have two spans (one for the node and one the Ident).
As a consequence, I doubt this is worth putting more work to it.

#84373 attempts to make Spans local to the enclosing HIR item.
This is done by marking each Span with its parent's LocalDefId.
At hashing/encoding/decoding time, marked spans are encoded relative to their parent's span's beginning.
A translation layer is added between the so-marked Span and the source code information.
This translation wraps the SourceMap and, for each manipulation of a span,
asserts the dependency to the parent's actual absolute span
using a new source_span(LocalDefId) query.
This solution is indeed more brittle, since span information is used in a large
variety of ways by error reporting code, and enumerating all of them is not obvious.

@Mark-Simulacrum
Copy link
Member

@cjgillot Given #84373 (and its stabilization in #115656) is there more that we expect to do here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants