Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor lint handling to be more "eager" #42511

Closed
nikomatsakis opened this issue Jun 7, 2017 · 14 comments · Fixed by #43522
Closed

refactor lint handling to be more "eager" #42511

nikomatsakis opened this issue Jun 7, 2017 · 14 comments · Fixed by #43522
Labels
A-incr-comp Area: Incremental compilation A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

Lint handling at present works like this:

  • During most of compilation, when we see something we might want to lint about, we insert this potential lint into a mutable table in the session. This comes along with a NodeId that identifies the AST/HIR node that the lint is attached to, along with the lint name, message, etc.
  • We then have a dedicated pass that walks over the HIR. At each point, it tracks which lints are "allowed", "denied", etc. At the same time, it extracts entries from this table and reports them. It also invokes "lint visitors" that may generate additional lints.
    • In particular, some lints are just a custom pass, but others are generated as a side-effect of other things (like type-checking). It just depends on when the information that is needed is most readily available.

This is not a very good setup. Among other things:

  • It requires us to generate a big table of "pending" lints, even in the case where a lint is #[allow]. Wasteful.
  • It means that lints all get displayed fairly late in compilation -- this limits their value, since often a lint is detected early which might be useful in solving a compilation error, but we can't report the lint.
  • It's not very incremental friendly because of the Big Mutable Table.

@eddyb proposed an alternative architecture that I think is better.

  • In HIR lowering, we would compute which lints are in scope at each point. We would store this information in the HIR in some way (some thoughts on this below).
  • When we detect (in some phase) a lintable scenario, we would check immediately whether the lint is "allow", "warn", etc, and -- if needed -- issue the appropriate diagnostic right then.
  • We can still run the custom lint code, it would simply query the HIR settings in the same way.

The best way to store this lint table is not 100% clear. Something sparse seems obviously like a good idea, since most nodes in the HIR do not have lint-related attributes on them (and hence just inherit the settings from their parent). I think maybe having each item store a link to the "current" settings for each lint makes sense; this can be easily shared between items. Then, within an item, if we need to compute the settings for some particular node, we would walk up the parents in the HIR map, looking for lint-related attributes and -- when we reach the item -- just using the settings stored on the item itself.

Alternatively, we could just cache nothing and compute everything this way, at least to start. If more caching is desired, maybe have passes keep a local cache that would then be populated within the things within a particular item.

I'm available to mentor this. It is a bit of a "design needed" sort of job, so I'm not sure what the immediate instructions would be. The first step is obviously familiarizing yourself with the current setup. After that, I would probably try to make a commit that adds a method to the HIR map that will compute the innermost applicable lint setting for a particular node (just by walking up the parents). We could then use this and remove all the existing lint code for tracking. That should work, but be slow (O(n^2) lookups). Presumably we can then adopt some sort of caching strategy.

@eddyb also mentioned that for bonus points, we should probably "decode" attributes during HIR lowering more generally, so that we are not walking the general attribute data structures, but rather something specific to the stuff the compiler cares about.

@nikomatsakis nikomatsakis added A-incr-comp Area: Incremental compilation E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 7, 2017
@nikomatsakis nikomatsakis added E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. and removed E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. labels Jun 7, 2017
@eddyb
Copy link
Member

eddyb commented Jun 7, 2017

Another problem with the existing deferred lints is I don't think there is a way to make the messages rich (complex diagnostics) while emitting the diagnostics on the spot would make that trivial.

@nikomatsakis
Copy link
Contributor Author

@eddyb I actually added support for rich lints, but yes, it was a pain, and it shouldn't be.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 8, 2017

Cc @Manishearth , @llogiq , @mcarton

We have a few lints which use a custom visitor because they need additional info. With these lints we've had problems with the attributes not working on the actual item which is linted. The way I understood this new system, such a check could become way simpler.

@mjkillough
Copy link
Contributor

@nikomatsakis - I'd like to have a go at this if you're still willing to mentor it. I've started familiarising myself with the existing code and I am preparing to have a go at the "step one" you suggest.

@mjkillough
Copy link
Contributor

I've added a function to the HIR map that naively calculates a LintLevels at a particular node by walking up its parents. I'm now trying to figure out how best to re-write some of the lints to use it.

If I understand the code correctly, there are four phases where we try to lint about things:

  1. Calls to Session::add_lint() before HIR lowering and before lint::context::EarlyContext runs. These get added to the table of pending lints and are emitted by EarlyContext as it walks the AST. ([1] [2] [3] [4])
  2. The EarlyContext pass which walks the AST just before HIR lowering. It emits any pending lints from the table and also runs these checks and immediately emits any appropriate lints: UnusedParens, UnusedImportBraces, AnonymousParameters, IllegalFloatLiteralPattern, DeprecatedAttr.
  • (The AST is then lowered to HIR).
  1. Calls to Session::add_lint() after HIR lowering. (I think there are ~32 such calls). These get added to the table of pending lints and are emitted by lint::context::LateContext as it walks the HIR.
  2. The LateContext pass which walks the HIR. It emits any pending lints from the table and runs the rest of the lint passes, immediately emitting any appropriate lints.
  • (The doc-string of lint::context::raw_emit_lint() says that it is made public so that lints may be emitted during trans, after LateContext has already run. I can't see anywhere that's actually done though).

Which would mean the plan in the issue description expands to:

  • Leave (1) and (2) alone. Continue to build-up a small table of pending lints with calls to Session::add_lint(). These continue to get emitted by lint::context::EarlyContext. Continue to determine the level of each lint by parsing the attrs as we walk the AST in EarlyContext. (As both of these happen during librustc_driver::driver::phase_2_configure_and_expand, will it affect incremental compilation?)
  • During HIR lowering, calculate the level of each lint at each node.
  • For (3), calls to Session::add_lint() after HIR lowering use the calculated lint levels to emit lints immediately.
  • For (4), LateContext stops trying to keep track of the lint levels as it walks the HIR and just uses the information on the HIR nodes that we calculated during lowering.

Is that still what you had in mind @nikomatsakis?

@eddyb
Copy link
Member

eddyb commented Jun 25, 2017

@mjkillough That seems fine, expect that walking up the HIR every time can be expensive - or did you mean that HIR lowering records the information for some nodes (items?) and the HIR maps walks up for anything in between that and the node it was queried about?

@mjkillough
Copy link
Contributor

@eddyb - Yeah, once we calculate the levels at each node during HIR lowering, we'd just walk up from the current node to whichever node had the calculated lint levels. As I haven't touched the HIR lowering part yet, I'm currently doing the easier thing and walking all the way up to the crate each time.

I don't think it'll just be items that we store the calculated lint levels on. Based on what the visitors in lint::context do, I think we'd need to store it on:Item, ForeignItem, TraitItem, ImplItem, StructField, Variant, Local and Expr. (We'd therefore need to walk up from whichever node we want to lint about to the enclosing one of these, which we can do quite easily).

I also haven't put much thought into how we'd store the calculated lint levels on the HIR nodes yet - I'd planned to do that once I had something working.

@eddyb
Copy link
Member

eddyb commented Jun 25, 2017

@mjkillough I'd do only Item for now and have a hybrid solution for retrieving lint nodes for a specific node - mostly because lint annotations are somewhat rare.

@mjkillough
Copy link
Contributor

@eddyb - OK, that makes sense. I'll have a go at doing that. Thanks!

@Mark-Simulacrum
Copy link
Member

Related to this, I believe: #14407 and #14408.

@nikomatsakis
Copy link
Contributor Author

@mjkillough any progress on this issue? I'm excited to see you were poking at it!

@mjkillough
Copy link
Contributor

@nikomatsakis - Ah, sorry for going quiet on this! I did start making progress, but ran out of free time.

I managed to get a protoype working, doing roughly what I described in my previous comment. It wasn't doing any smart caching - just walking up the HIR for every node to calculate lint levels. (It was tricky to see whether my changes broke anything, as they caused around 100 failures in the compile-fail tests because a bunch of lints are now more eager).

My next steps were going to be to rejig some of the structures in lint::context, so it was easier to cache something on the HIR Item nodes. I didn't get very far with that before I ran out of time.

I should be able to start looking at this again from the middle of next week. If someone needs to start making progress with this in the meantime, just let me know and I can push what I have.

@mjkillough
Copy link
Contributor

@alexcrichton is keen to pick this up, so I've pushed what I had here and wrote my planned next-steps in the commit message: master...mjkillough:eager_linting_prototype

It may not make sense to base changes off that branch, as it is quite rough around the edges. It'll at least give you an indication of the direction I was heading in. :)

@Mark-Simulacrum Mark-Simulacrum added the C-cleanup Category: PRs that clean code up or issues documenting cleanup. label Jul 27, 2017
@alexcrichton
Copy link
Member

I've opened a PR for this at #43522

bors added a commit that referenced this issue Aug 10, 2017
rustc: Rearchitect lints to be emitted more eagerly

In preparation for incremental compilation this commit refactors the lint
handling infrastructure in the compiler to be more "eager" and overall more
incremental-friendly. Many passes of the compiler can emit lints at various
points but before this commit all lints were buffered in a table to be emitted
at the very end of compilation. This commit changes these lints to be emitted
immediately during compilation using pre-calculated lint level-related data
structures.

Linting today is split into two phases, one set of "early" lints run on the
`syntax::ast` and a "late" set of lints run on the HIR. This commit moves the
"early" lints to running as late as possible in compilation, just before HIR
lowering. This notably means that we're catching resolve-related lints just
before HIR lowering. The early linting remains a pass very similar to how it was
before, maintaining context of the current lint level as it walks the tree.

Post-HIR, however, linting is structured as a method on the `TyCtxt` which
transitively executes a query to calculate lint levels. Each request to lint on
a `TyCtxt` will query the entire crate's 'lint level data structure' and then go
from there about whether the lint should be emitted or not.

The query depends on the entire HIR crate but should be very quick to calculate
(just a quick walk of the HIR) and the red-green system should notice that the
lint level data structure rarely changes, and should hopefully preserve
incrementality.

Overall this resulted in a pretty big change to the test suite now that lints
are emitted much earlier in compilation (on-demand vs only at the end). This in
turn necessitated the addition of many `#![allow(warnings)]` directives
throughout the compile-fail test suite and a number of updates to the UI test
suite.

Closes #42511
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants