refactor lint handling to be more "eager" #42511

nikomatsakis · 2017-06-07T19:32:27Z

Lint handling at present works like this:

During most of compilation, when we see something we might want to lint about, we insert this potential lint into a mutable table in the session. This comes along with a NodeId that identifies the AST/HIR node that the lint is attached to, along with the lint name, message, etc.
We then have a dedicated pass that walks over the HIR. At each point, it tracks which lints are "allowed", "denied", etc. At the same time, it extracts entries from this table and reports them. It also invokes "lint visitors" that may generate additional lints.
- In particular, some lints are just a custom pass, but others are generated as a side-effect of other things (like type-checking). It just depends on when the information that is needed is most readily available.

This is not a very good setup. Among other things:

It requires us to generate a big table of "pending" lints, even in the case where a lint is #[allow]. Wasteful.
It means that lints all get displayed fairly late in compilation -- this limits their value, since often a lint is detected early which might be useful in solving a compilation error, but we can't report the lint.
It's not very incremental friendly because of the Big Mutable Table.

@eddyb proposed an alternative architecture that I think is better.

In HIR lowering, we would compute which lints are in scope at each point. We would store this information in the HIR in some way (some thoughts on this below).
When we detect (in some phase) a lintable scenario, we would check immediately whether the lint is "allow", "warn", etc, and -- if needed -- issue the appropriate diagnostic right then.
We can still run the custom lint code, it would simply query the HIR settings in the same way.

The best way to store this lint table is not 100% clear. Something sparse seems obviously like a good idea, since most nodes in the HIR do not have lint-related attributes on them (and hence just inherit the settings from their parent). I think maybe having each item store a link to the "current" settings for each lint makes sense; this can be easily shared between items. Then, within an item, if we need to compute the settings for some particular node, we would walk up the parents in the HIR map, looking for lint-related attributes and -- when we reach the item -- just using the settings stored on the item itself.

Alternatively, we could just cache nothing and compute everything this way, at least to start. If more caching is desired, maybe have passes keep a local cache that would then be populated within the things within a particular item.

I'm available to mentor this. It is a bit of a "design needed" sort of job, so I'm not sure what the immediate instructions would be. The first step is obviously familiarizing yourself with the current setup. After that, I would probably try to make a commit that adds a method to the HIR map that will compute the innermost applicable lint setting for a particular node (just by walking up the parents). We could then use this and remove all the existing lint code for tracking. That should work, but be slow (O(n^2) lookups). Presumably we can then adopt some sort of caching strategy.

@eddyb also mentioned that for bonus points, we should probably "decode" attributes during HIR lowering more generally, so that we are not walking the general attribute data structures, but rather something specific to the stuff the compiler cares about.

The text was updated successfully, but these errors were encountered:

eddyb · 2017-06-07T19:40:47Z

Another problem with the existing deferred lints is I don't think there is a way to make the messages rich (complex diagnostics) while emitting the diagnostics on the spot would make that trivial.

nikomatsakis · 2017-06-07T19:50:14Z

@eddyb I actually added support for rich lints, but yes, it was a pain, and it shouldn't be.

oli-obk · 2017-06-08T10:35:09Z

Cc @Manishearth , @llogiq , @mcarton

We have a few lints which use a custom visitor because they need additional info. With these lints we've had problems with the attributes not working on the actual item which is linted. The way I understood this new system, such a check could become way simpler.

mjkillough · 2017-06-22T09:22:02Z

@nikomatsakis - I'd like to have a go at this if you're still willing to mentor it. I've started familiarising myself with the existing code and I am preparing to have a go at the "step one" you suggest.

mjkillough · 2017-06-25T10:33:12Z

I've added a function to the HIR map that naively calculates a LintLevels at a particular node by walking up its parents. I'm now trying to figure out how best to re-write some of the lints to use it.

If I understand the code correctly, there are four phases where we try to lint about things:

Calls to Session::add_lint() before HIR lowering and before lint::context::EarlyContext runs. These get added to the table of pending lints and are emitted by EarlyContext as it walks the AST. ([1] [2] [3] [4])
The EarlyContext pass which walks the AST just before HIR lowering. It emits any pending lints from the table and also runs these checks and immediately emits any appropriate lints: UnusedParens, UnusedImportBraces, AnonymousParameters, IllegalFloatLiteralPattern, DeprecatedAttr.

(The AST is then lowered to HIR).

Calls to Session::add_lint() after HIR lowering. (I think there are ~32 such calls). These get added to the table of pending lints and are emitted by lint::context::LateContext as it walks the HIR.
The LateContext pass which walks the HIR. It emits any pending lints from the table and runs the rest of the lint passes, immediately emitting any appropriate lints.

(The doc-string of lint::context::raw_emit_lint() says that it is made public so that lints may be emitted during trans, after LateContext has already run. I can't see anywhere that's actually done though).

Which would mean the plan in the issue description expands to:

Leave (1) and (2) alone. Continue to build-up a small table of pending lints with calls to Session::add_lint(). These continue to get emitted by lint::context::EarlyContext. Continue to determine the level of each lint by parsing the attrs as we walk the AST in EarlyContext. (As both of these happen during librustc_driver::driver::phase_2_configure_and_expand, will it affect incremental compilation?)
During HIR lowering, calculate the level of each lint at each node.
For (3), calls to Session::add_lint() after HIR lowering use the calculated lint levels to emit lints immediately.
For (4), LateContext stops trying to keep track of the lint levels as it walks the HIR and just uses the information on the HIR nodes that we calculated during lowering.

Is that still what you had in mind @nikomatsakis?

eddyb · 2017-06-25T10:41:13Z

@mjkillough That seems fine, expect that walking up the HIR every time can be expensive - or did you mean that HIR lowering records the information for some nodes (items?) and the HIR maps walks up for anything in between that and the node it was queried about?

mjkillough · 2017-06-25T11:29:12Z

@eddyb - Yeah, once we calculate the levels at each node during HIR lowering, we'd just walk up from the current node to whichever node had the calculated lint levels. As I haven't touched the HIR lowering part yet, I'm currently doing the easier thing and walking all the way up to the crate each time.

I don't think it'll just be items that we store the calculated lint levels on. Based on what the visitors in lint::context do, I think we'd need to store it on:Item, ForeignItem, TraitItem, ImplItem, StructField, Variant, Local and Expr. (We'd therefore need to walk up from whichever node we want to lint about to the enclosing one of these, which we can do quite easily).

I also haven't put much thought into how we'd store the calculated lint levels on the HIR nodes yet - I'd planned to do that once I had something working.

eddyb · 2017-06-25T11:42:37Z

@mjkillough I'd do only Item for now and have a hybrid solution for retrieving lint nodes for a specific node - mostly because lint annotations are somewhat rare.

mjkillough · 2017-06-25T11:55:16Z

@eddyb - OK, that makes sense. I'll have a go at doing that. Thanks!

Mark-Simulacrum · 2017-06-25T18:15:51Z

Related to this, I believe: #14407 and #14408.

nikomatsakis · 2017-07-23T18:59:05Z

@mjkillough any progress on this issue? I'm excited to see you were poking at it!

mjkillough · 2017-07-26T17:41:22Z

@nikomatsakis - Ah, sorry for going quiet on this! I did start making progress, but ran out of free time.

I managed to get a protoype working, doing roughly what I described in my previous comment. It wasn't doing any smart caching - just walking up the HIR for every node to calculate lint levels. (It was tricky to see whether my changes broke anything, as they caused around 100 failures in the compile-fail tests because a bunch of lints are now more eager).

My next steps were going to be to rejig some of the structures in lint::context, so it was easier to cache something on the HIR Item nodes. I didn't get very far with that before I ran out of time.

I should be able to start looking at this again from the middle of next week. If someone needs to start making progress with this in the meantime, just let me know and I can push what I have.

mjkillough · 2017-07-26T18:15:49Z

@alexcrichton is keen to pick this up, so I've pushed what I had here and wrote my planned next-steps in the commit message: master...mjkillough:eager_linting_prototype

It may not make sense to base changes off that branch, as it is quite rough around the edges. It'll at least give you an indication of the direction I was heading in. :)

alexcrichton · 2017-07-28T14:16:48Z

I've opened a PR for this at #43522

rustc: Rearchitect lints to be emitted more eagerly In preparation for incremental compilation this commit refactors the lint handling infrastructure in the compiler to be more "eager" and overall more incremental-friendly. Many passes of the compiler can emit lints at various points but before this commit all lints were buffered in a table to be emitted at the very end of compilation. This commit changes these lints to be emitted immediately during compilation using pre-calculated lint level-related data structures. Linting today is split into two phases, one set of "early" lints run on the `syntax::ast` and a "late" set of lints run on the HIR. This commit moves the "early" lints to running as late as possible in compilation, just before HIR lowering. This notably means that we're catching resolve-related lints just before HIR lowering. The early linting remains a pass very similar to how it was before, maintaining context of the current lint level as it walks the tree. Post-HIR, however, linting is structured as a method on the `TyCtxt` which transitively executes a query to calculate lint levels. Each request to lint on a `TyCtxt` will query the entire crate's 'lint level data structure' and then go from there about whether the lint should be emitted or not. The query depends on the entire HIR crate but should be very quick to calculate (just a quick walk of the HIR) and the red-green system should notice that the lint level data structure rarely changes, and should hopefully preserve incrementality. Overall this resulted in a pretty big change to the test suite now that lints are emitted much earlier in compilation (on-demand vs only at the end). This in turn necessitated the addition of many `#![allow(warnings)]` directives throughout the compile-fail test suite and a number of updates to the UI test suite. Closes #42511

nikomatsakis added A-incr-comp Area: Incremental compilation E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 7, 2017

aturon mentioned this issue Jun 7, 2017

Incremental compilation rust-lang/rust-roadmap-2017#4

Open

14 tasks

This was referenced Jun 8, 2017

migrate all "incremental tasks" to "on-demand queries" #40746

Closed

incr.comp.: Lints and other error messages are not included in the ICH #41184

Closed

sanxiyn added the A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. label Jun 19, 2017

sanxiyn mentioned this issue Jun 21, 2017

Skip LintPasses whose lints are all Allow #15161

Closed

Mark-Simulacrum added the C-cleanup Category: PRs that clean code up or issues documenting cleanup. label Jul 27, 2017

alexcrichton mentioned this issue Jul 28, 2017

rustc: Rearchitect lints to be emitted more eagerly #43522

Merged

bors closed this as completed in #43522 Aug 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor lint handling to be more "eager" #42511

refactor lint handling to be more "eager" #42511

nikomatsakis commented Jun 7, 2017

eddyb commented Jun 7, 2017

nikomatsakis commented Jun 7, 2017

oli-obk commented Jun 8, 2017

mjkillough commented Jun 22, 2017

mjkillough commented Jun 25, 2017

eddyb commented Jun 25, 2017

mjkillough commented Jun 25, 2017

eddyb commented Jun 25, 2017

mjkillough commented Jun 25, 2017

Mark-Simulacrum commented Jun 25, 2017

nikomatsakis commented Jul 23, 2017

mjkillough commented Jul 26, 2017

mjkillough commented Jul 26, 2017

alexcrichton commented Jul 28, 2017

refactor lint handling to be more "eager" #42511

refactor lint handling to be more "eager" #42511

Comments

nikomatsakis commented Jun 7, 2017

eddyb commented Jun 7, 2017

nikomatsakis commented Jun 7, 2017

oli-obk commented Jun 8, 2017

mjkillough commented Jun 22, 2017

mjkillough commented Jun 25, 2017

eddyb commented Jun 25, 2017

mjkillough commented Jun 25, 2017

eddyb commented Jun 25, 2017

mjkillough commented Jun 25, 2017

Mark-Simulacrum commented Jun 25, 2017

nikomatsakis commented Jul 23, 2017

mjkillough commented Jul 26, 2017

mjkillough commented Jul 26, 2017

alexcrichton commented Jul 28, 2017