-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use HasSpan
trait more extensively for more ergonomic API
#278
Conversation
0f8276b
to
95d660b
Compare
Warn, | ||
} | ||
|
||
pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling with code organization here. Since there is a single lint pass for the entire crate, then if I want to define unrelated lints in separate modules I can't write their code inside of a trait impl. Writing code inside of a trait impl is cool because I can leverage rust-analyzer to paste the stub trait impl and let me fill it in. With the free function I had to copy the signature from the trait's declaration manually. I wish we could improve this situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I already forgot about inventory
. I'm thinking what the right API should be though, it's hard to to come up with something both succinct but flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that clippy
uses a single lint pass per lint.
For example MultiAssignments
is defined in a separate file with its own lint pass (code).
And then that lint pass is registered manually (code).
That second part of manual registration can be done by inventory
though.
But let's not forget that lints should be configurable, like this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clippy usually has one lint pass per lint crate, but we also combine them sometimes. This can become messy (See clippy_lints::methods
as an extreme example), but allows lints to share common logic.
We could do something similar, where we have a lint pass trait implemented on several types. The main LintPass
of the crate would hold Vec<Box<&dyn LintPass>>
, to distribute the control flow inside the crate. The dyn LintPass
wouldn't have access to the 'ast
lifetime, but I think we want to discourage users from storing any 'ast
objects in the structs anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't clippy lints all in one crate? I've shown the example where they register one lint pass per lint module.
That extreme example is really extreme. It's hard to tell which logic they share in common, and wether it's even worth it. I suppose it's an attempt to save some performance on looking up method expressions in code. I.e. the difference between:
if let Expr::Method(method) = expr {
method_lint_1(method);
method_lint_2(method);
}
if let Expr::Method(method) = expr {
method_lint_1(method);
}
if let Expr::Method(method) = expr {
method_lint_2(method);
}
and the first one wins because it doesn't have a repeated check?
That makes sense, but I'm not sure such optimization is worth it, given the messy code organization, but I suppose we shouldn't have an API that disallows doing such context sharing.
Maybe there could be several variations of declare_lint!
macro. One for the simple case where the lint with the lint pass implementation is directly registered in inventory
, and the second one more advanced that registers only the lint in the inventory, but the implementation of the lint pass is expected to be used manually inside of other lint pass implementation and thus not registered.
fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> { | ||
match ty { | ||
ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()), | ||
_ => ty, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this function could be moved to the public API of TyKind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would definitely be a good function to have. peel_refs from rustc might be a good example as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool 🤙, I added the same peel_refs
method to ast::TyKind
and sem::TyKind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed that there is no information about the parentheses around the type in TyKind
s. For example both u32
, (u32)
, and ((u32))
are represented the same in the AST. This is in contrast to syn
, where it has a Paren
type kind that wraps the inner type in parens explicitly.
I added two test cases to print_ty.rs
to show that there is no info about that. Is this by design? I don't think this is critical, and in fact the Paren
type kind in syn
is often annoying and I have to write a similar strip_ty_group
function that removes the parentheses and an invisible Group
. It's really hard to understand what Group
is, but I suppose that's related to invisible groupping that is added by the compiler when a macro is invoked in type position. For example ¯o_call!()
is implicitly grouped when expanded like &(expanded_ty_here)
and the parentheses I used here aren't visible.
So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this by design?
Yes, such information is usually not represented in ASTs, as they are semantically the same. You'll see the same thing for binary operators. (1 + 2) * 3
the brackets are not included in the final AST. The tree structure preserves the execution order.
This is part of what the abstract from "Abstract Syntax Tree" is about. It tries to unify these syntactical differences into one uniform view, at the loss of information. If a user wants, they can still use the span, to see if there are parentheses, but that's not necessarily recommended.
Clippy has a few lints, which check for the syntax. But they're always a bit weird, as we use Clippy as a formatter with semantic information. clippy::semicolon_if_nothing_returned is an example, where we need to check the written syntax as well for some heuristics, if the semicolon should be suggested.
So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.
In Clippy we have lived happy without this information, and hacked around with spans, where needed. If we wanted to provide a way to completely reconstruct the original source code, we should include parenthesis. Rustfmt and syn, for instance, require this information. For Marker's case, I don't believe that this should be one of our goals. It would be nice to have a function to print the AST as it would be written (Similar how rustc has this for hir, with the "Show HIR" option on the playground). But adding parenthesis nodes, would add a lot of edge cases to linting, with very little benefit :)
if ident != "Span" { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find a way to resolve the semantic information of an ast::TyKind
. That's why I had to use a heuristic that compares the last segment of the path with Span
which is good enough, but definitely not ideal in cases when there are multiple crates with multiple types named Span
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jep, you're right. There should be an option to check this using resolve_ty_ids
, but there seems to be no way of getting the TyDefId
, woops 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with resolve_ty_ids
is that it takes a stringified path as a parameter. It also returns a list of matching types.
In this case we should have a method that takes an ast::TyKind
as a parameter and returns just a single sem::TyKind
. I suppose it shouldn't be ambiguous, and only a single type can be resolved given the context of the crate and module where the ast::TyKind
resides it should always resolve to a single type, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rustc has a util to convert AST types to semantic types, it can crash under some circumstances, which is why I'm cautious with it. I think the better option would be to have a resolve_item_ids
equivalent. The AST path of the ast::TyKind
should already provide the ItemId
it refers to. Right now, there is just no way to check if that's the one from our Span
struct
ast::TyKind
resides it should always resolve to a single type, right?
That's correct 👍 Everything in the AST can only have a single type, but a string path can have multiple IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, a few nits and comments, but that's it. It's nice to see another lint implementation. It's a good way to find issues and to prioritize them 👍
When adding a new custom lint to marker_lints I also moved it under a module not_using_has_span_trait and named the lint variable just LINT. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic in rustc_lint or some other internal rusrc crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.
This is something which I've been thinking about. Ideally, I would like to have the lint crate in front, instead of marker
. allow(marker_lints::XYZ)
makes more sense, and prevents collisions by default. But the chances of this happening, with the previous state of the discussion about tool name spaces is basically zero. We could add something after the marker
prefix, marker::<lint_crate>::<lint_name>
or so. Here, I fear that this is too verbose. 🤔 I'm also not sure if rustc would accept this correctly, but if this is something we want, we could probably convince the compiler team, to accept these changes :)
Edit: It seems like rustc accepts lint path with multiple segments, so that could be an option (Playground) 👍 This doesn't solve the issue, if multiple lints have the same name in one lint crate, but at least prevents collisions between different ones
For now, I like your solution a lot :D
marker_rustc_driver/src/main.rs
Outdated
feature(register_tool), | ||
register_tool(marker), | ||
// This lint applies only to public `marker_api` | ||
allow(marker::not_using_has_span_trait) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better, to have the lint allow-by-default and then add the warn(marker::not_using_has_span_trait)
to marker_api
instead. I also thought about adding a lint crate for internal marker lints. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 💡, I forgot about that
fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> { | ||
match ty { | ||
ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()), | ||
_ => ty, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would definitely be a good function to have. peel_refs from rustc might be a good example as well
if ident != "Span" { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jep, you're right. There should be an option to check this using resolve_ty_ids
, but there seems to be no way of getting the TyDefId
, woops 😅
Warn, | ||
} | ||
|
||
pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for the nice update and new lint ❤️
4f5b787
to
87fbdf5
Compare
I added `HasSpan` trait to all types that had a custom `span()` method not included in that trait. I also replace `&Span` with `impl HasSpan` in all public methods in `marker_api` to allow passing nodes to them directly. I even added a new lint to `marker_lints` that should catch the cases when `impl HasSpan` isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble in `marker_api` yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since rust-marker#26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.
87fbdf5
to
e47d5b8
Compare
This PR contains several breaking changes documented in the changelog.
I added
HasSpan
trait to all types that had a customspan()
method not included in that trait. I also replace&Span
withimpl HasSpan
in all public methods inmarker_api
to allow passing nodes to them directly.I even added a new lint to
marker_lints
that should catch the cases whenimpl HasSpan
isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble inmarker_api
yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since #26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.When adding a new custom lint to
marker_lints
I also moved it under a modulenot_using_has_span_trait
and named the lint variable justLINT
. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic inrustc_lint
or some other internalrusrc
crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.Now if there are name conflicts in lint names the error will look like this:
I had to use
process::exit
to implement this because it's not possible to return a result from the context where the lints are registered. I saw a comment that mentions a caveat that that code runs on a different thread, and thus we have no other choice, and I didn't try to fight with that. Yeah, it is a bit ugly, but it works for now.You can also see that the error is repeated multiple times. It's because
cargo
called the rustic driver multiple times in parallel and all of these calls returned the same error. Unfortunately cargo didn't duplicate the error messages. Not sure how cargo does this, bit may be researched later.