Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use HasSpan trait more extensively for more ergonomic API #278

Merged
merged 1 commit into from
Oct 8, 2023

Conversation

Veetaha
Copy link
Contributor

@Veetaha Veetaha commented Oct 7, 2023

This PR contains several breaking changes documented in the changelog.


I added HasSpan trait to all types that had a custom span() method not included in that trait. I also replace &Span with impl HasSpan in all public methods in marker_api to allow passing nodes to them directly.

I even added a new lint to marker_lints that should catch the cases when impl HasSpan isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble in marker_api yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since #26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.


When adding a new custom lint to marker_lints I also moved it under a module not_using_has_span_trait and named the lint variable just LINT. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic in rustc_lint or some other internal rusrc crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.

Now if there are name conflicts in lint names the error will look like this:

      Marker linting
    Checking marker_api v0.4.0-dev (/home/veetaha/dev/marker/marker_api)
   Compiling marker_rustc_driver v0.4.0-dev (/home/veetaha/dev/marker/marker_rustc_driver)
    Checking marker_error v0.4.0-dev (/home/veetaha/dev/marker/marker_error)
    Checking marker_uitest v0.4.0-dev (/home/veetaha/dev/marker/marker_uitest)
  × The lint `marker::lint` is defined multiple times:
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
  × The lint `marker::lint` is defined multiple times:
  │ - marker_lints::diag_msg_uppercase_start::LINT
  │ - marker_lints::not_using_has_span_trait::LINT
error: could not compile `marker_api` (lib)
warning: build failed, waiting for other jobs to finish...
error: could not compile `marker_rustc_driver` (build script)
error: could not compile `marker_error` (lib)
error: could not compile `marker_uitest` (lib)
  × linting finished with an error

I had to use process::exit to implement this because it's not possible to return a result from the context where the lints are registered. I saw a comment that mentions a caveat that that code runs on a different thread, and thus we have no other choice, and I didn't try to fight with that. Yeah, it is a bit ugly, but it works for now.

You can also see that the error is repeated multiple times. It's because cargo called the rustic driver multiple times in parallel and all of these calls returned the same error. Unfortunately cargo didn't duplicate the error messages. Not sure how cargo does this, bit may be researched later.

@Veetaha Veetaha force-pushed the feat/more-has-span branch 2 times, most recently from 0f8276b to 95d660b Compare October 7, 2023 15:16
Warn,
}

pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling with code organization here. Since there is a single lint pass for the entire crate, then if I want to define unrelated lints in separate modules I can't write their code inside of a trait impl. Writing code inside of a trait impl is cool because I can leverage rust-analyzer to paste the stub trait impl and let me fill it in. With the free function I had to copy the signature from the trait's declaration manually. I wish we could improve this situation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I already forgot about inventory. I'm thinking what the right API should be though, it's hard to to come up with something both succinct but flexible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that clippy uses a single lint pass per lint.
For example MultiAssignments is defined in a separate file with its own lint pass (code).

And then that lint pass is registered manually (code).

That second part of manual registration can be done by inventory though.

But let's not forget that lints should be configurable, like this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clippy usually has one lint pass per lint crate, but we also combine them sometimes. This can become messy (See clippy_lints::methods as an extreme example), but allows lints to share common logic.

We could do something similar, where we have a lint pass trait implemented on several types. The main LintPass of the crate would hold Vec<Box<&dyn LintPass>>, to distribute the control flow inside the crate. The dyn LintPass wouldn't have access to the 'ast lifetime, but I think we want to discourage users from storing any 'ast objects in the structs anyways

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't clippy lints all in one crate? I've shown the example where they register one lint pass per lint module.

That extreme example is really extreme. It's hard to tell which logic they share in common, and wether it's even worth it. I suppose it's an attempt to save some performance on looking up method expressions in code. I.e. the difference between:

if let Expr::Method(method) = expr {
	method_lint_1(method);
	method_lint_2(method);
}
if let Expr::Method(method) = expr {
	method_lint_1(method);
}

if let Expr::Method(method) = expr {
	method_lint_2(method);
}

and the first one wins because it doesn't have a repeated check?
That makes sense, but I'm not sure such optimization is worth it, given the messy code organization, but I suppose we shouldn't have an API that disallows doing such context sharing.

Maybe there could be several variations of declare_lint! macro. One for the simple case where the lint with the lint pass implementation is directly registered in inventory, and the second one more advanced that registers only the lint in the inventory, but the implementation of the lint pass is expected to be used manually inside of other lint pass implementation and thus not registered.

Comment on lines 39 to 43
fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> {
match ty {
ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()),
_ => ty,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this function could be moved to the public API of TyKind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would definitely be a good function to have. peel_refs from rustc might be a good example as well

Copy link
Contributor Author

@Veetaha Veetaha Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool 🤙, I added the same peel_refs method to ast::TyKind and sem::TyKind.

Copy link
Contributor Author

@Veetaha Veetaha Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that there is no information about the parentheses around the type in TyKinds. For example both u32, (u32), and ((u32)) are represented the same in the AST. This is in contrast to syn, where it has a Paren type kind that wraps the inner type in parens explicitly.

I added two test cases to print_ty.rs to show that there is no info about that. Is this by design? I don't think this is critical, and in fact the Paren type kind in syn is often annoying and I have to write a similar strip_ty_group function that removes the parentheses and an invisible Group. It's really hard to understand what Group is, but I suppose that's related to invisible groupping that is added by the compiler when a macro is invoked in type position. For example &macro_call!() is implicitly grouped when expanded like &(expanded_ty_here) and the parentheses I used here aren't visible.

So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this by design?

Yes, such information is usually not represented in ASTs, as they are semantically the same. You'll see the same thing for binary operators. (1 + 2) * 3 the brackets are not included in the final AST. The tree structure preserves the execution order.

This is part of what the abstract from "Abstract Syntax Tree" is about. It tries to unify these syntactical differences into one uniform view, at the loss of information. If a user wants, they can still use the span, to see if there are parentheses, but that's not necessarily recommended.

Clippy has a few lints, which check for the syntax. But they're always a bit weird, as we use Clippy as a formatter with semantic information. clippy::semicolon_if_nothing_returned is an example, where we need to check the written syntax as well for some heuristics, if the semicolon should be suggested.

So I'm not really sure marker needs to expose any parens info at all. Stripping them automatically simplifies life, but maybe there could be lints that want that info? I'm not sure.

In Clippy we have lived happy without this information, and hacked around with spans, where needed. If we wanted to provide a way to completely reconstruct the original source code, we should include parenthesis. Rustfmt and syn, for instance, require this information. For Marker's case, I don't believe that this should be one of our goals. It would be nice to have a function to print the AST as it would be written (Similar how rustc has this for hir, with the "Show HIR" option on the playground). But adding parenthesis nodes, would add a lot of edge cases to linting, with very little benefit :)

Comment on lines +26 to +31
if ident != "Span" {
continue;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a way to resolve the semantic information of an ast::TyKind. That's why I had to use a heuristic that compares the last segment of the path with Span which is good enough, but definitely not ideal in cases when there are multiple crates with multiple types named Span.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jep, you're right. There should be an option to check this using resolve_ty_ids, but there seems to be no way of getting the TyDefId, woops 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with resolve_ty_ids is that it takes a stringified path as a parameter. It also returns a list of matching types.
In this case we should have a method that takes an ast::TyKind as a parameter and returns just a single sem::TyKind. I suppose it shouldn't be ambiguous, and only a single type can be resolved given the context of the crate and module where the ast::TyKind resides it should always resolve to a single type, right?

Copy link
Member

@xFrednet xFrednet Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rustc has a util to convert AST types to semantic types, it can crash under some circumstances, which is why I'm cautious with it. I think the better option would be to have a resolve_item_ids equivalent. The AST path of the ast::TyKind should already provide the ItemId it refers to. Right now, there is just no way to check if that's the one from our Span struct

ast::TyKind resides it should always resolve to a single type, right?

That's correct 👍 Everything in the AST can only have a single type, but a string path can have multiple IDs.

@Veetaha Veetaha requested a review from xFrednet October 7, 2023 15:31
Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, a few nits and comments, but that's it. It's nice to see another lint implementation. It's a good way to find issues and to prioritize them 👍

When adding a new custom lint to marker_lints I also moved it under a module not_using_has_span_trait and named the lint variable just LINT. I did the same with the existing lint for uppercase letter in diagnostic message, and found out that it generated a name conflict. The error I saw due to this came from the internal panic in rustc_lint or some other internal rusrc crate. It was so verbose, it included a backtrace, and looked so scary that I improved the error handling due the lint crates initialization.

This is something which I've been thinking about. Ideally, I would like to have the lint crate in front, instead of marker. allow(marker_lints::XYZ) makes more sense, and prevents collisions by default. But the chances of this happening, with the previous state of the discussion about tool name spaces is basically zero. We could add something after the marker prefix, marker::<lint_crate>::<lint_name> or so. Here, I fear that this is too verbose. 🤔 I'm also not sure if rustc would accept this correctly, but if this is something we want, we could probably convince the compiler team, to accept these changes :)

Edit: It seems like rustc accepts lint path with multiple segments, so that could be an option (Playground) 👍 This doesn't solve the issue, if multiple lints have the same name in one lint crate, but at least prevents collisions between different ones


For now, I like your solution a lot :D

feature(register_tool),
register_tool(marker),
// This lint applies only to public `marker_api`
allow(marker::not_using_has_span_trait)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better, to have the lint allow-by-default and then add the warn(marker::not_using_has_span_trait) to marker_api instead. I also thought about adding a lint crate for internal marker lints. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 💡, I forgot about that

marker_lints/src/not_using_has_span_trait.rs Show resolved Hide resolved
Comment on lines 39 to 43
fn strip_references(ty: ast::TyKind<'_>) -> ast::TyKind<'_> {
match ty {
ast::TyKind::Ref(ref_ty) => strip_references(ref_ty.inner_ty()),
_ => ty,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would definitely be a good function to have. peel_refs from rustc might be a good example as well

Comment on lines +26 to +31
if ident != "Span" {
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jep, you're right. There should be an option to check this using resolve_ty_ids, but there seems to be no way of getting the TyDefId, woops 😅

Warn,
}

pub(crate) fn check_item<'ast>(cx: &'ast MarkerContext<'ast>, item: ItemKind<'ast>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm currently experimenting a bit in https://github.com/rust-marker/marker-example-lints. Maybe having something like the inventory crate, that you talked about previously could be a good idea 🤔

marker_api/src/lib.rs Outdated Show resolved Hide resolved
marker_api/src/ast/ty.rs Outdated Show resolved Hide resolved
@Veetaha Veetaha requested a review from xFrednet October 8, 2023 13:54
Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for the nice update and new lint ❤️

@Veetaha Veetaha force-pushed the feat/more-has-span branch from 4f5b787 to 87fbdf5 Compare October 8, 2023 16:01
@Veetaha Veetaha enabled auto-merge October 8, 2023 16:01
I added `HasSpan` trait to all types that had a custom `span()` method not included in that trait. I also replace `&Span` with `impl HasSpan` in all public methods in `marker_api` to allow passing nodes to them directly.

I even added a new lint to `marker_lints` that should catch the cases when `impl HasSpan` isn't used in a public method, but only after I implemented it I noticed that item visibility information isn't availble in `marker_api` yet. That's a bummer, because this custom lint doesn't make sense for non-pub methods and emits a lot of false positivies due to this fact. Anyway, since rust-marker#26 has to be eventually implemented I left the new custom lint in code, and added ignores at crate-level for it.
@Veetaha Veetaha force-pushed the feat/more-has-span branch from 87fbdf5 to e47d5b8 Compare October 8, 2023 16:28
@Veetaha Veetaha added this pull request to the merge queue Oct 8, 2023
Merged via the queue into rust-marker:master with commit 36ecc5b Oct 8, 2023
@Veetaha Veetaha deleted the feat/more-has-span branch October 8, 2023 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants