Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement uninit_vec lint #7682

Merged
merged 11 commits into from
Oct 12, 2021
Merged

Implement uninit_vec lint #7682

merged 11 commits into from
Oct 12, 2021

Conversation

Qwaz
Copy link
Contributor

@Qwaz Qwaz commented Sep 17, 2021

changelog: add the new lint [uninit_vec]

Fix #7681

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @xFrednet (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Sep 17, 2021
Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to the project! I've done a first review of the changes. The general logic is already checking for the correct conditions. 👍 My comments are mostly related to improvements of this logic and refactorings to catch more cases 🙃

Let me know if anything is unclear, or you need help with anything. 🙃


Note: You can also assign yourself to the linked issue by commenting @rustbot claim on it. That's just to make sure that no one else also starts to work on the issue. :)

clippy_lints/src/uninit_vec.rs Show resolved Hide resolved

impl<'tcx> LateLintPass<'tcx> for UninitVec {
fn check_block(&mut self, cx: &LateContext<'tcx>, block: &'tcx Block<'_>) {
for w in block.stmts.windows(2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only catches cases where Vec::with_capacity(..); is directly followed by vec.set_len(..);. This will miss cases where the user has some additional statements in between. In Clippy we try to handle such lints by first finding the value initialization (in this case let vec = Vec::with_capacity(..);) and then inspecting how the value is used in the rest of the block.

The initialization would be found by checking each local with check_local, similar to how you're current doing it. We then have two options to check for the usage of the value:

  1. A Visitor can be used to check each expression inside a given block. Here you could get the containing block of the value declaration and walk each expression. You could then lint, if the first usage of the value is actually a call to vec.set_len(..);
  2. The second option is to store the ID of the value in Self and then check for the usage in this lint pass, which is essentially also just a visitor.
    A great and hopefully not too confusing example is the vec_init_then_push lint. You can find the code in clippy_lints/src/vec_init_then_push.rs

Option 2 would be a bit better performance wise, but can be a bit more complex. You can also try to implement this using a Visitor. Let me know if you want to also have some example code for this 🙃

Copy link
Contributor Author

@Qwaz Qwaz Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only catches cases where Vec::with_capacity(..); is directly followed by vec.set_len(..);

This was an intentional decision to reduce false positives. The statements between them might be a legit initialization:

self.vec.reserve(1000);
self.init_vec(); // correctly initializes `self.vec`, e.g., by using `mem::write()`
self.vec.set_len(1000); // this call is now correct

For statistics, checking directly adjacent statements are enough to detect most of incorrect usage of set_len(). We have a list of 30 soundness bugs that fall into this category, and 28 of them can be caught with the current implementation. The other two patterns that are not detected are:

let mut a = Vec::with_capacity(len);
let mut b = Vec::with_capacity(len);
unsafe {
    a.set_len(len);
    b.set_len(len);
}

and

let mut vec = Vec::with_capacity(len);
unsafe {
    vec.set_len(len);
    another_unsafe_fn(&mut vec);
}

I think I can slightly modify the block handling logic to detect the second case. Then, it would detect 29 out of 30 cases. Would that be good enough, or would you still prefer me to try detecting the first case?

My concern is the implementation complexity. (1) The block handling logic would become complicated because with_capacity()/reserve() and set_len() might not be in the same block due to unsafe blocks. (2) The visitor needs to detect whether the target Vec is saved to a local variable or to the field of a struct. Only the first case would be safe to detect with the visitor due to the false positives mentioned above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, from the example, I see what you mean. Cases where the Vec is stored inside a struct or tuple would not be caught, that's correct. Most lints related to sequential usage only work for locals, as far as I know. Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?

(1) The block handling logic would become complicated because with_capacity()/reserve() and set_len() might not be in the same block due to unsafe blocks.

The Visitor or LintPass both also visit nested blocks. They would therefore also work with unsafe blocks or more complicated logic.

(2) The visitor needs to detect whether the target Vec is saved to a local variable or to the field of a struct. Only the first case would be safe to detect with the visitor due to the false positives mentioned above.

Hmm, I think the best way to handle this would be to track the local that the vector is stored in and check for the first usage. This would mean the vector itself if it's a local value or the struct that the vector is stored in. This shouldn't add too much additional complexity (I believe...).

However, I'm not totally sure on this. @flip1995 could you maybe take a look at this? 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?

They were roughly 15% of the bugs.

The Visitor or LintPass both also visit nested blocks. They would therefore also work with unsafe blocks or more complicated logic.

Hmm, I think the best way to handle this would be to track the local that the vector is stored in and check for the first usage.

Oh, now I see what you were suggesting by "The second option is to store the ID of the value in Self". I was initially confused by vec_init_then_push example because it handles neither nested blocks nor in-between statements. Is it correct that you were providing it as an example of how to use a lint pass as a visitor while maintaining the state in self, not necessarily as an example for implementing the visitor logic itself?

I understand what I need to do in that case. However, I think I might still prefer the current version over the version that checks in-between statements due to the simplicity and possible false positives. Would you mind if I open a separate PR for the visitor version that checks in-between statements? Then maybe we can compare the two versions and merge the one that looks better. I'll start working on it after addressing the comments about the lint messages below.

In addition, could you recommend a lint example that handles in-between statements? I was thinking of using HashMap that maps the HirId of Local to its tracked status, and I was wondering if there is something that I can reuse from the existing lints. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how many of the mentioned cases, used the vector as a part of a struct or tuple?

They were roughly 15% of the bugs.

In that case, we should definitely take care to also cover those cases. We can later use our lintcheck tool to run Clippy on them to make sure that they are all caught by this lint 🙃

Oh, now I see what you were suggesting by "The second option is to store the ID of the value in Self". I was initially confused by vec_init_then_push example because it handles neither nested blocks nor in-between statements. Is it correct that you were providing it as an example of how to use a lint pass as a visitor while maintaining the state in self, not necessarily as an example for implementing the visitor logic itself?

Correct, sorry if the wording was a bit confusing. Clippy uses the internal Visitor API from rustc. I found two examples that both check for the usage of a value (example 1, example 2). Note that both use path_to_local_id as they are only tracking local values. You'll need to make sure that the tracked value is set to the base where the vector is stored. This will most likely require some additional logic to handle Field expressions.

Would you mind if I open a separate PR for the visitor version that checks in-between statements?

I wouldn't mind at all, that would be a nice way to compare the two versions. If you do, you can request me as a reviewer by including r? @xFrednet in your PR description 🙃

In addition, could you recommend a lint example that handles in-between statements? I was thinking of using HashMap that maps the HirId of Local to its tracked status, and I was wondering if there is something that I can reuse from the existing lints. Thanks!

If we handle in-between statements, we do so in the described fashion with either a Visitor in example 1 and example 2 or a bit more advances by using the LintPass directly, as can be seen in the vec_init_then_push example. In case of Visitors, we usually use one per local value that could potentially trigger the lint.

I'm not totally sure where you want to use a HashMap. Could you maybe describe what you were thinking of? 🙃

Copy link
Contributor Author

@Qwaz Qwaz Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use a separate visitor struct, we can create a visitor for each local variable (e.g., the use of id field in needless_collect lint) so HashMap wouldn't be needed in this case. However, it would have a quadratic performance in terms of # locals/block size. This seems fine since a visitor will be only created for each with_capacity()/reserve(). I would probably implement another PR in this way.

If we use LintPass itself as a visitor and handle all variables in a single loop, it would need to manage a state for each variable and might need a data structure like HashMap.

Copy link
Member

@xFrednet xFrednet Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It would be interesting to do some profiling to compare a Visitor implementation to a LintPass implementation. This lint can choose either way, due to the fact that the actual logic for each expression is relatively simple. 🙃

I would recommend that you choose the method that seems more readable to you 🙃

clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
clippy_utils/src/lib.rs Outdated Show resolved Hide resolved
clippy_utils/src/lib.rs Outdated Show resolved Hide resolved
clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
tests/ui/uninit_vec.rs Outdated Show resolved Hide resolved
@Qwaz
Copy link
Contributor Author

Qwaz commented Sep 20, 2021

Hmm, the CI failure seems like a false positive in the internal lint. I can't collapse span_lint_and_then() to span_lint_and_help() in this case because span_lint_and_then() takes S: Into<MultiSpan> while span_lint_and_help() only takes Span...

@xFrednet
Copy link
Member

xFrednet commented Sep 21, 2021

True, that seems to be a false positive, you can simply allow the internal lint, with an #[allow] attribute. That's the first time I've heard of this FP 😅

@Qwaz
Copy link
Contributor Author

Qwaz commented Sep 21, 2021

I handled the PR comments, and this one ready to be reviewed again. I'll create another PR for an alternative implementation of this lint that checks in-between statements as discussed in the comment.

@xFrednet
Copy link
Member

Thank you for the update, I'll review it over the week. It's also awesome that you've already created an issue for the FP of our internal lint 👍

Small note regarding your last commit message: we usually try to avoid linking issues in commit messages, as that'll add a new reference when you would need to rebase your branch. And it'll also reference a Rust issue once we sync commits between this repo and the compiler repo. You can leave it in, that's just a note for future commits 🙃

clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
tests/ui/uninit_vec.rs Show resolved Hide resolved
clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Sep 30, 2021

☔ The latest upstream changes (presumably #7673) made this pull request unmergeable. Please resolve the merge conflicts.

@xFrednet xFrednet added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Oct 1, 2021
@flip1995
Copy link
Member

flip1995 commented Oct 1, 2021

I thought a bit about the Visitor/LintPass vs window discussion. I like the Visitor version more. The example for the FP given doesn't seem very realistic to me: why would you want to define the capacity and set the length outside of the initialization function? This just seems like a really niche corner case (we can test this hypothesis with out lintcheck tool).

You can prevent FP by checking if the assignee is used in between the reserve and set_len. For Locals you can just do this by LocalId. I don't know how best to do this for fields, or even more complicated for methods returning &mut Vec<_>.

So even though I like the Visitor version more, I'd be fine with the window version to get merged for now with a FIXME in the code and documenting the limitations in the Known Problems section.

@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 3, 2021

The example for the FP given doesn't seem very realistic to me: why would you want to define the capacity and set the length outside of the initialization function?

There are some use cases like recursive parsing, where the outer function allocates a buffer for the inner function and the inner function initializes it. I think preventing such FP is necessary when we take the visitor-based implementation (especially because the suggested category is "correctness").

I don't know how best to do this for fields, or even more complicated for methods returning &mut Vec<_>.

I was thinking about using the originating place of projections and being conservative. For instance, self (which has LocalId) would be originating place for self.vec, and any access to self between self.vec.reserve() and self.vec.set_len() would suppress the lint.

I'd be fine with the window version to get merged for now with a FIXME in the code and documenting the limitations in the Known Problems section.

Merging this PR first and working on the visitor version based on it might be better than having two separate PRs for me as well. How about that @xFrednet?

@xFrednet
Copy link
Member

xFrednet commented Oct 3, 2021

I don't know how best to do this for fields, or even more complicated for methods returning &mut Vec<_>.

I was thinking about using the originating place of projections and being conservative. For instance, self (which has LocalId) would be originating place for self.vec, and any access to self between self.vec.reserve() and self.vec.set_len() would suppress the lint.

This conservative approach sounds good to me and was also the one I was thinking of 👍

Merging this PR first and working on the visitor version based on it might be better than having two separate PRs for me as well. How about that @xFrednet?

Sure, that sounds good to me 👍. I want to take a final look before merging it. I should be able to review it on Monday 🙃

So even though I like the Visitor version more, I'd be fine with the window version to get merged for now with a FIXME in the code and documenting the limitations in the Known Problems section.

Could you add this documentation to the code as a preparation? 🙃


@rustbot label -S-waiting-on-author +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Oct 3, 2021
@flip1995
Copy link
Member

flip1995 commented Oct 4, 2021

SGTM 👍

Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some smaller NITs that should be easy to fix. Please also add a note to the docs that the lint currently only supports statements that are next to each other. Then it's ready to be merged IMO. Thank you for the contributions 🙃

clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
clippy_lints/src/uninit_vec.rs Outdated Show resolved Hide resolved
clippy_utils/src/paths.rs Outdated Show resolved Hide resolved
clippy_utils/src/lib.rs Outdated Show resolved Hide resolved
tests/ui/uninit_vec.rs Show resolved Hide resolved
@xFrednet xFrednet added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Oct 4, 2021
@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 5, 2021

Addressed PR comments, need feedback on #7682 (comment)

@bors
Copy link
Contributor

bors commented Oct 6, 2021

☔ The latest upstream changes (presumably #7773) made this pull request unmergeable. Please resolve the merge conflicts.

@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 7, 2021

Added testcases for new() and default()

@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 9, 2021

Forgot to address the merge conflict 😅

Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, there is one last NIT, and then I think it's ready to be merged! 👍

clippy_lints/src/uninit_vec.rs Show resolved Hide resolved
@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 12, 2021

Description updated!

Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me, thank you very much for all the work you put into this lint! I really appreciate it 🙃

@xFrednet
Copy link
Member

@bors r+

🎉

@bors
Copy link
Contributor

bors commented Oct 12, 2021

📌 Commit 4ed3a4f has been approved by xFrednet

@bors
Copy link
Contributor

bors commented Oct 12, 2021

⌛ Testing commit 4ed3a4f with merge 3d9c4a6...

@bors
Copy link
Contributor

bors commented Oct 12, 2021

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: xFrednet
Pushing 3d9c4a6 to master...

@bors bors merged commit 3d9c4a6 into rust-lang:master Oct 12, 2021
@Qwaz
Copy link
Contributor Author

Qwaz commented Oct 12, 2021

Thank you as well for the helpful and detailed reviews :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Lint: Detect Uninitialized Vec
6 participants