-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tabs_in_doc_comments: Fix ICE due to char indexing #7039
Conversation
This is a quick-fix for an ICE in `tabs_in_doc_comments`. The problem was that we we're indexing into possibly multi-byte characters, such as '位'. More specifically `get_chunks_of_tabs` was returning indices into multi-byte characters. Those were passed on to a `Span` creation that then caused the ICE. This fix makes sure that we don't return indices that point inside a multi-byte character. *However*, we are still iterating over unicode codepoints, not grapheme clusters. So a seemingly single character like y̆ , which actually consists of two codepoints, will probably still cause incorrect spans in the output.
@@ -137,7 +136,7 @@ fn get_chunks_of_tabs(the_str: &str) -> Vec<(u32, u32)> { | |||
if is_active { | |||
spans.push(( | |||
current_start, | |||
u32::try_from(the_str.chars().count()).expect(line_length_way_to_long), | |||
u32::try_from(char_indices.last().unwrap().0 + 1).expect(line_length_way_to_long), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just
u32::try_from(char_indices.last().unwrap().0 + 1).expect(line_length_way_to_long), | |
u32::try_from(char_indices.len()).expect(line_length_way_to_long), |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. char_indicies is an iterator over tuples with (byte position, char)
and using the length of the iterator would produce the same result as the previous code. The suggestion would only work if all characters are 1-byte characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah can you add a comment to that line than, that the first part is the byte position and not the character index? Otherwise r=me
r? @flip1995 Since highfive didn't assign anyone. |
I don't think so, no. But doesn't this lint only produce spans for whitespaces? There shouldn't be many problems with wrong spans, I think. |
Let me know if you want a rebase! (my GPG setup is currently broken and I may just stop using it altogether, fyi) |
Since comments on discussions often get lost ^ #7039 (comment) |
@bors r=flip1995 |
📌 Commit cbdebd9 has been approved by |
☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test |
This is a quick-fix for an ICE in
tabs_in_doc_comments
. The problemwas that we we're indexing into possibly multi-byte characters, such as '位'.
More specifically
get_chunks_of_tabs
was returning indices intomulti-byte characters. Those were passed on to a
Span
creation thatthen caused the ICE.
This fix makes sure that we don't return indices that point inside a
multi-byte character. However, we are still iterating over unicode
codepoints, not grapheme clusters. So a seemingly single character like y̆ ,
which actually consists of two codepoints, will probably still cause
incorrect spans in the output. But I don't think we handle those cases
anywhere in Clippy currently?
Fixes #5835
changelog: Fix ICE in
tabs_in_doc_comments