tabs_in_doc_comments: Fix ICE due to char indexing #7039

phansch · 2021-04-06T05:16:26Z

This is a quick-fix for an ICE in tabs_in_doc_comments. The problem
was that we were indexing into possibly multi-byte characters, such as '位'.

More specifically get_chunks_of_tabs was returning indices into
multi-byte characters. Those were passed on to a Span creation that
then caused the ICE.
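To make the mismatch concrete, here is a minimal sketch (not the lint's actual code) that compares the position of a tab counted in `char`s with its position counted in bytes. The helper names `tab_char_position` and `tab_byte_position` are made up for this illustration.

```rust
/// Position of the first tab, counted in `char`s (Unicode scalar values).
/// Hypothetical helper for illustration only.
fn tab_char_position(line: &str) -> Option<usize> {
    line.chars().position(|c| c == '\t')
}

/// Position of the first tab, counted in bytes. Byte offsets are what
/// string slicing (and byte-based span arithmetic) expects.
/// Hypothetical helper for illustration only.
fn tab_byte_position(line: &str) -> Option<usize> {
    line.char_indices()
        .find(|&(_, c)| c == '\t')
        .map(|(byte_pos, _)| byte_pos)
}
```

For the line `"位\tx"` the tab is the second `char` (position 1) but starts at byte 3, because '位' takes three bytes in UTF-8. Byte offset 1 is not a char boundary, so `line.is_char_boundary(1)` is `false` and `line.get(1..)` returns `None`, while `line.get(3..)` yields `Some("\tx")`.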

This fix makes sure that we don't return indices that point inside a
multi-byte character. However, we are still iterating over Unicode
codepoints, not grapheme clusters. So a seemingly single character like y̆,
which actually consists of two codepoints, will probably still cause
incorrect spans in the output. But I don't think we handle those cases
anywhere in Clippy currently?
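As a small illustration of the codepoint vs. grapheme cluster caveat, the following sketch uses only the standard library; the helper names are hypothetical. It writes y̆ with explicit escapes (`'y'` followed by the combining breve U+0306) so the example does not depend on how an editor normalizes the text.

```rust
/// Returns (length in bytes, number of `char`s) for a string.
/// Hypothetical helper for illustration only.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns the byte offset at which each `char` starts.
/// Hypothetical helper for illustration only.
fn char_byte_offsets(s: &str) -> Vec<usize> {
    s.char_indices().map(|(byte_pos, _)| byte_pos).collect()
}
```

For `"y\u{306}"` this gives 3 bytes and 2 `char`s, starting at byte offsets 0 and 1. Offset 1 is a valid char boundary, so a span ending there would no longer cause a panic, but it would still split what a reader sees as one character.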

Fixes #5835

changelog: Fix ICE in tabs_in_doc_comments


clippy_lints/src/tabs_in_doc_comments.rs

flip1995 · 2021-04-06T15:40:10Z

clippy_lints/src/tabs_in_doc_comments.rs

@@ -137,7 +136,7 @@ fn get_chunks_of_tabs(the_str: &str) -> Vec<(u32, u32)> {
    if is_active {
        spans.push((
            current_start,
-            u32::try_from(the_str.chars().count()).expect(line_length_way_to_long),
+            u32::try_from(char_indices.last().unwrap().0 + 1).expect(line_length_way_to_long),


Isn't this just

Suggested change

u32::try_from(char_indices.last().unwrap().0 + 1).expect(line_length_way_to_long),

u32::try_from(char_indices.len()).expect(line_length_way_to_long),

?

I don't think so. char_indices is an iterator over (byte position, char) tuples, so using its length would produce the same result as the previous code. The suggestion would only work if all characters were 1-byte characters.
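A rough sketch of the difference, under the assumption that the goal is half-open byte ranges for each run of tabs. `tab_chunks_as_byte_ranges` is a hypothetical stand-in written for this comment, not the `get_chunks_of_tabs` implementation from the PR.

```rust
/// Hypothetical helper, not the actual Clippy code: returns half-open
/// `(start, end)` byte ranges for each run of consecutive tabs in `line`.
fn tab_chunks_as_byte_ranges(line: &str) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    // Byte offset where the current run of tabs started, if we are in one.
    let mut current_start: Option<usize> = None;

    for (byte_pos, c) in line.char_indices() {
        match (c, current_start) {
            // A tab outside of a run starts a new run at this byte offset.
            ('\t', None) => current_start = Some(byte_pos),
            // A tab inside a run just extends it.
            ('\t', Some(_)) => {}
            // Any other char ends the run at its own byte offset.
            (_, Some(start)) => {
                ranges.push((start, byte_pos));
                current_start = None;
            }
            (_, None) => {}
        }
    }

    // A run that reaches the end of the line ends at the byte length,
    // not at the number of chars.
    if let Some(start) = current_start {
        ranges.push((start, line.len()));
    }
    ranges
}
```

For `"位\t\tx\t"` the ranges are `(3, 5)` and `(6, 7)`. The line has 5 `char`s but 7 bytes, so ending the last range at the char count would give the wrong offset. Because the final char here is a one-byte tab, the byte position of the last char plus one equals `line.len()`.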

Ah, can you add a comment to that line then, saying that the first part is the byte position and not the character index? Otherwise r=me

tests/ui/crashes/ice-5835.rs

flip1995 · 2021-04-06T15:43:42Z

r? @flip1995

Since highfive didn't assign anyone.

flip1995 · 2021-04-06T15:45:37Z

But I don't think we handle those cases anywhere in Clippy currently?

I don't think so, no. But doesn't this lint only produce spans for whitespace? There shouldn't be many problems with wrong spans, I think.

phansch · 2021-04-10T13:33:01Z

Let me know if you want a rebase!

(my GPG setup is currently broken and I may just stop using it altogether, fyi)

flip1995 · 2021-04-12T09:43:57Z

Ah, can you add a comment to that line then, saying that the first part is the byte position and not the character index? Otherwise r=me

Since comments on discussions often get lost ^ #7039 (comment)

phansch · 2021-04-14T04:44:56Z

Since comments on discussions often get lost
Yep, totally missed that 😅

@bors r=flip1995

bors · 2021-04-14T04:44:57Z

📌 Commit cbdebd9 has been approved by flip1995

bors · 2021-04-14T04:53:28Z

⌛ Testing commit cbdebd9 with merge 24921df...

bors · 2021-04-14T05:06:39Z

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: flip1995
Pushing 24921df to master...

phansch force-pushed the melt-ice branch from 136073e to 1573d10 Compare April 6, 2021 05:21

flip1995 reviewed Apr 6, 2021

rust-highfive assigned flip1995 Apr 6, 2021

phansch added 3 commits April 10, 2021 14:42

Replace complex conditional with pattern matching

dde46c9

Fix rustfmt error / Add comment for tab character

8b9331b

Fix dogfood

47a4865

flip1995 added the S-waiting-on-author label Apr 12, 2021

Explain why we use char_indices() instead of chars()

cbdebd9

bors merged commit 24921df into rust-lang:master Apr 14, 2021

phansch deleted the melt-ice branch April 14, 2021 08:22
