Apply NFKC normalization to unicode identifiers when storing bindings in the semantic model #10381

AlexWaygood · 2024-03-13T12:20:15Z

Summary

Python applies NFKC normalization to identifiers that contain non-ASCII characters (PEP 3131). That means F821 (`undefined-name`) should not be emitted when Ruff encounters the following snippet, because from Python's perspective all of these spellings are the same identifier. On `main`, however, it is:

```python
𝒞 = 500
print(𝒞)
print(C + 𝒞)  # ruff says `C` isn't defined
print(C / 𝒞)
print(C == 𝑪 == 𝒞 == 𝓒 == 𝕮)  # ruff says `C`, `𝑪`, ... aren't defined
```
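To make the normalization concrete, here is a small standard-library check. It only illustrates Python's own behaviour and is not part of this PR: `unicodedata.normalize("NFKC", ...)` collapses every spelling used above to plain `C`, and the interpreter applies the same rule at parse time, so assigning to `𝒞` actually binds the name `C`.

```python
import unicodedata

# The different spellings of "C" used in the snippet above
spellings = ["C", "𝑪", "𝒞", "𝓒", "𝕮"]

# NFKC maps each mathematical-alphanumeric variant to the ASCII letter "C"
normalized = {unicodedata.normalize("NFKC", s) for s in spellings}
print(normalized)  # {'C'}

# The compiler normalizes identifiers while parsing, so this assignment
# stores the value under the plain name "C" in the given namespace.
namespace = {}
exec("𝒞 = 500", namespace)
print(namespace["C"])  # 500
```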

I fixed this false positive by changing the `bindings` field in `ruff_python_semantic/scope.rs` so that identifiers are always NFKC-normalized before being stored in the hashmap. An alternative approach I played around with was to normalize identifiers in the AST itself. However, this would have had an undesirable consequence: the formatter would have started eagerly normalizing Unicode characters in identifiers whenever it reformatted a Python file.
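The idea behind the fix can be sketched in Python as follows (Ruff's actual change is in Rust). The `Scope` class, its `add_binding`/`get` methods and the `normalize_identifier` helper are hypothetical names for illustration, not Ruff's API. The ASCII fast path mirrors the later commit in this PR titled "Only apply unicode normalization when the string isn't ASCII".

```python
import unicodedata
from typing import Dict, Optional


def normalize_identifier(name: str) -> str:
    """Return the NFKC form of an identifier, as the Python parser sees it."""
    # NFKC leaves pure-ASCII text unchanged, so skip the work in the common case.
    if name.isascii():
        return name
    return unicodedata.normalize("NFKC", name)


class Scope:
    """Hypothetical stand-in for a scope's map from names to binding IDs."""

    def __init__(self) -> None:
        self._bindings: Dict[str, int] = {}

    def add_binding(self, name: str, binding_id: int) -> None:
        # Store each binding under its normalized name.
        self._bindings[normalize_identifier(name)] = binding_id

    def get(self, name: str) -> Optional[int]:
        # Normalize on lookup too, so `𝕮` finds a binding created via `𝒞`.
        return self._bindings.get(normalize_identifier(name))


scope = Scope()
scope.add_binding("𝒞", 0)
print(scope.get("C"))  # 0
print(scope.get("𝕮"))  # 0
print(scope.get("D"))  # None
```

Normalizing on both insertion and lookup is what keeps the map consistent: whichever spelling a reference uses, it resolves to the same key. Because only the map's keys change, the source text seen by the formatter is left untouched, which is the advantage over normalizing the AST.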

Test Plan

`cargo test`

codspeed-hq · 2024-03-13T12:26:46Z

CodSpeed Performance Report

Merging #10381 will degrade performance by 5.17%

_Comparing `AlexWaygood:unicode-normalize-identifiers` (0385a62) with `main` (e944c16)_

Summary

❌ 1 regression
✅ 29 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `main` | `AlexWaygood:unicode-normalize-identifiers` | Change |
| --- | --- | --- | --- | --- |
| ❌ | `linter/default-rules[numpy/globals.py]` | 635 µs | 669.6 µs | -5.17% |
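As a quick reading of the table above: the reported -5.17% is reproduced by the ratio of the two timings (635 / 669.6 - 1), while the increase in wall time measured against `main` works out to about 5.45%. That CodSpeed uses the ratio form is an assumption inferred from the numbers, not something the report states.

```python
main_us, pr_us = 635.0, 669.6

# Ratio of the timings minus one: matches the "Change" column
ratio_change = main_us / pr_us - 1
print(f"{ratio_change:.2%}")  # -5.17%

# Relative increase in wall time, measured against `main`
time_increase = (pr_us - main_us) / main_us
print(f"{time_increase:.2%}")  # 5.45%
```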

AlexWaygood · 2024-03-13T12:27:44Z

Ouch. I guess I need to start running benchmarks before filing PRs :)

github-actions · 2024-03-13T13:03:41Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.


AlexWaygood · 2024-03-14T17:42:38Z

Closing in favour of #10412

AlexWaygood marked this pull request as draft March 13, 2024 12:27

AlexWaygood force-pushed the unicode-normalize-identifiers branch from 1915fd2 to 6749d86 March 13, 2024 12:45

AlexWaygood force-pushed the unicode-normalize-identifiers branch from ddc1d3a to d79042e March 14, 2024 12:36

AlexWaygood added 6 commits March 14, 2024 12:53

334c024 Apply NFKC normalization to unicode identifiers when storing bindings in the semantic model

f500b62 Only apply unicode normalization when the string isn't ASCII

eb9559c .

2ed9638 .

e456050 see if this has any impact

0385a62 Revert "see if this has any impact" (it made things worse)
This reverts commit d79042e.

AlexWaygood force-pushed the unicode-normalize-identifiers branch from 7e55231 to 0385a62 March 14, 2024 12:53

AlexWaygood closed this Mar 14, 2024

AlexWaygood deleted the unicode-normalize-identifiers branch March 14, 2024 17:42

AlexWaygood mentioned this pull request Mar 14, 2024 in "Apply NFKC normalization to unicode identifiers in the lexer #10412" (Merged)
