Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply NFKC normalization to unicode identifiers when storing bindings in the semantic model #10381

Closed

Conversation

AlexWaygood
Copy link
Member

Summary

Fixes #5003.

Python applies NFKC normalization to identifiers that use unicode characters. That means that F821 should not be emitted if ruff encounters the following snippet (but on main, it is), as from Python's perspective, these are all the same identifier:

𝒞 = 500
print(𝒞)
print(C + 𝒞)  # ruff says `C` isn't defined
print(C / 𝒞)
print(C == 𝑪 == 𝒞 == 𝓒 == 𝕮)  # ruff says `C`, `𝑪`, ... isn't defined

I fixed this false positive by changing the bindings field in ruff_python_semantic/scope.rs so that identifiers are always unicode-normalized according to NFKC before being stored in the hashmap. An alternative approach I played around with was to unicode-normalize identifiers in the AST itself. However, this would have had undesirable consequences: the formatter would have started eagerly normalizing unicode characters in identifiers when it reformatted a Python file.

Test Plan

cargo test

Copy link

codspeed-hq bot commented Mar 13, 2024

CodSpeed Performance Report

Merging #10381 will degrade performances by 5.17%

Comparing AlexWaygood:unicode-normalize-identifiers (0385a62) with main (e944c16)

Summary

❌ 1 regressions
✅ 29 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main AlexWaygood:unicode-normalize-identifiers Change
linter/default-rules[numpy/globals.py] 635 µs 669.6 µs -5.17%

@AlexWaygood
Copy link
Member Author

Ouch. I guess I need to start running benchmarks before filing PRs :)

@AlexWaygood AlexWaygood marked this pull request as draft March 13, 2024 12:27
@AlexWaygood AlexWaygood force-pushed the unicode-normalize-identifiers branch from 1915fd2 to 6749d86 Compare March 13, 2024 12:45
Copy link
Contributor

github-actions bot commented Mar 13, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood AlexWaygood force-pushed the unicode-normalize-identifiers branch from ddc1d3a to d79042e Compare March 14, 2024 12:36
@AlexWaygood AlexWaygood force-pushed the unicode-normalize-identifiers branch from 7e55231 to 0385a62 Compare March 14, 2024 12:53
@AlexWaygood
Copy link
Member Author

Closing in favour of #10412

@AlexWaygood AlexWaygood deleted the unicode-normalize-identifiers branch March 14, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

F821 False Positive when using equivalent script characters
1 participant