Always use identifier ranges to store bindings (#5110)

## Summary At present, when we store a binding, we include a `TextRange` alongside it. The `TextRange` _sometimes_ matches the exact range of the identifier to which the `Binding` is linked, but... not always. For example, given: ```python x = 1 ``` The binding we create _will_ use the range of `x`, because the left-hand side is an `Expr::Name`, which has a valid range on it. However, given: ```python try: pass except ValueError as e: pass ``` When we create a binding for `e`, we don't have a `TextRange`... The AST doesn't give us one. So we end up extracting it via lexing. This PR extends that pattern to the rest of the binding kinds, to ensure that whenever we create a binding, we always use the range of the bound name. This leads to better diagnostics in cases like pattern matching, whereby the diagnostic for "unused variable `x`" here used to include `*x`, instead of just `x`: ```python def f(provided: int) -> int: match provided: case [_, *x]: pass ``` This is _also_ required for symbol renames, since we track writes as bindings -- so we need to know the ranges of the bound symbols. By storing these bindings precisely, we can also remove the `binding.trimmed_range` abstraction -- since bindings already use the "trimmed range". To implement this behavior, I took some of our existing utilities (like the code we had for `except ValueError as e` above), migrated them from a full lexer to a zero-allocation lexer that _only_ identifies "identifiers", and moved the behavior into a trait, so we can now do `stmt.identifier(locator)` to get the range for the identifier. Honestly, we might end up discarding much of this if we decide to put ranges on all identifiers (astral-sh/RustPython-Parser#8). But even if we do, this will _still_ be a good change, because the lexer introduced here is useful beyond names (e.g., we use it find the `except` keyword in an exception handler, to find the `else` after a `for` loop, and so on). So, I'm fine committing this even if we end up changing our minds about the right approach. Closes #5090. ## Benchmarks No significant change, with one statistically significant improvement (-2.1654% on `linter/all-rules/large/dataset.py`): ``` linter/default-rules/numpy/globals.py time: [73.922 µs 73.955 µs 73.986 µs] thrpt: [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s] change: time: [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05) thrpt: [+0.3996% +0.4755% +0.5611%] Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild linter/default-rules/pydantic/types.py time: [1.4909 ms 1.4917 ms 1.4926 ms] thrpt: [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s] change: time: [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05) thrpt: [-0.3380% -0.2734% -0.2136%] Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe linter/default-rules/numpy/ctypeslib.py time: [688.97 µs 691.34 µs 694.15 µs] thrpt: [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s] change: time: [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05) thrpt: [+0.1468% +0.7351% +1.3461%] Change within noise threshold. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) low mild 2 (2.00%) high mild 12 (12.00%) high severe linter/default-rules/large/dataset.py time: [3.3872 ms 3.4032 ms 3.4191 ms] thrpt: [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s] change: time: [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05) thrpt: [-0.0905% +0.2642% +0.6469%] No change in performance detected. Found 20 outliers among 100 measurements (20.00%) 1 (1.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 13 (13.00%) high severe linter/all-rules/numpy/globals.py time: [148.99 µs 149.21 µs 149.42 µs] thrpt: [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s] change: time: [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05) thrpt: [+0.2785% +0.5094% +0.7395%] Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe linter/all-rules/pydantic/types.py time: [3.0362 ms 3.0396 ms 3.0441 ms] thrpt: [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s] change: time: [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05) thrpt: [-0.2121% -0.0618% +0.0958%] No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low severe 3 (3.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe linter/all-rules/numpy/ctypeslib.py time: [1.6879 ms 1.6894 ms 1.6909 ms] thrpt: [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s] change: time: [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05) thrpt: [-0.0435% +0.0889% +0.2284%] No change in performance detected. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) low mild 1 (1.00%) high severe linter/all-rules/large/dataset.py time: [7.1520 ms 7.1586 ms 7.1654 ms] thrpt: [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s] change: time: [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05) thrpt: [+1.8102% +2.2133% +2.6300%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild ```
astral-sh · Jun 15, 2023 · 5ea3e42 · 5ea3e42
1 parent 66089e1
commit 5ea3e42
Show file tree

Hide file tree

Showing 58 changed files with 1,001 additions and 576 deletions.
diff --git a/crates/ruff/resources/test/fixtures/pyflakes/F841_3.py b/crates/ruff/resources/test/fixtures/pyflakes/F841_3.py
@@ -108,3 +108,33 @@ def f():
 
 def f():
     toplevel = tt = 1
+
+
+def f(provided: int) -> int:
+    match provided:
+        case [_, *x]:
+            pass
+
+
+def f(provided: int) -> int:
+    match provided:
+        case x:
+            pass
+
+
+def f(provided: int) -> int:
+    match provided:
+        case Foo(bar) as x:
+            pass
+
+
+def f(provided: int) -> int:
+    match provided:
+        case {"foo": 0, **x}:
+            pass
+
+
+def f(provided: int) -> int:
+    match provided:
+        case {**x}:
+            pass
diff --git a/crates/ruff/src/checkers/ast/mod.rs b/crates/ruff/src/checkers/ast/mod.rs
@@ -12,12 +12,13 @@ use rustpython_parser::ast::{
 use ruff_diagnostics::{Diagnostic, Fix, IsolationLevel};
 use ruff_python_ast::all::{extract_all_names, AllNamesFlags};
 use ruff_python_ast::helpers::{extract_handled_exceptions, to_module_path};
+use ruff_python_ast::identifier::{Identifier, TryIdentifier};
 use ruff_python_ast::source_code::{Generator, Indexer, Locator, Quote, Stylist};
 use ruff_python_ast::str::trailing_quote;
 use ruff_python_ast::types::Node;
 use ruff_python_ast::typing::{parse_type_annotation, AnnotationKind};
 use ruff_python_ast::visitor::{walk_excepthandler, walk_pattern, Visitor};
-use ruff_python_ast::{cast, helpers, str, visitor};
+use ruff_python_ast::{cast, helpers, identifier, str, visitor};
 use ruff_python_semantic::analyze::{branch_detection, typing, visibility};
 use ruff_python_semantic::{
     Binding, BindingFlags, BindingId, BindingKind, ContextualizedDefinition, Exceptions,
@@ -250,7 +251,7 @@ where
         // Pre-visit.
         match stmt {
             Stmt::Global(ast::StmtGlobal { names, range: _ }) => {
-                let ranges: Vec<TextRange> = helpers::find_names(stmt, self.locator).collect();
+                let ranges: Vec<TextRange> = identifier::names(stmt, self.locator).collect();
                 if !self.semantic.scope_id.is_global() {
                     for (name, range) in names.iter().zip(ranges.iter()) {
                         // Add a binding to the current scope.
@@ -271,7 +272,7 @@ where
                 }
             }
             Stmt::Nonlocal(ast::StmtNonlocal { names, range: _ }) => {
-                let ranges: Vec<TextRange> = helpers::find_names(stmt, self.locator).collect();
+                let ranges: Vec<TextRange> = identifier::names(stmt, self.locator).collect();
                 if !self.semantic.scope_id.is_global() {
                     for (name, range) in names.iter().zip(ranges.iter()) {
                         // Add a binding to the current scope.
@@ -367,7 +368,7 @@ where
                 if self.enabled(Rule::AmbiguousFunctionName) {
                     if let Some(diagnostic) =
                         pycodestyle::rules::ambiguous_function_name(name, || {
-                            helpers::identifier_range(stmt, self.locator)
+                            stmt.identifier(self.locator)
                         })
                     {
                         self.diagnostics.push(diagnostic);
@@ -692,7 +693,7 @@ where
                 }
                 if self.enabled(Rule::AmbiguousClassName) {
                     if let Some(diagnostic) = pycodestyle::rules::ambiguous_class_name(name, || {
-                        helpers::identifier_range(stmt, self.locator)
+                        stmt.identifier(self.locator)
                     }) {
                         self.diagnostics.push(diagnostic);
                     }
@@ -808,7 +809,7 @@ where
                         let name = alias.asname.as_ref().unwrap_or(&alias.name);
                         self.add_binding(
                             name,
-                            alias.range(),
+                            alias.identifier(self.locator),
                             BindingKind::FutureImportation,
                             BindingFlags::empty(),
                         );
@@ -828,7 +829,7 @@ where
                         let qualified_name = &alias.name;
                         self.add_binding(
                             name,
-                            alias.range(),
+                            alias.identifier(self.locator),
                             BindingKind::SubmoduleImportation(SubmoduleImportation {
                                 qualified_name,
                             }),
@@ -839,7 +840,7 @@ where
                         let qualified_name = &alias.name;
                         self.add_binding(
                             name,
-                            alias.range(),
+                            alias.identifier(self.locator),
                             BindingKind::Importation(Importation { qualified_name }),
                             if alias
                                 .asname
@@ -1084,7 +1085,7 @@ where
 
                         self.add_binding(
                             name,
-                            alias.range(),
+                            alias.identifier(self.locator),
                             BindingKind::FutureImportation,
                             BindingFlags::empty(),
                         );
@@ -1145,7 +1146,7 @@ where
                             helpers::format_import_from_member(level, module, &alias.name);
                         self.add_binding(
                             name,
-                            alias.range(),
+                            alias.identifier(self.locator),
                             BindingKind::FromImportation(FromImportation { qualified_name }),
                             if alias
                                 .asname
@@ -1871,7 +1872,7 @@ where
 
                 self.add_binding(
                     name,
-                    stmt.range(),
+                    stmt.identifier(self.locator),
                     BindingKind::FunctionDefinition,
                     BindingFlags::empty(),
                 );
@@ -2094,7 +2095,7 @@ where
                 self.semantic.pop_definition();
                 self.add_binding(
                     name,
-                    stmt.range(),
+                    stmt.identifier(self.locator),
                     BindingKind::ClassDefinition,
                     BindingFlags::empty(),
                 );
@@ -3902,8 +3903,7 @@ where
                 }
                 match name {
                     Some(name) => {
-                        let range = helpers::excepthandler_name_range(excepthandler, self.locator)
-                            .expect("Failed to find `name` range");
+                        let range = excepthandler.try_identifier(self.locator).unwrap();
 
                         if self.enabled(Rule::AmbiguousVariableName) {
                             if let Some(diagnostic) =
@@ -4019,7 +4019,7 @@ where
         // upstream.
         self.add_binding(
             &arg.arg,
-            arg.range(),
+            arg.identifier(self.locator),
             BindingKind::Argument,
             BindingFlags::empty(),
         );
@@ -4059,7 +4059,7 @@ where
         {
             self.add_binding(
                 name,
-                pattern.range(),
+                pattern.try_identifier(self.locator).unwrap(),
                 BindingKind::Assignment,
                 BindingFlags::empty(),
             );
@@ -4268,16 +4268,14 @@ impl<'a> Checker<'a> {
                     {
                         if self.enabled(Rule::RedefinedWhileUnused) {
                             #[allow(deprecated)]
-                            let line = self.locator.compute_line_index(
-                                shadowed.trimmed_range(&self.semantic, self.locator).start(),
-                            );
+                            let line = self.locator.compute_line_index(shadowed.range.start());
 
                             let mut diagnostic = Diagnostic::new(
                                 pyflakes::rules::RedefinedWhileUnused {
                                     name: name.to_string(),
                                     line,
                                 },
-                                binding.trimmed_range(&self.semantic, self.locator),
+                                binding.range,
                             );
                             if let Some(range) = binding.parent_range(&self.semantic) {
                                 diagnostic.set_parent(range.start());
@@ -4890,17 +4888,15 @@ impl<'a> Checker<'a> {
                         }
 
                         #[allow(deprecated)]
-                        let line = self.locator.compute_line_index(
-                            shadowed.trimmed_range(&self.semantic, self.locator).start(),
-                        );
+                        let line = self.locator.compute_line_index(shadowed.range.start());
 
                         let binding = self.semantic.binding(binding_id);
                         let mut diagnostic = Diagnostic::new(
                             pyflakes::rules::RedefinedWhileUnused {
                                 name: (*name).to_string(),
                                 line,
                             },
-                            binding.trimmed_range(&self.semantic, self.locator),
+                            binding.range,
                         );
                         if let Some(range) = binding.parent_range(&self.semantic) {
                             diagnostic.set_parent(range.start());

diff --git a/crates/ruff/src/rules/flake8_annotations/rules/definition.rs b/crates/ruff/src/rules/flake8_annotations/rules/definition.rs
@@ -2,9 +2,10 @@ use rustpython_parser::ast::{Expr, Ranged, Stmt};
 
 use ruff_diagnostics::{AlwaysAutofixableViolation, Diagnostic, Violation};
 use ruff_macros::{derive_message_formats, violation};
+use ruff_python_ast::cast;
 use ruff_python_ast::helpers::ReturnStatementVisitor;
+use ruff_python_ast::identifier::Identifier;
 use ruff_python_ast::statement_visitor::StatementVisitor;
-use ruff_python_ast::{cast, helpers};
 use ruff_python_semantic::analyze::visibility;
 use ruff_python_semantic::{Definition, Member, MemberKind, SemanticModel};
 use ruff_python_stdlib::typing::SIMPLE_MAGIC_RETURN_TYPES;
@@ -640,7 +641,7 @@ pub(crate) fn definition(
                     MissingReturnTypeClassMethod {
                         name: name.to_string(),
                     },
-                    helpers::identifier_range(stmt, checker.locator),
+                    stmt.identifier(checker.locator),
                 ));
             }
         } else if is_method
@@ -651,7 +652,7 @@ pub(crate) fn definition(
                     MissingReturnTypeStaticMethod {
                         name: name.to_string(),
                     },
-                    helpers::identifier_range(stmt, checker.locator),
+                    stmt.identifier(checker.locator),
                 ));
             }
         } else if is_method && visibility::is_init(name) {
@@ -663,7 +664,7 @@ pub(crate) fn definition(
                         MissingReturnTypeSpecialMethod {
                             name: name.to_string(),
                         },
-                        helpers::identifier_range(stmt, checker.locator),
+                        stmt.identifier(checker.locator),
                     );
                     if checker.patch(diagnostic.kind.rule()) {
                         #[allow(deprecated)]
@@ -680,7 +681,7 @@ pub(crate) fn definition(
                     MissingReturnTypeSpecialMethod {
                         name: name.to_string(),
                     },
-                    helpers::identifier_range(stmt, checker.locator),
+                    stmt.identifier(checker.locator),
                 );
                 let return_type = SIMPLE_MAGIC_RETURN_TYPES.get(name);
                 if let Some(return_type) = return_type {
@@ -701,7 +702,7 @@ pub(crate) fn definition(
                             MissingReturnTypeUndocumentedPublicFunction {
                                 name: name.to_string(),
                             },
-                            helpers::identifier_range(stmt, checker.locator),
+                            stmt.identifier(checker.locator),
                         ));
                     }
                 }
@@ -711,7 +712,7 @@ pub(crate) fn definition(
                             MissingReturnTypePrivateFunction {
                                 name: name.to_string(),
                             },
-                            helpers::identifier_range(stmt, checker.locator),
+                            stmt.identifier(checker.locator),
                         ));
                     }
                 }

diff --git a/crates/ruff/src/rules/flake8_bugbear/rules/abstract_base_class.rs b/crates/ruff/src/rules/flake8_bugbear/rules/abstract_base_class.rs
@@ -2,7 +2,7 @@ use rustpython_parser::ast::{self, Constant, Expr, Keyword, Ranged, Stmt};
 
 use ruff_diagnostics::{Diagnostic, Violation};
 use ruff_macros::{derive_message_formats, violation};
-use ruff_python_ast::helpers::identifier_range;
+use ruff_python_ast::identifier::Identifier;
 use ruff_python_semantic::analyze::visibility::{is_abstract, is_overload};
 use ruff_python_semantic::SemanticModel;
 
@@ -134,7 +134,7 @@ pub(crate) fn abstract_base_class(
                 AbstractBaseClassWithoutAbstractMethod {
                     name: name.to_string(),
                 },
-                identifier_range(stmt, checker.locator),
+                stmt.identifier(checker.locator),
             ));
         }
     }

diff --git a/crates/ruff/src/rules/flake8_bugbear/rules/f_string_docstring.rs b/crates/ruff/src/rules/flake8_bugbear/rules/f_string_docstring.rs
@@ -2,7 +2,7 @@ use rustpython_parser::ast::{self, Expr, Stmt};
 
 use ruff_diagnostics::{Diagnostic, Violation};
 use ruff_macros::{derive_message_formats, violation};
-use ruff_python_ast::helpers;
+use ruff_python_ast::identifier::Identifier;
 
 use crate::checkers::ast::Checker;
 
@@ -31,6 +31,6 @@ pub(crate) fn f_string_docstring(checker: &mut Checker, body: &[Stmt]) {
     };
     checker.diagnostics.push(Diagnostic::new(
         FStringDocstring,
-        helpers::identifier_range(stmt, checker.locator),
+        stmt.identifier(checker.locator),
     ));
 }
diff --git a/crates/ruff/src/rules/flake8_builtins/helpers.rs b/crates/ruff/src/rules/flake8_builtins/helpers.rs
@@ -1,7 +1,7 @@
 use ruff_text_size::TextRange;
 use rustpython_parser::ast::{Excepthandler, Expr, Ranged, Stmt};
 
-use ruff_python_ast::helpers::identifier_range;
+use ruff_python_ast::identifier::Identifier;
 use ruff_python_ast::source_code::Locator;
 use ruff_python_stdlib::builtins::BUILTINS;
 
@@ -20,7 +20,7 @@ impl AnyShadowing<'_> {
     pub(crate) fn range(self, locator: &Locator) -> TextRange {
         match self {
             AnyShadowing::Expression(expr) => expr.range(),
-            AnyShadowing::Statement(stmt) => identifier_range(stmt, locator),
+            AnyShadowing::Statement(stmt) => stmt.identifier(locator),
             AnyShadowing::ExceptHandler(handler) => handler.range(),
         }
     }

diff --git a/crates/ruff/src/rules/flake8_pyi/rules/non_self_return_type.rs b/crates/ruff/src/rules/flake8_pyi/rules/non_self_return_type.rs
@@ -2,7 +2,8 @@ use rustpython_parser::ast::{self, Arguments, Decorator, Expr, Stmt};
 
 use ruff_diagnostics::{Diagnostic, Violation};
 use ruff_macros::{derive_message_formats, violation};
-use ruff_python_ast::helpers::{identifier_range, map_subscript};
+use ruff_python_ast::helpers::map_subscript;
+use ruff_python_ast::identifier::Identifier;
 use ruff_python_semantic::analyze::visibility::{is_abstract, is_final, is_overload};
 use ruff_python_semantic::{ScopeKind, SemanticModel};
 
@@ -147,7 +148,7 @@ pub(crate) fn non_self_return_type(
                     class_name: class_def.name.to_string(),
                     method_name: name.to_string(),
                 },
-                identifier_range(stmt, checker.locator),
+                stmt.identifier(checker.locator),
             ));
         }
         return;
@@ -161,7 +162,7 @@ pub(crate) fn non_self_return_type(
                     class_name: class_def.name.to_string(),
                     method_name: name.to_string(),
                 },
-                identifier_range(stmt, checker.locator),
+                stmt.identifier(checker.locator),
             ));
         }
         return;
@@ -176,7 +177,7 @@ pub(crate) fn non_self_return_type(
                     class_name: class_def.name.to_string(),
                     method_name: name.to_string(),
                 },
-                identifier_range(stmt, checker.locator),
+                stmt.identifier(checker.locator),
             ));
         }
         return;
@@ -192,7 +193,7 @@ pub(crate) fn non_self_return_type(
                         class_name: class_def.name.to_string(),
                         method_name: name.to_string(),
                     },
-                    identifier_range(stmt, checker.locator),
+                    stmt.identifier(checker.locator),
                 ));
             }
         }
@@ -205,7 +206,7 @@ pub(crate) fn non_self_return_type(
                         class_name: class_def.name.to_string(),
                         method_name: name.to_string(),
                     },
-                    identifier_range(stmt, checker.locator),
+                    stmt.identifier(checker.locator),
                 ));
             }
         }