Skip to content

Commit

Permalink
Always use identifier ranges to store bindings (#5110)
Browse files Browse the repository at this point in the history
## Summary

At present, when we store a binding, we include a `TextRange` alongside
it. The `TextRange` _sometimes_ matches the exact range of the
identifier to which the `Binding` is linked, but... not always.

For example, given:

```python
x = 1
```

The binding we create _will_ use the range of `x`, because the left-hand
side is an `Expr::Name`, which has a valid range on it.

However, given:

```python
try:
  pass
except ValueError as e:
  pass
```

When we create a binding for `e`, we don't have a `TextRange`... The AST
doesn't give us one. So we end up extracting it via lexing.

This PR extends that pattern to the rest of the binding kinds, to ensure
that whenever we create a binding, we always use the range of the bound
name. This leads to better diagnostics in cases like pattern matching,
whereby the diagnostic for "unused variable `x`" here used to include
`*x`, instead of just `x`:

```python
def f(provided: int) -> int:
    match provided:
        case [_, *x]:
            pass
```

This is _also_ required for symbol renames, since we track writes as
bindings -- so we need to know the ranges of the bound symbols.

By storing these bindings precisely, we can also remove the
`binding.trimmed_range` abstraction -- since bindings already use the
"trimmed range".

To implement this behavior, I took some of our existing utilities (like
the code we had for `except ValueError as e` above), migrated them from
a full lexer to a zero-allocation lexer that _only_ identifies
"identifiers", and moved the behavior into a trait, so we can now do
`stmt.identifier(locator)` to get the range for the identifier.

Honestly, we might end up discarding much of this if we decide to put
ranges on all identifiers
(astral-sh/RustPython-Parser#8). But even if we
do, this will _still_ be a good change, because the lexer introduced
here is useful beyond names (e.g., we use it find the `except` keyword
in an exception handler, to find the `else` after a `for` loop, and so
on). So, I'm fine committing this even if we end up changing our minds
about the right approach.

Closes #5090.

## Benchmarks

No significant change, with one statistically significant improvement
(-2.1654% on `linter/all-rules/large/dataset.py`):

```
linter/default-rules/numpy/globals.py
                        time:   [73.922 µs 73.955 µs 73.986 µs]
                        thrpt:  [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s]
                 change:
                        time:   [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3996% +0.4755% +0.5611%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
linter/default-rules/pydantic/types.py
                        time:   [1.4909 ms 1.4917 ms 1.4926 ms]
                        thrpt:  [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s]
                 change:
                        time:   [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05)
                        thrpt:  [-0.3380% -0.2734% -0.2136%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
linter/default-rules/numpy/ctypeslib.py
                        time:   [688.97 µs 691.34 µs 694.15 µs]
                        thrpt:  [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s]
                 change:
                        time:   [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05)
                        thrpt:  [+0.1468% +0.7351% +1.3461%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe
linter/default-rules/large/dataset.py
                        time:   [3.3872 ms 3.4032 ms 3.4191 ms]
                        thrpt:  [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s]
                 change:
                        time:   [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05)
                        thrpt:  [-0.0905% +0.2642% +0.6469%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  13 (13.00%) high severe

linter/all-rules/numpy/globals.py
                        time:   [148.99 µs 149.21 µs 149.42 µs]
                        thrpt:  [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s]
                 change:
                        time:   [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2785% +0.5094% +0.7395%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
linter/all-rules/pydantic/types.py
                        time:   [3.0362 ms 3.0396 ms 3.0441 ms]
                        thrpt:  [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s]
                 change:
                        time:   [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05)
                        thrpt:  [-0.2121% -0.0618% +0.0958%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
linter/all-rules/numpy/ctypeslib.py
                        time:   [1.6879 ms 1.6894 ms 1.6909 ms]
                        thrpt:  [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s]
                 change:
                        time:   [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05)
                        thrpt:  [-0.0435% +0.0889% +0.2284%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high severe
linter/all-rules/large/dataset.py
                        time:   [7.1520 ms 7.1586 ms 7.1654 ms]
                        thrpt:  [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s]
                 change:
                        time:   [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8102% +2.2133% +2.6300%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
```
  • Loading branch information
charliermarsh authored Jun 15, 2023
1 parent 66089e1 commit 5ea3e42
Show file tree
Hide file tree
Showing 58 changed files with 1,001 additions and 576 deletions.
30 changes: 30 additions & 0 deletions crates/ruff/resources/test/fixtures/pyflakes/F841_3.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,3 +108,33 @@ def f():

def f():
toplevel = tt = 1


def f(provided: int) -> int:
match provided:
case [_, *x]:
pass


def f(provided: int) -> int:
match provided:
case x:
pass


def f(provided: int) -> int:
match provided:
case Foo(bar) as x:
pass


def f(provided: int) -> int:
match provided:
case {"foo": 0, **x}:
pass


def f(provided: int) -> int:
match provided:
case {**x}:
pass
44 changes: 20 additions & 24 deletions crates/ruff/src/checkers/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,12 +12,13 @@ use rustpython_parser::ast::{
use ruff_diagnostics::{Diagnostic, Fix, IsolationLevel};
use ruff_python_ast::all::{extract_all_names, AllNamesFlags};
use ruff_python_ast::helpers::{extract_handled_exceptions, to_module_path};
use ruff_python_ast::identifier::{Identifier, TryIdentifier};
use ruff_python_ast::source_code::{Generator, Indexer, Locator, Quote, Stylist};
use ruff_python_ast::str::trailing_quote;
use ruff_python_ast::types::Node;
use ruff_python_ast::typing::{parse_type_annotation, AnnotationKind};
use ruff_python_ast::visitor::{walk_excepthandler, walk_pattern, Visitor};
use ruff_python_ast::{cast, helpers, str, visitor};
use ruff_python_ast::{cast, helpers, identifier, str, visitor};
use ruff_python_semantic::analyze::{branch_detection, typing, visibility};
use ruff_python_semantic::{
Binding, BindingFlags, BindingId, BindingKind, ContextualizedDefinition, Exceptions,
Expand Down Expand Up @@ -250,7 +251,7 @@ where
// Pre-visit.
match stmt {
Stmt::Global(ast::StmtGlobal { names, range: _ }) => {
let ranges: Vec<TextRange> = helpers::find_names(stmt, self.locator).collect();
let ranges: Vec<TextRange> = identifier::names(stmt, self.locator).collect();
if !self.semantic.scope_id.is_global() {
for (name, range) in names.iter().zip(ranges.iter()) {
// Add a binding to the current scope.
Expand All @@ -271,7 +272,7 @@ where
}
}
Stmt::Nonlocal(ast::StmtNonlocal { names, range: _ }) => {
let ranges: Vec<TextRange> = helpers::find_names(stmt, self.locator).collect();
let ranges: Vec<TextRange> = identifier::names(stmt, self.locator).collect();
if !self.semantic.scope_id.is_global() {
for (name, range) in names.iter().zip(ranges.iter()) {
// Add a binding to the current scope.
Expand Down Expand Up @@ -367,7 +368,7 @@ where
if self.enabled(Rule::AmbiguousFunctionName) {
if let Some(diagnostic) =
pycodestyle::rules::ambiguous_function_name(name, || {
helpers::identifier_range(stmt, self.locator)
stmt.identifier(self.locator)
})
{
self.diagnostics.push(diagnostic);
Expand Down Expand Up @@ -692,7 +693,7 @@ where
}
if self.enabled(Rule::AmbiguousClassName) {
if let Some(diagnostic) = pycodestyle::rules::ambiguous_class_name(name, || {
helpers::identifier_range(stmt, self.locator)
stmt.identifier(self.locator)
}) {
self.diagnostics.push(diagnostic);
}
Expand Down Expand Up @@ -808,7 +809,7 @@ where
let name = alias.asname.as_ref().unwrap_or(&alias.name);
self.add_binding(
name,
alias.range(),
alias.identifier(self.locator),
BindingKind::FutureImportation,
BindingFlags::empty(),
);
Expand All @@ -828,7 +829,7 @@ where
let qualified_name = &alias.name;
self.add_binding(
name,
alias.range(),
alias.identifier(self.locator),
BindingKind::SubmoduleImportation(SubmoduleImportation {
qualified_name,
}),
Expand All @@ -839,7 +840,7 @@ where
let qualified_name = &alias.name;
self.add_binding(
name,
alias.range(),
alias.identifier(self.locator),
BindingKind::Importation(Importation { qualified_name }),
if alias
.asname
Expand Down Expand Up @@ -1084,7 +1085,7 @@ where

self.add_binding(
name,
alias.range(),
alias.identifier(self.locator),
BindingKind::FutureImportation,
BindingFlags::empty(),
);
Expand Down Expand Up @@ -1145,7 +1146,7 @@ where
helpers::format_import_from_member(level, module, &alias.name);
self.add_binding(
name,
alias.range(),
alias.identifier(self.locator),
BindingKind::FromImportation(FromImportation { qualified_name }),
if alias
.asname
Expand Down Expand Up @@ -1871,7 +1872,7 @@ where

self.add_binding(
name,
stmt.range(),
stmt.identifier(self.locator),
BindingKind::FunctionDefinition,
BindingFlags::empty(),
);
Expand Down Expand Up @@ -2094,7 +2095,7 @@ where
self.semantic.pop_definition();
self.add_binding(
name,
stmt.range(),
stmt.identifier(self.locator),
BindingKind::ClassDefinition,
BindingFlags::empty(),
);
Expand Down Expand Up @@ -3902,8 +3903,7 @@ where
}
match name {
Some(name) => {
let range = helpers::excepthandler_name_range(excepthandler, self.locator)
.expect("Failed to find `name` range");
let range = excepthandler.try_identifier(self.locator).unwrap();

if self.enabled(Rule::AmbiguousVariableName) {
if let Some(diagnostic) =
Expand Down Expand Up @@ -4019,7 +4019,7 @@ where
// upstream.
self.add_binding(
&arg.arg,
arg.range(),
arg.identifier(self.locator),
BindingKind::Argument,
BindingFlags::empty(),
);
Expand Down Expand Up @@ -4059,7 +4059,7 @@ where
{
self.add_binding(
name,
pattern.range(),
pattern.try_identifier(self.locator).unwrap(),
BindingKind::Assignment,
BindingFlags::empty(),
);
Expand Down Expand Up @@ -4268,16 +4268,14 @@ impl<'a> Checker<'a> {
{
if self.enabled(Rule::RedefinedWhileUnused) {
#[allow(deprecated)]
let line = self.locator.compute_line_index(
shadowed.trimmed_range(&self.semantic, self.locator).start(),
);
let line = self.locator.compute_line_index(shadowed.range.start());

let mut diagnostic = Diagnostic::new(
pyflakes::rules::RedefinedWhileUnused {
name: name.to_string(),
line,
},
binding.trimmed_range(&self.semantic, self.locator),
binding.range,
);
if let Some(range) = binding.parent_range(&self.semantic) {
diagnostic.set_parent(range.start());
Expand Down Expand Up @@ -4890,17 +4888,15 @@ impl<'a> Checker<'a> {
}

#[allow(deprecated)]
let line = self.locator.compute_line_index(
shadowed.trimmed_range(&self.semantic, self.locator).start(),
);
let line = self.locator.compute_line_index(shadowed.range.start());

let binding = self.semantic.binding(binding_id);
let mut diagnostic = Diagnostic::new(
pyflakes::rules::RedefinedWhileUnused {
name: (*name).to_string(),
line,
},
binding.trimmed_range(&self.semantic, self.locator),
binding.range,
);
if let Some(range) = binding.parent_range(&self.semantic) {
diagnostic.set_parent(range.start());
Expand Down
15 changes: 8 additions & 7 deletions crates/ruff/src/rules/flake8_annotations/rules/definition.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,10 @@ use rustpython_parser::ast::{Expr, Ranged, Stmt};

use ruff_diagnostics::{AlwaysAutofixableViolation, Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::cast;
use ruff_python_ast::helpers::ReturnStatementVisitor;
use ruff_python_ast::identifier::Identifier;
use ruff_python_ast::statement_visitor::StatementVisitor;
use ruff_python_ast::{cast, helpers};
use ruff_python_semantic::analyze::visibility;
use ruff_python_semantic::{Definition, Member, MemberKind, SemanticModel};
use ruff_python_stdlib::typing::SIMPLE_MAGIC_RETURN_TYPES;
Expand Down Expand Up @@ -640,7 +641,7 @@ pub(crate) fn definition(
MissingReturnTypeClassMethod {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
} else if is_method
Expand All @@ -651,7 +652,7 @@ pub(crate) fn definition(
MissingReturnTypeStaticMethod {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
} else if is_method && visibility::is_init(name) {
Expand All @@ -663,7 +664,7 @@ pub(crate) fn definition(
MissingReturnTypeSpecialMethod {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
);
if checker.patch(diagnostic.kind.rule()) {
#[allow(deprecated)]
Expand All @@ -680,7 +681,7 @@ pub(crate) fn definition(
MissingReturnTypeSpecialMethod {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
);
let return_type = SIMPLE_MAGIC_RETURN_TYPES.get(name);
if let Some(return_type) = return_type {
Expand All @@ -701,7 +702,7 @@ pub(crate) fn definition(
MissingReturnTypeUndocumentedPublicFunction {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
}
Expand All @@ -711,7 +712,7 @@ pub(crate) fn definition(
MissingReturnTypePrivateFunction {
name: name.to_string(),
},
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ use rustpython_parser::ast::{self, Constant, Expr, Keyword, Ranged, Stmt};

use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers::identifier_range;
use ruff_python_ast::identifier::Identifier;
use ruff_python_semantic::analyze::visibility::{is_abstract, is_overload};
use ruff_python_semantic::SemanticModel;

Expand Down Expand Up @@ -134,7 +134,7 @@ pub(crate) fn abstract_base_class(
AbstractBaseClassWithoutAbstractMethod {
name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ use rustpython_parser::ast::{self, Expr, Stmt};

use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers;
use ruff_python_ast::identifier::Identifier;

use crate::checkers::ast::Checker;

Expand Down Expand Up @@ -31,6 +31,6 @@ pub(crate) fn f_string_docstring(checker: &mut Checker, body: &[Stmt]) {
};
checker.diagnostics.push(Diagnostic::new(
FStringDocstring,
helpers::identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
4 changes: 2 additions & 2 deletions crates/ruff/src/rules/flake8_builtins/helpers.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use ruff_text_size::TextRange;
use rustpython_parser::ast::{Excepthandler, Expr, Ranged, Stmt};

use ruff_python_ast::helpers::identifier_range;
use ruff_python_ast::identifier::Identifier;
use ruff_python_ast::source_code::Locator;
use ruff_python_stdlib::builtins::BUILTINS;

Expand All @@ -20,7 +20,7 @@ impl AnyShadowing<'_> {
pub(crate) fn range(self, locator: &Locator) -> TextRange {
match self {
AnyShadowing::Expression(expr) => expr.range(),
AnyShadowing::Statement(stmt) => identifier_range(stmt, locator),
AnyShadowing::Statement(stmt) => stmt.identifier(locator),
AnyShadowing::ExceptHandler(handler) => handler.range(),
}
}
Expand Down
13 changes: 7 additions & 6 deletions crates/ruff/src/rules/flake8_pyi/rules/non_self_return_type.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@ use rustpython_parser::ast::{self, Arguments, Decorator, Expr, Stmt};

use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers::{identifier_range, map_subscript};
use ruff_python_ast::helpers::map_subscript;
use ruff_python_ast::identifier::Identifier;
use ruff_python_semantic::analyze::visibility::{is_abstract, is_final, is_overload};
use ruff_python_semantic::{ScopeKind, SemanticModel};

Expand Down Expand Up @@ -147,7 +148,7 @@ pub(crate) fn non_self_return_type(
class_name: class_def.name.to_string(),
method_name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
return;
Expand All @@ -161,7 +162,7 @@ pub(crate) fn non_self_return_type(
class_name: class_def.name.to_string(),
method_name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
return;
Expand All @@ -176,7 +177,7 @@ pub(crate) fn non_self_return_type(
class_name: class_def.name.to_string(),
method_name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
return;
Expand All @@ -192,7 +193,7 @@ pub(crate) fn non_self_return_type(
class_name: class_def.name.to_string(),
method_name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
}
Expand All @@ -205,7 +206,7 @@ pub(crate) fn non_self_return_type(
class_name: class_def.name.to_string(),
method_name: name.to_string(),
},
identifier_range(stmt, checker.locator),
stmt.identifier(checker.locator),
));
}
}
Expand Down
Loading

0 comments on commit 5ea3e42

Please sign in to comment.