-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate Q003
to accomodate f-string context
#7588
Conversation
bf47707
to
c9cf545
Compare
0f998b7
to
32f1f48
Compare
I hope this needs a rebase xD 16k lines sounds painful to review |
Oh wait, what happened here lol. Let me rebase. Edit: forgot to push the rebased branch. |
587aa71
to
2d9069a
Compare
CodSpeed Performance ReportMerging #7588 will degrade performances by 2.33%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM although the code is a bit difficult to understand. I would love to get @charliermarsh's opinion on too because I'm not very familiar with the rule
if settings.rules.enabled(Rule::AvoidableEscapedQuote) && settings.flake8_quotes.avoid_escape { | ||
flake8_quotes::rules::avoidable_escaped_quote(&mut diagnostics, tokens, locator, settings); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for moving the rule out of from_tokens
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that the escaped quote check needs to be done in the content value only i.e., for f-string it's only the FStringMiddle
tokens.
The f-string context which needs to be tracked for this rule is different than the one for the quote rules where the main difference is how implicitly concatenated strings are handled. For this rule, implicit string concatenation doesn't matter because we only care about the content while for the quote rules it matters.
For quote rules, each sequence needs to be independently built. For nested f-strings, the nested sequence will be different than the outer one and so on. This is the main reason for the separation. It becomes easier to reason about.
fn do_not_check_for_escaped_quote(&mut self) { | ||
self.check_for_escaped_quote = false; | ||
self.fstring_middle_ranges.clear(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected a enter_
...exit_...
function pair How does it start checking for quotes again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't start again. For example,
f"\"foo\" {'nested'}"
# ^^(1)^^^ ^^^(2)^^
Here, (1) will be checked first and the range will be added to fstring_middle_ranges
. Then we encountered a nested string (2) but our target version doesn't support PEP 701, so we'll stop checking for the current f-string by inverting the flag and removing any previously diagnosed ranges.
/// The range of the f-string start token. | ||
fstring_start_range: TextRange, | ||
/// The ranges of the f-string middle tokens containing escaped quotes. | ||
fstring_middle_ranges: Vec<TextRange>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I understand this correctly, we allocate a fstring_middle_ranges
for each f-string. Can we flatten this datastructure so that it uses a single vector instead of N nested vectors?
From what I understand, it may be sufficient to track the fstring middle ranges for the outermost fstring only (just push the ranges of the inner fstring middles into the same vec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to associate the edits for the f-string it belongs to. The context helps in preserving that information. So, once we're at the end of a f-string (could be nested) we'll pop the current f-string context which will give us the range of FStringStart
token and all the FStringMiddle
tokens which contains escaped quotes.
It could also be that the nested f-strings need not be checked but the outer one should be. So, once we're out of the nested f-string which will pop the context of that f-string, we need to reset the "check for escaped quote" field as per the context of the now current f-string (the outer f-string).
// ```python | ||
// f'"foo" {'nested'}" | ||
// ``` | ||
if matches!(tok, Tok::String { .. } | Tok::FStringStart) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this is a nested string or f-string: We disable the tracking for the entire outer fstring. Isn't this too late in case we already flagged an escape early in the fstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's why the do_not_check_for_escaped_quote
method exists to stop checking any further and remove any existing ranges. Note that we don't extend the diagnostics until we reach the end of the f-string.
if string_contents.contains(quotes_settings.inline_quotes.as_char()) | ||
&& !string_contents.contains(quotes_settings.inline_quotes.opposite().as_char()) | ||
{ | ||
context.push_fstring_middle_range(tok_range); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The name of the method is misleading (or the field name is). It doesn't track all fstring middle parts. It tracks the fstring middle parts that contain the preferred quotes only.
crates/ruff_linter/src/rules/flake8_quotes/rules/avoidable_escaped_quote.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/flake8_quotes/rules/avoidable_escaped_quote.rs
Outdated
Show resolved
Hide resolved
let mut fstrings: Vec<FStringContext> = Vec::new(); | ||
let mut state_machine = StateMachine::default(); | ||
|
||
for &(ref tok, tok_range) in lxr.iter().flatten() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is fairly involved. I wonder if it would be easier to understand if we split the logic in two parts:
- Hot loop for when not inside of an f-string
- Loop that process an fstring (or skips it if pep701 isn't supported)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into this and I think this might simplify other logic as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is taking some time so I'm skipping this for now. I'll probably take this up later as a refactor.
f5cd461
to
917cace
Compare
2d9069a
to
7753ce8
Compare
crates/ruff_linter/src/rules/flake8_quotes/rules/avoidable_escaped_quote.rs
Outdated
Show resolved
Hide resolved
95c5bd3
to
acd4ee0
Compare
7753ce8
to
51f7ef1
Compare
## Summary This PR updates the `Q003` rule to accommodate the new f-string context. The logic here takes into consideration the nested f-strings and the configured target version. The rule checks for escaped quotes within a string and determines if they are avoidable or not. It is avoidable if: 1. Outer quote matches the user preferred quote 2. Not a raw string 3. Not a triple-quoted string 4. String content contains the same quote as the outer one 5. String content _doesn't_ contain the opposite quote For f-string, the way it works is by using a context stack to keep track of certain things but mainly the text range (`FStringMiddle`) where the escapes exists. It contains the following: 1. Do we want to check for escaped quotes in the current f-string? This is required to: * Preserve the context for `FStringMiddle` tokens where we need to check for escaped quotes. But, the answer to whether we need to check or not lies with the `FStringStart` token which contains the quotes. So, when the context starts, we'll store this information. * Disallow nesting for pre 3.12 target versions 2. Store the `FStringStart` token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s). 3. All the `FStringMiddle` ranges where there are escaped quote(s). ## Test Plan * Add new test cases for nested f-strings. * Write new tests for old Python versions as existing ones test it on the latest version by default which is 3.12 as of this writing. * Verify the snapshots
This PR updates the `Q003` rule to accommodate the new f-string context. The logic here takes into consideration the nested f-strings and the configured target version. The rule checks for escaped quotes within a string and determines if they are avoidable or not. It is avoidable if: 1. Outer quote matches the user preferred quote 2. Not a raw string 3. Not a triple-quoted string 4. String content contains the same quote as the outer one 5. String content _doesn't_ contain the opposite quote For f-string, the way it works is by using a context stack to keep track of certain things but mainly the text range (`FStringMiddle`) where the escapes exists. It contains the following: 1. Do we want to check for escaped quotes in the current f-string? This is required to: * Preserve the context for `FStringMiddle` tokens where we need to check for escaped quotes. But, the answer to whether we need to check or not lies with the `FStringStart` token which contains the quotes. So, when the context starts, we'll store this information. * Disallow nesting for pre 3.12 target versions 2. Store the `FStringStart` token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s). 3. All the `FStringMiddle` ranges where there are escaped quote(s). * Add new test cases for nested f-strings. * Write new tests for old Python versions as existing ones test it on the latest version by default which is 3.12 as of this writing. * Verify the snapshots
This PR updates the `Q003` rule to accommodate the new f-string context. The logic here takes into consideration the nested f-strings and the configured target version. The rule checks for escaped quotes within a string and determines if they are avoidable or not. It is avoidable if: 1. Outer quote matches the user preferred quote 2. Not a raw string 3. Not a triple-quoted string 4. String content contains the same quote as the outer one 5. String content _doesn't_ contain the opposite quote For f-string, the way it works is by using a context stack to keep track of certain things but mainly the text range (`FStringMiddle`) where the escapes exists. It contains the following: 1. Do we want to check for escaped quotes in the current f-string? This is required to: * Preserve the context for `FStringMiddle` tokens where we need to check for escaped quotes. But, the answer to whether we need to check or not lies with the `FStringStart` token which contains the quotes. So, when the context starts, we'll store this information. * Disallow nesting for pre 3.12 target versions 2. Store the `FStringStart` token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s). 3. All the `FStringMiddle` ranges where there are escaped quote(s). * Add new test cases for nested f-strings. * Write new tests for old Python versions as existing ones test it on the latest version by default which is 3.12 as of this writing. * Verify the snapshots
Summary
This PR updates the
Q003
rule to accommodate the new f-string context. The logic here takes into consideration the nested f-strings and the configured target version.The rule checks for escaped quotes within a string and determines if they are avoidable or not. It is avoidable if:
For f-string, the way it works is by using a context stack to keep track of certain things but mainly the text range (
FStringMiddle
) where the escapes exists. It contains the following:FStringMiddle
tokens where we need to check for escaped quotes. But, the answer to whether we need to check or not lies with theFStringStart
token which contains the quotes. So, when the context starts, we'll store this information.FStringStart
token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s).FStringMiddle
ranges where there are escaped quote(s).Test Plan