Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate Q003 to accomodate f-string context #7588

Merged
merged 2 commits into from
Sep 27, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 22, 2023

Summary

This PR updates the Q003 rule to accommodate the new f-string context. The logic here takes into consideration the nested f-strings and the configured target version.

The rule checks for escaped quotes within a string and determines if they are avoidable or not. It is avoidable if:

  1. Outer quote matches the user preferred quote
  2. Not a raw string
  3. Not a triple-quoted string
  4. String content contains the same quote as the outer one
  5. String content doesn't contain the opposite quote

For f-string, the way it works is by using a context stack to keep track of certain things but mainly the text range (FStringMiddle) where the escapes exists. It contains the following:

  1. Do we want to check for escaped quotes in the current f-string? This is required to:
    • Preserve the context for FStringMiddle tokens where we need to check for escaped quotes. But, the answer to whether we need to check or not lies with the FStringStart token which contains the quotes. So, when the context starts, we'll store this information.
    • Disallow nesting for pre 3.12 target versions
  2. Store the FStringStart token range. This is required to create the edit to replace the quote if this f-string contains escaped quote(s).
  3. All the FStringMiddle ranges where there are escaped quote(s).

Test Plan

  • Add new test cases for nested f-strings.
  • Write new tests for old Python versions as existing ones test it on the latest version by default which is 3.12 as of this writing.
  • Verify the snapshots

@dhruvmanila dhruvmanila linked an issue Sep 22, 2023 that may be closed by this pull request
@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule python312 Related to Python 3.12 labels Sep 22, 2023
@dhruvmanila dhruvmanila marked this pull request as ready for review September 22, 2023 05:54
@dhruvmanila dhruvmanila force-pushed the dhruv/formatter-fstring branch from bf47707 to c9cf545 Compare September 22, 2023 06:29
Base automatically changed from dhruv/formatter-fstring to dhruv/pep-701 September 22, 2023 09:25
@MichaReiser
Copy link
Member

I hope this needs a rebase xD 16k lines sounds painful to review

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Sep 22, 2023

I hope this needs a rebase xD 16k lines sounds painful to review

Oh wait, what happened here lol. Let me rebase.

Edit: forgot to push the rebased branch.

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 22, 2023

CodSpeed Performance Report

Merging #7588 will degrade performances by 2.33%

Comparing dhruv/issue-7297 (7753ce8) with dhruv/pep-701 (917cace)

Summary

❌ 2 regressions
✅ 23 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark dhruv/pep-701 dhruv/issue-7297 Change
linter/all-rules[pydantic/types.py] 72.3 ms 74 ms -2.33%
linter/all-rules[large/dataset.py] 162.7 ms 166.6 ms -2.31%

@dhruvmanila dhruvmanila requested review from MichaReiser and removed request for MichaReiser September 25, 2023 09:35
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although the code is a bit difficult to understand. I would love to get @charliermarsh's opinion on too because I'm not very familiar with the rule

Comment on lines +121 to +123
if settings.rules.enabled(Rule::AvoidableEscapedQuote) && settings.flake8_quotes.avoid_escape {
flake8_quotes::rules::avoidable_escaped_quote(&mut diagnostics, tokens, locator, settings);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for moving the rule out of from_tokens?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that the escaped quote check needs to be done in the content value only i.e., for f-string it's only the FStringMiddle tokens.

The f-string context which needs to be tracked for this rule is different than the one for the quote rules where the main difference is how implicitly concatenated strings are handled. For this rule, implicit string concatenation doesn't matter because we only care about the content while for the quote rules it matters.

For quote rules, each sequence needs to be independently built. For nested f-strings, the nested sequence will be different than the outer one and so on. This is the main reason for the separation. It becomes easier to reason about.

Comment on lines 64 to 67
fn do_not_check_for_escaped_quote(&mut self) {
self.check_for_escaped_quote = false;
self.fstring_middle_ranges.clear();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected a enter_...exit_... function pair How does it start checking for quotes again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't start again. For example,

f"\"foo\" {'nested'}"
# ^^(1)^^^ ^^^(2)^^

Here, (1) will be checked first and the range will be added to fstring_middle_ranges. Then we encountered a nested string (2) but our target version doesn't support PEP 701, so we'll stop checking for the current f-string by inverting the flag and removing any previously diagnosed ranges.

/// The range of the f-string start token.
fstring_start_range: TextRange,
/// The ranges of the f-string middle tokens containing escaped quotes.
fstring_middle_ranges: Vec<TextRange>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I understand this correctly, we allocate a fstring_middle_ranges for each f-string. Can we flatten this datastructure so that it uses a single vector instead of N nested vectors?

From what I understand, it may be sufficient to track the fstring middle ranges for the outermost fstring only (just push the ranges of the inner fstring middles into the same vec)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to associate the edits for the f-string it belongs to. The context helps in preserving that information. So, once we're at the end of a f-string (could be nested) we'll pop the current f-string context which will give us the range of FStringStart token and all the FStringMiddle tokens which contains escaped quotes.

It could also be that the nested f-strings need not be checked but the outer one should be. So, once we're out of the nested f-string which will pop the context of that f-string, we need to reset the "check for escaped quote" field as per the context of the now current f-string (the outer f-string).

// ```python
// f'"foo" {'nested'}"
// ```
if matches!(tok, Tok::String { .. } | Tok::FStringStart) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this is a nested string or f-string: We disable the tracking for the entire outer fstring. Isn't this too late in case we already flagged an escape early in the fstring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why the do_not_check_for_escaped_quote method exists to stop checking any further and remove any existing ranges. Note that we don't extend the diagnostics until we reach the end of the f-string.

if string_contents.contains(quotes_settings.inline_quotes.as_char())
&& !string_contents.contains(quotes_settings.inline_quotes.opposite().as_char())
{
context.push_fstring_middle_range(tok_range);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The name of the method is misleading (or the field name is). It doesn't track all fstring middle parts. It tracks the fstring middle parts that contain the preferred quotes only.

let mut fstrings: Vec<FStringContext> = Vec::new();
let mut state_machine = StateMachine::default();

for &(ref tok, tok_range) in lxr.iter().flatten() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is fairly involved. I wonder if it would be easier to understand if we split the logic in two parts:

  1. Hot loop for when not inside of an f-string
  2. Loop that process an fstring (or skips it if pep701 isn't supported)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into this and I think this might simplify other logic as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taking some time so I'm skipping this for now. I'll probably take this up later as a refactor.

@dhruvmanila dhruvmanila merged commit ac80be1 into dhruv/pep-701 Sep 27, 2023
2 of 15 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/issue-7297 branch September 27, 2023 07:57
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
## Summary

This PR updates the `Q003` rule to accommodate the new f-string context.
The logic here takes into consideration the nested f-strings and the
configured target version.

The rule checks for escaped quotes within a string and determines if
they are avoidable or not. It is avoidable if:
1. Outer quote matches the user preferred quote
2. Not a raw string
3. Not a triple-quoted string
4. String content contains the same quote as the outer one
5. String content _doesn't_ contain the opposite quote

For f-string, the way it works is by using a context stack to keep track
of certain things but mainly the text range (`FStringMiddle`) where the
escapes exists. It contains the following:

1. Do we want to check for escaped quotes in the current f-string? This
is required to:
* Preserve the context for `FStringMiddle` tokens where we need to check
for escaped quotes. But, the answer to whether we need to check or not
lies with the `FStringStart` token which contains the quotes. So, when
the context starts, we'll store this information.
  	* Disallow nesting for pre 3.12 target versions
2. Store the `FStringStart` token range. This is required to create the
edit to replace the quote if this f-string contains escaped quote(s).
3. All the `FStringMiddle` ranges where there are escaped quote(s).

## Test Plan

* Add new test cases for nested f-strings.
* Write new tests for old Python versions as existing ones test it on
the latest version by default which is 3.12 as of this writing.
* Verify the snapshots
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR updates the `Q003` rule to accommodate the new f-string context.
The logic here takes into consideration the nested f-strings and the
configured target version.

The rule checks for escaped quotes within a string and determines if
they are avoidable or not. It is avoidable if:
1. Outer quote matches the user preferred quote
2. Not a raw string
3. Not a triple-quoted string
4. String content contains the same quote as the outer one
5. String content _doesn't_ contain the opposite quote

For f-string, the way it works is by using a context stack to keep track
of certain things but mainly the text range (`FStringMiddle`) where the
escapes exists. It contains the following:

1. Do we want to check for escaped quotes in the current f-string? This
is required to:
* Preserve the context for `FStringMiddle` tokens where we need to check
for escaped quotes. But, the answer to whether we need to check or not
lies with the `FStringStart` token which contains the quotes. So, when
the context starts, we'll store this information.
  	* Disallow nesting for pre 3.12 target versions
2. Store the `FStringStart` token range. This is required to create the
edit to replace the quote if this f-string contains escaped quote(s).
3. All the `FStringMiddle` ranges where there are escaped quote(s).

* Add new test cases for nested f-strings.
* Write new tests for old Python versions as existing ones test it on
the latest version by default which is 3.12 as of this writing.
* Verify the snapshots
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR updates the `Q003` rule to accommodate the new f-string context.
The logic here takes into consideration the nested f-strings and the
configured target version.

The rule checks for escaped quotes within a string and determines if
they are avoidable or not. It is avoidable if:
1. Outer quote matches the user preferred quote
2. Not a raw string
3. Not a triple-quoted string
4. String content contains the same quote as the outer one
5. String content _doesn't_ contain the opposite quote

For f-string, the way it works is by using a context stack to keep track
of certain things but mainly the text range (`FStringMiddle`) where the
escapes exists. It contains the following:

1. Do we want to check for escaped quotes in the current f-string? This
is required to:
* Preserve the context for `FStringMiddle` tokens where we need to check
for escaped quotes. But, the answer to whether we need to check or not
lies with the `FStringStart` token which contains the quotes. So, when
the context starts, we'll store this information.
  	* Disallow nesting for pre 3.12 target versions
2. Store the `FStringStart` token range. This is required to create the
edit to replace the quote if this f-string contains escaped quote(s).
3. All the `FStringMiddle` ranges where there are escaped quote(s).

* Add new test cases for nested f-strings.
* Write new tests for old Python versions as existing ones test it on
the latest version by default which is 3.12 as of this writing.
* Verify the snapshots
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python312 Related to Python 3.12 rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update flake8_quotes to account for the new f-string tokens
3 participants