Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add DebugText for self-documenting f-strings #6167

Merged
merged 5 commits into from
Aug 1, 2023

Conversation

davidszotten
Copy link
Contributor

instead of modelling self-documenting f-strings (f"{ foo= }") as a (simplified)
Constant("foo=") followed by a FormattedValue(Expr("x")), instead model this case with a DebugText(leading, trailing) attribute on the FormattedValue so that we don't have to synthesize nodes (which results in siblings with overlapping ranges). We need to be able to preserve the whitespace for self-documenting f-strings, as well as reproduce the source (eg unparse, format).

Fixes #5970

Test Plan

Existing snapshots, a few more tests. More needed?

@github-actions
Copy link
Contributor

github-actions bot commented Jul 29, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.9±0.29ms     3.7 MB/sec    1.12     12.3±0.44ms     3.3 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.1±0.06ms     7.9 MB/sec    1.08      2.3±0.09ms     7.3 MB/sec
formatter/numpy/globals.py                 1.00    232.3±7.64µs    12.7 MB/sec    1.05   244.1±15.98µs    12.1 MB/sec
formatter/pydantic/types.py                1.00      4.6±0.16ms     5.5 MB/sec    1.09      5.0±0.16ms     5.1 MB/sec
linter/all-rules/large/dataset.py          1.05     15.0±0.49ms     2.7 MB/sec    1.00     14.4±1.06ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.03      3.7±0.11ms     4.4 MB/sec    1.00      3.6±0.08ms     4.6 MB/sec
linter/all-rules/numpy/globals.py          1.00   497.5±21.62µs     5.9 MB/sec    1.02   506.9±12.18µs     5.8 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.6±0.28ms     3.9 MB/sec    1.00      6.6±0.27ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.17      8.8±0.22ms     4.6 MB/sec    1.00      7.5±0.21ms     5.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.17  1797.2±47.89µs     9.3 MB/sec    1.00  1538.0±44.55µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.05    198.3±4.90µs    14.9 MB/sec    1.00    188.8±6.29µs    15.6 MB/sec
linter/default-rules/pydantic/types.py     1.14      3.9±0.07ms     6.6 MB/sec    1.00      3.4±0.08ms     7.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.07     13.5±0.58ms     3.0 MB/sec    1.00     12.6±0.72ms     3.2 MB/sec
formatter/numpy/ctypeslib.py               1.10      2.6±0.11ms     6.3 MB/sec    1.00      2.4±0.11ms     7.0 MB/sec
formatter/numpy/globals.py                 1.10   303.7±35.09µs     9.7 MB/sec    1.00   276.5±29.94µs    10.7 MB/sec
formatter/pydantic/types.py                1.16      6.1±0.39ms     4.2 MB/sec    1.00      5.2±0.22ms     4.9 MB/sec
linter/all-rules/large/dataset.py          1.14     20.5±1.08ms  2028.8 KB/sec    1.00     18.0±0.83ms     2.3 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.13      5.3±0.31ms     3.2 MB/sec    1.00      4.7±0.23ms     3.6 MB/sec
linter/all-rules/numpy/globals.py          1.07   621.3±29.97µs     4.7 MB/sec    1.00   578.5±30.42µs     5.1 MB/sec
linter/all-rules/pydantic/types.py         1.13      9.0±0.39ms     2.8 MB/sec    1.00      8.0±0.38ms     3.2 MB/sec
linter/default-rules/large/dataset.py      1.13     10.9±0.67ms     3.7 MB/sec    1.00      9.7±0.49ms     4.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.07      2.1±0.10ms     7.8 MB/sec    1.00  1986.2±168.05µs     8.4 MB/sec
linter/default-rules/numpy/globals.py      1.07   248.3±13.84µs    11.9 MB/sec    1.00   232.9±17.70µs    12.7 MB/sec
linter/default-rules/pydantic/types.py     1.11      4.8±0.38ms     5.3 MB/sec    1.00      4.3±0.27ms     5.9 MB/sec

@MichaReiser MichaReiser added the parser Related to the parser label Jul 29, 2023
fn self_documenting_f_string() {
assert_round_trip!(r#"f"{ chr(65) = }""#);
assert_round_trip!(r#"f"{ chr(65) = !s}""#);
assert_round_trip!(r#"f"{ chr(65) = !r}""#);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a test case with a format specification?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

start_location,
)
})?;
let leading =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a little hacky but was my best idea. suggestions welcome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part do you find hacky?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulling slices out of expression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. I ultimately want to re-write this entire file and use Cursor and use string slices from the source text directly everywhere. But @dhruvmanila may make this whole code redundant before when implementing the 3.12 F-string changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I would like to hear your opinion on keeping Conversion::Repr for debug strings to ease downstream use and we may be able to speed this up a bit (the whole module requires a rework but we can start somewhere).

@@ -592,10 +592,26 @@ pub struct ExprCall<'a> {
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct ExprFormattedValue<'a> {
value: Box<ComparableExpr<'a>>,
debug_text: Option<DebugText<'a>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to store a reference to a debug text instead of defining an extra DebugText node?

Suggested change
debug_text: Option<DebugText<'a>>,
debug_text: Option<&'a DebugText>,

Comment on lines 600 to 614
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct DebugText<'a> {
leading: &'a str,
trailing: &'a str,
}

impl<'a> From<&'a ast::DebugText> for DebugText<'a> {
fn from(debug_text: &'a ast::DebugText) -> Self {
Self {
leading: debug_text.leading.as_str(),
trailing: debug_text.trailing.as_str(),
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above

Suggested change
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct DebugText<'a> {
leading: &'a str,
trailing: &'a str,
}
impl<'a> From<&'a ast::DebugText> for DebugText<'a> {
fn from(debug_text: &'a ast::DebugText) -> Self {
Self {
leading: debug_text.leading.as_str(),
trailing: debug_text.trailing.as_str(),
}
}
}

@@ -925,6 +926,14 @@ impl ConversionFlag {
}
}

#[derive(Clone, Debug, PartialEq)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Clone, Debug, PartialEq)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]

fn unparse_formatted(
&mut self,
val: &Expr,
debug_text: &Option<DebugText>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I find Option<&ref> more natural than passing a reference to an option (except if the option is mutable).

Suggested change
debug_text: &Option<DebugText>,
debug_text: Option<&DebugText>,

start_location,
)
})?;
let leading =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part do you find hacky?

@@ -230,6 +230,7 @@ impl<'a> StringParser<'a> {
// match a python 3.8 self documenting expression
// format '{' PYTHON_EXPRESSION '=' FORMAT_SPECIFIER? '}'
'=' if self.peek() != Some('=') && delimiters.is_empty() => {
trailing_seq.push('=');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it that we now use the text ranges to get leading and before_eq. Would it be possible also to use text ranges to extract trailing? It would reduce 2 string allocations for each self-documenting string (one for trailing_seq, and one for formatting trailing in debug text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think so. we'd need to store the = and any trailing spaces inside expression which is a bit odd but doable if we just track where the actual expression ends. will push something with what i mean (and maybe you have ideas for how to improve that)

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good! Thanks for working on this :)

fn self_documenting_f_string() {
assert_round_trip!(r#"f"{ chr(65) = }""#);
assert_round_trip!(r#"f"{ chr(65) = !s}""#);
assert_round_trip!(r#"f"{ chr(65) = !r}""#);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

instead of modelling self-documenting f-strings (`f"{ foo= }"`) as a
(simplified)
`Constant("foo=")` followed by a `FormattedValue(Expr("x"))`, instead
model this case with a `DebugText(leading, trailing)` attribute on the
`FormattedValue` so that we don't have to synthesize nodes (which
results in siblings with overlapping ranges). We need to be able to
preserve the whitespace for self-documenting f-strings, as well as
reproduce the source (eg unparse, format).
vec![Expr::from(ast::ExprFormattedValue {
value: Box::new(value),
debug_text: Some(ast::DebugText {
leading: leading.to_string(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we store references into the source here rather than allocating new strings? i guess many more things would need to change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with an ast that uses Cow<'source, str> for all string fields. I spent about 2h fixing lifetime issues and run out of patience. The problem is that every node must be parametrized by the source lifetime.

We may still want to do this because using a bump allocator for the AST nodes would also require a new lifetime. Something to explore in the future.

@MichaReiser MichaReiser merged commit ba990b6 into astral-sh:main Aug 1, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parser needs special handling of self-documenting f-strings
3 participants