add `DebugText` for self-documenting f-strings #6167

davidszotten · 2023-07-29T08:08:36Z

instead of modelling self-documenting f-strings (f"{ foo= }") as a (simplified)
Constant("foo=") followed by a FormattedValue(Expr("x")), instead model this case with a DebugText(leading, trailing) attribute on the FormattedValue so that we don't have to synthesize nodes (which results in siblings with overlapping ranges). We need to be able to preserve the whitespace for self-documenting f-strings, as well as reproduce the source (eg unparse, format).

Fixes #5970

Test Plan

Existing snapshots, a few more tests. More needed?

github-actions · 2023-07-29T08:20:54Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.9±0.29ms     3.7 MB/sec    1.12     12.3±0.44ms     3.3 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.1±0.06ms     7.9 MB/sec    1.08      2.3±0.09ms     7.3 MB/sec
formatter/numpy/globals.py                 1.00    232.3±7.64µs    12.7 MB/sec    1.05   244.1±15.98µs    12.1 MB/sec
formatter/pydantic/types.py                1.00      4.6±0.16ms     5.5 MB/sec    1.09      5.0±0.16ms     5.1 MB/sec
linter/all-rules/large/dataset.py          1.05     15.0±0.49ms     2.7 MB/sec    1.00     14.4±1.06ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.03      3.7±0.11ms     4.4 MB/sec    1.00      3.6±0.08ms     4.6 MB/sec
linter/all-rules/numpy/globals.py          1.00   497.5±21.62µs     5.9 MB/sec    1.02   506.9±12.18µs     5.8 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.6±0.28ms     3.9 MB/sec    1.00      6.6±0.27ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.17      8.8±0.22ms     4.6 MB/sec    1.00      7.5±0.21ms     5.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.17  1797.2±47.89µs     9.3 MB/sec    1.00  1538.0±44.55µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.05    198.3±4.90µs    14.9 MB/sec    1.00    188.8±6.29µs    15.6 MB/sec
linter/default-rules/pydantic/types.py     1.14      3.9±0.07ms     6.6 MB/sec    1.00      3.4±0.08ms     7.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.07     13.5±0.58ms     3.0 MB/sec    1.00     12.6±0.72ms     3.2 MB/sec
formatter/numpy/ctypeslib.py               1.10      2.6±0.11ms     6.3 MB/sec    1.00      2.4±0.11ms     7.0 MB/sec
formatter/numpy/globals.py                 1.10   303.7±35.09µs     9.7 MB/sec    1.00   276.5±29.94µs    10.7 MB/sec
formatter/pydantic/types.py                1.16      6.1±0.39ms     4.2 MB/sec    1.00      5.2±0.22ms     4.9 MB/sec
linter/all-rules/large/dataset.py          1.14     20.5±1.08ms  2028.8 KB/sec    1.00     18.0±0.83ms     2.3 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.13      5.3±0.31ms     3.2 MB/sec    1.00      4.7±0.23ms     3.6 MB/sec
linter/all-rules/numpy/globals.py          1.07   621.3±29.97µs     4.7 MB/sec    1.00   578.5±30.42µs     5.1 MB/sec
linter/all-rules/pydantic/types.py         1.13      9.0±0.39ms     2.8 MB/sec    1.00      8.0±0.38ms     3.2 MB/sec
linter/default-rules/large/dataset.py      1.13     10.9±0.67ms     3.7 MB/sec    1.00      9.7±0.49ms     4.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.07      2.1±0.10ms     7.8 MB/sec    1.00  1986.2±168.05µs     8.4 MB/sec
linter/default-rules/numpy/globals.py      1.07   248.3±13.84µs    11.9 MB/sec    1.00   232.9±17.70µs    12.7 MB/sec
linter/default-rules/pydantic/types.py     1.11      4.8±0.38ms     5.3 MB/sec    1.00      4.3±0.27ms     5.9 MB/sec

MichaReiser · 2023-07-29T08:54:33Z

crates/ruff_python_codegen/src/generator.rs

+    fn self_documenting_f_string() {
+        assert_round_trip!(r#"f"{ chr(65)  =   }""#);
+        assert_round_trip!(r#"f"{ chr(65)  =   !s}""#);
+        assert_round_trip!(r#"f"{ chr(65)  =   !r}""#);


Would you mind adding a test case with a format specification?

If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

.../src/snapshots/ruff_python_parser__string__tests__parse_fstring_self_doc_trailing_space.snap

davidszotten · 2023-07-29T11:22:39Z

crates/ruff_python_parser/src/string.rs

+                                    start_location,
+                                )
+                            })?;
+                        let leading =


this feels a little hacky but was my best idea. suggestions welcome

Which part do you find hacky?

pulling slices out of expression

I think it's fine. I ultimately want to re-write this entire file and use Cursor and use string slices from the source text directly everywhere. But @dhruvmanila may make this whole code redundant before when implementing the 3.12 F-string changes.

MichaReiser

This is great!

I would like to hear your opinion on keeping Conversion::Repr for debug strings to ease downstream use and we may be able to speed this up a bit (the whole module requires a rework but we can start somewhere).

MichaReiser · 2023-07-31T10:22:38Z

crates/ruff_python_ast/src/comparable.rs

@@ -592,10 +592,26 @@ pub struct ExprCall<'a> {
 #[derive(Debug, PartialEq, Eq, Hash)]
 pub struct ExprFormattedValue<'a> {
    value: Box<ComparableExpr<'a>>,
+    debug_text: Option<DebugText<'a>>,


Would it be possible to store a reference to a debug text instead of defining an extra DebugText node?

Suggested change

debug_text: Option<DebugText<'a>>,

debug_text: Option<&'a DebugText>,

MichaReiser · 2023-07-31T10:22:55Z

crates/ruff_python_ast/src/comparable.rs

+#[derive(Debug, PartialEq, Eq, Hash)]
+pub struct DebugText<'a> {
+    leading: &'a str,
+    trailing: &'a str,
+}
+
+impl<'a> From<&'a ast::DebugText> for DebugText<'a> {
+    fn from(debug_text: &'a ast::DebugText) -> Self {
+        Self {
+            leading: debug_text.leading.as_str(),
+            trailing: debug_text.trailing.as_str(),
+        }
+    }
+}
+


See comment above

Suggested change

#[derive(Debug, PartialEq, Eq, Hash)]

pub struct DebugText<'a> {

leading: &'a str,

trailing: &'a str,

}

impl<'a> From<&'a ast::DebugText> for DebugText<'a> {

fn from(debug_text: &'a ast::DebugText) -> Self {

Self {

leading: debug_text.leading.as_str(),

trailing: debug_text.trailing.as_str(),

}

}

}

MichaReiser · 2023-07-31T10:23:17Z

crates/ruff_python_ast/src/nodes.rs

@@ -925,6 +926,14 @@ impl ConversionFlag {
    }
 }

+#[derive(Clone, Debug, PartialEq)]


Suggested change

#[derive(Clone, Debug, PartialEq)]

#[derive(Clone, Debug, PartialEq, Eq, Hash)]

MichaReiser · 2023-07-31T10:24:00Z

crates/ruff_python_codegen/src/generator.rs

+    fn unparse_formatted(
+        &mut self,
+        val: &Expr,
+        debug_text: &Option<DebugText>,


Nit: I find Option<&ref> more natural than passing a reference to an option (except if the option is mutable).

Suggested change

debug_text: &Option<DebugText>,

debug_text: Option<&DebugText>,

MichaReiser · 2023-07-31T10:31:35Z

crates/ruff_python_parser/src/string.rs

+                                    start_location,
+                                )
+                            })?;
+                        let leading =


Which part do you find hacky?

MichaReiser · 2023-07-31T10:32:49Z

crates/ruff_python_parser/src/string.rs

@@ -230,6 +230,7 @@ impl<'a> StringParser<'a> {
                // match a python 3.8 self documenting expression
                // format '{' PYTHON_EXPRESSION '=' FORMAT_SPECIFIER? '}'
                '=' if self.peek() != Some('=') && delimiters.is_empty() => {
+                    trailing_seq.push('=');


I like it that we now use the text ranges to get leading and before_eq. Would it be possible also to use text ranges to extract trailing? It would reduce 2 string allocations for each self-documenting string (one for trailing_seq, and one for formatting trailing in debug text.

i think so. we'd need to store the = and any trailing spaces inside expression which is a bit odd but doable if we just track where the actual expression ends. will push something with what i mean (and maybe you have ideas for how to improve that)

dhruvmanila

This is good! Thanks for working on this :)

dhruvmanila · 2023-07-31T15:51:13Z

crates/ruff_python_codegen/src/generator.rs

+    fn self_documenting_f_string() {
+        assert_round_trip!(r#"f"{ chr(65)  =   }""#);
+        assert_round_trip!(r#"f"{ chr(65)  =   !s}""#);
+        assert_round_trip!(r#"f"{ chr(65)  =   !r}""#);


If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

instead of modelling self-documenting f-strings (`f"{ foo= }"`) as a (simplified) `Constant("foo=")` followed by a `FormattedValue(Expr("x"))`, instead model this case with a `DebugText(leading, trailing)` attribute on the `FormattedValue` so that we don't have to synthesize nodes (which results in siblings with overlapping ranges). We need to be able to preserve the whitespace for self-documenting f-strings, as well as reproduce the source (eg unparse, format).

davidszotten · 2023-07-31T20:48:49Z

crates/ruff_python_parser/src/string.rs

+                        vec![Expr::from(ast::ExprFormattedValue {
+                            value: Box::new(value),
+                            debug_text: Some(ast::DebugText {
+                                leading: leading.to_string(),


Could we store references into the source here rather than allocating new strings? i guess many more things would need to change

I played around with an ast that uses Cow<'source, str> for all string fields. I spent about 2h fixing lifetime issues and run out of patience. The problem is that every node must be parametrized by the source lifetime.

We may still want to do this because using a bump allocator for the AST nodes would also require a new lifetime. Something to explore in the future.

davidszotten mentioned this pull request Jul 29, 2023

parser needs special handling of self-documenting f-strings #5970

Closed

MichaReiser requested a review from dhruvmanila July 29, 2023 08:53

MichaReiser added the parser Related to the parser label Jul 29, 2023

MichaReiser requested a review from charliermarsh July 29, 2023 08:55

MichaReiser reviewed Jul 29, 2023

View reviewed changes

davidszotten commented Jul 29, 2023

View reviewed changes

.../src/snapshots/ruff_python_parser__string__tests__parse_fstring_self_doc_trailing_space.snap Show resolved Hide resolved

davidszotten commented Jul 29, 2023

View reviewed changes

davidszotten force-pushed the self-documenting-f-strings branch from 135c98b to 757dc4f Compare July 29, 2023 11:23

MichaReiser approved these changes Jul 31, 2023

View reviewed changes

dhruvmanila approved these changes Jul 31, 2023

View reviewed changes

davidszotten added 4 commits July 31, 2023 19:41

store a ref to ast::DebugText in compare

68659fe

one more test

61a895d

&option<_> -> option<&_>

f7ead78

davidszotten commented Jul 31, 2023

View reviewed changes

avoid some allocations

ff200c6

davidszotten force-pushed the self-documenting-f-strings branch from 757dc4f to ff200c6 Compare July 31, 2023 20:51

MichaReiser approved these changes Aug 1, 2023

View reviewed changes

MichaReiser merged commit ba990b6 into astral-sh:main Aug 1, 2023
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add `DebugText` for self-documenting f-strings #6167

add `DebugText` for self-documenting f-strings #6167

davidszotten commented Jul 29, 2023

github-actions bot commented Jul 29, 2023 •

edited

Loading

MichaReiser Jul 29, 2023

dhruvmanila Jul 31, 2023

davidszotten Jul 29, 2023

MichaReiser Jul 31, 2023

davidszotten Jul 31, 2023

MichaReiser Aug 1, 2023

MichaReiser left a comment

MichaReiser Jul 31, 2023

MichaReiser Jul 31, 2023

MichaReiser Jul 31, 2023

MichaReiser Jul 31, 2023

MichaReiser Jul 31, 2023

MichaReiser Jul 31, 2023

davidszotten Jul 31, 2023

dhruvmanila left a comment

dhruvmanila Jul 31, 2023

davidszotten Jul 31, 2023

MichaReiser Aug 1, 2023

	debug_text: Option<DebugText<'a>>,
	debug_text: Option<&'a DebugText>,

	#[derive(Clone, Debug, PartialEq)]
	#[derive(Clone, Debug, PartialEq, Eq, Hash)]

	debug_text: &Option<DebugText>,
	debug_text: Option<&DebugText>,

add DebugText for self-documenting f-strings #6167

add DebugText for self-documenting f-strings #6167

Conversation

davidszotten commented Jul 29, 2023

Test Plan

github-actions bot commented Jul 29, 2023 • edited Loading

PR Check Results

Ecosystem

Benchmark

Linux

Windows

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichaReiser left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhruvmanila left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

add `DebugText` for self-documenting f-strings #6167

add `DebugText` for self-documenting f-strings #6167

github-actions bot commented Jul 29, 2023 •

edited

Loading