Basic string formatting #5297

MichaReiser · 2023-06-22T12:20:31Z

Summary

This PR implements formatting for non-f-string Strings that do not use implicit concatenation.

Docstring formatting is out of the scope of this PR.

Test Plan

I added a few tests for simple string literals.

Performance

Ouch. This is hitting performance somewhat hard. This is probably because we now iterate each string a couple of times:

To detect if it is an implicit string continuation
To detect if the string contains any new lines
To detect the preferred quote
To normalize the string

Edit: I integrated the detection of newlines into the preferred quote detection so that we only iterate the string three time.
We can probably do better by merging the implicit string continuation with the quote detection and new line detection by iterating till the end of the string part and returning the offset. We then use our simple tokenizer to skip over any comments or whitespace until we find the first non trivia token. From there we keep continue doing this in a loop until we reach the end o the string. I'll leave this improvement for later.

MichaReiser · 2023-06-22T12:20:45Z

Current dependencies on/for this PR:

main
- PR Format class definitions #5289
  - PR Format Compare Op #5292
    - PR Basic string formatting #5297 👈

This comment was auto-generated by Graphite.

github-actions · 2023-06-22T12:49:52Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      7.4±0.35ms     5.5 MB/sec    1.00      7.3±0.32ms     5.6 MB/sec
formatter/numpy/ctypeslib.py               1.00  1723.0±93.08µs     9.7 MB/sec    1.01  1741.2±93.18µs     9.6 MB/sec
formatter/numpy/globals.py                 1.00   207.5±13.20µs    14.2 MB/sec    1.02   210.7±13.52µs    14.0 MB/sec
formatter/pydantic/types.py                1.04      3.9±0.40ms     6.5 MB/sec    1.00      3.8±0.21ms     6.7 MB/sec
linter/all-rules/large/dataset.py          1.07     15.7±0.92ms     2.6 MB/sec    1.00     14.7±0.46ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.7±0.15ms     4.5 MB/sec    1.01      3.7±0.18ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00   472.5±24.15µs     6.2 MB/sec    1.04   491.9±28.96µs     6.0 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.6±0.31ms     3.8 MB/sec    1.00      6.6±0.33ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.00      7.2±0.25ms     5.7 MB/sec    1.07      7.7±0.45ms     5.3 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1580.9±77.03µs    10.5 MB/sec    1.04  1650.6±122.86µs    10.1 MB/sec
linter/default-rules/numpy/globals.py      1.00   189.4±10.17µs    15.6 MB/sec    1.01   191.5±12.14µs    15.4 MB/sec
linter/default-rules/pydantic/types.py     1.01      3.4±0.18ms     7.5 MB/sec    1.00      3.4±0.14ms     7.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      8.0±0.09ms     5.1 MB/sec    1.00      8.0±0.07ms     5.1 MB/sec
formatter/numpy/ctypeslib.py               1.00  1773.0±15.83µs     9.4 MB/sec    1.00  1777.6±17.46µs     9.4 MB/sec
formatter/numpy/globals.py                 1.00    203.9±5.46µs    14.5 MB/sec    1.00    203.9±4.40µs    14.5 MB/sec
formatter/pydantic/types.py                1.00      4.1±0.04ms     6.3 MB/sec    1.01      4.1±0.06ms     6.2 MB/sec
linter/all-rules/large/dataset.py          1.00     15.2±0.12ms     2.7 MB/sec    1.00     15.3±0.08ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      4.1±0.10ms     4.1 MB/sec    1.00      4.1±0.02ms     4.1 MB/sec
linter/all-rules/numpy/globals.py          1.00    427.5±6.56µs     6.9 MB/sec    1.00    428.0±8.00µs     6.9 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.9±0.04ms     3.7 MB/sec    1.01      6.9±0.06ms     3.7 MB/sec
linter/default-rules/large/dataset.py      1.00      8.0±0.09ms     5.1 MB/sec    1.00      8.1±0.05ms     5.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1671.0±15.33µs    10.0 MB/sec    1.00  1677.3±22.38µs     9.9 MB/sec
linter/default-rules/numpy/globals.py      1.00    181.6±3.37µs    16.2 MB/sec    1.00    181.5±1.88µs    16.3 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.6±0.02ms     7.0 MB/sec    1.01      3.7±0.02ms     7.0 MB/sec

crates/ruff_python_formatter/src/expression/string.rs

konstin · 2023-06-22T15:30:31Z

crates/ruff_python_formatter/src/lib.rs

@@ -342,29 +377,8 @@ if True:
        let printed = format_module(&content)?;
        let formatted_code = printed.as_code();

-        let reformatted =


why is that gone?

My understanding is that ensure_stability_when_formatting_twice now already asserts that reformatting the file twice yields the same result. Thanks to your refactor :)

MichaReiser · 2023-06-23T07:35:46Z

Graphite rebased this pull request after merging its parent, because this pull request is set to merge when ready.

MichaReiser · 2023-06-23T07:46:08Z

@MichaReiser merged this pull request with Graphite.

This was referenced Jun 22, 2023

Format class definitions #5289

Merged

Format Compare Op #5292

Merged

MichaReiser force-pushed the format-string branch from afad5cc to 0a3ed21 Compare June 22, 2023 14:48

MichaReiser marked this pull request as ready for review June 22, 2023 14:50

MichaReiser force-pushed the format-string branch 2 times, most recently from 0bea2d4 to a78246b Compare June 22, 2023 15:16

konstin approved these changes Jun 22, 2023

View reviewed changes

MichaReiser force-pushed the format-string branch from a78246b to ae9cadf Compare June 22, 2023 16:36

MichaReiser added internal An internal refactor or improvement formatter Related to the formatter labels Jun 22, 2023

MichaReiser force-pushed the 06-21-Format_Compare_Op branch from 895d416 to 54b2b11 Compare June 23, 2023 07:18

MichaReiser force-pushed the format-string branch from ae9cadf to d2df571 Compare June 23, 2023 07:18

Base automatically changed from 06-21-Format_Compare_Op to main June 23, 2023 07:35

MichaReiser added 3 commits June 23, 2023 07:35

Basic string formatting

e1cad54

Format single string part

936e745

Correctly handle triple quoted strings

20158e7

MichaReiser force-pushed the format-string branch from d2df571 to 20158e7 Compare June 23, 2023 07:35

MichaReiser merged commit c52aa8f into main Jun 23, 2023

MichaReiser deleted the format-string branch June 23, 2023 07:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Basic string formatting #5297

Basic string formatting #5297

MichaReiser commented Jun 22, 2023 •

edited

Loading

MichaReiser commented Jun 22, 2023

github-actions bot commented Jun 22, 2023 •

edited

Loading

konstin Jun 22, 2023

MichaReiser Jun 22, 2023 •

edited

Loading

MichaReiser commented Jun 23, 2023

MichaReiser commented Jun 23, 2023

Basic string formatting #5297

Basic string formatting #5297

Conversation

MichaReiser commented Jun 22, 2023 • edited Loading

Summary

Test Plan

Performance

MichaReiser commented Jun 22, 2023

github-actions bot commented Jun 22, 2023 • edited Loading

PR Check Results

Ecosystem

Benchmark

Linux

Windows

konstin Jun 22, 2023

Choose a reason for hiding this comment

MichaReiser Jun 22, 2023 • edited Loading

Choose a reason for hiding this comment

MichaReiser commented Jun 23, 2023

MichaReiser commented Jun 23, 2023

MichaReiser commented Jun 22, 2023 •

edited

Loading

github-actions bot commented Jun 22, 2023 •

edited

Loading

MichaReiser Jun 22, 2023 •

edited

Loading