docstring code formatter: remove "invalid Python" check #8857

BurntSushi · 2023-11-27T16:12:10Z

#8811 added a docstring code snippet formatter. As part of the initial implementation, it is actually possible for the reformatter to transform valid Python to invalid Python, usually as a result of corner cases related to triple quoting. Since these are odd cases, for expediency, the initial implementation checks if the reformatted code is valid. If it isn't, then it bails out of reformatting and skips the code snippets entirely.

Ideally, we would be able to have more confidence in our code snippet reformatter to the point that we could remove this check for invalid Python code. Doing this will likely require some refactoring for how nested triple quotes are handled.

Here's a good example from the tests that doesn't work today:

ruff/crates/ruff_python_formatter/resources/test/fixtures/ruff/docstring_code_examples.py

Lines 307 to 316 in d6148b6

    
           # Test that a doctest containing a triple quoted string at least 
        
           # does not result in invalid Python code. Ideally this would format 
        
           # correctly, but at time of writing it does not. 
        
           def doctest_invalid_skipped_with_triple_double_in_single_quote_string(): 
        
               """ 
        
               Do cool stuff. 
        
               >>> x        =      '\"\"\"' 
        
               """ 
        
               pass

Namely, the code snippet there ought to be formatted, but today, it is skipped because the reformatting currently generates invalid Python code.

MichaReiser · 2023-11-27T23:25:02Z

Do we have a benchmark that tests the performance of docstring formatting? Adding one could help to understand the performance characteristics better and validate any improvements to it.

Do we always perform the re-parse today or do we use a heuristic when the re-parse is necessary (only when the source text contains triple quoted strings?). A text search should be multiple times faster than a full re-parse.

BurntSushi · 2023-11-28T14:58:51Z

There aren't any benchmarks for code snippet formatting yet. Using a heuristic to avoid a re-parse seems plausible, but only if we buy that the cases where invalid Python can be produced are limited to triple quoting. (I think I buy that, otherwise I don't see how it could break out of the enclosing docstring.)

For an ad hoc benchmark, if I run the formatter with and without docstring-code enabled, then runtime is about the same. I ran it a few times and couldn't notice a difference. (Not that this is a good substitute for a real benchmark, but perhaps suggestive that docstring code formatting doesn't have a huge impact on perf.)

MichaReiser · 2023-11-28T23:51:53Z

For an ad hoc benchmark, if I run the formatter with and without docstring-code enabled, then runtime is about the same. I ran it a few times and couldn't notice a difference. (Not that this is a good substitute for a real benchmark, but perhaps suggestive that docstring code formatting doesn't have a huge impact on perf.)

I assume you did that for a project that has docstrings?

If you happen to have a test file with code snippets, then you could add it to this benchmark:

ruff/crates/ruff_benchmark/benches/formatter.rs

Lines 28 to 42 in e62e245

    
           fn create_test_cases() -> Result<Vec<TestCase>, TestFileDownloadError> { 
        
               Ok(vec![ 
        
                   TestCase::fast(TestFile::try_download("numpy/globals.py", "https://raw.githubusercontent.com/numpy/numpy/89d64415e349ca75a25250f22b874aa16e5c0973/numpy/_globals.py")?), 
        
                   TestCase::fast(TestFile::try_download("unicode/pypinyin.py", "https://raw.githubusercontent.com/mozillazg/python-pinyin/9521e47d96e3583a5477f5e43a2e82d513f27a3f/pypinyin/standard.py")?), 
        
                   TestCase::normal(TestFile::try_download( 
        
                       "pydantic/types.py", 
        
                       "https://raw.githubusercontent.com/pydantic/pydantic/83b3c49e99ceb4599d9286a3d793cea44ac36d4b/pydantic/types.py", 
        
                   )?), 
        
                   TestCase::normal(TestFile::try_download("numpy/ctypeslib.py", "https://raw.githubusercontent.com/numpy/numpy/e42c9503a14d66adfd41356ef5640c6975c45218/numpy/ctypeslib.py")?), 
        
                   TestCase::slow(TestFile::try_download( 
        
                       "large/dataset.py", 
        
                       "https://raw.githubusercontent.com/DHI/mikeio/b7d26418f4db2909b0aa965253dbe83194d7bb5b/tests/test_dataset.py", 
        
                   )?), 
        
               ]) 
        
           }

And change the options to always enable the docstring code option here

ruff/crates/ruff_benchmark/benches/formatter.rs

Line 72 in e62e245

let options = PyFormatOptions::from_extension(Path::new(case.name()));

The benchmark will then run automatically as part of codspeed.

BurntSushi · 2023-11-29T13:31:42Z

I assume you did that for a project that has docstrings?

Yeah. I ran it on CPython and polars.

I added a new issue tracking adding benchmarks specifically targeted to code snippet reformatting: #8909

BurntSushi added performance Potential performance improvement formatter Related to the formatter wish Not on the current roadmap; maybe in the future labels Nov 27, 2023

This was referenced Nov 27, 2023

format doctests in docstrings #8811

Merged

Using ruff to format code examples in docstrings #7146

Closed

MichaReiser added the docstring Related to docstring linting or formatting label Nov 27, 2023

BurntSushi mentioned this issue Nov 29, 2023

docstring code formatter: add benchmarks #8909

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docstring code formatter: remove "invalid Python" check #8857

docstring code formatter: remove "invalid Python" check #8857

BurntSushi commented Nov 27, 2023

MichaReiser commented Nov 27, 2023

BurntSushi commented Nov 28, 2023

MichaReiser commented Nov 28, 2023 •

edited

Loading

BurntSushi commented Nov 29, 2023

docstring code formatter: remove "invalid Python" check #8857

docstring code formatter: remove "invalid Python" check #8857

Comments

BurntSushi commented Nov 27, 2023

MichaReiser commented Nov 27, 2023

BurntSushi commented Nov 28, 2023

MichaReiser commented Nov 28, 2023 • edited Loading

BurntSushi commented Nov 29, 2023

MichaReiser commented Nov 28, 2023 •

edited

Loading