☂️ Formatter: Support formatting of embedded code #8237

Open
MichaReiser opened this issue Oct 26, 2023 · 13 comments
Labels
formatter Related to the formatter wish Not on the current roadmap; maybe in the future

Comments

@MichaReiser
Member

MichaReiser commented Oct 26, 2023

Support formatting Python code embedded in other languages like:

  • Markdown
  • reStructuredText
  • HTML
  • ...

The goal of this issue is not that we implement support for all these languages but to build up the infrastructure to run ruff (at least the formatter) on files that contain embedded python code and format it. Ideally, the infrastructure would, in the future, allow us to support arbitrary nesting:

  • Format SQL in Python
  • Format Markdown in Python
  • ...

Prettier and JetBrains code formatter do an excellent job at this.

Related:

@MichaReiser MichaReiser added formatter Related to the formatter wish Not on the current roadmap; maybe in the future labels Oct 26, 2023
@MichaReiser MichaReiser changed the title ☂️ Format Python code embedded in other languages ☂️ Formatter: Support formatting of embedded code Oct 26, 2023
@dhruvmanila
Member

The goal of this issue is not that we implement support for all these languages but to build up the infrastructure to run ruff (at least the formatter) on files that contain embedded python code and format it.

I think we should keep the linter in mind while designing the infrastructure. I'm not 100% sure, but my intuition from working on the Notebook support is that we will more than likely get the formatter support for free. Free in the sense that it'll require considerably less effort on the formatter side once the infrastructure is in place.

@MichaReiser
Member Author

I agree, but I wanted to keep this issue scoped. The linter and formatter likely have similar requirements and need similar infrastructure, but with slight nuances.

@ddelange

potential duplicate of #3792

@henryiii
Contributor

https://github.com/adamchainz/blacken-docs does this with Black for Markdown, reST, and LaTeX. You can see the supported block types there (python, pycon, etc.). Even better, maybe the block types could be configurable: for example, setting md-blocks = ["python", "ipython", "{code-cell} ipython3"] would allow you to run on python, ipython, and executable Python blocks (see https://jupyterbook.org/en/stable/file-types/myst-notebooks.html).
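To make the configurable block types idea concrete, here is a minimal sketch of how a tool might select fenced Markdown blocks by their info string. This is not how Ruff or blacken-docs implement it: `find_blocks`, `FENCE_RE`, and the `block_types` parameter are hypothetical names, and the regex only handles simple, unindented three-backtick fences.

```python
import re

# Matches a fenced block: an opening fence with an info string, the body,
# and a closing fence on its own line. Simplified: no indented or nested fences.
FENCE_RE = re.compile(
    r"^`{3}(?P<info>[^\n`]*)\n(?P<body>.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_blocks(markdown: str, block_types: set[str]) -> list[tuple[str, str]]:
    """Return (info_string, body) for every fenced block whose info string
    is listed in `block_types` (a hypothetical, user-configurable setting)."""
    return [
        (m["info"].strip(), m["body"])
        for m in FENCE_RE.finditer(markdown)
        if m["info"].strip() in block_types
    ]


fence = "`" * 3  # built at runtime so this example can itself live in a fence
sample = (
    f"{fence}python\nx  =  1\n{fence}\n"
    f"{fence}{{code-cell}} ipython3\ny  =  2\n{fence}\n"
    f"{fence}rust\nlet z = 3;\n{fence}\n"
)

# Mirrors the md-blocks list suggested above; the rust block is skipped.
blocks = find_blocks(sample, {"python", "ipython", "{code-cell} ipython3"})
for info, body in blocks:
    print(info, "->", repr(body))
```

Only the selected bodies would then be handed to the formatter, and the rest of the file would be left untouched.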

@KelSolaar

Hello,

I would be keen to see doctests formatting, e.g.:

def least_square_mapping_MoorePenrose(
    y: ArrayLike, x: ArrayLike
) -> NDArrayFloat:
    """
    Compute the *least-squares* mapping from dependent variable :math:`y` to
    independent variable :math:`x` using *Moore-Penrose* inverse.

    Parameters
    ----------
    y
        Dependent and already known :math:`y` variable.
    x
        Independent :math:`x` variable(s) values corresponding with :math:`y`
        variable.

    Returns
    -------
    :class:`numpy.ndarray`
        *Least-squares* mapping.

    References
    ----------
    :cite:`Finlayson2015`

    Examples
    --------
    >>> prng = np.random.RandomState(2)
    >>> y = prng.random_sample((24, 3))
    >>> x = y + (    prng.random_sample(    (24, 3)) - 0.5) * 0.5
    >>> least_square_mapping_MoorePenrose(y, x)  # doctest: +ELLIPSIS
    array([[ 1.0526376...,  0.1378078..., -0.2276339...],
           [ 0.0739584...,  1.0293994..., -0.1060115...],
           [ 0.0572550..., -0.2052633...,  1.1015194...]])
    """

    y = np.atleast_2d(y)
    x = np.atleast_2d(x)

    return np.dot(np.transpose(x), np.linalg.pinv(np.transpose(y)))

Ruff currently does not format anything inside >>> and ... in docstrings. I managed to drop Black and replace it with Ruff but I still depend on adamchainz/blacken-docs.

Cheers,

Thomas

@MichaReiser
Member Author

MichaReiser commented Jan 29, 2024

Hey @KelSolaar

Docstring code block formatting is supported but off by default. You can enable it in your settings using format.docstring-code-format = true

@KelSolaar

Hi @MichaReiser,

I have it enabled but it does not seem to format the above.

Cheers,

Thomas

@MichaReiser
Member Author

@KelSolaar

The issue is that

array([[ 1.0526376...,  0.1378078..., -0.2276339...],
       [ 0.0739584...,  1.0293994..., -0.1060115...],
       [ 0.0572550..., -0.2052633...,  1.1015194...]])

is not valid Python syntax (because of the ...). The formatter can only format examples that are valid Python.
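The validity point can be checked with Python's own parser. The sketch below uses the standard-library `ast` module, not Ruff's parser, and `is_valid_python` is a hypothetical helper name; it only illustrates why the truncated output cannot be treated as code.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A prompt line from the docstring: badly spaced, but syntactically valid.
prompt_line = "x = y + (    prng.random_sample(    (24, 3)) - 0.5) * 0.5"
# An output line: a float literal followed directly by `...` does not parse.
output_line = "array([[ 1.0526376...,  0.1378078..., -0.2276339...]])"

print(is_valid_python(prompt_line))  # True
print(is_valid_python(output_line))  # False
```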

@KelSolaar

KelSolaar commented Jan 29, 2024

Right, I see! This particular output is used by a doctest, where all whitespace is significant, so it should NOT be formatted, ever. The Ruff formatter should ignore such outputs entirely, which is what adamchainz/blacken-docs does. Hope that makes sense!

@KelSolaar

For that specific case, Ruff should only consider docstring lines starting with either >>> or ... (the doctest prompts). There might be some subtleties, and I would check the doctest parser to confirm, but that is the main idea.
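As a rough illustration of that idea, the standard-library `doctest.DocTestParser` already separates prompt lines from expected output. This is a sketch of the concept only, not a description of what Ruff or blacken-docs do internally; `split_examples` and the shortened docstring are made up for the example.

```python
import doctest

# Shortened, hypothetical docstring body modeled on the example in this thread.
DOCSTRING = """
Examples
--------
>>> x = y + (    sample(    (24, 3)) - 0.5) * 0.5
>>> mapping(y, x)  # doctest: +ELLIPSIS
array([[ 1.05...,  0.13...],
       [ 0.07...,  1.02...]])
"""


def split_examples(docstring: str) -> list[tuple[str, str]]:
    """Return (source, expected_output) pairs found by the doctest parser."""
    parser = doctest.DocTestParser()
    return [(ex.source, ex.want) for ex in parser.get_examples(docstring)]


for source, want in split_examples(DOCSTRING):
    # Only `source` is Python code. `want` is expected output that doctest
    # compares textually, so a formatter would leave it untouched.
    print("format:", source.rstrip())
    if want:
        print("keep:")
        print(want.rstrip())
```

Here the first example has an empty `want`, and the second carries the `array(...)` lines as output rather than code, which matches the split described above.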

@henryiii
Contributor

henryiii commented Jan 29, 2024

That example should use the pycon lexer (similar to how "console" is used for console input with $/#). Anything in the python language should be valid Python and formatted, so I think Ruff is doing the right thing trying to format it (and failing). Supporting pycon and formatting the "python" parts would be nice, though! (Note that pycon is supported elsewhere, including in markdown like here on GitHub)

If no language is given (such as code with a 4-space indent), I think it should be treated as whatever the default is (which IIRC might be Python).

@ddelange

Corresponding Google-style doctest:

def least_square_mapping_MoorePenrose(
    y: ArrayLike, x: ArrayLike
) -> NDArrayFloat:
    """
    Compute the *least-squares* mapping from dependent variable :math:`y` to
    independent variable :math:`x` using *Moore-Penrose* inverse.

    Examples:
        >>> prng = np.random.RandomState(2)
        >>> y = prng.random_sample((24, 3))
        >>> x = y + (    prng.random_sample(    (24, 3)) - 0.5) * 0.5
        >>> least_square_mapping_MoorePenrose(y, x)  # doctest: +ELLIPSIS
        array([[ 1.0526376...,  0.1378078..., -0.2276339...],
               [ 0.0739584...,  1.0293994..., -0.1060115...],
               [ 0.0572550..., -0.2052633...,  1.1015194...]])
    """

    y = np.atleast_2d(y)
    x = np.atleast_2d(x)

    return np.dot(np.transpose(x), np.linalg.pinv(np.transpose(y)))

@aneeshusa

+1 to this. Beyond blacken-docs, shed, which I'm currently switching from, has this feature too!

6 participants