Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FStringPart for f-string formatting #6365

Closed
wants to merge 17 commits into from

Conversation

davidszotten
Copy link
Contributor

@davidszotten davidszotten commented Aug 5, 2023

Step one of "proper f-string formatting"

Implements JoinedStringPart suggested in #5970

The previous implementation uses Constant::Str to model the constant parts of formatted strings (including eg format_specs) but the formatter expects Constant::Str to include the surrounding quotes which these string parts don't have

I think i've got far enough into my f-string formatting poc to say this is good

@@ -927,6 +927,12 @@ impl From<ExprJoinedStr> for Expr {
}
}

#[derive(Clone, Debug, PartialEq)]
pub enum JoinedStringPart {
String(StringTodoName),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe String(StringTodoName) -> Constant(StringPart) or Constant(ConstantStringPart)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference (in the long term) is to rename Constant (it's a constant at runtime but it is literal in the source code) to Literal. How about FStringLiteralElement and FStringExpressionElement (or FormattedValueElement).

@github-actions
Copy link
Contributor

github-actions bot commented Aug 6, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

@davidszotten
Copy link
Contributor Author

@MichaReiser are we still interested in something like this?

@MichaReiser
Copy link
Member

@MichaReiser are we still interested in something like this?

Sorry. I have yet to look at it since it is a draft pull request. Generally, yes, but I'm a bit underwater right now. It's almost a full-day job to keep up with @charliermarsh's PRs ;P

Is there specific feedback you're looking for?

@davidszotten
Copy link
Contributor Author

Is there specific feedback you're looking for?

#6183 mentions larger structural changes around handling of implicit concatenation (and how it would be nice to unify this with JoinedStringPart (i guess i need to rename to FStringPart now).

You've previously (sorry can't find reference now) mentioned that someone might be making larger changes in order to support 3.12 style formatted-values (though that might be orthogonal until we drop support for 3.11 in 2029 or so)

i suppose for a quick review the interesting changes are in
https://github.com/astral-sh/ruff/pull/6365/files#diff-3af3e83bb70765ce947636ab537ecd13676c5dc1929edf6e2a1e3d1d46191191

ast/nodes.rs

perhaps also a glance at the snapshots

considering changing format_spec from Option<Box<Expr>> to Vec<JoinedStringPart>
(same reason as above, these "snippets" don't have quotes so can't really stand alone as expressions. but does it need to be Option<Vec<_>> to avoid memory overhead? afict an empty vec takes space for metadata but doesn't allocate a buffer for elements so maybe ok?

https://github.com/astral-sh/ruff/pull/6365/files#diff-3af3e83bb70765ce947636ab537ecd13676c5dc1929edf6e2a1e3d1d46191191R871

and open q about naming
https://github.com/astral-sh/ruff/pull/6365/files#r1285057553

lastly i hadn't quite appreciated how much changing the ast node leaks into rules. i think i managed most but will require thorough review, and i'll need help with https://github.com/astral-sh/ruff/pull/6365/files#diff-c2c8422959405016e1fb54ca23edee9716bd06ad64e610ceea9612a8291122fcR1292 and probably https://github.com/astral-sh/ruff/pull/6365/files#diff-b6e87960d75a3034eafafa5ba307dbce96935905deada7841a56f80e7927acbbR63

@davidszotten davidszotten force-pushed the joined-string-part branch 5 times, most recently from 98480e0 to 9651311 Compare August 10, 2023 08:36
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply and thanks for working on this refactor.

What's your overall sense for this change? Does it simplify the AST handling or does it add unnecessary complexity?

@dhruvmanila is working on the python 312 f-string parsing. My understanding is that this is mainly limited to how we lex f-strings, and shouldn't affect the AST representation.

#6183 mentions larger structural changes around handling of implicit concatenation (and how it would be nice to unify this with JoinedStringPart (i guess i need to rename to FStringPart now).

Yeah, I'm not yet sure how implicit concatenation should work. Especially because you can mix regular strings an FString (what a mess...). I'm thinking about removing FStringExpr and instead have a single StringLiteral that may consist of FString and regular string elements. I would ignore this for now but I would appreciate any suggestions that you have.

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
crates/ruff_python_ast/src/node.rs Show resolved Hide resolved
crates/ruff/src/rules/flynt/helpers.rs Outdated Show resolved Hide resolved
@@ -927,6 +927,12 @@ impl From<ExprJoinedStr> for Expr {
}
}

#[derive(Clone, Debug, PartialEq)]
pub enum JoinedStringPart {
String(StringTodoName),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference (in the long term) is to rename Constant (it's a constant at runtime but it is literal in the source code) to Literal. How about FStringLiteralElement and FStringExpressionElement (or FormattedValueElement).

@davidszotten
Copy link
Contributor Author

What's your overall sense for this change?

as i mention in the summary it feels more correct to use a different representation for string "snippets" than Constant since they don't include the quotes (and often aren't the entire string)

it's also nice to remove a few unreachables

in general i'd say slightly better

@davidszotten davidszotten force-pushed the joined-string-part branch 4 times, most recently from 4d7e21e to 0f3ac34 Compare August 20, 2023 08:03
@davidszotten davidszotten changed the title JoinedStringPart for f-string formatting FStringPart for f-string formatting Aug 20, 2023
@davidszotten davidszotten marked this pull request as ready for review August 20, 2023 15:24
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work, thanks for doing this! I have mainly looked at the AST/parser related changes and briefly the linter related changes but if you want I can give a closer look at other changes as well.

If I understand correctly, the introduction of Literal is (in the future) to move away from Constant as suggested by Micha. I'm a bit unsure of the difference between PartialString and Constant::Str. Is PartialString will only be used for f-string?

crates/ruff_python_ast/src/comparable.rs Outdated Show resolved Hide resolved
crates/ruff_python_ast/src/node.rs Outdated Show resolved Hide resolved
@@ -1288,17 +1288,6 @@ where
self.semantic.flags = flags_snapshot;
}

fn visit_format_spec(&mut self, format_spec: &'b Expr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this visit is done elsewhere and that's why it's removed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is now handled as part of walking f-strings (which walks the parts, some of which are formatted-values which has the format spec

see e.g. https://github.com/davidszotten/ruff/blob/0eafbe4afee2f43cfc1af3b6ccdec8479133d38b/crates/ruff_python_ast/src/visitor.rs#L705

crates/ruff_python_parser/src/string.rs Show resolved Hide resolved
@konstin konstin added the formatter Related to the formatter label Aug 21, 2023
@davidszotten
Copy link
Contributor Author

I'm a bit unsure of the difference between PartialString and Constant::Str

PartialString doesn't include prefixes and quotes, and sometimes is only part of a string, e.g.

f"before{formatted}after"
  ^^^^^^

@MichaReiser MichaReiser added parser Related to the parser and removed formatter Related to the formatter labels Aug 22, 2023
@MichaReiser
Copy link
Member

I'm still not feeling a 100%, and this change requires more brainpower than I have right now. I'll get back to you when I feel better.

@davidszotten
Copy link
Contributor Author

I'm still not feeling a 100%, and this change requires more brainpower than I have right now. I'll get back to you when I feel better.

sorry to hear, hope you feel better soon! no rush on this

@davidszotten
Copy link
Contributor Author

might need to rethink the naming (and possibly more). i thought i had my poc for f-strings mostly working but i had forgotten about implicit concatenation which complicates things. i don't full understand how that works atm but need to figure out if/how it can be adapted for f-strings. one thing i immediately noticed is that we already use the term "string part" there to mean "one of multiple implicitly concatenated strings" so we might want to use a different name than FStringPart here to avoid confusion

@davidszotten davidszotten force-pushed the joined-string-part branch 2 times, most recently from ab9b96c to 606e185 Compare August 26, 2023 13:33
@davidszotten
Copy link
Contributor Author

davidszotten commented Sep 24, 2023

good catch.

hardcoded_bind_all_interfaces and hardcoded_tmp_directory are straightforward to add. unicode_kind_prefix doesn't apply but string_or_bytes_too_long is a bit tricker since (unlike the first 2 which only need the range) it actually unpacks the expr. let me know what you think of the approach (cast to astnodes)

i guess the main issue is remembering to add new checks in both places. could maybe factor out a helper function

Comment on lines 15 to 17
/// Convert a string to a constant string expression.
pub(super) fn to_constant_string(s: &str) -> Expr {
let node = ast::ExprConstant {
value: s.to_owned().into(),
pub(super) fn to_constant_string(s: &str) -> ast::FStringElement {
ast::FStringElement::Literal(ast::FStringLiteralElement {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation would create a constant string node but it's changed to creating a f-string literal element. Is that an expected behavior? If so, could you change the function name and documentation to reflect that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it's expected, this is (and was) only used to build an f-string. makes sense to update the name

@charliermarsh
Copy link
Member

(I will re-review this tonight; if not, then tomorrow, sorry!)

@dhruvmanila
Copy link
Member

(I'll take ownership of rebasing this PR as #7376 has been merged on main)

@dhruvmanila
Copy link
Member

dhruvmanila commented Oct 3, 2023

If I understand this correctly, then we're replacing the Constant::Str with a dedicated FStringElement::Literal, right? I think we'll need to preserve the information across both nodes which includes the unicode and implicit_concatenated fields otherwise for something like u"foo" f"{bar}" we won't be able to detect whether it's a unicode string or not.

There are some changes involved in the parser definition as well mainly with the fact that the format_spec isn't a node anymore but rather a vector of f-string elements. The range of format spec was being used to extract the debug text.

@davidszotten
Copy link
Contributor Author

If I understand this correctly, then we're replacing the Constant::Str with a dedicated FStringElement::Literal, right? I think we'll need to preserve the information across both nodes which includes the unicode and implicit_concatenated fields otherwise for something like u"foo" f"{bar}" we won't be able to detect whether it's a unicode string or not.

The original idea was focused on cases like f"foo{bar}" where the foo part was modelled the same as a single string "foo" even though those are different things (eg the latter includes quotes and can have a prefix)

i don't think i'd fully considered the case in your example but it does make me wonder if we should consider modelling implicit concatenation differently too

There are some changes involved in the parser definition as well mainly with the fact that the format_spec isn't a node anymore but rather a vector of f-string elements. The range of format spec was being used to extract the debug text.

the f-string elements have ranges so could we find the range using the start of the first and the end of the last?

@dhruvmanila
Copy link
Member

This is replaced by #8835 with correct author attribution. Thanks @davidszotten for taking this up! Happy to take any additional feedback in the new PR.

dhruvmanila added a commit that referenced this pull request Dec 7, 2023
Rebase of #6365 authored by @davidszotten.

## Summary

This PR updates the AST structure for an f-string elements.

The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,

```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```

The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.

### Rule Updates

This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.

#### Notes

1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.

## Test Plan

- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.

## Performance

There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.

- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces

I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD

### Backtracking

I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.

## Review process

The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.

---------

Co-authored-by: David Szotten <davidszotten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants