-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FStringPart for f-string formatting #6365
Conversation
crates/ruff_python_ast/src/nodes.rs
Outdated
@@ -927,6 +927,12 @@ impl From<ExprJoinedStr> for Expr { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug, PartialEq)] | |||
pub enum JoinedStringPart { | |||
String(StringTodoName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe String(StringTodoName)
-> Constant(StringPart)
or Constant(ConstantStringPart)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference (in the long term) is to rename Constant
(it's a constant at runtime but it is literal in the source code) to Literal
. How about FStringLiteralElement
and FStringExpressionElement
(or FormattedValueElement
).
5a23d6c
to
d2f1aae
Compare
PR Check ResultsEcosystem✅ ecosystem check detected no changes. |
@MichaReiser are we still interested in something like this? |
Sorry. I have yet to look at it since it is a draft pull request. Generally, yes, but I'm a bit underwater right now. It's almost a full-day job to keep up with @charliermarsh's PRs ;P Is there specific feedback you're looking for? |
#6183 mentions larger structural changes around handling of implicit concatenation (and how it would be nice to unify this with You've previously (sorry can't find reference now) mentioned that someone might be making larger changes in order to support 3.12 style formatted-values (though that might be orthogonal until we drop support for 3.11 in 2029 or so) i suppose for a quick review the interesting changes are in
perhaps also a glance at the snapshots considering changing and open q about naming lastly i hadn't quite appreciated how much changing the ast node leaks into rules. i think i managed most but will require thorough review, and i'll need help with https://github.com/astral-sh/ruff/pull/6365/files#diff-c2c8422959405016e1fb54ca23edee9716bd06ad64e610ceea9612a8291122fcR1292 and probably https://github.com/astral-sh/ruff/pull/6365/files#diff-b6e87960d75a3034eafafa5ba307dbce96935905deada7841a56f80e7927acbbR63 |
98480e0
to
9651311
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply and thanks for working on this refactor.
What's your overall sense for this change? Does it simplify the AST handling or does it add unnecessary complexity?
@dhruvmanila is working on the python 312 f-string parsing. My understanding is that this is mainly limited to how we lex f-strings, and shouldn't affect the AST representation.
#6183 mentions larger structural changes around handling of implicit concatenation (and how it would be nice to unify this with JoinedStringPart (i guess i need to rename to FStringPart now).
Yeah, I'm not yet sure how implicit concatenation should work. Especially because you can mix regular strings an FString (what a mess...). I'm thinking about removing FStringExpr
and instead have a single StringLiteral
that may consist of FString and regular string elements. I would ignore this for now but I would appreciate any suggestions that you have.
crates/ruff_python_ast/src/nodes.rs
Outdated
@@ -927,6 +927,12 @@ impl From<ExprJoinedStr> for Expr { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug, PartialEq)] | |||
pub enum JoinedStringPart { | |||
String(StringTodoName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference (in the long term) is to rename Constant
(it's a constant at runtime but it is literal in the source code) to Literal
. How about FStringLiteralElement
and FStringExpressionElement
(or FormattedValueElement
).
as i mention in the summary it feels more correct to use a different representation for string "snippets" than it's also nice to remove a few in general i'd say slightly better |
4d7e21e
to
0f3ac34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work, thanks for doing this! I have mainly looked at the AST/parser related changes and briefly the linter related changes but if you want I can give a closer look at other changes as well.
If I understand correctly, the introduction of Literal
is (in the future) to move away from Constant
as suggested by Micha. I'm a bit unsure of the difference between PartialString
and Constant::Str
. Is PartialString
will only be used for f-string?
crates/ruff/src/checkers/ast/mod.rs
Outdated
@@ -1288,17 +1288,6 @@ where | |||
self.semantic.flags = flags_snapshot; | |||
} | |||
|
|||
fn visit_format_spec(&mut self, format_spec: &'b Expr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this visit is done elsewhere and that's why it's removed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is now handled as part of walking f-strings (which walks the parts, some of which are formatted-values which has the format spec
|
I'm still not feeling a 100%, and this change requires more brainpower than I have right now. I'll get back to you when I feel better. |
sorry to hear, hope you feel better soon! no rush on this |
might need to rethink the naming (and possibly more). i thought i had my poc for f-strings mostly working but i had forgotten about implicit concatenation which complicates things. i don't full understand how that works atm but need to figure out if/how it can be adapted for f-strings. one thing i immediately noticed is that we already use the term "string part" there to mean "one of multiple implicitly concatenated strings" so we might want to use a different name than |
ab9b96c
to
606e185
Compare
3de115b
to
71ba8fe
Compare
good catch.
i guess the main issue is remembering to add new checks in both places. could maybe factor out a helper function |
71ba8fe
to
5907914
Compare
5907914
to
d9fdd3d
Compare
...rc/rules/flake8_bandit/snapshots/ruff_linter__rules__flake8_bandit__tests__S104_S104.py.snap
Show resolved
Hide resolved
/// Convert a string to a constant string expression. | ||
pub(super) fn to_constant_string(s: &str) -> Expr { | ||
let node = ast::ExprConstant { | ||
value: s.to_owned().into(), | ||
pub(super) fn to_constant_string(s: &str) -> ast::FStringElement { | ||
ast::FStringElement::Literal(ast::FStringLiteralElement { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation would create a constant string node but it's changed to creating a f-string literal element. Is that an expected behavior? If so, could you change the function name and documentation to reflect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's expected, this is (and was) only used to build an f-string. makes sense to update the name
(I will re-review this tonight; if not, then tomorrow, sorry!) |
(I'll take ownership of rebasing this PR as #7376 has been merged on |
If I understand this correctly, then we're replacing the There are some changes involved in the parser definition as well mainly with the fact that the |
The original idea was focused on cases like i don't think i'd fully considered the case in your example but it does make me wonder if we should consider modelling implicit concatenation differently too
the f-string elements have ranges so could we find the range using the start of the first and the end of the last? |
This is replaced by #8835 with correct author attribution. Thanks @davidszotten for taking this up! Happy to take any additional feedback in the new PR. |
Rebase of #6365 authored by @davidszotten. ## Summary This PR updates the AST structure for an f-string elements. The main **motivation** behind this change is to have a dedicated node for the string part of an f-string. Previously, the existing `ExprStringLiteral` node was used for this purpose which isn't exactly correct. The `ExprStringLiteral` node should include the quotes as well in the range but the f-string literal element doesn't include the quote as it's a specific part within an f-string. For example, ```python f"foo {x}" # ^^^^ # This is the literal part of an f-string ``` The introduction of `FStringElement` enum is helpful which represent either the literal part or the expression part of an f-string. ### Rule Updates This means that there'll be two nodes representing a string depending on the context. One for a normal string literal while the other is a string literal within an f-string. The AST checker is updated to accommodate this change. The rules which work on string literal are updated to check on the literal part of f-string as well. #### Notes 1. The `Expr::is_literal_expr` method would check for `ExprStringLiteral` and return true if so. But now that we don't represent the literal part of an f-string using that node, this improves the method's behavior and confines to the actual expression. We do have the `FStringElement::is_literal` method. 2. We avoid checking if we're in a f-string context before adding to `string_type_definitions` because the f-string literal is now a dedicated node and not part of `Expr`. 3. Annotations cannot use f-string so we avoid changing any rules which work on annotation and checks for `ExprStringLiteral`. ## Test Plan - All references of `Expr::StringLiteral` were checked to see if any of the rules require updating to account for the f-string literal element node. - New test cases are added for rules which check against the literal part of an f-string. - Check the ecosystem results and ensure it remains unchanged. ## Performance There's a performance penalty in the parser. The reason for this remains unknown as it seems that the generated assembly code is now different for the `__reduce154` function. The reduce function body is just popping the `ParenthesizedExpr` on top of the stack and pushing it with the new location. - The size of `FStringElement` enum is the same as `Expr` which is what it replaces in `FString::format_spec` - The size of `FStringExpressionElement` is the same as `ExprFormattedValue` which is what it replaces I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly resulted in any performance gain. The difference can be seen here: - Original profile: https://share.firefox.dev/3Taa7ES - Profile after boxing some node fields: https://share.firefox.dev/3GsNXpD ### Backtracking I tried backtracking the changes to see if any of the isolated change produced this regression. The problem here is that the overall change is so small that there's only a single checkpoint where I can backtrack and that checkpoint results in the same regression. This checkpoint is to revert using `Expr` to the `FString::format_spec` field. After this point, the change would revert back to the original implementation. ## Review process The review process is similar to #7927. The first set of commits update the node structure, parser, and related AST files. Then, further commits update the linter and formatter part to account for the AST change. --------- Co-authored-by: David Szotten <davidszotten@gmail.com>
Step one of "proper f-string formatting"
Implements
JoinedStringPart
suggested in #5970The previous implementation uses
Constant::Str
to model the constant parts of formatted strings (including egformat_spec
s) but the formatter expectsConstant::Str
to include the surrounding quotes which these string parts don't haveI think i've got far enough into my f-string formatting poc to say this is good