Fix f-string formatting in assignment statement #14454

Merged: 10 commits merged into main from dhruv/f-string-assignment-2 on Nov 26, 2024


@dhruvmanila (Member) commented Nov 19, 2024

Summary

fixes: #13813

This PR fixes a bug in formatting an assignment statement when the value is an f-string.

This is resolved by using custom best-fit layouts if the f-string (a) is not already flat (a flat f-string cannot be multiline) and (b) is not a multiline string (a multiline string cannot be flattened). So, it is used in cases like the following:

aaaaaaaaaaaaaaaaaa = f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
    expression}moreeeeeeeeeeeeeeeee"

This f-string (a) uses FStringLayout::Multiline and (b) is not a multiline string.
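A minimal sketch of the condition described above, assuming a simplified model: `FStringLayout` reuses the name from the summary, but `layout_from_source` and `uses_custom_best_fit` are hypothetical helpers, not ruff's API. The scanner naively treats an f-string as having the multiline layout when a line break appears inside a replacement field, and it assumes there are no braces inside nested string literals.

```rust
/// Simplified stand-in for the layout named in the summary (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FStringLayout {
    Flat,
    Multiline,
}

/// Naive scan of an f-string's source: the layout is `Multiline` if a line
/// break appears inside a replacement field (`{...}`). Escaped `{{` / `}}`
/// outside fields are skipped; braces inside nested strings are not handled.
fn layout_from_source(source: &str) -> FStringLayout {
    let chars: Vec<char> = source.chars().collect();
    let mut depth = 0usize;
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Escaped literal braces: skip the second brace of the pair.
            '{' if depth == 0 && chars.get(i + 1) == Some(&'{') => i += 1,
            '}' if depth == 0 && chars.get(i + 1) == Some(&'}') => i += 1,
            '{' => depth += 1,
            '}' => depth = depth.saturating_sub(1),
            // A line break inside a replacement field means a multiline layout.
            '\n' if depth > 0 => return FStringLayout::Multiline,
            _ => {}
        }
        i += 1;
    }
    FStringLayout::Flat
}

/// Rule from the summary: use the custom best-fit layouts only if the f-string
/// (a) is not already flat and (b) is not a multiline string.
fn uses_custom_best_fit(layout: FStringLayout, is_multiline_string: bool) -> bool {
    layout == FStringLayout::Multiline && !is_multiline_string
}
```

With this model, the example above (`{` followed by a newline before `expression`) is classified as `Multiline`, and since it contains no multiline string, the custom best-fit layouts apply.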

There are various other examples in the PR diff along with additional explanation and context as code comments.

Test Plan

Add multiple test cases for various scenarios.

@dhruvmanila dhruvmanila added the formatter Related to the formatter label Nov 19, 2024
@dhruvmanila dhruvmanila marked this pull request as draft November 19, 2024 13:40
@dhruvmanila dhruvmanila removed the request for review from MichaReiser November 19, 2024 13:40
@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-2 branch from 4f24b97 to 9f968a5 Compare November 20, 2024 11:24
github-actions bot (Contributor) commented Nov 20, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check detected format changes. (+4 -2 lines in 1 file in 1 projects; 54 projects unchanged)

milvus-io/pymilvus (+4 -2 lines across 1 file)

examples/concurrency/multithreading_hello_milvus.py~L71

         print(
             f"Inserted {len(self.batchs)} batches of {num_entities} entities in {duration} seconds"
         )
-        print(f"Expected num_entities: {len(self.batchs)*num_entities}. \
-                Acutal num_entites: {self.get_thread_local_collection().num_entities}")
+        print(
+            f"Expected num_entities: {len(self.batchs)*num_entities}. \
+                Acutal num_entites: {self.get_thread_local_collection().num_entities}"
+        )
 
 
 if __name__ == "__main__":

Formatter (preview)

ℹ️ ecosystem check detected format changes. (+4 -2 lines in 1 file in 1 projects; 54 projects unchanged)

milvus-io/pymilvus (+4 -2 lines across 1 file)

ruff format --preview

examples/concurrency/multithreading_hello_milvus.py~L71

         print(
             f"Inserted {len(self.batchs)} batches of {num_entities} entities in {duration} seconds"
         )
-        print(f"Expected num_entities: {len(self.batchs) * num_entities}. \
-                Acutal num_entites: {self.get_thread_local_collection().num_entities}")
+        print(
+            f"Expected num_entities: {len(self.batchs) * num_entities}. \
+                Acutal num_entites: {self.get_thread_local_collection().num_entities}"
+        )
 
 
 if __name__ == "__main__":

@MichaReiser (Member)

This is looking good!

@dhruvmanila

This comment was marked as resolved.

@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-2 branch from b7aa489 to c2d8b38 Compare November 21, 2024 10:44
@dhruvmanila dhruvmanila changed the title WIP: Fix f-string formatting in assignment statement Fix f-string formatting in assignment statement Nov 21, 2024
@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-1 branch from 19e6a99 to 4c71b5d Compare November 21, 2024 11:16
@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-2 branch from c2d8b38 to 55574a9 Compare November 21, 2024 11:16
@dhruvmanila dhruvmanila marked this pull request as ready for review November 21, 2024 11:31
@MichaReiser MichaReiser added the preview Related to preview mode features label Nov 21, 2024
@MichaReiser (Member) left a comment

This overall looks good, but I think we're now using the BestFit layout in too many places, which leads to bad formatting in clause headers (see my inline comment).

I think we should only use the BestFit layout if the FString uses the flat layout. In all other cases, we should use the Multiline layout.

We should add some more tests that cover BestFit layout usages in non-assignment positions to verify that we got the needs_parentheses changes correct.

See

##############################################################################
# Regressions
##############################################################################
LEEEEEEEEEEEEEEEEEEEEEEFT = RRRRRRRRIIIIIIIIIIIIGGGGGHHHT | {
    "entityNameeeeeeeeeeeeeeeeee", # comment must be long enough to
    "some long implicit concatenated string" "that should join"
}
# Ensure that flipping between Multiline and BestFit layout results in stable formatting
# when using IfBreaksParenthesized layout.
assert False, "Implicit concatenated string" "uses {} layout on {} format".format(
    "Multiline", "first"
)
assert False, await "Implicit concatenated string" "uses {} layout on {} format".format(
    "Multiline", "first"
)
assert False, "Implicit concatenated stringuses {} layout on {} format"[
    aaaaaaaaa, bbbbbb
]
assert False, +"Implicit concatenated string" "uses {} layout on {} format".format(
    "Multiline", "first"
)

and

# Fits
with "aa" "bbb" "cccccccccccccccccccccccccccccccccccccccccccccc":
    pass
# Parenthesize single-line
with "aa" "bbb" "ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc":
    pass
# Multiline
with "aa" "bbb" "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc":
    pass
with f"aaaaaaa{expression}bbbb" f"ccc {20999}" "more":
    pass
##############################################################################
# For loops
##############################################################################
# Flat
for a in "aaaaaaaaa" "bbbbbbbbb" "ccccccccc" "dddddddddd":
    pass
# Parenthesize single-line
for a in "aaaaaaaaa" "bbbbbbbbb" "ccccccccc" "dddddddddd" "eeeeeeeeeeeeeee" "fffffffffffff" "ggggggggggggggg" "hh":
    pass
# Multiline
for a in "aaaaaaaaa" "bbbbbbbbb" "ccccccccc" "dddddddddd" "eeeeeeeeeeeeeee" "fffffffffffff" "ggggggggggggggg" "hhhh":
    pass
##############################################################################
# Assert statement
##############################################################################
# Fits
assert "aaaaaaaaa" "bbbbbbbbbbbb", "cccccccccccccccc" "dddddddddddddddd"
# Wrap right
assert "aaaaaaaaa" "bbbbbbbbbbbb", "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffff"
# Right multiline
assert "aaaaaaaaa" "bbbbbbbbbbbb", "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffffffff" "ggggggggggggg" "hhhhhhhhhhh"
# Wrap left
assert "aaaaaaaaa" "bbbbbbbbbbbb" "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffffffff", "ggggggggggggg" "hhhhhhhhhhh"
# Left multiline
assert "aaaaaaaaa" "bbbbbbbbbbbb" "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffffffff" "ggggggggggggg", "hhhhhhhhhhh"
# wrap both
assert "aaaaaaaaa" "bbbbbbbbbbbb" "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffffffff", "ggggggggggggg" "hhhhhhhhhhh" "iiiiiiiiiiiiiiiiii" "jjjjjjjjjjjjj" "kkkkkkkkkkkkkkkkk" "llllllllllll"
# both multiline
assert "aaaaaaaaa" "bbbbbbbbbbbb" "cccccccccccccccc" "dddddddddddddddd" "eeeeeeeeeeeee" "fffffffffffffff" "ggggggggggggg", "hhhhhhhhhhh" "iiiiiiiiiiiiiiiiii" "jjjjjjjjjjjjj" "kkkkkkkkkkkkkkkkk" "llllllllllll" "mmmmmmmmmmmmmm"
##############################################################################
# In clause headers (can_omit_optional_parentheses)
##############################################################################
# Use can_omit_optional_parentheses layout to avoid an instability where the formatter
# picks the can_omit_optional_parentheses layout when the strings are joined.
if (
    f"implicit"
    "concatenated"
    "string" + f"implicit"
    "concaddddddddddded"
    "ring"
    * len([aaaaaa, bbbbbbbbbbbbbbbb, cccccccccccccccccc, ddddddddddddddddddddddddddd])
):
    pass
# Keep parenthesizing multiline - implicit concatenated strings
if (
    f"implicit"
    """concatenate
d"""
    "string" + f"implicit"
    "concaddddddddddded"
    "ring"
    * len([aaaaaa, bbbbbbbbbbbbbbbb, cccccccccccccccccc, ddddddddddddddddddddddddddd])
):
    pass
if (
    [
        aaaaaa,
        bbbbbbbbbbbbbbbb,
        cccccccccccccccccc,
        ddddddddddddddddddddddddddd,
    ]
    + "implicitconcat"
    "enatedstriiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiing"
):
    pass

crates/ruff_python_formatter/src/string/mod.rs (outdated)
  // This isn't decided yet, refer to the relevant discussion:
  // https://github.com/astral-sh/ruff/discussions/9785
- else if StringLike::FString(self).is_multiline(context.source()) {
+ } else if StringLike::FString(self).is_multiline(context) {
Member:

I think using BestFit for all f-strings that contain multiline expressions isn't correct. For example, it results in

if f"aaaaaaaaaaa { ttttteeeeeeeeest} more {
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
}": pass

being formatted as

if f"aaaaaaaaaaa {ttttteeeeeeeeest} more {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}":
    pass

That seems worse than before.

I think we should keep using Multiline if the f-string has any multiline expression to avoid collapsing them.

Member Author:

I think the reason it was formatted like that previously is that needs_parentheses returned OptionalParentheses::Never. If we return Multiline instead, I think parentheses will be added, making it:

if (
    f"aaaaaaaaaaa {ttttteeeeeeeeest} more {
        aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    }"
):
    pass

Member:

We can also decide to return Never. It's not entirely clear to me what the better and more consistent layout is. I would have to play with a few layouts but I don't think it should be what it is now.

Using Never has the advantage that it avoids unnecessary parentheses and is closer to what we have today (and no one complained?). Adding parentheses is similar to having "aaaaa" + tttttt + "more(aaaaaaaaaaaaaaaaaaaaaaaaaa). It might be good to experiment with the formatting in other positions where we use BestFit to decide on the best layout.

Member Author:

Yeah, that's a good idea, I can do that.

Member:

We should add a test for this.

Comment on lines 226 to 228
if string.is_implicit_concatenated() || !string.is_multiline(context) {
    return false;
}
Member:

Your change is an improvement to what we had before.

For example, this code was already formatted correctly before:

call(f"{
    testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
}")

But I don't think it's correct to use the "hug" layout if an inner expression is multiline:

call(f"{
    aaaaaa
    + '''test
    more'''
}")

Member Author:

I'm not sure I follow here. Are you saying that both code snippets should be formatted with the f-string on the same line as the call expression (call(f"{)?

Member:

This is not directly related to your PR, but I noticed it when reviewing it because you changed is_multiline. We have to make a decision on the idiomatic formatting for "hugging multiline strings" for f-strings.

I'm leaning towards only "hugging" if the f-string itself contains a multiline string literal, but not if any inner expression is multiline. This is similar to Prettier:

call(
  `${
    aaaaaaaaaaaaaaaaaaaaaaaaaaaa +
    `test more
    aaa` +
    morrrrrrrrrrrrrrrrrrrrrr
  }`,
);

So what I'm saying is that we should probably not hug in either of the cases above.

Member:

I suggest that we add an explicit is_triple_quoted check here. I don't think we ever want this to apply to any non-triple quoted strings. And let's add some test cases that cover f-strings too (cases where we don't want the layout to apply as well as cases where it should apply)
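The suggested hugging rule (hug only when the f-string's own literal text spans lines, guarded by an explicit triple-quoted check) could be sketched as follows. `Element`, `FString`, and `should_hug` are hypothetical, simplified stand-ins for illustration, not ruff's types or functions.

```rust
/// Hypothetical, simplified f-string model used only for this illustration.
#[allow(dead_code)]
enum Element {
    /// Literal text of the f-string, as written in the source.
    Literal(String),
    /// Source text of a replacement field's expression.
    Expression(String),
}

struct FString {
    triple_quoted: bool,
    implicit_concatenated: bool,
    elements: Vec<Element>,
}

impl FString {
    /// True if any *literal* part contains a line break; line breaks inside
    /// replacement-field expressions are deliberately ignored.
    fn has_multiline_literal(&self) -> bool {
        self.elements
            .iter()
            .any(|element| matches!(element, Element::Literal(text) if text.contains('\n')))
    }
}

/// Sketch of the suggested "hug" condition: never hug implicitly concatenated
/// or non-triple-quoted strings, and only hug when the literal text spans lines.
fn should_hug(fstring: &FString) -> bool {
    if fstring.implicit_concatenated || !fstring.triple_quoted {
        return false;
    }
    fstring.has_multiline_literal()
}
```

Under this sketch, neither of the two `call(f"{...}")` examples above would be hugged, because their line breaks sit inside replacement fields rather than in the literal text.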

crates/ruff_python_formatter/src/statement/stmt_assign.rs (outdated)
crates/ruff_python_formatter/src/statement/stmt_assign.rs (outdated)
crates/ruff_python_formatter/src/string/implicit.rs (outdated)
crates/ruff_python_formatter/src/statement/stmt_assign.rs (outdated)
crates/ruff_python_formatter/src/string/mod.rs (outdated)
Base automatically changed from dhruv/f-string-assignment-1 to main November 25, 2024 04:59
@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-2 branch from 03896f0 to 4ce0c42 Compare November 25, 2024 05:03
@MichaReiser (Member) left a comment

This looks good. I still think we should add more tests for f-strings in clause headers and assert statements. For example:

if (f"teaaaaaaaaaaaaaaaaaaaaaaaa{
    expressioneeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee # comment
    }aast".contains('a') 
):
    pass

We now remove parentheses from:

aaaaaaaaaaaaaaaaaa = (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{[a, b,
    # comment
    ]}moee" # comment
) 

but keep them for

aaaaaaaaaaaaaaaaaa = (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{[a, b,
    ]}moee"
    # comment
) 

The second case is consistent with how we handle parentheses for all other non-f-string assignments.

But I think what you have now is consistent with how, e.g., lists are formatted, where the parentheses are removed as well:

aaaaaaaaaaaaaaaaaa = (
    [testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee,
        # comment
    ]
) 

crates/ruff_python_ast/src/nodes.rs (outdated)
Comment on lines +121 to +135
if is_f_string_formatting_enabled(context) {
    // Expressions containing comments can't be joined.
    //
    // Format specifiers needs to be checked as well. For example, the
    // following should be considered multiline because the literal
    // part of the format specifier contains a newline at the end
    // (`.3f\n`):
    //
    // ```py
    // x = f"hello {a + b + c + d:.3f
    // } world"
    // ```
    context.comments().contains_comments(expression.into())
        || expression.format_spec.as_deref().is_some_and(|spec| {
            contains_line_break_or_comments(&spec.elements, context)
Member:

Do we need to consider debug expressions too?

Member Author:

I don't think so. If debug expressions are present, then there are no breakpoints in that f-string.

@MichaReiser (Member), Nov 26, 2024:

But they could be multiline, no? The logic here isn't specific to assignment formatting; it's generally used to determine whether a string is known to be multiline, and it's unclear to me how a newline in an f-string literal differs from a newline in the debug expression.

Member Author:

Hmm, so something like the following, which cannot be flattened:

aaaaaaaa = f"aaaaaaaaaa {
        aaaaaaa + bbbbbbb + cccccccc = } dddddddddd"

So, I think we should check if there's a debug expression and if so then check for line breaks in the expression itself.

Member:

Yes, but I think we also have to check the debug expression text, because the example above has no line break in the expression itself, only in the debug expression's text. Still, the whole f-string should be considered multiline (whether it can be flattened is irrelevant here, because this is a general-purpose method to determine whether a string contains a hard line break).
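A sketch of the check being discussed, extended to debug expressions, assuming a hypothetical model in which a replacement field records whether it has comments, its format spec text, and the verbatim text around a debug expression (`leading` / `trailing`). None of these types or the `field_forces_line_break` helper are ruff's. A line break in the expression itself is deliberately not counted because, per the code comment above, only comments prevent joining an expression.

```rust
/// Hypothetical model of the verbatim text around a debug expression,
/// e.g. `{\n    a + b = }` has leading "\n    " and trailing " = ".
struct DebugText {
    leading: String,
    trailing: String,
}

/// Hypothetical model of one replacement field `{expression=:spec}`.
#[allow(dead_code)]
struct ReplacementField {
    expression: String,
    has_comments: bool,
    debug_text: Option<DebugText>,
    format_spec: Option<String>,
}

/// True if the field is known to force a hard line break: it has comments,
/// its format spec contains a newline, or its debug text (which must be
/// preserved verbatim) contains a newline. A newline in the expression alone
/// is not enough, since such an expression could still be joined.
fn field_forces_line_break(field: &ReplacementField) -> bool {
    field.has_comments
        || field
            .format_spec
            .as_deref()
            .is_some_and(|spec| spec.contains('\n'))
        || field
            .debug_text
            .as_ref()
            .is_some_and(|debug| debug.leading.contains('\n') || debug.trailing.contains('\n'))
}
```

In this model, the `aaaaaaaa = f"aaaaaaaaaa {\n        aaaaaaa + bbbbbbb + cccccccc = } dddddddddd"` example is reported as multiline only because of the newline in the debug text's `leading` part.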


@dhruvmanila dhruvmanila force-pushed the dhruv/f-string-assignment-2 branch from 1e63fdc to 0587004 Compare November 26, 2024 07:15
@dhruvmanila (Member Author)

I've added a bunch of test cases for f-strings in various positions.

The formatting differs between an assignment and, e.g., a for statement for the type of f-string mentioned in the linked issue. This is because the special casing is only done for f-strings in the assignment value position.

So, the following:

aaaaaa = f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
        expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee"

aaaaaa = (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee"
)

for a in f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
        expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

for a in (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee"
):
    pass

will get formatted to:

aaaaaa = (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
)

aaaaaa = (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee"
)

for a in f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
    expression
}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

for a in (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee"
):
    pass

I wonder if this would need to be done instead at the f-string expression level.

@MichaReiser (Member)

The for case seems fine to me. It's mostly consistent with the formatting in if-statements:

for a in f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
        expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

for a in f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

if f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
        expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

if (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
):
    pass

which gets formatted to:

for a in f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
    expression
}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

for a in (
    f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeee"
):
    pass

if f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{
    expression
}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

if f"testeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee{expression}moreeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee":
    pass

This seems fine to me, and we can iterate on the exact formatting in future style guides based on actual user feedback.

@dhruvmanila (Member Author)

The ecosystem change looks correct to me and is similar to what we do for regular strings:

--- /Users/dhruv/playground/ruff/formatter/preview.py
+++ /Users/dhruv/playground/ruff/formatter/preview.py
@@ -1,6 +1,10 @@
 if 1:
     if 1:
-        print("Expected num_entities: {len(self.batchs)*num_entities}. \
-                Acutal num_entites: {self.get_thread_local_collection().num_entities}")
-        print(f"Expected num_entities: {len(self.batchs)*num_entities}. \
-                Acutal num_entites: {self.get_thread_local_collection().num_entities}")
+        print(
+            "Expected num_entities: {len(self.batchs)*num_entities}. \
+                Acutal num_entites: {self.get_thread_local_collection().num_entities}"
+        )
+        print(
+            f"Expected num_entities: {len(self.batchs) * num_entities}. \
+                Acutal num_entites: {self.get_thread_local_collection().num_entities}"
+        )

Going to merge this now.

@dhruvmanila dhruvmanila merged commit f3dac27 into main Nov 26, 2024
20 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/f-string-assignment-2 branch November 26, 2024 09:37
dhruvmanila added a commit that referenced this pull request Nov 27, 2024
## Summary

This PR fixes a bug in the f-string formatting so that escaped
newlines are not considered by `is_multiline`. This is done by
checking whether the f-string is triple-quoted, similar to normal
string literals.

This is not required to be gated behind preview because the logic change
for `is_multiline` was added in
#14454.

## Test Plan

Add a test case which formats differently on `main`:
https://play.ruff.rs/ea3c55c2-f0fe-474e-b6b8-e3365e0ede5e
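The follow-up fix could be sketched as below, assuming a hypothetical helper `literal_is_multiline` that receives the raw source text of the string's literal parts. A newline in that text only counts as a hard line break for triple-quoted strings, since in a single-quoted (f-)string a newline can only appear in literal text as a backslash-escaped newline, as in the pymilvus example from the ecosystem check.

```rust
/// Hypothetical helper: `literal_parts` holds the raw source text of the
/// string's literal parts (replacement fields excluded).
fn literal_is_multiline(literal_parts: &[&str], triple_quoted: bool) -> bool {
    // Only a triple-quoted string can contain an unescaped newline in its
    // literal text. In a single-quoted string, a newline in the literal source
    // must be preceded by a backslash (an escaped newline), so it is ignored.
    triple_quoted && literal_parts.iter().any(|part| part.contains('\n'))
}
```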
Labels: formatter (Related to the formatter), preview (Related to preview mode features)
Linked issue (may be closed by this pull request): F-String formatting in assignment positions
2 participants