parser needs special handling of self-documenting f-strings #5970
self-documenting f-strings (`f"{foo=}"`) are still outstanding see astral-sh/ruff#5970
There's something new to learn about Python every day :) I used our playground to play with your example and at first it seems that they're parsed exactly the same, but the Now, what's a bit awkward is the range of the |
sorry the missing conversion is my bad, i guess the expanded version should be but my point is that when we format we need to render |
ie rather than playing with the ranges of the synthesised nodes, could we not synthesise them at all? |
Ideally yes. I think the overall structure isn't ideal. From a linter perspective. How would you detect that
I believe ruff uses today to reconstruct the f-string. This is not ideal. But I'm a bit hesitant to change the AST today without taking the RustPython 3.12 changes into account as well, and I'm only familiar with how JS represents template literal expressions. |
|
So what you're suggesting is the following Index: Clipboard.txt
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/Clipboard.txt b/../../Library/Application Support/JetBrains/CLion2023.1/scratches/scratch_1.txt
rename from Clipboard.txt
rename to ../../Library/Application Support/JetBrains/CLion2023.1/scratches/scratch_1.txt
--- a/Clipboard.txt
+++ b/../../Library/Application Support/JetBrains/CLion2023.1/scratches/scratch_1.txt (date 1681077307376)
@@ -2,15 +2,6 @@
ExprJoinedStr {
range: 0..7,
values: [
- Constant(
- ExprConstant {
- range: 0..7,
- value: Str(
- "a=",
- ),
- kind: None,
- },
- ),
FormattedValue(
ExprFormattedValue {
range: 0..7,
But only for I would like that! If that's the case, could we fix (if not already done) the ranges of the non-synthesized nodes in your PR? We can then tackle the removal of the node as a separate PR (we may need to review whether some of ruff's rules depend on the presence of the node). CC: @charliermarsh |
yes, that's the suggestion (sorry if i was being unclear). in addition i expect we'll need some extra metadata (on the |
i think they were already correct when you last reviewed. probably the synthesized ones were confusing things. do you have a preference for what to do with the ranges for the synthesized ones for now? just leave (as the same range as the formattedvalue we want to keep)? |
I would leave them as is, considering that we intend to remove them entirely. The ranges could mess up the comment placement but this is not a problem in this very specific occasion because, from what I understood, comments aren't allowed in f-strings anyway. I plan to think about your proposal to add an extra flag later today or do you want to work out a more detailed proposal first? |
please go for it. could do with someone who a) has a broader knowledge of the project and b) can dedicate longer chunks of time to thinking about implications |
I spent some more time looking into this and am concluding that omitting the Let's say we have the following input f"{a=}"
f"{a=!r}" The proposed AST is:
Formatting the first expression's There's another problem where That's why I think that keeping the representation as is, at least for now, results in a consistent structure. But I think we should remove the merging of constants to make formatting easier: Remove the automatic merging of Regarding the initial ask to add a new field to I'm leaning towards not changing the AST for now and instead doing some lightweight parsing inside the formatter. It should be sufficient to test whether @davidszotten what do you think? Do you see use cases where testing for |
well the idea was that the but i'm also interested in your idea to not merge the constants, so let me explore that a bit. did you have a chance to think at all about what ranges to use on the synthesized constants? |
I see. I think it would still be problematic in the case of
That's a really good question and something I overlooked yesterday evening. The ranges are awkward, indeed. Ranges, ideally, map to the node's representation in the source code, and the ranges of siblings should never intersect. Satisfying both constraints seems impossible. So I ended up playing more with f-strings and found more interesting cases: f"{a= }"
f"{ a = }"
f"{a=}"
f"a={a}"
# Valid and preserve whitespace
f"{a = !r}"
f"{a + 1 = !r}"
f"{ a = :#x}"
# Valid, discard whitespace
f"{ a }"
# Valid with Python 3.12 (intentional?)
#f"{a=!r }"
The most important finding is that whitespace is meaningful in debug expressions.
I've come up with a few possible representations that seem reasonable for an AST used for static analysis (one that isn't used in an interpreter; Python's representation makes a ton of sense for interpreters):
|
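The whitespace behaviour listed above can be checked directly in CPython 3.8 or later. This is a minimal sketch using only built-in f-string syntax; the variable `a` and the `examples` dictionary are names chosen for illustration, not anything from Ruff.

```python
a = 10

# Map each f-string (as written in source) to what CPython renders for it.
examples = {
    # Debug fields echo the source text up to and including `=`,
    # together with any whitespace around it.
    'f"{a=}"': f"{a=}",
    'f"{a= }"': f"{a= }",
    'f"{ a = }"': f"{ a = }",
    'f"{a = !r}"': f"{a = !r}",
    'f"{a + 1 = !r}"': f"{a + 1 = !r}",
    'f"{ a = :#x}"': f"{ a = :#x}",
    # Without `=`, whitespace around the expression is discarded.
    'f"{ a }"': f"{ a }",
}

for source, rendered in examples.items():
    print(f"{source:<20} -> {rendered!r}")
```

Because the echoed text has to match the source exactly, an AST that wants to reproduce a debug field without consulting the source needs to keep that text somewhere.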
Could |
Saving the range of the debug text is possible. My main concern with this is that our current AST tries to be useful without the source text. Meaning, you shouldn't have to use the source text to retrieve any content (you only need it to retrieve anything that's not part of the AST). |
A fourth option (the third is the option @konstin proposed that uses a range) that I personally prefer over the others that I suggested is: struct FormattedValue {
expression: Expr,
debug_text: Option<DebugText>,
conversion: Conversion,
format_spec: Option<Expr>
}
struct DebugText {
/// The text between the `{` and the expression node.
leading: String,
/// The text between the expression and the conversion, the format_spec, or the `}`, depending on what's present in the source
trailing: String
} |
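To make the intent of option 4 concrete, here is a rough Python model of the two structs above. It is only a sketch: the expression is held as a plain source string rather than an AST node, `conversion` and `format_spec` are simplified to optional strings, and `to_source` is a hypothetical helper that does not exist in Ruff. It shows that `leading` and `trailing` are enough to rebuild a debug field without going back to the source text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugText:
    leading: str   # text between `{` and the expression
    trailing: str  # text between the expression and `!conv`, `:spec`, or `}`


@dataclass
class FormattedValue:
    expression: str                        # simplified: source text, not an AST node
    debug_text: Optional[DebugText] = None
    conversion: Optional[str] = None       # simplified: "r", "s", "a", or None
    format_spec: Optional[str] = None      # simplified: raw spec text, or None

    def to_source(self) -> str:
        """Rebuild the `{...}` replacement field (hypothetical helper)."""
        parts = ["{"]
        if self.debug_text is not None:
            parts += [self.debug_text.leading, self.expression, self.debug_text.trailing]
        else:
            parts.append(self.expression)
        if self.conversion is not None:
            parts.append("!" + self.conversion)
        if self.format_spec is not None:
            parts.append(":" + self.format_spec)
        parts.append("}")
        return "".join(parts)


field = FormattedValue("a", DebugText(leading=" ", trailing=" = "), conversion="r")
print(field.to_source())
```

In this model a field without `=` simply has `debug_text` set to `None`, so a formatter is free to normalise its whitespace.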
do i understand you correctly that (in option 4), |
Yes, that's correct. And the |
👍 and just to make sure, changing/adding some nodes is expected to require a bunch of boilerplate changes in |
Yes. It will require changes to visitors, comparable, but it may also impact |
ok, will see if i can wrap my head around all that. before i go too far, is the suggestion this? #[derive(Clone, Debug, PartialEq)]
pub struct ExprJoinedStr {
pub range: TextRange,
// TODO: rename to `parts`?
pub values: Vec<JoinedStringPart>,
}
#[derive(Clone, Debug, PartialEq)]
pub enum JoinedStringPart {
// TODO: i'm guessing not quite this? Another struct? (name?) for the string case?
String { range: TextRange, value: String },
FormattedValue(FormattedValue),
}
#[derive(Clone, Debug, PartialEq)]
pub struct FormattedValue {
pub range: TextRange,
pub expression: Box<Expr>,
pub debug_text: Option<DebugText>,
pub conversion: ConversionFlag,
pub format_spec: Option<Box<Expr>>,
} |
Yes, that's what I had in mind. An inner struct for |
👍 . any ideas for a good name?
attractive since i'm the one writing all the boilerplate, though in general i prefer encoding as much as possible in the type system (i presume using |
@davidszotten I don't know if you already started and you are probably in a better situation to assess the amount of work. I think it's worth considering doing this change in two steps:
I'm mainly worried that you run into many downstream changes and landing 1 on its own would unblock the formatter work. Edit: Are you planning on working on this? I would then assign the issue to you |
hi. i've started implementing the in the mean time, maybe you'd be able to have a look at #5932 (comment) ? you might be better placed than me to implement that (and that's also a blocker for f-strings i think |
Cool. I'll assign this issue to you then. Feel free to exclude the Regarding the comment. I'm sorry not to have replied yet. I'm stretched a bit thin at the moment and making progress here seemed more important. It also isn't the case that I have a good answer, I would need to tinker with different options too. We just did the monthly planning internally, and f-string formatting is one of our goals for the next four weeks. I'll check in with @konstin to decide who will support or implement f-string formatting. Would you be interested in working on the f-string formatting after the parser changes, or would you prefer that someone else works on it? (There's also other formatting work that we want to make progress on, e.g. match statements are an important one.) |
Hi. I finally had a bit more time to work on this and thus got to the part where No worries about the other comment, I just wanted to make sure it was on your radar. As for timings and progress (f-strings in general) I know i'm moving much slower than you since it's spare-time only. That seemed ok while there were a bunch of other things also left (eg match statements) but feel free to take over if this becomes the limiting factor (but please let me know so i can stop working on it) apart from this issue, making sure quote switching doesn't introduce backslashes inside |
Oh wow, you already have a working implementation. This is awesome! I'll review it on Monday.
Not at all. Our plan is to get f-string formatting done in the next four weeks. You can say that you're interested in tackling it without any commitment and we can check in half time how it is going. That's why I want to leave this up to you. You know best how much time you can and want to invest. |
is this publicly available? (or could you share some high level details?) |
I plan to create an umbrella issue for the formatter on Monday |
python 3.8 and https://bugs.python.org/issue36817 (note no pep "was needed") added self documenting expressions for f-strings
f"{a=}" is now short-hand for f"a={a}"
in the parser the self-documented syntax above generates the same expression nodes as the expanded version, which (i think?) means we can't distinguish them. i suppose we need to actually preserve this info in the ast, e.g. by adding self_documenting (currently a free var in the parser) to the node or similar
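The same indistinguishability can be observed with CPython's own `ast` module, which also synthesises a string constant for the debug text. The sketch below uses only the standard library, not Ruff's parser, and the helper name `dump` is chosen for illustration. The expanded form is written with the `!r` conversion, since a bare `=` applies `repr` by default (the missing conversion is acknowledged in the comments).

```python
import ast


def dump(source: str) -> str:
    """Dump the AST of a single expression, without position attributes."""
    return ast.dump(ast.parse(source, mode="eval").body)


debug = dump('f"{a=}"')
expanded = dump('f"a={a!r}"')

print(debug)
print(debug == expanded)
```

Only the position attributes differ between the two trees, which is why the thread goes on to discuss ranges and extra metadata on the formatted value.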