Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New AST nodes for f-string elements #8835

Merged
merged 12 commits into from
Dec 7, 2023
Merged

New AST nodes for f-string elements #8835

merged 12 commits into from
Dec 7, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Nov 24, 2023

Rebase of #6365 authored by @davidszotten.

Summary

This PR updates the AST structure for an f-string elements.

The main motivation behind this change is to have a dedicated node for the string part of an f-string. Previously, the existing ExprStringLiteral node was used for this purpose which isn't exactly correct. The ExprStringLiteral node should include the quotes as well in the range but the f-string literal element doesn't include the quote as it's a specific part within an f-string. For example,

f"foo {x}"
# ^^^^
# This is the literal part of an f-string

The introduction of FStringElement enum is helpful which represent either the literal part or the expression part of an f-string.

Rule Updates

This means that there'll be two nodes representing a string depending on the context. One for a normal string literal while the other is a string literal within an f-string. The AST checker is updated to accommodate this change. The rules which work on string literal are updated to check on the literal part of f-string as well.

Notes

  1. The Expr::is_literal_expr method would check for ExprStringLiteral and return true if so. But now that we don't represent the literal part of an f-string using that node, this improves the method's behavior and confines to the actual expression. We do have the FStringElement::is_literal method.
  2. We avoid checking if we're in a f-string context before adding to string_type_definitions because the f-string literal is now a dedicated node and not part of Expr.
  3. Annotations cannot use f-string so we avoid changing any rules which work on annotation and checks for ExprStringLiteral.

Test Plan

  • All references of Expr::StringLiteral were checked to see if any of the rules require updating to account for the f-string literal element node.
  • New test cases are added for rules which check against the literal part of an f-string.
  • Check the ecosystem results and ensure it remains unchanged.

Performance

There's a performance penalty in the parser. The reason for this remains unknown as it seems that the generated assembly code is now different for the __reduce154 function. The reduce function body is just popping the ParenthesizedExpr on top of the stack and pushing it with the new location.

  • The size of FStringElement enum is the same as Expr which is what it replaces in FString::format_spec
  • The size of FStringExpressionElement is the same as ExprFormattedValue which is what it replaces

I tried reducing the Expr enum from 80 bytes to 72 bytes but it hardly resulted in any performance gain. The difference can be seen here:

Backtracking

I tried backtracking the changes to see if any of the isolated change produced this regression. The problem here is that the overall change is so small that there's only a single checkpoint where I can backtrack and that checkpoint results in the same regression. This checkpoint is to revert using Expr to the FString::format_spec field. After this point, the change would revert back to the original implementation.

Review process

The review process is similar to #7927. The first set of commits update the node structure, parser, and related AST files. Then, further commits update the linter and formatter part to account for the AST change.

Copy link

codspeed-hq bot commented Nov 24, 2023

CodSpeed Performance Report

Merging #8835 will degrade performances by 6.65%

Comparing dhruv/fstring-element (b6be853) with main (af88ffc)

Summary

❌ 6 regressions
✅ 24 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main dhruv/fstring-element Change
linter/default-rules[numpy/ctypeslib.py] 16.1 ms 17.2 ms -6.65%
parser[numpy/ctypeslib.py] 11.1 ms 11.8 ms -5.86%
parser[pydantic/types.py] 24.8 ms 26.3 ms -5.59%
parser[numpy/globals.py] 1.1 ms 1.1 ms -4.97%
parser[large/dataset.py] 64.8 ms 68.7 ms -5.7%
parser[unicode/pypinyin.py] 3.9 ms 4.1 ms -5.58%

Copy link
Contributor

github-actions bot commented Nov 24, 2023

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Base automatically changed from dhruv/remove-allow to main November 25, 2023 00:09
@dhruvmanila

This comment was marked as resolved.

@dhruvmanila dhruvmanila added the internal An internal refactor or improvement label Nov 25, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-element branch 6 times, most recently from 8d52723 to 7cf4999 Compare November 26, 2023 17:12
@dhruvmanila dhruvmanila reopened this Nov 27, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-element branch 5 times, most recently from ebb8d44 to b9aa0da Compare December 1, 2023 22:32
@MichaReiser
Copy link
Member

MichaReiser commented Dec 4, 2023

Comparing the generated llvm instructions between main and this PR

< define internal fastcc void @_ZN18ruff_python_parser6python12__parse__Top11__reduce15417hd406a4d6cb8441daE(ptr noalias nocapture noundef align 8 dereferenceable(24) %__symbols) unnamed_addr #2 personality ptr @rust_eh_personality {
---
> define internal fastcc void @_ZN18ruff_python_parser6python12__parse__Top11__reduce15417h429ba5e9ae4663f4E(ptr noalias nocapture noundef align 8 dereferenceable(24) %__symbols) unnamed_addr #2 personality ptr @rust_eh_personality {
6,10c6,12
<   %_10.sroa.0 = alloca [192 x i8], align 8
<   tail call void @llvm.experimental.noalias.scope.decl(metadata !21616)
<   call void @llvm.lifetime.start.p0(i64 208, ptr nonnull %_2.i), !noalias !21619
<   tail call void @llvm.experimental.noalias.scope.decl(metadata !21621)
<   tail call void @llvm.experimental.noalias.scope.decl(metadata !21624)
---
>   %__nt = alloca %"ruff_python_ast::nodes::ParenthesizedExpr", align 8
>   %_11.sroa.4 = alloca [23 x i32], align 4
>   %_10.sroa.4 = alloca [23 x i32], align 4
>   tail call void @llvm.experimental.noalias.scope.decl(metadata !21750)
>   call void @llvm.lifetime.start.p0(i64 208, ptr nonnull %_2.i), !noalias !21753
>   tail call void @llvm.experimental.noalias.scope.decl(metadata !21755)
>   tail call void @llvm.experimental.noalias.scope.decl(metadata !21758)
12c14
<   %_2.i.i = load i64, ptr %0, align 8, !alias.scope !21626, !noalias !21627, !noundef !14
---
>   %_2.i.i = load i64, ptr %0, align 8, !alias.scope !21760, !noalias !21761, !noundef !14
14c16
<   br i1 %1, label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.thread.i", label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.i"
---
>   br i1 %1, label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.thread.i", label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.i"
16,18c18,19
< "_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.thread.i": ; preds = %start
<   %2 = getelementptr inbounds %"core::option::Option<(ruff_text_size::size::TextSize, python::__parse__Top::__Symbol, ruff_text_size::size::TextSize)>", ptr %_2.i, i64 0, i32 1
<   store i8 104, ptr %2, align 8, !alias.scope !21621, !noalias !21628
---
> "_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.thread.i": ; preds = %start
>   store i32 137, ptr %_2.i, align 8, !alias.scope !21755, !noalias !21762
21,26c22,27
< "_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.i": ; preds = %start
<   %3 = add i64 %_2.i.i, -1
<   store i64 %3, ptr %0, align 8, !alias.scope !21626, !noalias !21627
<   %4 = getelementptr inbounds { ptr, i64 }, ptr %__symbols, i64 0, i32 1
<   %5 = load i64, ptr %4, align 8, !noalias !14, !noundef !14
<   %_3.i.i = icmp ult i64 %3, %5
---
> "_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.i": ; preds = %start
>   %2 = add i64 %_2.i.i, -1
>   store i64 %2, ptr %0, align 8, !alias.scope !21760, !noalias !21761
>   %3 = getelementptr inbounds { ptr, i64 }, ptr %__symbols, i64 0, i32 1
>   %4 = load i64, ptr %3, align 8, !noalias !14, !noundef !14
>   %_3.i.i = icmp ult i64 %2, %4
29,34c30,34
<   %src.i.i = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %self1.i.i, i64 %3
<   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(208) %_2.i, ptr noundef nonnull align 8 dereferenceable(208) %src.i.i, i64 208, i1 false), !noalias !21628
<   %.phi.trans.insert.i = getelementptr inbounds %"core::option::Option<(ruff_text_size::size::TextSize, python::__parse__Top::__Symbol, ruff_text_size::size::TextSize)>", ptr %_2.i, i64 0, i32 1
<   %.pre.i = load i8, ptr %.phi.trans.insert.i, align 8, !range !2745, !noalias !21619
<   %6 = icmp eq i8 %.pre.i, 17
<   br i1 %6, label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h6b55ac9fa3bd2a2aE.exit", label %bb2.i
---
>   %src.i.i = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %self1.i.i, i64 %2
>   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(208) %_2.i, ptr noundef nonnull align 8 dereferenceable(208) %src.i.i, i64 208, i1 false), !noalias !21762
>   %.pr.i = load i32, ptr %_2.i, align 8, !noalias !21753
>   %cond.i = icmp eq i32 %.pr.i, 47
>   br i1 %cond.i, label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h6f744e2ae0097c52E.exit", label %bb2.i
36,37c36,37
< bb2.i:                                            ; preds = %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.i", %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.thread.i"
<   %7 = phi i8 [ 104, %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.thread.i" ], [ %.pre.i, %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.i" ]
---
> bb2.i:                                            ; preds = %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.i", %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.thread.i"
>   %5 = phi i32 [ 137, %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.thread.i" ], [ %.pr.i, %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.i" ]
40c40
<           to label %unreachable.i unwind label %cleanup.i, !noalias !21619
---
>           to label %unreachable.i unwind label %cleanup.i, !noalias !21753
43c43
<   %8 = landingpad { ptr, i32 }
---
>   %6 = landingpad { ptr, i32 }
45,46c45,46
<   %9 = icmp eq i8 %7, 104
<   br i1 %9, label %bb6.i, label %bb2.i3.i
---
>   %7 = icmp eq i32 %5, 137
>   br i1 %7, label %bb6.i, label %bb2.i2.i
48c48
< bb2.i3.i:                                         ; preds = %cleanup.i
---
> bb2.i2.i:                                         ; preds = %cleanup.i
50,51c50,51
<   invoke fastcc void @"_ZN4core3ptr71drop_in_place$LT$ruff_python_parser..python..__parse__Top..__Symbol$GT$17h7ca4ec50464f112bE"(ptr noalias noundef nonnull align 8 dereferenceable(200) %_2.i)
<           to label %bb6.i unwind label %terminate.i, !noalias !21619
---
>   invoke fastcc void @"_ZN4core3ptr71drop_in_place$LT$ruff_python_parser..python..__parse__Top..__Symbol$GT$17ha84e15138d5fb157E"(ptr noalias noundef nonnull align 8 dereferenceable(200) %_2.i)
>           to label %bb6.i unwind label %terminate.i, !noalias !21753
56,57c56,57
< terminate.i:                                      ; preds = %bb2.i3.i
<   %10 = landingpad { ptr, i32 }
---
> terminate.i:                                      ; preds = %bb2.i2.i
>   %8 = landingpad { ptr, i32 }
60c60
<   call void @_ZN4core9panicking16panic_in_cleanup17he7753e109d98c84aE() #34, !noalias !21619
---
>   call void @_ZN4core9panicking16panic_in_cleanup17he7753e109d98c84aE() #34, !noalias !21753
63,64c63,64
< bb6.i:                                            ; preds = %bb2.i3.i, %cleanup.i
<   resume { ptr, i32 } %8
---
> bb6.i:                                            ; preds = %bb2.i2.i, %cleanup.i
>   resume { ptr, i32 } %6
66,77c66,83
< "_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h6b55ac9fa3bd2a2aE.exit": ; preds = %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h34c3d6d4499edab0E.exit.i"
<   %11 = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %_2.i, i64 0, i32 1
<   %__l.i = load i32, ptr %11, align 8, !noalias !21619, !noundef !14
<   %12 = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %_2.i, i64 0, i32 2
<   %__r.i = load i32, ptr %12, align 4, !noalias !21619, !noundef !14
<   call void @llvm.lifetime.start.p0(i64 192, ptr nonnull %_10.sroa.0)
<   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(88) %_10.sroa.0, ptr noundef nonnull align 8 dereferenceable(88) %src.i.i, i64 88, i1 false)
<   call void @llvm.lifetime.end.p0(i64 208, ptr nonnull %_2.i), !noalias !21619
<   tail call void @llvm.experimental.noalias.scope.decl(metadata !21629)
<   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(192) %src.i.i, ptr noundef nonnull align 8 dereferenceable(192) %_10.sroa.0, i64 192, i1 false), !noalias !21629
<   %_10.sroa.4.0.end.i.sroa_idx = getelementptr inbounds i8, ptr %src.i.i, i64 192
<   store i8 17, ptr %_10.sroa.4.0.end.i.sroa_idx, align 8, !noalias !21629
---
> "_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h6f744e2ae0097c52E.exit": ; preds = %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$3pop17h91ca183f74ada0f2E.exit.i"
>   %9 = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %_2.i, i64 0, i32 1
>   %__l.i = load i32, ptr %9, align 8, !noalias !21753, !noundef !14
>   %10 = getelementptr inbounds { %"python::__parse__Top::__Symbol", i32, i32 }, ptr %_2.i, i64 0, i32 2
>   %__r.i = load i32, ptr %10, align 4, !noalias !21753, !noundef !14
>   %11 = getelementptr inbounds %"python::__parse__Top::__Symbol::Variant15", ptr %_2.i, i64 0, i32 1
>   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(88) %__nt, ptr noundef nonnull align 8 dereferenceable(88) %11, i64 88, i1 false)
>   call void @llvm.lifetime.end.p0(i64 208, ptr nonnull %_2.i), !noalias !21753
>   call void @llvm.lifetime.start.p0(i64 92, ptr nonnull %_10.sroa.4)
>   call void @llvm.lifetime.start.p0(i64 92, ptr nonnull %_11.sroa.4)
>   %_11.sroa.4.8.sroa_idx = getelementptr inbounds i8, ptr %_11.sroa.4, i64 4
>   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(88) %_11.sroa.4.8.sroa_idx, ptr noundef nonnull align 8 dereferenceable(88) %__nt, i64 88, i1 false)
>   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(92) %_10.sroa.4, ptr noundef nonnull align 4 dereferenceable(92) %_11.sroa.4, i64 92, i1 false)
>   call void @llvm.lifetime.end.p0(i64 92, ptr nonnull %_11.sroa.4)
>   tail call void @llvm.experimental.noalias.scope.decl(metadata !21763)
>   store i32 47, ptr %src.i.i, align 8, !noalias !21763
>   %_10.sroa.4.0.end.i.sroa_idx = getelementptr inbounds i8, ptr %src.i.i, i64 4
>   call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(92) %_10.sroa.4.0.end.i.sroa_idx, ptr noundef nonnull align 4 dereferenceable(92) %_10.sroa.4, i64 92, i1 false), !noalias !21763
79c85
<   store i32 %__l.i, ptr %_10.sroa.6.0.end.i.sroa_idx, align 8, !noalias !21629
---
>   store i32 %__l.i, ptr %_10.sroa.6.0.end.i.sroa_idx, align 8, !noalias !21763
81,83c87,89
<   store i32 %__r.i, ptr %_10.sroa.7.0.end.i.sroa_idx, align 4, !noalias !21629
<   store i64 %_2.i.i, ptr %0, align 8, !alias.scope !21629, !noalias !21632
<   call void @llvm.lifetime.end.p0(i64 192, ptr nonnull %_10.sroa.0)
---
>   store i32 %__r.i, ptr %_10.sroa.7.0.end.i.sroa_idx, align 4, !noalias !21763
>   store i64 %_2.i.i, ptr %0, align 8, !alias.scope !21763, !noalias !21766
>   call void @llvm.lifetime.end.p0(i64 92, ptr nonnull %_10.sroa.4)
(venv) 

I did compare the generated assembly and the new assembly indeed is longer for reduce154 than it used to.

8c8
<  cbz     x8, LBB835_3
---
>  cbz     x8, LBB839_3
28,53c28,70
<  ldrb    w19, [sp, #384]
<  cmp     w19, #17
<  b.ne    LBB835_4
<  ldr     w10, [sp, #392]
<  ldr     w11, [sp, #396]
<  ldr     x12, [x9, #80]
<  str     x12, [sp, #80]
<  ldp     q0, q1, [x9, #32]
<  stp     q0, q1, [sp, #32]
<  ldr     q2, [x9, #64]
<  str     q2, [sp, #64]
<  ldp     q3, q4, [sp, #144]
<  ldp     q6, q5, [sp, #112]
<  stp     q5, q3, [x9, #128]
<  ldr     q3, [sp, #176]
<  stp     q4, q3, [x9, #160]
<  ldp     q3, q4, [sp, #80]
<  stp     q2, q3, [x9, #64]
<  ldp     q2, q3, [x9]
<  stp     q2, q3, [sp]
<  stp     q4, q6, [x9, #96]
<  stp     q2, q3, [x9]
<  stp     q0, q1, [x9, #32]
<  mov     w12, #17
<  strb    w12, [x9, #192]
<  stp     w10, w11, [x9, #200]
---
>  ldr     w19, [sp, #192]
>  cmp     w19, #47
>  b.ne    LBB839_4
>  add     x10, sp, #192
>  ldr     w11, [sp, #392]
>  ldr     w12, [sp, #396]
>  ldur    q0, [x10, #40]
>  ldur    q1, [x10, #56]
>  stp     q0, q1, [sp, #128]
>  ldur    q2, [x10, #72]
>  str     q2, [sp, #160]
>  ldr     x13, [sp, #280]
>  str     x13, [sp, #176]
>  ldur    q3, [x10, #8]
>  ldur    q4, [x10, #24]
>  stp     q3, q4, [sp, #96]
>  stur    x13, [x10, #84]
>  stur    q2, [x10, #68]
>  stur    q1, [x10, #52]
>  stur    q0, [x10, #36]
>  stur    q4, [x10, #20]
>  stur    q3, [x10, #4]
>  ldur    q0, [x10, #76]
>  stur    q0, [sp, #76]
>  ldr     q2, [sp, #256]
>  ldp     q3, q0, [sp, #224]
>  stp     q0, q2, [sp, #48]
>  ldp     q0, q1, [sp, #192]
>  str     q0, [sp]
>  stp     q1, q3, [sp, #16]
>  mov     w10, #47
>  str     w10, [x9]
>  ldp     q0, q1, [sp, #32]
>  stur    q0, [x9, #36]
>  stur    q1, [x9, #52]
>  ldr     q0, [sp, #64]
>  stur    q0, [x9, #68]
>  ldur    q0, [sp, #76]
>  str     q0, [x9, #80]
>  ldp     q0, q1, [sp]
>  stur    q0, [x9, #4]
>  stur    q1, [x9, #20]
>  stp     w11, w12, [x9, #200]
60,64c77,81
< LBB835_3:
<  mov     w8, #104
<  mov     w19, #104
<  strb    w8, [sp, #384]
< LBB835_4:
---
> LBB839_3:
>  mov     w8, #137
>  mov     w19, #137
>  str     w8, [sp, #192]
> LBB839_4:
67c84
< LBB835_6:
---
> LBB839_6:
69,70c86,87
<  cmp     w19, #104
<  b.eq    LBB835_8
---
>  cmp     w19, #137
>  b.eq    LBB839_8
72,73c89,90
<  bl      __ZN4core3ptr71drop_in_place$LT$ruff_python_parser..python..__parse__Top..__Symbol$GT$17h7ca4ec50464f112bE
< LBB835_8:
---
>  bl      __ZN4core3ptr71drop_in_place$LT$ruff_python_parser..python..__parse__Top..__Symbol$GT$17ha84e15138d5fb157E
> LBB839_8:
76c93
< LBB835_9:
---
> LBB839_9:

It seems that the new version uses ldr instead of ldrb on ARM which loads the entire 4 bytes instead of just 8bits into memory. I'm no good at reading assembly...

@MichaReiser
Copy link
Member

MichaReiser commented Dec 4, 2023

Some findings from reading through the python.rs changes:

  • There are now 104 Symbol variants compared to 101 before (one for Option<FStringFormatSpec>, one for FStringFormatSpec, and one for FStringElement)
  • The Symbol size remains unchanged at 200 bytes (My hope was that it changed)

@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-element branch 2 times, most recently from f4ed639 to 765c62d Compare December 4, 2023 18:44
@dhruvmanila dhruvmanila marked this pull request as ready for review December 4, 2023 21:04
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for driving this long standing PR forward. I like what I see in the parser diffs.

The majority of comments are nits, but there's the question whather we should visit the FStringLiteralElement in visitors and if some lint rules should check if it si a single element fstring to avoid false positives.

I would prefer for @charliermarsh to also review the linter changes, considering that he has more context than I.

davidszotten and others added 12 commits December 6, 2023 11:16
This commit adds the new `FStringElement` enum which is either a literal
element or an expression element. To give context, an f-string such as
`f"foo {x}"` is made up of the literal ("foo ") and an expression (x)
elements.

The change here is to have a dedicated literal node for a string inside
an f-string. Earlier, the existing `StringLiteral` node was being used
to represent this. The problem with that is the f-string range doesn't
include the quote while any other string literal would include the
quotes in the range. This creates a problem in the formatter when the
f-string formatting is implemented.
Additional tests for rules which checks within the literal parts of
f-strings. These rules were updated in the previous commit and new test
cases are added here.
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-element branch from bd821b8 to b6be853 Compare December 6, 2023 17:17
if !self.semantic.in_f_string()
&& !self.semantic.in_typing_literal()
if !self.semantic.in_typing_literal()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The in_f_string check isn't required now that there's a dedicated node for the literal part of f-string and it isn't the same as ExprStringLiteral.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart.

@@ -1006,6 +1007,30 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
pyupgrade::rules::unicode_kind_prefix(checker, string_literal);
}
}
for literal in value.elements().filter_map(|element| element.as_literal()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens here if we have, like func(f"/tmp" f"/bad"), and the rule is matching against "/tmp/bad"? I guess that would now be a false negative? Or how do we handle concatenations here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we already wouldn't flag that on main, though I'm curious if it should be considered a bug. E.g., for S104, should we flag f"0.0" f".0.0"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point and yes it will be a false negative now. It's not just f"/tmp" f"/bad" but also "/tmp" f"/bad" (string and f-string)

We can introduce a method which iterates over the concatenated literal parts of an f-string. For example,

"foo" f"bar {x} baz" "end"

The above code would return two strings: "foobar " and " bazend". Here, I don't see a way to avoid string allocation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving ahead with this limitation for now but we can revisit if it proves to be a problem. My hunch is this shouldn't be but you never know 🤷‍♂️

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great.

@dhruvmanila dhruvmanila merged commit cdac90e into main Dec 7, 2023
16 of 17 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/fstring-element branch December 7, 2023 16:28
dhruvmanila added a commit that referenced this pull request Dec 7, 2023
## Summary

This PR introduces a new `StringLike` enum which is a narrow type to
indicate string-like nodes. These includes the string literals, bytes
literals, and the literal parts of f-strings.

The main motivation behind this is to avoid repetition of rule calling
in the AST checker. We add a new `analyze::string_like` function which
takes in the enum and calls all the respective rule functions which
expects atleast 2 of the variants of this enum.

I'm open to discarding this if others think it's not that useful at this
stage as currently only 3 rules require these nodes.

As suggested
[here](#8835 (comment))
and
[here](#8835 (comment)).

## Test Plan

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants