[clang-format] Fix a bug in annotating angles containing FatArrow #108671

owenca · 2024-09-14T02:37:27Z

Fixes #108536.

Fixes llvm#108536.

llvmbot · 2024-09-14T02:37:59Z

@llvm/pr-subscribers-clang-format

Author: Owen Pan (owenca)

Changes

Fixes #108536.

Full diff: https://github.com/llvm/llvm-project/pull/108671.diff

2 Files Affected:

(modified) clang/lib/Format/TokenAnnotator.cpp (+6-4)
(modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+9)

diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index dfa703aed0d34d..4313cc20dd61a3 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -189,7 +189,8 @@ class AnnotatingParser {
       next();
     }
 
-    for (bool SeenTernaryOperator = false; CurrentToken;) {
+    for (bool SeenTernaryOperator = false, SeenFatArrow = false;
+         CurrentToken;) {
       const bool InExpr = Contexts[Contexts.size() - 2].IsExpression;
       if (CurrentToken->is(tok::greater)) {
         const auto *Next = CurrentToken->Next;
@@ -243,7 +244,7 @@ class AnnotatingParser {
       // operator that was misinterpreted because we are parsing template
       // parameters.
       // FIXME: This is getting out of hand, write a decent parser.
-      if (InExpr && !Line.startsWith(tok::kw_template) &&
+      if (InExpr && !SeenFatArrow && !Line.startsWith(tok::kw_template) &&
           Prev.is(TT_BinaryOperator)) {
         const auto Precedence = Prev.getPrecedence();
         if (Precedence > prec::Conditional && Precedence < prec::Relational)
@@ -251,6 +252,8 @@ class AnnotatingParser {
       }
       if (Prev.isOneOf(tok::question, tok::colon) && !Style.isProto())
         SeenTernaryOperator = true;
+      else if (Prev.is(TT_FatArrow))
+        SeenFatArrow = true;
       updateParameterCount(Left, CurrentToken);
       if (Style.Language == FormatStyle::LK_Proto) {
         if (FormatToken *Previous = CurrentToken->getPreviousNonComment()) {
@@ -260,8 +263,7 @@ class AnnotatingParser {
             Previous->setType(TT_SelectorName);
           }
         }
-      }
-      if (Style.isTableGen()) {
+      } else if (Style.isTableGen()) {
         if (CurrentToken->isOneOf(tok::comma, tok::equal)) {
           // They appear as separators. Unless they are not in class definition.
           next();
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 5c28e3a4ea5a1f..c3ac5df12d085c 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -616,6 +616,15 @@ TEST_F(TokenAnnotatorTest, UnderstandsTernaryInTemplate) {
   EXPECT_TOKEN(Tokens[8], tok::greater, TT_TemplateCloser);
 }
 
+TEST_F(TokenAnnotatorTest, FatArrowInAngleBrackets) {
+  auto Tokens = annotate("foo = new Bar<(id: int) => X | Y>();",
+                         getLLVMStyle(FormatStyle::LK_JavaScript));
+  ASSERT_EQ(Tokens.size(), 19u) << Tokens;
+  EXPECT_TOKEN(Tokens[4], tok::less, TT_TemplateOpener);
+  EXPECT_TOKEN(Tokens[10], tok::equal, TT_FatArrow);
+  EXPECT_TOKEN(Tokens[14], tok::greater, TT_TemplateCloser);
+}
+
 TEST_F(TokenAnnotatorTest, UnderstandsNonTemplateAngleBrackets) {
   auto Tokens = annotate("return a < b && c > d;");
   ASSERT_EQ(Tokens.size(), 10u) << Tokens;

kadircet · 2024-09-16T07:55:24Z

clang/lib/Format/TokenAnnotator.cpp

          Prev.is(TT_BinaryOperator)) {
        const auto Precedence = Prev.getPrecedence();
        if (Precedence > prec::Conditional && Precedence < prec::Relational)
          return false;
      }
      if (Prev.isOneOf(tok::question, tok::colon) && !Style.isProto())
        SeenTernaryOperator = true;
+      else if (Prev.is(TT_FatArrow))


i am not sure if the regression is specific to fat-arrow here (sorry for not putting much effort into the reproducer). the original change seem to changed behavior for handling of any bitwise operators. so I think something like:

auto foo = new Bar<kFoo | kBar>();

would also be formatted differently

How realistic is your C++ example? In general, I don't think we can really tell if the angle brackets are template opener/closer or comparison operators without checking more (and very specific) contexts, e.g. the new keyword before Bar and the empty parentheses after >. In absence of that, it seems more likely that the angles are comparison operators. See e.g. this test case.

i don't think new or empty parantheses are a must either, the logic around here is just checking that <> are inside an expression, hence that kind of code pattern is likely quite common as people tend to do arithmetic via metaprogramming using these patterns. e.g:

constexpr auto foo() { return FixedInt<N | M>(foo); }

this will be formatted as the following now:

constexpr auto foo() { return FixedInt < N | M > (foo); }

Without extra contexts, how can you tell whether the less/greater below are comparison operators or angle brackets?

return a < b | c > (d ^ e);

sorry I wasn't saying that FixedInt<N | M>(foo); is clearly a template argument, it's definitely ambiguous in absence of name-resolution and there isn't much clang-format can do here.

this kind of backward incompatible changes are making it harder to update clang-format to newer versions in large codebases. as clang-format right now is trading off one set of false-formatting with other. hence my understanding for the project has been to stick with existing formatting, if we're heuristically detecting some patterns. so that behavior on existing formatted-code doesn't regress. maybe that understanding has changed over time, or wasn't there at all. But this bug and requested changes in your PR was solely made with that assumption (as many others I've been filing over the past years).

hence I am not expecting a "fix" for the issue here, but I guess a narrowed down version of other patch, where it actually preserves behavior with old clang-format in uncertain cases.

It seems a general rule of thumb that the more complicated the code the harder the job clang-format seems to have.

Whilst I admire the programming prowess of such code, I generally struggle to read it no matter how its formatted. My view has often been when clang-format does fail, it might say more about the code than about clang-format. For the cases we butcher (and we will from time to time) there is still //clang-format off until we can resolve it.

that said I believe we can only walk towards clang-format doing the right thing one patch at a time, and I don't believe we should let "Perfection be the enemy of good".

I personally like to follow the "Beyonce" rule, "if you liked it you should have put a test on it". IMHO as long as our tests pass, and the test cases keep growing, and we don't allow tests to change without good reason, and all new code has tests, then I believe we are good.

Some person, with some specific failure will always happen, given how much code is likely clang-formatted daily. But honestly I don't think of it as a true regression unless the test is failing, because often regressions are caused by
us now interpreting better, annotating better, or just fixing something and often what previously worked by chance before doesn't now. That's not a real regression its just bad luck and a lack of a test to keep us honest.

While I don't really like code becoming a glorious game of whack-a-mole I think our intentions are honorable and we should make forward process, even if its one patch at a time.

When this does happen it means that previous use case wasn't covered by a test. It needs to be, and we can fix it accordingly adding the missing test which ultimately should give us more robustness around the unintended "bad-luck"

However the clang-format users community must remember, we are a core team of 4 with really @owenca doing what I consider to be the majority share, (sorry I'm so busy with my day job. I only get time for drive by reviewing, but I'm trying to continue to give moral support), as far as I am aware none of us are paid to work on clang-format, and despite all the big IDE players using clang-format integrated into their commercial offerings I see little in the way of contribution from them. In fact at last years CppCon I saw one even openly criticize clang-format with vague obtuse corner cases (A cheap shot in my view).

Its my belief that "our modus operandi" is as correct as it can be given our time and resources. We never intentionally break things, but its extremely hard to find a large enough code base that we can test against especially given that not even all of LLVM or even Clang is clang-format clean, enough to uncover issues. This is especially difficult to find code using new C++ language concepts, modules, complex template scenarios and lambdas.

Now if anyone wants to persuade all of LLVM that the entire repo should be clang-formatted top to bottom, then maybe we'd have more code examples to ensure we introduced less issues.

If you are interested in the "Cheap Shot" ... https://www.youtube.com/watch?v=NnQraMtpvws This is from a JetBrains employee which frankly I'd have expected better from given the feedback I gave them on their blog https://blog.jetbrains.com/clion/2023/12/a-clangformat-story-and-the-third-clion-nova-update/

When we update clang-format in chromium, we format a chunk of the codebase with old and new clang-format versions and look at the diff. We also ran into this regression: https://chromium-review.googlesource.com/c/chromium/src/+/6174847/1..2/base/allocator/partition_allocator/src/partition_alloc/shim/allocator_shim_default_dispatch_to_partition_alloc.cc

We reported #123144 which pointed us to here.

While it's true that a<b | c>(d) could be a < b | c > (d), using | (instead of ||) with comparisons and using parenthesized one-element expressions seems less likely than this being a template function call with explicit template arguments that are a bitwise or, right? Maybe the heuristics could be tweaked to pick the template personality of <> when that seems "more likely"?

I had the same idea but didn't get a chance to implement it.

See #124438.

[clang-format] Fix a bug in annotating angles containing FatArrow

90a2015

Fixes llvm#108536.

llvmbot added the clang-format label Sep 14, 2024

kadircet requested changes Sep 16, 2024

View reviewed changes

kadircet mentioned this pull request Sep 23, 2024

Regression in typescript generics formatting #108536

Closed

owenca requested review from mydeveloperday and HazardyKnusperkeks October 1, 2024 04:32

mydeveloperday approved these changes Oct 2, 2024

View reviewed changes

owenca mentioned this pull request Jan 17, 2025

clang-format produces unnecessary spacing around template angle brackets for class member function calls. #123144

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[clang-format] Fix a bug in annotating angles containing FatArrow #108671

[clang-format] Fix a bug in annotating angles containing FatArrow #108671

owenca commented Sep 14, 2024

llvmbot commented Sep 14, 2024

kadircet Sep 16, 2024

owenca Sep 17, 2024

kadircet Sep 17, 2024

owenca Sep 24, 2024

kadircet Sep 24, 2024

mydeveloperday Oct 2, 2024

mydeveloperday Oct 4, 2024

nico Jan 23, 2025

owenca Jan 24, 2025

owenca Jan 26, 2025

[clang-format] Fix a bug in annotating angles containing FatArrow #108671

Are you sure you want to change the base?

[clang-format] Fix a bug in annotating angles containing FatArrow #108671

Conversation

owenca commented Sep 14, 2024

llvmbot commented Sep 14, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment