-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang-format] Fix a bug in annotating angles containing FatArrow #108671
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-clang-format Author: Owen Pan (owenca) ChangesFixes #108536. Full diff: https://github.com/llvm/llvm-project/pull/108671.diff 2 Files Affected:
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index dfa703aed0d34d..4313cc20dd61a3 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -189,7 +189,8 @@ class AnnotatingParser {
next();
}
- for (bool SeenTernaryOperator = false; CurrentToken;) {
+ for (bool SeenTernaryOperator = false, SeenFatArrow = false;
+ CurrentToken;) {
const bool InExpr = Contexts[Contexts.size() - 2].IsExpression;
if (CurrentToken->is(tok::greater)) {
const auto *Next = CurrentToken->Next;
@@ -243,7 +244,7 @@ class AnnotatingParser {
// operator that was misinterpreted because we are parsing template
// parameters.
// FIXME: This is getting out of hand, write a decent parser.
- if (InExpr && !Line.startsWith(tok::kw_template) &&
+ if (InExpr && !SeenFatArrow && !Line.startsWith(tok::kw_template) &&
Prev.is(TT_BinaryOperator)) {
const auto Precedence = Prev.getPrecedence();
if (Precedence > prec::Conditional && Precedence < prec::Relational)
@@ -251,6 +252,8 @@ class AnnotatingParser {
}
if (Prev.isOneOf(tok::question, tok::colon) && !Style.isProto())
SeenTernaryOperator = true;
+ else if (Prev.is(TT_FatArrow))
+ SeenFatArrow = true;
updateParameterCount(Left, CurrentToken);
if (Style.Language == FormatStyle::LK_Proto) {
if (FormatToken *Previous = CurrentToken->getPreviousNonComment()) {
@@ -260,8 +263,7 @@ class AnnotatingParser {
Previous->setType(TT_SelectorName);
}
}
- }
- if (Style.isTableGen()) {
+ } else if (Style.isTableGen()) {
if (CurrentToken->isOneOf(tok::comma, tok::equal)) {
// They appear as separators. Unless they are not in class definition.
next();
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 5c28e3a4ea5a1f..c3ac5df12d085c 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -616,6 +616,15 @@ TEST_F(TokenAnnotatorTest, UnderstandsTernaryInTemplate) {
EXPECT_TOKEN(Tokens[8], tok::greater, TT_TemplateCloser);
}
+TEST_F(TokenAnnotatorTest, FatArrowInAngleBrackets) {
+ auto Tokens = annotate("foo = new Bar<(id: int) => X | Y>();",
+ getLLVMStyle(FormatStyle::LK_JavaScript));
+ ASSERT_EQ(Tokens.size(), 19u) << Tokens;
+ EXPECT_TOKEN(Tokens[4], tok::less, TT_TemplateOpener);
+ EXPECT_TOKEN(Tokens[10], tok::equal, TT_FatArrow);
+ EXPECT_TOKEN(Tokens[14], tok::greater, TT_TemplateCloser);
+}
+
TEST_F(TokenAnnotatorTest, UnderstandsNonTemplateAngleBrackets) {
auto Tokens = annotate("return a < b && c > d;");
ASSERT_EQ(Tokens.size(), 10u) << Tokens;
|
Prev.is(TT_BinaryOperator)) { | ||
const auto Precedence = Prev.getPrecedence(); | ||
if (Precedence > prec::Conditional && Precedence < prec::Relational) | ||
return false; | ||
} | ||
if (Prev.isOneOf(tok::question, tok::colon) && !Style.isProto()) | ||
SeenTernaryOperator = true; | ||
else if (Prev.is(TT_FatArrow)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure if the regression is specific to fat-arrow here (sorry for not putting much effort into the reproducer). the original change seem to changed behavior for handling of any bitwise operators. so I think something like:
auto foo = new Bar<kFoo | kBar>();
would also be formatted differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How realistic is your C++ example? In general, I don't think we can really tell if the angle brackets are template opener/closer or comparison operators without checking more (and very specific) contexts, e.g. the new
keyword before Bar
and the empty parentheses after >
. In absence of that, it seems more likely that the angles are comparison operators. See e.g. this test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think new or empty parantheses are a must either, the logic around here is just checking that <>
are inside an expression, hence that kind of code pattern is likely quite common as people tend to do arithmetic via metaprogramming using these patterns. e.g:
constexpr auto foo() { return FixedInt<N | M>(foo); }
this will be formatted as the following now:
constexpr auto foo() { return FixedInt < N | M > (foo); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without extra contexts, how can you tell whether the less/greater below are comparison operators or angle brackets?
return a < b | c > (d ^ e);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I wasn't saying that FixedInt<N | M>(foo);
is clearly a template argument, it's definitely ambiguous in absence of name-resolution and there isn't much clang-format can do here.
this kind of backward incompatible changes are making it harder to update clang-format to newer versions in large codebases. as clang-format right now is trading off one set of false-formatting with other. hence my understanding for the project has been to stick with existing formatting, if we're heuristically detecting some patterns. so that behavior on existing formatted-code doesn't regress.
maybe that understanding has changed over time, or wasn't there at all. But this bug and requested changes in your PR was solely made with that assumption (as many others I've been filing over the past years).
hence I am not expecting a "fix" for the issue here, but I guess a narrowed down version of other patch, where it actually preserves behavior with old clang-format in uncertain cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a general rule of thumb that the more complicated the code the harder the job clang-format seems to have.
Whilst I admire the programming prowess of such code, I generally struggle to read it no matter how its formatted. My view has often been when clang-format does fail, it might say more about the code than about clang-format. For the cases we butcher (and we will from time to time) there is still //clang-format off until we can resolve it.
that said I believe we can only walk towards clang-format doing the right thing one patch at a time, and I don't believe we should let "Perfection be the enemy of good".
I personally like to follow the "Beyonce" rule, "if you liked it you should have put a test on it". IMHO as long as our tests pass, and the test cases keep growing, and we don't allow tests to change without good reason, and all new code has tests, then I believe we are good.
Some person, with some specific failure will always happen, given how much code is likely clang-formatted daily. But honestly I don't think of it as a true regression unless the test is failing, because often regressions are caused by
us now interpreting better, annotating better, or just fixing something and often what previously worked by chance before doesn't now. That's not a real regression its just bad luck and a lack of a test to keep us honest.
While I don't really like code becoming a glorious game of whack-a-mole I think our intentions are honorable and we should make forward process, even if its one patch at a time.
When this does happen it means that previous use case wasn't covered by a test. It needs to be, and we can fix it accordingly adding the missing test which ultimately should give us more robustness around the unintended "bad-luck"
However the clang-format users community must remember, we are a core team of 4 with really @owenca doing what I consider to be the majority share, (sorry I'm so busy with my day job. I only get time for drive by reviewing, but I'm trying to continue to give moral support), as far as I am aware none of us are paid to work on clang-format, and despite all the big IDE players using clang-format integrated into their commercial offerings I see little in the way of contribution from them. In fact at last years CppCon I saw one even openly criticize clang-format with vague obtuse corner cases (A cheap shot in my view).
Its my belief that "our modus operandi" is as correct as it can be given our time and resources. We never intentionally break things, but its extremely hard to find a large enough code base that we can test against especially given that not even all of LLVM or even Clang is clang-format clean, enough to uncover issues. This is especially difficult to find code using new C++ language concepts, modules, complex template scenarios and lambdas.
Now if anyone wants to persuade all of LLVM that the entire repo should be clang-formatted top to bottom, then maybe we'd have more code examples to ensure we introduced less issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are interested in the "Cheap Shot" ... https://www.youtube.com/watch?v=NnQraMtpvws This is from a JetBrains employee which frankly I'd have expected better from given the feedback I gave them on their blog https://blog.jetbrains.com/clion/2023/12/a-clangformat-story-and-the-third-clion-nova-update/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we update clang-format in chromium, we format a chunk of the codebase with old and new clang-format versions and look at the diff. We also ran into this regression: https://chromium-review.googlesource.com/c/chromium/src/+/6174847/1..2/base/allocator/partition_allocator/src/partition_alloc/shim/allocator_shim_default_dispatch_to_partition_alloc.cc
We reported #123144 which pointed us to here.
While it's true that a<b | c>(d)
could be a < b | c > (d)
, using |
(instead of ||
) with comparisons and using parenthesized one-element expressions seems less likely than this being a template function call with explicit template arguments that are a bitwise or, right? Maybe the heuristics could be tweaked to pick the template personality of <> when that seems "more likely"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same idea but didn't get a chance to implement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #124438.
Fixes #108536.