Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang-format] Stop breaking unbreakable strings in JS #66168

Merged
merged 1 commit into from
Sep 13, 2023

Conversation

sstwcw
Copy link
Contributor

@sstwcw sstwcw commented Sep 13, 2023

Dictionary literal keys and strings in TypeScript type declarations can not be broken.

The problem was pointed out by @alexfh and @e-kud here:

https://reviews.llvm.org/D154093#4644512

Dictionary literal keys and strings in TypeScript type declarations can
not be broken.

The problem was pointed out by @alexfh and @e-kud here:

https://reviews.llvm.org/D154093#4644512
@sstwcw sstwcw requested a review from a team as a code owner September 13, 2023 03:43
@llvmbot llvmbot added clang Clang issues not falling into any other category clang-format labels Sep 13, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-format

Changes Dictionary literal keys and strings in TypeScript type declarations can not be broken.

The problem was pointed out by @alexfh and @e-kud here:

https://reviews.llvm.org/D154093#4644512

Full diff: https://github.com/llvm/llvm-project/pull/66168.diff

3 Files Affected:

  • (modified) clang/lib/Format/ContinuationIndenter.cpp (+6)
  • (modified) clang/unittests/Format/FormatTestJS.cpp (+22)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+19)
diff --git a/clang/lib/Format/ContinuationIndenter.cpp b/clang/lib/Format/ContinuationIndenter.cpp
index ac62dab1b07cdca..75ab08de42ea0e8 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -2241,6 +2241,12 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
     if (Style.isJson() || !Style.BreakStringLiterals || !AllowBreak)
       return nullptr;
 
+    // Strings in TypeScript types and dictionary keys can not be broken.
+    if (Style.isJavaScript() && (Current.is(TT_SelectorName) ||
+                                 State.Line->startsWith(Keywords.kw_type))) {
+      return nullptr;
+    }
+
     // Don't break string literals inside preprocessor directives (except for
     // #define directives, as their contents are stored in separate lines and
     // are not affected by this check).
diff --git a/clang/unittests/Format/FormatTestJS.cpp b/clang/unittests/Format/FormatTestJS.cpp
index bc4c7ff7ba6b30f..309326569143df6 100644
--- a/clang/unittests/Format/FormatTestJS.cpp
+++ b/clang/unittests/Format/FormatTestJS.cpp
@@ -1596,6 +1596,28 @@ TEST_F(FormatTestJS, StringLiteralConcatenation) {
                "var literal = `xxxxxx ${xxxxxxxxxxxxxxxxxxxxxx + "
                "xxxxxxxxxxxxxxxxxxxxxx} xxxxxx`;",
                getGoogleJSStyleWithColumns(14));
+
+  // Strings in a TypeScript type declaration can't be broken.
+  verifyFormat("type x =\n"
+               "    'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';",
+               getGoogleJSStyleWithColumns(20));
+  verifyFormat("/* type */ type x =\n"
+               "    'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';",
+               getGoogleJSStyleWithColumns(20));
+  // Dictionary keys can't be broken. Values can be broken.
+  verifyFormat("var w = {\n"
+               "  'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx':\n"
+               "      'xxxxxxxxxx' +\n"
+               "      'xxxxxxxxxx' +\n"
+               "      'xxxxxxxxxx' +\n"
+               "      'xxxxxxxxxx' +\n"
+               "      'xxx',\n"
+               "};",
+               "var w = {\n"
+               "  'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx':\n"
+               "      'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',\n"
+               "};",
+               getGoogleJSStyleWithColumns(20));
 }
 
 TEST_F(FormatTestJS, RegexLiteralClassification) {
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 434852983712940..9ab1eae85f29c4d 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2040,6 +2040,25 @@ TEST_F(TokenAnnotatorTest, UnderstandDesignatedInitializers) {
   EXPECT_TOKEN(Tokens[13], tok::period, TT_DesignatedInitializerPeriod);
 }
 
+TEST_F(TokenAnnotatorTest, UnderstandsJavaScript) {
+  auto Annotate = [this](llvm::StringRef Code) {
+    return annotate(Code, getLLVMStyle(FormatStyle::LK_JavaScript));
+  };
+
+  // Dictionary.
+  auto Tokens = Annotate("var x = {'x' : 1, 'y' : 2};");
+  ASSERT_EQ(Tokens.size(), 14u) << Tokens;
+  EXPECT_TOKEN(Tokens[3], tok::l_brace, TT_DictLiteral);
+  EXPECT_TOKEN(Tokens[4], tok::string_literal, TT_SelectorName);
+  EXPECT_TOKEN(Tokens[5], tok::colon, TT_DictLiteral);
+  EXPECT_TOKEN(Tokens[8], tok::string_literal, TT_SelectorName);
+  EXPECT_TOKEN(Tokens[9], tok::colon, TT_DictLiteral);
+  // Change when we need to annotate these.
+  EXPECT_BRACE_KIND(Tokens[3], BK_Unknown);
+  EXPECT_BRACE_KIND(Tokens[11], BK_Unknown);
+  EXPECT_TOKEN(Tokens[11], tok::r_brace, TT_Unknown);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

@sstwcw
Copy link
Contributor Author

sstwcw commented Sep 13, 2023

Who should I add for JS-related stuff? @mprobst seems to have stopped working on this project. @frigus02 can you have a look at this?

@e-kud
Copy link
Contributor

e-kud commented Sep 13, 2023

The problem was pointed out by @alexfh and @e-kud here:

I think I've been wrongly tagged :) However, I have troubles to find Evgeny Eltsin's github login too...

Copy link
Contributor

@alexfh alexfh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG. Thanks for the prompt fix!

@alexfh alexfh merged commit 5db201f into llvm:main Sep 13, 2023
@alexfh
Copy link
Contributor

alexfh commented Sep 14, 2023

Unfortunately, this is not enough. I found at least one more broken case (pun intended):

export type LooooooooooooooooooooooooooongTypeName = 'AAAAAAAAAAAA'|('BBBBBB');

gets transformed into

export type LooooooooooooooooooooooooooongTypeName = 'AAAAAAAAAAAA'|('BBBBB' +
                                                                       'B');

which is also invalid.

@sstwcw
Copy link
Contributor Author

sstwcw commented Sep 14, 2023

Unfortunately, this is not enough. I found at least one more broken case (pun intended):

export type LooooooooooooooooooooooooooongTypeName = 'AAAAAAAAAAAA'|('BBBBBB');

gets transformed into

export type LooooooooooooooooooooooooooongTypeName = 'AAAAAAAAAAAA'|('BBBBB' +
                                                                       'B');

which is also invalid.

Here is the new fix. #66321

If it is not urgent, I prefer to wait a few days for reviews and then commit the patch myself.

@alexfh
Copy link
Contributor

alexfh commented Sep 14, 2023

Here is the new fix. #66321

Thanks! I think, I found another pattern where string literal breaking is not allowed - in parameters of goog.module.get() (https://google.github.io/closure-library/api/goog.module.html). I'm not completely sure, which tool produces the error I'm seeing and how relevant this is to the external users of Closure, but we definitely want a way to prevent breaking arguments of this function (or completely) - if not by default, then at least as a customizable style option.

If it is not urgent, I prefer to wait a few days for reviews and then commit the patch myself.

I used to work on this part of clang-format and now the code a bit, thus I feel comfortable approving this sort of a change. Another thing is that we need to get this fixed really soon. Another (safer and actually preferred) alternative is to revert the change and address problematic use cases before recommitting it, but I'm also fine with forward-fixing this as long as fixes don't require major changes that have potential for other problems to appear.

sstwcw added a commit to sstwcw/llvm-project that referenced this pull request Sep 14, 2023
See the discussion
[here](llvm#66168 (comment)).

The functionality is not mature enough.
@sstwcw
Copy link
Contributor Author

sstwcw commented Sep 14, 2023

Another thing is that we need to get this fixed really soon. Another (safer and actually preferred) alternative is to revert the change and address problematic use cases before recommitting it, but I'm also fine with forward-fixing this as long as fixes don't require major changes that have potential for other problems to appear.

I see. In that case, I added #66372.

However, why do you need it fixed so soon? The latest release is
version 16. And the string breaking stuff isn't even in the version 17
branch.

@eaeltsin
Copy link
Contributor

Here are string literal context from closure - This still misses contexts when string literal is required, for example https://github.com/search?q=repo%3Agoogle%2Fclosure-compiler%20%22%20must%20be%20a%20string%20literal%22&type=code

I wonder, if splitting the literal with + is a good option at all.

@sstwcw
Copy link
Contributor Author

sstwcw commented Sep 14, 2023

I wonder, if splitting the literal with + is a good option at all.

Maybe that means we should make the BreakStringLiterals option default to off for JavaScript. But I want about 2 weeks to decide.

Can you set the option in your configuration as a workaround for now? If so, please don't approve #66372. Note that you can set a different style configuration for each language.

@alexfh
Copy link
Contributor

alexfh commented Sep 15, 2023

Another thing is that we need to get this fixed really soon. Another (safer and actually preferred) alternative is to revert the change and address problematic use cases before recommitting it, but I'm also fine with forward-fixing this as long as fixes don't require major changes that have potential for other problems to appear.

I see. In that case, I added #66372.

I think, disabling the functionality for JS/TS is the right choice until all wrinkles are ironed.

However, why do you need it fixed so soon? The latest release is version 16. And the string breaking stuff isn't even in the version 17 branch.

In general, "as a community, we strongly value having the tip of tree in a good state while allowing rapid iterative development" (as opposed to allowing to break stuff after a release cut and trying to glue the pieces together before the next release, see https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy). Some LLVM users (toolchain teams at Google, for example) strongly rely on the ToT LLVM being close to (that is, at most a few patches away from) production quality. We are updating our toolchain ~weekly and expect all tools (including clang-format) to be free from critical defects (if you're interested, there's a talk about this: https://www.youtube.com/watch?v=zh91if43QLM). Other users (including multiple open-source projects) may take various parts of LLVM at arbitrary moments of time with similar expectations to the code quality. This particular problem happens to be critical, since it makes clang-format generate invalid code. Leaving this state for weeks after it was detected (or requiring users to modify their configuration just to work around the problem) is unacceptable, thus I think we should proceed with #66372 (or a revert of the original patch) for now.

@eaeltsin
Copy link
Contributor

I second @alexfh that we need to submit #66372 to get back to the good state. Please proceed ASAP.

eywdck2l pushed a commit that referenced this pull request Sep 15, 2023
See the discussion

[here](#66168 (comment)).

The functionality is not mature enough.
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
Dictionary literal keys and strings in TypeScript type declarations can
not be broken.

The problem was pointed out by @alexfh and @e-kud here:

https://reviews.llvm.org/D154093#4644512
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
See the discussion

[here](llvm#66168 (comment)).

The functionality is not mature enough.
@eaeltsin
Copy link
Contributor

Looks like we need the similar fix for verilog, see https://reviews.llvm.org/D154093#4648931

@sstwcw
Copy link
Contributor Author

sstwcw commented Nov 6, 2023

Here are string literal context from closure - This still misses contexts when string literal is required, for example https://github.com/search?q=repo%3Agoogle%2Fclosure-compiler%20%22%20must%20be%20a%20string%20literal%22&type=code

I wonder, if splitting the literal with + is a good option at all.

I was trying to stop the formatter from breaking the arguments to certain functions by marking the lines as not breakable. Then I found the commit 53c38f4 which says that lines containing goog.module should not be broken but lines containing goog.module.get should be broken. Who knows the rationale behind it? Among the functions whose arguments must be a string literal, which ones make the entire line unbreakable?

@owenca owenca removed the clang Clang issues not falling into any other category label May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants