Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Driver] Don't default to -mrelax-all for non-RISCV -O0 #90013

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Apr 25, 2024

Some assembly mnemonics may assemble to instructions of different
lengths. The longer form is to support instructions like a long branch.
On X86, -mrelax-all enables MCRelaxAll, which expands instructions to
the long form regardless of whether a short form suffices, while
-mno-relax-all only expands instructions when needed.

// x86 example
void foo(int a) {
  // -mno-relax-all or gas: short jump (2 bytes)
  // -mrelax-all: near jump (6 bytes)
  if (a) bar();
}

The -mrelax-all default for non-RISCV -O0 appears to only affect x86 and
increases code size without any compile time difference for a stage-2
x86-64 build of lld.

-mrelax-all:    file size: 60.9MiB   VM size: 52.4MiB
-mno-relax-all: file size: 58.2MiB   VM size: 49.7MiB

There is no compile time difference (other than noise) GNU assembler
doesn't expand instructions by default. Let's remove the -mrelax-all default.

Created using spr 1.3.5-bogner
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Apr 25, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Fangrui Song (MaskRay)

Changes

Some assembly mnemonics may assemble to instructions of different
lengths. The longer form is to support instructions like a long branch.
-mrelax-all enables MCRelaxAll, which expands instructions to the long
form regardless of whether a short form suffices, while -mno-relax-all
only expands instructions when needed.

// x86 example
void foo(int a) {
  // -mno-relax-all or gas: short jump (2 bytes)
  // -mrelax-all: near jump (6 bytes)
  if (a) bar();
}

The -mrelax-all default for non-RISCV -O0 increases code size with
non-perceivable compile time difference for a stage-2 x86-64 build of
lld.

-mrelax-all:    file size: 58.2MiB   VM size: 49.7MiB
-mno-relax-all: file size: 60.9MiB   VM size: 52.4MiB

There is no compile time difference (other than noise) GNU assembler
doesn't expand instructions by default. Let's remove the -mrelax-all default.


Full diff: https://github.com/llvm/llvm-project/pull/90013.diff

2 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+10-42)
  • (modified) clang/test/Driver/integrated-as.c (+1-1)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 5f5d720cf759f4..651a2b5aac368b 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -835,46 +835,6 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
   }
 }
 
-/// Check whether the given input tree contains any compilation actions.
-static bool ContainsCompileAction(const Action *A) {
-  if (isa<CompileJobAction>(A) || isa<BackendJobAction>(A))
-    return true;
-
-  return llvm::any_of(A->inputs(), ContainsCompileAction);
-}
-
-/// Check if -relax-all should be passed to the internal assembler.
-/// This is done by default when compiling non-assembler source with -O0.
-static bool UseRelaxAll(Compilation &C, const ArgList &Args) {
-  bool RelaxDefault = true;
-
-  if (Arg *A = Args.getLastArg(options::OPT_O_Group))
-    RelaxDefault = A->getOption().matches(options::OPT_O0);
-
-  // RISC-V requires an indirect jump for offsets larger than 1MiB. This cannot
-  // be done by assembler branch relaxation as it needs a free temporary
-  // register. Because of this, branch relaxation is handled by a MachineIR
-  // pass before the assembler. Forcing assembler branch relaxation for -O0
-  // makes the MachineIR branch relaxation inaccurate and it will miss cases
-  // where an indirect branch is necessary. To avoid this issue we are
-  // sacrificing the compile time improvement of using -mrelax-all for -O0.
-  if (C.getDefaultToolChain().getTriple().isRISCV())
-    RelaxDefault = false;
-
-  if (RelaxDefault) {
-    RelaxDefault = false;
-    for (const auto &Act : C.getActions()) {
-      if (ContainsCompileAction(Act)) {
-        RelaxDefault = true;
-        break;
-      }
-    }
-  }
-
-  return Args.hasFlag(options::OPT_mrelax_all, options::OPT_mno_relax_all,
-                      RelaxDefault);
-}
-
 static void
 RenderDebugEnablingArgs(const ArgList &Args, ArgStringList &CmdArgs,
                         llvm::codegenoptions::DebugInfoKind DebugInfoKind,
@@ -2472,8 +2432,16 @@ static void CollectArgsForIntegratedAssembler(Compilation &C,
                                               const ArgList &Args,
                                               ArgStringList &CmdArgs,
                                               const Driver &D) {
-  if (UseRelaxAll(C, Args))
-    CmdArgs.push_back("-mrelax-all");
+  // Default to -mno-relax-all.
+  //
+  // Note: RISC-V requires an indirect jump for offsets larger than 1MiB. This
+  // cannot be done by assembler branch relaxation as it needs a free temporary
+  // register. Because of this, branch relaxation is handled by a MachineIR pass
+  // before the assembler. Forcing assembler branch relaxation for -O0 makes the
+  // MachineIR branch relaxation inaccurate and it will miss cases where an
+  // indirect branch is necessary.
+  Args.addOptInFlag(CmdArgs, options::OPT_mrelax_all,
+                    options::OPT_mno_relax_all);
 
   // Only default to -mincremental-linker-compatible if we think we are
   // targeting the MSVC linker.
diff --git a/clang/test/Driver/integrated-as.c b/clang/test/Driver/integrated-as.c
index e78fde873cf47f..b0a26f6011b0c7 100644
--- a/clang/test/Driver/integrated-as.c
+++ b/clang/test/Driver/integrated-as.c
@@ -3,7 +3,7 @@
 // RUN: %clang -### -c -save-temps -integrated-as --target=x86_64 %s 2>&1 | FileCheck %s
 
 // CHECK: cc1as
-// CHECK: -mrelax-all
+// CHECK-NOT: -mrelax-all
 
 // RISC-V does not enable -mrelax-all
 // RUN: %clang -### -c -save-temps -integrated-as --target=riscv64 %s 2>&1 | FileCheck %s -check-prefix=RISCV-RELAX

@MaskRay
Copy link
Member Author

MaskRay commented Apr 25, 2024

There may be an example that the current -mrelax-all default penalizes Thumb/Thumb2 code size as well, but I haven't found an example.

ARMAsmBackend::relaxInstruction

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My understanding is that this flag is used for reducing compile time and debug only.

@smithp35
Copy link
Collaborator

No objections from me. I would prefer there not to be a difference for assembly at different optimisation levels.

I can't find any evidence that this affects Arm (Thumb to be precise) assembly at all. For example:

        .text
        b dest    // b.n 2-byte branch
        b dest2  // b.w 4-byte branch
        nop
dest:
        nop
        .space 2048
dest2:
        nop

Regardless of what -mrelax-all is (or the llvm-mc equivalent) the first b dest is always the smaller branch.

@MaskRay
Copy link
Member Author

MaskRay commented Apr 25, 2024

.text
b dest // b.n 2-byte branch
b dest2 // b.w 4-byte branch
nop
dest:
nop
.space 2048
dest2:
nop

Thanks for checking. Confirmed no impact to ARM because llvm/lib/Target/ARM/MCTargetDesc/ARMMCTargetDesc.cpp:createELFStreamer ignores RelaxAll parameter and passes false.

@MaskRay
Copy link
Member Author

MaskRay commented Apr 25, 2024

createAMDGPUELFStreamer ignores RelaxAll, so this appears to be x86-only.

There may be some clean-up opportunities.

@MaskRay MaskRay merged commit 1837681 into main Apr 25, 2024
6 of 7 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/driver-dont-default-to-mrelax-all-for-non-riscv-o0 branch April 25, 2024 16:27
MaskRay added a commit that referenced this pull request Apr 25, 2024
Related to clean-up opportunities discussed at #90013.
MaskRay added a commit that referenced this pull request Apr 25, 2024
Related to clean-up opportunities discussed at #90013.

After these cleanups, the `RelaxAll` parameter from
`createMCObjectStreamer` can be removed as well. As
`createMCObjectStreamer` is a more user-facing API and used by two files
in mlir/, we postpone the cleanup to the future.
@nikic
Copy link
Contributor

nikic commented Apr 26, 2024

FYI I'm seeing about 0.6% compile-time regressions for O0 test-suite builds with this change (https://llvm-compile-time-tracker.com/compare.php?from=ef2ca97f48f1aee1483f0c29de5ba52979bec454&to=18376810f359dbd39d2a0aa0ddfc0f7f50eac199&stat=instructions:u). Though there is also a 4.5% reduction in text size (https://llvm-compile-time-tracker.com/compare.php?from=ef2ca97f48f1aee1483f0c29de5ba52979bec454&to=18376810f359dbd39d2a0aa0ddfc0f7f50eac199&stat=size-text) so maybe the tradeoff is reasonable even for O0.

@MaskRay
Copy link
Member Author

MaskRay commented Apr 26, 2024

FYI I'm seeing about 0.6% compile-time regressions for O0 test-suite builds with this change (llvm-compile-time-tracker.com/compare.php?from=ef2ca97f48f1aee1483f0c29de5ba52979bec454&to=18376810f359dbd39d2a0aa0ddfc0f7f50eac199&stat=instructions:u). Though there is also a 4.5% reduction in text size (llvm-compile-time-tracker.com/compare.php?from=ef2ca97f48f1aee1483f0c29de5ba52979bec454&to=18376810f359dbd39d2a0aa0ddfc0f7f50eac199&stat=size-text) so maybe the tradeoff is reasonable even for O0.

Thanks for the numbers!

It's my understanding that the increase in .text size outweighs the assembly time benefit of -mrelax-all... (A trick to allow fewer MCFragments and remove one iteration in the branch displacement (span-dependent instruction) algorithm). In addition, gas doesn't do this trick.

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 26, 2024
This starting failing after an llvm change:
llvm/llvm-project#90013

```
clang: error: argument unused during compilation: '-O2' [-Werror,-Wunused-command-line-argument]
```
@sbc100
Copy link
Collaborator

sbc100 commented Apr 26, 2024

This change seems to have caused a test in emscripten to starting failing with argument unused during compilation: '-O2': emscripten-core/emscripten#21841.

Is that an intended side effect of this change? The fix on our end seems reasonable but posting here just in case.

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 26, 2024
This starting failing after an llvm change:
llvm/llvm-project#90013

```
clang: error: argument unused during compilation: '-O2' [-Werror,-Wunused-command-line-argument]
```
sbc100 added a commit to emscripten-core/emscripten that referenced this pull request Apr 26, 2024
…FC (#21841)

This starting failing after an llvm change:
llvm/llvm-project#90013

```
clang: error: argument unused during compilation: '-O2' [-Werror,-Wunused-command-line-argument]
```
@MaskRay
Copy link
Member Author

MaskRay commented Apr 26, 2024

This change seems to have caused a test in emscripten to starting failing with argument unused during compilation: '-O2': emscripten-core/emscripten#21841.

Is that an intended side effect of this change? The fix on our end seems reasonable but posting here just in case.

The side effect was not anticipated, but it is desired. It's like that we emit other unused arg warnings for -D and many -f options. Thanks for the emscripten fix!

% fclang -c a.s -DA -O2
clang: warning: argument unused during compilation: '-D A' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-O2' [-Wunused-command-line-argument]

MaskRay added a commit that referenced this pull request Jun 14, 2024
When both aligned bundling and RelaxAll are enabled, bundle padding is
directly written into fragments (https://reviews.llvm.org/D8072).
(The original motivation was memory usage, which has been achieved from
different angles with recent assembler improvement).

The code presents challenges with the work to replace fragment
representation (e.g. #94950 #95077). This patch removes the special
handling. RelaxAll still works but the behavior seems slightly different
as revealed by 2 changed tests. However, most `-mc-relax-all` tests are
unchanged.

RelaxAll used to be the default for clang -O0. This mode has significant
code size drawbacks and newer Clang doesn't use it (#90013).

---

flushPendingLabels: The FOffset parameter can be removed: pending labels
will be assigned to the incoming fragment at offset 0.

Pull Request: #95188
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants