[NativeAOT] Enable -dead_strip linker optimization by default on Apple platforms #103039

Merged: 1 commit merged on Jun 13, 2024

ivanpovazan
Member

@ivanpovazan ivanpovazan commented Jun 4, 2024

Description

This PR:

  • Enables -dead_strip linker optimization by default.
  • Marks all symbols as non-deadstrippable to ensure that the sections are not split/reorganized/removed by the platform linker.
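The following is a minimal Python sketch of what "marking all symbols as non-deadstrippable" can mean at the Mach-O level. It assumes the marking is done through the N_NO_DEAD_STRIP bit (0x0020) in each symbol's n_desc field, with constant values taken from Apple's <mach-o/nlist.h>. The Nlist64 class and the mark_no_dead_strip helper are hypothetical illustrations and not the PR's code, which is part of the managed object writer in ILC.

```python
from dataclasses import dataclass

# Constant values as defined in Apple's <mach-o/nlist.h>.
N_TYPE = 0x0E             # mask selecting the type bits of n_type
N_SECT = 0x0E             # type value: symbol is defined in section n_sect
N_EXT = 0x01              # external (global) symbol bit
N_NO_DEAD_STRIP = 0x0020  # n_desc bit: the linker must not dead-strip this symbol


@dataclass
class Nlist64:
    """Hypothetical in-memory mirror of a Mach-O nlist_64 symbol table entry."""
    name: str
    n_type: int
    n_sect: int
    n_desc: int = 0
    n_value: int = 0


def mark_no_dead_strip(symbols):
    """Set N_NO_DEAD_STRIP on every symbol defined in a section.

    Undefined (imported) symbols are left untouched: they have no content
    in this object file for the linker to strip.
    """
    for sym in symbols:
        if (sym.n_type & N_TYPE) == N_SECT:
            sym.n_desc |= N_NO_DEAD_STRIP
    return symbols


# Usage: one defined global symbol and one undefined import (names are made up).
symbols = mark_no_dead_strip([
    Nlist64("_defined_symbol", n_type=N_SECT | N_EXT, n_sect=1),
    Nlist64("_imported_symbol", n_type=N_EXT, n_sect=0),
])
for sym in symbols:
    print(sym.name, hex(sym.n_desc))
```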

Background

When Xcode 15 was released, we started hitting issues with NAOT executables, as the new linker was producing incorrect unwind tables, causing crashes during GC or on the first stack walk.
We got this fixed by marking the ILC output and our libs with .subsections_via_symbols in:

In the meantime, we started using the managed object writer in .NET 9, but we still mark our Mach-O object files with .subsections_via_symbols:

As for the -dead_strip flag, we disabled it on iOS platforms because it was crashing the old Apple linker during the build: #88032

Current state

The .subsections_via_symbols fixes solved some of our problems, but unfortunately they did not fully resolve the compatibility issues with the new linker: starting with #97745 we noticed similar problems, which in the end resulted in falling back to the old linker, ld_classic, in #98726 (backport: #97856).

@filipnavara also opened a ticket with Apple regarding the new linker issues, which still have not been resolved.

Additionally, our customers started reporting startup crashes when linker optimization switches are enabled (-dead_strip and -flto):

Dead stripping and the .subsections_via_symbols flag

Apple's linker is based on atoms: indivisible chunks of code or data, which is different from traditional section-based linkers. This means the linker operates at a finer grain, but it also gives the linker the freedom to break up sections and reorganize their subsections to achieve better optimization. This is particularly useful for dead code stripping (ref: https://github.com/apple-oss-distributions/ld64/blob/main/doc/design/linker.html). For an object file to be split into atoms, and thereby qualify for dead stripping by the linker, it needs to be marked with the .subsections_via_symbols flag (ref: https://opensource.apple.com/source/cctools/cctools-622.5.1/RelNotes/CompilerTools.html).
However, the NAOT codegen and runtime make many assumptions about the layout of the generated code and data, which must not be altered once the object file is produced. For this reason, we need to mark all symbols as non-deadstrippable to avoid problems with the platform linker.
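Dead stripping over atoms can be approximated as a reachability pass. The Python sketch below is a simplified model and not ld64's actual algorithm: an atom stays live if it is a root (for example _main or an exported symbol), if it carries a no-dead-strip mark, or if a live atom references it through a relocation. All atom names are hypothetical. The model also shows why layout assumptions are risky: data that the runtime reaches only by its position, rather than through a relocation, has no incoming reference and is stripped unless it is pinned.

```python
from collections import deque


def dead_strip(atoms, refs, roots):
    """Return the set of atoms that survive a simplified dead-strip pass.

    atoms: dict mapping atom name -> True if the atom is marked no-dead-strip
    refs:  dict mapping atom name -> names of atoms it references (relocations)
    roots: atom names that are live by definition (entry point, exports)
    """
    pending = deque(roots)
    pending.extend(name for name, pinned in atoms.items() if pinned)
    live = set()
    while pending:
        atom = pending.popleft()
        if atom in live:
            continue
        live.add(atom)
        pending.extend(refs.get(atom, ()))
    return live


# Usage: _main references _helper; _layout_table is only found by position,
# so nothing references it; _pinned_table carries the no-dead-strip mark.
atoms = {"_main": False, "_helper": False, "_layout_table": False, "_pinned_table": True}
refs = {"_main": ["_helper"]}
print(sorted(dead_strip(atoms, refs, ["_main"])))
```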


Case study: NAOT hello world console app with -dead_strip enabled on macOS

  1. Build the runtime: ./build.sh clr+clr.aot+libs+packs -c Release
  2. Create a hello world app from the template: dotnet new console
  3. Adjust the project file with:
  <PropertyGroup>
    <PublishAot>true</PublishAot>
    <RestoreAdditionalProjectSources>_path_to_the_shipping_folder_</RestoreAdditionalProjectSources>
  </PropertyGroup>

  <ItemGroup>
    <FrameworkReference Update="Microsoft.NETCore.App" RuntimeFrameworkVersion="9.0.0-dev" />
    <PackageReference Include="Microsoft.DotNet.ILCompiler" Version="9.0.0-dev" />
  </ItemGroup>

  <ItemGroup>
    <LinkerArg Include="-Wl,-dead_strip" />
  </ItemGroup>
  4. Publish the app and run it:
  • The first problem: The app crashes because dehydrated/hydrated data is partially or completely stripped out.
  • The second problem: If we work around the first problem by passing -p:IlcDehydrate=false, the app then crashes in StartupCodeHelpers.InitializeModules because the module count is 0, as the whole __modules section gets stripped.
  • The third problem: If we work around the second problem by explicitly marking the __modules section with S_ATTR_NO_DEAD_STRIP, the app crashes in StartupCodeHelpers.InitializeStatics: the GCStatic MethodTables get stripped and reordered, so the RhAllocateNewObject allocation fails because blockAddr points to a nonexistent location.

NOTE: Kudos to @anatawa12, who called out the first two problems in #96743.

Questions

  • What problem would we hit next?
  • Is it feasible to mark every scenario where the codegen/runtime makes assumptions about the program layout, or should we just declare that our object file is not compatible with being split into subsections?

Size savings

console macOS app      main (bytes)   this PR (bytes)   diff (%)
app size               1538384        1417792           -7.84%
symbols list           syms.lst       syms.lst

MAUI iOS Recipes app   main (bytes)   this PR (bytes)   diff (%)
app size               1283488        12645649          -1.47%

The savings come from dead stripping other native dependencies, so they are not related to the size of the managed code.
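The diff column is the relative change from main to this PR, (this PR - main) / main × 100. A short Python check against the console macOS app row of the table:

```python
def percent_change(main_size, pr_size):
    """Relative size change from main to this PR, in percent."""
    return (pr_size - main_size) / main_size * 100


# console macOS app row from the table above
print(f"{percent_change(1538384, 1417792):.2f}%")
```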


Fixes: #96663

PS: I manually tested that both -dead_strip and -flto now work properly.

@ivanpovazan ivanpovazan added this to the 9.0.0 milestone Jun 4, 2024
@ivanpovazan ivanpovazan self-assigned this Jun 4, 2024
@ivanpovazan
Member Author

ivanpovazan commented Jun 4, 2024

@MichalStrehovsky @jkotas @rolfbjarne @filipnavara Could you please provide your opinion on this?

@MichalStrehovsky
Member

If I understand .subsections_via_symbols correctly, it allows the linker to partition things between two symbols and treat each part as a single contiguous blob of known length that can be removed. This doesn't sound compatible with, e.g., how MethodTables or non-GC statics are emitted. In both cases, the symbol is preceded by data that belongs to it and cannot be separated from it: we expect it to be a prefix.

We can express this as a COMDAT section, but as far as I know, it's not expressible in .subsections_via_symbols. So I would agree we shouldn't set this flag.
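To make the prefix problem concrete, here is a small Python model of cutting a section into atoms at symbol boundaries, roughly what .subsections_via_symbols permits. The layout is hypothetical: symbol _A occupies bytes 0..23, the prefix data of symbol _B sits at bytes 24..31, and _B itself starts at offset 32. Because atoms are cut at symbol offsets, the prefix bytes end up inside _A's atom, so stripping or moving _A separates _B from its prefix.

```python
import bisect


def split_into_atoms(section_size, symbols):
    """Cut a section into atoms at every symbol offset.

    symbols: list of (name, offset) pairs.
    Returns a list of (name, start, end) with end exclusive; each atom runs
    from its symbol up to the next symbol or the end of the section.
    """
    ordered = sorted(symbols, key=lambda sym: sym[1])
    atoms = []
    for i, (name, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else section_size
        atoms.append((name, start, end))
    return atoms


def owner_of(atoms, offset):
    """Return the name of the atom that contains byte `offset`, or None."""
    starts = [start for _, start, _ in atoms]
    index = bisect.bisect_right(starts, offset) - 1
    return atoms[index][0] if index >= 0 else None


# Hypothetical 64-byte section: _B's 8-byte prefix lives at offsets 24..31.
atoms = split_into_atoms(64, [("_A", 0), ("_B", 32)])
print(atoms)                # atoms with their byte ranges
print(owner_of(atoms, 24))  # the atom that owns the first byte of _B's prefix
```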

@filipnavara
Member

filipnavara commented Jun 4, 2024

I consider removing MH_SUBSECTIONS_VIA_SYMBOLS to be a nuclear option that's unlikely ever to be fully compatible with the new Xcode linker. Apple toolchains always produce object files with MH_SUBSECTIONS_VIA_SYMBOLS and the code paths in the linker basically receive nearly no testing.

That said, I am glad that you are looking into it, summarizing it, and trying to find a solution. One thing that comes to mind is, did you try to mark all the sections produced by object writer with S_ATTR_NO_DEAD_STRIP (UPD: or perhaps S_ATTR_LIVE_SUPPORT, in case of dehydrated data)? That's still a pretty harsh thing to do, but at least it's something that could later be refined with some granularity.

@filipnavara
Member

filipnavara commented Jun 4, 2024

> We can express this as a COMDAT section

Mach-O doesn't have the notion of COMDAT sections. It entirely depends on the subsections defined by symbols and linking based on that. The closest equivalent is the S_COALESCED flag for symbol folding.

That said, the atoms/subsections don't normally get reordered (*). Dependencies between atoms/subsections can be modeled with no-op relocations. We currently don't do this modeling, but it may be a reasonable thing to do if there's more than one defined symbol at different offsets here.

(*) It is not guaranteed but there are rules to it and last time I checked we were not hitting anything that would cause a re-order.

@MichalStrehovsky
Member

> I consider removing MH_SUBSECTIONS_VIA_SYMBOLS to be a nuclear option that's unlikely ever to be fully compatible with the new Xcode linker. Apple toolchains always produce object files with MH_SUBSECTIONS_VIA_SYMBOLS and the code paths in the linker basically receive nearly no testing.

At the end of the day, anything that stops linker's discretion to remove things or move things around would work. We do not want any dead stripping on the object file - the object file is not a classic object file (bunch of .c files compiled to .o in a vacuum, not knowing about other .c files that are part of the app); the .o file generated by ILC is a result of whole program analysis and we already removed everything that's dead. I would consider it a compiler bug if the linker is legitimately able to remove something. I can't think of anything emitted by the compiler that we could mark as dead strippable and get a benefit from it.

@rolfbjarne
Member

> I consider removing MH_SUBSECTIONS_VIA_SYMBOLS to be a nuclear option that's unlikely ever to be fully compatible with the new Xcode linker. Apple toolchains always produce object files with MH_SUBSECTIONS_VIA_SYMBOLS and the code paths in the linker basically receive nearly no testing.

> At the end of the day, anything that stops linker's discretion to remove things or move things around would work. We do not want any dead stripping on the object file - the object file is not a classic object file (bunch of .c files compiled to .o in a vacuum, not knowing about other .c files that are part of the app); the .o file generated by ILC is a result of whole program analysis and we already removed everything that's dead. I would consider it a compiler bug if the linker is legitimately able to remove something. I can't think of anything emitted by the compiler that we could mark as dead strippable and get a benefit from it.

That's true as long as the ILC's output is the entire executable, but not if the output is a native library that can be consumed by other native libraries (or even an executable exporting symbols consumed by other native libraries).

For instance:

using System.Runtime.InteropServices;

public class MyManagedLogic {
    [UnmanagedCallersOnly (EntryPoint = "MyManagedFunction")]
    public static int MyManagedFunction () => 42;
}

If no other native code calls MyManagedFunction, it can be removed.

@ivanpovazan
Member Author

ivanpovazan commented Jun 5, 2024

> We can express this as a COMDAT section

> Mach-O doesn't have the notion of COMDAT sections. It entirely depends on the subsections defined by symbols and linking based on that. The closest equivalent is the S_COALESCED flag for symbol folding.

If I understand the problem here correctly we are looking for a way to keep the symbol and its prefix together.
As far as I could find, Mach-O does support something like this via the .alt_entry assembler directive or the N_ALT_ENTRY symbol flag, which indicates that a symbol is pinned to the content preceding it, although I cannot tell whether the new linker respects it properly. A few refs:

Q1: @filipnavara do you have experience with .alt_entry?
Q2: @MichalStrehovsky if this flag serves its purpose, would it solve the issues you mentioned about how MethodTables and non-GC statics are emitted?
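A hypothetical Python model of how alternate entry symbols could keep a prefix attached. It assumes the intended semantics of N_ALT_ENTRY (0x0200 in Apple's <mach-o/nlist.h>): an alt-entry symbol does not start a new atom, it stays inside the atom that precedes it. In the made-up layout, the prefix gets its own symbol, _B_prefix, which heads the atom, and _B is flagged N_ALT_ENTRY, so prefix and payload form one indivisible atom. Whether the new Xcode linker honours this is exactly the open question in the comment above; the sketch does not answer it.

```python
N_ALT_ENTRY = 0x0200  # n_desc bit from Apple's <mach-o/nlist.h>


def split_with_alt_entry(section_size, symbols):
    """Cut a section into atoms, skipping symbols flagged N_ALT_ENTRY.

    symbols: list of (name, offset, n_desc) tuples.
    Returns a list of (head, start, end, members): an alt-entry symbol does not
    start a new atom, it becomes a member of the atom that precedes it.
    """
    ordered = sorted(symbols, key=lambda sym: sym[1])
    heads = [sym for sym in ordered if not (sym[2] & N_ALT_ENTRY)]
    atoms = []
    for i, (head, start, _) in enumerate(heads):
        end = heads[i + 1][1] if i + 1 < len(heads) else section_size
        members = [name for name, offset, _ in ordered if start <= offset < end]
        atoms.append((head, start, end, members))
    return atoms


# Hypothetical layout: _B_prefix heads the atom, _B is an alternate entry into it.
atoms = split_with_alt_entry(64, [
    ("_A", 0, 0),
    ("_B_prefix", 24, 0),
    ("_B", 32, N_ALT_ENTRY),
])
for atom in atoms:
    print(atom)
```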

@filipnavara
Member

> @filipnavara do you have experience with .alt_entry?

No, but it looks intriguing. It's a fairly recent addition to the Mach-O file format (~3 years ago), so presumably it's less likely to be considered a legacy feature.

@MichalStrehovsky
Member

> If no other native code calls MyManagedFunction, it can be removed.

At minimum, the method is going to be referenced from stack trace metadata tables that will keep it rooted no matter what. And even if we manage to remove that method, its transitive closure will likely still be unstrippable unless it only ever calls non-generic static methods. There's a bunch of bookkeeping for reflection, the type loader, etc. in the form of hashtables and tables that a native linker has no chance to remove or restructure.

The possible upside is very small. The possible downside is pretty big (damaging the MethodTable prefix will lead to crashes in the GC that may or may not be close to where the problem is, similarly for the statics, and I don't have an accounting of other places that may assume the linker is not going to take entire sections apart).

@ivanpovazan
Member Author

ivanpovazan commented Jun 6, 2024

A small update regarding:

> One thing that comes to mind is, did you try to mark all the sections produced by object writer with S_ATTR_NO_DEAD_STRIP

I marked all sections except bss and hydrated with the flag; for a console app this produces the same size savings, and the app runs without any issues.
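A minimal Python sketch of that experiment, assuming the rule is "every section whose type is not S_ZEROFILL gets S_ATTR_NO_DEAD_STRIP". In the dump that follows, __bss and hydrated are the two sections with type S_ZEROFILL (flags 0x00000001). Flag values come from Apple's <mach-o/loader.h>; the "before" flags are hypothetical and chosen so that the result matches the corresponding entries in the dump. Whether the actual change keys on section type or on section names is not stated, so this is an assumption.

```python
# Constant values as defined in Apple's <mach-o/loader.h>.
SECTION_TYPE = 0x000000FF          # mask for the section type byte
S_ZEROFILL = 0x1                   # zero-filled on demand (no file content)
S_ATTR_NO_DEAD_STRIP = 0x10000000  # section contents must not be dead-stripped


def pin_sections(section_flags):
    """Return a copy of `section_flags` with S_ATTR_NO_DEAD_STRIP set on
    every section whose type is not S_ZEROFILL."""
    pinned = {}
    for name, flags in section_flags.items():
        if (flags & SECTION_TYPE) == S_ZEROFILL:
            pinned[name] = flags
        else:
            pinned[name] = flags | S_ATTR_NO_DEAD_STRIP
    return pinned


# Hypothetical "before" flags for three sections of the object file.
before = {"__managedcode": 0x80000400, "__const": 0x00000000, "__bss": 0x00000001}
for name, flags in pin_sections(before).items():
    print(f"{name:15} 0x{flags:08x}")
```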

Header and load commands of the Mach-O object file of a console app:
obj/Release/net9.0/osx-arm64/native/n1.o:
Mach header
    magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
0xfeedfacf 16777228          0  0x00           1     4       1640 0x00002000
Load command 0
    cmd LC_SEGMENT_64
cmdsize 1512
segname 
 vmaddr 0x0000000000000000
 vmsize 0x0000000000221580
fileoff 1672
filesize 2087632
maxprot 0x00000007
initprot 0x00000007
 nsects 18
  flags 0x0
Section
sectname __text
 segname __TEXT
    addr 0x0000000000000000
    size 0x0000000000001020
  offset 1672
   align 2^2 (4)
  reloff 2087632
  nreloc 446
   flags 0x90000400
reserved1 0
reserved2 0
Section
sectname __managedcode
 segname __TEXT
    addr 0x0000000000001020
    size 0x0000000000078ee0
  offset 5824
   align 2^5 (32)
  reloff 2091200
  nreloc 18129
   flags 0x90000400
reserved1 0
reserved2 0
Section
sectname .dotnet_eh_table
 segname __DATA
    addr 0x0000000000079f00
    size 0x000000000000d4dc
  offset 501152
   align 2^3 (8)
  reloff 2236232
  nreloc 294
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __eh_frame
 segname __TEXT
    addr 0x00000000000873e0
    size 0x0000000000011760
  offset 555648
   align 2^3 (8)
  reloff 2238584
  nreloc 6816
   flags 0x7800000b
reserved1 0
reserved2 0
Section
sectname __const
 segname __TEXT
    addr 0x0000000000098b40
    size 0x000000000003f611
  offset 627168
   align 2^3 (8)
  reloff 2293112
  nreloc 11420
   flags 0x10000000
reserved1 0
reserved2 0
Section
sectname __data
 segname __DATA
    addr 0x00000000000d8158
    size 0x00000000000015a8
  offset 886776
   align 2^3 (8)
  reloff 2384472
  nreloc 905
   flags 0x10000000
reserved1 0
reserved2 0
Section
sectname hydrated
 segname __DATA
    addr 0x00000000000d9700
    size 0x0000000000022930
  offset 0
   align 2^4 (16)
  reloff 2391712
  nreloc 0
   flags 0x00000001
reserved1 0
reserved2 0
Section
sectname __unbox
 segname __TEXT
    addr 0x00000000000fc030
    size 0x00000000000003b0
  offset 892320
   align 2^2 (4)
  reloff 2391712
  nreloc 118
   flags 0x90000400
reserved1 0
reserved2 0
Section
sectname __modules
 segname __DATA
    addr 0x00000000000fc3e0
    size 0x0000000000000008
  offset 893264
   align 2^3 (8)
  reloff 2392656
  nreloc 1
   flags 0x10000000
reserved1 0
reserved2 0
Section
sectname __bss
 segname __DATA
    addr 0x00000000000fc3e8
    size 0x0000000000001820
  offset 0
   align 2^3 (8)
  reloff 2392664
  nreloc 0
   flags 0x00000001
reserved1 0
reserved2 0
Section
sectname __debug_info
 segname __DWARF
    addr 0x00000000000fdc08
    size 0x000000000004af45
  offset 893272
   align 2^1 (2)
  reloff 2392664
  nreloc 2822
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_str
 segname __DWARF
    addr 0x0000000000148b4e
    size 0x00000000000771c1
  offset 1200286
   align 2^1 (2)
  reloff 2415240
  nreloc 0
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_abbrev
 segname __DWARF
    addr 0x00000000001bfd10
    size 0x000000000000019b
  offset 1688160
   align 2^1 (2)
  reloff 2415240
  nreloc 0
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_loc
 segname __DWARF
    addr 0x00000000001bfeac
    size 0x000000000003f99b
  offset 1688572
   align 2^1 (2)
  reloff 2415240
  nreloc 21400
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_ranges
 segname __DWARF
    addr 0x00000000001ff848
    size 0x0000000000000090
  offset 1949080
   align 2^1 (2)
  reloff 2586440
  nreloc 16
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_line
 segname __DWARF
    addr 0x00000000001ff8d8
    size 0x000000000000dada
  offset 1949224
   align 2^1 (2)
  reloff 2586568
  nreloc 1
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __debug_aranges
 segname __DWARF
    addr 0x000000000020d3c0
    size 0x00000000000000a0
  offset 2005264
   align 2^4 (16)
  reloff 2586576
  nreloc 8
   flags 0x12000000
reserved1 0
reserved2 0
Section
sectname __compact_unwind
 segname __LD
    addr 0x000000000020d460
    size 0x0000000000014120
  offset 2005424
   align 2^3 (8)
  reloff 2586640
  nreloc 3434
   flags 0x02000000
reserved1 0
reserved2 0
Load command 1
   cmd LC_SYMTAB
cmdsize 24
symoff 2614112
 nsyms 8940
stroff 2757152
strsize 538386
Load command 2
          cmd LC_DYSYMTAB
      cmdsize 80
    ilocalsym 0
    nlocalsym 10
   iextdefsym 10
   nextdefsym 8776
    iundefsym 8786
    nundefsym 154
       tocoff 0
         ntoc 0
    modtaboff 0
      nmodtab 0
 extrefsymoff 0
  nextrefsyms 0
indirectsymoff 0
nindirectsyms 0
    extreloff 0
      nextrel 0
    locreloff 0
      nlocrel 0
Load command 3
    cmd LC_BUILD_VERSION
cmdsize 24
platform 1
  minos 12.0
    sdk 16.0
 ntools 0
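The raw flags values in this dump are easier to read once they are decoded. The Python sketch below uses constant values from Apple's <mach-o/loader.h>, naming only the section types that appear in this dump. It shows, for example, that the header flags 0x00002000 are MH_SUBSECTIONS_VIA_SYMBOLS, and that __eh_frame (0x7800000b) is an S_COALESCED section that carries S_ATTR_NO_DEAD_STRIP among other attributes.

```python
# Constant values as defined in Apple's <mach-o/loader.h>.
MH_SUBSECTIONS_VIA_SYMBOLS = 0x2000
SECTION_TYPE = 0x000000FF

SECTION_TYPES = {0x0: "S_REGULAR", 0x1: "S_ZEROFILL", 0xB: "S_COALESCED"}
SECTION_ATTRIBUTES = {
    0x80000000: "S_ATTR_PURE_INSTRUCTIONS",
    0x40000000: "S_ATTR_NO_TOC",
    0x20000000: "S_ATTR_STRIP_STATIC_SYMS",
    0x10000000: "S_ATTR_NO_DEAD_STRIP",
    0x08000000: "S_ATTR_LIVE_SUPPORT",
    0x04000000: "S_ATTR_SELF_MODIFYING_CODE",
    0x02000000: "S_ATTR_DEBUG",
    0x00000400: "S_ATTR_SOME_INSTRUCTIONS",
    0x00000200: "S_ATTR_EXT_RELOC",
    0x00000100: "S_ATTR_LOC_RELOC",
}


def decode_section_flags(flags):
    """Split a section's flags word into its type name and attribute names."""
    section_type = flags & SECTION_TYPE
    type_name = SECTION_TYPES.get(section_type, hex(section_type))
    attributes = [name for bit, name in SECTION_ATTRIBUTES.items() if flags & bit]
    return type_name, attributes


# Values copied from the dump above.
print("header subsections_via_symbols:", bool(0x00002000 & MH_SUBSECTIONS_VIA_SYMBOLS))
for section, flags in [("__managedcode", 0x90000400), ("__eh_frame", 0x7800000B),
                       ("hydrated", 0x00000001), ("__compact_unwind", 0x02000000)]:
    print(section, decode_section_flags(flags))
```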

@ivanpovazan
Member Author

ivanpovazan commented Jun 7, 2024

/azp run runtime-ioslike


No pipelines are associated with this pull request.


Azure Pipelines successfully started running 1 pipeline(s).

@ivanpovazan ivanpovazan changed the title [WIP][NativeAOT] Disable MH_SUBSECTIONS_VIA_SYMBOLS and enable -dead_strip by default on Apple platforms [NativeAOT] Enable -dead_strip linker optimization by default on Apple platforms Jun 7, 2024
@ivanpovazan ivanpovazan marked this pull request as ready for review June 7, 2024 20:59
@ivanpovazan ivanpovazan requested a review from filipnavara June 7, 2024 21:00
@ivanpovazan
Member Author

/azp run runtime-nativeaot-outerloop

@ivanpovazan ivanpovazan requested a review from jkotas June 7, 2024 21:01

Azure Pipelines successfully started running 1 pipeline(s).

@ivanpovazan ivanpovazan requested a review from akoeplinger June 7, 2024 21:04
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Jun 12, 2024
dotnet#103039 (comment) found that we're generating some orphaned `MethodTable`s. They seem to be coming from here. Wondering if anything breaks if we just delete this.
@ivanpovazan
Member Author

@MichalStrehovsky are there any other concerns regarding these changes?
I would prefer someone from the NativeAOT team to approve it before merging.

Member

@MichalStrehovsky MichalStrehovsky left a comment

:shipit:

@akoeplinger
Member

@dotnet-policy-service rerun

@ivanpovazan ivanpovazan merged commit d9a6607 into dotnet:main Jun 13, 2024
87 checks passed
@JCash

JCash commented Jun 25, 2024

Q: Will this also be backported to .NET 8? If so, in which version could we expect it?

@github-actions github-actions bot locked and limited conversation to collaborators Jul 26, 2024
Development

Successfully merging this pull request may close these issues.

NativeAOT staticlib crashes with SIGSEGV inside RhpNewArray when linked with -dead_strip
6 participants