Skip to content

Commit

Permalink
[ELF] Run COMDAT elimination inside resolve_symbols.
Browse files Browse the repository at this point in the history
We've long had a hack to de-prioritize GNU_UNIQUE symbols, due to an edge case
where these symbols could get higher resolution priority yet eliminated in the
later COMDAT elimination stage. But since GNU_UNIQUE symbols were supposed to
have a resolution as strong as GLOBAL symbols, it created unsoundness in the
resolution and could have cause false duplicate symbol errors in later stages.

The root cause is that COMDAT elimination, unlike ICF and SHF_MERGE, removes
symbols from the candidate pool and can actually affect resolution result. Hence
we ought to do the elimination before the final resolution.

Doing so requires fully wiping the resolution instead of just the ones from
eliminated archive members. This has a performance hit that we need to keep an
eye on.

On another note this wiping of resolution is actually justifiable. The old
algorithm did not handle the case of an archive member overriding a DSO symbol.
So wiping is the correct way even in absence of COMDAT elimination.

One caveat is that currently the symbol resolution stage doesn't actually care
about the is_alive flag which is how COMDAT symbols are killed. There are two
other users of this flag: mergeable sections, and eh_frame. The eh_frame section
is already handled with special logic, and we can treat it as if it had no
symbols to resolve. Mergeable sections needs a bit more careful handling, but a
good way is to just leave them unmerged and update the references in the same
later pass where merge happens. This has been done in a prior commit.

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
  • Loading branch information
ishitatsuyuki committed Oct 23, 2022
1 parent 26c503a commit 730e970
Show file tree
Hide file tree
Showing 3 changed files with 38 additions and 27 deletions.
16 changes: 3 additions & 13 deletions elf/input-files.cc
Original file line number Diff line number Diff line change
Expand Up @@ -818,22 +818,12 @@ static u64 get_rank(InputFile<E> *file, const ElfSym<E> &esym, bool is_lazy) {
return (5 << 24) + file->priority;
}

// GCC creates symbols in COMDATs with STB_GNU_UNIQUE instead of
// STB_WEAK if it was configured to do so at build time or the
// -fgnu-unique flag was given. In order to to not select a
// GNU_UNIQUE symbol in a discarded COMDAT section, we treat it as
// if it were weak.
//
// It looks like STB_GNU_UNIQUE is not a popular option anymore and
// often disabled by default though.
bool is_weak = (esym.st_bind == STB_WEAK || esym.st_bind == STB_GNU_UNIQUE);

if (file->is_dso || is_lazy) {
if (is_weak)
if (esym.st_bind == STB_WEAK)
return (4 << 24) + file->priority;
return (3 << 24) + file->priority;
}
if (is_weak)
if (esym.st_bind == STB_WEAK)
return (2 << 24) + file->priority;
return (1 << 24) + file->priority;
}
Expand Down Expand Up @@ -897,7 +887,7 @@ void ObjectFile<E>::resolve_symbols(Context<E> &ctx) {
InputSection<E> *isec = nullptr;
if (!esym.is_abs() && !esym.is_common()) {
isec = get_section(esym);
if (!isec)
if (!isec || !isec->is_alive)
continue;
}

Expand Down
13 changes: 8 additions & 5 deletions elf/main.cc
Original file line number Diff line number Diff line change
Expand Up @@ -484,16 +484,19 @@ int elf_main(int argc, char **argv) {
// Create a dummy file containing linker-synthesized symbols.
create_internal_file(ctx);

// Resolve symbols and fix the set of object files that are
// included to the final output.
// resolve_symbols is 4 things in 1 phase:
// - Determine the set of object files to extract from archives.
// - Remove redundant COMDAT sections (e.g. duplicate inline functions).
// - Finally, the actual symbol resolution.
// - LTO, which requires preliminary symbol resolution before running and a follow-up re-resolution after the LTO
// objects are emitted.
//
// These passes have complex interactions, and unfortunately has to be put together in a single phase.
resolve_symbols(ctx);

// Resolve mergeable section pieces to merge them.
register_section_pieces(ctx);

// Remove redundant comdat sections (e.g. duplicate inline functions).
eliminate_comdats(ctx);

// Create .bss sections for common symbols.
convert_common_symbols(ctx);

Expand Down
36 changes: 27 additions & 9 deletions elf/passes.cc
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,18 @@ static void mark_live_objects(Context<E> &ctx) {
});
}

// Due to legacy reasons, archive members will only get included in the final binary if they satisfy one of the
// undefined symbols in a non-archive object file. This is called archive extraction.
// In finalize_archive_extraction, this is processed as follows:
// 1. Do preliminary symbol resolution assuming all archive members are included. This matches the undefined symbols
// with ones to be extracted from archives.
// 2. Do a mark & sweep pass to eliminate unneeded archive members.
//
// Note that the symbol resolution inside finalize_archive_extraction uses a different rule. In order to prevent
// extracting archive members that can be satisfied by either non-archive object files or DSOs, the archive members are
// given a lower priority. This is not correct for the general case, where *extracted* object files have precedence over
// DSOs and even non-archive files that are passed earlier in the command line. Hence, the symbol resolution is thrown
// away once we determine which archive members to extract, and redone later with the formal rule.
template <typename E>
void finalize_archive_extraction(Context<E> &ctx) {
auto for_each_file = [&](std::function<void(InputFile<E> *)> fn) {
Expand All @@ -158,13 +170,13 @@ void finalize_archive_extraction(Context<E> &ctx) {
// This also merges symbol visibility.
mark_live_objects(ctx);

// Remove symbols of eliminated files.
// Cleanup. The rule used for archive extraction isn't accurate for the general case of symbol extraction, so reset
// the resolution to be redone later.
for_each_file([](InputFile<E> *file) {
if (!file->is_alive)
file->clear_symbols();
file->clear_symbols();
});

// Now that the symbol references are gone, also remove the eliminated files from the file list.
// Now that the symbol references are gone, remove the eliminated files from the file list.
std::erase_if(ctx.objs, [](InputFile<E> *file) { return !file->is_alive; });
std::erase_if(ctx.dsos, [](InputFile<E> *file) { return !file->is_alive; });
}
Expand All @@ -173,6 +185,15 @@ template <typename E>
void do_resolve_symbols(Context<E> &ctx) {
finalize_archive_extraction(ctx);

// COMDAT elimination needs to happen exactly here.
//
// It needs to be after archive extraction, otherwise we might assign COMDAT leader to an archive member that is not
// supposed to be extracted.
//
// It needs to happen before symbol resolution, otherwise we could eliminate a symbol that is already resolved to
// and cause dangling references.
eliminate_comdats(ctx);

// Since we have turned on object files live bits, their symbols
// may now have higher priority than before. So run the symbol
// resolution pass again to get the final resolution result.
Expand Down Expand Up @@ -722,12 +743,9 @@ void check_duplicate_symbols(Context<E> &ctx) {
const ElfSym<E> &esym = file->elf_syms[i];
Symbol<E> &sym = *file->symbols[i];

// Skip if our symbol is undef or weak. We handle GNU-unique
// symbols as if they were weak so that this logic is consistent
// with get_rank() in input-files.cc.
// Skip if our symbol is undef or weak
if (sym.file == file || sym.file == ctx.internal_obj ||
esym.is_undef() || esym.is_common() || (esym.st_bind == STB_WEAK) ||
(esym.st_bind == STB_GNU_UNIQUE))
esym.is_undef() || esym.is_common() || (esym.st_bind == STB_WEAK))
continue;

// Skip if our symbol is in a dead section. In most cases, the
Expand Down

0 comments on commit 730e970

Please sign in to comment.