Fix -z rewrite-endbr
Previously, functions that are referred to only by section symbols were
not considered address-taken, and we always rewrote endbr64s with NOPs for
such functions. That caused a runtime fault when such a function was called
indirectly.

Now, endbr64s for such symbols are retained.
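
The approach described above can be sketched as two passes over a single section: first replace the endbr64 at every function entry with a NOP, then restore it wherever an address-taking relocation points. This is a minimal illustration, not mold's code; the `Reloc` struct and both function signatures are hypothetical simplifications. The key point is that a relocation against a section symbol identifies its target by the addend rather than by a symbol value.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint8_t kEndbr64[] = {0xf3, 0x0f, 0x1e, 0xfa};
constexpr uint8_t kNop[] = {0x0f, 0x1f, 0x40, 0x00};

// Hypothetical, simplified relocation record (not mold's ElfRel).
struct Reloc {
  bool is_call;          // true for direct-call relocations
  bool via_section_sym;  // true if the relocation refers to a section symbol
  int64_t sym_value;     // offset of the referenced symbol in the section
  int64_t addend;        // relocation addend
};

// Pass 1: replace the endbr64 at each function entry with a 4-byte NOP.
void nop_out(std::vector<uint8_t> &out, const std::vector<int64_t> &func_offsets) {
  for (int64_t off : func_offsets)
    if (off >= 0 && off + 4 <= (int64_t)out.size() &&
        std::memcmp(out.data() + off, kEndbr64, 4) == 0)
      std::memcpy(out.data() + off, kNop, 4);
}

// Pass 2: restore endbr64 wherever a non-call relocation points at one.
// For a section symbol, the target offset is the addend.
void write_back(std::vector<uint8_t> &out, const std::vector<uint8_t> &in,
                const std::vector<Reloc> &rels) {
  for (const Reloc &r : rels) {
    if (r.is_call)
      continue;
    int64_t off = r.via_section_sym ? r.addend : r.sym_value;
    if (off >= 0 && off + 4 <= (int64_t)in.size() &&
        off + 4 <= (int64_t)out.size() &&
        std::memcmp(in.data() + off, kEndbr64, 4) == 0)
      std::memcpy(out.data() + off, kEndbr64, 4);
  }
}
```

Checking the original input bytes in the second pass, rather than the output, is what lets it undo a NOP written by the first pass.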

I tried to build clang-19 as a release build with and without `-z
rewrite-endbr` and counted the number of endbr64 instructions in each
binary. Here is the result:

  Before: 110,615
  After:   91,799

So the feature reduces the number of gadgets (endbr64 instructions) by 17%.
Both binaries worked fine under the Intel SDE CPU emulator with
`-cet 1 -cet_raise 1`, so I think it's finally working as expected.
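
The commit message does not say how the endbr64 counts were obtained. One rough way to get a comparable number is a byte scan for the 4-byte encoding, sketched below. The function name `count_endbr64` is hypothetical, and a raw scan like this may overcount, since the same byte sequence can appear in data or in the middle of another instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Count occurrences of the endbr64 encoding (f3 0f 1e fa) in a byte buffer.
// This is a naive scan over raw bytes, not a disassembly.
std::size_t count_endbr64(const std::vector<uint8_t> &image) {
  constexpr uint8_t endbr64[] = {0xf3, 0x0f, 0x1e, 0xfa};
  std::size_t count = 0;
  for (std::size_t i = 0; i + 4 <= image.size(); i++)
    if (std::memcmp(image.data() + i, endbr64, 4) == 0)
      count++;
  return count;
}
```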

I also tried to build mold itself with and without the feature. Here is
the result:

  Before: 27,430
  After:  17,725

This is a 35% reduction. I confirmed that mold built with `-z rewrite-endbr`
can self-host under Intel SDE.

The rewrite_endbr pass is extremely fast. It took only 7 milliseconds
for a ~210 MiB clang-19 binary on my Threadripper 7990X machine. We
may want to consider enabling it by default at some point.
rui314 committed Jul 29, 2024
1 parent 3f7236d commit ed7eec5
Showing 1 changed file with 54 additions and 34 deletions.
88 changes: 54 additions & 34 deletions elf/arch-x86-64.cc
@@ -822,44 +822,18 @@ void InputSection<E>::scan_relocations(Context<E> &ctx) {
void rewrite_endbr(Context<E> &ctx) {
Timer t(ctx, "rewrite_endbr");

auto mark = [&](Symbol<E> *sym) {
if (sym) {
std::scoped_lock lock(sym->mu);
sym->address_taken = true;
}
};

// Compute address-taken bit for each symbol
tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
for (std::unique_ptr<InputSection<E>> &isec : file->sections)
if (isec && isec->is_alive && (isec->shdr().sh_flags & SHF_ALLOC))
for (const ElfRel<E> &rel : isec->get_rels(ctx))
if (!is_func_call_rel(rel))
if (Symbol<E> *sym = file->symbols[rel.r_sym];
sym->esym().st_type == STT_FUNC)
mark(sym);
});

// Exported symbols are conservatively assumed to be address-taken.
if (ctx.dynsym)
for (Symbol<E> *sym : ctx.dynsym->symbols)
if (sym && sym->is_exported)
mark(sym);

// Some symbols are implicitly address-taken
mark(ctx.arg.entry);
mark(ctx.arg.init);
mark(ctx.arg.fini);

constexpr u8 endbr64[] = {0xf3, 0x0f, 0x1e, 0xfa};
constexpr u8 nop[] = {0x0f, 0x1f, 0x40, 0x00};

// Rewrite endbr64 with nop
// Rewrite all endbr64 instructions referred to by function symbols with
// NOPs. We handle only global symbols because the compiler doesn't emit
// an endbr64 for a file-scoped function in the first place if its
// address is not taken within the file.
tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
for (Symbol<E> *sym : file->symbols) {
if (sym->file == file && sym->esym().st_type == STT_FUNC &&
!sym->address_taken) {
if (InputSection<E> *isec = sym->get_input_section()) {
for (Symbol<E> *sym : file->get_global_syms()) {
if (sym->file == file && sym->esym().st_type == STT_FUNC) {
if (InputSection<E> *isec = sym->get_input_section();
isec && (isec->shdr().sh_flags & SHF_EXECINSTR)) {
if (OutputSection<E> *osec = isec->output_section) {
u8 *buf = ctx.buf + osec->shdr.sh_offset + isec->offset + sym->value;
if (memcmp(buf, endbr64, 4) == 0)
@@ -869,6 +843,52 @@ void rewrite_endbr(Context<E> &ctx) {
}
}
});

auto write_back = [&](InputSection<E> *isec, i64 offset) {
// If isec has an endbr64 at a given offset, copy that instruction to
// the output buffer, possibly overwriting a nop written in the above
// loop.
if (isec && isec->output_section &&
(isec->shdr().sh_flags & SHF_EXECINSTR) &&
0 <= offset && offset <= isec->contents.size() - 4 &&
memcmp(isec->contents.data() + offset, endbr64, 4) == 0)
memcpy(ctx.buf + isec->output_section->shdr.sh_offset + isec->offset + offset,
endbr64, 4);
};

// Write back endbr64 instructions if they are referred to by address-taking
// relocations.
tbb::parallel_for_each(ctx.objs, [&](ObjectFile<E> *file) {
for (std::unique_ptr<InputSection<E>> &isec : file->sections) {
if (isec && isec->is_alive && (isec->shdr().sh_flags & SHF_ALLOC)) {
for (const ElfRel<E> &rel : isec->get_rels(ctx)) {
if (!is_func_call_rel(rel)) {
Symbol<E> *sym = file->symbols[rel.r_sym];
if (sym->esym().st_type == STT_SECTION)
write_back(sym->get_input_section(), rel.r_addend);
else
write_back(sym->get_input_section(), sym->value);
}
}
}
}
});

// We record addresses of some symbols in the ELF header, .dynamic or in
// .dynsym. We need to retain endbr64s for such symbols.
auto keep = [&](Symbol<E> *sym) {
if (sym)
write_back(sym->get_input_section(), sym->value);
};

keep(ctx.arg.entry);
keep(ctx.arg.init);
keep(ctx.arg.fini);

if (ctx.dynsym)
for (Symbol<E> *sym : ctx.dynsym->symbols)
if (sym && sym->is_exported)
keep(sym);
}

} // namespace mold::elf
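
One detail in `write_back` may be worth a second look: the bounds check compares a signed offset against `isec->contents.size() - 4`. Assuming `contents` is a `std::string_view` (so `size()` is unsigned), that subtraction wraps around for sections shorter than 4 bytes. The following is a sketch of a check that avoids the wraparound; `has_endbr64_at` is a hypothetical helper, not part of this commit.

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Returns true if `contents` holds an endbr64 at `offset`.
// The size is tested before subtracting, so the unsigned
// expression `contents.size() - 4` can never wrap around.
bool has_endbr64_at(std::string_view contents, int64_t offset) {
  constexpr uint8_t endbr64[] = {0xf3, 0x0f, 0x1e, 0xfa};
  if (offset < 0 || contents.size() < 4)
    return false;
  if ((uint64_t)offset > contents.size() - 4)
    return false;
  return std::memcmp(contents.data() + offset, endbr64, 4) == 0;
}
```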
