Corrupts the DWARF section when relocating multiple objects into a single one #1265

Closed
Danil42Russia opened this issue May 22, 2024 · 3 comments

Danil42Russia commented May 22, 2024

Environment:

  • System: Debian GNU/Linux 10 (buster)
  • mold version: mold 2.31.0 (20fa8d56f5e0c47d1f4bbf7b829c12d3f43298e1; compatible with GNU ld)
  • g++ version: g++ (Debian 8.3.0-6) 8.3.0
  • lld version: Debian LLD 16.0.6 (compatible with GNU linkers)

Problem:

After we moved to mold, we found that function names in the addr2line output are corrupted: their beginnings may
be cut off.

How we build the final binary:

  1. Codegen from PHP to C++. This produces about 200k C++ files spread across 100 folders
    (about 2k C++ files per folder);
  2. All C++ files are compiled into .o files;
  3. The .o files in each folder are combined into one intermediate .o file via a relocatable link (-r)
    (2k .o files per folder become one intermediate .o file, so there are 100 of them in the end);
  4. The 100 intermediate .o files are linked into the final binary.
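
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the actual build system: the helper name plan_relocatable_links and the build/dir_N layout are assumptions made for the example; only the -r -nostdlib flags are taken from the reproducer in this issue. The function only builds the command lines and does not run them.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_relocatable_links(obj_files, linker="mold"):
    """Group .o files by folder and build one `-r` link command per folder.

    Hypothetical helper: returns {intermediate_obj: argv} without running anything.
    """
    by_folder = defaultdict(list)
    for obj in map(PurePosixPath, obj_files):
        by_folder[obj.parent].append(obj)

    commands = {}
    for folder, objs in sorted(by_folder.items()):
        # One intermediate object per folder, placed next to that folder.
        output = folder.parent / f"{folder.name}.o"
        commands[str(output)] = [
            linker, "-r", "-nostdlib", "-o", str(output),
            *map(str, sorted(objs)),
        ]
    return commands


if __name__ == "__main__":
    plan = plan_relocatable_links(
        ["build/dir_1/a.o", "build/dir_1/b.o", "build/dir_2/c.o"]
    )
    for output, argv in plan.items():
        print(output, "<-", " ".join(argv))
```

Each resulting intermediate object would then be passed to the final link in step 4.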

Reproducer:

A ready-to-run script; it invokes the g++ compiler and both linkers (mold and lld) by itself:

import shlex
import shutil
import subprocess
from pathlib import Path

CPP_BLOCK_TEMPLATE = """
int uninit_{file_id}_{iter_id};
int zeroed_{file_id}_{iter_id} = 0;
int init_{file_id}_{iter_id} = {count_id};
static int static_{file_id}_{iter_id} = {count_id};
int foo_{file_id}_{iter_id}() {{ std::cout << {count_id}; return {count_id}; }}"""

CPP_BLOCK_ITERS = 2
CPP_FILES_COUNT = 10

FOLDER_REPRODUCER = Path("/tmp/problem_reproducer")
OBJ_FOLDER = FOLDER_REPRODUCER / "obj_1"

COMPILER_PATH = "/usr/bin/g++"
BASE_LINKER_PATH = "/usr/bin"


def try_command_execute(cmd: str) -> None:
    try:
        result = subprocess.run(shlex.split(cmd))
        return_code = result.returncode
    except Exception:
        return_code = 1  # treat a failed launch (e.g. a missing binary) as an error

    if return_code != 0:
        print("There was a startup error:", shlex.quote(cmd))


def prepare_target() -> None:
    if OBJ_FOLDER.exists():
        shutil.rmtree(OBJ_FOLDER)

    OBJ_FOLDER.mkdir(parents=True)


def gen2cpp_target(file_id: int) -> None:
    cpp_file_name = f"file_{file_id}.cpp"
    print(f"STAGE 1.1: Start codegen for file '{cpp_file_name}'")

    cpp_code = "#include <iostream>"
    for iter_id in range(1, CPP_BLOCK_ITERS + 1):
        cpp_code += f"\n// BLOCK {iter_id}"
        cpp_code += CPP_BLOCK_TEMPLATE.format(file_id=file_id, iter_id=iter_id, count_id=file_id * iter_id)

    file_path = OBJ_FOLDER / cpp_file_name
    file_path.write_text(cpp_code)


def cpp2obj_target() -> None:
    compiler_flags = "-g1 -ffunction-sections -fdata-sections -c"

    for cpp_file in OBJ_FOLDER.glob("*.cpp"):
        output_obj_file = cpp_file.parent / f"{cpp_file.stem}.o"
        cmd = f"{COMPILER_PATH} {compiler_flags} -o {output_obj_file} {cpp_file}"

        print(f"STAGE 2.1: Start compiler for file '{cpp_file.name}'")
        try_command_execute(cmd)


def objs2obj_target(linker_name: str) -> None:
    base_linker_path = f"{BASE_LINKER_PATH}/{linker_name}"
    linker_flags = "-r -nostdlib"

    target_objs_list = ""
    for obj_file in OBJ_FOLDER.glob("*.o"):
        target_objs_list += f"{obj_file} "
    target_objs_list = target_objs_list.strip()

    output_object_file = FOLDER_REPRODUCER / f"{OBJ_FOLDER.name}_{linker_name}.o"
    cmd = f"{base_linker_path} {linker_flags} -o {output_object_file} {target_objs_list}"

    print(f"STAGE 3.1: Start linker '{linker_name}'")
    try_command_execute(cmd)


def main() -> None:
    print("STAGE 0: (prepare)")
    prepare_target()
    print("STAGE 1: (generate c++)")
    for file_id in range(1, CPP_FILES_COUNT + 1):
        gen2cpp_target(file_id)

    print("STAGE 2: (cpp to obj)")
    cpp2obj_target()

    print("STAGE 3: (objs to obj)")
    linkers = ["mold", "lld"]
    for linker_name in linkers:
        objs2obj_target(linker_name)


if __name__ == "__main__":
    main()

Research:

If you run the command below on the mold output, you will see that the beginning of some attribute values has been
cut off, and that some attributes contain strings that do not belong to them (for example, DW_AT_producer and
DW_AT_comp_dir showing a .cpp path). :(

objdump --dwarf=info obj_1_mold.o | grep "\.cpp"
objdump: Warning: DW_FORM_strp offset too big: 542
objdump: Warning: DW_FORM_strp offset too big: 4d5
    <c>   DW_AT_producer    : (indirect string, offset: 0x1a2): /tmp/problem_reproducer/obj_1/file_7.cpp
    <41>   DW_AT_name        : (indirect string, offset: 0x178): m_reproducer/obj_1/file_6.cpp
objdump: Warning: DW_FORM_strp offset too big: 4fe
objdump: Warning: DW_FORM_strp offset too big: 518
objdump: Warning: DW_FORM_strp offset too big: 59f
objdump: Warning: DW_FORM_strp offset too big: 5a7
objdump: Warning: DW_FORM_strp offset too big: 597
    <134>   DW_AT_producer    : (indirect string, offset: 0x37f): .cpp
    <139>   DW_AT_name        : (indirect string, offset: 0x32c): mp/problem_reproducer/obj_1/file_5.cpp
    <19e>   DW_AT_name        : (indirect string, offset: 0x3e0): em_reproducer/obj_1/file_1.cpp
    <2f5>   DW_AT_name        : (indirect string, offset: 0x2be): /tmp/problem_reproducer/obj_1/file_9.cpp
    <2f9>   DW_AT_comp_dir    : (indirect string, offset: 0x362): oblem_reproducer/obj_1/file_3.cpp
    <325>   DW_AT_name        : (indirect string, offset: 0x379): file_3.cpp
    <41d>   DW_AT_name        : (indirect string, offset: 0x3f1): _1/file_1.cpp
    <436>   DW_AT_name        : (indirect string, offset: 0x338): eproducer/obj_1/file_5.cpp
    <44d>   DW_AT_name        : (indirect string, offset: 0x36a): producer/obj_1/file_3.cpp
    <46b>   DW_AT_linkage_name: (indirect string, offset: 0x35e): p/problem_reproducer/obj_1/file_3.cpp
    <482>   DW_AT_name        : (indirect string, offset: 0x3e9): ucer/obj_1/file_1.cpp
    <4ac>   DW_AT_producer    : (indirect string, offset: 0x178): m_reproducer/obj_1/file_6.cpp
    <540>   DW_AT_producer    : (indirect string, offset: 0x429): roducer/obj_1/file_2.cpp
    <545>   DW_AT_name        : (indirect string, offset: 0x3ac): /tmp/problem_reproducer/obj_1/file_10.cpp
    <549>   DW_AT_comp_dir    : (indirect string, offset: 0x3d6): /tmp/problem_reproducer/obj_1/file_1.cpp
    <58c>   DW_AT_name        : (indirect string, offset: 0x3f6): le_1.cpp
    <5aa>   DW_AT_name        : (indirect string, offset: 0x3ed): /obj_1/file_1.cpp
    <5b1>   DW_AT_linkage_name: (indirect string, offset: 0x47e): e_8.cpp
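
One way to read these values: a DW_FORM_strp attribute stores a byte offset into the .debug_str section, which is a sequence of NUL-terminated strings. An offset that lands in the middle of a string prints only its tail, and an offset past the end of the section produces the "offset too big" warning. The sketch below models this with a hand-built table. The helper name read_strp and the table contents are illustrative assumptions, not data extracted from the object file, and the sketch makes no claim about what mold does internally.

```python
def read_strp(debug_str: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset` in a string table."""
    if offset >= len(debug_str):
        raise ValueError(f"DW_FORM_strp offset too big: {offset:x}")
    end = debug_str.index(b"\0", offset)
    return debug_str[offset:end].decode()


# Hand-built stand-in for a .debug_str section (illustrative only).
table = b"example producer\0/tmp/problem_reproducer/obj_1/file_6.cpp\0"

good = table.index(b"/tmp")

if __name__ == "__main__":
    print(read_strp(table, good))       # offset at the start of the string: full path
    print(read_strp(table, good + 11))  # shifted offset: only the tail of the path
```

With the shifted offset the helper returns m_reproducer/obj_1/file_6.cpp, which has the same shape as the truncated DW_AT_name values in the output above.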

I ran the same check on the lld output, and it is fine.

objdump --dwarf=info obj_1_lld.o | grep "\.cpp"
    <11>   DW_AT_name        : (indirect string, offset: 0x499): /tmp/problem_reproducer/obj_1/file_1.cpp
    <a5>   DW_AT_name        : (indirect string, offset: 0x0): /tmp/problem_reproducer/obj_1/file_2.cpp
    <139>   DW_AT_name        : (indirect string, offset: 0xc9): /tmp/problem_reproducer/obj_1/file_3.cpp
    <1cd>   DW_AT_name        : (indirect string, offset: 0x1d1): /tmp/problem_reproducer/obj_1/file_4.cpp
    <261>   DW_AT_name        : (indirect string, offset: 0xfe): /tmp/problem_reproducer/obj_1/file_5.cpp
    <2f5>   DW_AT_name        : (indirect string, offset: 0x127): /tmp/problem_reproducer/obj_1/file_6.cpp
    <389>   DW_AT_name        : (indirect string, offset: 0x380): /tmp/problem_reproducer/obj_1/file_7.cpp
    <41d>   DW_AT_name        : (indirect string, offset: 0x2c0): /tmp/problem_reproducer/obj_1/file_8.cpp
    <4b1>   DW_AT_name        : (indirect string, offset: 0x41b): /tmp/problem_reproducer/obj_1/file_9.cpp
    <545>   DW_AT_name        : (indirect string, offset: 0x62): /tmp/problem_reproducer/obj_1/file_10.cpp
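
The comparison above can be automated. The sketch below scans the text printed by objdump --dwarf=info and reports .cpp strings that are attached to an attribute other than DW_AT_name, or that do not start with "/" (the reproducer only generates absolute paths). The function name suspicious_cpp_attrs and the regular expression are assumptions fitted to the output format shown in this issue, not an objdump interface.

```python
import re

# Matches lines such as:
#     <41>   DW_AT_name        : (indirect string, offset: 0x178): m_reproducer/...
ATTR_RE = re.compile(
    r"<(?P<die>[0-9a-f]+)>\s+(?P<attr>DW_AT_\w+)\s*:"
    r"\s*\(indirect string, offset: 0x[0-9a-f]+\): (?P<value>.*)$"
)


def suspicious_cpp_attrs(objdump_text: str) -> list[tuple[str, str, str]]:
    """Return (die_offset, attribute, value) for .cpp strings that look wrong."""
    found = []
    for line in objdump_text.splitlines():
        match = ATTR_RE.search(line)
        if not match or not match["value"].endswith(".cpp"):
            continue
        wrong_attr = match["attr"] != "DW_AT_name"
        truncated = not match["value"].startswith("/")
        if wrong_attr or truncated:
            found.append((match["die"], match["attr"], match["value"]))
    return found


if __name__ == "__main__":
    sample = (
        "    <41>   DW_AT_name        : (indirect string, offset: 0x178): m_reproducer/obj_1/file_6.cpp\n"
        "    <2f9>   DW_AT_comp_dir    : (indirect string, offset: 0x362): oblem_reproducer/obj_1/file_3.cpp\n"
        "    <11>   DW_AT_name        : (indirect string, offset: 0x499): /tmp/problem_reproducer/obj_1/file_1.cpp\n"
    )
    for die, attr, value in suspicious_cpp_attrs(sample):
        print(f"<{die}> {attr}: {value}")
```

This check is deliberately narrow: a complete path attached to the wrong compilation unit would pass it, so an empty result is not proof that the section is intact.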

rui314 (Owner) commented May 23, 2024

I can reproduce the issue. Thank you for the reproducer. I'll fix the issue.

rui314 closed this as completed in 08b0a16 May 24, 2024

rui314 (Owner) commented May 24, 2024

The above commit should fix the issue. Please try again with git head.

Danil42Russia (Author) commented

Thanks for the fix! I tested it on our project and have noticed no problems so far.
