Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read ELF build ids directly from the target process. #112

Merged
merged 6 commits into from
Apr 12, 2024

Conversation

afranchuk
Copy link
Contributor

Closes #71.

A few things to consider:

  • Since we read from the process memory, the process must be in ptrace-stop (see test_file_id). This changes when the build ids can be read. Previously they could be read without the process being stopped if the mapped files still existed (and were hopefully the same that the process was using).
  • The previous implementation made some mutations to deleted mapping names (removing the (deleted) suffix). We need to decide whether we still want/need this behavior. In the meantime I commented out a failing test assertion.

Closes rust-minidump#71.

A few things to consider:
* Since we read from the process memory, the process must be in
  ptrace-stop (see `test_file_id`). This changes when the build ids can
  be read. Previously they could be read without the process being
  stopped if the mapped files still existed (and were hopefully the same
  that the process was using).
* The previous implementation made some mutations to deleted mapping
  names (removing the ` (deleted)` suffix). We need to decide whether we
  still want/need this behavior. In the meantime I commented out a
  failing test assertion.
@afranchuk
Copy link
Contributor Author

afranchuk commented Mar 29, 2024

Maybe test_file_id fails because we can't ptrace-stop the parent process threads? When I run it locally it works. Might need to look at what the waitpid errno is.

@afranchuk afranchuk requested a review from Jake-Shadle April 1, 2024 13:46
src/linux/build_id_reader.rs Show resolved Hide resolved
src/linux/build_id_reader.rs Show resolved Hide resolved
src/linux/build_id_reader.rs Outdated Show resolved Hide resolved
src/linux/build_id_reader.rs Outdated Show resolved Hide resolved
src/linux/build_id_reader.rs Outdated Show resolved Hide resolved
src/linux/build_id_reader.rs Outdated Show resolved Hide resolved
src/linux/ptrace_dumper.rs Outdated Show resolved Hide resolved
This test needed to be disabled due to permissions issues.
Copy link
Contributor

@gabrielesvelto gabrielesvelto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a small comment but noticed a bigger issue. On my Gentoo machine there's a lot of failures and I'm not 100% sure what's going on, however what I see consistently is the following:

  • We call PtraceDumper::elf_identifier_for_mapping() on /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1 which in turns calls build_id_reader::read_build_id()
  • build_id_reader::read_build_id() tries ElfBuildIdReader::read_from_program_headers() which fails, because this file doesn't have a build ID so we can't find the note
  • build_id_reader::read_build_id() proceeds with ElfBuildIdReader::read_from_section() and here things go south. read_section_headers() calls elf::SectionHeader::parse() with an offset of 0 which causes it to return an empty array, see here
  • We attempt to execute this on an empty array and the test fails
    let strtab_section_header = &section_headers[self.header.e_shstrndx as usize];
    

One very crude way of fixing this is to add 1 byte in front of the slice containing the section headers, so that goblin really parses them instead of returning the empty vector. That being said we should still handle the case where self.header.e_shstrndx ends up being out-of-bounds, even though it might have masked this failure.

src/linux/build_id_reader.rs Outdated Show resolved Hide resolved
@afranchuk
Copy link
Contributor Author

Oh no! Looks like we could just use parse_from() once the new version of goblin is published with @lissyx's changes.

@afranchuk
Copy link
Contributor Author

It seems like some more unit tests to cover these cases are necessary, though it will be a bit contrived to e.g. create binaries with only a build id in them (to not waste space in the repo). But I'll try to add something.

@afranchuk
Copy link
Contributor Author

afranchuk commented Apr 9, 2024

I've added tests (with a small, handcrafted ELF file). I made the ELF file by making a bunch of modifications to https://github.com/tchajed/minimal-elf; I'm wondering whether I should include the code that created it in the repo?

I sadly just realized that goblin does implement Pwrite (just not for the 32/64-bit agnostic Elf wrapper which I originally was looking at), so I could/should probably rewrite the generation code using goblin. In that case keeping the size as small as possible isn't a big deal since it will be generated at runtime, but it'll likely end up the same size anyway since I don't think goblin will add anything we don't explicitly tell it to. Though they still don't provide a top-level Pwrite for the whole file, so it won't save much effort (e.g. it doesn't automatically set correct offsets/sizes/indices for you).

@gabrielesvelto
Copy link
Contributor

After a bit of wrangling with the with_deleted_binary test I figured out why it's failing on my machine, and it's not such a big deal. Locally the test program doesn't have the build ID program note and the section headers aren't being loaded into memory, or the kernel has discarded that particular area after we deleted the file. So what happens is the following:

  • read_build_id() will attempt to find the note in the program headers and fail (it's not there)
  • read_build_id() will fall back onto ElfBuildIdReader::generate_from_section() but this fails, as the section headers are not in memory
  • read_build_id() will fall back onto ElfBuildIdReader::generate_from_text(), but this needs the section header to find the text section, so this also fails

Given that I don't know a way to recover the text section from the program headers alone, and that the failure is triggered by an executable that doesn't have a build ID in the first place I think we can let it slide. We'll never encounter this in practice within Firefox, because the program headers of our various executables will always include the build ID.

Copy link
Contributor

@gabrielesvelto gabrielesvelto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK as it is.

@Jake-Shadle Jake-Shadle merged commit c75d2ab into rust-minidump:main Apr 12, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read ELF build ids directly from the target process instead of mmap()ing libraries
3 participants