Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial MacOS implementation #21

Merged
merged 53 commits into from
May 23, 2022
Merged

Conversation

Jake-Shadle
Copy link
Collaborator

@Jake-Shadle Jake-Shadle commented Apr 20, 2022

This essentially copies the Breakpad minidump creation, with a couple of minor changes.

  1. Breakpad uses getrusage to determine the process start time, however...this was wrong. getrusage can only retrieve the process information for the current task (process) or its children, this implementation instead uses proc_pidinfo to get the process info for the actual crashed process.
  2. Breakpad/Crashpad both retrieve OS/kernel version information via an deprecated and undocumented (that's not saying much though, Apple hates documentation it seems) method, however there is a perfectly fine way to retrieve this information via sysctlbyname, kern.osproductversion has been available since 10.13.4 which is 2 versions older than the oldest supported version, and kern.osversion has been available for...I actually don't know how long, again, Apple is really terrible at documentation.

Note that this implementation only supports MacOS (not iOS), and only x86_64 and aarch64 as those are the only officially supported architectures on MacOS now.

Also note, most of my local testing has been done on an aarch64 machine, I'll add building of aarch64 from an x86_64 to CI to cover that case, but we won't be able to run actual tests (not that CI actually runs tests atm...) until Github Actions adds M1 runners, which who knows when that will happen.

Another option would be for Embark to supply a temporary M1 runner for this project similarly to how we already supply one for https://github.com/bytecodealliance/wasmtime as seen in https://github.com/EmbarkStudios/wasmtime-aarch64-apple-darwin.

@Jake-Shadle
Copy link
Collaborator Author

Jake-Shadle commented Apr 26, 2022

Converting this to a draft, I've just found out that I completely misunderstood mach_task_self which always returns the same value, I had assumed it was a unique identifier like a PID, but that's completely wrong and apparently I need to refactor the code I was using to test this change (in addition to the tests in this PR).

The value returned by task_self_trap() is not a unique identifier like a Unix process ID. In fact, its value will be the same for all tasks, even on different machines, provided the machines are running identical kernels.

@Jake-Shadle Jake-Shadle marked this pull request as ready for review April 28, 2022 19:42
@Jake-Shadle
Copy link
Collaborator Author

I've now got this working on x86_64 and aarch64, I've commented out the test that I used for confirmation since its dependency chain leads to openssl and the build was failing and I don't want to deal with it at the moment, I'll fix it properly tomorrow by opening a PR to breakpad-symbols to use rustls instead of openssl.

@Jake-Shadle Jake-Shadle mentioned this pull request Apr 28, 2022
@Gankra
Copy link
Contributor

Gankra commented Apr 28, 2022

Is this something that the vendored openssl hack fixes? https://github.com/rust-minidump/rust-minidump/blob/main/minidump-stackwalk/Cargo.toml#L27

@Jake-Shadle
Copy link
Collaborator Author

Possibly, but the dependency was on minidump-processor which doesn't have that feature. Exposing the rustls feature on reqwest or making reqwest/http symbols optional altogether would definitely work though.

@Gankra
Copy link
Contributor

Gankra commented Apr 28, 2022

yeah in theory all the crates that touch breakpad-symbols should have the hack but I just tossed on the one I needed to ship

I encountered an issue, only on x86_64, but could affect aarch64 as
well, where a stack overflow could fail to be dumped due the stack
address being borked causing the vm_read to fail with `InvalidAddress`.
This changes it so that in that case a sentinel value is used in place
of the actual stack, as there was already a graceful handling of the
case where the stack was reported to be of 0 size that was ported from
Breakpad
@@ -0,0 +1,103 @@
use crate::{
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is just a move of the DirSection and related code so that it can be shared between Linux and Mac.

Comment on lines +20 to +27
// Non-windows platforms need additional code since they are essentially
// replicating functionality we get for free on Windows
cfg_if::cfg_if! {
if #[cfg(not(target_os = "windows"))] {
pub(crate) mod mem_writer;
pub(crate) mod dir_section;
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the comment says, these modules are shared code that non-windows can use to help write minidump streams.

Comment on lines +3 to +5
#[cfg(target_pointer_width = "32")]
compile_error!("Various MacOS FFI bindings assume we are on a 64-bit architechture");

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Apple only officially supports x86_64 and aarch64 for MacOS it was easier/cleaner to just use the 64-bit structs in some of the handcoded bindings that weren't available in mach2, this just guards against someone trying to target a non-64-bit architecture and getting a nasty surprise at runtime.

Comment on lines +4 to +15
// Just exports all of the mach functions we use into a flat list
pub use mach2::{
kern_return::{kern_return_t, KERN_SUCCESS},
port::mach_port_name_t,
task::{self, task_threads},
task_info,
thread_act::thread_get_state,
traps::mach_task_self,
vm::{mach_vm_deallocate, mach_vm_read, mach_vm_region_recurse},
vm_region::vm_region_submap_info_64,
};

Copy link
Collaborator Author

@Jake-Shadle Jake-Shadle Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mach2 crate has a really annoying layout that I suppose they inherited from Apple.

Comment on lines +16 to +21
/// A Mach kernel error.
///
/// See <usr/include/mach/kern_return.h>.
#[derive(thiserror::Error, Debug)]
pub enum KernelError {
#[error("specified address is not currently valid")]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just puts kernel errors into an enum so the debug/error display for them is not an opaque integer that ones must lookup. Also adds the actual comments in the kernel source for each one (at least the relevant parts) to at least give a user some idea what went wrong.

Comment on lines +15 to +20
let mut it = vers.split('.');

let major: u32 = it.next()?.parse().ok()?;
let minor: u32 = it.next()?.parse().ok()?;
let patch: u32 = it.next().and_then(|p| p.parse().ok()).unwrap_or_default();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version is usually a standard semver <major>.<minor>.<patch> but it can be just <major>.<minor>

Comment on lines +217 to +219
// This is kind of a lie as we don't actually include the full float state..?
out.context_flags = format::ContextFlagsArm64Old::CONTEXT_ARM64_OLD_FULL.bits() as u64;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MacOS threadstate doesn't include floating point registers on aarch64, but Breakpad was still setting MD_CONTEXT_ARM64_FULL_OLD which includes MD_CONTEXT_ARM64_FLOATING_POINT_OLD. From what I have seen this is fine and stackwalker/sentry have no issues with the thread state in such a dump.

Comment on lines +1 to +3
use crate::minidump_format::{MDLocationDescriptor, MDRVA};
use scroll::ctx::{SizeWith, TryIntoCtx};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contents of this file were moved to not be Linux specific.

Comment on lines 13 to 16
crash_context: crash_context::CrashContext,
/// Handle to the crashing process, which could be ourselves
crashing_process: HANDLE,
/// The pid of the crashing process.
crashing_pid: u32,
/// The `EXCEPTION_POINTERS` contained in crash context is a pointer into the
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a slight refactoring from the bump to crash-context 0.2 since the pid is now part of the crash-context itself on Windows to match linux and mac, so this allowed us to just cleanup the interface a bit since it can internally open the process in the correct way without the user needing to specify it.

tests/mac_minidump_writer.rs Outdated Show resolved Hide resolved
There are a couple of advisories that are "OK" for now (though I will
PR them to fix them), but the cargo audit GHA doesn't have a way to
ignore advisories, cargo-deny does
@Jake-Shadle
Copy link
Collaborator Author

FYI I've filed mdsteele/rust-cab#13 to get rid of the 2 security advisories introduced by this change.

gabrielesvelto pushed a commit to mozilla/dump_syms that referenced this pull request May 3, 2022
This PR makes changes so that this crate it can be used as a library (eg rust-minidump/minidump-writer#21 (comment)).

- Replaced `failure` with `anyhow`. `failure` has been [unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0036.html) for several years, `anyhow` is one of the maintained alternatives. This change is intended as temporary, as IMO the errors in this crate should use `thiserror`, or just manually defined errors, as it makes library usage easier by not using string errors for everything, but made sense for a binary application.
- Places `clap` and `simplelog` behind a `cli` feature flag, these dependencies are only used when used as a binary, crates that depend on this as a library don't need them and thus should not compile them. To preserve backwards compatibility for `cargo install` this feature is made default, so library users will need to explicitly opt-out.
- Places HTTP symbol retrieval behind the `http` feature. This removes several heavy dependencies only used for retrieving symbols from remote symbol stores, in the use case that motivated this PR linked above, this functionality is not needed and only adds compile time for no benefit. Again, this feature is enabled by default for backwards compatibility.
- Replaces `reqwest`'s use of native-tls with `rustls`, simplifying the build when `http` is enabled. I frankly despise OpenSSL and all of the aggravating problems that come with it, `rustls` is a drop-in replacement that completely avoids all of those aggravations. This change is obviously due to my personal bias, I can back it out if requested.

Resolves: #253
@Jake-Shadle Jake-Shadle merged commit df24400 into rust-minidump:main May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants