Add support for generating perf maps for simple perf profiling #6030

Merged
merged 10 commits into from
Mar 20, 2023
4 changes: 4 additions & 0 deletions crates/c-api/include/wasmtime/config.h
Expand Up @@ -80,6 +80,10 @@ enum wasmtime_profiling_strategy_enum { // ProfilingStrategy
///
/// Note that this isn't always enabled at build time.
WASMTIME_PROFILING_STRATEGY_VTUNE,
/// Linux's simple "perfmap" support in `perf` is enabled: Wasmtime writes a
/// `/tmp/perf-<pid>.map` file that `perf` uses to resolve symbols in
/// generated JIT code.
WASMTIME_PROFILING_STRATEGY_PERFMAP,
};

#define WASMTIME_CONFIG_PROP(ret, name, ty) \
2 changes: 2 additions & 0 deletions crates/c-api/src/config.rs
Expand Up @@ -36,6 +36,7 @@ pub enum wasmtime_profiling_strategy_t {
WASMTIME_PROFILING_STRATEGY_NONE,
WASMTIME_PROFILING_STRATEGY_JITDUMP,
WASMTIME_PROFILING_STRATEGY_VTUNE,
WASMTIME_PROFILING_STRATEGY_PERFMAP,
}

#[no_mangle]
Expand Down Expand Up @@ -157,6 +158,7 @@ pub extern "C" fn wasmtime_config_profiler_set(
WASMTIME_PROFILING_STRATEGY_NONE => ProfilingStrategy::None,
WASMTIME_PROFILING_STRATEGY_JITDUMP => ProfilingStrategy::JitDump,
WASMTIME_PROFILING_STRATEGY_VTUNE => ProfilingStrategy::VTune,
WASMTIME_PROFILING_STRATEGY_PERFMAP => ProfilingStrategy::PerfMap,
});
}

40 changes: 27 additions & 13 deletions crates/cli-flags/src/lib.rs
Expand Up @@ -68,15 +68,21 @@ pub const SUPPORTED_WASI_MODULES: &[(&str, &str)] = &[
),
];

fn pick_profiling_strategy(jitdump: bool, vtune: bool) -> Result<ProfilingStrategy> {
Ok(match (jitdump, vtune) {
(true, false) => ProfilingStrategy::JitDump,
(false, true) => ProfilingStrategy::VTune,
(true, true) => {
println!("Can't enable --jitdump and --vtune at the same time. Profiling not enabled.");
ProfilingStrategy::None
}
_ => ProfilingStrategy::None,
fn pick_profiling_strategy(perfmap: bool, jitdump: bool, vtune: bool) -> Result<ProfilingStrategy> {
Ok(if (perfmap as u8) + (jitdump as u8) + (vtune as u8) > 1 {
println!(
"Can't enable two or more of --jitdump, --vtune and --perfmap at the same time.
Profiling not enabled."
);
ProfilingStrategy::None
} else if perfmap {
ProfilingStrategy::PerfMap
} else if jitdump {
ProfilingStrategy::JitDump
} else if vtune {
ProfilingStrategy::VTune
} else {
ProfilingStrategy::None
})
}
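
The selection logic above can be exercised in isolation. The following is a minimal sketch, assuming a local stand-in `Strategy` enum in place of `wasmtime::ProfilingStrategy` and a hypothetical `pick` helper that returns an error instead of printing a message; it is an illustration, not the code in this PR.

```rust
/// Stand-in for `wasmtime::ProfilingStrategy` (assumption: local copy for illustration).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Strategy {
    None,
    PerfMap,
    JitDump,
    VTune,
}

/// Hypothetical helper: pick at most one strategy, rejecting conflicting flags.
fn pick(perfmap: bool, jitdump: bool, vtune: bool) -> Result<Strategy, String> {
    // Count how many flags are set; more than one is a conflict.
    let enabled = [perfmap, jitdump, vtune].iter().filter(|&&b| b).count();
    if enabled > 1 {
        return Err("can't enable two or more of --jitdump, --vtune and --perfmap".to_string());
    }
    Ok(if perfmap {
        Strategy::PerfMap
    } else if jitdump {
        Strategy::JitDump
    } else if vtune {
        Strategy::VTune
    } else {
        Strategy::None
    })
}
```

Counting the set flags keeps the conflict check in one place, so adding a fourth strategy would only need one more array element and one more branch.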

Expand Down Expand Up @@ -143,11 +149,15 @@ pub struct CommonOptions {
pub wasi_modules: Option<WasiModules>,

/// Generate jitdump file (supported on --features=profiling build)
#[clap(long, conflicts_with = "vtune")]
#[clap(long, conflicts_with_all = &["vtune", "perfmap"])]
pub jitdump: bool,

/// Generate vtune (supported on --features=vtune build)
#[clap(long, conflicts_with = "jitdump")]
/// Generate perf mapping file
#[clap(long, conflicts_with_all = &["vtune", "jitdump"])]
pub perfmap: bool,

/// Generate vtune runtime information (supported on --features=vtune build)
#[clap(long, conflicts_with_all = &["jitdump", "perfmap"])]
pub vtune: bool,

/// Run optimization passes on translated functions, on by default
Expand Down Expand Up @@ -283,7 +293,11 @@ impl CommonOptions {
.cranelift_debug_verifier(self.enable_cranelift_debug_verifier)
.debug_info(self.debug_info)
.cranelift_opt_level(self.opt_level())
.profiler(pick_profiling_strategy(self.jitdump, self.vtune)?)
.profiler(pick_profiling_strategy(
self.perfmap,
self.jitdump,
self.vtune,
)?)
.cranelift_nan_canonicalization(self.enable_cranelift_nan_canonicalization);

self.enable_wasm_features(&mut config);
11 changes: 11 additions & 0 deletions crates/jit/src/profiling.rs
Expand Up @@ -11,6 +11,16 @@ cfg_if::cfg_if! {
}
}

cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
#[path = "profiling/perfmap_linux.rs"]
mod perfmap;
} else {
#[path = "profiling/perfmap_disabled.rs"]
mod perfmap;
}
}

cfg_if::cfg_if! {
// Note: VTune support is disabled on windows mingw because the ittapi crate doesn't compile
// there; see also https://github.com/bytecodealliance/wasmtime/pull/4003 for rationale.
Expand All @@ -24,6 +34,7 @@ cfg_if::cfg_if! {
}

pub use jitdump::JitDumpAgent;
pub use perfmap::PerfMapAgent;
pub use vtune::VTuneAgent;

/// Common interface for profiling tools.
1 change: 0 additions & 1 deletion crates/jit/src/profiling/jitdump_disabled.rs
Expand Up @@ -8,7 +8,6 @@ pub struct JitDumpAgent {
}

impl JitDumpAgent {
/// Intialize a JitDumpAgent and write out the header
pub fn new() -> Result<Self> {
if cfg!(feature = "jitdump") {
bail!("jitdump is not supported on this platform");
27 changes: 27 additions & 0 deletions crates/jit/src/profiling/perfmap_disabled.rs
@@ -0,0 +1,27 @@
use crate::{CompiledModule, ProfilingAgent};
use anyhow::{bail, Result};

/// Interface for driving the creation of perf map files
#[derive(Debug)]
pub struct PerfMapAgent {
_private: (),
}

impl PerfMapAgent {
pub fn new() -> Result<Self> {
bail!("perfmap is not supported on this platform");
}
}

impl ProfilingAgent for PerfMapAgent {
fn module_load(&self, _module: &CompiledModule, _dbg_image: Option<&[u8]>) {}
fn load_single_trampoline(
&self,
_name: &str,
_addr: *const u8,
_size: usize,
_pid: u32,
_tid: u32,
) {
}
}
63 changes: 63 additions & 0 deletions crates/jit/src/profiling/perfmap_linux.rs
@@ -0,0 +1,63 @@
use crate::{CompiledModule, ProfilingAgent};
use anyhow::Result;
use std::io::Write as _;
use std::process;
use std::{fs::File, sync::Mutex};
use wasmtime_environ::EntityRef as _;

/// Process-wide perf map file. Perf reads only one map file per process.
static PERFMAP_FILE: Mutex<Option<File>> = Mutex::new(None);

/// Interface for driving the creation of perf map files
pub struct PerfMapAgent;

impl PerfMapAgent {
/// Initialize a PerfMapAgent and create the process-wide perf map file if needed.
pub fn new() -> Result<Self> {
let mut file = PERFMAP_FILE.lock().unwrap();
if file.is_none() {
let filename = format!("/tmp/perf-{}.map", process::id());
*file = Some(File::create(filename)?);
}
Ok(PerfMapAgent)
}

fn make_line(name: &str, addr: *const u8, len: usize) -> String {
// perf expects START and SIZE as hexadecimal numbers without a `0x` prefix.
format!("{:x} {:x} {name}\n", addr as usize, len)
}
}

impl ProfilingAgent for PerfMapAgent {
/// Sent when a method is compiled and loaded into memory by the VM.
fn module_load(&self, module: &CompiledModule, _dbg_image: Option<&[u8]>) {
let mut file = PERFMAP_FILE.lock().unwrap();
let file = file.as_mut().unwrap();

for (idx, func) in module.finished_functions() {
let addr = func.as_ptr();
let len = func.len();
let name = super::debug_name(module, idx);
let _ = file.write_all(Self::make_line(&name, addr, len).as_bytes());
}

// Note: these are the trampolines into exported functions.
for (idx, func, len) in module.trampolines() {
let (addr, len) = (func as usize as *const u8, len);
let name = format!("wasm::trampoline[{}]", idx.index());
let _ = file.write_all(Self::make_line(&name, addr, len).as_bytes());
Member:

Ah, I missed this earlier, but I think we'll ideally want to do something other than ignoring errors here. If an error happens, it should probably "close" the file and terminate all future writing to it, I suspect? (Along with perhaps a warning message printed?)

Member (PR author):

Good point! I've added log messages and early-returns. Looks like a BufWriter will automatically flush in its Drop impl, so we can avoid doing it explicitly in most early-returns.
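
The approach discussed in this thread (log on error, return early, stop writing afterwards, rely on `BufWriter` for buffering) might look roughly like the sketch below. It assumes a hypothetical `PerfMapSink` type and uses `eprintln!` in place of a logging crate; the names and the bare-hex line layout are illustrative assumptions, not the code that landed.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical holder for the perf map sink; `None` means writing has been
/// disabled after an earlier error.
struct PerfMapSink<W: Write> {
    out: Option<BufWriter<W>>,
}

impl<W: Write> PerfMapSink<W> {
    fn new(inner: W) -> Self {
        Self { out: Some(BufWriter::new(inner)) }
    }

    /// Write all entries, then flush. On the first error, report it and drop
    /// the writer so that later calls become no-ops.
    fn record(&mut self, entries: &[(usize, usize, &str)]) {
        let Some(out) = self.out.as_mut() else { return };
        if let Err(err) = Self::write_all_entries(out, entries) {
            eprintln!("failed to write perf map entry, disabling perf map: {err}");
            self.out = None;
        }
    }

    /// Emit one `START SIZE name` line per entry (assumed bare-hex layout).
    fn write_all_entries(
        out: &mut BufWriter<W>,
        entries: &[(usize, usize, &str)],
    ) -> io::Result<()> {
        for (addr, len, name) in entries {
            writeln!(out, "{addr:x} {len:x} {name}")?;
        }
        out.flush()
    }
}
```

Keeping the writer in an `Option` makes "closing" the file a matter of setting it to `None`; dropping the `BufWriter` also attempts a final flush, with any error from that flush ignored.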

}
}

fn load_single_trampoline(
&self,
name: &str,
addr: *const u8,
size: usize,
_pid: u32,
_tid: u32,
) {
let mut file = PERFMAP_FILE.lock().unwrap();
let file = file.as_mut().unwrap();
let _ = file.write_all(Self::make_line(name, addr, size).as_bytes());
Member:

I think this may want a flush at the end too?

Also with a BufWriter I think you could pass that into make_line to avoid the intermediate string allocation (e.g. write directly to the buffer). Although not required of course, that's ok to defer to if it's ever actually an issue.

Member (PR author):

Turns out File is also a Write implementor, so we could have make_line take &mut dyn Write and use it here without a BufWriter (since there's only one write in this function, it didn't seem worth having an extra buffer).
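
A minimal sketch of that suggestion, assuming a hypothetical `write_line` replacement for `make_line` that writes directly into any `Write` sink and so avoids the intermediate `String`. The bare-hex `START SIZE name` layout is an assumption based on perf's map format.

```rust
use std::io::{self, Write};

/// Hypothetical variant of `make_line` that writes straight into any `Write`
/// sink (a `File`, a `BufWriter`, or a `Vec<u8>` in tests).
fn write_line(out: &mut dyn Write, name: &str, addr: *const u8, len: usize) -> io::Result<()> {
    // Assumption: START and SIZE are emitted as bare hexadecimal numbers.
    writeln!(out, "{:x} {:x} {}", addr as usize, len, name)
}
```

Because the function takes `&mut dyn Write`, the same code path serves both the single unbuffered write in `load_single_trampoline` and the buffered loop in `module_load`.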

}
}
6 changes: 5 additions & 1 deletion crates/wasmtime/src/config.rs
Expand Up @@ -12,7 +12,7 @@ use wasmparser::WasmFeatures;
#[cfg(feature = "cache")]
use wasmtime_cache::CacheConfig;
use wasmtime_environ::Tunables;
use wasmtime_jit::{JitDumpAgent, NullProfilerAgent, ProfilingAgent, VTuneAgent};
use wasmtime_jit::{JitDumpAgent, NullProfilerAgent, PerfMapAgent, ProfilingAgent, VTuneAgent};
use wasmtime_runtime::{InstanceAllocator, OnDemandInstanceAllocator, RuntimeMemoryCreator};

pub use wasmtime_environ::CacheStore;
Expand Down Expand Up @@ -1536,6 +1536,7 @@ impl Config {

pub(crate) fn build_profiler(&self) -> Result<Box<dyn ProfilingAgent>> {
Ok(match self.profiling_strategy {
ProfilingStrategy::PerfMap => Box::new(PerfMapAgent::new()?) as Box<dyn ProfilingAgent>,
ProfilingStrategy::JitDump => Box::new(JitDumpAgent::new()?) as Box<dyn ProfilingAgent>,
ProfilingStrategy::VTune => Box::new(VTuneAgent::new()?) as Box<dyn ProfilingAgent>,
ProfilingStrategy::None => Box::new(NullProfilerAgent),
Expand Down Expand Up @@ -1732,6 +1733,9 @@ pub enum ProfilingStrategy {
/// No profiler support.
None,

/// Collect function name information as the "perf map" file format, used with `perf` on Linux.
PerfMap,

/// Collect profiling info for "jitdump" file format, used with `perf` on
/// Linux.
JitDump,
53 changes: 53 additions & 0 deletions docs/examples-profiling-perf.md
Expand Up @@ -6,6 +6,59 @@ an extremely powerful profiler with lots of documentation on the web, but for
the rest of this section we'll assume you're running on Linux and already have
`perf` installed.

There are two profiling agents for `perf`:

- a very simple one that will map code regions to symbol names: `perfmap`.
- a more detailed one that can provide additional information and mappings between the source
language statements and generated JIT code: `jitdump`.

## Profiling with `perfmap`

Simple profiling support with `perf` generates a "perf map" file that the `perf` CLI automatically
looks for when it runs into unresolved symbols. This requires runtime support from
Wasmtime itself, so you will need to manually change a few things to enable profiling support in
your application. Enabling runtime support depends on how you're using Wasmtime:

* **Rust API** - you'll want to call the [`Config::profiler`] method with
`ProfilingStrategy::PerfMap` to enable profiling of your wasm modules.

* **C API** - you'll want to call the `wasmtime_config_profiler_set` API with a
`WASMTIME_PROFILING_STRATEGY_PERFMAP` value.

* **Command Line** - you'll want to pass the `--perfmap` flag on the command
line.

Once perfmap support is enabled, you'll use `perf record` like usual to record
your application's performance.

For example if you're using the CLI, you'll execute:

```sh
$ perf record -k mono wasmtime --perfmap foo.wasm
```

This will create a `perf.data` file as usual, but it will *also* create a
`/tmp/perf-XXXX.map` file, where `XXXX` is the process ID. This extra `.map` file
is the perf map file: its format is specified by `perf`, and Wasmtime generates it
at runtime.

After that you can explore the `perf.data` profile as you usually would, for example with:

```sh
$ perf report --input perf.data
```

You should be able to see time spent in wasm functions, generate flamegraphs based on that, etc.
Each wasm function should show up as a single entry, named after the corresponding entry in the
debug name section of the wasm file.
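
To illustrate how a map file of this kind is used, here is a minimal sketch of resolving a sampled address to a symbol name. It assumes perf's `START SIZE name` line layout with hexadecimal `START` and `SIZE`; `Entry`, `parse_line` and `resolve` are hypothetical helpers, not part of `perf` or Wasmtime.

```rust
/// One perf map entry: code region start, size in bytes, and symbol name.
#[derive(Debug, PartialEq)]
struct Entry {
    start: u64,
    size: u64,
    name: String,
}

/// Parse a `START SIZE name` line; START and SIZE are assumed to be hexadecimal.
fn parse_line(line: &str) -> Option<Entry> {
    // The name is the rest of the line, so split into at most three fields.
    let mut parts = line.trim_end().splitn(3, ' ');
    let start = u64::from_str_radix(parts.next()?, 16).ok()?;
    let size = u64::from_str_radix(parts.next()?, 16).ok()?;
    let name = parts.next()?.to_string();
    Some(Entry { start, size, name })
}

/// Find the symbol whose region contains `addr`, if any.
fn resolve(entries: &[Entry], addr: u64) -> Option<&str> {
    entries
        .iter()
        .find(|e| addr >= e.start && addr - e.start < e.size)
        .map(|e| e.name.as_str())
}
```

A linear scan is enough for illustration; a real tool would more likely keep the entries sorted by start address and use a binary search.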

Note that support for perfmap is still relatively new in Wasmtime, so if you
have any problems, please don't hesitate to [file an issue]!

[file an issue]: https://github.com/bytecodealliance/wasmtime/issues/new


## Profiling with `jitdump`

Profiling support with `perf` uses the "jitdump" support in the `perf` CLI. This
requires runtime support from Wasmtime itself, so you will need to manually
change a few things to enable profiling support in your application. First