
How to run nperf with "cargo bench"? #32

Open
brainstorm opened this issue May 19, 2023 · 5 comments

brainstorm commented May 19, 2023

I would like to profile benchmarks as they run, instead of a target bin file directly. What would be the correct CLI syntax for that? I've tried the following, unsuccessfully (the output datafile is never generated):

% cargo run record -P $(cargo bench) -w -o datafile
   Compiling htsget-benchmarks v0.1.0 (/Users/rvalls/dev/umccr/htsget-rs/htsget-benchmarks)
   Finished bench [optimized + debuginfo] target(s) in 7.37s
   Running benches/refserver_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/refserver_benchmarks-fecd9ace2aeca2e9)
     Running benches/request_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/request_benchmarks-f78a77d95f70b5d1)
     Running benches/search_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/search_benchmarks-407b85d5c1d86b5d)
Benchmarking Queries/[LIGHT] Bam query all
Benchmarking Queries/[LIGHT] Bam query all: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query all: Collecting 50 samples in estimated 30.048 s (487k iterations)
Benchmarking Queries/[LIGHT] Bam query all: Analyzing
Benchmarking Queries/[LIGHT] Bam query specific
Benchmarking Queries/[LIGHT] Bam query specific: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query specific: Collecting 50 samples in estimated 30.260 s (66k iterations)
Benchmarking Queries/[LIGHT] Bam query specific: Analyzing
Benchmarking Queries/[LIGHT] Bam query header
Benchmarking Queries/[LIGHT] Bam query header: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query header: Collecting 50 samples in estimated 30.096 s (282k iterations)
Benchmarking Queries/[LIGHT] Bam query header: Analyzing
error: a bin target must be available for `cargo run`

/cc @mmalenic

@koute (Owner) commented May 19, 2023

nperf record -P refserver_benchmarks-fecd9ace2aeca2e9 -w

The -P argument takes the name of an executable, which nperf will search for in the process list. In your case cargo bench runs three executables, so you'll need three separate nperf record runs.
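For example, a sketch of that workflow based on the log above (the hash suffix in the binary name changes on every rebuild, so check `target/release/deps` first; the `-o datafile` flag is taken from the original command in this issue):

```shell
# Build the bench executables without running them.
cargo bench --no-run

# Terminal 1: wait (-w) for a process with this name to appear and record it.
nperf record -P refserver_benchmarks-fecd9ace2aeca2e9 -w -o datafile

# Terminal 2: run that one bench binary directly (cargo normally passes
# --bench to criterion harnesses).
./target/release/deps/refserver_benchmarks-fecd9ace2aeca2e9 --bench
```

Repeat the pair of commands for each of the three bench binaries.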

@brainstorm (Author)

Thanks @koute, I thought about doing just that, but I didn't want to clash with other previously generated criterion-rs binaries. Would you be open to having some external profiler hooking capabilities for criterion-rs in not-perf? See https://bheisler.github.io/criterion.rs/book/user_guide/profiling.html#implementing-in-process-profiling-hooks

/cc @mmalenic

@koute (Owner) commented May 22, 2023

> Would you be open to having some external profiler hooking capabilities for criterion-rs in not-perf? See https://bheisler.github.io/criterion.rs/book/user_guide/profiling.html#implementing-in-process-profiling-hooks

What exactly would you like to do here? Do you mean having a crate that would pull not-perf in and implement that trait?

@brainstorm (Author)

> What exactly would you like to do here? Do you mean having a crate that would pull not-perf in and implement that trait?

Precisely: something like https://www.jibbow.com/posts/criterion-flamegraphs/ but with not-perf, because the trace files generated by pprof are far too big to handle on GitHub Actions workers (~2-4 GB on the last test run).

@koute (Owner) commented May 29, 2023

> What exactly would you like to do here? Do you mean having a crate that would pull not-perf in and implement that trait?

> Precisely: something like https://www.jibbow.com/posts/criterion-flamegraphs/ but with not-perf, because the trace files generated by pprof are far too big to handle on GitHub Actions workers (~2-4 GB on the last test run).

Well, I'd be fine with that. We'd probably want to add a new crate that exposes an implementation of that trait and uses not-perf's internals to do its work.
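For reference, a minimal sketch of what such a hook could look like. The `Profiler` trait below mirrors criterion's `criterion::profiler::Profiler` signature but is re-declared locally so the snippet is self-contained; the `NperfProfiler` name, the `-p` (attach-by-PID) flag, and the output-path scheme are assumptions for illustration, not confirmed not-perf API:

```rust
use std::path::{Path, PathBuf};
use std::process::{Child, Command};

// Local mirror of criterion's `profiler::Profiler` trait; a real crate would
// implement `criterion::profiler::Profiler` instead. Criterion invokes these
// hooks around each benchmark when run with `cargo bench -- --profile-time <secs>`.
trait Profiler {
    fn start_profiling(&mut self, benchmark_id: &str, benchmark_dir: &Path);
    fn stop_profiling(&mut self, benchmark_id: &str, benchmark_dir: &Path);
}

/// Hypothetical hook that records the benchmark process with nperf.
struct NperfProfiler {
    child: Option<Child>,
}

impl NperfProfiler {
    fn new() -> Self {
        NperfProfiler { child: None }
    }

    /// One datafile per benchmark; criterion ids contain '/', so flatten
    /// them into a single file name.
    fn output_path(benchmark_id: &str, benchmark_dir: &Path) -> PathBuf {
        benchmark_dir.join(format!("{}.nperf", benchmark_id.replace('/', "_")))
    }
}

impl Profiler for NperfProfiler {
    fn start_profiling(&mut self, benchmark_id: &str, benchmark_dir: &Path) {
        let out = Self::output_path(benchmark_id, benchmark_dir);
        // Assumption: nperf can attach to an existing process by PID; here we
        // attach it to our own process for the duration of one benchmark.
        let child = Command::new("nperf")
            .arg("record")
            .arg("-p").arg(std::process::id().to_string())
            .arg("-o").arg(&out)
            .spawn()
            .expect("failed to spawn nperf; is it on PATH?");
        self.child = Some(child);
    }

    fn stop_profiling(&mut self, _benchmark_id: &str, _benchmark_dir: &Path) {
        if let Some(mut child) = self.child.take() {
            // A real implementation should send SIGINT and wait so nperf can
            // flush the datafile; kill() here is only a placeholder.
            let _ = child.kill();
            let _ = child.wait();
        }
    }
}
```

With the real criterion trait this would be wired up from the benchmark's configuration, e.g. via `Criterion::default().with_profiler(NperfProfiler::new())`.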
