Feat: Vm Profiling #1789

Open · wants to merge 7 commits into base: `dev`
44 changes: 43 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

Binary file added book/profiling.png
89 changes: 34 additions & 55 deletions book/writing-programs/cycle-tracking.md
@@ -56,69 +56,48 @@ fn main() {

This will log the cycle count for `block name` and include it in the `ExecutionReport` in the `cycle_tracker` map.
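
As a rough sketch of how those per-block counts can be read back from the report (assuming `client`, `ELF`, and `stdin` are already set up as in the example scripts below):

```rs
// Sketch only: read back the cycle count recorded for an annotated block.
// `client`, `ELF`, and `stdin` are assumed to be set up as in the scripts below.
let (_, report) = client.execute(ELF, stdin).run().unwrap();
if let Some(cycles) = report.cycle_tracker.get("block name") {
    println!("`block name` took {cycles} cycles");
}
```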

## Tracking Cycles with Tracing
### Profiling a zkVM Program

The `cycle-tracker` annotation is a convenient way to track cycles for specific sections of code. However, sometimes it can also be useful to track what functions are taking the most cycles across the entire program, without having to annotate every function individually.
Profiling a zkVM program produces a profile ([example](https://share.firefox.dev/3Om1pzz)) that makes it easy to examine program performance and see exactly where VM cycles are being spent, without needing to modify the program at all.

First, we need to generate a trace file of the program counter at each cycle while the program is executing. This can be done by simply setting the `TRACE_FILE` environment variable with the path of the file you want to write the trace to. For example, you can run the following command in the `script` directory for any example program:
To profile a program, you need to:
1. Enable the `profiling` feature for `sp1-sdk` in `script/Cargo.toml`.
2. Set the environment variable `TRACE_FILE=trace.json`, then call `ProverClient::execute()` in your script.

```bash
TRACE_FILE=trace.log RUST_LOG=info cargo run --release
```
If you're executing a larger program (>100M cycles), you should set `TRACE_SAMPLE_RATE` to reduce the size of the trace file. A sample rate of `1000` means that 1 in every 1000 VM cycles is sampled. By default, every cycle is sampled.

When the `TRACE_FILE` environment variable is set, as SP1's RISC-V runtime is executing, it will write a log of the program counter to the file specified by `TRACE_FILE`.
Many examples can be found in the repo, such as this ['fibonacci'](https://github.com/succinctlabs/sp1/blob/12f212e386ae4c2da30cf6a61a7d87615d56bdac/examples/fibonacci/script/src/main.rs#L22) script.

Next, we can use the `cargo prove` CLI with the `trace` command to analyze the trace file and generate a table of instruction counts. This can be done with the following command:
Once you have your script, it should look like the following:
```rs
// Execute the program using the `ProverClient::execute` method, without generating a proof.
let (_, report) = client.execute(ELF, stdin.clone()).run().unwrap();
```
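
For reference, a complete profiling script might look roughly like the minimal sketch below; the ELF path, the input value, and the `total_instruction_count()` print at the end are illustrative assumptions rather than something prescribed by this PR:

```rs
use sp1_sdk::{ProverClient, SP1Stdin};

// Hypothetical path to the compiled program ELF; adjust to your project layout.
const ELF: &[u8] = include_bytes!("../../program/elf/riscv32im-succinct-zkvm-elf");

fn main() {
    sp1_sdk::utils::setup_logger();

    // Write the program's inputs.
    let mut stdin = SP1Stdin::new();
    stdin.write(&1000u32);

    // Execute without generating a proof. With the `profiling` feature enabled
    // and TRACE_FILE set, the executor writes the profile to that path.
    let client = ProverClient::new();
    let (_, report) = client.execute(ELF, stdin).run().unwrap();
    println!("executed program with {} cycles", report.total_instruction_count());
}
```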

```bash
cargo prove trace --elf <path_to_program_elf> --trace <path_to_trace_file>
You must also enable the `profiling` feature on the SDK:
```toml
sp1-sdk = { version = "3.0.0", features = ["profiling"] }
```

The `trace` command will generate a table of instruction counts, sorted by the number of cycles spent in each function. The output will look something like this:
The `TRACE_FILE` env var tells the executor where to save the profile, and the `TRACE_SAMPLE_RATE` env var tells the executor how often to sample the program.
A larger sample rate will give you a smaller profile. It is the number of instructions between each sample.
> **Contributor:** nit: ... profile. It is the ...

The full command to profile should look something like this:
> **Contributor:** Maybe we should just provide an end-to-end example of how to profile the fibonacci example or something. Right now these docs are a bit redundant. For example, I don't really like the overview section at the top. If you explain stuff step by step in your main explanation, I think that will probably suffice.

```sh
TRACE_FILE=output.json TRACE_SAMPLE_RATE=100 cargo run ...
```
[00:00:00] [########################################] 17053/17053 (0s)

Total instructions in trace: 17053


Instruction counts considering call graph
+----------------------------------------+-------------------+
| Function Name | Instruction Count |
| __start | 17045 |
| main | 12492 |
| sp1_zkvm::syscalls::halt::syscall_halt | 4445 |
| sha2::sha256::compress256 | 4072 |
| sp1_lib::io::commit | 258 |
| sp1_lib::io::SyscallWriter::write | 255 |
| syscall_write | 195 |
| memcpy | 176 |
| memset | 109 |
| sp1_lib::io::read_vec | 71 |
| __rust_alloc | 29 |
| sp1_zkvm::heap::SimpleAlloc::alloc | 22 |
| syscall_hint_len | 3 |
| syscall_hint_read | 2 |
+----------------------------------------+-------------------+


Instruction counts ignoring call graph
+----------------------------------------+-------------------+
| Function Name | Instruction Count |
| main | 12075 |
| sha2::sha256::compress256 | 4073 |
| sp1_zkvm::syscalls::halt::syscall_halt | 219 |
| memcpy | 180 |
| syscall_write | 123 |
| memset | 111 |
| sp1_lib::io::commit | 88 |
| sp1_lib::io::SyscallWriter::write | 60 |
| __start | 45 |
| sp1_lib::io::read_vec | 35 |
| sp1_zkvm::heap::SimpleAlloc::alloc | 23 |
| anonymous | 7 |
| __rust_alloc | 7 |
| syscall_hint_len | 4 |
| syscall_hint_read | 3 |
+----------------------------------------+-------------------+

To view these profiles, we recommend [Samply](https://github.com/mstange/samply).
```sh
cargo install --locked samply
samply load output.json
```

> **Contributor:** nit: capitalize Firefox, program's execution.
>
> @ctian1 thoughts on a screenshot?

> **Contributor:** grammar mistake unsolved

Samply uses the Firefox profiler to create a nice visualization of your program's execution.
![An example screenshot of the Firefox Profiler](../profiling.png)

#### Interpreting the Profile
- The "time" measurement in the profiler is actually the number of cycles spent,
> **Member:** this is good, it would be nice to explain what each view in the profiler is and how to interpret it, like call tree (invert call stack), flame graph, the timeline at the top

> **Contributor:** agreed

> **Contributor:** like some people might not know how to read the flame graph, for example

> **Contributor:** or they might not know to invert call stack in order to find the most frequent functions


- The CPU usage of the program will always be constant, as it's running in the VM, which is single-threaded.
2 changes: 1 addition & 1 deletion crates/cli/Cargo.toml
@@ -43,4 +43,4 @@ regex = "1.5.4"
prettytable-rs = "0.10"
textwrap = "0.16.0"
ctrlc = "3.4.2"
cargo_metadata = "0.18.1"
cargo_metadata = "0.18.1"
4 changes: 1 addition & 3 deletions crates/cli/src/bin/cargo-prove.rs
@@ -3,7 +3,7 @@ use clap::{Parser, Subcommand};
use sp1_cli::{
commands::{
build::BuildCmd, build_toolchain::BuildToolchainCmd,
install_toolchain::InstallToolchainCmd, new::NewCmd, trace::TraceCmd, vkey::VkeyCmd,
install_toolchain::InstallToolchainCmd, new::NewCmd, vkey::VkeyCmd,
},
SP1_VERSION_MESSAGE,
};
@@ -27,7 +27,6 @@ pub enum ProveCliCommands {
Build(BuildCmd),
BuildToolchain(BuildToolchainCmd),
InstallToolchain(InstallToolchainCmd),
Trace(TraceCmd),
Vkey(VkeyCmd),
}

@@ -39,7 +38,6 @@ fn main() -> Result<()> {
ProveCliCommands::Build(cmd) => cmd.run(),
ProveCliCommands::BuildToolchain(cmd) => cmd.run(),
ProveCliCommands::InstallToolchain(cmd) => cmd.run(),
ProveCliCommands::Trace(cmd) => cmd.run(),
ProveCliCommands::Vkey(cmd) => cmd.run(),
}
}
1 change: 0 additions & 1 deletion crates/cli/src/commands/mod.rs
@@ -2,5 +2,4 @@ pub mod build;
pub mod build_toolchain;
pub mod install_toolchain;
pub mod new;
pub mod trace;
pub mod vkey;