Merge pull request #507 from kroma-network/feat/integrate-perfetto-for-profiling

feat: integrate perfetto for profiling
chokobole authored Aug 9, 2024
2 parents a759bfe + b7caabf commit a1fa13f
Showing 53 changed files with 722 additions and 210 deletions.
9 changes: 9 additions & 0 deletions bazel/tachyon_deps.bzl
Expand Up @@ -159,6 +159,15 @@ def tachyon_deps():
            patches = ["@kroma_network_tachyon//third_party/cxx_rs:add_more_args_to_cxx_bridge.patch"],
        )

    if not native.existing_rule("perfetto"):
        http_archive(
            name = "perfetto",
            sha256 = "dfc9b645c020d7a7469bae73d7432545b8005411c8176f46f04875058df0aa97",
            strip_prefix = "perfetto-46.0",
            urls = ["https://github.com/google/perfetto/archive/refs/tags/v46.0.tar.gz"],
            build_file = "@kroma_network_tachyon//third_party:perfetto/perfetto.BUILD",
        )

    if not native.existing_rule("rules_pkg"):
        http_archive(
            name = "rules_pkg",
11 changes: 11 additions & 0 deletions docs/how_to_contribute/conventions.md
Expand Up @@ -90,6 +90,17 @@ We use `VLOG` messages in our code to show us the progress of the current runnin
- Output of the proof
- Output of the verifying key serialization

## Profiling with Perfetto

Tachyon uses Perfetto for tracing and profiling. Perfetto provides two ways to record a trace slice: the scoped `TRACE_EVENT()` macro, and the paired `TRACE_EVENT_BEGIN()` and `TRACE_EVENT_END()` macros.

- **`TRACE_EVENT()`**: Use this macro when you want the trace slice to end when the scope it is defined in ends. This is the preferred method whenever possible as it simplifies the code and ensures the trace slice duration is managed by the scope itself.
- **`TRACE_EVENT_BEGIN()` and `TRACE_EVENT_END()`**: Use these macros when you need to manually specify the beginning and end of a trace slice. This approach is suitable for tracing code segments that are too long or complex for a single scope, where adding a scope would be impractical.
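
The difference between the two styles can be illustrated with a minimal, self-contained sketch. It does not use Perfetto: `TraceLog`, `ScopedSlice`, `BeginSlice`, `EndSlice`, `ScopedWork`, and `ManualWork` are hypothetical stand-ins that only record the order of begin and end events, so the scoping behavior can be observed without a tracing backend.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracing backend: it only records event order.
struct TraceLog {
  std::vector<std::string> events;
};

// Manual style, analogous to TRACE_EVENT_BEGIN() / TRACE_EVENT_END():
// the caller is responsible for pairing the two calls.
inline void BeginSlice(TraceLog& log, const std::string& name) {
  log.events.push_back("B:" + name);
}

inline void EndSlice(TraceLog& log, const std::string& name) {
  // An end event must follow at least one earlier event.
  assert(!log.events.empty());
  log.events.push_back("E:" + name);
}

// Scoped style, analogous to TRACE_EVENT(): the slice ends when the object
// goes out of scope, including on an early return.
class ScopedSlice {
 public:
  ScopedSlice(TraceLog& log, std::string name)
      : log_(log), name_(std::move(name)) {
    BeginSlice(log_, name_);
  }
  ~ScopedSlice() { EndSlice(log_, name_); }

  ScopedSlice(const ScopedSlice&) = delete;
  ScopedSlice& operator=(const ScopedSlice&) = delete;

 private:
  TraceLog& log_;
  std::string name_;
};

// Scoped: the early return still closes the slice.
inline bool ScopedWork(TraceLog& log, bool fail_early) {
  ScopedSlice slice(log, "ScopedWork");
  if (fail_early) return false;
  log.events.push_back("work");
  return true;
}

// Manual: one slice spans two phases that do not share a single scope.
inline void ManualWork(TraceLog& log) {
  BeginSlice(log, "ManualWork");
  { log.events.push_back("phase1"); }
  { log.events.push_back("phase2"); }
  EndSlice(log, "ManualWork");
}
```

In the scoped variant no code path can forget to close the slice, which is why the scoped macro is preferred. The manual variant needs every exit path to reach the matching end call.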

### Trace Categories

Trace categories are defined in the [`profiler.h`](/tachyon/base/profiler.h) header. To keep trace names short, avoid prefixing a trace name with its scope (for example, the class name) whenever possible. Private functions that are only called from functions that already have a scoped trace name do not need a scoped name of their own. Use a scoped trace name only when the same trace name would otherwise describe two different scopes.
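
One possible reading of the naming rule, sketched as a small self-contained helper. This is an illustration only, not part of Tachyon: `TracedFunction` and `ChooseTraceNames` are hypothetical names, and the helper qualifies every function that shares a name with another scope, which is stricter than the guideline requires.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a traced function.
struct TracedFunction {
  std::string scope;  // For example, a class name.
  std::string name;   // Unqualified function name.
};

// Returns one trace name per input function, in input order. A name is
// prefixed with its scope only when another scope uses the same name.
inline std::vector<std::string> ChooseTraceNames(
    const std::vector<TracedFunction>& functions) {
  // Collect the distinct scopes that use each unqualified name.
  std::map<std::string, std::set<std::string>> scopes_by_name;
  for (const TracedFunction& fn : functions) {
    assert(!fn.name.empty());
    scopes_by_name[fn.name].insert(fn.scope);
  }

  std::vector<std::string> trace_names;
  trace_names.reserve(functions.size());
  for (const TracedFunction& fn : functions) {
    const bool ambiguous = scopes_by_name[fn.name].size() > 1;
    trace_names.push_back(ambiguous ? fn.scope + "::" + fn.name : fn.name);
  }
  return trace_names;
}
```

For the hypothetical inputs `Foo::Run`, `Bar::Run`, and `Foo::FillDigits`, only the two `Run` entries keep their scope, because `FillDigits` is unambiguous on its own.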

## Commits

### Commit Type Classification
5 changes: 5 additions & 0 deletions docs/how_to_use/how_to_build.md
Expand Up @@ -350,6 +350,11 @@ bazel build --config ${os} --//:has_asm_prime_field=false //...

## Performance Tuning

### Visualizing and Profiling Traces

Tachyon uses [Perfetto](https://perfetto.dev/) for low-overhead profiling. To visualize a generated trace, open it in the
[Perfetto Trace Viewer](https://ui.perfetto.dev/). By default, traces are written to the `/tmp` directory with a `.perfetto-trace` extension, for example `/tmp/tachyon.perfetto-trace`.
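
The order in which a profiler is driven can be sketched as follows. This is a minimal illustration under stated assumptions, not Tachyon code: `FakeProfiler` is a hypothetical stand-in that mirrors the method names of `tachyon::base::Profiler` (`Init()`, `EnableCategories()`, `Start()`, `Stop()`) and only records calls, and `RunProfiled` is a hypothetical helper. Instantiating the helper with the real class should look the same, but that is not verified here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the Profiler method names and records
// the order in which they are called. It performs no tracing.
class FakeProfiler {
 public:
  // Shared call log, so it can be inspected after the object is destroyed.
  static std::vector<std::string>& Calls() {
    static std::vector<std::string> calls;
    return calls;
  }

  // Like the real class, the destructor stops the session.
  ~FakeProfiler() { Stop(); }

  void Init() {
    initialized_ = true;
    Calls().push_back("Init");
  }
  void EnableCategories(const std::string& category) {
    Calls().push_back("Enable:" + category);
  }
  void Start() {
    assert(initialized_ && "Init() must be called before Start()");
    started_ = true;
    Calls().push_back("Start");
  }
  void Stop() {
    if (!started_) return;  // Nothing to stop.
    started_ = false;
    Calls().push_back("Stop");
  }

 private:
  bool initialized_ = false;
  bool started_ = false;
};

// Runs |work| with tracing enabled for a single category. The profiler is
// stopped by its destructor when this function returns.
template <typename ProfilerT, typename Fn>
void RunProfiled(const std::string& category, Fn&& work) {
  ProfilerT profiler;
  profiler.Init();
  profiler.EnableCategories(category);
  profiler.Start();
  std::forward<Fn>(work)();
}
```

Calling `RunProfiled<FakeProfiler>("MSM", work)` records `Init`, `Enable:MSM`, `Start`, whatever `work` logs, and finally `Stop`. The category name `"MSM"` is one of the categories declared in `profiler.h`.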

### Use Intel OpenMP Runtime Library(libiomp)

By default, Tachyon uses GNU OpenMP (GNU `libgomp`) for parallel computation. On Intel platforms, Intel OpenMP Runtime Library (`libiomp`) provides OpenMP API specification support. It sometimes brings more performance benefits compared to `libgomp`.
11 changes: 11 additions & 0 deletions tachyon/base/BUILD.bazel
Expand Up @@ -114,6 +114,17 @@ tachyon_cc_library(
    ],
)

tachyon_cc_library(
    name = "profiler",
    srcs = ["profiler.cc"],
    hdrs = ["profiler.h"],
    deps = [
        "//tachyon:export",
        "//tachyon/base/files:file",
        "@perfetto",
    ],
)

tachyon_cc_library(
    name = "random",
    srcs = ["random.cc"],
47 changes: 47 additions & 0 deletions tachyon/base/profiler.cc
@@ -0,0 +1,47 @@
#include "tachyon/base/profiler.h"

PERFETTO_TRACK_EVENT_STATIC_STORAGE();

namespace tachyon::base {

Profiler::Profiler() : Profiler(Options{}) {}

Profiler::Profiler(const Options& options)
    : trace_filepath_(options.output_path),
      trace_file_(trace_filepath_,
                  base::File::FLAG_CREATE_ALWAYS | base::File::FLAG_WRITE),
      max_size_kb_(options.max_size_kb) {}

Profiler::~Profiler() { Stop(); }

void Profiler::Init() {
  perfetto::TracingInitArgs args;
  args.backends |= perfetto::kInProcessBackend;
  perfetto::Tracing::Initialize(args);
  perfetto::TrackEvent::Register();
}

void Profiler::DisableCategories(std::string_view category) {
  track_event_cfg_.add_disabled_categories(std::string(category));
}

void Profiler::EnableCategories(std::string_view category) {
  track_event_cfg_.add_enabled_categories(std::string(category));
}

void Profiler::Start() {
  perfetto::TraceConfig cfg;
  cfg.add_buffers()->set_size_kb(max_size_kb_);

  auto* ds_cfg = cfg.add_data_sources()->mutable_config();
  ds_cfg->set_name("track_event");
  ds_cfg->set_track_event_config_raw(track_event_cfg_.SerializeAsString());

  tracing_session_ = perfetto::Tracing::NewTrace();
  tracing_session_->Setup(cfg, trace_file_.GetPlatformFile());
  tracing_session_->StartBlocking();
}

void Profiler::Stop() {
  // Start() may never have been called (the destructor also calls Stop()),
  // in which case there is no session to stop.
  if (!tracing_session_) return;
  tracing_session_->StopBlocking();
}

}  // namespace tachyon::base
56 changes: 56 additions & 0 deletions tachyon/base/profiler.h
@@ -0,0 +1,56 @@
#ifndef TACHYON_BASE_PROFILER_H_
#define TACHYON_BASE_PROFILER_H_

#include <stddef.h>

#include <memory>
#include <string>
#include <string_view>

#include "third_party/perfetto/perfetto.h"

#include "tachyon/base/files/file.h"
#include "tachyon/export.h"

PERFETTO_DEFINE_CATEGORIES(
    perfetto::Category("Utils").SetDescription("Base utility functions"),
    perfetto::Category("Subtask").SetDescription(
        "Subtask within a bigger task"),
    perfetto::Category("MSM").SetDescription(
        "Multi Scalar Multiplication operations"),
    perfetto::Category("ProofGeneration")
        .SetDescription("The proof generation process"),
    perfetto::Category("ProofVerification")
        .SetDescription("The proof verification process"),
    perfetto::Category("EvaluationDomain")
        .SetDescription("Evaluation Domain operations"));

namespace tachyon::base {

// Records Perfetto track events and writes them to a trace file.
class TACHYON_EXPORT Profiler {
 public:
  struct Options {
    constexpr static size_t kDefaultMaxSizeKB = 1e6;

    base::FilePath output_path = base::FilePath("/tmp/tachyon.perfetto-trace");
    size_t max_size_kb = kDefaultMaxSizeKB;
  };

  Profiler();
  explicit Profiler(const Options& options);
  ~Profiler();

  // Initializes the in-process Perfetto backend and registers track events.
  // Must be called before Start().
  void Init();
  // Adds |category| to the disabled/enabled categories used by Start().
  void DisableCategories(std::string_view category);
  void EnableCategories(std::string_view category);
  // Starts a tracing session that writes to |Options::output_path|.
  void Start();
  // Stops the tracing session. Also called by the destructor.
  void Stop();

 private:
  perfetto::protos::gen::TrackEventConfig track_event_cfg_;
  std::unique_ptr<perfetto::TracingSession> tracing_session_;
  FilePath trace_filepath_;
  File trace_file_;
  size_t max_size_kb_;
};

}  // namespace tachyon::base

#endif  // TACHYON_BASE_PROFILER_H_
Expand Up @@ -11,6 +11,7 @@ tachyon_cuda_library(
":icicle_msm",
":icicle_msm_utils",
"//tachyon/base:bit_cast",
"//tachyon/base:profiler",
"//tachyon/device/gpu:gpu_enums",
"//tachyon/device/gpu:gpu_logging",
"@icicle//:msm_bls12_381",
Expand All @@ -28,6 +29,7 @@ tachyon_cuda_library(
":icicle_msm",
":icicle_msm_utils",
"//tachyon/base:bit_cast",
"//tachyon/base:profiler",
"//tachyon/device/gpu:gpu_enums",
"//tachyon/device/gpu:gpu_logging",
"@icicle//:msm_bls12_381",
Expand All @@ -45,6 +47,7 @@ tachyon_cuda_library(
":icicle_msm",
":icicle_msm_utils",
"//tachyon/base:bit_cast",
"//tachyon/base:profiler",
"//tachyon/device/gpu:gpu_enums",
"//tachyon/device/gpu:gpu_logging",
"@icicle//:msm_bn254",
Expand All @@ -62,6 +65,7 @@ tachyon_cuda_library(
":icicle_msm",
":icicle_msm_utils",
"//tachyon/base:bit_cast",
"//tachyon/base:profiler",
"//tachyon/device/gpu:gpu_enums",
"//tachyon/device/gpu:gpu_logging",
"@icicle//:msm_bn254",
Expand Up @@ -26,6 +26,7 @@ bool IcicleMSM<bls12_381::G1AffinePoint>::Run(
#if FIELD_ID != BLS12_381
#error Only BLS12_381 is supported
#endif
  TRACE_EVENT("MSM", "Icicle::MSM");

  size_t bases_size = cpu_bases.size();
  size_t scalars_size = cpu_scalars.size();
Expand Up @@ -25,6 +25,7 @@ bool IcicleMSM<bls12_381::G2AffinePoint>::Run(
#if FIELD_ID != BLS12_381
#error Only BLS12_381 is supported
#endif
  TRACE_EVENT("MSM", "Icicle::MSM");

  size_t bases_size = cpu_bases.size();
  size_t scalars_size = cpu_scalars.size();
Expand Up @@ -25,6 +25,7 @@ bool IcicleMSM<bn254::G1AffinePoint>::Run(
#if FIELD_ID != BN254
#error Only BN254 is supported
#endif
  TRACE_EVENT("MSM", "Icicle::MSM");

  size_t bases_size = cpu_bases.size();
  size_t scalars_size = cpu_scalars.size();
Expand Up @@ -25,6 +25,7 @@ bool IcicleMSM<bn254::G2AffinePoint>::Run(
#if FIELD_ID != BN254
#error Only BN254 is supported
#endif
  TRACE_EVENT("MSM", "Icicle::MSM");

  size_t bases_size = cpu_bases.size();
  size_t scalars_size = cpu_scalars.size();
Expand Up @@ -13,6 +13,7 @@ tachyon_cc_library(
    deps = [
        ":pippenger_base",
        "//tachyon/base:openmp_util",
        "//tachyon/base:profiler",
        "//tachyon/math/elliptic_curves:semigroups",
        "//tachyon/math/elliptic_curves/msm:msm_ctx",
        "//tachyon/math/elliptic_curves/msm:msm_util",
Expand All @@ -22,13 +23,17 @@ tachyon_cc_library(
tachyon_cc_library(
    name = "pippenger_adapter",
    hdrs = ["pippenger_adapter.h"],
-    deps = [":pippenger"],
    deps = [
        ":pippenger",
        "//tachyon/base:profiler",
    ],
)

tachyon_cc_library(
    name = "pippenger_base",
    hdrs = ["pippenger_base.h"],
    deps = [
        "//tachyon/base:profiler",
        "//tachyon/base/containers:adapters",
        "//tachyon/math/base:semigroups",
        "//tachyon/math/geometry:affine_point",
27 changes: 21 additions & 6 deletions tachyon/math/elliptic_curves/msm/algorithms/pippenger/pippenger.h
Expand Up @@ -13,6 +13,7 @@
#include <vector>

#include "tachyon/base/openmp_util.h"
#include "tachyon/base/profiler.h"
#include "tachyon/math/base/big_int.h"
#include "tachyon/math/elliptic_curves/msm/algorithms/pippenger/pippenger_base.h"
#include "tachyon/math/elliptic_curves/msm/msm_ctx.h"
Expand Down Expand Up @@ -78,6 +79,7 @@ class Pippenger : public PippengerBase<Point> {
BaseInputIterator bases_last,
ScalarInputIterator scalars_first,
ScalarInputIterator scalars_last, Bucket* ret) {
TRACE_EVENT("MSM", "Pippenger::Run");
size_t bases_size = std::distance(bases_first, bases_last);
size_t scalars_size = std::distance(scalars_first, scalars_last);
if (bases_size != scalars_size) {
Expand Down Expand Up @@ -112,6 +114,7 @@ class Pippenger : public PippengerBase<Point> {
BaseInputIterator bases_it,
const std::vector<std::vector<int64_t>>& scalar_digits, size_t i,
Bucket* window_sum, bool is_last_window) {
TRACE_EVENT("Utils", "AccumulateSingleWindowNAFSum");
size_t bucket_size;
if (is_last_window) {
bucket_size = 1 << ctx_.window_bits;
Expand All @@ -135,21 +138,29 @@ class Pippenger : public PippengerBase<Point> {
void AccumulateWindowNAFSums(BaseInputIterator bases_first,
absl::Span<const BigInt<N>> scalars,
std::vector<Bucket>* window_sums) {
TRACE_EVENT("Utils", "AccumulateWindowNAFSums");

std::vector<std::vector<int64_t>> scalar_digits;
-scalar_digits.resize(scalars.size());
-for (std::vector<int64_t>& scalar_digit : scalar_digits) {
-scalar_digit.resize(ctx_.window_count);
-}
-for (size_t i = 0; i < scalars.size(); ++i) {
-FillDigits(scalars[i], ctx_.window_bits, &scalar_digits[i]);
{
TRACE_EVENT("Subtask", "InitAndFillScalars");
scalar_digits.resize(scalars.size());
for (std::vector<int64_t>& scalar_digit : scalar_digits) {
scalar_digit.resize(ctx_.window_count);
}
for (size_t i = 0; i < scalars.size(); ++i) {
FillDigits(scalars[i], ctx_.window_bits, &scalar_digits[i]);
}
}

if (parallel_windows_) {
TRACE_EVENT("Subtask", "ParallelWindows");
OMP_PARALLEL_FOR(size_t i = 0; i < ctx_.window_count; ++i) {
AccumulateSingleWindowNAFSum(bases_first, scalar_digits, i,
&(*window_sums)[i],
i == ctx_.window_count - 1);
}
} else {
TRACE_EVENT("Subtask", "SerialWindows");
for (size_t i = 0; i < ctx_.window_count; ++i) {
AccumulateSingleWindowNAFSum(bases_first, scalar_digits, i,
&(*window_sums)[i],
Expand All @@ -162,6 +173,7 @@ class Pippenger : public PippengerBase<Point> {
void AccumulateSingleWindowSum(BaseInputIterator bases_first,
absl::Span<const BigInt<N>> scalars,
size_t window_offset, Bucket* out) {
TRACE_EVENT("Utils", "AccumulateSingleWindowSum");
Bucket window_sum = Bucket::Zero();
// We don't need the "zero" bucket, so we only have 2^{window_bits} - 1
// buckets.
Expand Down Expand Up @@ -202,12 +214,15 @@ class Pippenger : public PippengerBase<Point> {
void AccumulateWindowSums(BaseInputIterator bases_first,
absl::Span<const BigInt<N>> scalars,
std::vector<Bucket>* window_sums) {
TRACE_EVENT("Utils", "AccumulateWindowSums");
if (parallel_windows_) {
TRACE_EVENT("Subtask", "ParallelWindows");
OMP_PARALLEL_FOR(size_t i = 0; i < ctx_.window_count; ++i) {
AccumulateSingleWindowSum(bases_first, scalars, ctx_.window_bits * i,
&(*window_sums)[i]);
}
} else {
TRACE_EVENT("Subtask", "SerialWindows");
for (size_t i = 0; i < ctx_.window_count; ++i) {
AccumulateSingleWindowSum(bases_first, scalars, ctx_.window_bits * i,
&(*window_sums)[i]);
Expand Up @@ -5,6 +5,7 @@
#include <utility>
#include <vector>

#include "tachyon/base/profiler.h"
#include "tachyon/math/elliptic_curves/msm/algorithms/pippenger/pippenger.h"

namespace tachyon::math {
Expand Down Expand Up @@ -39,6 +40,9 @@ class PippengerAdapter {
ScalarInputIterator scalars_last,
PippengerParallelStrategy strategy,
Bucket* ret) {
TRACE_EVENT("MSM", "PippengerAdapter::RunWithStrategy", "strategy",
static_cast<int>(strategy));

if (strategy == PippengerParallelStrategy::kNone ||
strategy == PippengerParallelStrategy::kParallelWindow) {
Pippenger<Point> pippenger;
Expand Down Expand Up @@ -83,6 +87,7 @@ class PippengerAdapter {
std::vector<Result> results;
results.resize(num_chunks);
OMP_PARALLEL_FOR(size_t i = 0; i < num_chunks; ++i) {
TRACE_EVENT("Subtask", "ParallelLoop");
size_t start = i * chunk_size;
size_t len = i == num_chunks - 1 ? scalars_size - start : chunk_size;
Pippenger<Point> pippenger;
Expand All @@ -96,6 +101,7 @@ class PippengerAdapter {
scalars_end, &results[i].value);
}

TRACE_EVENT("Subtask", "CheckResultAndAccumulate");
bool all_good =
std::all_of(results.begin(), results.end(),
[](const Result& result) { return result.valid; });
Expand Up @@ -58,6 +58,7 @@ class PippengerBase {

static Bucket AccumulateWindowSums(absl::Span<const Bucket> window_sums,
size_t window_bits) {
TRACE_EVENT("Utils", "PippengerBase::AccumulateWindowSums");
// We store the sum for the lowest window.
Bucket lowest = window_sums.front();
window_sums.remove_prefix(1);