Ahmad Yasin edited this page Nov 24, 2024 · 9 revisions

yperf profiles your workload, analyzes the profiling data, and automatically advises on software (SW) optimizations, all in a single pass.

It is an automated, fairly quick implementation of the idea described in "From Top-down Microarchitecture Analysis to Structured Performance Optimizations".

How does it work?

yperf profiles a workload on an Intel Xeon processor (Sapphire Rapids or newer is recommended), maps the collected metrics to source and machine code, and advises on how to speed up critical hotspots. Using this in-depth analysis, it guides developers toward the optimizations that are relevant to the bottleneck at hand. And the best part? It's all automated!

📊 While this is a proof of concept, the initial version supports 8 popular software optimizations that span four distinct performance bottlenecks.

Supported SW optimizations include:

  • De-virtualization of indirect branches
  • If-conversion to Conditional Moves
  • Employing a Profile-Guided Optimizer like BOLT or AutoFDO, if applicable
  • Loop Unrolling
  • Code Alignment
  • Function Inlining
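
To make one of the listed optimizations concrete, here is a minimal C++ sketch of if-conversion. It is an illustration only and not taken from yperf: the function names `sum_above_branchy` and `sum_above_select` are hypothetical, and whether the compiler actually emits a conditional move (or vectorizes the loop) depends on the compiler, flags, and target.

```cpp
#include <cstdint>
#include <vector>

// Branchy form: the `if` is typically compiled to a conditional jump.
// On data with no predictable pattern, that jump may mispredict often.
std::int64_t sum_above_branchy(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x > threshold) {
            sum += x;
        }
    }
    return sum;
}

// If-converted form: the control dependence becomes a data dependence.
// The select below is a candidate for a conditional move (cmov) or a
// masked vector operation, so there is no branch to mispredict.
std::int64_t sum_above_select(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        sum += (x > threshold) ? x : 0;
    }
    return sum;
}
```

Both functions return the same result. The second form trades a possible misprediction for always executing the add, which tends to pay off only when the branch is hard to predict.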

The following performance bottlenecks are covered:

  • Mispredictions
  • Big Code
  • Instruction Fetch Bandwidth (BW, or throughput)
  • Cache Memory Bandwidth

Demo

[Image: perf-tools yperf demo]

Command lines used in the demo:

git clone --recurse-submodules https://github.com/aayasin/perf-tools /path/to/perf-tools
cd /path/to/perf-tools
make tramp3d-v4
./yperf record -- clang++ -w -std=gnu++11 tramp3d-v4.cpp -o tramp3d-v4
./yperf report -- clang++ -w -std=gnu++11 tramp3d-v4.cpp -o tramp3d-v4
./yperf advise