yperf
yperf profiles a workload, analyzes the profiling data, and automatically advises on software optimizations, all in a single pass.
It is an automated, rather quick implementation of the idea presented in "From Top-down Microarchitecture Analysis to Structured Performance Optimizations".
yperf profiles a workload on Intel Xeon (Sapphire Rapids or newer recommended), maps metrics to source/machine code, and advises on how to speed up critical hotspots.
It guides developers toward relevant optimizations that can address the bottleneck at hand, leveraging in-depth analysis. And the best part? It's all automated!
📊 While this is a proof of concept, the initial version supports 8 popular software optimizations that span four distinct performance bottlenecks.
Supported SW optimizations include:
- De-virtualization of indirect branches
- If-conversion to Conditional Moves
- Employing a Profile-Guided Optimizer like BOLT or autoFDO, if applicable
- Loop Unrolling
- Code Alignment
- Function Inlining
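To make one item on this list concrete, here is a minimal C++ sketch of if-conversion. It is not taken from yperf or its output. The function names (`sum_above_branchy`, `sum_above_ifconverted`, `check_equivalent`) are hypothetical and exist only for illustration. Whether the compiler actually emits a conditional move, a masked vector operation, or a branch depends on the compiler, flags, and target, so treat the comments as the usual tendency rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branchy form: the `if` guards the update, so the compiler may emit a
// conditional jump. With unpredictable data, that jump can mispredict often.
std::int64_t sum_above_branchy(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x > threshold) {
            sum += x;
        }
    }
    return sum;
}

// If-converted form: the condition selects a value instead of steering
// control flow, which gives the compiler the option of a conditional move
// (or a vector blend) in place of a branch.
std::int64_t sum_above_ifconverted(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        const std::int64_t addend = (x > threshold) ? x : 0;
        sum += addend;  // unconditional update
    }
    return sum;
}

// Hypothetical helper: both forms must agree on any input.
void check_equivalent(const std::vector<int>& v, int threshold) {
    assert(sum_above_branchy(v, threshold) == sum_above_ifconverted(v, threshold));
}
```

The two functions compute the same result. The rewrite only pays off when the branch is hard to predict, which is the kind of case a Mispredictions bottleneck points at. For a well-predicted branch, the branchy form can be just as fast or faster.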
The following performance bottlenecks are covered:
- Mispredictions
- Big Code
- Instructions Fetch BW (or throughput)
- Cache Memory Bandwidth
Command lines used in the demo:
git clone --recurse-submodules https://github.com/aayasin/perf-tools /path/to/perf-tools
cd /path/to/perf-tools
make tramp3d-v4
./yperf record -- clang++ -w -std=gnu++11 tramp3d-v4.cpp -o tramp3d-v4
./yperf report -- clang++ -w -std=gnu++11 tramp3d-v4.cpp -o tramp3d-v4
./yperf advise
For hyperlinked files, right-click the link and save the file with the name as it appears in the hyperlink.