C++ links: performance tools
- benchmark (Google)
- Celero
- hayai - the C++ benchmarking framework
- moodycamel::microbench
- geiger: A micro benchmark library in C++ that supports hardware performance counters
- Nonius
- Examples:
- Systems Benchmarking Crimes - Gernot Heiser - https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html
- Intel Memory Latency Checker (MLC)
- a tool used to measure memory latencies and bandwidth, and how they change with increasing load on the system
- https://www.intel.com/software/mlc
- Memory Bandwidth Benchmark
- MBW determines the "copy" memory bandwidth available to userspace programs. Its simplistic approach models that of real applications. It is not tuned to extremes and it is not aware of hardware architecture, just like your average software package.
- https://github.com/raas/mbw
- pmbw: Parallel Memory Bandwidth Benchmark / Measurement
- a set of assembler routines to measure the parallel memory (cache and RAM) bandwidth of modern multi-core machines
- http://panthema.net/2013/pmbw/
- https://github.com/bingmann/pmbw
- STREAM: Sustainable Memory Bandwidth in High Performance Computers
- http://www.cs.virginia.edu/stream/
- STREAM benchmark - https://github.com/jeffhammond/STREAM
- NUMA-STREAM - https://github.com/larsbergstrom/NUMA-STREAM
- tinymembench: simple benchmark for memory throughput and latency
- Heaptrack - A Heap Memory Profiler for Linux
- MALT & NUMAPROF: Memory Profiling for HPC Applications
- NUMAPROF: a NUMA memory profiler based on Pintool to track remote memory accesses
- MALT: a MALloc Tracker to find where and how you made your memory allocations in C/C++/Fortran applications
- MALT: A Malloc Tracker
- International Workshop on Software Engineering for Parallel Systems (SEPS) 2017
- Sébastien Valat, Andres S. Charif-Rubial, William Jalby
- paper: https://memtt.github.io/malt/downloads/2017-seps-malt.pdf
- slides: https://svalat.github.io/docs/2017-10-MALT-SEPS17.pdf
- FOSDEM 2019; Sébastien Valat
- Memoro: A Detailed Heap Profiler
- https://epfl-vlsc.github.io/memoro/
- https://github.com/epfl-vlsc/memoro
- Detailed Heap Profiling
- International Symposium on Memory Management (ISMM) 2018
- Stuart Byma, Jim Larus
- https://dl.acm.org/citation.cfm?id=3210564
- memtrail: An LD_PRELOAD-based memory profiler and leak detector for Linux
- memusage - profile memory usage of a program
- MTuner - a C/C++ memory profiler and memory leak finder for Windows, PlayStation 4, PlayStation 3, etc.
- Typegrind
- a type preserving heap profiler for C++ - collects memory allocation information with type information
- https://typegrind.github.io/
- https://github.com/typegrind/typegrind
- Tools for microarchitectural benchmarking
- Intel Architecture Code Analyzer (IACA)
- ibench: Measure instruction latency and throughput
- llvm-exegesis – LLVM Machine Instruction Benchmark
- https://llvm.org/docs/CommandGuide/llvm-exegesis.html
- https://github.com/llvm-mirror/llvm/tree/master/tools/llvm-exegesis
- Static Performance Analysis with LLVM
- 2018 European LLVM Developers Meeting
- C. Courbet, O. Sykora, G. Chatelet, B. De Backer
- https://youtu.be/XinMk-t8N-w
- http://llvm.org/devmtg/2018-04/slides/Courbet-Static%20Performance%20Analysis%20with%20LLVM.pdf
- Measuring x86 instruction latencies with LLVM
- 2018 European LLVM Developers Meeting
- G. Chatelet, C. Courbet, B. De Backer, O. Sykora
- https://youtu.be/ex_C27OoApI
- http://llvm.org/devmtg/2018-04/slides/Chatelet-Measuring%20x86%20instruction%20latencies%20with%20LLVM.pdf
- llvm-mca - LLVM Machine Code Analyzer
- https://llvm.org/docs/CommandGuide/llvm-mca.html
- https://github.com/llvm-mirror/llvm/tree/master/tools/llvm-mca
- Understanding the performance of code using LLVM's Machine Code Analyzer (llvm-mca)
- 2018 LLVM Developers’ Meeting; Andrea Di Biagio & Matt Davis
- https://www.youtube.com/watch?v=Ku2D8bjEGXk
- nanoBench: A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs
- used to run the microbenchmarks behind the latency, throughput, and port usage data available on http://uops.info
- https://github.com/andreas-abel/nanoBench
- uops.info: Characterizing Latency, Throughput, and Port Usage of Instructions on Intel Microarchitectures
- ASPLOS 2019
- Andreas Abel, Jan Reineke
- https://arxiv.org/abs/1810.04610
- OSACA: Open Source Architecture Code Analyzer
- https://github.com/RRZE-HPC/osaca
- https://hpc.fau.de/research/tools/
- Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
- Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
- https://arxiv.org/abs/1809.00912
- uarch-bench: A benchmark for low-level CPU micro-architectural features
- BOLT: Binary Optimization and Layout Tool
- A Linux command-line utility used for optimizing performance of binaries
- https://github.com/facebookincubator/BOLT
- Accelerate large-scale applications with BOLT
- Building Binary Optimizer with LLVM
- 2016 EuroLLVM Developers' Meeting; Maksim Panchenko
- https://llvm.org/devmtg/2016-03/Presentations/BOLT_EuroLLVM_2016.pdf
- https://www.youtube.com/watch?v=gw3iDO3By5Y
- BOLT: A Practical Binary Optimizer for Data Centers and Beyond
- Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni
- https://arxiv.org/abs/1807.06735
- MAQAO (Modular Assembly Quality Analyzer and Optimizer)
- Agner Fog's test programs for measuring clock cycles and performance monitoring
- BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
- https://iovisor.github.io/bcc/
- https://github.com/iovisor/bcc
- https://github.com/iovisor/bpf-docs
- http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html
- http://www.brendangregg.com/blog/2016-03-28/linux-bpf-bcc-road-ahead-2016.html
- http://www.brendangregg.com/blog/2016-06-14/ubuntu-xenial-bcc-bpf.html
- https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf/
- easy_profiler: Lightweight cross-platform profiler library for C++
- Event Tracing for Windows (ETW) / Windows Performance Toolkit – Xperf
- gperftools (originally Google Performance Tools)
- "The fastest malloc we’ve seen; works particularly well with threads and STL. Also: thread-friendly heap-checker, heap-profiler, and cpu-profiler."
- https://github.com/gperftools/gperftools
- gprof2dot
- "Python script to convert the output from many profilers into a dot graph."
- https://github.com/jrfonseca/gprof2dot
- Hotspot - the Linux perf GUI for performance analysis
- Likwid: Performance monitoring and benchmarking suite
- https://github.com/RRZE-HPC/likwid
- https://github.com/RRZE-HPC/likwid/wiki
- https://github.com/RRZE-HPC/likwid/wiki/TutorialStart
- https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr
- https://github.com/RRZE-HPC/likwid/wiki/TutorialMarkerC
- https://github.com/RRZE-HPC/likwid/wiki/PatternsHaswellEP
- microprofile: an embeddable profiler
- perf
- perf-tools - https://github.com/brendangregg/perf-tools
- perf_events: The Unofficial Linux Perf Events Web-Page
- perfmon2 - http://perfmon2.sourceforge.net/
- "Perfmon2 aims to be a portable interface across all modern processors. It is designed to give full access to a given PMU and all the corresponding hardware performance counters. Typically the PMU hardware implementations use a different number of registers, counters with different length and possibly other unique features, a complexity that the software has to cope with. Although processors have different PMU implementations, they usually use configurations registers and data registers. Perfmon2 provides a uniform abstract model of these registers and exports read/write operations accordingly."
- Performance Application Programming Interface (PAPI)
- pmu tools: Intel PMU profiling tools
- https://github.com/andikleen/pmu-tools
- https://github.com/andikleen/pmu-tools/wiki/toplev-manual
- pmu-tools part I - introduction, ocperf - http://halobates.de/blog/p/245
- pmu-tools part II - toplev - http://halobates.de/blog/p/262
- Processor Counter Monitor (PCM)
- https://github.com/opcm/pcm
- Intel Performance Counter Monitor (PCM) - http://www.intel.com/software/pcm
- Remotery: Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
- sysdig
- Tracy Profiler
- Tracy is a real time, nanosecond resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile CPU (C++, Lua), GPU (OpenGL, Vulkan) and memory. It can also display locks held by threads and their interactions with each other.
- https://bitbucket.org/wolfpld/tracy
- Introduction to the Tracy profiler - https://www.youtube.com/watch?v=fB5B46lbapc
- low-overhead-timers: Very low-overhead timer/counter interfaces for C on Intel 64 processors
- https://github.com/jdmccalpin/low-overhead-timers
- Comments on timing short code sections on Intel processors
- Flame Graphs
- FlameScope: a visualization tool for exploring different time ranges as Flame Graphs
- pprof - a tool for visualization and analysis of profiling data
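
Several of the memory tools listed above (MBW in particular) report a "copy" bandwidth figure. As a rough illustration of what such a number means, here is a minimal sketch that times `std::memcpy` over a large buffer with `std::chrono::steady_clock`. The function name `copy_bandwidth_mib_per_s` and its parameters are hypothetical and are not taken from MBW or any other listed tool. A real harness would also pin threads, control CPU frequency, and guard more carefully against compiler optimizations.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper (an assumption for illustration, not any tool's API):
// returns the best observed memcpy bandwidth in MiB/s for a buffer of
// `bytes` bytes, taking the fastest of `repetitions` timed copies.
double copy_bandwidth_mib_per_s(std::size_t bytes, int repetitions) {
    if (bytes == 0 || repetitions <= 0) {
        return 0.0;  // nothing to measure
    }

    // Filling both buffers up front touches every page, so the timed loop
    // does not pay for first-touch page faults.
    std::vector<unsigned char> src(bytes, 1);
    std::vector<unsigned char> dst(bytes, 0);

    double best_seconds = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), bytes);
        const auto stop = std::chrono::steady_clock::now();

        // Read one copied byte through a volatile so the copy has an
        // observable effect; this is only a weak optimization barrier.
        volatile unsigned char sink = dst[bytes - 1];
        (void)sink;

        const double seconds =
            std::chrono::duration<double>(stop - start).count();
        best_seconds = std::min(best_seconds, seconds);
    }

    if (best_seconds <= 0.0) {
        return 0.0;  // timer resolution too coarse for this buffer size
    }
    const double mib = static_cast<double>(bytes) / (1024.0 * 1024.0);
    return mib / best_seconds;
}
```

Calling `copy_bandwidth_mib_per_s(std::size_t{64} << 20, 5)` from `main` and printing the result gives a single-threaded estimate for a 64 MiB buffer. Taking the fastest of several repetitions reduces noise from scheduling and cold caches, but the figure remains indicative only; the dedicated tools above handle many effects this sketch ignores.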