Add support for Intel SapphireRapids (SPR) (#524)
* Intel Sapphire Rapids: Core files with FIXED, PMC, VOLTAGE, THERMAL and RAPL
* Allow same registers for access as Intel IcelakeX
* Add Intel Sapphire Rapids IDs and strings
* By default add the fixed TOPDOWN_SLOTS event
* Add CPU feature detection
* Add energy monitoring interface
* Add general in-core hardware performance monitoring
* Add TOPDOWN_SLOTS to perf_event backend
* Add support for hardware thread monitoring on Intel SapphireRapids
* Remove MEM* groups, no support for uncore yet
* Full Uncore support for Intel SapphireRapids
* Fixed for multi-socket systems
* Fix for direct access mode
* Fix for debug output in perf_event backend
* Add MEM groups
* Fixes for HBM units and HBM group
* Combined group for DDR and HBM measurements
* Add HBM_SP and HBM_DP group, similar to MEM_SP/DP on SPR
* Add unit [FLOP/Byte] to operational intensity. See #541
* Remove empty line.
* Reset all bits of temporary variable
* Some more checks in Intel's uncore discovery method
* Revert setting return value to zero. Breaks lookup
* Add missing M2PCIe units
* Uncore Discovery: don't use memcpy but own byte-wise copy for reliable results
* Add missing MDF units
* Complete event file
* Need more register indices
1 parent
734cb94
commit 2c684a9
Showing 47 changed files with 13,418 additions and 126 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
SHORT Branch prediction miss rate/ratio | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 BR_INST_RETIRED_ALL_BRANCHES | ||
PMC1 BR_MISP_RETIRED_ALL_BRANCHES | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Branch rate PMC0/FIXC0 | ||
Branch misprediction rate PMC1/FIXC0 | ||
Branch misprediction ratio PMC1/PMC0 | ||
Instructions per branch FIXC0/PMC0 | ||
|
||
LONG | ||
Formulas: | ||
Branch rate = BR_INST_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY | ||
Branch misprediction rate = BR_MISP_RETIRED_ALL_BRANCHES/INSTR_RETIRED_ANY | ||
Branch misprediction ratio = BR_MISP_RETIRED_ALL_BRANCHES/BR_INST_RETIRED_ALL_BRANCHES | ||
Instructions per branch = INSTR_RETIRED_ANY/BR_INST_RETIRED_ALL_BRANCHES | ||
- | ||
The rates state how often, on average, a branch or a mispredicted branch occurred | ||
per instruction retired in total. The branch misprediction ratio states directly | ||
what fraction of all branch instructions were mispredicted. | ||
Instructions per branch is 1/branch rate. | ||
|
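The formulas of the branch group can be checked by hand with a few lines of Python. This is a minimal sketch, not part of LIKWID: the function name `branch_metrics` and the counter values are hypothetical and chosen only for illustration, and both the instruction count and the branch count are assumed to be non-zero.

```python
def branch_metrics(instr_retired, branches, mispredicted):
    """Derive the four branch metrics from raw counter values.

    The arguments mirror INSTR_RETIRED_ANY, BR_INST_RETIRED_ALL_BRANCHES and
    BR_MISP_RETIRED_ALL_BRANCHES. instr_retired and branches are assumed to
    be non-zero.
    """
    return {
        "branch_rate": branches / instr_retired,
        "misprediction_rate": mispredicted / instr_retired,
        "misprediction_ratio": mispredicted / branches,
        "instructions_per_branch": instr_retired / branches,
    }


# Hypothetical counter readings, chosen only for illustration.
example = branch_metrics(
    instr_retired=1_000_000, branches=200_000, mispredicted=10_000
)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these inputs one in five retired instructions is a branch, and one in twenty branches is mispredicted. The product of branch rate and instructions per branch is 1, as stated in the group description.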
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
SHORT Power and Energy consumption | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PWR0 PWR_PKG_ENERGY | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Energy [J] PWR0 | ||
Power [W] PWR0/time | ||
|
||
LONG | ||
Formulas: | ||
Power = PWR_PKG_ENERGY / time | ||
- | ||
Sapphire Rapids implements the RAPL interface. This interface makes it possible to | ||
monitor the consumed energy at the package (socket) level. | ||
|
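Every group in this commit shares the same four leading metrics, and this group adds power on top. The sketch below reproduces those formulas in Python. It assumes that `inverseClock` is the reciprocal of the nominal CPU frequency in Hz; that reading is an assumption made here, not something stated in the diff. The function names and all input values are hypothetical.

```python
def common_metrics(fixc0, fixc1, fixc2, inverse_clock):
    """Metrics that appear at the top of every group's METRICS section.

    fixc0: INSTR_RETIRED_ANY
    fixc1: CPU_CLK_UNHALTED_CORE
    fixc2: CPU_CLK_UNHALTED_REF
    inverse_clock: assumed to be 1 / nominal frequency in Hz
    """
    return {
        "runtime_unhalted_s": fixc1 * inverse_clock,
        "clock_mhz": 1.0e-06 * (fixc1 / fixc2) / inverse_clock,
        "cpi": fixc1 / fixc0,
    }


def power_watts(energy_joules, time_s):
    """Power = PWR_PKG_ENERGY / time."""
    return energy_joules / time_s


# Hypothetical readings on a core with an assumed 2 GHz nominal frequency.
nominal_hz = 2.0e9
metrics = common_metrics(
    fixc0=6_000_000_000,
    fixc1=3_000_000_000,
    fixc2=2_000_000_000,
    inverse_clock=1.0 / nominal_hz,
)
package_power = power_watts(energy_joules=150.0, time_s=1.5)
print(metrics, package_power)
```

Under that assumption, a ratio of core cycles to reference cycles above 1 means the core ran faster than its nominal frequency during the measurement.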
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
SHORT Load to store ratio | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 MEM_INST_RETIRED_ALL_LOADS | ||
PMC1 MEM_INST_RETIRED_ALL_STORES | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Load to store ratio PMC0/PMC1 | ||
|
||
LONG | ||
Formulas: | ||
Load to store ratio = MEM_INST_RETIRED_ALL_LOADS/MEM_INST_RETIRED_ALL_STORES | ||
- | ||
This is a metric to determine your load to store ratio. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
SHORT Memory bandwidth in MBytes/s for DDR and HBM | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
MBOX0C0 CAS_COUNT_RD | ||
MBOX0C1 CAS_COUNT_WR | ||
MBOX1C0 CAS_COUNT_RD | ||
MBOX1C1 CAS_COUNT_WR | ||
MBOX2C0 CAS_COUNT_RD | ||
MBOX2C1 CAS_COUNT_WR | ||
MBOX3C0 CAS_COUNT_RD | ||
MBOX3C1 CAS_COUNT_WR | ||
MBOX4C0 CAS_COUNT_RD | ||
MBOX4C1 CAS_COUNT_WR | ||
MBOX5C0 CAS_COUNT_RD | ||
MBOX5C1 CAS_COUNT_WR | ||
MBOX6C0 CAS_COUNT_RD | ||
MBOX6C1 CAS_COUNT_WR | ||
MBOX7C0 CAS_COUNT_RD | ||
MBOX7C1 CAS_COUNT_WR | ||
MBOX8C0 CAS_COUNT_RD | ||
MBOX8C1 CAS_COUNT_WR | ||
MBOX9C0 CAS_COUNT_RD | ||
MBOX9C1 CAS_COUNT_WR | ||
MBOX10C0 CAS_COUNT_RD | ||
MBOX10C1 CAS_COUNT_WR | ||
MBOX11C0 CAS_COUNT_RD | ||
MBOX11C1 CAS_COUNT_WR | ||
MBOX12C0 CAS_COUNT_RD | ||
MBOX12C1 CAS_COUNT_WR | ||
MBOX13C0 CAS_COUNT_RD | ||
MBOX13C1 CAS_COUNT_WR | ||
MBOX14C0 CAS_COUNT_RD | ||
MBOX14C1 CAS_COUNT_WR | ||
MBOX15C0 CAS_COUNT_RD | ||
MBOX15C1 CAS_COUNT_WR | ||
HBM0C0 CAS_COUNT_RD | ||
HBM0C1 CAS_COUNT_WR | ||
HBM1C0 CAS_COUNT_RD | ||
HBM1C1 CAS_COUNT_WR | ||
HBM2C0 CAS_COUNT_RD | ||
HBM2C1 CAS_COUNT_WR | ||
HBM3C0 CAS_COUNT_RD | ||
HBM3C1 CAS_COUNT_WR | ||
HBM4C0 CAS_COUNT_RD | ||
HBM4C1 CAS_COUNT_WR | ||
HBM5C0 CAS_COUNT_RD | ||
HBM5C1 CAS_COUNT_WR | ||
HBM6C0 CAS_COUNT_RD | ||
HBM6C1 CAS_COUNT_WR | ||
HBM7C0 CAS_COUNT_RD | ||
HBM7C1 CAS_COUNT_WR | ||
HBM8C0 CAS_COUNT_RD | ||
HBM8C1 CAS_COUNT_WR | ||
HBM9C0 CAS_COUNT_RD | ||
HBM9C1 CAS_COUNT_WR | ||
HBM10C0 CAS_COUNT_RD | ||
HBM10C1 CAS_COUNT_WR | ||
HBM11C0 CAS_COUNT_RD | ||
HBM11C1 CAS_COUNT_WR | ||
HBM12C0 CAS_COUNT_RD | ||
HBM12C1 CAS_COUNT_WR | ||
HBM13C0 CAS_COUNT_RD | ||
HBM13C1 CAS_COUNT_WR | ||
HBM14C0 CAS_COUNT_RD | ||
HBM14C1 CAS_COUNT_WR | ||
HBM15C0 CAS_COUNT_RD | ||
HBM15C1 CAS_COUNT_WR | ||
|
||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
DDR read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX8C0+MBOX9C0+MBOX10C0+MBOX11C0+MBOX12C0+MBOX13C0+MBOX14C0+MBOX15C0)*64.0/time | ||
DDR read data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX8C0+MBOX9C0+MBOX10C0+MBOX11C0+MBOX12C0+MBOX13C0+MBOX14C0+MBOX15C0)*64.0 | ||
DDR write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1+MBOX8C1+MBOX9C1+MBOX10C1+MBOX11C1+MBOX12C1+MBOX13C1+MBOX14C1+MBOX15C1)*64.0/time | ||
DDR write data volume [GBytes] 1.0E-09*(MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1+MBOX8C1+MBOX9C1+MBOX10C1+MBOX11C1+MBOX12C1+MBOX13C1+MBOX14C1+MBOX15C1)*64.0 | ||
DDR bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX8C0+MBOX9C0+MBOX10C0+MBOX11C0+MBOX12C0+MBOX13C0+MBOX14C0+MBOX15C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1+MBOX8C1+MBOX9C1+MBOX10C1+MBOX11C1+MBOX12C1+MBOX13C1+MBOX14C1+MBOX15C1)*64.0/time | ||
DDR data volume [GBytes] 1.0E-09*(MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX8C0+MBOX9C0+MBOX10C0+MBOX11C0+MBOX12C0+MBOX13C0+MBOX14C0+MBOX15C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1+MBOX8C1+MBOX9C1+MBOX10C1+MBOX11C1+MBOX12C1+MBOX13C1+MBOX14C1+MBOX15C1)*64.0 | ||
HBM read bandwidth [MBytes/s] 1.0E-06*(HBM0C0+HBM1C0+HBM2C0+HBM3C0+HBM4C0+HBM5C0+HBM6C0+HBM7C0+HBM8C0+HBM9C0+HBM10C0+HBM11C0+HBM12C0+HBM13C0+HBM14C0+HBM15C0)*64.0/time | ||
HBM read data volume [GBytes] 1.0E-09*(HBM0C0+HBM1C0+HBM2C0+HBM3C0+HBM4C0+HBM5C0+HBM6C0+HBM7C0+HBM8C0+HBM9C0+HBM10C0+HBM11C0+HBM12C0+HBM13C0+HBM14C0+HBM15C0)*64.0 | ||
HBM write bandwidth [MBytes/s] 1.0E-06*(HBM0C1+HBM1C1+HBM2C1+HBM3C1+HBM4C1+HBM5C1+HBM6C1+HBM7C1+HBM8C1+HBM9C1+HBM10C1+HBM11C1+HBM12C1+HBM13C1+HBM14C1+HBM15C1)*64.0/time | ||
HBM write data volume [GBytes] 1.0E-09*(HBM0C1+HBM1C1+HBM2C1+HBM3C1+HBM4C1+HBM5C1+HBM6C1+HBM7C1+HBM8C1+HBM9C1+HBM10C1+HBM11C1+HBM12C1+HBM13C1+HBM14C1+HBM15C1)*64.0 | ||
HBM bandwidth [MBytes/s] 1.0E-06*(HBM0C0+HBM1C0+HBM2C0+HBM3C0+HBM4C0+HBM5C0+HBM6C0+HBM7C0+HBM8C0+HBM9C0+HBM10C0+HBM11C0+HBM12C0+HBM13C0+HBM14C0+HBM15C0+HBM0C1+HBM1C1+HBM2C1+HBM3C1+HBM4C1+HBM5C1+HBM6C1+HBM7C1+HBM8C1+HBM9C1+HBM10C1+HBM11C1+HBM12C1+HBM13C1+HBM14C1+HBM15C1)*64.0/time | ||
HBM data volume [GBytes] 1.0E-09*(HBM0C0+HBM1C0+HBM2C0+HBM3C0+HBM4C0+HBM5C0+HBM6C0+HBM7C0+HBM8C0+HBM9C0+HBM10C0+HBM11C0+HBM12C0+HBM13C0+HBM14C0+HBM15C0+HBM0C1+HBM1C1+HBM2C1+HBM3C1+HBM4C1+HBM5C1+HBM6C1+HBM7C1+HBM8C1+HBM9C1+HBM10C1+HBM11C1+HBM12C1+HBM13C1+HBM14C1+HBM15C1)*64.0 | ||
|
||
LONG | ||
Formulas: | ||
DDR read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOX*C0))*64.0/runtime | ||
DDR read data volume [GBytes] = 1.0E-09*(SUM(MBOX*C0))*64.0 | ||
DDR write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOX*C1))*64.0/runtime | ||
DDR write data volume [GBytes] = 1.0E-09*(SUM(MBOX*C1))*64.0 | ||
DDR bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOX*C0)+SUM(MBOX*C1))*64.0/runtime | ||
DDR data volume [GBytes] = 1.0E-09*(SUM(MBOX*C0)+SUM(MBOX*C1))*64.0 | ||
HBM read bandwidth [MBytes/s] = 1.0E-06*(SUM(HBM*C0))*64.0/runtime | ||
HBM read data volume [GBytes] = 1.0E-09*(SUM(HBM*C0))*64.0 | ||
HBM write bandwidth [MBytes/s] = 1.0E-06*(SUM(HBM*C1))*64.0/runtime | ||
HBM write data volume [GBytes] = 1.0E-09*(SUM(HBM*C1))*64.0 | ||
HBM bandwidth [MBytes/s] = 1.0E-06*(SUM(HBM*C0)+SUM(HBM*C1))*64.0/runtime | ||
HBM data volume [GBytes] = 1.0E-09*(SUM(HBM*C0)+SUM(HBM*C1))*64.0 | ||
-- | ||
Profiling group to measure the memory bandwidth drawn by all cores of a socket for DDR | ||
as well as HBM. Since this group is based on Uncore events, it can only be measured on a | ||
per-socket basis. Some of the counters may not be available on your system. | ||
The group also outputs the total data volume transferred for each memory technology. | ||
|
||
|
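The long sums in the METRICS section all follow one pattern: add up the CAS counts of all units, multiply by 64 bytes per transfer, and scale. A minimal Python sketch of that pattern follows. The function name `memory_metrics` is hypothetical, and treating a counter that is missing from the input as zero is an assumption of this sketch, not documented LIKWID behaviour.

```python
BYTES_PER_CAS = 64.0  # factor used in the group formulas


def memory_metrics(counters, prefix, runtime_s, units=16):
    """Bandwidth [MBytes/s] and data volume [GBytes] for one memory type.

    counters: dict mapping names such as "MBOX0C0" to raw counts
    prefix:   "MBOX" for DDR or "HBM" for HBM
    Counters that are absent from the dict are treated as zero (assumption).
    """
    reads = sum(counters.get(f"{prefix}{i}C0", 0) for i in range(units))
    writes = sum(counters.get(f"{prefix}{i}C1", 0) for i in range(units))
    return {
        "read_bw": 1.0e-06 * reads * BYTES_PER_CAS / runtime_s,
        "write_bw": 1.0e-06 * writes * BYTES_PER_CAS / runtime_s,
        "total_bw": 1.0e-06 * (reads + writes) * BYTES_PER_CAS / runtime_s,
        "read_volume": 1.0e-09 * reads * BYTES_PER_CAS,
        "write_volume": 1.0e-09 * writes * BYTES_PER_CAS,
        "total_volume": 1.0e-09 * (reads + writes) * BYTES_PER_CAS,
    }


# Hypothetical readings: two active DDR channels, no HBM counters present.
sample = {"MBOX0C0": 1_000_000, "MBOX1C0": 1_000_000, "MBOX0C1": 500_000}
ddr = memory_metrics(sample, "MBOX", runtime_s=2.0)
hbm = memory_metrics(sample, "HBM", runtime_s=2.0)
print(ddr)
print(hbm)
```

Passing the prefix as a parameter keeps the DDR and HBM calculations identical, which mirrors how the two halves of the group differ only in the counter names.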
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
SHORT Divide unit information | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 ARITH_FPDIV_COUNT | ||
PMC1 ARITH_FPDIV_ACTIVE | ||
PMC2 ARITH_IDIV_COUNT | ||
PMC3 ARITH_IDIV_ACTIVE | ||
|
||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Number of FP divide ops PMC0 | ||
Avg. FP divide unit usage duration PMC1/PMC0 | ||
Number of INT divide ops PMC2 | ||
Avg. INT divide unit usage duration PMC3/PMC2 | ||
|
||
LONG | ||
Formulas: | ||
Number of FP divide ops = ARITH_FPDIV_COUNT | ||
Avg. FP divide unit usage duration = ARITH_FPDIV_ACTIVE/ARITH_FPDIV_COUNT | ||
Number of INT divide ops = ARITH_IDIV_COUNT | ||
Avg. INT divide unit usage duration = ARITH_IDIV_ACTIVE/ARITH_IDIV_COUNT | ||
- | ||
This performance group measures the average latency of divide operations. | ||
The Intel Sapphire Rapids architecture performs FP and INT divide operations | ||
on different ports (P0 and P1 respectively). | ||
The COUNT events are the ACTIVE event with the edge detect bit set to count only | ||
the activation of the unit. |
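The relation between the ACTIVE and COUNT events can be illustrated with a toy model. The sketch below is conceptual only and does not describe the hardware implementation: it treats the divide unit as a per-cycle busy signal (1 = busy), counts busy cycles for ACTIVE, and counts 0-to-1 transitions for COUNT, which is what edge detection amounts to. The function name and the signal are hypothetical.

```python
def active_and_count(busy_signal):
    """Return (active_cycles, activations) for a per-cycle busy signal.

    active_cycles models an *_ACTIVE event: cycles in which the unit is busy.
    activations models the matching *_COUNT event: rising edges of the signal.
    The unit is assumed to be idle before the first sampled cycle.
    """
    active_cycles = sum(busy_signal)
    previous = [0] + list(busy_signal[:-1])
    activations = sum(
        1 for prev, cur in zip(previous, busy_signal) if cur and not prev
    )
    return active_cycles, activations


# Hypothetical trace: three divide operations of 3, 2 and 1 busy cycles.
trace = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
active, count = active_and_count(trace)
avg_duration = active / count if count else 0.0
print(active, count, avg_duration)
```

Dividing busy cycles by activations gives the average usage duration per divide operation, which is the quantity the group reports as PMC1/PMC0 and PMC3/PMC2.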
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
SHORT Power and Energy consumption | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
TMP0 TEMP_CORE | ||
PWR0 PWR_PKG_ENERGY | ||
PWR1 PWR_PP0_ENERGY | ||
PWR3 PWR_DRAM_ENERGY | ||
PWR4 PWR_PLATFORM_ENERGY | ||
|
||
|
||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Temperature [C] TMP0 | ||
Energy [J] PWR0 | ||
Power [W] PWR0/time | ||
Energy PP0 [J] PWR1 | ||
Power PP0 [W] PWR1/time | ||
Energy DRAM [J] PWR3 | ||
Power DRAM [W] PWR3/time | ||
Energy PLATFORM [J] PWR4 | ||
Power PLATFORM [W] PWR4/time | ||
|
||
LONG | ||
Formulas: | ||
Power = PWR_PKG_ENERGY / time | ||
Power PP0 = PWR_PP0_ENERGY / time | ||
Power DRAM = PWR_DRAM_ENERGY / time | ||
Power PLATFORM = PWR_PLATFORM_ENERGY / time | ||
- | ||
Sapphire Rapids implements the RAPL interface. This interface makes it possible to | ||
monitor the consumed energy at the package (socket), PP0, DRAM and | ||
platform level. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
SHORT Packed AVX MFLOP/s | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE | ||
PMC1 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | ||
PMC2 FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE | ||
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
Packed SP [MFLOP/s] 1.0E-06*(PMC0*8.0+PMC2*16.0)/time | ||
Packed DP [MFLOP/s] 1.0E-06*(PMC1*4.0+PMC3*8.0)/time | ||
|
||
LONG | ||
Formulas: | ||
Packed SP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE*8+FP_ARITH_INST_RETIRED_512B_PACKED_SINGLE*16)/runtime | ||
Packed DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime | ||
- | ||
Packed AVX FLOP rates for single and double precision (256-bit and 512-bit). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
SHORT Double Precision MFLOP/s | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | ||
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE | ||
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | ||
PMC3 FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
DP [MFLOP/s] 1.0E-06*(PMC0*2.0+PMC1+PMC2*4.0+PMC3*8.0)/time | ||
AVX DP [MFLOP/s] 1.0E-06*(PMC2*4.0+PMC3*8.0)/time | ||
AVX512 DP [MFLOP/s] 1.0E-06*(PMC3*8.0)/time | ||
Packed [MUOPS/s] 1.0E-06*(PMC0+PMC2+PMC3)/time | ||
Scalar [MUOPS/s] 1.0E-06*PMC1/time | ||
Vectorization ratio 100*(PMC0+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3) | ||
|
||
LONG | ||
Formulas: | ||
DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE*2+FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime | ||
AVX DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE*4+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime | ||
AVX512 DP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE*8)/runtime | ||
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/runtime | ||
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED_SCALAR_DOUBLE/runtime | ||
Vectorization ratio = 100*(FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE)/(FP_ARITH_INST_RETIRED_SCALAR_DOUBLE+FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE+FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE) | ||
- | ||
Scalar and packed (SSE, AVX, AVX-512) double precision FLOP rates. | ||
|
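The double precision metrics weight each event by the number of 64-bit elements per vector. A minimal Python sketch of the formulas follows; the function name `dp_metrics` and the counter values are hypothetical, and the sketch assumes a non-zero runtime and at least one counted instruction.

```python
def dp_metrics(scalar, p128, p256, p512, runtime_s):
    """Double precision metrics from the four FP_ARITH_INST_RETIRED counts.

    scalar, p128, p256, p512 mirror the SCALAR_DOUBLE, 128B_PACKED_DOUBLE,
    256B_PACKED_DOUBLE and 512B_PACKED_DOUBLE events.
    """
    packed = p128 + p256 + p512
    return {
        "dp_mflops": 1.0e-06 * (p128 * 2.0 + scalar + p256 * 4.0 + p512 * 8.0) / runtime_s,
        "avx_dp_mflops": 1.0e-06 * (p256 * 4.0 + p512 * 8.0) / runtime_s,
        "avx512_dp_mflops": 1.0e-06 * (p512 * 8.0) / runtime_s,
        "packed_muops": 1.0e-06 * packed / runtime_s,
        "scalar_muops": 1.0e-06 * scalar / runtime_s,
        "vectorization_ratio": 100.0 * packed / (scalar + packed),
    }


# Hypothetical counts, chosen only for illustration.
dp = dp_metrics(scalar=100, p128=50, p256=25, p512=25, runtime_s=1.0)
for name, value in dp.items():
    print(f"{name}: {value:.6f}")
```

Note that the vectorization ratio is based on instruction counts, not on FLOPs: in this example half of the counted instructions are packed, while packed instructions deliver 80% of the floating-point operations.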
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
SHORT Half Precision MFLOP/s | ||
|
||
EVENTSET | ||
FIXC0 INSTR_RETIRED_ANY | ||
FIXC1 CPU_CLK_UNHALTED_CORE | ||
FIXC2 CPU_CLK_UNHALTED_REF | ||
FIXC3 TOPDOWN_SLOTS | ||
PMC0 FP_ARITH_INST_RETIRED2_SCALAR | ||
PMC1 FP_ARITH_INST_RETIRED2_128B_PACKED_HALF | ||
PMC2 FP_ARITH_INST_RETIRED2_256B_PACKED_HALF | ||
PMC3 FP_ARITH_INST_RETIRED2_512B_PACKED_HALF | ||
|
||
METRICS | ||
Runtime (RDTSC) [s] time | ||
Runtime unhalted [s] FIXC1*inverseClock | ||
Clock [MHz] 1.E-06*(FIXC1/FIXC2)/inverseClock | ||
CPI FIXC1/FIXC0 | ||
HP [MFLOP/s] 1.0E-06*(PMC0+PMC1*8.0+PMC2*16.0+PMC3*32.0)/time | ||
128B HP [MFLOP/s] 1.0E-06*(PMC1*8.0)/time | ||
256B HP [MFLOP/s] 1.0E-06*(PMC2*16.0)/time | ||
512B HP [MFLOP/s] 1.0E-06*(PMC3*32.0)/time | ||
Packed [MUOPS/s] 1.0E-06*(PMC1+PMC2+PMC3)/time | ||
Scalar [MUOPS/s] 1.0E-06*PMC0/time | ||
Vectorization ratio 100*(PMC1+PMC2+PMC3)/(PMC0+PMC1+PMC2+PMC3) | ||
|
||
LONG | ||
Formulas: | ||
HP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED2_SCALAR+FP_ARITH_INST_RETIRED2_128B_PACKED_HALF*8+FP_ARITH_INST_RETIRED2_256B_PACKED_HALF*16+FP_ARITH_INST_RETIRED2_512B_PACKED_HALF*32)/runtime | ||
128B HP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED2_128B_PACKED_HALF*8)/runtime | ||
256B HP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED2_256B_PACKED_HALF*16)/runtime | ||
512B HP [MFLOP/s] = 1.0E-06*(FP_ARITH_INST_RETIRED2_512B_PACKED_HALF*32)/runtime | ||
Packed [MUOPS/s] = 1.0E-06*(FP_ARITH_INST_RETIRED2_128B_PACKED_HALF+FP_ARITH_INST_RETIRED2_256B_PACKED_HALF+FP_ARITH_INST_RETIRED2_512B_PACKED_HALF)/runtime | ||
Scalar [MUOPS/s] = 1.0E-06*FP_ARITH_INST_RETIRED2_SCALAR/runtime | ||
Vectorization ratio [%] = 100*(FP_ARITH_INST_RETIRED2_128B_PACKED_HALF+FP_ARITH_INST_RETIRED2_256B_PACKED_HALF+FP_ARITH_INST_RETIRED2_512B_PACKED_HALF)/(FP_ARITH_INST_RETIRED2_SCALAR+FP_ARITH_INST_RETIRED2_128B_PACKED_HALF+FP_ARITH_INST_RETIRED2_256B_PACKED_HALF+FP_ARITH_INST_RETIRED2_512B_PACKED_HALF) | ||
- | ||
Scalar and packed half precision FLOP rates new in Sapphire Rapids. | ||
|
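The weights 8, 16 and 32 in the METRICS section follow from dividing the vector width by the 16-bit element size, just as the single and double precision groups use width/32 and width/64. The Python sketch below derives the weights that way and applies them to the half precision rate. Both function names and the counter values are hypothetical.

```python
def elements_per_vector(vector_bits, element_bits):
    """Number of elements that fit into one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of the element size")
    return vector_bits // element_bits


def hp_mflops(scalar, p128, p256, p512, runtime_s):
    """HP [MFLOP/s] with weights derived from the vector widths."""
    flops = (
        scalar
        + p128 * elements_per_vector(128, 16)
        + p256 * elements_per_vector(256, 16)
        + p512 * elements_per_vector(512, 16)
    )
    return 1.0e-06 * flops / runtime_s


# Weights for half precision (16-bit) elements.
print([elements_per_vector(w, 16) for w in (128, 256, 512)])

# Hypothetical counts, chosen only for illustration.
rate = hp_mflops(scalar=10, p128=1, p256=1, p512=1, runtime_s=1.0)
print(rate)
```

Deriving the weights instead of hard-coding them makes it easy to cross-check a formula line against the event name it refers to.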