KNL
In contrast to the old Xeon Phi (KNC), the MIC compiler is no longer needed. You can build LIKWID directly on the Xeon Phi (KNL) and use it there.
Intel® Xeon Phi Performance groups
The input file for the events on Intel® Xeon Phi (KNL) can be found here.
- Core-local counters
- Socket-wide counters
- Energy counters
- Uncore global counters
- Last level cache counters
- Power control unit general-purpose counters
- Memory controller fixed-purpose counters
- Memory controller general-purpose counters
- Embedded memory controller fixed-purpose counters
- Embedded memory controller general-purpose counters
- Ring-to-PCIe counters
- IRP box counters
Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.
Counter name | Event name |
---|---|
FIXC0 | INSTR_RETIRED_ANY |
FIXC1 | CPU_CLK_UNHALTED_CORE |
FIXC2 | CPU_CLK_UNHALTED_REF |
Option | Argument | Description | Comment |
---|---|---|---|
anythread | N | Set bit 2+(index*4) in config register | |
kernel | N | Set bit (index*4) in config register |
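The bit positions in the option table above depend on the counter index. The following is a minimal Python sketch of that arithmetic, assuming `index` is the number in the counter name (0 for FIXC0, 1 for FIXC1, 2 for FIXC2). The function name `fixed_option_bits` is hypothetical and not part of LIKWID.

```python
def fixed_option_bits(index, kernel=False, anythread=False):
    """Return the option bits for fixed counter FIXC<index>.

    Bit positions follow the option table:
      kernel    -> bit (index*4)
      anythread -> bit 2+(index*4)
    """
    if index not in (0, 1, 2):
        raise ValueError("only FIXC0, FIXC1 and FIXC2 are listed")
    bits = 0
    if kernel:
        bits |= 1 << (index * 4)
    if anythread:
        bits |= 1 << (2 + index * 4)
    return bits


# FIXC1 with both options: bit 4 and bit 6
print(hex(fixed_option_bits(1, kernel=True, anythread=True)))  # 0x50
```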
The Intel® Xeon Phi (KNL) microarchitecture provides 2 general-purpose counters consisting of a config and a counter register.
Counter name | Event name |
---|---|
PMC0 | * |
PMC1 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
kernel | N | Set bit 17 in config register | |
anythread | N | Set bit 21 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
invert | N | Set bit 23 in config register |
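To make the option table above concrete, here is a small Python sketch that combines the listed options into one bit pattern. It covers only the option bits from the table; the event and umask fields of the config register are left out. The function name `pmc_option_bits` is an assumption for illustration and not a LIKWID API.

```python
def pmc_option_bits(edgedetect=False, kernel=False, anythread=False,
                    invert=False, threshold=0):
    """Combine the PMC counter options into config register bits."""
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold must fit into 8 bits")
    bits = threshold << 24          # threshold: bits 24-31
    if kernel:
        bits |= 1 << 17             # kernel: bit 17
    if edgedetect:
        bits |= 1 << 18             # edgedetect: bit 18
    if anythread:
        bits |= 1 << 21             # anythread: bit 21
    if invert:
        bits |= 1 << 23             # invert: bit 23
    return bits


print(hex(pmc_option_bits(edgedetect=True, threshold=0x2)))  # 0x2040000
```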
The Intel® Xeon Phi (KNL) microarchitecture can measure offcore events with the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers. The Intel® Xeon Phi (KNL) microarchitecture has two of those registers. LIKWID defines some events that perform the filtering according to the event name. Although many bitmasks are possible, LIKWID natively provides only the ones with response type ANY. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS and OFFCORE_RESPONSE_1_OPTIONS events. Only OFFCORE_RESPONSE_0_OPTIONS can be used to measure average latencies. For these two events only, two more counter options are available:
Option | Argument | Description | Comment |
---|---|---|---|
match0 | 16 bit hex value | Input value masked with 0xFFFF and written to bits 0-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SLM. |
match1 | 22 bit hex value | Input value is written to bits 16-38 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SLM. |
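The two options in the table above fill different parts of the OFFCORE_RESPONSE register. A rough Python sketch of how the two inputs could be combined follows. It assumes the second input is masked to the 22 bit width given in the argument column and shifted to bit 16; the parameter names `low_match` and `high_match` and the function name are hypothetical.

```python
def offcore_response_value(low_match, high_match):
    """Combine the two filter inputs into one register value."""
    low = low_match & 0xFFFF        # 16 bit input, masked with 0xFFFF, bits 0-15
    high = high_match & 0x3FFFFF    # assumed 22 bit input, placed from bit 16 upwards
    return low | (high << 16)


# Bits above the 16 bit limit of the first input are dropped by the mask.
print(hex(offcore_response_value(0x1FFFF, 0x1)))  # 0x1ffff
```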
The Intel® Xeon Phi (KNL) microarchitecture provides one register for the current core temperature.
Counter name | Event name |
---|---|
TMP0 | TEMP_CORE |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the current energy consumption through the RAPL interface.
Counter name | Event name |
---|---|
PWR0 | PWR_PKG_ENERGY |
PWR1 | PWR_PP0_ENERGY |
PWR3* | PWR_DRAM_ENERGY |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the management box in the uncore.
The single fixed-purpose counter counts the clock frequency of the clock source of the uncore. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOXFIX | UNCORE_CLOCK |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the management box in the uncore.
The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOX0 | * |
UBOX1 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
anythread | N | Set bit 21 in config register |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements for the last level cache segments.
Counter name | Event name |
---|---|
CBOX<0-37>C0 | * |
CBOX<0-37>C1 | * |
CBOX<0-37>C2 | * |
CBOX<0-37>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
opcode | 9 bit hex value | Set bits 9-28 in PERF_UNIT_CTL_1_CHA_<0-37> register | A list of valid opcodes can be found in the Intel® Xeon® Phi Processor Performance Monitoring Reference Manual. |
state | 10 bit hex value | Set bits 17-26 in PERF_UNIT_CTL_CHA_<0-37> register | H: 0x08, E: 0x04, S: 0x02 All other bits reserved. |
tid | 9 bit hex value | Set bits 0-8 in PERF_UNIT_CTL_CHA_<0-37> register and enables TID filtering with bit 19 in config register | 0-2 ThreadID, 3-8 CoreID |
nid | 2 bit hex value | Set bits 0-1 in PERF_UNIT_CTL_1_CHA_<0-37> register | Remote: 0x1 Local: 0x2 |
match0 | 3 bit hex address | Set bits 29-31 in PERF_UNIT_CTL_1_CHA_<0-37> register | C6Opcode: 0x1 NonCohOpcode: 0x2 IsocOpcode: 0x3 |
match1 | 2 bit hex address | Set bits 4-5 in PERF_UNIT_CTL_1_CHA_<0-37> register | Count near memory cache events: 0x1 Count non-near memory cache events: 0x2 |
The Intel® Xeon Phi (KNL) microarchitecture provides an event LLC_LOOKUP which can be filtered with the 'state' option. If no 'state' is set, LIKWID sets the state to 0xE, the default value to measure all lookups.
If the match1 option is not used, bits 4 and 5 in PERF_UNIT_CTL_1_CHA_<0-37> are set.
If no opcode option is set, bit 3 in PERF_UNIT_CTL_1_CHA_<0-37> is set.
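The default value 0xE for the 'state' option is the combination of the three state bits listed in the option table (H: 0x08, E: 0x04, S: 0x02). The sketch below illustrates this in Python under the assumption that the state value is shifted to bit 17 of the filter register, as the table describes. The names `STATE_BITS` and `state_filter` are made up for this example.

```python
# State encodings from the option table; all other bits are reserved.
STATE_BITS = {"H": 0x08, "E": 0x04, "S": 0x02}
DEFAULT_STATE = 0xE  # used when no 'state' option is given


def state_filter(states=None):
    """Return the filter register bits for a set of state letters."""
    if not states:
        value = DEFAULT_STATE
    else:
        value = 0
        for letter in states:
            value |= STATE_BITS[letter]
    return value << 17              # state field starts at bit 17


# Selecting all three states gives the same result as the default.
print(state_filter("HES") == state_filter())  # True
```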
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the power control unit (PCU) in the uncore.
The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
WBOX0 | * |
WBOX1 | * |
WBOX2 | * |
WBOX3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
match0 | 32 bit hex value | Set bits 0-31 in MSR_UNC_PCU_PMON_BOX_FILTER register | Band0: bits 0-7, Band1: bits 8-15, Band2: bits 16-23, Band3: bits 24-31 |
occupancy | 2 bit hex value | Set bits 14-15 in config register | Cores in C0: 0x1, in C3: 0x2, in C6: 0x3 |
occ_edgedetect | N | Set bit 31 in config register | |
occ_invert | N | Set bit 30 in config register |
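The match0 value for the PCU packs four 8 bit bands into one 32 bit filter value. As an illustration, the following Python sketch builds such a value from the band layout given in the table. The function name `pcu_band_filter` is hypothetical, and the meaning of the individual band values is not covered here.

```python
def pcu_band_filter(band0=0, band1=0, band2=0, band3=0):
    """Pack four 8 bit band values into the 32 bit match0 argument."""
    value = 0
    for position, band in enumerate((band0, band1, band2, band3)):
        if not 0 <= band <= 0xFF:
            raise ValueError("each band must fit into 8 bits")
        value |= band << (8 * position)   # Band<n> occupies bits 8n to 8n+7
    return value


print(hex(pcu_band_filter(0x01, 0x02, 0x03, 0x04)))  # 0x4030201
```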
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The processor implements two Memory Controllers on the processor die. Each memory
controller is capable of controlling three DDR4 memory channels. The MC design is
derived from the EDC (Near-Memory (MCDRAM) controller) and is a sub-set of EDC in
functionality. The main difference from EDC is that the physical interface for MC will be
DDR4 IOs. The processor MC will interface with the rest of the Untile via the mesh
interface (R2Mem -> Ring-to-MC interface). Therefore, the MC agent is broken into
three regions: The front-end ring/mesh interface called the "R2Mem", the core "EDC
controller" logic, and three individual "DDR channel controllers/schedulers."
The performance counters of the integrated Memory Controllers are exposed to the operating system through PCI interfaces. There may be two memory controllers in the system. There are four different PCI devices per memory controller: one for each of the three memory channels and one for the controller itself. Each channel has one fixed counter for the DRAM clock. The three channels of the first memory controller are MBOX0-2, the memory controller itself is MBOX3. The three channels of the second memory controller (if available) are named MBOX4-6 and the corresponding controller MBOX7. The name MBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
MBOX<0-7>FIX | DRAM_CLOCKTICKS |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The processor implements two Memory Controllers on the processor die. Each memory
controller is capable of controlling three DDR4 memory channels. The MC design is
derived from the EDC (Near-Memory (MCDRAM) controller) and is a sub-set of EDC in
functionality. The main difference from EDC is that the physical interface for MC will be
DDR4 IOs. The processor MC will interface with the rest of the Untile via the mesh
interface (R2Mem -> Ring-to-MC interface). Therefore, the MC agent is broken into
three regions: The front-end ring/mesh interface called the "R2Mem", the core "EDC
controller" logic, and three individual "DDR channel controllers/schedulers."
The performance counters of the integrated Memory Controllers are exposed to the operating system through PCI interfaces. There may be two memory controllers in the system. There are four different PCI devices per memory controller: one for each of the three memory channels and one for the controller itself. Each device has four different general-purpose counters. The three channels of the first memory controller are MBOX0-2, the memory controller itself is MBOX3. The three channels of the second memory controller (if available) are named MBOX4-6 and the corresponding controller MBOX7. The name MBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
MBOX<0-2,4-6>C0 | MC_DCLK, MC_CAS* |
MBOX<0-2,4-6>C1 | MC_DCLK, MC_CAS* |
MBOX<0-2,4-6>C2 | MC_DCLK, MC_CAS* |
MBOX<0-2,4-6>C3 | MC_DCLK, MC_CAS* |
MBOX<3,7>C0 | MC_UCLK |
MBOX<3,7>C1 | MC_UCLK |
MBOX<3,7>C2 | MC_UCLK |
MBOX<3,7>C3 | MC_UCLK |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register |
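The MBOX numbering described above follows a simple pattern: each memory controller occupies four consecutive MBOX units, three for its channels and the last one for the controller itself. A minimal Python sketch of that mapping, with hypothetical helper names that are not part of LIKWID:

```python
def mbox_for_channel(controller, channel):
    """Return the MBOX unit name for a DDR channel of a memory controller."""
    if controller not in (0, 1) or channel not in (0, 1, 2):
        raise ValueError("expected controller 0-1 and channel 0-2")
    return f"MBOX{controller * 4 + channel}"


def mbox_for_controller(controller):
    """Return the MBOX unit name for the memory controller itself."""
    if controller not in (0, 1):
        raise ValueError("expected controller 0-1")
    return f"MBOX{controller * 4 + 3}"


print(mbox_for_channel(1, 0))    # MBOX4
print(mbox_for_controller(1))    # MBOX7
```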
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the Embedded DRAM Controllers (EDC) in the uncore, the interface to the MCDRAM. The description from Intel®:
The EDC is the high bandwidth near-memory controller for the processor. EDC refers
to "Embedded DRAM Controller" (i.e. DRAM that is embedded in the processor
package). The technology that is used to implement the embedded DRAM for the
processor is MCDRAM (Multi-Chip (Stacked) DRAM). Eight channels of MCDRAM are
supported by 8 MCDRAM Controllers (EDC). The EDC's are connected to the other
components (clusters) within the processor by the internal mesh interconnect fabric.
The Embedded DRAM Controllers (EDC) performance counters are exposed to the operating system through PCI interfaces. There are eight embedded memory controllers in the system. There are two different PCI devices per memory controller, one for the mesh side (EUBOXFIX) and one for the DRAM side (EDBOXFIX).
NOTE: For the EUBOX counters, it is recommended to use the perf_event backend because there are problems when accessing the corresponding PCI devices from user space. If you run the KNL in cache mode and want to measure the MCDRAM bandwidth, you should use perf_event.
Counter name | Event name |
---|---|
EUBOX<0-7>FIX | EDC_CLOCKTICKS |
EDBOX<0-7>FIX | MCDRAM_CLOCKTICKS |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the Embedded DRAM Controllers (EDC) in the uncore, the interface to the MCDRAM. The description from Intel®:
The EDC is the high bandwidth near-memory controller for the processor. EDC refers
to "Embedded DRAM Controller" (i.e. DRAM that is embedded in the processor
package). The technology that is used to implement the embedded DRAM for the
processor is MCDRAM (Multi-Chip (Stacked) DRAM). Eight channels of MCDRAM are
supported by 8 MCDRAM Controllers (EDC). The EDC's are connected to the other
components (clusters) within the processor by the internal mesh interconnect fabric.
The Embedded DRAM Controllers (EDC) performance counters are exposed to the operating system through PCI interfaces. There are eight embedded memory controllers in the system. There are two different PCI devices per memory controller, one for the mesh side (EUBOXC) and one for the DRAM side (EDBOXC). Each device has four different general-purpose counters.
Counter name | Event name |
---|---|
EUBOX<0-7>C0 | EDC_UCLK, EDC_HIT_*, EDC_MISS_* |
EUBOX<0-7>C1 | EDC_UCLK, EDC_HIT_*, EDC_MISS_* |
EUBOX<0-7>C2 | EDC_UCLK, EDC_HIT_*, EDC_MISS_* |
EUBOX<0-7>C3 | EDC_UCLK, EDC_HIT_*, EDC_MISS_* |
EDBOX<0-7>C0 | EDC_ECLK, EDC_WPQ_INSERTS, EDC_RPQ_INSERTS |
EDBOX<0-7>C1 | EDC_ECLK, EDC_WPQ_INSERTS, EDC_RPQ_INSERTS |
EDBOX<0-7>C2 | EDC_ECLK, EDC_WPQ_INSERTS, EDC_RPQ_INSERTS |
EDBOX<0-7>C3 | EDC_ECLK, EDC_WPQ_INSERTS, EDC_RPQ_INSERTS |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register |
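Measuring all eight EDCs requires one event per device. LIKWID event sets are written as comma-separated `EVENT:COUNTER` pairs, so such a list can be generated instead of typed by hand. The Python sketch below does this for the counter names in the table above. The helper name `edc_event_set` is an assumption for illustration; check the resulting string against the events available on your system before use.

```python
def edc_event_set(event, box_prefix, counter=0, boxes=range(8)):
    """Build an event set string with one EVENT:COUNTER pair per EDC device.

    box_prefix is "EUBOX" (mesh side) or "EDBOX" (DRAM side).
    """
    if box_prefix not in ("EUBOX", "EDBOX"):
        raise ValueError("box_prefix must be EUBOX or EDBOX")
    if counter not in (0, 1, 2, 3):
        raise ValueError("each device has counters C0 to C3")
    return ",".join(f"{event}:{box_prefix}{i}C{counter}" for i in boxes)


# One EDC_UCLK event on counter C0 of every mesh-side device
print(edc_event_set("EDC_UCLK", "EUBOX"))
```

The resulting string has the form expected by the event set argument of `likwid-perfctr`.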
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the Ring-to-PCIe (R2PCIe) interface in the uncore. The description from Intel®:
The M2PCI is the logic which interfaces the IIO modules to the mesh and includes the mesh stop.
The Ring-to-PCIe performance counters are exposed to the operating system through a PCI interface. Independent of the system's configuration, there is only one Ring-to-PCIe interface per CPU socket.
Counter name | Event name |
---|---|
PBOX0 | * |
PBOX1 | * |
PBOX2 | * |
PBOX3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register |
The Intel® Xeon Phi (KNL) microarchitecture provides measurements of the IRP box in the uncore. The description from Intel®:
IRP is responsible for maintaining coherency for IIO traffic that needs to be coherent (e.g. cross-socket P2P).
The IRP box counters are exposed to the operating system through the PCI interface. The IBOX was introduced with the Intel® IvyBridge EP/EN/EX microarchitecture.
Counter name | Event name |
---|---|
IBOX0 | * |
IBOX1 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register |