GitHub - bear-metal/numap

Overview

numap is a Linux library for memory profiling based on the hardware performance monitoring unit (PMU). Its main objective is to provide high-level abstractions for:

Sampling of core load requests
Sampling of core store requests

Supported processors

Intel processors, with their family_model information (family and model in decimal, as reported by /proc/cpuinfo)

Xeon_X_5570 (06_26)
Xeon_E_7450 (06_29)
I7_870 (06_30)
WESTMERE_EP (06_44)
Xeon_E5_2670 (06_45)
I5_2520 (06_42)
I7_3770 (06_58)
I5_4670 (06_60)
I7_5960X (06_63)
I7_4600U (06_69)

AMD processors

Not yet supported

Folder Organization

examples: contains examples showing how to use numap, including a live memory-bandwidth reporting tool.
include: contains the numap headers
src: contains the numap implementation files
CMakeLists.txt: the CMake build script for both the library and the examples

Dependencies

libpfm4
libnuma

Howto: extend numap to support your processor model

Intro

The goal is to fill in numap's struct archi, which looks like this:

struct archi your_machine_code_name = { .id = /* FAMILY */ | /* MODEL */ << 8,
			 .name = "Your description of the machine",
			 .sampling_read_event = " .... ",
			 .sampling_write_event = " .... ",
			 .counting_read_event = " .... ",
			 .counting_write_event = " .... "
};
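To make the shape of the template concrete, here is a self-contained sketch that compiles on its own. The field types, the NOT_SUPPORTED definition and the ARCHI_ID macro are assumptions for illustration only; numap's headers hold the real declarations. The id packing follows the entry shown later in this README (.id = 0x06 | 0x2D << 8): family in the low byte, model shifted left by 8 bits.

```c
#include <stddef.h>

/* Hypothetical stand-in: the real NOT_SUPPORTED lives in numap's headers. */
#define NOT_SUPPORTED NULL

/* Pack family (low byte) and model (next byte), matching the
 * "family | model << 8" pattern used in this README. */
#define ARCHI_ID(family, model) \
	((unsigned int)(family) | ((unsigned int)(model) << 8))

/* Assumed field types; check numap's include/ directory for the real ones. */
struct archi {
	unsigned int id;
	const char *name;
	const char *sampling_read_event;
	const char *sampling_write_event;
	const char *counting_read_event;   /* deprecated */
	const char *counting_write_event;  /* deprecated */
};

/* Example entry with placeholder event strings. */
static const struct archi my_machine = {
	.id = ARCHI_ID(0x06, 0x2D),
	.name = "Your description of the machine",
	.sampling_read_event = " .... ",
	.sampling_write_event = " .... ",
	.counting_read_event = NOT_SUPPORTED,
	.counting_write_event = NOT_SUPPORTED,
};
```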

More precisely, we need to define the correct values for the .sampling_read_event and .sampling_write_event fields.

Once this is done, add a pointer to your_machine_code_name (&your_machine_code_name) to the list:

static struct archi *supported_archs[NB_SUPPORTED_ARCHS] = {
  ...
};

and increment NB_SUPPORTED_ARCHS.
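The registration step can be pictured with the following sketch. It is not numap's code: the trimmed struct, the NEHALEM_EP entry and the find_archi lookup are hypothetical, chosen to show how an id built as family | model << 8 could be matched against the table.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down struct archi: only the fields used here. */
struct archi {
	unsigned int id;   /* family | model << 8 */
	const char *name;
};

/* Hypothetical entries; family 6, models 0x1A (26) and 0x2D (45). */
static struct archi NEHALEM_EP      = { .id = 0x06 | 0x1A << 8, .name = "Xeon_X_5570" };
static struct archi SANDY_BRIDGE_EP = { .id = 0x06 | 0x2D << 8, .name = "Xeon_E5_2630" };

/* Adding a machine: append its address and bump the count. */
#define NB_SUPPORTED_ARCHS 2
static struct archi *supported_archs[NB_SUPPORTED_ARCHS] = {
	&NEHALEM_EP,
	&SANDY_BRIDGE_EP,
};

/* Hypothetical lookup: return the entry matching family/model, or NULL. */
static struct archi *find_archi(unsigned int family, unsigned int model)
{
	unsigned int id = family | model << 8;

	for (size_t i = 0; i < NB_SUPPORTED_ARCHS; i++) {
		/* Skip empty slots left by a count larger than the initializer list. */
		if (supported_archs[i] != NULL && supported_archs[i]->id == id)
			return supported_archs[i];
	}
	return NULL;
}
```

The NULL check in the loop matters: if NB_SUPPORTED_ARCHS is incremented without adding an initializer, C fills the remaining slots with null pointers, and a lookup that dereferences every slot would crash.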

Getting the correct info

On the target machine, run:

less /proc/cpuinfo

This file contains info in the following form:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
stepping        : 7
microcode       : 0x710
cpu MHz         : 1339.121
cache size      : 15360 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4599.76
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
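Rather than reading the file by eye, the two values can also be extracted programmatically. The sketch below uses a hypothetical parse_cpuinfo helper that scans text in the format shown above; on a real machine the text would be read from /proc/cpuinfo. The main pitfall is not confusing the "model" line with "model name".

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/cpuinfo format for the first "cpu family" and
 * "model" values. Returns 0 on success, -1 if either is missing.
 * "model name" is rejected because the format requires ':' right
 * after the (whitespace-padded) key "model". */
static int parse_cpuinfo(const char *text, int *family, int *model)
{
	int have_family = 0, have_model = 0;
	const char *line = text;

	while (line != NULL && *line != '\0' && !(have_family && have_model)) {
		int value;

		if (!have_family && sscanf(line, "cpu family : %d", &value) == 1) {
			*family = value;
			have_family = 1;
		} else if (!have_model && sscanf(line, "model : %d", &value) == 1) {
			*model = value;
			have_model = 1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;  /* move to the start of the next line */
	}
	return (have_family && have_model) ? 0 : -1;
}
```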

Among this information, the relevant lines are "cpu family" and "model". Convert both decimal numbers to two-digit hexadecimal and join them in the form FAMILY_MODEL.

In our case (family 6, model 45), we get

06_2D

In Intel's documentation, this is written as 06_2DH (the H suffix stands for hexadecimal).
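The conversion fits in a single snprintf call; format_family_model is a hypothetical helper name used for illustration. Appending an H to its output gives the form used in Intel's tables.

```c
#include <stdio.h>

/* Format family/model (decimal, as read from /proc/cpuinfo) into the
 * FAMILY_MODEL notation: two uppercase hex digits each, e.g. "06_2D".
 * Returns the number of characters written (excluding the NUL),
 * or a negative value on error. */
static int format_family_model(char *buf, size_t len, int family, int model)
{
	return snprintf(buf, len, "%02X_%02X",
			(unsigned int)family, (unsigned int)model);
}
```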

Now open the "Intel® 64 and IA-32 Architectures Software Developer's Manual" and search for the FAMILY_MODEL string (in our example, 06_2D). Among other places, this leads to a section of chapter 19, "Performance Monitoring Events" (chapter and section numbers differ between revisions of the manual). In our case, 06_2DH is described in section 19.6, PERFORMANCE MONITORING EVENTS FOR 2ND GENERATION INTEL® CORE™ I7-2XXX, INTEL® CORE™ I5-2XXX, INTEL® CORE™ I3-2XXX PROCESSOR SERIES.

In the table provided in this section, find the rows corresponding to the required information. In this example, we fill in the values for sampling_read_event and sampling_write_event, and leave out those for counting_read_event and counting_write_event.

.sampling_read_event

For the sampling of memory reads, look for an entry like:

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Randomly sampled loads whose latency is above a user-defined threshold. A small fraction of the overall loads are sampled due to randomization. PMC3 only. | Specify threshold in MSR 3F6H. |

.sampling_write_event

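For the sampling of memory writes, look for:

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| CDH | 02H | MEM_TRANS_RETIRED.PRECISE_STORE | Sample stores and collect precise store operation via PEBS record. PMC3 only. | See Section 18.9.4.3. |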
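On Intel processors, the Event Num. and Umask Value columns correspond to the low 16 bits of an IA32_PERFEVTSELx register (event select in bits 0-7, unit mask in bits 8-15), the same layout Linux perf uses for raw events. The hypothetical helper below combines them, which can help when cross-checking the two table entries above outside numap. The load-latency threshold lives in MSR 3F6H and is not part of this value.

```c
#include <stdint.h>

/* Combine an SDM "Event Num." and "Umask Value" into the low 16 bits of
 * an IA32_PERFEVTSELx value: event select in bits 0-7, umask in bits 8-15.
 * Control bits (USR, OS, EN, ...) and the MSR 3F6H threshold are not set. */
static uint64_t raw_event_config(uint8_t event_num, uint8_t umask)
{
	return (uint64_t)event_num | ((uint64_t)umask << 8);
}
```

With the entries above, raw_event_config(0xCD, 0x01) yields 0x01CD for the load-latency event and raw_event_config(0xCD, 0x02) yields 0x02CD for precise stores.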

Filling up numap's struct archi for your machine

On some architectures, the event names given in Intel's general documentation do not match the names libpfm expects. To get the correct name for the sampling_read_event, you can use libpfm (a version newer than 4.0.0). Once built, libpfm provides a binary (libpfm/examples/showevtinfo) that prints the list of available events.

For our example architecture, we find that the latency-threshold unit mask is called LATENCY_ABOVE_THRESHOLD instead of LOAD_LATENCY. So be it!

The numap struct we need to define thus looks like:

struct archi SANDY_BRIDGE_EP = { .id = 0x06 | 0x2D << 8, // 06_45 (06_2DH)
			 .name = "Xeon_E5_2630 based on Sandy Bridge micro-arch - Romley EP variant",
			 .sampling_read_event = "MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:ldlat=3",
			 .sampling_write_event = "MEM_TRANS_RETIRED:PRECISE_STORE",
			 .counting_read_event = NOT_SUPPORTED,
			 .counting_write_event = NOT_SUPPORTED
};

Note: the counting_read_event and counting_write_event fields are deprecated and can therefore be set to NOT_SUPPORTED.
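libpfm event strings chain the event name, unit masks and modifiers with colons, as in MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:ldlat=3 above. The hypothetical split_event_string helper below breaks such a string into its parts, which makes it easier to compare each part against the showevtinfo listing. It performs no validation; only libpfm can tell whether a string is accepted on a given machine.

```c
#include <stddef.h>
#include <string.h>

/* Split a libpfm-style event string "EVENT:UMASK:modifier=value" in place.
 * Stores up to max_parts field pointers in parts[] and returns the count.
 * The buffer is modified (each ':' becomes '\0'); fields beyond max_parts
 * are dropped. The optional "pmu::" prefix is not handled (it would
 * produce an empty field). */
static size_t split_event_string(char *str, char *parts[], size_t max_parts)
{
	size_t n = 0;
	char *p = str;

	while (p != NULL && n < max_parts) {
		parts[n++] = p;
		p = strchr(p, ':');
		if (p != NULL)
			*p++ = '\0';  /* terminate this field, continue after ':' */
	}
	return n;
}
```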

Testing

When this is done, go to numap's root directory and type:

$ cmake .
$ make

Then try the example binary in examples:

$ examples/example

This program should output something like:

root@taurus-8 ~/numap:-)examples/example

Starting memory read sampling
Memory read sampling results

head = 192200 compared to max = 266240
Thread 0: 4805     samples
Thread 0: 4805     local cache 1                  100.000%
Thread 0: 0        local cache 2                  0.000%
Thread 0: 0        local cache 3                  0.000%
Thread 0: 0        local cache LFB                0.000%
Thread 0: 0        local memory                   0.000%
Thread 0: 0        remote cache or local memory   0.000%
Thread 0: 0        remote memory                  0.000%
Thread 0: 0        unknown l3 miss                0.000%

head = 193240 compared to max = 266240
Thread 1: 4831     samples
Thread 1: 4831     local cache 1                  100.000%
Thread 1: 0        local cache 2                  0.000%
Thread 1: 0        local cache 3                  0.000%
Thread 1: 0        local cache LFB                0.000%
Thread 1: 0        local memory                   0.000%
Thread 1: 0        remote cache or local memory   0.000%
Thread 1: 0        remote memory                  0.000%
Thread 1: 0        unknown l3 miss                0.000%

Starting memory write sampling
Memory write sampling results

head = 262112 compared to max = 266240
Thread 0: 6452     samples
Thread 0: 6442     local cache 1                  99.845%
Thread 0: 0        local cache 2                  0.000%
Thread 0: 0        local cache 3                  0.000%
Thread 0: 0        local cache LFB                0.000%
Thread 0: 0        local memory                   0.000%
Thread 0: 0        remote cache or local memory   0.000%
Thread 0: 0        remote memory                  0.000%
Thread 0: 0        unknown l3 miss                0.000%

head = 262136 compared to max = 266240
Thread 1: 6451     samples
Thread 1: 6436     local cache 1                  99.767%
Thread 1: 0        local cache 2                  0.000%
Thread 1: 0        local cache 3                  0.000%
Thread 1: 0        local cache LFB                0.000%
Thread 1: 0        local memory                   0.000%
Thread 1: 0        remote cache or local memory   0.000%
Thread 1: 0        remote memory                  0.000%
Thread 1: 0        unknown l3 miss                0.000%
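Each percentage is the row's count divided by the thread's total number of samples. The hypothetical sample_share helper below reproduces that column, for instance to post-process numbers from a run:

```c
/* Share of a thread's samples attributed to one row, as a percentage,
 * mirroring the right-hand column of the example output.
 * Returns 0 when there are no samples, to avoid dividing by zero. */
static double sample_share(unsigned long row_samples, unsigned long total_samples)
{
	if (total_samples == 0)
		return 0.0;
	return 100.0 * (double)row_samples / (double)total_samples;
}
```

For example, printf("%.3f%%\n", sample_share(6442, 6452)) prints 99.845%, matching thread 0's "local cache 1" line in the write-sampling results.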

Congrats! Numap is set up for your machine.

Don't forget to push your modifications to GitHub, of course :)
