[question] Use cycles instead of ref-cycles #1470

Rouzip · 2023-07-11T12:20:57Z

What happened:
Use ref-cycles as CPI factor.
What you expected to happen:
Use cycles as CPI factor.
Environment:

Koordinator version: v0.6.2
Kubernetes version (use kubectl version): v1.22.5
docker/containerd version: containerd 1.5.0
OS (e.g: cat /etc/os-release): Ubuntu 20.04.4 LTS
Kernel (e.g. uname -a): Linux 5.10.112-11.al8.x86_64 #1 SMP Tue May 24 16:05:50 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

Anything else we need to know:
CPI is typically measured in cycles rather than ref-cycles as a performance evaluation metric.
References:

saintube · 2023-07-12T01:48:26Z

/area koordlet
/cc @songtao98

songtao98 · 2023-07-18T09:09:23Z

@Rouzip Sorry for the late reply.

When the CPI collector was implemented in koordlet, the reference was CPI2: CPU performance isolation for shared compute clusters. Section 3.1 states that CPI data is derived from hardware counters, and is defined as the value of the CPU CLK UNHALTED.REF counter divided by the INSTRUCTIONS RETIRED counter.

Besides, personally speaking, CPU CLK UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.

/cc @saintube @zwzhang0107

Rouzip · 2023-07-19T08:48:59Z

> @Rouzip Sorry for the late reply.
>
> When the CPI collector was implemented in koordlet, the reference was CPI2: CPU performance isolation for shared compute clusters. Section 3.1 states that CPI data is derived from hardware counters, and is defined as the value of the CPU CLK UNHALTED.REF counter divided by the INSTRUCTIONS RETIRED counter.
>
> Besides, personally speaking, CPU CLK UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.
>
> /cc @saintube @zwzhang0107

But the number of instructions retired in a fixed amount of time doesn't reflect the application's performance. Under the influence of technologies such as Intel SST, the existing CPI algorithm will produce different results at different frequencies on the same machine, yet those differences do not reflect a corresponding change in the application's performance.
@songtao98 @saintube

hormes · 2023-07-19T12:49:17Z

> But the number of instructions retired in a fixed amount of time doesn't reflect the application's performance. Under the influence of technologies such as Intel SST, the existing CPI algorithm will produce different results at different frequencies on the same machine, yet those differences do not reflect a corresponding change in the application's performance. @songtao98 @saintube

Can you explain the difference between these two counters in detail? How does the CPU calculate each of them internally when the frequency changes?

Rouzip · 2023-07-19T13:18:50Z

>> But the number of instructions retired in a fixed amount of time doesn't reflect the application's performance. Under the influence of technologies such as Intel SST, the existing CPI algorithm will produce different results at different frequencies on the same machine, yet those differences do not reflect a corresponding change in the application's performance. @songtao98 @saintube
>
> Can you explain the difference between these two counters in detail? How does the CPU calculate each of them internally when the frequency changes?

As mentioned by @songtao98, ref-cycles do not vary with CPU frequency and can be considered constant over a fixed period of time, whereas cycles do vary with CPU frequency. If the CPU frequency changes while the program itself stays the same, the original formula yields a different CPI, so it fails to reflect that the program has not changed. If I have made any mistakes, please point them out. Thank you.

zwzhang0107 · 2023-07-20T03:03:51Z

@Rouzip Actually, there are two types of workloads; let's call them Web Service and Batch Job. They behave differently in how much load they generate within a fixed time period.

A Web Service has a stable QPS, which means an (almost) fixed number of instructions to execute in 10s.
A Batch Job is CPU-hungry, which means it has an unlimited number of instructions to execute as long as there are enough CPU cycles in 10s.

If we limit the scope to CPU frequency only:
For the Web Service, both cycles (or rather ref-cycles, if I am misunderstanding?) and the instruction count stay the same.
For the Batch Job, ref-cycles stay the same, but the instruction count grows if the CPU runs at a higher frequency.

So, maybe:
If we want to measure whether the Web Service is suffering interference, we should use cycles/instructions as the metric.
If we want to measure the performance of the Batch Job, we should use MIPS as the metric.
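To make the two cases concrete, here is a rough Go model of both workloads at two core frequencies. Every constant in it is a hypothetical assumption (a 10 s window, a 2.0 GHz reference frequency, an intrinsic 1.0 core cycles per instruction that does not change with frequency, 5e9 instructions per window for the Web Service), and it ignores memory stalls entirely. It computes cycles-based CPI, ref-cycles-based CPI, and MIPS for each case.

```go
package main

import "fmt"

// All constants below are hypothetical and only serve to illustrate the trend.
const (
	windowSec = 10.0  // collection window, matching the 10s in the comment above
	baseHz    = 2.0e9 // assumed reference (base) frequency that ref-cycles tick at
	coreCPI   = 1.0   // assumed core cycles per instruction, independent of frequency
)

// metrics holds the three candidate indicators discussed in this thread.
type metrics struct {
	CycleCPI float64 // cycles / instructions
	RefCPI   float64 // ref-cycles / instructions
	MIPS     float64 // millions of instructions retired per second
}

func compute(cycles, refCycles, instructions float64) metrics {
	return metrics{
		CycleCPI: cycles / instructions,
		RefCPI:   refCycles / instructions,
		MIPS:     instructions / windowSec / 1e6,
	}
}

// webService retires a fixed number of instructions per window (stable QPS)
// and idles (halts) for the rest of it; ref-cycles only count while busy.
func webService(coreHz, instructions float64) metrics {
	cycles := instructions * coreCPI
	busySec := cycles / coreHz
	return compute(cycles, busySec*baseHz, instructions)
}

// batchJob always has work, so it keeps the core busy for the whole window.
func batchJob(coreHz float64) metrics {
	cycles := windowSec * coreHz
	return compute(cycles, windowSec*baseHz, cycles/coreCPI)
}

func main() {
	for _, f := range []float64{2.0e9, 3.0e9} {
		fmt.Printf("%.1f GHz   web: %+v\n", f/1e9, webService(f, 5e9))
		fmt.Printf("%.1f GHz batch: %+v\n", f/1e9, batchJob(f))
	}
}
```

In this model the Web Service keeps the same cycles-based CPI and MIPS at both frequencies while its ref-cycles CPI drops, and the Batch Job's MIPS grows with frequency, which matches the behavior described above.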

zwzhang0107 · 2023-07-20T04:03:28Z

Besides, can we use the ref-cycles counter to calculate normalized CPU utilization?

Rouzip · 2023-07-21T07:28:17Z

Sorry for the late response. After conducting some experiments, I found that both cycles and ref-cycles can reveal changes in program performance, so real-world experiments may be necessary to determine which metric produces more accurate results.
However, there is a flaw in your code: when calculating CPI with perf_event_open, ref-cycles and instructions must be counted in the same perf group; otherwise, PMU time multiplexing will produce incorrect results.
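The multiplexing problem can be illustrated with a toy simulation. It assumes two execution phases with different CPI and applies the scaling perf uses for multiplexed events (raw count × time enabled / time running). The phase numbers and helper names are hypothetical, and no real perf_event_open calls are made; the point is only what happens to the ratio of two scaled counts.

```go
package main

import "fmt"

// phase is a stretch of execution with constant counter rates (hypothetical).
type phase struct {
	durSec        float64
	refCyclesRate float64 // ref-cycles per second
	instrRate     float64 // retired instructions per second
}

// reading mimics the values perf reports for one event: the raw count, the
// time the event was enabled, and the time it actually occupied a PMU counter.
type reading struct {
	raw, enabled, running float64
}

// scaled applies perf's multiplexing estimate: raw * enabled / running.
func (r reading) scaled() float64 {
	if r.running == 0 {
		return 0
	}
	return r.raw * r.enabled / r.running
}

// measure accumulates one event (chosen by rate) and counts it only during the
// phases listed in onPMU, imitating the kernel rotating events on the PMU.
func measure(phases []phase, rate func(phase) float64, onPMU map[int]bool) reading {
	var r reading
	for i, p := range phases {
		r.enabled += p.durSec
		if onPMU[i] {
			r.running += p.durSec
			r.raw += rate(p) * p.durSec
		}
	}
	return r
}

func main() {
	phases := []phase{
		{durSec: 1, refCyclesRate: 2e9, instrRate: 4e9},   // CPI 0.5
		{durSec: 1, refCyclesRate: 1e9, instrRate: 0.5e9}, // CPI 2.0
	}
	ref := func(p phase) float64 { return p.refCyclesRate }
	ins := func(p phase) float64 { return p.instrRate }
	both := map[int]bool{0: true, 1: true}
	only0 := map[int]bool{0: true}
	only1 := map[int]bool{1: true}

	// Ground truth: 3e9 ref-cycles / 4.5e9 instructions, about 0.67.
	truth := measure(phases, ref, both).scaled() / measure(phases, ins, both).scaled()
	// Ungrouped worst case: each event happens to be on the PMU in a different phase.
	ungrouped := measure(phases, ref, only0).scaled() / measure(phases, ins, only1).scaled()
	// Grouped: both events are always scheduled together (here, only in phase 0).
	grouped := measure(phases, ref, only0).scaled() / measure(phases, ins, only0).scaled()

	fmt.Printf("truth=%.2f ungrouped=%.2f grouped=%.2f\n", truth, ungrouped, grouped)
}
```

When the two events are counted during different phases, the scaled ratio (4.0 here) matches neither phase's real CPI. Because a perf group is scheduled onto the PMU as a unit, grouped events always share the same running interval, so their ratio at least equals the CPI of the interval that was actually observed (0.5 here).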

Rouzip · 2023-07-21T07:28:35Z

> Besides, can we use the ref-cycles counter to calculate normalized CPU utilization?

I think ref-cycles is better.

hormes · 2023-07-24T09:00:02Z

> As mentioned by @songtao98, ref-cycles do not vary with CPU frequency and can be considered constant over a fixed period of time, whereas cycles do vary with CPU frequency. If the CPU frequency changes while the program itself stays the same, the original formula yields a different CPI, so it fails to reflect that the program has not changed. If I have made any mistakes, please point them out. Thank you.

This is a very interesting question worth discussing. Assume a scenario where Pod A executes instructions that consume memory bandwidth (for example, occupying 50% of the memory bandwidth), which reduces Pod B's memory access efficiency and degrades Pod B's latency by 10%. If the CPU where Pod B runs raises its operating frequency by 15% due to the turbo mechanism, then, considering the combined impact of frequency and memory access, Pod B's performance stays the same.

In this scenario, we observe Pod B. Cycles have nothing to do with frequency, so we will see a significant increase (due to the impact of Pod A). Whether ref-cycles go up or down is unclear, because they are affected by the combined impact of memory access efficiency and the frequency increase?

hormes · 2023-07-25T07:09:27Z

/reopen

koordinator-bot · 2023-07-25T07:09:29Z

@hormes: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hormes · 2023-08-01T02:19:15Z

From the experimental point of view, ref-cycles are not affected by frequency, and cycles are affected by frequency. The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response. It seems that ref-cycles is more suitable here. @Rouzip

hormes · 2023-08-01T02:19:53Z

/reopen

koordinator-bot · 2023-08-01T02:19:55Z

@hormes: Reopened this issue.


Rouzip · 2023-08-01T06:45:19Z

> From the experimental point of view, ref-cycles are not affected by frequency, and cycles are affected by frequency. The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response. It seems that ref-cycles is more suitable here. @Rouzip

Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.

> @Rouzip Sorry for the late reply.
>
> When the CPI collector was implemented in koordlet, the reference was CPI2: CPU performance isolation for shared compute clusters. Section 3.1 states that CPI data is derived from hardware counters, and is defined as the value of the CPU CLK UNHALTED.REF counter divided by the INSTRUCTIONS RETIRED counter.
>
> Besides, personally speaking, CPU CLK UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.
>
> /cc @saintube @zwzhang0107

In this paper, they use CPI as a symptom, and they reach this conclusion through statistical methods. So if we want to carefully analyze the underlying causes reflected by CPI changes (whether based on cycles or ref-cycles), more research is needed.

Rouzip · 2023-08-01T06:57:59Z

> Besides, can we use the ref-cycles counter to calculate normalized CPU utilization?

Sorry for my earlier wrong answer: we should use ref-cycles to calculate CPU utilization (as in EMON).
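A minimal sketch of normalized utilization from ref-cycles, assuming the counter advances at a constant reference rate only while a logical CPU is unhalted (the behavior the CPU CLK UNHALTED.REF name describes), so the largest possible count over a window is window × reference frequency × number of CPUs. The function name and all numbers are illustrative assumptions, not koordlet code.

```go
package main

import "fmt"

// NormalizedUtilization (hypothetical helper) estimates the average busy
// fraction of numCPUs logical CPUs over windowSec seconds. It assumes the
// ref-cycles counter ticks at a constant refHz only while a CPU is unhalted,
// so the largest possible count is windowSec * refHz * numCPUs.
func NormalizedUtilization(refCycles, windowSec, refHz float64, numCPUs int) float64 {
	capacity := windowSec * refHz * float64(numCPUs)
	if capacity <= 0 {
		return 0 // no meaningful capacity; avoid dividing by zero
	}
	return refCycles / capacity
}

func main() {
	const refHz = 2.0e9 // hypothetical reference frequency
	// 4 CPUs over 1 s with 2e9 unhalted ref-cycles in total: one CPU's worth.
	fmt.Println(NormalizedUtilization(2e9, 1.0, refHz, 4))
	// Feeding core cycles instead: one CPU fully busy at a 3.0 GHz turbo
	// frequency would appear to be more than 100% utilized.
	fmt.Println(NormalizedUtilization(3e9, 1.0, refHz, 1))
}
```

The second call shows why core cycles are awkward for this purpose: their rate follows the actual frequency, so the same busy time reads differently under turbo, while ref-cycles keep the denominator fixed.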

songtao98 · 2023-08-01T07:06:07Z

Thanks for the great job! @hormes

> Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.

So the conclusion is that your PR #1489 can fix this problem by resolving PMU multiplexing.

> Sorry for my earlier wrong answer: we should use ref-cycles to calculate CPU utilization (as in EMON).

And should #1482 be abandoned, since it uses cycles?

> In this paper, they use CPI as a symptom, and they reach this conclusion through statistical methods. So if we want to carefully analyze the underlying causes reflected by CPI changes (whether based on cycles or ref-cycles), more research is needed.

We will do more work on this to analyze it better.

@hormes @Rouzip

hormes · 2023-08-01T08:29:04Z

> The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response.

I made a mistake here. In fact, the relationship between the two is accurately described by this equation:

$\frac{\text{cycles}}{\text{freq}} = \text{time} = \frac{\text{ref-cycles}}{\text{base freq}}$

Cycles are positively correlated with frequency, and ultimately CPI is positively correlated with frequency, not anticorrelated.

If the frequency stays unchanged, the direct effect of interference on a program is that it slows down, i.e., it runs for a longer time, and both cycles and ref-cycles capture this. Therefore, the main concern in this discussion is the case where the frequency changes.

Consider an online process serving the same QPS. On a machine running at 2.0 GHz its cycle count is X; on the same CPU model running at 3.0 GHz its cycle count is still X. So if CPI is calculated from cycles, the CPI is the same in both cases, yet the latency the service sees on the two nodes is clearly different.
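Plugging this example into the cycles/freq = time = ref-cycles/base freq relation gives a quick check. Writing $f$ for freq, $f_{\text{base}}$ for base freq, and assuming (as the example does) that both nodes retire the same $N$ instructions using the same $X$ cycles:

$$
\begin{aligned}
\text{CPI}_{\text{cycles}} &= \frac{X}{N} \quad \text{on both the 2.0 GHz and the 3.0 GHz node},\\
\text{ref-cycles} &= \text{cycles}\cdot\frac{f_{\text{base}}}{f}
\;\Rightarrow\;
\text{CPI}_{\text{ref}}(f) = \frac{X}{N}\cdot\frac{f_{\text{base}}}{f},\\
\frac{\text{CPI}_{\text{ref}}(2.0\,\text{GHz})}{\text{CPI}_{\text{ref}}(3.0\,\text{GHz})} &= \frac{3.0}{2.0} = 1.5 = \frac{\text{time}(2.0\,\text{GHz})}{\text{time}(3.0\,\text{GHz})}.
\end{aligned}
$$

Under these assumptions the ref-cycles CPI tracks the 1.5× latency difference between the two nodes, while the cycles CPI stays the same.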

Rouzip · 2023-08-01T10:10:34Z

> Thanks for the great job! @hormes
>
>> Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.
>
> So the conclusion is that your PR #1489 can fix this problem by resolving PMU multiplexing.
>
>> Sorry for my earlier wrong answer: we should use ref-cycles to calculate CPU utilization (as in EMON).
>
> And should #1482 be abandoned, since it uses cycles?
>
>> In this paper, they use CPI as a symptom, and they reach this conclusion through statistical methods. So if we want to carefully analyze the underlying causes reflected by CPI changes (whether based on cycles or ref-cycles), more research is needed.
>
> We will do more work on this to analyze it better.
>
> @hormes @Rouzip

I will open another PR to fix the perf group problem; #1489 is not enough.

Rouzip added the kind/question label Jul 11, 2023

koordinator-bot added the area/koordlet label Jul 12, 2023

bowen-intel mentioned this issue Jul 19, 2023: koordlet: fix cpi compute with CpuCyclesProfiler #1482 (Merged)

bowen-intel mentioned this issue Jul 24, 2023: fix: reduce PMU multiplexing influence #1489 (Merged)

koordinator-bot closed this as completed in #1489 Jul 25, 2023

koordinator-bot reopened this Jul 25, 2023

koordinator-bot closed this as completed in #1482 Jul 31, 2023

koordinator-bot reopened this Aug 1, 2023

bowen-intel mentioned this issue Aug 17, 2023: koordlet: add libpfm4&perf group #1554 (Merged)

Rouzip closed this as completed Sep 14, 2023
