[question] Use cycles instead of ref-cycles #1470

Closed
Rouzip opened this issue Jul 11, 2023 · 20 comments · Fixed by #1482 or #1489
Labels
area/koordlet kind/question Support request or question relating to Koordinator

Comments

@Rouzip

Rouzip commented Jul 11, 2023

What happened:
Use ref-cycles as CPI factor.
What you expected to happen:
Use cycles as CPI factor.
Environment:

  • Koordinator version: - v0.6.2
  • Kubernetes version (use kubectl version): v1.22.5
  • docker/containerd version: containerd 1.5.0
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.4 LTS
  • Kernel (e.g. uname -a): Linux 5.10.112-11.al8.x86_64 #1 SMP Tue May 24 16:05:50 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

Anything else we need to know:
CPI is typically measured in cycles rather than ref-cycles as a performance evaluation metric.
References:

  1. https://www.brendangregg.com/blog/2014-10-31/cpi-flame-graphs.html
  2. https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/cpu-metrics-reference.html
@Rouzip Rouzip added the kind/question Support request or question relating to Koordinator label Jul 11, 2023
@saintube
Member

/area koordlet
/cc @songtao98

@songtao98
Contributor

songtao98 commented Jul 18, 2023

@Rouzip Sorry for being late to answer.

When the CPI collector was implemented in koordlet, the reference was the paper "CPI²: CPU performance isolation for shared compute clusters". Section 3.1 of it states that CPI data is derived from hardware counters and is defined as the value of the CPU_CLK_UNHALTED.REF counter divided by the INSTRUCTIONS_RETIRED counter.

Besides, personally speaking, CPU_CLK_UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.

/cc @saintube @zwzhang0107

@Rouzip
Author

Rouzip commented Jul 19, 2023

> @Rouzip Sorry for being late to answer.
>
> When the CPI collector was implemented in koordlet, the reference was the paper "CPI²: CPU performance isolation for shared compute clusters". Section 3.1 of it states that CPI data is derived from hardware counters and is defined as the value of the CPU_CLK_UNHALTED.REF counter divided by the INSTRUCTIONS_RETIRED counter.
>
> Besides, personally speaking, CPU_CLK_UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.
>
> /cc @saintube @zwzhang0107

But the number of instructions retired in a fixed amount of time does not reflect the performance of the application. With technologies such as Intel SST, the existing algorithm yields different CPI values at different frequencies on the same machine, yet that difference does not correspond to a change in the application's performance.
@songtao98 @saintube

@hormes
Member

hormes commented Jul 19, 2023

> But the number of instructions retired in a fixed amount of time does not reflect the performance of the application. With technologies such as Intel SST, the existing algorithm yields different CPI values at different frequencies on the same machine, yet that difference does not correspond to a change in the application's performance. @songtao98 @saintube

Can you explain the difference between these two counters in detail? What is the calculation logic of these two indicators inside the CPU when the frequency changes?

@Rouzip
Author

Rouzip commented Jul 19, 2023

> > But the number of instructions retired in a fixed amount of time does not reflect the performance of the application. With technologies such as Intel SST, the existing algorithm yields different CPI values at different frequencies on the same machine, yet that difference does not correspond to a change in the application's performance. @songtao98 @saintube
>
> Can you explain the difference between these two counters in detail? What is the calculation logic of these two indicators inside the CPU when the frequency changes?

As mentioned by @songtao98, ref-cycles do not vary with CPU frequency and can be considered constant within a given period of time. Cycles, on the other hand, do vary with CPU frequency. If the CPU frequency changes while the program itself remains unchanged, the frequency change alone causes the CPI computed by the original formula to change, so the CPI fails to reflect the fact that the program has not changed. If I have made any mistakes, please help me identify them. Thank you.
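
To make the argument above concrete, here is a minimal sketch under a simplifying assumption that is not stated in the thread: the program needs a fixed number of unhalted core cycles and instructions regardless of frequency (memory stalls, which take roughly constant time rather than constant cycles, are ignored). The function names and the numbers are hypothetical.

```python
def ref_cycles_for(core_cycles: float, freq_hz: float, base_freq_hz: float) -> float:
    """Reference cycles that elapse while `core_cycles` core cycles run at `freq_hz`."""
    elapsed_s = core_cycles / freq_hz      # wall time the work takes
    return elapsed_s * base_freq_hz        # ref-cycles tick at the constant base rate


def cpi(cycle_count: float, instructions: float) -> float:
    """Cycles per instruction for whichever cycle counter is passed in."""
    return cycle_count / instructions


BASE_HZ = 2.0e9        # assumed base (reference) frequency
CORE_CYCLES = 6.0e9    # assumed fixed work, in core cycles
INSTRUCTIONS = 4.0e9   # assumed fixed work, in instructions

for freq_hz in (2.0e9, 3.0e9):
    ref = ref_cycles_for(CORE_CYCLES, freq_hz, BASE_HZ)
    print(f"{freq_hz / 1e9:.1f} GHz: "
          f"CPI(cycles)={cpi(CORE_CYCLES, INSTRUCTIONS):.2f} "
          f"CPI(ref-cycles)={cpi(ref, INSTRUCTIONS):.2f}")
```

Under this model the cycles-based CPI is 1.5 at both frequencies, while the ref-cycles-based CPI falls from 1.5 to 1.0 at 3.0 GHz even though the program is unchanged.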

@zwzhang0107
Contributor

zwzhang0107 commented Jul 20, 2023

@Rouzip Actually, there are two types of workloads. Let's call them Web Service and Batch Job. They have different load characteristics within a fixed time period.

Web Service has a stable QPS, which means an (almost) fixed number of instructions to execute in 10s.
Batch Job is CPU-hungry, which means it has an unlimited number of instructions to execute as long as there are enough CPU cycles in 10s.

If we limit the scope to CPU frequency:
For Web Service, both cycles (or actually ref-cycles, if I misunderstood?) and the instruction count stay the same.
For Batch Job, ref-cycles stay the same but the instruction count grows if the CPU runs at a higher frequency.

So, maybe:
If we want to measure whether the Web Service suffers interference, we should use cycles/instructions as the metric.
If we want to measure the performance of the Batch Job, we should use MIPS as the metric.
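
The two candidate metrics above can be sketched as follows. This is only an illustration: `WindowSample` and its field names are hypothetical, not koordlet types.

```python
from dataclasses import dataclass


@dataclass
class WindowSample:
    """Counter deltas observed over one sampling window (hypothetical type)."""
    cycles: float        # cycles or ref-cycles, whichever counter is chosen
    instructions: float  # instructions retired in the window
    seconds: float       # window length


def cpi(s: WindowSample) -> float:
    """Cycles per instruction: the interference signal suggested for Web Service."""
    return s.cycles / s.instructions


def mips(s: WindowSample) -> float:
    """Millions of instructions per second: the throughput signal suggested for Batch Job."""
    return s.instructions / s.seconds / 1e6
```

For example, a sample of 3e10 cycles and 2e10 instructions over a 10 s window gives a CPI of 1.5 and 2000 MIPS.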

@zwzhang0107
Contributor

Besides, can we use the ref-cycles counter to calculate the normalized CPU utilization?

@Rouzip
Author

Rouzip commented Jul 21, 2023

Sorry for the late response. After conducting some experiments, I have found that both cycles and ref-cycles can show changes in program performance. Therefore, real-world experiments may be necessary to determine which metric produces more accurate results.
However, there is a flaw in your code. When calculating CPI with perf_event_open, it is crucial to count ref-cycles and instructions in the same perf group; otherwise, time multiplexing of the PMU will produce a wrong result.
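
The effect can be illustrated with a toy simulation (not koordlet code; the timeline, slice model, and function names are made up for the sketch). perf extrapolates a multiplexed counter as `raw * time_enabled / time_running`. Events in one group are scheduled onto the PMU together, so their ratio covers the same time slices; separately opened events may be counted during different slices.

```python
from typing import List, Tuple

Slice = Tuple[int, int]  # (cycles, instructions) that really occurred in one time slice


def scaled_count(raw: float, time_enabled: int, time_running: int) -> float:
    """Extrapolate a multiplexed counter: raw * time_enabled / time_running."""
    return raw * time_enabled / time_running


def cpi_ungrouped(timeline: List[Slice]) -> float:
    """Cycles are counted only on even slices, instructions only on odd slices."""
    n = len(timeline)
    even, odd = timeline[0::2], timeline[1::2]
    cycles = scaled_count(sum(c for c, _ in even), n, len(even))
    instructions = scaled_count(sum(i for _, i in odd), n, len(odd))
    return cycles / instructions


def cpi_grouped(timeline: List[Slice]) -> float:
    """Both events share a group, so both are counted on the same (even) slices."""
    n = len(timeline)
    seen = timeline[0::2]
    cycles = scaled_count(sum(c for c, _ in seen), n, len(seen))
    instructions = scaled_count(sum(i for _, i in seen), n, len(seen))
    return cycles / instructions


# Alternating busy and nearly idle slices; every slice has a true CPI of 1.0.
timeline = [(1000, 1000), (100, 100)] * 2
print(cpi_ungrouped(timeline), cpi_grouped(timeline))
```

In this timeline every slice has a true CPI of 1.0. The grouped estimate is 1.0, while the ungrouped estimate is 10.0, because the cycle count comes from busy slices and the instruction count from nearly idle ones.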

@Rouzip
Author

Rouzip commented Jul 21, 2023

> Besides, can we use the ref-cycles counter to calculate the normalized CPU utilization?

I think ref-cycles is better.

@hormes
Member

hormes commented Jul 24, 2023

> As mentioned by @songtao98, ref-cycles do not vary with CPU frequency and can be considered constant within a given period of time. Cycles, on the other hand, do vary with CPU frequency. If the CPU frequency changes while the program itself remains unchanged, the frequency change alone causes the CPI computed by the original formula to change, so the CPI fails to reflect the fact that the program has not changed. If I have made any mistakes, please help me identify them. Thank you.

This is a very interesting question worth discussing. Assume a scenario where Pod A executes instructions that consume memory bandwidth (for example, occupying 50% of the memory access bandwidth), which reduces the memory access efficiency of Pod B and makes Pod B's latency 10% worse. If the CPU where Pod B runs raises its operating frequency by 15% due to the turbo mechanism, then, considering the combined impact of frequency and memory access, the performance of Pod B remains the same.

In this scenario, we observe Pod B. Cycles has nothing to do with frequency, so we will see a significant increase (due to the impact of Pod A). Whether ref-cycles increases or decreases is not clear, because it is affected by the combined impact of memory access efficiency and the frequency increase?

@hormes
Member

hormes commented Jul 25, 2023

/reopen

@koordinator-bot koordinator-bot bot reopened this Jul 25, 2023
@koordinator-bot

@hormes: Reopened this issue.

In response to this:

/reopen


@hormes
Member

hormes commented Aug 1, 2023

image

From the experimental point of view, ref-cycles are not affected by frequency, and cycles are affected by frequency. The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response. It seems that ref-cycles is more suitable here. @Rouzip

@hormes
Member

hormes commented Aug 1, 2023

/reopen


@koordinator-bot koordinator-bot bot reopened this Aug 1, 2023
@Rouzip
Author

Rouzip commented Aug 1, 2023

> image
>
> From the experimental point of view, ref-cycles are not affected by frequency, and cycles are affected by frequency. The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response. It seems that ref-cycles is more suitable here. @Rouzip

Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.

> @Rouzip Sorry for being late to answer.
>
> When the CPI collector was implemented in koordlet, the reference was the paper "CPI²: CPU performance isolation for shared compute clusters". Section 3.1 of it states that CPI data is derived from hardware counters and is defined as the value of the CPU_CLK_UNHALTED.REF counter divided by the INSTRUCTIONS_RETIRED counter.
>
> Besides, personally speaking, CPU_CLK_UNHALTED.REF is not affected by thread frequency changes (the CPU's dynamic frequency scaling mechanism), so I prefer this one, but I also want to hear more ideas from you.
>
> /cc @saintube @zwzhang0107

In this paper, they use CPI as a symptom, and they reach this conclusion by statistical methods. So if we want to carefully analyze the underlying causes reflected by CPI changes (whether CPI is based on cycles or ref-cycles), more research is needed.
image

@Rouzip
Author

Rouzip commented Aug 1, 2023

> Besides, can we use the ref-cycles counter to calculate the normalized CPU utilization?

Sorry for my earlier wrong answer; we should use ref-cycles to calculate CPU utilization (as in emon).
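
A rough sketch of such a utilization calculation, assuming the ref-cycles counter advances at a fixed reference frequency whenever the CPU is not halted. The function and parameter names are hypothetical, and the reference frequency has to be obtained from the platform.

```python
def normalized_cpu_utilization(ref_cycles_delta: float,
                               ref_freq_hz: float,
                               interval_s: float,
                               num_cpus: int = 1) -> float:
    """Fraction of the available reference cycles spent unhalted during the interval."""
    # Total reference cycles that could have elapsed on all CPUs in the interval.
    capacity = ref_freq_hz * interval_s * num_cpus
    if capacity <= 0:
        raise ValueError("frequency, interval and CPU count must be positive")
    return ref_cycles_delta / capacity
```

For instance, 5e9 ref-cycles counted over 10 s on one CPU with a 2 GHz reference clock corresponds to a utilization of 0.25, independent of the turbo frequency the core actually ran at.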

@songtao98
Contributor

Thanks for the great job! @hormes

> Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.

So the conclusion is that your PR #1489 can fix this problem by solving PMU multiplexing.

> Sorry for the wrong answer, we should use ref-cycles to calculate CPU utilization. (In emon)

And should #1482 be abandoned because it uses cycles?

> In this paper, they use CPI as a symptom, and they get this conclusion by statistical method. So if we want to carefully analyze the underlying reasons reflected by CPI changes (whether it is cycles or ref-cycles), more research is needed.

And for this, we will do more work on better analysis.

@hormes @Rouzip

@hormes
Member

hormes commented Aug 1, 2023

> The larger the turbo frequency, the larger the cycles, which is anti-correlated with the semantics of the CPI response.

I made a mistake here. In fact, the relationship between the two is accurately described by this equation:

$\frac{\text{cycles}}{\text{freq}} = \text{time} = \frac{\text{ref cycles}}{\text{base freq}}$

Cycles are positively correlated with frequency, and ultimately CPI is positively correlated with frequency, not anticorrelated.

Assuming that the frequency remains unchanged, the direct effect of interference on a program is that it slows down, that is, it runs for a longer time, and both cycles and ref-cycles can express this result. Therefore, when discussing this issue, the main concern is the case where the frequency changes.

Consider an online process serving the same QPS. When a machine runs at a 2.0 GHz main frequency, its cycles are X; for the same CPU model running at 3.0 GHz, the cycles are still X. That is, if CPI is calculated from cycles, the CPI is the same in both cases, but obviously the latency seen by the service on the two nodes is different.
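
Rearranging the equation above makes the difference explicit. With $I$ denoting the instructions retired (a symbol introduced here only for illustration):

$$\text{CPI}_{\text{ref}} = \frac{\text{ref cycles}}{I} = \frac{\text{cycles}}{I} \cdot \frac{\text{base freq}}{\text{freq}} = \text{CPI}_{\text{cycles}} \cdot \frac{\text{base freq}}{\text{freq}}$$

Taking 2.0 GHz as the base frequency (an assumption for this example), going from 2.0 GHz to 3.0 GHz leaves $\text{CPI}_{\text{cycles}}$ unchanged, while both the elapsed time $X / \text{freq}$ and $\text{CPI}_{\text{ref}}$ shrink to $2/3$ of their values at 2.0 GHz.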

@Rouzip
Author

Rouzip commented Aug 1, 2023

> Thanks for the great job! @hormes
>
> > Good job! However, the CPI collected by koordinator may not be accurate at present, and perf needs to be used to collect it.
>
> So the conclusion is that your PR #1489 can fix this problem by solving PMU multiplexing.
>
> > Sorry for the wrong answer, we should use ref-cycles to calculate CPU utilization. (In emon)
>
> And should #1482 be abandoned because it uses cycles?
>
> > In this paper, they use CPI as a symptom, and they get this conclusion by statistical method. So if we want to carefully analyze the underlying reasons reflected by CPI changes (whether it is cycles or ref-cycles), more research is needed.
>
> And for this, we will do more work on better analysis.
>
> @hormes @Rouzip

I will use another PR to fix the perf group problem; #1489 is not enough.
