[issue] Optimize CPI collect in perfGroup #1884

Closed
Rouzip opened this issue Jan 31, 2024 · 3 comments · Fixed by #1905
Labels
area/koordlet, kind/proposal

Comments

Rouzip commented Jan 31, 2024

What is your proposal:
Refactor the CPI collection logic so that collection for unrelated containers is split up rather than performed together.

Why is this needed:
In mixed deployment scenarios, the combination of multi-core CPUs and a large number of pods can lead to concentrated syscall invocations during CPI collection, causing spikes in CPU usage.

Is there a suggested solution, if so, please add it:
Introduce jitter so that CPI collection for different containers is staggered over time, giving a more evenly distributed CPU usage pattern.

@Rouzip Rouzip added the kind/proposal Create a report to help us improve label Jan 31, 2024
@saintube
Member

/cc @songtao98 @zwzhang0107

@songtao98
Contributor

@Rouzip By introducing jitter, do you mean adding a small delay to the timing of CPI collection events across different containers? If so, one simple strategy might be to group containers by QoS or Priority.

P.S. One concern of mine is that jitter might affect the precision of CPI data, since it would reflect container performance at different times and could lose some accuracy. Intuitively, though, this should be acceptable.

@Rouzip Rouzip changed the title [proposal] Optimize CPI collect [issue] Optimize CPI collect Feb 5, 2024
@Rouzip Rouzip changed the title [issue] Optimize CPI collect [issue] Optimize CPI collect in perfGroup Feb 5, 2024
@Rouzip
Author

Rouzip commented Feb 5, 2024

> @Rouzip By introducing jitter, do you mean adding a small delay to the timing of CPI collection events across different containers? If so, one simple strategy might be to group containers by QoS or Priority.
>
> P.S. One concern of mine is that jitter might affect the precision of CPI data, since it would reflect container performance at different times and could lose some accuracy. Intuitively, though, this should be acceptable.

Thanks for your reply. After double-checking the code, I found that the CPU usage spikes are caused by an incorrect implementation in the perfGroup interface. I will fix this issue soon.

3 participants