Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't report CPU metrics if it's negative value #2538

Merged
merged 3 commits into from
Mar 18, 2022

Conversation

nagworld9
Copy link
Contributor

@nagworld9 nagworld9 commented Mar 17, 2022

Description

The cpu usage can be negative value if we notice less current cpu ticks from last time and this could happen when service restart between two polls.

Fix is not to report telemetry if cpu usage is negative value.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Mar 17, 2022

Codecov Report

Merging #2538 (74fe486) into develop (f05ec2e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #2538   +/-   ##
========================================
  Coverage    71.98%   71.99%           
========================================
  Files          102      102           
  Lines        15197    15202    +5     
  Branches      2404     2406    +2     
========================================
+ Hits         10940    10945    +5     
  Misses        3776     3776           
  Partials       481      481           
Impacted Files Coverage Δ
azurelinuxagent/common/cgroup.py 90.96% <100.00%> (+0.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f05ec2e...74fe486. Read the comment docs.

if 'track_throttled_time' in kwargs and kwargs['track_throttled_time']:
tracked.append(MetricValue(MetricsCategory.CPU_CATEGORY, MetricsCounter.THROTTLED_TIME, self.name, self.get_throttled_time()))
throttled_time = self.get_throttled_time()
if cpu_usage >= float(0) and throttled_time >= float(0):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, very likely throttled time could have wrong value too. So, I decided not to report both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not need to check for the 2 conditions, if the single condition cpu_usage < 0 is true, we should not report the 2 metrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, if cpu_usage < 0 is true then it won't check second condition. I added second condition for rare case.

MetricValue(MetricsCategory.CPU_CATEGORY, MetricsCounter.PROCESSOR_PERCENT_TIME, self.name, self.get_cpu_usage()),
]
tracked = []
cpu_usage = self.get_cpu_usage()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating previous/current values happens inside these get functions on every poll, so no explicit logic needed to reset the values.

kevinclark19a
kevinclark19a previously approved these changes Mar 17, 2022
@@ -374,3 +374,44 @@ def test_extension_telemetry_not_sent_for_empty_perf_metrics(self, *args): # py
metrics = CGroupsTelemetry.poll_all_tracked()
self.assertEqual(0, len(metrics))

@patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many mocks here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I followed old test cases. Actually, we don't need to mock memory usage as this test/change only for CPU values. I will remove them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, do you need the mock for is_active?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed otherwise it won't track cpu metrics on subsequent polls

@patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage")
@patch("azurelinuxagent.common.cgroup.CpuCgroup.get_throttled_time")
@patch("azurelinuxagent.common.cgroup.CGroup.is_active")
def test_cgroup_telemetry_should_not_report_cpu_negative_value(self, patch_is_active, path_get_throttled_time, patch_get_cpu_usage, patch_get_memory_usage, patch_get_memory_max_usage, *args):# pylint: disable=unused-argument
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't disable unused-argument, use "_" as the argument name instead

@nagworld9 nagworld9 merged commit ae89acd into Azure:develop Mar 18, 2022
@nagworld9 nagworld9 deleted the TASK-13845269 branch March 18, 2022 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants