Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report processes that do not belong to the agent's cgroup #1908

Merged
merged 5 commits into from
Jun 15, 2020

Conversation

narrieta
Copy link
Member

As part of collecting resource usage metrics we now report processes that should not be in the agent's cgroup.

def get_monitor_handler():
return MonitorHandler()


class PollResourceUsageOperation(PeriodicOperation):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the periodic tasks into their own classes, which can be used/tested independently of the monitor thread

PeriodicOperation("collect_and_send_events", self.collect_and_send_events, self.EVENT_COLLECTION_PERIOD),
PeriodicOperation("send_telemetry_heartbeat", self.send_telemetry_heartbeat, self.TELEMETRY_HEARTBEAT_PERIOD),
PeriodicOperation("poll_telemetry_metrics usage", self.poll_telemetry_metrics, self.CGROUP_TELEMETRY_POLLING_PERIOD),
ReportNetworkErrorsOperation(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left these 3 operations in the monitor class because I still need to unravel their dependency on the protocol

@codecov
Copy link

codecov bot commented Jun 12, 2020

Codecov Report

Merging #1908 into develop will increase coverage by 0.10%.
The diff coverage is 77.23%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1908      +/-   ##
===========================================
+ Coverage    69.38%   69.49%   +0.10%     
===========================================
  Files           85       85              
  Lines        11813    11865      +52     
  Branches      1658     1666       +8     
===========================================
+ Hits          8197     8246      +49     
- Misses        3248     3249       +1     
- Partials       368      370       +2     
Impacted Files Coverage Δ
azurelinuxagent/ga/monitor.py 77.51% <67.69%> (+5.89%) ⬆️
azurelinuxagent/common/cgroupconfigurator.py 74.00% <73.07%> (-0.81%) ⬇️
azurelinuxagent/common/cgroup.py 90.54% <100.00%> (+1.06%) ⬆️
azurelinuxagent/common/cgroupapi.py 79.80% <100.00%> (+0.73%) ⬆️
azurelinuxagent/common/cgroupstelemetry.py 82.60% <100.00%> (-4.21%) ⬇️
azurelinuxagent/common/event.py 85.96% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f8b071...e271b81. Read the comment docs.

larohra
larohra previously approved these changes Jun 15, 2020
Copy link
Contributor

@larohra larohra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pgombar
pgombar previously approved these changes Jun 15, 2020
Copy link
Contributor

@pgombar pgombar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments


class MetricsCategory(object):
MEMORY_CATEGORY = "Memory"
PROCESS_CATEGORY = "Process"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a better name CPU? This metric contains CPU counters, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved that code from elsewhere, but CPU does make more sense. We are not using the existing metric in kusto so I'll go ahead with the rename.

@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because of this?
├─27519 /usr/bin/python3 -u /usr/sbin/waagent -daemon
└─27547 python3 -u bin/WALinuxAgent-2.2.48.1-py2.7.egg -run-exthandlers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup

@@ -130,6 +127,21 @@ def raise_exception(*_):
message = args[0]
self.assertIn("A TEST EXCEPTION", message)

def test_get_processes_in_agent_cgroup_should_return_the_processes_within_the_agent_cgroup(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this test different from test_get_processes_in_cgroup_should_return_the_processes_within_the_cgroup in test_cgroupapi.py?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code-wise they are the same, but test 2 different components (configurator vs api)

@narrieta narrieta dismissed stale reviews from pgombar and larohra via bf8b2e1 June 15, 2020 19:27
@narrieta narrieta merged commit 29f5b67 into Azure:develop Jun 15, 2020
@narrieta narrieta deleted the cgroup-processes branch June 15, 2020 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants