fix stop tracking extension cgroups #2384

Merged
merged 6 commits into from
Oct 27, 2021

nagworld9
Contributor

@nagworld9 nagworld9 commented Oct 21, 2021

Description

The tracked-cgroups dictionary is keyed by cgroup path, with the corresponding cgroup instance as the value.
Example: tracking an extension cgroup
CGroupsTelemetry._tracked['/sys/fs/cgroup/cpu,cpuacct/azure.slice/azure-vmextensions.slice/azure-vmextensions-Microsoft.CPlat.Extension.slice'] = CpuCgroup('Microsoft.CPlat.Extension', '/sys/fs/cgroup/cpu,cpuacct/azure.slice/azure-vmextensions.slice/azure-vmextensions-Microsoft.CPlat.Extension.slice')

This PR fixes stop-tracking for extension cgroups. The dictionary key used to find the entry to remove is now the same /sys/fs cgroup path, built from the extension slice, instead of just the extension name.
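
A minimal sketch of why the lookup key matters, using a plain dictionary as a stand-in for CGroupsTelemetry._tracked. The CpuCgroup class below is a simplified placeholder written for illustration, not the agent's actual class.

```python
class CpuCgroup:
    """Simplified stand-in (assumption: only name and path matter here)."""

    def __init__(self, name, path):
        self.name = name
        self.path = path


# Path copied from the example in the description above.
path = ('/sys/fs/cgroup/cpu,cpuacct/azure.slice/azure-vmextensions.slice/'
        'azure-vmextensions-Microsoft.CPlat.Extension.slice')

tracked = {path: CpuCgroup('Microsoft.CPlat.Extension', path)}

# Keys are full cgroup paths, so looking up by extension name finds nothing.
removed_by_name = tracked.pop('Microsoft.CPlat.Extension', None)

# Looking up by the same path that was used when tracking started removes the entry.
removed_by_path = tracked.pop(path, None)
```

With a name-only key the entry silently stays in the dictionary, which matches the behavior this PR sets out to fix.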

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@nagworld9 nagworld9 changed the title from "fix extension stop tracking cgroups" to "fix stop tracking extension cgroups" on Oct 21, 2021

codecov bot commented Oct 21, 2021

Codecov Report

Merging #2384 (a1c3e03) into develop (0da0228) will increase coverage by 0.04%.
The diff coverage is 77.77%.

@@             Coverage Diff             @@
##           develop    #2384      +/-   ##
===========================================
+ Coverage    71.14%   71.19%   +0.04%     
===========================================
  Files           97       97              
  Lines        14387    14398      +11     
  Branches      2077     2078       +1     
===========================================
+ Hits         10236    10250      +14     
+ Misses        3689     3685       -4     
- Partials       462      463       +1     

Impacted Files                                   Coverage Δ
azurelinuxagent/common/cgroupconfigurator.py     71.20% <71.42%> (+1.24%) ⬆️
azurelinuxagent/common/cgroupapi.py              81.39% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0da0228...a1c3e03.

try:
    extension_slice_name = SystemdCgroupsApi.get_extension_cgroup_name(extension_name) + ".slice"
    cgroup_relative_path = os.path.join('azure.slice/azure-vmextensions.slice',
                                        extension_slice_name + ".slice")
Contributor

There seems to be a typo here, an extra .slice:
SystemdCgroupsApi.get_extension_cgroup_name(extension_name) + ".slice" + ".slice"

If this is used in multiple other places, maybe make it a function too, something like SystemdCgroupsApi.get_extension_slice_name(extension_name).

Contributor Author

Good catch! Updated.

tracked = CGroupsTelemetry._tracked

self.assertFalse(
    any(cg for cg in tracked.values() if cg.name == 'Microsoft.CPlat.Extension' and 'cpu' in cg.path),
Contributor

Can you also add a stricter check to make sure that tracked.values() actually contains what you expect? (This would have caught the typo introduced above.)

Contributor Author

Actually, tracked.values() should not contain the entry I'm checking for. The place where I mocked the test data was wrong. Thanks for the catch; I fixed all those places.
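
A stricter check along the lines requested here could assert on the exact dictionary key before and after the call, rather than only asserting that nothing matching is present afterwards. The sketch below is an illustration under assumptions: stop_tracking is a hypothetical stand-in for the code under test, and the tracked values are plain tuples rather than real cgroup objects.

```python
def stop_tracking(tracked, path):
    """Hypothetical stand-in for the code under test: drop the entry keyed by path."""
    tracked.pop(path, None)


expected_path = ('/sys/fs/cgroup/cpu,cpuacct/azure.slice/azure-vmextensions.slice/'
                 'azure-vmextensions-Microsoft.CPlat.Extension.slice')

# Precondition: the mocked data really contains the entry under the exact key.
tracked = {expected_path: ('Microsoft.CPlat.Extension', expected_path)}
present_before = expected_path in tracked

stop_tracking(tracked, expected_path)
present_after = expected_path in tracked

# A key with a doubled ".slice" suffix does not match, so the entry would survive.
tracked = {expected_path: ('Microsoft.CPlat.Extension', expected_path)}
stop_tracking(tracked, expected_path + ".slice")
survives_wrong_key = expected_path in tracked
```

Checking the precondition is what makes the final "not tracked" assertion meaningful: without it, the test also passes when the entry was never tracked in the first place.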

@narrieta
Member

@nagworld9 Please add a description to the PR. Thanks.

TODO: Memory tracking
"""
try:
    extension_slice_name = SystemdCgroupsApi.get_extension_slice_name(extension_name) + ".slice"
Member

With the new name, get_extension_slice_name gives the impression that one should have to append ".slice" to it. Could you append it within the function or rename the function?

Contributor Author

Addressed
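
One way to address the naming concern is to have the helper return the complete slice unit name, so callers never append ".slice" themselves. This is only a sketch: the function body and the EXTENSION_SLICE_PREFIX value are assumptions inferred from the paths quoted in this PR, and the agent's real helper may normalize the extension name further.

```python
# Assumed value, inferred from "azure-vmextensions.slice" in the paths above.
EXTENSION_SLICE_PREFIX = "azure-vmextensions"


def get_extension_slice_name(extension_name):
    """Hypothetical helper: return the full slice unit name, ".slice" suffix included."""
    return EXTENSION_SLICE_PREFIX + "-" + extension_name + ".slice"


slice_name = get_extension_slice_name("Microsoft.CPlat.Extension")

# Appending the suffix again at the call site reproduces the typo caught in review.
doubled_suffix = slice_name + ".slice"
```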

@@ -678,6 +678,24 @@ def stop_tracking_unit_cgroups(self, unit_name):
        except Exception as exception:
            logger.info("Failed to stop tracking resource usage for the extension service: {0}", ustr(exception))

    def stop_tracking_extension_cgroups(self, extension_name):
        """
        TODO: Memory tracking
Member

can we be more descriptive about what needs to be done?

Contributor Author

Addressed

"""
try:
    extension_slice_name = SystemdCgroupsApi.get_extension_slice_name(extension_name) + ".slice"
    cgroup_relative_path = os.path.join('azure.slice/azure-vmextensions.slice',
Member

Are we using "azure.slice" and "azure-vmextensions.slice" as constants? The same seems to apply to the concatenation ("azure.slice/azure-vmextensions.slice").

Contributor Author

Addressed

cpu_cgroup_mountpoint, _ = self._cgroups_api.get_cgroup_mount_points()
cpu_cgroup_path = os.path.join(cpu_cgroup_mountpoint, cgroup_relative_path)

if cpu_cgroup_path is not None and os.path.exists(cpu_cgroup_path):
Member

why do we need to check that the path exists before removing the item from the tracked list?

Contributor Author

If the path does not exist, the monitoring thread will remove the entry when polling happens.

Contributor Author

I think we can remove it here regardless, so that the monitoring thread won't poll for metrics before it removes the entry.

Contributor Author

Addressed
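
The trade-off discussed in this thread can be sketched with two small functions. Both names are hypothetical and the dictionary is a plain stand-in for the tracked-cgroups dictionary; the path below is assumed not to exist on the machine running the example.

```python
import os


def stop_tracking_guarded(tracked, path):
    """Remove the entry only if the cgroup path still exists on disk."""
    if os.path.exists(path):
        tracked.pop(path, None)


def stop_tracking_unconditional(tracked, path):
    """Remove the entry whether or not the cgroup path still exists."""
    tracked.pop(path, None)


# Assumption: this path does not exist, mimicking a cgroup that was already deleted.
missing = "/nonexistent/cgroup/path/for-illustration.slice"

guarded = {missing: "cpu"}
unconditional = {missing: "cpu"}

stop_tracking_guarded(guarded, missing)
stop_tracking_unconditional(unconditional, missing)
```

In the guarded version the stale entry remains until something else cleans it up, whereas the unconditional version drops it immediately, so nothing polls it in the meantime.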

@nagworld9
Contributor Author

@narrieta Added description

Contributor

@larohra larohra left a comment

One minor comment; otherwise LGTM.

@@ -39,6 +39,7 @@
Before=slices.target
"""
_VMEXTENSIONS_SLICE = EXTENSION_SLICE_PREFIX + ".slice"
_AZURE_VMEXTENSIONS_SLICE = AZURE_SLICE + "/" + _VMEXTENSIONS_SLICE
Contributor

NIT: Would os.path.join look a bit cleaner here?

@nagworld9 nagworld9 merged commit 70f9329 into Azure:develop Oct 27, 2021
@nagworld9 nagworld9 deleted the cgroups-test branch October 27, 2021 13:08
4 participants