Merging release 2.2.41 into master #1585

Merged
merged 91 commits into from
Jul 17, 2019
4f7042a
Merge pull request #1478 from Azure/release-2.2.37
vrdmr Mar 4, 2019
3cb14b8
Use 1804-style deprovisioning for all versions >= 18.04 (#1483)
jasonzio Mar 7, 2019
52a3ec0
Updating Travis settings
vrdmr Mar 6, 2019
c9699c6
Merge pull request #1480 from vrdmr/vameru/travis-update
vrdmr Mar 12, 2019
5fe89a8
Added description for Extension Error Codes (#1482)
vrdmr Mar 12, 2019
eac8f39
Make launch_command resilient to cgroups failures (#1484)
pgombar Mar 12, 2019
ec51856
WireServer Certificates parser gracefully exits if format of package …
ashokcsn Mar 13, 2019
3f83843
Removing the requirement of passing cipher-name. Default is more secu…
vrdmr Mar 13, 2019
ca1d395
Merge pull request #1490 from Azure/release-2.2.38
vrdmr Mar 22, 2019
ff01b37
Fix PID tracking for cgroups (#1489)
pgombar Mar 26, 2019
4c7c892
Do not report restart_if warnings as errors (#1498)
narrieta Mar 28, 2019
5f310b5
Setting the host.manifest_uri even if it was not downloaded using host
Apr 12, 2019
4f368b6
Added a unit test to check if the manifest is being set for both case…
Apr 12, 2019
2dc43ce
Suppress 'Errno 2' messages when deleting extensions (#1504)
narrieta Apr 16, 2019
a8e6589
Test code cleanup
larohra Apr 16, 2019
1daa2b9
Removed unusable comments for a cleaner code
larohra Apr 18, 2019
0d52a7d
Modified unit test to follow AAA pattern
larohra Apr 18, 2019
6756a30
Addressed PR comments, added comments to the tests and cleaned the co…
larohra Apr 18, 2019
e1332c8
verbose test comments
larohra Apr 18, 2019
840c0ba
Mocking the return call of mock_host instead of the whole HostPluginP…
larohra Apr 18, 2019
cc648d5
Merge pull request #1510 from larohra/nsg_dcr_hostga_issue
larohra Apr 18, 2019
de02445
Release 2.2.40 (#1516)
pgombar Apr 30, 2019
df97eec
Split cgroups.py into smaller files and rename classes. (#1524)
narrieta May 13, 2019
830c422
Add logging for cert management
mbearup May 16, 2019
ab8f394
Defined API for initialization of cgroups (#1530)
narrieta May 20, 2019
8d02446
API for extension cgroups (#1533)
narrieta May 21, 2019
8309d14
Do not test Python nightly (#1535)
narrieta May 21, 2019
8e537f5
Refactor start_extension_command to CGroupsApi (#1534)
narrieta May 21, 2019
f537cfe
Implement API for systemd-managed cgroups (#1536)
pgombar May 22, 2019
4da18d3
Refactor configurator (#1537)
narrieta May 22, 2019
e7d6acb
Making cgroup telemetry work end to end, with new spec
vrdmr May 26, 2019
7120c52
Fixing the unittests
vrdmr May 28, 2019
f8f32c7
Fixing the telemetry tests for active polling
vrdmr May 28, 2019
8863864
Added unit tests for CGroupApi (#1539)
narrieta May 29, 2019
4d4d985
Addressing PR comments and cleanup
vrdmr May 30, 2019
b0bfd08
Merge branch 'cgroups' into vameru/cgroup-telemetry
vrdmr May 30, 2019
11b5a75
Fixing float issues for tests
vrdmr May 30, 2019
ee5dfbc
Addressing latest round of comments
vrdmr May 30, 2019
689fdbd
Adding more tests for Metric and CGroupTelemetry class
vrdmr May 31, 2019
a123635
Adding more tests to check state changes and fixing locks for _tracked
vrdmr May 31, 2019
368c71e
Merge pull request #1538 from vrdmr/vameru/cgroup-telemetry
vrdmr May 31, 2019
c6e2952
Adding more unittests to enable cgroup telemetry tests
vrdmr Jun 3, 2019
6e2a02a
CGroupConfigurator unit tests (#1543)
narrieta Jun 3, 2019
c4a3010
Addressing PR comments - part 1
vrdmr Jun 3, 2019
296dc50
Merge branch 'cgroups' into vameru/cgroup-test-extension-with-cgroups…
vrdmr Jun 3, 2019
d397c66
Fix the string decoding issue & added better test realistic cgroup case.
vrdmr Jun 4, 2019
57fa6d9
Merge pull request #1526 from mbearup/develop
vrdmr Jun 4, 2019
128f3c8
Improve error handling in remove cgroups (#1545)
narrieta Jun 4, 2019
070e15f
Fixing unittest runs issue and review comments.
vrdmr Jun 5, 2019
786122c
Added a new unit file for RHEL8 and modified setup file for it
larohra Jun 5, 2019
a2e84c4
Removing reference to Travis
vrdmr Jun 5, 2019
c3d0ab8
nit fixes; add some test for message json
vrdmr Jun 5, 2019
bea5923
Changing the is_supported to enabled.
vrdmr Jun 5, 2019
74e58ee
Modified the get_python_cmd to match with RHEL 8
larohra Jun 5, 2019
71bf281
nit; correcting decorator messages
vrdmr Jun 6, 2019
3a7f220
Added separate bin files for RHEL 8
larohra Jun 6, 2019
468714c
Merge pull request #1541 from vrdmr/vameru/cgroup-test-extension-with…
vrdmr Jun 6, 2019
0dc01e3
Made waagent file executable
Jun 6, 2019
6630ab1
Updated the tests
larohra Jun 10, 2019
4bda31b
Merge branch 'Rhel8Changes' of https://github.com/larohra/WALinuxAgen…
larohra Jun 10, 2019
ea978cd
Added a basic test to test get_python_cmd()
larohra Jun 10, 2019
69022c6
Create unique scopes for extensions
narrieta Jun 10, 2019
09c117b
Merge pull request #1547 from larohra/Rhel8Changes
larohra Jun 11, 2019
f153b56
Use _ as separator for scope name
narrieta Jun 11, 2019
8379bbb
Code review feedback
narrieta Jun 11, 2019
37420aa
Merge pull request #1549 from narrieta/unique-scope
vrdmr Jun 11, 2019
a780d28
Prevent adding duplicates in _tracked data structure.
vrdmr Jun 7, 2019
0767eb9
Change the scope name to have the .scope suffix
vrdmr Jun 7, 2019
5d628b0
nit fixes; correcting the systemd filename and other unittest fixes.
vrdmr Jun 8, 2019
909de2c
is_tracked check only with path now.
vrdmr Jun 10, 2019
994eb17
Fix the test_monitor unittest
vrdmr Jun 10, 2019
74bd5f7
Rebased on cgroups, and fixed unittests.
vrdmr Jun 11, 2019
8194f84
Review comments addressed.
vrdmr Jun 11, 2019
d12a03a
Merge pull request #1548 from vrdmr/vameru/cgroups-bug-fixes
vrdmr Jun 11, 2019
51471d4
Merge remote-tracking branch 'upstream/cgroups' into cgroups
vrdmr Jun 11, 2019
ee8a741
Preventing incorrect metrics to be sent on telemetry
vrdmr Jun 11, 2019
52a5994
Fixing the unittests
vrdmr Jun 11, 2019
bb622b1
Review comments addressed
vrdmr Jun 12, 2019
113fc1b
Addressing review comments; periodic logger expanded
vrdmr Jun 13, 2019
aae9fc9
Refactored periodic logger, and other review comments addressed
vrdmr Jun 13, 2019
f2b7888
Revert "Rhel8 changes"
larohra Jun 13, 2019
3323a88
Merge pull request #1554 from Azure/revert-1547-Rhel8Changes
larohra Jun 13, 2019
338adc0
Simplifying periodic_* interfaces
vrdmr Jun 13, 2019
ab9cffe
Merge pull request #1553 from vrdmr/vameru/remove-empty-perf-metrics
vrdmr Jun 14, 2019
3816e53
Merge pull request #1555 from Azure/cgroups
vrdmr Jun 14, 2019
1e7e63b
Preventing empty extensions to be sent into telemetry (#1557)
vrdmr Jun 14, 2019
9aae8a0
Handle systemd failures when invoking extensions (#1556)
pgombar Jun 15, 2019
8761775
Prevents over-logging in known cases. (#1558)
vrdmr Jun 15, 2019
0f1f1cf
Version update for release 2.2.41
vrdmr Jun 15, 2019
94e873f
Merging release 2.2.41 into master
vrdmr Jul 16, 2019
bc4b721
Fixing the LaunchCommandTestCase
vrdmr Jul 16, 2019
3 changes: 3 additions & 0 deletions .gitignore
@@ -69,3 +69,6 @@ bin/waagent2.0c

# rope project
.ropeproject/

# mac osx specific files
.DS_Store
4 changes: 0 additions & 4 deletions .travis.yml
@@ -7,9 +7,6 @@ env:
# Add SETUPOPTS="check flake8" to enable flake8 checks

matrix:
allow_failures:
- python: nightly

# exclude the default "python" build - we're being specific here...
exclude:
- python:
@@ -31,7 +28,6 @@ matrix:
--cover-min-percentage=60 --cover-branches
--cover-package=azurelinuxagent --cover-xml"
SETUPOPTS=""
- python: nightly

install:
- pip install -r requirements.txt
371 changes: 371 additions & 0 deletions azurelinuxagent/common/cgroup.py
@@ -0,0 +1,371 @@
# Copyright 2018 Microsoft Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Requires Python 2.6+ and Openssl 1.0+
import errno
import os
import re

from azurelinuxagent.common import logger
from azurelinuxagent.common.exception import CGroupsException
from azurelinuxagent.common.future import ustr
from azurelinuxagent.common.osutil import get_osutil
from azurelinuxagent.common.utils import fileutil

re_user_system_times = re.compile(r'user (\d+)\nsystem (\d+)\n')


class CGroup(object):
    @staticmethod
    def create(cgroup_path, controller, extension_name):
        """
        Factory method to create the correct CGroup.
        """
        # Compare by value: `is` tests object identity and is unreliable for strings.
        if controller == "cpu":
            return CpuCgroup(extension_name, cgroup_path)
        elif controller == "memory":
            return MemoryCgroup(extension_name, cgroup_path)

    def __init__(self, name, cgroup_path, controller_type):
        """
        Initialize data collection for a cgroup controller.

        :param name: Name of the CGroup
        :param cgroup_path: Path of the controller
        :param controller_type: Controller type, e.g. "cpu" or "memory"
        """
        self.name = name
        self.path = cgroup_path
        self.controller = controller_type

    def _get_cgroup_file(self, file_name):
        return os.path.join(self.path, file_name)

    def _get_file_contents(self, file_name):
        """
        Retrieve the contents of a file.

        :param str file_name: Name of file within that metric controller
        :return: Entire contents of the file
        :rtype: str
        """

        parameter_file = self._get_cgroup_file(file_name)

        # Any exception from read_file propagates to the caller unchanged.
        return fileutil.read_file(parameter_file)

    def _get_parameters(self, parameter_name, first_line_only=False):
        """
        Retrieve the values of a parameter from a controller.

        :param str parameter_name: Name of file within that metric controller
        :param first_line_only: return only the first line.
        :return: All lines of the file, or only the first line (without line terminator)
                 if first_line_only is True
        :rtype: [str] or str
        """
        result = []
        try:
            values = self._get_file_contents(parameter_name).splitlines()
            result = values[0] if first_line_only else values
        except IndexError:
            parameter_filename = self._get_cgroup_file(parameter_name)
            logger.error("File {0} is empty but should not be".format(parameter_filename))
            raise CGroupsException("File {0} is empty but should not be".format(parameter_filename))
        except Exception as e:
            if isinstance(e, (IOError, OSError)) and e.errno == errno.ENOENT:
                # Let "file not found" reach the caller untouched, keeping the traceback.
                raise
            parameter_filename = self._get_cgroup_file(parameter_name)
            logger.error("Exception while attempting to read {0}: {1}".format(parameter_filename, ustr(e)))
            raise CGroupsException("Exception while attempting to read {0}".format(parameter_filename), e)
        return result

    def collect(self):
        raise NotImplementedError()

    def is_active(self):
        try:
            tasks = self._get_parameters("tasks")
            if tasks:
                return len(tasks) != 0
        except (IOError, OSError) as e:
            if e.errno == errno.ENOENT:
                # only suppressing file not found exceptions.
                pass
            else:
                logger.periodic_warn(logger.EVERY_HALF_HOUR,
                                     'Could not get list of tasks from "tasks" file in the cgroup: {0}.'
                                     ' Internal error: {1}'.format(self.path, ustr(e)))
        except CGroupsException as e:
            logger.periodic_warn(logger.EVERY_HALF_HOUR,
                                 'Could not get list of tasks from "tasks" file in the cgroup: {0}.'
                                 ' Internal error: {1}'.format(self.path, ustr(e)))
            return False

        return False


class CpuCgroup(CGroup):
    def __init__(self, name, cgroup_path):
        """
        Initialize data collection for the Cpu controller. _update_cpu_data() must run (collect() calls it)
        before any useful metrics are available.

        :return: CpuCgroup
        """
        super(CpuCgroup, self).__init__(name, cgroup_path, "cpu")

        self._osutil = get_osutil()
        self._current_cpu_total = 0
        self._previous_cpu_total = 0
        self._current_system_cpu = self._osutil.get_total_cpu_ticks_since_boot()
        self._previous_system_cpu = 0

    def __str__(self):
        return "cgroup: Name: {0}, cgroup_path: {1}; Controller: {2}".format(
            self.name, self.path, self.controller
        )

    def _get_current_cpu_total(self):
        """
        Compute the number of USER_HZ of CPU time (user and system) consumed by this cgroup since boot.

        :return: int
        """
        cpu_total = 0
        try:
            cpu_stat = self._get_file_contents('cpuacct.stat')
        except Exception as e:
            if isinstance(e, (IOError, OSError)) and e.errno == errno.ENOENT:
                # Let "file not found" reach the caller untouched, keeping the traceback.
                raise
            raise CGroupsException("Exception while attempting to read {0}".format("cpuacct.stat"), e)

        if cpu_stat:
            m = re_user_system_times.match(cpu_stat)
            if m:
                cpu_total = int(m.groups()[0]) + int(m.groups()[1])
        return cpu_total

    def _update_cpu_data(self):
        """
        Update all raw data required to compute metrics of interest. The intent is to call this method once,
        then call the various get_*() methods which use this data, which we've collected exactly once.
        """
        self._previous_cpu_total = self._current_cpu_total
        self._previous_system_cpu = self._current_system_cpu
        self._current_cpu_total = self._get_current_cpu_total()
        self._current_system_cpu = self._osutil.get_total_cpu_ticks_since_boot()

    def _get_cpu_percent(self):
        """
        Compute the percent CPU time used by this cgroup over the elapsed time since the previous call to
        _update_cpu_data(). If the cgroup fully consumed 2 cores on a 4 core system, return 200.

        :return: CPU usage in percent of a single core
        :rtype: float
        """
        cpu_delta = self._current_cpu_total - self._previous_cpu_total
        # Clamp to 1 so that two back-to-back updates cannot divide by zero.
        system_delta = max(1, self._current_system_cpu - self._previous_system_cpu)

        return round(float(cpu_delta * self._osutil.get_processor_cores() * 100) / float(system_delta), 3)

    def collect(self):
        """
        Collect and return a list of all cpu metrics. If no metrics are collected, return an empty list.

        :rtype: [CollectedMetrics]
        """
        self._update_cpu_data()
        usage = self._get_cpu_percent()
        return [CollectedMetrics("cpu", "% Processor Time", usage)]


class MemoryCgroup(CGroup):
    def __init__(self, name, cgroup_path):
        """
        Initialize data collection for the Memory controller

        :return: MemoryCgroup
        """
        super(MemoryCgroup, self).__init__(name, cgroup_path, "memory")

    def __str__(self):
        return "cgroup: Name: {0}, cgroup_path: {1}; Controller: {2}".format(
            self.name, self.path, self.controller
        )

    def _get_memory_usage(self):
        """
        Collect memory.usage_in_bytes from the cgroup.

        :return: Memory usage in bytes
        :rtype: int
        """
        usage = self._get_parameters('memory.usage_in_bytes', first_line_only=True)

        if not usage:
            usage = "0"
        return int(usage)

    def _get_memory_max_usage(self):
        """
        Collect memory.max_usage_in_bytes from the cgroup.

        :return: Maximum memory usage in bytes
        :rtype: int
        """
        usage = self._get_parameters('memory.max_usage_in_bytes', first_line_only=True)
        if not usage:
            usage = "0"
        return int(usage)

    def collect(self):
        """
        Collect and return a list of all memory metrics

        :rtype: [CollectedMetrics]
        """
        usage = self._get_memory_usage()
        max_usage = self._get_memory_max_usage()
        return [CollectedMetrics("memory", "Total Memory Usage", usage),
                CollectedMetrics("memory", "Max Memory Usage", max_usage)]


class CollectedMetrics(object):
    def __init__(self, controller, metric_name, value):
        self.controller = controller
        self.metric_name = metric_name
        self.value = value
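
The CPU metric in `CpuCgroup` comes from two steps: summing the user and system ticks in `cpuacct.stat`, then scaling the cgroup's tick delta by the core count over the system-wide tick delta. The following is a minimal standalone sketch of that arithmetic, not the agent's implementation. The helper names `parse_cpuacct_stat` and `cpu_percent` are hypothetical, and the sample tick values are made up for illustration.

```python
import re

# Same pattern as re_user_system_times in cgroup.py.
RE_USER_SYSTEM_TIMES = re.compile(r'user (\d+)\nsystem (\d+)\n')


def parse_cpuacct_stat(contents):
    """Hypothetical helper: user + system ticks from cpuacct.stat text, 0 if no match."""
    match = RE_USER_SYSTEM_TIMES.match(contents)
    if not match:
        return 0
    return int(match.group(1)) + int(match.group(2))


def cpu_percent(cgroup_delta, system_delta, cores):
    """Hypothetical helper: percent of a single core used over the sampling interval."""
    system_delta = max(1, system_delta)  # avoid dividing by zero
    return round(float(cgroup_delta * cores * 100) / float(system_delta), 3)


# Illustrative samples (invented values): two readings of cpuacct.stat.
previous = parse_cpuacct_stat("user 100\nsystem 50\n")   # 150 ticks
current = parse_cpuacct_stat("user 250\nsystem 100\n")   # 350 ticks

# The cgroup used 200 ticks while a 4-core system accumulated 400 ticks in total,
# i.e. it kept two cores fully busy.
print(cpu_percent(current - previous, 400, 4))  # 200.0
```

This matches the docstring's example: a cgroup that fully consumes 2 cores on a 4-core system reports 200.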

#
# TODO: Do we need this code? - For now we'll keep this code. Will remove in the next round.
#
#
# MEMORY_DEFAULT = -1
#
# # percentage of a single core
# DEFAULT_CPU_LIMIT_AGENT = 10
# DEFAULT_CPU_LIMIT_EXT = 40
#
# DEFAULT_MEM_LIMIT_MIN_MB = 256 # mb, applies to agent and extensions
# DEFAULT_MEM_LIMIT_MAX_MB = 512 # mb, applies to agent only
# DEFAULT_MEM_LIMIT_PCT = 15 # percent, applies to extensions
#
# @staticmethod
# def _convert_cpu_limit_to_fraction(value):
# """
# Convert a CPU limit from percent (e.g. 50 meaning 50%) to a decimal fraction (0.50).
# :return: Fraction of one CPU to be made available (e.g. 0.5 means half a core)
# :rtype: float
# """
# try:
# limit = float(value)
# except ValueError:
# raise CGroupsException('CPU Limit must be convertible to a float')
#
# if limit <= float(0) or limit > float(CGroupConfigurator.get_num_cores() * 100):
# raise CGroupsException('CPU Limit must be between 0 and 100 * numCores')
#
# return limit / 100.0
# def set_cpu_limit(self, limit=None):
# """
# Limit this cgroup to a percentage of a single core. limit=10 means 10% of one core; 150 means 150%, which
# is useful only in multi-core systems.
# To limit a cgroup to utilize 10% of a single CPU, use the following commands:
# # echo 10000 > /cgroup/cpu/red/cpu.cfs_quota_us
# # echo 100000 > /cgroup/cpu/red/cpu.cfs_period_us
#
# :param limit:
# """
# if not CGroupConfigurator.enabled():
# return
#
# if limit is None:
# return
#
# if 'cpu' in self.cgroups:
# total_units = float(self.get_parameter('cpu', 'cpu.cfs_period_us'))
# limit_units = int(self._convert_cpu_limit_to_fraction(limit) * total_units)
# cpu_shares_file = self._get_cgroup_file('cpu', 'cpu.cfs_quota_us')
# logger.verbose("writing {0} to {1}".format(limit_units, cpu_shares_file))
# fileutil.write_file(cpu_shares_file, '{0}\n'.format(limit_units))
# else:
# raise CGroupsException("CPU controller not available in this cgroup")
#
# @staticmethod
# def get_num_cores():
# """
# Return the number of CPU cores exposed to this system.
#
# :return: int
# """
# return CGroupConfigurator._osutil.get_processor_cores()
#
# @staticmethod
# def _format_memory_value(unit, limit=None):
# units = {'bytes': 1, 'kilobytes': 1024, 'megabytes': 1024*1024, 'gigabytes': 1024*1024*1024}
# if unit not in units:
# raise CGroupsException("Unit must be one of {0}".format(units.keys()))
# if limit is None:
# value = MEMORY_DEFAULT
# else:
# try:
# limit = float(limit)
# except ValueError:
# raise CGroupsException('Limit must be convertible to a float')
# else:
# value = int(limit * units[unit])
# return value
#
# def set_memory_limit(self, limit=None, unit='megabytes'):
# if 'memory' in self.cgroups:
# value = self._format_memory_value(unit, limit)
# memory_limit_file = self._get_cgroup_file('memory', 'memory.limit_in_bytes')
# logger.verbose("writing {0} to {1}".format(value, memory_limit_file))
# fileutil.write_file(memory_limit_file, '{0}\n'.format(value))
# else:
# raise CGroupsException("Memory controller not available in this cgroup")
#
# class CGroupsLimits(object):
# @staticmethod
# def _get_value_or_default(name, threshold, limit, compute_default):
# return threshold[limit] if threshold and limit in threshold else compute_default(name)
#
# def __init__(self, cgroup_name, threshold=None):
# self.cpu_limit = self._get_value_or_default(cgroup_name, threshold, "cpu", CGroupsLimits.get_default_cpu_limits)
# self.memory_limit = self._get_value_or_default(cgroup_name, threshold, "memory",
# CGroupsLimits.get_default_memory_limits)
#
# @staticmethod
# def get_default_cpu_limits(cgroup_name):
# # default values
# cpu_limit = DEFAULT_CPU_LIMIT_EXT
# if AGENT_CGROUP_NAME.lower() in cgroup_name.lower():
# cpu_limit = DEFAULT_CPU_LIMIT_AGENT
# return cpu_limit
#
# @staticmethod
# def get_default_memory_limits(cgroup_name):
# os_util = get_osutil()
#
# # default values
# mem_limit = max(DEFAULT_MEM_LIMIT_MIN_MB, round(os_util.get_total_mem() * DEFAULT_MEM_LIMIT_PCT / 100, 0))
#
# # agent values
# if AGENT_CGROUP_NAME.lower() in cgroup_name.lower():
# mem_limit = min(DEFAULT_MEM_LIMIT_MAX_MB, mem_limit)
# return mem_limit
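
The commented-out limit code above converts a CPU limit in percent of one core into a `cpu.cfs_quota_us` value relative to `cpu.cfs_period_us`, and a memory limit into bytes. A rough sketch of both conversions follows, under the assumption that the period is 100000 microseconds as in the `echo` example in the comment. The function names `cpu_quota_us` and `memory_limit_bytes` are hypothetical and do not exist in the agent.

```python
# Unit table taken from the commented-out _format_memory_value.
UNITS = {'bytes': 1, 'kilobytes': 1024, 'megabytes': 1024 ** 2, 'gigabytes': 1024 ** 3}


def cpu_quota_us(limit_percent, period_us=100000, cores=1):
    """Hypothetical helper: percent of one core -> value for cpu.cfs_quota_us."""
    limit = float(limit_percent)
    if limit <= 0 or limit > cores * 100:
        raise ValueError("CPU limit must be between 0 and 100 * cores")
    # Multiply before dividing so whole-number percentages stay exact.
    return int(limit * period_us / 100.0)


def memory_limit_bytes(limit, unit='megabytes'):
    """Hypothetical helper: limit in the given unit -> value for memory.limit_in_bytes."""
    if unit not in UNITS:
        raise ValueError("Unit must be one of {0}".format(sorted(UNITS)))
    return int(float(limit) * UNITS[unit])


print(cpu_quota_us(10))              # 10000, i.e. 10% of a 100000 us period
print(cpu_quota_us(150, cores=2))    # 150000, only meaningful on multi-core systems
print(memory_limit_bytes(256))       # 268435456
```

The sketch raises `ValueError` rather than `CGroupsException` only to stay self-contained.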