Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use common download logic for agent downloads #2682

Merged
merged 4 commits into from
Oct 18, 2022

Conversation

narrieta
Copy link
Member

@narrieta narrieta commented Oct 7, 2022

This PR completes the refactoring of the download logic to use a common implementation for all downloads.

The downloads on this PR are for the agent's package. I made one change of behavior: now we delete the ZIP file once we expand it (both for the agent and extensions). These ZIPs are not used after we expand them and we have received reports that the current cleanup logic does not always work. I did not remove the cleanup logic, since we still need to cleanup files downloaded by previous agents or the daemon.

Fixes #2489

@@ -152,13 +152,6 @@ def _fetch_manifest(self, manifest_type, name, uris):
except Exception as e:
raise ProtocolError("Failed to retrieve {0} manifest. Error: {1}".format(manifest_type, ustr(e)))

def download_extension(self, uris, destination, on_downloaded=lambda: True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I added this method here for consistency with fetch_extension_manifest above, but it really doesn't quite fit here so I'm removing it.

@@ -604,35 +606,39 @@ def hgap_download(uri):

return self._download_with_fallback_channel(download_type, uris, direct_download=direct_download, hgap_download=hgap_download)

def download_extension(self, uris, destination, use_verify_header, on_downloaded=lambda: True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed this method since it can be used to download the agent as well; also, now this method can only be used to download ZIPs (we don't download any other type of files) and the expansion of the package is done here, instead of in the caller.

raise
finally:
try:
os.remove(target_file)
Copy link
Member Author

@narrieta narrieta Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we remove the ZIP file after expansion

@codecov
Copy link

codecov bot commented Oct 7, 2022

Codecov Report

Merging #2682 (0bf267a) into develop (faa8b14) will increase coverage by 0.03%.
The diff coverage is 96.29%.

@@             Coverage Diff             @@
##           develop    #2682      +/-   ##
===========================================
+ Coverage    72.02%   72.06%   +0.03%     
===========================================
  Files          103      103              
  Lines        15740    15702      -38     
  Branches      2254     2237      -17     
===========================================
- Hits         11337    11315      -22     
+ Misses        3881     3872       -9     
+ Partials       522      515       -7     
Impacted Files Coverage Δ
azurelinuxagent/common/protocol/goal_state.py 94.76% <ø> (+0.19%) ⬆️
azurelinuxagent/common/protocol/wire.py 78.24% <93.10%> (+0.40%) ⬆️
azurelinuxagent/ga/exthandlers.py 85.67% <100.00%> (+0.02%) ⬆️
azurelinuxagent/ga/update.py 90.70% <100.00%> (+1.33%) ⬆️
azurelinuxagent/common/cgroupapi.py 81.00% <0.00%> (-2.80%) ⬇️
azurelinuxagent/common/protocol/hostplugin.py 87.76% <0.00%> (+0.53%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -1494,18 +1488,20 @@ def _reset_legacy_blacklisted_agents(self):


class GuestAgent(object):
def __init__(self, path=None, pkg=None, is_fast_track_goal_state=False, host=None):
def __init__(self, path, pkg, protocol, is_fast_track_goal_state):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface for init is confusing, since it can be used for 2 mutually exclusive purposes depending on the parameters given/omitted. I defined 'from_installed_agent' and 'from_installed_agent' that can be used instead of init with a (hopefully) clearer interface

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing this

@@ -1529,26 +1525,12 @@ def __init__(self, path=None, pkg=None, is_fast_track_goal_state=False, host=Non
self._ensure_downloaded()
self._ensure_loaded()
except Exception as e:
if isinstance(e, ResourceGoneError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceGoneError is already handled in the common download logic, no need to treat it as a special case here.

# encountered while downloading a later version. Errors of type
# socket.error are IOError, so this should provide sufficient
# protection against a large class of I/O operation failures.
if isinstance(e, IOError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seems to be leftover code that is not needed. There is a unit test that verifies we do not blacklist the agent on download error so these errors are handled correctly even without this check.

@@ -1637,39 +1632,10 @@ def _ensure_loaded(self):
self._load_error()

def _download(self):
uris_shuffled = self.pkg.uris
Copy link
Member Author

@narrieta narrieta Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we use the common download logic instead of this (as well as _fetch and _unpack below)

@@ -140,8 +141,6 @@ def test_cleanup_leaves_installed_extensions(self):
exthandlers_handler.run()
exthandlers_handler.report_ext_handlers_status()

self.assertEqual(no_of_exts, TestExtensionCleanup._count_packages(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ZIP package is now deleted after expansion so all these tests do not need to check the count of packages

@@ -163,7 +163,7 @@ def create_mock_protocol():
yield protocol

@patch("azurelinuxagent.common.protocol.healthservice.HealthService.report_host_plugin_versions")
@patch("azurelinuxagent.ga.update.restutil.http_get")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sad. We were mocking restutil.http_get via the update module, which is not even used by this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I have seen these few places, thanks for correcting.


from azurelinuxagent.common.datacontract import set_properties
from azurelinuxagent.common.exception import HttpError, ResourceGoneError
from azurelinuxagent.common.future import ustr, httpclient
from azurelinuxagent.common.utils import restutil
from tests.ga.test_update import ResponseMock
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple implementations of mock HTTP responses that make things confusing and cause some really weird test failures. Removing the usage of this one in favor of MockHttpResponse, which provides a more complete implementation

tests/ga/test_extension.py Show resolved Hide resolved
@@ -163,7 +163,7 @@ def create_mock_protocol():
yield protocol

@patch("azurelinuxagent.common.protocol.healthservice.HealthService.report_host_plugin_versions")
@patch("azurelinuxagent.ga.update.restutil.http_get")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I have seen these few places, thanks for correcting.

@@ -1494,18 +1488,20 @@ def _reset_legacy_blacklisted_agents(self):


class GuestAgent(object):
def __init__(self, path=None, pkg=None, is_fast_track_goal_state=False, host=None):
def __init__(self, path, pkg, protocol, is_fast_track_goal_state):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing this

try:
if os.path.isdir(self.get_agent_dir()):
shutil.rmtree(self.get_agent_dir(), ignore_errors=True)
if os.path.isfile(self.get_agent_pkg_path()):
os.remove(self.get_agent_pkg_path())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should keep this too which gives second chance to remove incase first attempt failed or didn't get to the unzip package path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still have the cleanup logic that runs periodically.

also, a new goal state would find that the ZIP is there and skip the download (and remove the ZIP after expanding it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the cleanup logic for agent ZIPs. When I check the vm, zips still there. I don't think we have cleanup thing.

I'm more worry about if some reason agent gets partial zip or corrupted zip while downloading and received exception before expanding it. In that case on next goal state, we would skip the download if that zip is still in the vm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bad download should be covered as well.

The previous logic was trying to delete on download or expand failures. That cleanup is done covered elsewhere: if we begin the download, start creating the file and then downloading the rest of the file fails, the WireClient.stream would delete the ZIP. If the download succeeds, but the ZIP is corrupt and expand fails, then WireClient._try_expand_zip_package would remove it.

I was mistaken in my previous comment: a new goal state does not check for the existence of the ZIP, but rather for the manifest file. If for any reason a bad ZIP is still there from a previous goal state, the new goal state would do the download and overwrite the bad ZIP.

Lastly, during agent update, UpdateHandler._purge_agents removes agent directories and ZIPs that are no longer in the goal state.

All of this maintains the same logic as before, except that some of the resposibilities have moved elsewhere as part of using the common download logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonds like we are covering all the cases, thanks for explaining.

def mock_cgroup_paths(*args, **kwargs):
if args and args[0] == "self":
relative_path = "{0}/{1}".format(cgroupconfigurator.LOGCOLLECTOR_SLICE, logcollector.CGROUPS_UNIT)
return (cgroupconfigurator.LOGCOLLECTOR_SLICE, relative_path)
return SystemdCgroupsApi.get_process_cgroup_relative_paths(*args, **kwargs)

with patch.object(SystemdCgroupsApi, "get_process_cgroup_relative_paths", mock_cgroup_paths):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mocks for the log collector tests need improvement. These 2 tests were failing on Ubuntu 22 because by default there no cgroup controllers mounted. Worked around the issue by moving the mock one level up the call stack.

self.assertFalse(agent.is_blacklisted)
self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted)

agent.mark_failure(is_fatal=True)
self.assertTrue(agent.is_blacklisted)
self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted)

@patch("azurelinuxagent.ga.update.GuestAgent._ensure_downloaded")
@patch("azurelinuxagent.ga.update.GuestAgent._ensure_loaded")
def test_resource_gone_error_not_blacklisted(self, mock_loaded, mock_downloaded): # pylint: disable=unused-argument
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceGone error is handled by the common download logic and GuestAgent does not handle it anymore so this test is not needed


@patch("azurelinuxagent.ga.update.GuestAgent._ensure_downloaded")
@patch("azurelinuxagent.ga.update.GuestAgent._ensure_loaded")
def test_ioerror_not_blacklisted(self, mock_loaded, mock_downloaded): # pylint: disable=unused-argument
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was testing a left over behavior that I removed. Removing as well.

'test_download_fail' checks that the agent is not blacklisted on download error

@@ -521,225 +485,115 @@ def test_mark_failure(self, mock_loaded, mock_downloaded): # pylint: disable=un
self.assertEqual(2, agent.error.failure_count)
self.assertTrue(agent.is_blacklisted)

@patch("azurelinuxagent.ga.update.GuestAgent._ensure_downloaded")
@patch("azurelinuxagent.ga.update.GuestAgent._ensure_loaded")
def test_unpack(self, mock_loaded, mock_downloaded): # pylint: disable=unused-argument
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unpack is done in the common logic, the test is no longer needed

self.assertFalse(agent.is_blacklisted)
self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted)

agent.mark_failure(is_fatal=True)
self.assertTrue(agent.is_blacklisted)
self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted)

@patch("azurelinuxagent.ga.update.GuestAgent._ensure_downloaded")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of those tests were mocking stuff that is not even executed by the test, seems like copy-and-paste implementation. the changes below remove those unneeded mocks

@patch("azurelinuxagent.ga.update.GuestAgent._ensure_loaded")
@patch("azurelinuxagent.ga.update.restutil.http_get")
@patch("azurelinuxagent.ga.update.restutil.http_post")
def test_download_fallback(self, mock_http_post, mock_http_get, mock_loaded, mock_downloaded): # pylint: disable=unused-argument
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was testing the logic that was removed to used the common download implementation

@@ -990,15 +844,6 @@ def test_find_agents_sorts(self):
self.assertTrue(v > a.version)
v = a.version

@patch('azurelinuxagent.common.protocol.wire.WireClient.get_host_plugin')
def test_get_host_plugin_returns_host_for_wireserver(self, mock_get_host):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tested method was removed. Removing test

try:
if os.path.isdir(self.get_agent_dir()):
shutil.rmtree(self.get_agent_dir(), ignore_errors=True)
if os.path.isfile(self.get_agent_pkg_path()):
os.remove(self.get_agent_pkg_path())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still have the cleanup logic that runs periodically.

also, a new goal state would find that the ZIP is there and skip the download (and remove the ZIP after expanding it)

tests/ga/test_extension.py Show resolved Hide resolved
@narrieta narrieta merged commit 527443c into Azure:develop Oct 18, 2022
@narrieta narrieta deleted the agent-download branch October 18, 2022 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants