Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix self-update frequency to spread over 24 hrs for regular type and 4 hrs for hotfix #2948

Merged
merged 4 commits into from
Oct 23, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 36 additions & 42 deletions azurelinuxagent/ga/agent_update_handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ def __init__(self, protocol):
self._is_requested_version_update = True # This is to track the current update type(requested version or self update)
self.update_state = AgentUpdateHandlerUpdateState()

def __should_update_agent(self, requested_version):
def __check_if_agent_update_allowed_and_update_next_upgrade_times(self, requested_version):
narrieta marked this conversation as resolved.
Show resolved Hide resolved
"""
requested version update:
update is allowed once per (as specified in the conf.get_autoupdate_frequency())
Expand All @@ -70,24 +70,20 @@ def __should_update_agent(self, requested_version):
if next_attempt_time > now:
return False
# The time limit elapsed for us to allow updates.
self.update_state.last_attempted_requested_version_update_time = now
return True
else:
next_hotfix_time, next_normal_time = self.__get_next_upgrade_times(now)
upgrade_type = self.__get_agent_upgrade_type(requested_version)

if (upgrade_type == AgentUpgradeType.Hotfix and next_hotfix_time <= now) or (
upgrade_type == AgentUpgradeType.Normal and next_normal_time <= now):
# Update the last upgrade check time even if no new agent is available for upgrade
self.update_state.last_attempted_hotfix_update_time = now
self.update_state.last_attempted_normal_update_time = now
return True
return False

def __update_last_attempt_update_times(self):
now = datetime.datetime.now()
if self._is_requested_version_update:
self.update_state.last_attempted_requested_version_update_time = now
else:
self.update_state.last_attempted_normal_update_time = now
self.update_state.last_attempted_hotfix_update_time = now

def __should_agent_attempt_manifest_download(self):
"""
The agent should attempt to download the manifest if
Expand Down Expand Up @@ -309,6 +305,7 @@ def run(self, goal_state):
if not self.__should_agent_attempt_manifest_download():
return
if conf.get_enable_ga_versioning(): # log the warning only when ga versioning is enabled
# TODO: Need to revisit this msg when version is missing in Goal state. We may need to handle better way to report the error
warn_msg = "Missing requested version in agent family: {0} for incarnation: {1}, fallback to largest version update".format(self._ga_family, self._gs_id)
GAUpdateReportState.report_error_msg = warn_msg
agent_manifest = goal_state.fetch_agent_manifest(agent_family.name, agent_family.uris)
Expand All @@ -322,49 +319,46 @@ def run(self, goal_state):
if "Missing requested version" in GAUpdateReportState.report_error_msg:
GAUpdateReportState.report_error_msg = ""

if requested_version == CURRENT_VERSION:
# Check if an update is allowed and update next upgrade times even if no new agent is available for upgrade
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the warning at line 308?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if some reason requested version is empty or missing in the goal state. we still want to update with old logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that is the case, since the check is on the boolean flag, and if the boolean is true, the version cannot be missing

Copy link
Contributor Author

@nagworld9 nagworld9 Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean flag is local to agent. If we enable the GA versioning but the version property is empty in GS for whatever reason (may be issue in CRP), in that case, we log the warning and continue with self-update.

Copy link
Member

@narrieta narrieta Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that behavior is correct. If CRP is indicating that the machine is enrolled to RSM updates, we should never fallback to self updates (unless the CRP flag changes, of course).

If the flag is True and the version is missing, then, as you point out, it may be an issue on CRP. We should then report this error and skip the update.

Copy link
Contributor Author

@nagworld9 nagworld9 Oct 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially this flag is not there when we implemented. That time we decided to do like this because things are in flux and if issue in crp and fix takes days to reach prod, till then we don't do the update. That's not good for customer without update so we want to continue with self-update and report status with this msg as RSM update has error.

After all that discussion today we think we shouldn't update sure I'll change it. But do you think is this issue in current release?

I'll consider changing as part of RSM stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the code and add it when it is needed, or comment it out or mark it somehow. Otherwise this may be missed in a future review. Marking it facilitates reviews.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment to revisit this.

if not self.__check_if_agent_update_allowed_and_update_next_upgrade_times(requested_version):
return

# Check if an update is allowed
if not self.__should_update_agent(requested_version):
if requested_version == CURRENT_VERSION:
return

if warn_msg != "":
self.__log_event(LogLevel.WARNING, warn_msg)

try:
# Downgrades are not allowed for self-update version
# Added it in try block after agent update timewindow check so that we don't log it too frequently
if not self.__check_if_downgrade_is_requested_and_allowed(requested_version):
return

daemon_version = self.__get_daemon_version_for_update()
if requested_version < daemon_version:
# Don't process the update if the requested version is less than daemon version,
# as historically we don't support downgrades below daemon versions. So daemon will not pickup that requested version rather start with
# installed latest version again. When that happens agent go into loop of downloading the requested version, exiting and start again with same version.
#
raise AgentUpdateError("The Agent received a request to downgrade to version {0}, but downgrading to a version less than "
"the Agent installed on the image ({1}) is not supported. Skipping downgrade.".format(requested_version, daemon_version))

msg = "Goal state {0} is requesting a new agent version {1}, will update the agent before processing the goal state.".format(
self._gs_id, str(requested_version))
self.__log_event(LogLevel.INFO, msg)

agent = self.__download_and_get_agent(goal_state, agent_family, agent_manifest, requested_version)
# Downgrades are not allowed for self-update version
if not self.__check_if_downgrade_is_requested_and_allowed(requested_version):
return

if agent.is_blacklisted or not agent.is_downloaded:
msg = "Downloaded agent version is in bad state : {0} , skipping agent update".format(
str(agent.version))
self.__log_event(LogLevel.WARNING, msg)
return
daemon_version = self.__get_daemon_version_for_update()
if requested_version < daemon_version:
# Don't process the update if the requested version is less than daemon version,
# as historically we don't support downgrades below daemon versions. So daemon will not pickup that requested version rather start with
# installed latest version again. When that happens agent go into loop of downloading the requested version, exiting and start again with same version.
#
raise AgentUpdateError("The Agent received a request to downgrade to version {0}, but downgrading to a version less than "
"the Agent installed on the image ({1}) is not supported. Skipping downgrade.".format(requested_version, daemon_version))

# Todo: Need to update the message when we fix RSM stuff
msg = "Self-update discovered new agent version:{0} in agent manifest for goal state {1}, will update the agent before processing the goal state.".format(
str(requested_version), self._gs_id)
self.__log_event(LogLevel.INFO, msg)

agent = self.__download_and_get_agent(goal_state, agent_family, agent_manifest, requested_version)

if agent.is_blacklisted or not agent.is_downloaded:
msg = "Downloaded agent version is in bad state : {0} , skipping agent update".format(
str(agent.version))
self.__log_event(LogLevel.WARNING, msg)
return

# We delete the directory and the zip package from the filesystem except current version and target version
self.__purge_extra_agents_from_disk(CURRENT_VERSION, known_agents=[agent])
self.__proceed_with_update(requested_version)
# We delete the directory and the zip package from the filesystem except current version and target version
self.__purge_extra_agents_from_disk(CURRENT_VERSION, known_agents=[agent])
self.__proceed_with_update(requested_version)

finally:
self.__update_last_attempt_update_times()

except Exception as err:
if isinstance(err, AgentUpgradeExitException):
Expand Down
2 changes: 1 addition & 1 deletion tests/ga/test_agent_update_handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ def __assert_agent_directories_exist_and_others_dont_exist(self, versions):

def __assert_agent_requested_version_in_goal_state(self, mock_telemetry, inc=1, version="9.9.9.10"):
upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if
'Goal state incarnation_{0} is requesting a new agent version {1}'.format(inc, version) in kwarg['message'] and kwarg[
'discovered new agent version:{0} in agent manifest for goal state incarnation_{1}'.format(version, inc) in kwarg['message'] and kwarg[
'op'] == WALAEventOperation.AgentUpgrade]
self.assertEqual(1, len(upgrade_event_msgs),
"Did not find the event indicating that the agent requested version found. Got: {0}".format(
Expand Down
2 changes: 1 addition & 1 deletion tests/ga/test_update.py
Original file line number Diff line number Diff line change
Expand Up @@ -1593,7 +1593,7 @@ def test_it_should_not_update_if_requested_version_not_found_in_manifest(self):
kwarg['op'] in (WALAEventOperation.AgentUpgrade, WALAEventOperation.Download)]
# This will throw if corresponding message not found so not asserting on that
requested_version_found = next(kwarg for kwarg in agent_msgs if
"Goal state incarnation_1 is requesting a new agent version 5.2.1.0, will update the agent before processing the goal state" in kwarg['message'])
"discovered new agent version:5.2.1.0 in agent manifest for goal state incarnation_1, will update the agent before processing the goal state" in kwarg['message'])
self.assertTrue(requested_version_found['is_success'],
"The requested version found op should be reported as a success")

Expand Down
Loading