Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recreate handler environment file on service startup #1960

Merged
merged 10 commits into from
Aug 14, 2020

Conversation

larohra
Copy link
Contributor

@larohra larohra commented Jul 23, 2020

Description

This PR addresses the following use cases-

  • Recreate HandlerEnvironment.json file on service startup for existing extensions. This is needed in these scenarios -
  1. When the Extension Telemetry Pipeline is enabled, the HandlerEnvironment.json file will reflect that its enabled and the existing extensions can start sending out telemetry right away.
  2. If the feature is disabled in the future, we will remove the EventsFolder option from the HandlerEnvironment.json file to ensure extensions stop sending out telemetry.
  • This PR also deletes all existing events/ directories for extensions incase the feature is disabled. This is to ensure that no extension is writing data to a pre-exisiting directory in the hopes that the agent would send it eventually.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Jul 23, 2020

Codecov Report

Merging #1960 into develop will increase coverage by 1.01%.
The diff coverage is 81.96%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1960      +/-   ##
===========================================
+ Coverage    69.67%   70.69%   +1.01%     
===========================================
  Files           85       85              
  Lines        12028    12055      +27     
  Branches      1680     1685       +5     
===========================================
+ Hits          8381     8522     +141     
+ Misses        3273     3152     -121     
- Partials       374      381       +7     
Impacted Files Coverage Δ
azurelinuxagent/ga/update.py 88.13% <69.56%> (+0.54%) ⬆️
azurelinuxagent/ga/exthandlers.py 87.50% <89.47%> (+0.75%) ⬆️
azurelinuxagent/common/protocol/wire.py 76.86% <0.00%> (+0.12%) ⬆️
azurelinuxagent/common/event.py 86.52% <0.00%> (+0.42%) ⬆️
azurelinuxagent/common/conf.py 78.28% <0.00%> (+0.50%) ⬆️
azurelinuxagent/ga/remoteaccess.py 89.00% <0.00%> (+1.00%) ⬆️
azurelinuxagent/common/osutil/default.py 62.06% <0.00%> (+3.73%) ⬆️
azurelinuxagent/common/utils/textutil.py 66.51% <0.00%> (+4.97%) ⬆️
azurelinuxagent/ga/env.py 63.79% <0.00%> (+12.93%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf58537...5bb7bea. Read the comment docs.

azurelinuxagent/ga/exthandlers.py Show resolved Hide resolved
@@ -1479,6 +1486,118 @@ def test_telemetry_heartbeat_creates_event(self, patch_add_event, patch_info, *_
self.assertTrue(any(call_args[0] == "[HEARTBEAT] Agent {0} is running as the goal state agent {1}"
for call_args in patch_info.call_args), "The heartbeat was not written to the agent's log")

@contextlib.contextmanager
def _get_update_handler(self, iterations=1, test_data=DATA_FILE):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I modified an existing test class for these tests, I didnt change any of the existing tests because most of them are calling/mocking specific functions of the UpdateHandler and have the tests build around it (not to mention TestUpdate class has a huge amount of tests in it). The time investment would be huge to refactor these tests because I would have to fist understand what the test is doing and then modify the tests accordingly. In the end I chose not to do that. If you guys feel it should be done as part of this PR, I can start working on it.

HandlerEnvironment.logFolder: self.get_log_dir(),
HandlerEnvironment.configFolder: self.get_conf_dir(),
HandlerEnvironment.statusFolder: self.get_status_dir(),
HandlerEnvironment.heartbeatFile: self.get_heartbeat_file()
}

if is_extension_telemetry_pipeline_enabled():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to return value of _ENABLE_EXTENSION_TELEMETRY_PIPELINE, which needs a new deployment if changed, correct ? Is it possible to dynamically change the config so that it doesn't require deployment ? (more flexibility). Just thinking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was meant to be that specific and not that dynamic :)
Basically I can also add the flag to the agent config file but we didnt want to give the customers the option to turn this off as this is a very basic workflow needed by the extensions. If we do find an issue, we will turn it off from our end rather and do a successive deployment. That was the whole idea behind it.

Do you have any other ideas by any chance for making it more dynamic but at the same time not opening up the flag to the customers?

@@ -232,6 +232,16 @@ def get_exthandlers_handler(protocol):
return ExtHandlersHandler(protocol)


def list_agent_lib_directory(skip_agent_package=True):
for name in os.listdir(conf.get_lib_dir()):
path = os.path.join(conf.get_lib_dir(), name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call conf.get_lib_dir() once and reuse ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an atomic operation but good catch, this makes more sense, I'll implement this!

}

if is_extension_telemetry_pipeline_enabled():
handler_env["eventsFolder"] = self.get_extension_events_dir()
handler_env[HandlerEnvironment.eventsFolder] = self.get_extension_events_dir()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks correct

configFolder = "configFolder"
statusFolder = "statusFolder"
heartbeatFile = "heartbeatFile"
eventsFolder = "eventsFolder"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the value of the folder name or property ? Anyways, name used for the folder in Windows is "Events"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name for the property. The name for the folder would be events (lower case) in linux too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to our document -

/var/log/azure/ExtensionName/events/ and C:\WindowsAzure\Logs\Plugin\ExtensionName \Events

Looks like for Linux we use lower-case and for windows we use Capitalised-case. Do you want to keep this or converge the names? I personally dont mind the difference because both of them are specific to the specific OSes (like in linux the standard is to have lower-case names and in windows the standard is to have Capitalised-case.

handler_instance.create_handler_env()

except Exception:
# Ignore errors if any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ignore errors here? maybe log them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we should ignore any errors that we get from get_ext_handler_instance_from_path_if_valid function but we should log errors if we're unable to re-create hanlder_env for whatever reason. Will add the change

narrieta
narrieta previously approved these changes Aug 4, 2020
Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments/questions

pgombar
pgombar previously approved these changes Aug 5, 2020
Copy link
Contributor

@pgombar pgombar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

azurelinuxagent/ga/exthandlers.py Outdated Show resolved Hide resolved
if not is_extension_telemetry_pipeline_enabled():
# If extension telemetry pipeline is disabled, ensure we delete all existing extension events directory
# because the agent will not be listening on those events.
extension_event_dirs = glob.glob(os.path.join(conf.get_ext_log_dir(), "*", EVENTS_DIRECTORY))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is * needed here, I thought the path for extension events is: /var/log/azure/extension_name/events?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I got it. The * corresponds to any extension_name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, its to capture every extension event dir and delete it

for ext_dir in extension_event_dirs:
shutil.rmtree(ext_dir, ignore_errors=True)
except Exception as e:
logger.warn("Error when trying to delete existing Extension events directory. Error: {0}".format(ustr(e)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: do you want to also send out telemetry while the feature is being stabilized to know what the issues are?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, the error specific telemetries are added to the #1918 PR :)

finally:
# Since PropertyMock requires us to mock the type(ClassName).property of the object,
# reverting it back to keep the state of the test clean
type(update_handler).running = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you also reset the iterations counter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block (finally) is only hit when we exit the block where we initialize the update handler. The whole idea was to re-use the same object without creating a new mock_update_handler for every small run. Does that answer your question? I can explain more offline if needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks.

for ext_dir in expected_events_dirs:
self.assertFalse(os.path.exists(ext_dir), "Extension directory {0} still exists!".format(ext_dir))

def test_it_should_retain_events_directories_if_extension_telemetry_pipeline_enabled(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "extension events" instead of just "events" in the method name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@larohra larohra dismissed stale reviews from pgombar and narrieta via b897747 August 12, 2020 23:28
…-env

# Conflicts:
#	tests/ga/test_extension.py
narrieta
narrieta previously approved these changes Aug 13, 2020
@larohra
Copy link
Contributor Author

larohra commented Aug 14, 2020

@larohra larohra merged commit a4d6404 into Azure:develop Aug 14, 2020
@larohra larohra deleted the recreate-handler-env branch August 14, 2020 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants