Recreate handler environment file on service startup #1960

larohra · 2020-07-23T00:58:03Z

Description

This PR addresses the following use cases:

Recreate the HandlerEnvironment.json file on service startup for existing extensions. This is needed in these scenarios:

When the Extension Telemetry Pipeline is enabled, the HandlerEnvironment.json file will reflect that it's enabled, so existing extensions can start sending telemetry right away.
If the feature is disabled in the future, we will remove the EventsFolder option from the HandlerEnvironment.json file to ensure extensions stop sending telemetry.

This PR also deletes all existing events/ directories for extensions in case the feature is disabled. This ensures that no extension keeps writing data to a pre-existing directory in the hope that the agent will eventually send it.
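
A minimal sketch of the behavior described above, not the agent's actual implementation. The property names come from the HandlerEnvironment constants in this PR; `build_handler_env`, `write_handler_env`, the `paths` dict, and the flat JSON layout are hypothetical and used only for illustration.

```python
import json


def build_handler_env(paths, telemetry_enabled):
    """Return the handler environment as a dict.

    The eventsFolder property is only present while the extension
    telemetry pipeline is enabled (hypothetical helper).
    """
    handler_env = {
        "logFolder": paths["log"],
        "configFolder": paths["config"],
        "statusFolder": paths["status"],
        "heartbeatFile": paths["heartbeat"],
    }
    if telemetry_enabled:
        handler_env["eventsFolder"] = paths["events"]
    return handler_env


def write_handler_env(target_file, handler_env):
    """Overwrite target_file, so a stale eventsFolder entry cannot survive a restart."""
    with open(target_file, "w") as env_file:
        json.dump(handler_env, env_file, indent=2)
```

Because the file is rewritten on every startup, flipping the feature flag between two agent versions is enough to add or remove the eventsFolder property for extensions that are already installed.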

Issue #

PR information

The title of the PR is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
If applicable, the PR references the bug/issue that it fixes in the description.
New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines

I have read the contribution guidelines.

codecov · 2020-07-23T01:00:05Z

Codecov Report

Merging #1960 into develop will increase coverage by 1.01%.
The diff coverage is 81.96%.

@@             Coverage Diff             @@
##           develop    #1960      +/-   ##
===========================================
+ Coverage    69.67%   70.69%   +1.01%     
===========================================
  Files           85       85              
  Lines        12028    12055      +27     
  Branches      1680     1685       +5     
===========================================
+ Hits          8381     8522     +141     
+ Misses        3273     3152     -121     
- Partials       374      381       +7

Impacted Files	Coverage Δ
azurelinuxagent/ga/update.py	`88.13% <69.56%> (+0.54%)`	⬆️
azurelinuxagent/ga/exthandlers.py	`87.50% <89.47%> (+0.75%)`	⬆️
azurelinuxagent/common/protocol/wire.py	`76.86% <0.00%> (+0.12%)`	⬆️
azurelinuxagent/common/event.py	`86.52% <0.00%> (+0.42%)`	⬆️
azurelinuxagent/common/conf.py	`78.28% <0.00%> (+0.50%)`	⬆️
azurelinuxagent/ga/remoteaccess.py	`89.00% <0.00%> (+1.00%)`	⬆️
azurelinuxagent/common/osutil/default.py	`62.06% <0.00%> (+3.73%)`	⬆️
azurelinuxagent/common/utils/textutil.py	`66.51% <0.00%> (+4.97%)`	⬆️
azurelinuxagent/ga/env.py	`63.79% <0.00%> (+12.93%)`	⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bf58537...5bb7bea.

azurelinuxagent/ga/exthandlers.py

larohra · 2020-07-23T01:02:26Z

tests/ga/test_update.py

@@ -1479,6 +1486,118 @@ def test_telemetry_heartbeat_creates_event(self, patch_add_event, patch_info, *_
        self.assertTrue(any(call_args[0] == "[HEARTBEAT] Agent {0} is running as the goal state agent {1}"
                            for call_args in patch_info.call_args), "The heartbeat was not written to the agent's log")

+    @contextlib.contextmanager
+    def _get_update_handler(self, iterations=1, test_data=DATA_FILE):


Even though I modified an existing test class for these tests, I didn't change any of the existing tests, because most of them call or mock specific functions of the UpdateHandler and are built around that (not to mention that the TestUpdate class has a huge number of tests in it). Refactoring them would be a huge time investment, because I would first have to understand what each test is doing and then modify it accordingly. In the end I chose not to do that. If you feel it should be done as part of this PR, I can start working on it.

vivlingaiah · 2020-07-25T00:27:32Z

azurelinuxagent/ga/exthandlers.py

+                HandlerEnvironment.logFolder: self.get_log_dir(),
+                HandlerEnvironment.configFolder: self.get_conf_dir(),
+                HandlerEnvironment.statusFolder: self.get_status_dir(),
+                HandlerEnvironment.heartbeatFile: self.get_heartbeat_file()
            }

        if is_extension_telemetry_pipeline_enabled():


This seems to return the value of _ENABLE_EXTENSION_TELEMETRY_PIPELINE, which needs a new deployment if changed, correct? Is it possible to change the config dynamically so that it doesn't require a deployment (more flexibility)? Just thinking.

It was meant to be that specific and not that dynamic :)
I could also add the flag to the agent config file, but we didn't want to give customers the option to turn this off, as this is a very basic workflow needed by the extensions. If we do find an issue, we will turn it off from our end and do a subsequent deployment. That was the whole idea behind it.

Do you have any other ideas for making it more dynamic without opening up the flag to customers?

vivlingaiah · 2020-07-25T00:27:38Z

azurelinuxagent/ga/exthandlers.py

@@ -232,6 +232,16 @@ def get_exthandlers_handler(protocol):
    return ExtHandlersHandler(protocol)


+def list_agent_lib_directory(skip_agent_package=True):
+    for name in os.listdir(conf.get_lib_dir()):
+        path = os.path.join(conf.get_lib_dir(), name)


Call conf.get_lib_dir() once and reuse it?

This is actually an atomic operation, but good catch; this makes more sense. I'll implement this!
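
For illustration, a small sketch of the suggested change. The name `iter_lib_dir_paths` is hypothetical, and `get_lib_dir` is a stand-in callable for conf.get_lib_dir() so the snippet runs on its own; the real function's skip_agent_package handling is left out.

```python
import os


def iter_lib_dir_paths(get_lib_dir):
    """Yield the full path of every entry in the lib directory.

    get_lib_dir is called once and its result reused, instead of
    being called again for every entry inside the loop.
    """
    lib_dir = get_lib_dir()
    for name in os.listdir(lib_dir):
        yield os.path.join(lib_dir, name)
```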

vivlingaiah · 2020-07-25T00:27:57Z

azurelinuxagent/ga/exthandlers.py

            }

        if is_extension_telemetry_pipeline_enabled():
-            handler_env["eventsFolder"] = self.get_extension_events_dir()
+            handler_env[HandlerEnvironment.eventsFolder] = self.get_extension_events_dir()


this looks correct

vivlingaiah · 2020-07-25T00:32:35Z

azurelinuxagent/ga/exthandlers.py

+    configFolder = "configFolder"
+    statusFolder = "statusFolder"
+    heartbeatFile = "heartbeatFile"
+    eventsFolder = "eventsFolder"


Is this the value of the folder name or the property? Anyway, the name used for the folder on Windows is "Events".

This is the name of the property. The name of the folder would be events (lower case) on Linux too.

According to our document:

/var/log/azure/ExtensionName/events/ and C:\WindowsAzure\Logs\Plugin\ExtensionName\Events

It looks like we use lower case on Linux and capitalized case on Windows. Do you want to keep this or converge the names? I personally don't mind the difference, because each name follows the convention of its own OS (lower-case names are the standard on Linux, and capitalized names are the standard on Windows).

narrieta · 2020-08-04T02:22:21Z

azurelinuxagent/ga/update.py

+                    handler_instance.create_handler_env()
+
+            except Exception:
+                # Ignore errors if any


Why ignore errors here? Maybe log them?

Good point. We should ignore any errors we get from the get_ext_handler_instance_from_path_if_valid function, but we should log errors if we're unable to re-create the handler_env for whatever reason. Will add the change.
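
A rough sketch of that split, under stated assumptions: `recreate_handler_envs` and the `get_instance` callable are hypothetical stand-ins (the latter for get_ext_handler_instance_from_path_if_valid), and the standard logging module replaces the agent's own logger.

```python
import logging

logger = logging.getLogger("handler_env_sketch")


def recreate_handler_envs(paths, get_instance, log=logger):
    """Re-create the handler environment for every valid handler path.

    Errors while resolving a path are ignored; errors while re-creating
    the environment are logged. Returns the paths that failed.
    """
    failed = []
    for path in paths:
        try:
            instance = get_instance(path)
        except Exception:
            # Not a valid handler directory: nothing to report.
            continue
        if instance is None:
            continue
        try:
            instance.create_handler_env()
        except Exception as error:
            log.warning("Unable to re-create handler environment for %s: %s", path, error)
            failed.append(path)
    return failed
```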

azurelinuxagent/ga/exthandlers.py

narrieta

left a few comments/questions

pgombar

Minor comments.

azurelinuxagent/ga/exthandlers.py

pgombar · 2020-08-05T18:12:38Z

azurelinuxagent/ga/update.py

+            if not is_extension_telemetry_pipeline_enabled():
+                # If extension telemetry pipeline is disabled, ensure we delete all existing extension events directory
+                # because the agent will not be listening on those events.
+                extension_event_dirs = glob.glob(os.path.join(conf.get_ext_log_dir(), "*", EVENTS_DIRECTORY))


Why is * needed here? I thought the path for extension events is /var/log/azure/extension_name/events.

Ah, I think I got it. The * corresponds to any extension_name.

Yeah, exactly. It's there to capture every extension events dir and delete it.
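
To make the wildcard concrete, here is a self-contained sketch of the same glob pattern. `delete_extension_events_dirs` is a hypothetical helper, `ext_log_dir` stands in for conf.get_ext_log_dir(), and the "events" value for EVENTS_DIRECTORY is assumed from the folder name mentioned in this review.

```python
import glob
import os
import shutil

EVENTS_DIRECTORY = "events"  # assumed value of the constant used in the diff


def delete_extension_events_dirs(ext_log_dir):
    """Delete <ext_log_dir>/<any extension name>/events and return the deleted paths."""
    # "*" matches exactly one path component, i.e. each extension's log directory.
    pattern = os.path.join(ext_log_dir, "*", EVENTS_DIRECTORY)
    removed = []
    for events_dir in glob.glob(pattern):
        shutil.rmtree(events_dir, ignore_errors=True)
        removed.append(events_dir)
    return removed
```

Only the events directory one level below each extension directory is matched; the extension's other log content is left alone.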

pgombar · 2020-08-05T18:14:14Z

azurelinuxagent/ga/update.py

+                for ext_dir in extension_event_dirs:
+                    shutil.rmtree(ext_dir, ignore_errors=True)
+        except Exception as e:
+            logger.warn("Error when trying to delete existing Extension events directory. Error: {0}".format(ustr(e)))


Just a thought: do you also want to send out telemetry while the feature is being stabilized, to know what the issues are?

Yup, the error-specific telemetry is added in PR #1918 :)

pgombar · 2020-08-05T18:44:42Z

tests/ga/test_update.py

+                            finally:
+                                # Since PropertyMock requires us to mock the type(ClassName).property of the object,
+                                # reverting it back to keep the state of the test clean
+                                type(update_handler).running = True


Couldn't you also reset the iterations counter here?

This block (finally) is only hit when we exit the block where we initialize the update handler. The whole idea was to re-use the same object without creating a new mock_update_handler for every small run. Does that answer your question? I can explain more offline if needed

Got it! Thanks.
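
For readers unfamiliar with the pattern, a minimal sketch of mocking a property on the type of an object and reverting it afterwards. `Worker` and `run_for_iterations` are hypothetical stand-ins for the update handler and the test helper, and this uses unittest.mock from Python 3 rather than whatever mock package the test suite imports.

```python
from unittest.mock import PropertyMock


class Worker(object):
    """Stand-in for a handler whose main loop runs while self.running is true."""
    running = True

    def run(self):
        iterations = 0
        while self.running:
            iterations += 1
        return iterations


def run_for_iterations(worker, iterations):
    """Let the loop run `iterations` times by mocking the class-level attribute."""
    # PropertyMock is a descriptor, so it has to be set on type(worker), not on the instance.
    type(worker).running = PropertyMock(side_effect=[True] * iterations + [False])
    try:
        return worker.run()
    finally:
        # Revert the class attribute so later runs start from a clean state.
        type(worker).running = True
```

The same worker object can then be reused for several short runs, each with its own iteration count.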

pgombar · 2020-08-05T18:53:11Z

tests/ga/test_update.py

+                    for ext_dir in expected_events_dirs:
+                        self.assertFalse(os.path.exists(ext_dir), "Extension directory {0} still exists!".format(ext_dir))
+
+    def test_it_should_retain_events_directories_if_extension_telemetry_pipeline_enabled(self):


nit: "extension events" instead of just "events" in the method name

larohra · 2020-08-14T00:17:53Z

Successful DCR run - https://tuxgold.corp.microsoft.com/job/AzLinux/job/DungeonCrawler/job/DungeonCrawler.larohra/304/
(has 1 known issue)

larohra added 5 commits July 21, 2020 17:20

Recreate HandlerEnv on service startup and delete events folder.

7a9ebcd

Fix failing tests

5eccd13

Created a new framework for Update Handler and added new tests for the feature change

2011f33

Code cleanup

ba34a2a

Merge remote-tracking branch 'remotes/upstream/develop' into recreate-handler-env

fdd7139

larohra requested review from kevinclark19a, narrieta, pgombar and ZhidongPeng as code owners July 23, 2020 00:58

larohra commented Jul 23, 2020

vivlingaiah reviewed Jul 25, 2020

larohra added 2 commits July 31, 2020 10:47

Code cleanup

eb6b25a

Merge branch 'develop' into recreate-handler-env

8ae44d6

narrieta reviewed Aug 4, 2020

narrieta previously approved these changes Aug 4, 2020

pgombar previously approved these changes Aug 5, 2020

Addressed PR comments

b897747

larohra dismissed stale reviews from pgombar and narrieta via b897747 August 12, 2020 23:28

Merge remote-tracking branch 'upstream/develop' into recreate-handler-env (conflicts: tests/ga/test_extension.py)

9d616d7

narrieta previously approved these changes Aug 13, 2020

Code beautification

5bb7bea

larohra dismissed narrieta’s stale review via 5bb7bea August 13, 2020 22:13

pgombar approved these changes Aug 13, 2020

narrieta approved these changes Aug 14, 2020

larohra merged commit a4d6404 into Azure:develop Aug 14, 2020

larohra deleted the recreate-handler-env branch August 14, 2020 21:57
