-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recreate handler environment file on service startup #1960
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1960 +/- ##
===========================================
+ Coverage 69.67% 70.69% +1.01%
===========================================
Files 85 85
Lines 12028 12055 +27
Branches 1680 1685 +5
===========================================
+ Hits 8381 8522 +141
+ Misses 3273 3152 -121
- Partials 374 381 +7
Continue to review full report at Codecov.
|
@@ -1479,6 +1486,118 @@ def test_telemetry_heartbeat_creates_event(self, patch_add_event, patch_info, *_ | |||
self.assertTrue(any(call_args[0] == "[HEARTBEAT] Agent {0} is running as the goal state agent {1}" | |||
for call_args in patch_info.call_args), "The heartbeat was not written to the agent's log") | |||
|
|||
@contextlib.contextmanager | |||
def _get_update_handler(self, iterations=1, test_data=DATA_FILE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though I modified an existing test class for these tests, I didnt change any of the existing tests because most of them are calling/mocking specific functions of the UpdateHandler and have the tests build around it (not to mention TestUpdate class has a huge amount of tests in it). The time investment would be huge to refactor these tests because I would have to fist understand what the test is doing and then modify the tests accordingly. In the end I chose not to do that. If you guys feel it should be done as part of this PR, I can start working on it.
HandlerEnvironment.logFolder: self.get_log_dir(), | ||
HandlerEnvironment.configFolder: self.get_conf_dir(), | ||
HandlerEnvironment.statusFolder: self.get_status_dir(), | ||
HandlerEnvironment.heartbeatFile: self.get_heartbeat_file() | ||
} | ||
|
||
if is_extension_telemetry_pipeline_enabled(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to return value of _ENABLE_EXTENSION_TELEMETRY_PIPELINE, which needs a new deployment if changed, correct ? Is it possible to dynamically change the config so that it doesn't require deployment ? (more flexibility). Just thinking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was meant to be that specific and not that dynamic :)
Basically I can also add the flag to the agent config file but we didnt want to give the customers the option to turn this off as this is a very basic workflow needed by the extensions. If we do find an issue, we will turn it off from our end rather and do a successive deployment. That was the whole idea behind it.
Do you have any other ideas by any chance for making it more dynamic but at the same time not opening up the flag to the customers?
azurelinuxagent/ga/exthandlers.py
Outdated
@@ -232,6 +232,16 @@ def get_exthandlers_handler(protocol): | |||
return ExtHandlersHandler(protocol) | |||
|
|||
|
|||
def list_agent_lib_directory(skip_agent_package=True): | |||
for name in os.listdir(conf.get_lib_dir()): | |||
path = os.path.join(conf.get_lib_dir(), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call conf.get_lib_dir() once and reuse ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an atomic operation but good catch, this makes more sense, I'll implement this!
} | ||
|
||
if is_extension_telemetry_pipeline_enabled(): | ||
handler_env["eventsFolder"] = self.get_extension_events_dir() | ||
handler_env[HandlerEnvironment.eventsFolder] = self.get_extension_events_dir() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks correct
configFolder = "configFolder" | ||
statusFolder = "statusFolder" | ||
heartbeatFile = "heartbeatFile" | ||
eventsFolder = "eventsFolder" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the value of the folder name or property ? Anyways, name used for the folder in Windows is "Events"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the name for the property. The name for the folder would be events
(lower case) in linux too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to our document -
/var/log/azure/ExtensionName/events/ and C:\WindowsAzure\Logs\Plugin\ExtensionName \Events
Looks like for Linux we use lower-case and for windows we use Capitalised-case. Do you want to keep this or converge the names? I personally dont mind the difference because both of them are specific to the specific OSes (like in linux the standard is to have lower-case names and in windows the standard is to have Capitalised-case.
azurelinuxagent/ga/update.py
Outdated
handler_instance.create_handler_env() | ||
|
||
except Exception: | ||
# Ignore errors if any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ignore errors here? maybe log them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we should ignore any errors that we get from get_ext_handler_instance_from_path_if_valid
function but we should log errors if we're unable to re-create hanlder_env for whatever reason. Will add the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few comments/questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments.
if not is_extension_telemetry_pipeline_enabled(): | ||
# If extension telemetry pipeline is disabled, ensure we delete all existing extension events directory | ||
# because the agent will not be listening on those events. | ||
extension_event_dirs = glob.glob(os.path.join(conf.get_ext_log_dir(), "*", EVENTS_DIRECTORY)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is *
needed here, I thought the path for extension events is: /var/log/azure/extension_name/events
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I got it. The *
corresponds to any extension_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly, its to capture every extension event dir and delete it
for ext_dir in extension_event_dirs: | ||
shutil.rmtree(ext_dir, ignore_errors=True) | ||
except Exception as e: | ||
logger.warn("Error when trying to delete existing Extension events directory. Error: {0}".format(ustr(e))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: do you want to also send out telemetry while the feature is being stabilized to know what the issues are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, the error specific telemetries are added to the #1918 PR :)
finally: | ||
# Since PropertyMock requires us to mock the type(ClassName).property of the object, | ||
# reverting it back to keep the state of the test clean | ||
type(update_handler).running = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you also reset the iterations counter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block (finally
) is only hit when we exit the block where we initialize the update handler. The whole idea was to re-use the same object without creating a new mock_update_handler for every small run. Does that answer your question? I can explain more offline if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Thanks.
tests/ga/test_update.py
Outdated
for ext_dir in expected_events_dirs: | ||
self.assertFalse(os.path.exists(ext_dir), "Extension directory {0} still exists!".format(ext_dir)) | ||
|
||
def test_it_should_retain_events_directories_if_extension_telemetry_pipeline_enabled(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "extension events" instead of just "events" in the method name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…-env # Conflicts: # tests/ga/test_extension.py
Successful DCR run - https://tuxgold.corp.microsoft.com/job/AzLinux/job/DungeonCrawler/job/DungeonCrawler.larohra/304/ |
Description
This PR addresses the following use cases-
Issue #
PR information
Quality of Code and Contribution Guidelines