-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extension telemetry pipeline #1918
Conversation
…n-telemetry-pipeline # Conflicts: # azurelinuxagent/ga/exthandlers.py
…n-telemetry-pipeline # Conflicts: # azurelinuxagent/common/event.py # azurelinuxagent/common/telemetryevent.py # tests/common/test_event.py
Codecov Report
@@ Coverage Diff @@
## develop #1918 +/- ##
===========================================
+ Coverage 70.69% 71.14% +0.45%
===========================================
Files 85 86 +1
Lines 12055 12285 +230
Branches 1685 1728 +43
===========================================
+ Hits 8522 8740 +218
- Misses 3152 3159 +7
- Partials 381 386 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comments. I still need to look at extension_telemetry.py and tests.
@@ -60,7 +61,8 @@ | |||
|
|||
_VALID_HANDLER_STATUS = ['Ready', 'NotReady', "Installing", "Unresponsive"] | |||
|
|||
HANDLER_NAME_PATTERN = re.compile(_HANDLER_PATTERN + r'$', re.IGNORECASE) | |||
HANDLER_NAME_PATTERN = re.compile(_HANDLER_NAME_PATTERN, re.IGNORECASE) | |||
HANDLER_COMPLETE_NAME_PATTERN = re.compile(_HANDLER_PATTERN + r'$', re.IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this in the previous PR, but is_extension_telemetry_pipeline_enabled should be defined elsewhere (extension_telemetry.py?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make more sense for this flag to be in extension_telemetry.py but I didnt put it there because that creates a circular dependency (because I need to read the HANDLER_NAME from exthandlers.py) and moving these globals (HANDLER_NAME, etc) outside doesn't make logical sense either since they're defined for the Extension Handlers.
To avoid all this confusion I just left the is_extension_telemetry_pipeline_enabled in exthandlers because that's where we are reading the flag. If you feel strongly about it I can move it to a common place though (maybe AgentGlobals.py or version.py) to avoid the circular dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more comments, still need to look at the tests
|
||
# Limits | ||
MAX_NUMBER_OF_EVENTS_PER_EXTENSION_PER_PERIOD = 300 | ||
EXTENSION_EVENT_FILE_MAX_SIZE = 4 * 1024 * 1024 # 4 MB = 4 * 1,048,576 Bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have similar limits for the agent? should they be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some limiters for the Agent events but they're more relaxed as compared to these. These are mainly needed to ensure that the extensions dont abuse the pipeline. I dont think these limiters are needed for the agent events.
event_file_path, convert_to_mb(event_file_size), | ||
convert_to_mb(self.EXTENSION_EVENT_FILE_MAX_SIZE)) | ||
logger.warn(msg) | ||
add_log_event(level=logger.LogLevel.WARNING, message=msg, forced=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add_log_event? (instead of add_event)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_event goes to the ExtensionEvents table and add_log_event goes to the GenericLogs table. I figured for the extension related stuff, it would make sense if all the data (including errors) are in 1 table so that its easier for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But above you are using add_event(op=WALAEventOperation.ExtensionTelemetryEventProcessing, message=msg, is_success=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is basically that any error thrown by the agent while processing Extension Telemetry will be logged in ExtensionEvents table and any errors thrown due to the telemetry events (malformed, limit exceeds, etc) will be logged in the GenericLogs table for easier lookup
azurelinuxagent/common/exception.py
Outdated
@@ -210,6 +210,28 @@ def __init__(self, msg=None, inner=None): | |||
super(ResourceGoneError, self).__init__(msg, inner) | |||
|
|||
|
|||
class RemoteAccessError(AgentError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like you need to update your branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still something funny with your branch. This exception was removed by #1935
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I dont think the merge happened properly/correctly. Even Travis is failing with some weird error. I'll try fixing it today properly if I get time. Will let you know once its ready for review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what went wrong here, I tried the merge again but somehow this is still there. I'll just cherry-pick that commit into my branch to get this change in. Apart from this I dont see anything out of ordinary
…try-pipeline # Conflicts: # tests/ga/test_extension.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, still haven't looked at the test code.
event_file_path, convert_to_mb(event_file_size), | ||
convert_to_mb(self.EXTENSION_EVENT_FILE_MAX_SIZE)) | ||
logger.warn(msg) | ||
add_log_event(level=logger.LogLevel.WARNING, message=msg, forced=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But above you are using add_event(op=WALAEventOperation.ExtensionTelemetryEventProcessing, message=msg, is_success=False)
|
||
# EventName maps to HandlerName + '-' + Version from event file | ||
expected_mapping = { | ||
GuestAgentGenericLogsSchema.EventName: ExtensionEventSchema.Version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confused between this line and the comment above, the definitions don't seem to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh yeah sorry for the confusion, in the code below I'm actually mapping it to "Name-Version". This dict is supposed to just be a placeholder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its happening in Line 433 of this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments on the tests, but otherwise LGTM. I think it's important to re-enable the special characters test since that scenario has been problematic so far.
…try-pipeline # Conflicts: # azurelinuxagent/ga/exthandlers.py # azurelinuxagent/ga/update.py # tests/ga/test_extension.py
Description
This PR contains the major changes for enabling Extension Telemetry pipeline for Extensions.
It adds a new thread to the Agent which comes alive every 5 mins and reads the events directory per extension and sends those extensions to Wireserver.
Issue #
PR information
Quality of Code and Contribution Guidelines