
feat: Add telemetry event for uncaught exceptions #203

Merged
1 commit merged into mainline from marofke/telemetry-updates on Mar 20, 2024

Conversation

marofke (Contributor) commented on Mar 12, 2024:

What was the problem/requirement? (What/Why)

We don't have a mechanism to tell whether one version of the worker is more error-prone than others.

What was the solution? (How)

Capture telemetry on uncaught exceptions so we can get an idea of the kinds of errors customers are hitting that we aren't handling properly. Uses the changes from aws-deadline/deadline-cloud#205.
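For context, the mechanism boils down to catching otherwise-unhandled exceptions at the top of the agent and reporting only the exception type. A minimal sketch of the pattern, assuming the helper from deadline-cloud#205 is importable as shown; the import path and the run_worker entrypoint are illustrative, not the agent's actual code:

```python
import logging

# Helper added in aws-deadline/deadline-cloud#205; this import path is an
# assumption for illustration.
from deadline.client.api import record_uncaught_exception_telemetry_event

_logger = logging.getLogger(__name__)


def run_worker() -> None:
    # Hypothetical stand-in for the worker agent's real entrypoint.
    raise RuntimeError("boom")


def main() -> None:
    try:
        run_worker()
    except Exception as e:
        # Log the full traceback locally, but send only the exception type
        # (e.g. "<class 'RuntimeError'>") as a telemetry event.
        _logger.critical(e, exc_info=True)
        record_uncaught_exception_telemetry_event(exception_type=str(type(e)))
        raise
```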

What is the impact of this change?

We get a better idea of whether customers are hitting unintended errors while using the worker.

How was this change tested?

  • Ran the agent locally and confirmed it wrote telemetry to the backend
  • Unit tests pass, using the changes from deadline-cloud mainline

Was this change documented?

Updated the README

Is this a breaking change?

No

@@ -180,6 +181,7 @@ def filter(self, record: logging.LogRecord) -> bool:
             raise
         else:
             _logger.critical(e, exc_info=True)
+            record_uncaught_exception_telemetry_event(exception_type=str(type(e)))
Contributor commented:

This is good for the main thread, but there are other threads in flight. An exception in one of those may not propagate to the main thread. It'd be worth discussing with @jusiskin to see where else the agent should be recording these events.
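For reference, one way the agent could later cover non-main threads is CPython's `threading.excepthook` (Python 3.8+), which fires when an exception escapes `Thread.run()` and would otherwise never reach the main thread's handler. A hedged sketch, not part of this PR; the telemetry import path is assumed:

```python
import logging
import threading

# Import path assumed for illustration; the helper comes from deadline-cloud#205.
from deadline.client.api import record_uncaught_exception_telemetry_event

_logger = logging.getLogger(__name__)


def _thread_excepthook(args: threading.ExceptHookArgs) -> None:
    # Exceptions that escape Thread.run() never propagate to the main
    # thread's try/except, so record them here instead.
    thread_name = args.thread.name if args.thread is not None else "<unknown>"
    _logger.critical(
        "Uncaught exception in thread %s",
        thread_name,
        exc_info=(args.exc_type, args.exc_value, args.exc_traceback),
    )
    record_uncaught_exception_telemetry_event(exception_type=str(args.exc_type))


# Install once at agent startup, before any worker threads are spawned.
threading.excepthook = _thread_excepthook
```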

marofke (Author) replied on Mar 19, 2024:

I did speak with Josh, and he figured this was a good place to start; we can expand on it in the future.

marofke (Author) added:

To quote Josh:

there are other places where we handle exceptions and try to be resilient. those we’d have to instrument as SMEs

So essentially, let's get this in for now, and we can improve on all of the nitty-gritty error cases when we have the time.

Contributor replied:

Works for me. Something's better than nothing.

@marofke force-pushed the marofke/telemetry-updates branch 3 times, most recently from 52a4142 to da2ac0c on March 19, 2024 at 18:48
@marofke marked this pull request as ready for review on March 19, 2024 at 19:21
@marofke requested a review from a team as a code owner on March 19, 2024 at 19:21
Signed-off-by: Caden Marofke <marofke@amazon.com>
@marofke merged commit 9a17a07 into mainline on Mar 20, 2024
12 checks passed
@marofke deleted the marofke/telemetry-updates branch on March 20, 2024 at 15:09
@marofke changed the title from "fix: Add telemetry event for uncaught exceptions" to "feat: Add telemetry event for uncaught exceptions" on Mar 20, 2024
baxeaz pushed a commit that referenced this pull request Mar 23, 2024
Signed-off-by: Caden Marofke <marofke@amazon.com>