emit failure type in attempt_failure_by_origin #20349

Merged
merged 4 commits into from
Dec 12, 2022

@cgardens cgardens commented Dec 11, 2022

What

  • We emit attempt failures by origin (i.e. where in the system the failure was caused), but we do not include the failure type. This PR adds it.
  • I'm adding it so that the failure type shows up in our DD dashboards and we can cut by it.
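
To illustrate what "cutting by failure type" means for a tagged counter, here is a minimal, self-contained sketch. It does not use Airbyte's metric client: `TaggedCounter`, its methods, and the tag keys and values (`failure_origin`, `failure_type`, `source`, `config_error`, and so on) are hypothetical stand-ins chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a tagged counter metric; not Airbyte's MetricClient. */
class TaggedCounter {

  // Each distinct combination of tag values gets its own running total.
  private final Map<Map<String, String>, Long> series = new HashMap<>();

  /** Adds amount to the series identified by the given key/value tag pairs. */
  void count(final long amount, final String... keyValuePairs) {
    if (keyValuePairs.length % 2 != 0) {
      throw new IllegalArgumentException("tags must be given as key/value pairs");
    }
    final Map<String, String> tags = new TreeMap<>();
    for (int i = 0; i < keyValuePairs.length; i += 2) {
      tags.put(keyValuePairs[i], keyValuePairs[i + 1]);
    }
    series.merge(tags, amount, Long::sum);
  }

  /** Sums every series that carries the given tag value, i.e. "cuts" the metric by that tag. */
  long sumWhere(final String key, final String value) {
    return series.entrySet().stream()
        .filter(entry -> value.equals(entry.getKey().get(key)))
        .mapToLong(Map.Entry::getValue)
        .sum();
  }
}
```

With only an origin tag, `sumWhere("failure_type", ...)` would always return 0 because no series carries that key. Once each increment also carries a failure type tag, the same counter can be grouped by origin, by type, or by both.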

@octavia-squidington-iv octavia-squidington-iv added area/platform issues related to the platform area/worker Related to worker labels Dec 11, 2022
@cgardens cgardens temporarily deployed to more-secrets December 11, 2022 21:56 — with GitHub Actions Inactive
@@ -521,11 +521,13 @@ private void trackFailures(final AttemptFailureSummary failureSummary) {
     if (failureSummary != null) {
       for (final FailureReason reason : failureSummary.getFailures()) {
         MetricClientFactory.getMetricClient().count(OssMetricsRegistry.ATTEMPT_FAILED_BY_FAILURE_ORIGIN, 1,
-            new MetricAttribute(MetricTags.FAILURE_ORIGIN, MetricTags.getFailureOrigin(reason.getFailureOrigin())));
+            new MetricAttribute(MetricTags.FAILURE_ORIGIN, MetricTags.getFailureOrigin(reason.getFailureOrigin())),
+            new MetricAttribute(MetricTags.FAILURE_TYPE, MetricTags.getFailureType(reason.getFailureType())));

Could we also add this to the traceFailures method right above the trackFailures one? That would add the information as a facet on the APM trace in addition to the metric.

+1, this sounds like a good idea
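
The suggestion above could look roughly like the following sketch. The body of traceFailures is not shown in this PR, so everything here is an assumption: `SketchFailure`, `TraceTagSketch`, `failureTags`, and the tag keys `failure_origins` and `failure_types` are hypothetical names, and the returned map stands in for whatever tags the real method attaches to the APM trace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical, simplified failure record; the real FailureReason carries more fields. */
final class SketchFailure {
  final String origin;
  final String type;

  SketchFailure(final String origin, final String type) {
    this.origin = origin;
    this.type = type;
  }
}

class TraceTagSketch {

  /** Builds trace tags listing the distinct failure origins and failure types of an attempt. */
  static Map<String, String> failureTags(final List<SketchFailure> failures) {
    final Map<String, String> tags = new LinkedHashMap<>();
    if (failures == null || failures.isEmpty()) {
      return tags; // nothing failed, so there is nothing to tag
    }
    // Assumed tag keys; distinct() keeps first-seen order for a list-backed stream.
    tags.put("failure_origins",
        failures.stream().map(f -> f.origin).distinct().collect(Collectors.joining(",")));
    tags.put("failure_types",
        failures.stream().map(f -> f.type).distinct().collect(Collectors.joining(",")));
    return tags;
  }
}
```

Joining the values into one comma-separated tag per key is a design choice of this sketch: a trace covers a whole attempt, which can contain several failures, whereas the metric in trackFailures is incremented once per failure.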

@pmossman pmossman left a comment

Looks good to me! (Leaving an approval, assuming you'll decide whether it's worth adding the same thing to the traceFailures method, as per Jonathan's suggestion.)

@cgardens cgardens temporarily deployed to more-secrets December 12, 2022 22:17 — with GitHub Actions Inactive
@cgardens cgardens temporarily deployed to more-secrets December 12, 2022 22:22 — with GitHub Actions Inactive
@cgardens cgardens merged commit 0af6bd0 into master Dec 12, 2022
@cgardens cgardens deleted the cgardens/emit-failure-type branch December 12, 2022 23:06
4 participants