Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Health Metrics for data-pipeline + Dogstatsd-Client Crate #638

Merged
merged 51 commits into from
Sep 24, 2024

Conversation

ajgajg1134
Copy link
Contributor

@ajgajg1134 ajgajg1134 commented Sep 18, 2024

What does this PR do?

Add some health metrics to the datapipeline-crate to assist with debugging any issues in customer environments. To support this the dogstatsd-client code from the sidecar has been promoted to a separate crate where it can be utilized by both.

Motivation

health metrics are good for debugging, we've used them successfully in other tracers.

Additional Notes

Unfortunately there is a bit of new code duplication in the dogstatsd-client crate. This was to get around some very difficult to read rust syntax in trying to write a DogStatsDAction type that could accept Vec<Tag>, &[&Tag] and a chained iterator of &Tag. Doing this turned into a very difficult task that seems harder than it's worth, if there are any rust experts reviewing this who want to take a stab please feel free to do so, just ensure that your resulting code is able to be used by BOTH the sidecar and data-pipeline as these crates both use the client with different types.

How to test the change?

Unit tests here provide good coverage for these implementations.

Ticket: APMSP-1248

Copy link
Contributor

@pierotibou pierotibou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a functional standpoint following @ajgajg1134 's answers.

Copy link
Contributor

@VianneyRuhlmann VianneyRuhlmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just a few suggestions on design alternatives

Cargo.toml Show resolved Hide resolved
dogstatsd-client/src/lib.rs Outdated Show resolved Hide resolved
@@ -197,13 +235,29 @@ impl TraceExporter {
})
}

/// Emit a health metric to dogstatsd
fn emit_metric(&self, metric: HealthMetric, custom_tags: Option<Vec<&Tag>>) {
Copy link
Contributor

@VianneyRuhlmann VianneyRuhlmann Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is worth it performance wise. But using a closure to lazily create the metric and tags can avoid computation and allocation when stats metrics are disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely are metrics to be disabled? If there is a decent chance we definitely should avoid any extra computation and allocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, probably delaying the allocation would be beneficial if health metrics can be configurable by the customer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow how one would do this / where the performance benefit would be? If the client is None then this function does nothing and just returns? 🤔 (or do you mean allocating the HealthMetric type?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the HealthMetric and Vec still need to be allocated. Another option would be to use a macro to add a if flusher.is_some() {...} everywhere we create metrics

sidecar/src/service/sidecar_server.rs Show resolved Hide resolved
dogstatsd-client/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@ekump ekump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only remaining concern is the one @VianneyRuhlmann raised. We shouldn't be calculating metrics at all if the functionality is disabled...assuming there may be a non-trivial amount of situations where metrics are disabled. But that can be addressed by a future PR.

Comment on lines +351 to +354
let tags = match custom_tags {
None => Either::Left(&self.common_stats_tags),
Some(custom) => Either::Right(self.common_stats_tags.iter().chain(custom)),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a personal preference because I tend to find the use of Either rather obscure. Wouldn't it be better to define a more descriptive enum type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I definitely don't love having to use it here but I couldn't find a way around it. I'm also not a huge fan of a more descriptive type since that type would only be used in exactly this one place, and would only exist to get around the type system so I can have tags be something that implements "Into Iterator" and avoid needing dynamic dispatch. (more context here: #638 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of Either when it makes sense (as in this case). We know that the a tag can be A or B, and adding our own TagAorB type just to represent that feels superfluous.

@@ -197,13 +235,29 @@ impl TraceExporter {
})
}

/// Emit a health metric to dogstatsd
fn emit_metric(&self, metric: HealthMetric, custom_tags: Option<Vec<&Tag>>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, probably delaying the allocation would be beneficial if health metrics can be configurable by the customer.

Ok(())
}

fn create_client(endpoint: &Endpoint) -> anyhow::Result<StatsdClient> {
Copy link
Contributor

@hoolioh hoolioh Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: as the function has multiple points for returning an error and these errors can bubble up in the stack frame I would suggest to add more context creating some explicit errors with anyhow so it's easier to debug which method failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call out, I've just added additional anyhow in the places it wasn't used before!

@ajgajg1134 ajgajg1134 merged commit 829e93d into main Sep 24, 2024
30 checks passed
@ajgajg1134 ajgajg1134 deleted the andrew.glaude/dogstatsd-client-SPLIT branch September 24, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants