[APR-208] fix: allow pushing multiple data points along within a single metric #217
Conversation
Regression Detector (DogStatsD)

Regression Detector Results

Run ID: f9d8b06b-add6-4080-af9b-7ccec9295291
Baseline: 7.55.2

Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | +0.07 | [+0.00, +0.13] | |
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | +0.03 | [+0.00, +0.06] | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | +0.02 | [+0.01, +0.03] | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | +0.02 | [-0.01, +0.05] | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | +0.00 | [-0.05, +0.06] | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | +0.00 | [-0.00, +0.01] | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | -0.00 | [-0.04, +0.04] | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | -0.05 | [-0.09, -0.01] | |
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | -0.45 | [-0.69, -0.22] | |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
c251996 to 71ea255
Regression Detector (Saluki)

Regression Detector Results

Run ID: 892b2b8b-4aea-4fd0-b7ef-1d9a6d128208
Baseline: a5bdd38

Performance changes are noted in the perf column of each table:

Significant changes in experiment optimization goals

Confidence level: 90.00%
| perf | experiment | goal | Δ mean % | Δ mean % CI | links |
|---|---|---|---|---|---|
| ✅ | dsd_uds_100mb_250k_contexts | ingress throughput | +6.52 | [+5.78, +7.25] | |
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | +2.13 | [-1.11, +5.37] | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | +0.66 | [+0.55, +0.76] | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | +0.07 | [+0.00, +0.14] | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | +0.07 | [+0.02, +0.12] | |
| ➖ | dsd_uds_50mb_10k_contexts_no_inlining_no_allocs | ingress throughput | +0.01 | [-0.04, +0.05] | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | +0.00 | [-0.05, +0.06] | |
| ➖ | dsd_uds_50mb_10k_contexts_no_inlining | ingress throughput | +0.00 | [-0.00, +0.00] | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | -0.00 | [-0.00, +0.00] | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | -0.00 | [-0.01, +0.00] | |
| ✅ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | -7.86 | [-7.99, -7.72] | |
b9a08a4 to 554a55d
Made a high-level pass at least, but will review more in-depth after some discussion.
let metric = if self.forward_timestamped_metrics {
    // If we're configured to forward timestamped metrics immediately, then we need to
    // try to handle any timestamped values in this metric. If we get back `Some(...)`,
    // it's either the original metric because no values had timestamps _or_ it's a
    // modified version of the metric after all timestamped values were split out and
    // directly forwarded.
    match handle_forward_timestamped_metric(metric, &mut event_buffer) {
This is making me wonder if it'd be better to split out timestamped metric samples earlier (i.e. at the source) and send them through a different pipeline that doesn't include the aggregate transform. That'd be a much more direct mapping of the agent's current no-aggregation "pipeline". We could even differentiate more strongly between "samples" (single data point, no timestamp, to be aggregated) and "metrics" (potentially multiple points, timestamped, aggregated) at the topology level and have different components that work on each (e.g. aggregate is samples -> metrics). Just a thought.
That's not a bad idea.
I did try to spend some time during the crafting of MetricValues to figure out how to separate those two concerns but found it difficult, because single DSD packets can be multi-value, and it's not great to be emitting five events from the DSD source instead of just one, since they all have an identical timestamp (or lack thereof, really).
Like if we just sketched it out right now, based on your thoughts, we could end up with something like...
pub type RawSamples = Vec<MetricValue>;
pub type AggregatedSamples = Vec<(u64, MetricValue)>;
pub enum Event {
RawMetric(Context, RawSamples, MetricMetadata),
AggregatedMetric(Context, AggregatedSamples, MetricMetadata),
...
}
This doesn't strike me as the worst thing, although I don't love the idea of every component now needing to consider metrics separately -- raw vs. aggregated -- although I suppose this PR is still making them do that, just by calling things like MetricValues::set_timestamp explicitly to reify the maybe-timestamped-maybe-not samples into definitely-timestamped samples. 🤔
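As a tiny illustration of that reification step, here is a hypothetical sketch (not the actual Saluki types; the choice to leave client-supplied timestamps untouched is an assumption for the sketch, not confirmed from the code):

```rust
use std::num::NonZeroU64;

// Hypothetical container: each value may or may not carry a client timestamp.
struct Values {
    values: Vec<(Option<NonZeroU64>, f64)>,
}

impl Values {
    // Reify maybe-timestamped samples into definitely-timestamped ones:
    // values without a timestamp receive the given one, while values that
    // already carry a client-supplied timestamp are left as-is.
    fn set_timestamp(&mut self, ts: NonZeroU64) {
        for (slot, _) in self.values.iter_mut() {
            slot.get_or_insert(ts);
        }
    }
}
```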
The other bit, which the Datadog Agent gets around by deconstructing what we have with the MetricValue enum, is having all the non-common data -- rate interval, etc. -- on the metric sample type, and then just having the raw samples in a vector. Our choice here of having MetricValue definitely paints us into a corner, to an extent, since it makes it harder to have homogeneous value containers.
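That agent-style layout could be sketched roughly like this (all names here are hypothetical, drawn from neither the Agent nor the Saluki codebase):

```rust
// Hypothetical sketch: non-common data (e.g. a rate interval) lives on the
// sample type itself, so the raw values can stay a plain, homogeneous vector.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SampleKind {
    Count,
    Gauge,
    Rate { interval_secs: u64 },
}

#[derive(Debug)]
struct MetricSample {
    kind: SampleKind,
    // Raw, not-yet-timestamped values from a single (possibly multi-value) packet.
    values: Vec<f64>,
}

impl MetricSample {
    // All values in one sample share the same kind, keeping the container homogeneous.
    fn new(kind: SampleKind, values: Vec<f64>) -> Self {
        Self { kind, values }
    }
}
```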
lib/saluki-event/src/metric/data.rs (Outdated)
pub struct MetricValues {
    kind: MetricKind,
    values: SmallVec<[(u64, MetricValue); 2]>,
}
Similar to my previous comment, my biggest question here is whether we want to have these two axes mixed within a single collection (timestamped vs. not, and potentially values with different kinds). It kind of seems like we're allowing a lot of possibilities to be represented, and I'm not sure how many of them are really valid/useful. If we can split these possibilities out into more structured forms, the code that handles them can likely get quite a bit simpler.
Yeah, I think to pull it out a little bit, there's a bunch of very real scenarios in terms of what we see coming into the DSD source:
- single metric value w/o timestamp (arguably the most common scenario)
- multiple metric values w/o timestamp (I don't have the data on how common this is, but I'd guess much much less common than single metric value)
- single metric value w/ timestamp (probably becoming more common.. maybe more common than multiple-value-no-timestamp?)
- multiple metric values w/ timestamp (technically possible, but not likely since having a timestamp implies aggregation, which implies a single value)
So our most common scenario is handling metric values without a timestamp, with a very real possibility of handling multiple of them. After that, it's a single timestamped value.
After that, then there's what we need to handle coming out of the aggregator phase, which is single or multiple values with timestamps: they always have a timestamp, and it's just as likely to have one timestamped value as it is to have two (the two is just a factor of the flush interval, and will likely only ever be a max of two... but I'm not sure it's safe to use that as an invariant or if it buys us anything by doing so).
Kind of stream of consciousness spewing here... but yeah, I do think this design is only as complex as it needs to be to support the relevant scenarios with a single data type, but related to your comment above, I agree that it's not the most optimal design and we could definitely think about trying to bifurcate some of this a little more.
I'll give some more thought to that approach.
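For illustration, the four ingest shapes enumerated above could be classified with a sketch like the following (hypothetical names; plain std types stand in for the SmallVec-backed storage):

```rust
// Hypothetical labels for the four shapes a metric can arrive in at the DSD source.
#[derive(Debug, PartialEq)]
enum IngestShape {
    SingleUntimestamped, // arguably the most common case
    MultiUntimestamped,  // multi-value packet, no timestamp
    SingleTimestamped,   // pre-aggregated, client-supplied timestamp
    MultiTimestamped,    // technically possible, but unlikely in practice
}

// Classify a slice of (optional timestamp, value) pairs into one of the shapes.
fn classify(values: &[(Option<u64>, f64)]) -> IngestShape {
    let timestamped = values.iter().any(|(ts, _)| ts.is_some());
    match (values.len(), timestamped) {
        (1, false) => IngestShape::SingleUntimestamped,
        (_, false) => IngestShape::MultiUntimestamped,
        (1, true) => IngestShape::SingleTimestamped,
        (_, true) => IngestShape::MultiTimestamped,
    }
}
```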
554a55d to 86dd5d8
One note for future discussion, but happy to see this merged as a definite improvement.
struct TimestampedValue<T> {
    timestamp: Option<NonZeroU64>,
It feels a bit odd that TimestampedValue might not have a timestamp. Obviously we can get values without timestamps from the DSD protocol, but I don't think we have any use cases right now where a timestamp never gets assigned (and I'm not sure we will?). Is there ever any meaningful variation in how those timestamps eventually get assigned? If not, it seems simplest to remove this variability from the data model and simply assign the timestamp when we receive the metric. If we're using timestamped vs. not as an indicator of whether something should be aggregated, it seems like we could mark that more explicitly somehow (or maybe we don't need to? How important is it that those points are explicitly not aggregated, vs. running through the same logic and simply not getting aggregated because they're already independent? I.e. is it an optimization or a semantic difference?)
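A rough sketch of that "assign the timestamp at receive time" suggestion (hypothetical names, not the actual Saluki types):

```rust
use std::num::NonZeroU64;

// Hypothetical: once a value enters the pipeline, it always has a timestamp,
// so downstream code never needs to handle the Option case.
struct TimestampedValue {
    timestamp: NonZeroU64,
    value: f64,
}

// Prefer the client-provided timestamp if the packet carried one; otherwise
// stamp the value with the receive time.
fn on_receive(value: f64, packet_ts: Option<NonZeroU64>, now: NonZeroU64) -> TimestampedValue {
    TimestampedValue {
        timestamp: packet_ts.unwrap_or(now),
        value,
    }
}
```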
Context
In #216, we documented the suboptimal behavior of the aggregate transform when compared to the Datadog Agent. Specifically, the current behavior of the aggregate transform leads to additional metric payloads being sent, and thus more network bandwidth consumed, when compared to the Datadog Agent.
This is suboptimal as metrics traffic could jump by a large amount -- 20 to 40% -- for an identical workload, which is an unacceptable difference, even in this experimental stage.
Solution
This PR introduces a large body of work to effectively bake in the concept of a metric being able to hold multiple timestamp/value pairs in a single Metric, commonly referred to as "data points" in the Datadog Agent and other popular metrics protocols such as OTLP. In making this change, we can more efficiently shuttle multiple data points from source to transform/destination, and also allow destinations to avoid having to implement their own costly/complex aggregation logic to efficiently forward these metrics.

Most of the work centers around the addition of a new value container, MetricValues, which lives alongside MetricValue and handles the hard work of ensuring a homogeneous set of values, holding their timestamps, merging in values based on timestamp, and all ancillary operations needed to effectively build and utilize MetricValues.

Fixes #216.