Reuse MetricData #5178
Conversation
Codecov Report
Patch coverage:
Additional details and impacted files

@@              Coverage Diff               @@
##               main    #5178        +/-   ##
==============================================
- Coverage     90.97%   90.93%       -0.05%
- Complexity     4907     4941          +34
==============================================
  Files           552      556           +4
  Lines         14489    14593         +104
  Branches       1372     1374           +2
==============================================
+ Hits          13182    13270          +88
- Misses          907      919          +12
- Partials        400      404           +4
...c/main/java/io/opentelemetry/sdk/metrics/internal/state/DefaultSynchronousMetricStorage.java
This PR was marked stale due to lack of activity. It will be closed in 14 days.
  if (reset) {
    buckets.clear();
  }
- return copy;
+ return mutableBuckets;
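For context, here is a minimal sketch of the reuse pattern the diff above points at; the class and field names are illustrative, not the actual SDK classes. Writes go to buckets owned by the handle, and collection refills a single reusable list that is handed to the reader instead of allocating a fresh copy.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only; these are not the real SDK classes.
    final class ReusableBucketsHandle {
      private final List<Long> buckets = new ArrayList<>();        // hot write path
      private final List<Long> mutableBuckets = new ArrayList<>(); // reused on every collect

      synchronized void record(int bucketIndex) {
        while (buckets.size() <= bucketIndex) {
          buckets.add(0L);
        }
        buckets.set(bucketIndex, buckets.get(bucketIndex) + 1);
      }

      synchronized List<Long> collect(boolean reset) {
        // Refill the reusable carrier instead of allocating a new copy.
        mutableBuckets.clear();
        mutableBuckets.addAll(buckets);
        if (reset) {
          buckets.clear(); // delta temporality: the next interval starts from zero
        }
        // Safe only because a single reader consumes this synchronously before the next collect.
        return mutableBuckets;
      }
    }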
This was one idea I entertained for a while for a performance gain. How are you avoiding multiple threads touching this data?
Is it because you're only returning this to ONE metric reader at a time, and the "hot path" of writes is still writing to the underlying data allocated in this handle?
If so, VERY clever. We should document in the handle class how this works and why it's safe.
Is it because you're only returning this to ONE metric-reader at a time and the "hot path" of writes is still writing to the underlying data allocated in this handle?
Yes, exactly. While we support multiple readers, we don't support concurrent reads. As long as readers don't hold on to references to MetricData and try to read after they're done reading, they shouldn't see any weird behavior. Right now this won't work with multiple readers, since once PeriodicMetricReader calls MetricProducer#collectAllMetrics(), another reader is able to start reading and the MetricData will be mutated out from under the PeriodicMetricReader. Ouch. But this is solvable by providing readers a way to communicate to MetricProducer that they're done consuming the data. For example, by adjusting collectAllMetrics to accept a CompletableResultCode which the reader completes when finished consuming the data, i.e. MetricProducer#collectAllMetrics(CompletableResultCode).

As you noticed, this also relies on different objects for writes vs. reads (writes use AggregationHandle, reads use some mutable MetricData).
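To make the "reader signals completion" idea above concrete, a hypothetical shape of that API could look like the following. This is a sketch of the proposal in the comment, not the existing MetricProducer interface.

    import io.opentelemetry.sdk.common.CompletableResultCode;
    import io.opentelemetry.sdk.metrics.data.MetricData;
    import java.util.Collection;

    // Hypothetical sketch of the proposed signature; not the current MetricProducer API.
    interface ReuseAwareMetricProducer {
      // The reader completes doneConsuming once it has finished exporting the
      // (mutable, reused) MetricData, signaling that a subsequent collection may
      // safely mutate the carriers again.
      Collection<MetricData> collectAllMetrics(CompletableResultCode doneConsuming);
    }

A periodic reader would then complete the result code only after its exporter finishes consuming the batch, for example by calling succeed() on it from the export result's completion callback.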
Scratch that part about readers needing to communicate when they're finished consuming the data. Each reader has its own copies of metric storage, and its own mutable MetricData, so it's much simpler: it should be safe as long as a MetricReader doesn't hold on to the MetricData references and try to consume them during a subsequent collect.
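A sketch of that per-reader isolation, reusing the illustrative ReusableBucketsHandle from above; again, these are not the real SDK types, just an assumption about the shape of the mapping.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: each registered reader gets its own handle, and therefore
    // its own reusable mutable carriers, so two readers never share mutable state.
    final class PerReaderStorage {
      private final Map<Object, ReusableBucketsHandle> handleByReader = new ConcurrentHashMap<>();

      ReusableBucketsHandle handleFor(Object reader) {
        return handleByReader.computeIfAbsent(reader, unused -> new ReusableBucketsHandle());
      }
    }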
Closing since #5709 has been merged.
This is a POC that pushes the effort to reduce memory allocation to its limit by reusing all data carrier classes on repeated collections (i.e. MetricData, PointData, supporting arrays, etc.). I've prototyped this on the exponential histogram aggregation, which is the most complicated.

We can arguably do this safely because readers aren't allowed to perform concurrent reads, so if they synchronously consume all the data during collection and export, there's no risk of the data being updated out from under them. We could also make this explicit by adding a method to MetricReader / MetricExporter that indicates the desired memory behavior, where the default is to make immutable data carriers as we do today, while allowing readers to opt in to this improved alternative.

With this change, memory allocation is pretty close to as low as possible. The only remaining allocations I see when profiling are allocations for iterators like this, which would be hard to get rid of.
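A hypothetical sketch of what that opt-in could look like; the enum and method names here are illustrative assumptions, not necessarily the API that was eventually adopted.

    // Illustrative names only; not necessarily the API that was ultimately merged.
    enum DataCarrierBehavior {
      // Default: every collection allocates fresh, immutable MetricData.
      IMMUTABLE,
      // Opt-in: MetricData and its nested points/arrays are reused across
      // collections; the consumer must finish reading before the next collect.
      REUSABLE
    }

    interface MemoryBehaviorAware {
      // A MetricReader / MetricExporter could implement this to declare which
      // behavior it can handle; the SDK would default to IMMUTABLE.
      default DataCarrierBehavior getDataCarrierBehavior() {
        return DataCarrierBehavior.IMMUTABLE;
      }
    }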
Performance results before:
And after:
The aggregate reduction of memory between this and the other changes is quite impressive. The default exponential histogram aggregation with cumulative temporality has gone from an original bytes/op of 46_466_259 down to 110_198 with this PR: a 99.8% reduction and a 420x improvement!

I've run this locally with an app that produces 1_000_000 unique series, and it's pretty impressive how little memory is allocated on collect. Something like 25 MB per collection, or 25 bytes per series. Immutability is great, but it's hard to ignore these performance gains!