Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric AggregatorStore optimizations for sorting tags #2805

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Jan 22, 2022

Fixes item 1 of #2374

Changes

  • The existing two-level lookup structure saves memory by storing distinct set of tag Keys only once. However, it makes it difficult to not rely on sorting the tags based on keys on every update call
  • This PR uses just one dictionary with key as Tags (a struct which has both string[] tagKeys and object[] tagValues) and value as an int which denotes the MetricPoint index assigned

Here is the new look-up algorithm:

  • Check if the dictionary contains the given tags
    • If yes, return the value
    • If not, sort the given tag keys and check if the dictionary contains the sorted order of the tags
      • If yes, then return the value
      • If not, then
        • for tagsLength > 1:
        1. Increment metricPointIndex
        2. Add the sorted order of tags with the newly incremented metricPointIndex
        3. Add the given order of tags with the newly incremented metricPointIndex
        4. return the newly incremented metricPointIndex
        • for tagsLength == 0 or 1
        1. Increment metricPointIndex
        2. Add the given order of tags with the newly incremented metricPointIndex
        3. return the newly incremented metricPointIndex

This means that for any given set of tags, we store the sorted order of the tags and at the most one additional combination of the tags in the dictionary.

Performance Improvement:


Follow-up issues to track:

@utpilla utpilla requested a review from a team January 22, 2022 03:39
@utpilla
Copy link
Contributor Author

utpilla commented Jan 22, 2022

Updating the benchmarks numbers with the latest changes of the PR:

Benchmarks

There is up to ~63% improvement in Perf for higher number of Tags (updated)

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.101
[Host] : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT

main

Method AggregationTemporality Mean Error StdDev Allocated
CounterHotPath Cumulative 15.82 ns 0.078 ns 0.069 ns -
CounterWith1LabelsHotPath Cumulative 71.28 ns 0.270 ns 0.252 ns -
CounterWith3LabelsHotPath Cumulative 390.50 ns 1.550 ns 1.450 ns -
CounterWith5LabelsHotPath Cumulative 594.25 ns 11.624 ns 13.386 ns -
CounterWith6LabelsHotPath Cumulative 720.17 ns 7.979 ns 6.663 ns -
CounterWith7LabelsHotPath Cumulative 768.72 ns 5.820 ns 4.544 ns -
CounterHotPath Delta 15.68 ns 0.097 ns 0.086 ns -
CounterWith1LabelsHotPath Delta 68.67 ns 0.233 ns 0.195 ns -
CounterWith3LabelsHotPath Delta 375.43 ns 3.357 ns 3.140 ns -
CounterWith5LabelsHotPath Delta 588.71 ns 5.730 ns 5.360 ns -
CounterWith6LabelsHotPath Delta 683.18 ns 3.388 ns 3.169 ns -
CounterWith7LabelsHotPath Delta 782.50 ns 5.883 ns 5.503 ns -

With the new changes

Method AggregationTemporality Mean Error StdDev Allocated
CounterHotPath Cumulative 15.77 ns 0.049 ns 0.041 ns -
CounterWith1LabelsHotPath Cumulative 57.13 ns 0.138 ns 0.115 ns -
CounterWith3LabelsHotPath Cumulative 140.92 ns 0.602 ns 0.563 ns -
CounterWith5LabelsHotPath Cumulative 225.50 ns 0.527 ns 0.440 ns -
CounterWith6LabelsHotPath Cumulative 256.97 ns 1.173 ns 1.097 ns -
CounterWith7LabelsHotPath Cumulative 288.46 ns 1.003 ns 0.889 ns -
CounterHotPath Delta 15.77 ns 0.019 ns 0.017 ns -
CounterWith1LabelsHotPath Delta 62.79 ns 0.168 ns 0.149 ns -
CounterWith3LabelsHotPath Delta 141.30 ns 0.276 ns 0.230 ns -
CounterWith5LabelsHotPath Delta 228.05 ns 0.443 ns 0.415 ns -
CounterWith6LabelsHotPath Delta 256.34 ns 1.902 ns 1.686 ns -
CounterWith7LabelsHotPath Delta 285.86 ns 0.436 ns 0.407 ns -

@utpilla
Copy link
Contributor Author

utpilla commented Jan 22, 2022

Updating the Stress Test numbers with the latest changes of the PR:

Stress Test

  • These is an increase of ~12M loops/sec from 14.7M in the main branch to 27.3M with the new changes

main

image

With the new changes (updated)

image

@codecov
Copy link

codecov bot commented Jan 22, 2022

Codecov Report

Merging #2805 (39e237f) into main (840b24e) will increase coverage by 0.08%.
The diff coverage is 88.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2805      +/-   ##
==========================================
+ Coverage   83.73%   83.82%   +0.08%     
==========================================
  Files         251      250       -1     
  Lines        8866     8877      +11     
==========================================
+ Hits         7424     7441      +17     
+ Misses       1442     1436       -6     
Impacted Files Coverage Δ
src/OpenTelemetry/Metrics/Tags.cs 72.00% <72.00%> (ø)
src/OpenTelemetry/Metrics/AggregatorStore.cs 82.85% <97.77%> (+1.09%) ⬆️
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs 74.52% <0.00%> (+2.83%) ⬆️
...lemetry/Internal/SelfDiagnosticsConfigRefresher.cs 89.42% <0.00%> (+2.88%) ⬆️

Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
utpilla and others added 2 commits January 28, 2022 13:12
@TimothyMothra
Copy link
Contributor

Looking at the LookupAggregatorStore method, the if (length > 1) and else bodies are almost identical...

  • Lines 185-244 Perform the sort, and then add both givenTags and sortedTags.
  • Lines 251-294 Are only adding the givenTags.

If it's important to keep these bodies consistent, consider refactoring this out to a helper method.

Copy link
Member

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @utpilla! This looks like a solid improvement.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@utpilla
Copy link
Contributor Author

utpilla commented Feb 2, 2022

Looking at the LookupAggregatorStore method, the if (length > 1) and else bodies are almost identical...

  • Lines 185-244 Perform the sort, and then add both givenTags and sortedTags.
  • Lines 251-294 Are only adding the givenTags.

If it's important to keep these bodies consistent, consider refactoring this out to a helper method.

I have created the issue #2843 to track this.

@utpilla utpilla changed the title Metric AggregatorStore optimizations for sorting tags- New Metric AggregatorStore optimizations for sorting tags Feb 2, 2022
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job in improving the perf significantly! LGTM.
Please address the non blocking comments as follow ups.

@cijothomas cijothomas merged commit c1c5436 into open-telemetry:main Feb 2, 2022
return false;
}

for (int i = 0; i < valuesLength; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utpilla Sorry, just noticed this. If we re-order this a bit so we validate that key & value lengths are equal first, then we could use one loop to check the keys & values for equality. Faster that way I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try this it didn't affect the benchmark numbers that much. I would still update it nonetheless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants