Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update AggregatorStore to reclaim unused MetricPoints for Delta aggregation temporality #4486

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented May 13, 2023

Addresses #2360

This PR shows a possible approach to reclaim MetricPoints for Delta aggregation temporality.

Current behavior:

For any metric, once the SDK has encountered a given number (default: 2000) of unique dimension combinations, it drops any new measurement with a newer dimension combination.

With this PR, the AggregatorStore is updated to start reclaiming unused MetricPoints once it has encountered a particular number of unique dimensions combinations. This number is set to 75% of the max metric points allowed. For default case, this means the AggregatorStore does not begin reclaiming MetricPoints until it has seen 1500 (= 75% * 2000) unique dimension combinations. Once it hits 1500, it changes its behavior to begin reclaiming unused MetricPoints. I have set this threshold as there is some cost associated with reclaiming MetricPoints and we should try to avoid every user to pay for this.

What changes when AggregatorStore starts to reclaim unused MetricPoints?

  • On the Collect call, Snapshot method checks for unused MetricPoints, that is any MetricPoint, for which there is no collection pending and there is no other thread waiting to update it. If it finds such a MetricPoint, it adds the index of that MetricPoint to a queue
  • Any update thread which cannot find a MetricPoint for its dimension combination would now check for an available MetricPoint from the queue instead of simply incrementing the array index.
  • When exporting Metrics, we would not necessarily by providing the MetricPoints in a sorted order, that is, the order in which they were emitted

Changes

  • Added a Func named lookupAggregatorStore to allow for different behavior for Delta and Cumulative aggregation
  • Created separate lookup dictionaries for Delta and Cumulative aggregation
  • Added a Queue<int> named availableMetricPoints- This holds the indices for reclaimed MetricPoints. In the AggregatorStore ctor, it's initialized with the remaining 25% of the indices after the threshold.
  • There are two new fields on MetricPoint struct:
    • ReferenceCount- This is used by Snapshot method to determine if a MetricPoint is in use by an Update thread. Update threads increment the ReferenceCount of the MetricPoint before the update and decrement it after the update
    • LookupData- This is the lookup dictionary value type for Delta aggregation. This helps Update threads determine if the MetricPoint that they are about to update has already been reclaimed by the Snapshot thread or by some other Update thread for a different set of dimensions combination
  • Added an internal field named DroppedCount- This is used by the unit tests to verify that no measurements were dropped when they test the MetricPoint reclaim feature
  • Appropriate CHANGELOG.md files updated for non-trivial changes

@utpilla
Copy link
Contributor Author

utpilla commented May 13, 2023

Stress Tests

For max MetricPoints = 1000, there is a drop of ~4.6% in total number of loops executed in around 5 minutes:

main branch: Loops: 7,809,957,791, Loops/Second: 23,745,565, CPU Cycles/Loop: 878, RunwayTime (Seconds): 315
This PR: Loops: 7,448,850,671, Loops/Second: 22,608,029, CPU Cycles/Loop: 904, RunwayTime (Seconds): 315

Replace Program.cs with the code below

using System.Diagnostics.Metrics;
using System.Runtime.CompilerServices;
using OpenTelemetry.Metrics;

namespace OpenTelemetry.Tests.Stress;

public partial class Program
{
    private const int ArraySize = 10;

    private static readonly Meter TestMeter = new(Utils.GetCurrentMethodName());
    private static readonly Counter<long> TestCounter = TestMeter.CreateCounter<long>("TestCounter");
    private static readonly string[] DimensionValues = new string[ArraySize];
    private static readonly ThreadLocal<Random> ThreadLocalRandom = new(() => new Random());

    public static void Main()
    {
        for (int i = 0; i < ArraySize; i++)
        {
            DimensionValues[i] = $"DimValue{i}";
        }

        using var exporter = new CustomExporter();
        using var metricReader = new PeriodicExportingMetricReader(exporter, exportIntervalMilliseconds: 10)
        {
            TemporalityPreference = MetricReaderTemporalityPreference.Delta,
        };

        using var meterProvider = Sdk.CreateMeterProviderBuilder()
            .AddMeter(TestMeter.Name)
            .AddReader(metricReader)
            .Build();

        Stress(prometheusPort: 9464);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    protected static void Run()
    {
        var random = ThreadLocalRandom.Value;
        TestCounter.Add(
            100,
            new("DimName1", DimensionValues[random.Next(0, ArraySize)]),
            new("DimName2", DimensionValues[random.Next(0, ArraySize)]),
            new("DimName3", DimensionValues[random.Next(0, ArraySize)]));
    }

    private class CustomExporter : BaseExporter<Metric>
    {
        public long Sum = 0;
        public string DimensionKey;
        public object DimensionValue;

        public CustomExporter()
        {
        }

        public override ExportResult Export(in Batch<Metric> batch)
        {
            foreach (var metric in batch)
            {
                foreach (ref readonly var metricPoint in metric.GetMetricPoints())
                {
                    foreach (var tag in metricPoint.Tags)
                    {
                        this.DimensionKey = tag.Key;
                        this.DimensionValue = tag.Value;
                    }

                    if (metric.MetricType.IsSum())
                    {
                        this.Sum += metricPoint.GetSumLong();
                    }
                }
            }

            return ExportResult.Success;
        }
    }
}

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label May 24, 2023
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this May 31, 2023
@utpilla utpilla reopened this Sep 1, 2023
@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Merging #4486 (380b159) into main (535c819) will decrease coverage by 0.27%.
Report is 1 commits behind head on main.
The diff coverage is 65.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4486      +/-   ##
==========================================
- Coverage   83.19%   82.93%   -0.27%     
==========================================
  Files         293      294       +1     
  Lines       11984    12193     +209     
==========================================
+ Hits         9970    10112     +142     
- Misses       2014     2081      +67     
Flag Coverage Δ
unittests 82.93% <65.89%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage
src/OpenTelemetry/Metrics/AggregatorStore.cs 63.50%
src/OpenTelemetry/Metrics/MetricPoint.cs 90.90%
src/OpenTelemetry/Metrics/LookupData.cs 100.00%
src/OpenTelemetry/Metrics/Tags.cs 100.00%

@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Sep 1, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 9, 2023

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Sep 9, 2023
@utpilla utpilla marked this pull request as ready for review September 12, 2023 03:02
@utpilla utpilla requested review from a team, reyang and cijothomas September 12, 2023 03:02
@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Sep 12, 2023
@vishweshbankwar
Copy link
Member

vishweshbankwar commented Sep 18, 2023

@utpilla - Could you please define what "limit of active points" means here? Since it is very specific to how the reclaiming is proposed here.

lock (this.tagsToMetricPointIndexDictionaryDelta!)
{
LookupData? dictionaryValue;
if (lookupData.SortedTags != Tags.EmptyTags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit to consider... this might read a little better with with a property if (!lookupData.GivenEqualsSorted).

If this were the case I think EmptyTags could be moved to the LookupData class and made private.

internal class LookupData {
    private static readonly Tags EmptyTags = new(Array.Empty<KeyValuePair<string, object?>>());

    public LookupData(int index) { ... }
    public LookupData(int index, Tags tags) { ... }
    public LookupData(int index, Tags sorted, Tags given) { ... }

    public bool GivenEqualsSorted => SortedTags == Tags.EmptyTags;
}

utpilla and others added 2 commits September 18, 2023 16:57
Co-authored-by: Alan West <3676547+alanwest@users.noreply.github.com>
Co-authored-by: Alan West <3676547+alanwest@users.noreply.github.com>
utpilla and others added 7 commits September 18, 2023 22:12
Co-authored-by: Vishwesh Bankwar <vishweshbankwar@users.noreply.github.com>
Co-authored-by: Vishwesh Bankwar <vishweshbankwar@users.noreply.github.com>
@utpilla
Copy link
Contributor Author

utpilla commented Sep 20, 2023

With this PR, the AggregatorStore is updated to start reclaiming unused MetricPoints once it has encountered a particular number of unique dimensions combinations. This number is set to 75% of the max metric points allowed. For default case, this means the AggregatorStore does not begin reclaiming MetricPoints until it has seen 1500 (= 75% * 2000) unique dimension combinations. Once it hits 1500, it changes its behavior to begin reclaiming unused MetricPoints. I have set this threshold as there is some cost associated with reclaiming MetricPoints and we should try to avoid every user to pay for this.

The MetricPoint reclaim behavior makes some fundamental changes to how the SDK exports metrics. It might surprise some users.

For example, @vishweshbankwar pointed out a specific case where the SDK has exported a particular MetricPoint (k1,v1) at time T1. After a while, this MetricPoint gets reclaimed. Now if their application produces a totally new set of unique metric points and consistently updates them, they might run into a situation where there is no MetricPoint available when a thread later tries to update the MetricPoint (k1,v1) at time T2 and the measurement gets dropped. When consuming the metrics, their dashboard would show them that no measurements were recorded for the MetricPoint (k1,v1) at time T2, which is not correct as it was actually dropped.

I would be in favor of the user opting in for the reclaim behavior so that they are aware of what to expect instead of simply making it the default behavior for Delta aggregation temporality.

I also spoke to @alanwest who brought up a good point to check how the Java SIG sets the default behavior and assess if it would make sense to be consistent with them.

I think for now it's okay to have this behavior (even for 1.7.0-alpha release). But before 1.7.0 stable release, we should follow-up on what the desired default behavior should be.

@Yun-Ting
Copy link
Contributor

For example, @vishweshbankwar pointed out a specific case where the SDK has exported a particular MetricPoint (k1,v1) at time T1. After a while, this MetricPoint gets reclaimed. Now if their application produces a totally new set of unique metric points and consistently updates them, they might run into a situation where there is no MetricPoint available when a thread later tries to update the MetricPoint (k1,v1) at time T2 and the measurement gets dropped. When consuming the metrics, their dashboard would show them that no measurements were recorded for the MetricPoint (k1,v1) at time T2, which is not correct as it was actually dropped.

I'm curious about what would be the behavioral difference from the client side between the below 2 cases:

  1. Client's cardinality reaches 2000 and new distinct tags start getting dropped, and
  2. (with this reclaimed feature) cardinality reaches the 75% * 2000 limit and the metricPoint with the lowest reference count start getting dropped.
    ?

It seems that Delta temporality is designed to unburden the client from keeping high-cardinality state?

@reyang
Copy link
Member

reyang commented Sep 22, 2023

For example, @vishweshbankwar pointed out a specific case where the SDK has exported a particular MetricPoint (k1,v1) at time T1. After a while, this MetricPoint gets reclaimed. Now if their application produces a totally new set of unique metric points and consistently updates them, they might run into a situation where there is no MetricPoint available when a thread later tries to update the MetricPoint (k1,v1) at time T2 and the measurement gets dropped. When consuming the metrics, their dashboard would show them that no measurements were recorded for the MetricPoint (k1,v1) at time T2, which is not correct as it was actually dropped.

I don't feel we should be too worried about this case. Folks using DELTA aggregation temporality should expect:

  1. If the SDK is configured to keep memory, the behavior will be the same as CUMULATIVE - new metric streams get dropped or collapsed after reaching the cardinality cap.
  2. If the SDK is configured to reclaim memory, the behavior will be essentially the same as if we're doing CUMULATIVE for each collection cycle, and the start time is set to the wall clock after the collection is done.
  3. In both cases, the user SHOULD ensure that cardinality is planned and properly configured, so they can get accurate data. Otherwise, they won't get accurate data - the SDK might still try its best effort (e.g. the total count might still be accurate) within the limitations.

I would be in favor of the user opting in for the reclaim behavior so that they are aware of what to expect instead of simply making it the default behavior for Delta aggregation temporality.

+1 on making this opt-in.

Copy link
Member

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I spoke with @utpilla and I agree we can ship this PR as-is for 1.7.0-alpha release. Making this opt-in vs. opt-out will be a follow up discussion to resolve prior to 1.7.0 stable.

@vishweshbankwar I think between you and me, we've reviewed this PR most closely. If you ready to give it a 👍, I'll merge it.

@alanwest alanwest merged commit 2a480cd into open-telemetry:main Sep 25, 2023
48 of 49 checks passed
// We never increment the ReferenceCount for MetricPoint with no tags (index == 0) and the MetricPoint for overflow attribute,
// but we always decrement it (in the Update methods). This should be fine.
// ReferenceCount doesn't matter for MetricPoint with no tags and overflow attribute as they are never reclaimed.
internal int ReferenceCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanwest @vishweshbankwar This is going to add to our memory consumption (4 bytes x MaxMetricPoints). Below there is another field AggregationType aggType which I'm guessing is also taking up 4 bytes. I don't think either of those two things really need the full 4 bytes, what if we did some bit manipulation on a single int field to process both needs with better memory utilization?

@okonaraddi-msft
Copy link

okonaraddi-msft commented May 31, 2024

For example, @vishweshbankwar pointed out a specific case where the SDK has exported a particular MetricPoint (k1,v1) at time T1. After a while, this MetricPoint gets reclaimed. Now if their application produces a totally new set of unique metric points and consistently updates them, they might run into a situation where there is no MetricPoint available when a thread later tries to update the MetricPoint (k1,v1) at time T2 and the measurement gets dropped. When consuming the metrics, their dashboard would show them that no measurements were recorded for the MetricPoint (k1,v1) at time T2, which is not correct as it was actually dropped.

Hi @utpilla @vishweshbankwar , I'd like to better understand the risk associated with enabling reclaim. Does the above mean that any reclaimed MetricPoint's key-value set will be dropped moving forward? So if (k1,v1) is reclaimed and then some time later a metric with (k1,v1) is emitted, that metric will be dropped instead of a new metric point being created?

EDIT: Sorry, I missed the "they might run into a situation where there is no MetricPoint available" earlier, now that case makes sense. If I understand correctly, the quoted case applies only when the max limit for metric points is reached and nothing can be reclaimed to make room for (k1,v1) again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants