Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: System.Diagnostics.ActivityLink/ActivityEvent: Enumeration API #68056

Closed
Tracked by #62027
CodeBlanch opened this issue Apr 14, 2022 · 14 comments · Fixed by #68933
Closed
Tracked by #62027

[API Proposal]: System.Diagnostics.ActivityLink/ActivityEvent: Enumeration API #68056

CodeBlanch opened this issue Apr 14, 2022 · 14 comments · Fixed by #68933
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics.Activity blocking Marks issues that we want to fast track in order to unblock other important work
Milestone

Comments

@CodeBlanch
Copy link
Contributor

CodeBlanch commented Apr 14, 2022

Background and motivation

Follow-up to #67207

ActivityEvent and ActivityLink both expose public IEnumerable<KeyValuePair<string, object?>>? Tags { get; } for accessing the tags associated with the event or link.

Enumerating using this API is expensive relative to the Enumerate* API on Activity itself:

Method Mean Error StdDev Gen 0 Allocated
EnumerateTagsIEnumerable 69.287 ns 0.2583 ns 0.2416 ns 0.0038 48 B

Today the OpenTelemetry SDK uses reflection to bind to the underlying struct enumerator. It doesn't really need to do that, because it could instead perform a cast (eg foreach (KeyValuePair<string, object> tag in (ActivityTagsCollection)activityEvents.Tags) but the underlying struct enumerator is also very slow:

Method Mean Error StdDev Gen 0 Allocated
EnumerateTags 19.686 ns 0.0369 ns 0.0345 ns - -

The reason for the slowdown is due to how the List<KeyValuePair<string, object?>>.Enumerator is being wrapped. Lots of copies of KeyValuePair<string, object?> being made in the loop and a copy of the underlying enumerator in ctor. List<T>.Enumerator also does versioning, which isn't really needed during the OpenTelemetry export phase.

For comparison, enumerating over an array...

        private struct Enumerator
        {
            private readonly KeyValuePair<string, object>[] source;
            private int index;

            public Enumerator(KeyValuePair<string, object>[] source)
            {
                this.source = source;
                this.index = -1;
            }

            public readonly KeyValuePair<string, object> Current => this.source[this.index];

            public readonly Enumerator GetEnumerator() => this;

            public bool MoveNext()
            {
                return ++this.index < this.source.Length;
            }
        }
Method Mean Error StdDev Gen 0 Allocated
EnumerateTagsStruct 3.324 ns 0.0134 ns 0.0119 ns - -

The goal of this API proposal is to:

  • Eliminate the need to reflect or cast.
  • Increase the performance.

API Proposal

Proposing to use a similar API as #67207:

namespace System.Diagnostics
{
    partial struct ActivityLink
    {
        public Activity.Enumerator<KeyValuePair<string, object?>> EnumerateTagObjects();
    }

    partial struct ActivityEvent
    {
        public Activity.Enumerator<KeyValuePair<string, object?>> EnumerateTagObjects();
    }
}

API Usage

foreach (ref readonly ActivityLink activityLink in activity.EnumerateLinks())
{
   ExportData(in activityLink);
   foreach (ref readonly KeyValuePair<string, object?> tag in activityLink.EnumerateTagObjects())
   {
       ExportData(in activityLink, in tag);
   }
}

Alternative Designs

If we improve the enumerator inside ActivityTagsCollection it probably isn't a big deal to live with the cast. KeyValuePair<string, object?> is not a large struct so returning by readonly ref isn't doing much to improve the perf. However during API review of #67207 we moved away from forcing the use of a cast so I thought the proposed API would be preferrable.

Risks

None

/cc @cijothomas @reyang @tarekgh @noahfalk

@CodeBlanch CodeBlanch added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 14, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 14, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Wraith2
Copy link
Contributor

Wraith2 commented Apr 14, 2022

IIRC there was some discussion of a way to make this cheaper in the api review on monday, it might be worth looking up the video on youtube to see if it affects this proposal.

@tarekgh
Copy link
Member

tarekgh commented Apr 14, 2022

@CodeBlanch before we proceed with this either the APIs or try to optimize the internal implementation, I would like to see if there is a big concern with the overall performance for this part considering the whole exportation operation and not just this part of enumeration. If we see this is significant, then we can consider it. Do we have such data today to tell?

For the proposal, I would prefer optimizing the implementation instead of producing those APIs. If we really need the APIs, I will have the new APIs return Activity.Enumerator instead and work out the implementation.

I'll mark this issue for the future till we decide about it.

CC @noahfalk @reyang

@ghost
Copy link

ghost commented Apr 14, 2022

Tagging subscribers to this area: @dotnet/area-system-diagnostics-activity
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Follow-up to #67207

ActivityEvent and ActivityLink both expose public IEnumerable<KeyValuePair<string, object?>>? Tags { get; } for accessing the tags associated with the event or link.

Enumerating using this API is expensive relative to the Enumerate* API on Activity itself:

Method Mean Error StdDev Gen 0 Allocated
EnumerateTagsIEnumerable 69.287 ns 0.2583 ns 0.2416 ns 0.0038 48 B

Today the OpenTelemetry SDK uses reflection to bind to the underlying struct enumerator. It doesn't really need to do that, because it could instead perform a cast (eg foreach (KeyValuePair<string, object> tag in (ActivityTagsCollection)activityEvents.Tags) but the underlying struct enumerator is also very slow:

Method Mean Error StdDev Gen 0 Allocated
EnumerateTags 19.686 ns 0.0369 ns 0.0345 ns - -

The reason for the slowdown is due to how the List<KeyValuePair<string, object?>>.Enumerator is being wrapped. Lots of copies of KeyValuePair<string, object?> being made in the loop and a copy of the underlying enumerator in ctor. List<T>.Enumerator also does versioning, which isn't really needed during the OpenTelemetry export phase.

For comparison, enumerating over an array...

        private struct Enumerator
        {
            private readonly KeyValuePair<string, object>[] source;
            private int index;

            public Enumerator(KeyValuePair<string, object>[] source)
            {
                this.source = source;
                this.index = -1;
            }

            public readonly KeyValuePair<string, object> Current => this.source[this.index];

            public readonly Enumerator GetEnumerator() => this;

            public bool MoveNext()
            {
                return ++this.index < this.source.Length;
            }
        }
Method Mean Error StdDev Gen 0 Allocated
EnumerateTagsStruct 3.324 ns 0.0134 ns 0.0119 ns - -

The goal of this API proposal is to:

  • Eliminate the need to reflect or cast.
  • Increase the performance.

API Proposal

Proposing to use a similar API as #67207:

namespace System.Diagnostics
{
    partial struct ActivityLink
    {
        public Enumerator<KeyValuePair<string, object?>> EnumerateTags();
    
        public struct Enumerator<T>
        {
            private Enumerator(...)  { ... }
            public readonly Enumerator<T> GetEnumerator() => this;
            public readonly ref T Current => ...
            public bool MoveNext() { ... }
        }
    }

    partial struct ActivityEvent
    {
        public Enumerator<KeyValuePair<string, object?>> EnumerateTags();
    
        public struct Enumerator<T>
        {
            private Enumerator(...)  { ... }
            public readonly Enumerator<T> GetEnumerator() => this;
            public readonly ref T Current => ...
            public bool MoveNext() { ... }
        }
    }
}

API Usage

foreach (ref readonly ActivityLink activityLink in activity.EnumerateLinks())
{
   ExportData(in activityLink);
   foreach (ref readonly KeyValuePair<string, object?> tag in activityLink.EnumerateTags())
   {
       ExportData(in activityLink, in tag);
   }
}

Alternative Designs

If we improve the enumerator inside ActivityTagsCollection it probably isn't a big deal to live with the cast. KeyValuePair<string, object?> is not a large struct so returning by readonly ref isn't doing much to improve the perf. However during API review of #67207 we moved away from forcing the use of a cast so I thought the proposed API would be preferrable.

Risks

None

/cc @cijothomas @reyang @tarekgh @noahfalk

Author: CodeBlanch
Assignees: -
Labels:

api-suggestion, untriaged, area-System.Diagnostics.Activity

Milestone: -

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Apr 14, 2022
@tarekgh tarekgh added this to the Future milestone Apr 14, 2022
@CodeBlanch
Copy link
Contributor Author

@tarekgh No problem, I expected nothing less 😄

I made these changes so I could test the perf: https://github.com/dotnet/runtime/compare/main...CodeBlanch:activity-detail-enumeration?expand=1

Using stock IEnumerable<KeyValuePair<string, object>> ActivityEvent.Tags:

Method NumberOfEvents Mean Error StdDev Gen 0 Allocated
JaegerExporter_Batching 1 3.444 us 0.0090 us 0.0084 us 0.0038 48 B
JaegerExporter_Batching 5 8.414 us 0.0214 us 0.0189 us 0.0153 240 B

Using ActivityEvent.Tags cast as ActivityTagsCollection:

Method NumberOfEvents Mean Error StdDev Allocated
JaegerExporter_Batching 1 3.435 us 0.0086 us 0.0081 us -
JaegerExporter_Batching 5 8.120 us 0.0231 us 0.0216 us -

Using the reflection engine:

Method NumberOfEvents Mean Error StdDev Allocated
JaegerExporter_Batching 1 3.456 us 0.0110 us 0.0097 us -
JaegerExporter_Batching 5 8.123 us 0.0192 us 0.0171 us -

Using the proposed API + changes in that branch:

Method NumberOfEvents Mean Error StdDev Allocated
JaegerExporter_Batching 1 3.394 us 0.0094 us 0.0088 us -
JaegerExporter_Batching 5 8.063 us 0.0248 us 0.0220 us -

I have either 1 event w/ 6 tags or 5 events w/ 6 tags each. Is this a real world scenario? SDK adds an event for exceptions so 1 event is ~common. Some users attach ILogger logs to current Activity as events. For those users they will have way more events with tags on them for structured log data (state/scopes).

@tarekgh
Copy link
Member

tarekgh commented Apr 15, 2022

Thanks @CodeBlanch. Looking at the numbers, speed wise I am not seeing significant effect. The memory allocation even when using IEnumerable<KeyValuePair<string, object>> ActivityEvent.Tags is not that significant either considering it is short living allocations. Honestly, I am seeing this is not worth exposing APIs for. If still concerned, we can consider changing the internal implementation as needed but I would say let's wait to see if this is really causing any real issue to OpenTelemetry.

@CodeBlanch
Copy link
Contributor Author

@tarekgh A couple more benchmarks for you. May or may not convince you this is worthwhile but I figured it was worth a shot!

So the above benchmarks are all for a single Activity/Span. That would be the case of reentrant exporter writing to something like ETW or perhaps a function which only does one thing and then spins down.

That is one scenario but most users are probably doing a batch export. Activity/Span instances are held in memory until either a max limit is reached or some time threshold is reached.

Here are how the numbers look for a batch size of 2048. Users may set their own size and time limits.

Using stock IEnumerable<KeyValuePair<string, object>> ActivityEvent.Tags:

Method NumberOfEvents Mean Error StdDev Gen 0 Allocated
JaegerExporter_Batching 1 7.116 ms 0.0195 ms 0.0183 ms 7.8125 96 KB
JaegerExporter_Batching 5 17.501 ms 0.0581 ms 0.0515 ms 31.2500 480 KB

Using the proposed API + changes in that branch:

Method NumberOfEvents Mean Error StdDev Allocated
JaegerExporter_Batching 1 6.935 ms 0.0182 ms 0.0170 ms 14 B
JaegerExporter_Batching 5 16.506 ms 0.0571 ms 0.0506 ms 57 B

The enumeration time and memory adds up 😄 A batch of sufficient size could probably survive into gen 1? Not sure about that.

@reyang
Copy link

reyang commented Apr 16, 2022

The enumeration time and memory adds up 😄 A batch of sufficient size could probably survive into gen 1? Not sure about that.

+1 Typically telemetry data is sent in batches to the backend, and the exporters will take a batch of Activities and process them in a tight loop. For large batch (e.g. some services sending large pile of summary data every 15 minutes) this could be significant.

@tarekgh
Copy link
Member

tarekgh commented Apr 17, 2022

A batch of sufficient size could probably survive into gen 1?

I doubt this can happen. All enumeration allocation is small and truly short lived. Such allocations will either be collected or temporarily survive the GC collection if currently in use but not going to migrated to gen 1.

+1 Typically telemetry data is sent in batches to the backend, and the exporters will take a batch of Activities and process them in a tight loop. For large batch (e.g. some services sending large pile of summary data every 15 minutes) this could be significant.

Running operations for 15 minutes can do all kind of allocations and most likely GC collection will kick off anyway during such time. What I am trying to get to what is the percentage of the Events/Links enumeration allocation of the total allocations during the batch operation. This number can help tell if it is worth doing something for it or not.

@CodeBlanch
Copy link
Contributor Author

What I am trying to get to what is the percentage of the Events/Links enumeration allocation of the total allocations during the batch operation.

100% of the allocations shown above (96 KB & 480 KB) are from foreach over ActivityEvent.Tags during the export call.

The second benchmark using the proposed ActivityEvent.EnumerateTagObjects() API is showing 14 bytes & 57 bytes. Should actually be 0, not sure why it is allocating anything at all. Some BenchmarkDotNet noise methinks.

@tarekgh
Copy link
Member

tarekgh commented Apr 17, 2022

100% of the allocations shown above (96 KB & 480 KB) are from foreach over ActivityEvent.Tags during the export call.

Are we saying the whole batch operation is allocating 14 and 57 bytes if we exclude the Event enumeration? I was expecting much more allocation should be happening there.

Do we have any other places in the SDK that reflect on the internal types? or these are the last cases? I am asking to know if we need to collect all cases in one proposal. I was expecting to have this proposal as part of the last approved proposal of the Activity tags/events/links enumeration.

@CodeBlanch
Copy link
Contributor Author

Are we saying the whole batch operation is allocating 14 and 57 bytes if we exclude the Event enumeration? I was expecting much more allocation should be happening there.

Yes! Kind of amazing 😄 It is zero-allocation other than the enumerator(s).

Do we have any other places in the SDK that reflect on the internal types? or these are the last cases? I am asking to know if we need to collect all cases in one proposal. I was expecting to have this proposal as part of the last approved proposal of the Activity tags/events/links enumeration.

Just double-checked, these are the last spots.

@tarekgh
Copy link
Member

tarekgh commented Apr 18, 2022

The proposal looks good to me now after using Activity.Enumerator. I am marking this issue as ready for review.

@tarekgh tarekgh modified the milestones: Future, 7.0.0 Apr 18, 2022
@tarekgh tarekgh added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 18, 2022
@terrajobst
Copy link
Member

terrajobst commented May 5, 2022

Video

Looks good as proposed

namespace System.Diagnostics;

partial struct ActivityLink
{
    public Activity.Enumerator<KeyValuePair<string, object?>> EnumerateTagObjects();
}

partial struct ActivityEvent
{
    public Activity.Enumerator<KeyValuePair<string, object?>> EnumerateTagObjects();
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 5, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 5, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics.Activity blocking Marks issues that we want to fast track in order to unblock other important work
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants