System.Diagnostics.Activity Perf Improvement #40362

CodeBlanch · 2020-08-05T05:50:08Z

Added a struct enumerator for the LinkedList used internally by Activity. The idea is that performance-sensitive callers can use it to avoid allocations when enumerating TagObjects, Events, & Links.
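Below is a minimal sketch of the pattern described above, assuming a singly linked list similar in spirit to the one Activity uses internally. SketchLinkedList<T> and all of its members are hypothetical, not the actual System.Diagnostics code. A public GetEnumerator() returns a struct, and IEnumerable<T> is implemented explicitly for interface-based callers.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type for illustration only; not the real Activity internals.
public sealed class SketchLinkedList<T> : IEnumerable<T>
{
    internal sealed class Node
    {
        public readonly T Value;
        public Node? Next;
        public Node(T value) => Value = value;
    }

    private Node? _first;
    private Node? _last;

    public void Add(T value)
    {
        var node = new Node(value);
        if (_last is null)
        {
            _first = _last = node;
        }
        else
        {
            _last.Next = node;
            _last = node;
        }
    }

    // foreach over an expression whose static type is SketchLinkedList<T> binds to
    // this public method, so the struct enumerator is not heap-allocated.
    public Enumerator GetEnumerator() => new Enumerator(_first);

    // Interface-based callers get the same struct, boxed to IEnumerator<T>.
    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<T>
    {
        private Node? _next;
        private T _current;

        internal Enumerator(Node? first)
        {
            _next = first;
            _current = default!;
        }

        public T Current => _current;
        object? IEnumerator.Current => _current;

        public bool MoveNext()
        {
            if (_next is null)
            {
                return false;
            }
            _current = _next.Value;
            _next = _next.Next;
            return true;
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() { }
    }
}
```

The saving applies only when the compiler sees the concrete type. A foreach over an IEnumerable<T>-typed expression still boxes the struct.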

Performance numbers: #40362 (comment)

/cc @tarekgh @noahfalk @cijothomas

ghost · 2020-08-05T05:50:13Z

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

CodeBlanch · 2020-08-05T06:01:04Z

Some other areas I noticed:

CreateAndStart method takes in IEnumerable Tags & Links, which it will enumerate. Those will allocate. If we changed the API to accept IList instead of IEnumerable, we could do a simpler for loop which wouldn't allocate?
IEnumerable<KeyValuePair<string, string?>> Tags property has some special logic to only return the strings in the LinkedList. I couldn't figure out a way to solve that without an allocation. Probably not a big deal; if a caller is worried about the perf, they can use TagObjects and just exclude the non-string items?
IEnumerable<KeyValuePair<string, string?>> Baggage property is an interesting enumeration. It goes up the Parent chain, enumerating over any items it finds along the way. Might be able to craft something for that. For now I left it alone because I don't think we export Baggage in OpenTelemetry, meaning there isn't a foreach in the hot path.

tarekgh · 2020-08-05T16:51:32Z

@CodeBlanch, thanks for submitting this. In general, LGTM.
I want to call out one thing: this enumeration now will not be thread safe. I think this is fine, but we should clarify that in the doc.

> CreateAndStart method takes in IEnumerable Tags & Links, which it will enumerate. Those will allocate. If we changed the API to accept IList instead of IEnumerable, we could do a simpler for loop which wouldn't allocate?

I don't think using IList will be a good idea. This will restrict the users of this API to some specific collections, and they will not be able to use other collections (e.g. Dictionary<,>).

> IEnumerable<KeyValuePair<string, string?>> Tags property has some special logic to only return the strings in the LinkedList. I couldn't figure out a way to solve that without an allocation. Probably not a big deal; if a caller is worried about the perf, they can use TagObjects and just exclude the non-string items?

I think this can still be done by having the Enumerator<>.MoveNext do this logic, no?

> IEnumerable<KeyValuePair<string, string?>> Baggage property is an interesting enumeration. It goes up the Parent chain, enumerating over any items it finds along the way. Might be able to craft something for that. For now I left it alone because I don't think we export Baggage in OpenTelemetry, meaning there isn't a foreach in the hot path.

Agreed, we can optimize this later if we need to.

tarekgh · 2020-08-05T16:52:53Z

@noahfalk I'll wait for your review before I merge it.

CodeBlanch · 2020-08-05T17:23:21Z

> I want to call out one thing: this enumeration now will not be thread safe. I think this is fine, but we should clarify that in the doc.

Can you go into a little more detail on that for me? It looks essentially the same as before to me, so I'm just curious what I'm missing.

> I don't think using IList will be a good idea. This will restrict the users of this API to some specific collections, and they will not be able to use other collections (e.g. Dictionary<,>).

Good point. It would be nice to have a solution here, but I can't think of anything. Tags you can apply after creation, so you could work around it if you were so inclined. But Links you can only pass in the ctor. If we added Add/Remove Link, we could work around that too.

> IEnumerable<KeyValuePair<string, string?>> Tags property has some special logic to only return the strings in the LinkedList. I couldn't figure out a way to solve that without an allocation. Probably not a big deal; if a caller is worried about the perf, they can use TagObjects and just exclude the non-string items?

> I think this can still be done by having the Enumerator<>.MoveNext do this logic, no?

It seems like it should be possible, doesn't it? I tried a couple of ways, and just tried again, but I can't get it without an allocation. The method needs to return an IEnumerable which exposes the GetEnumerator. So if I do return new StringEnumerableTagThing(_tags), no problem, but there's an allocation. What I really want to do is add IEnumerable<KeyValuePair<string, string>> on TagsLinkedList, but it already has IEnumerable<KeyValuePair<string, object?>>, so they collide on the IEnumerable.GetEnumerator method.
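Here is a sketch of the filtering idea discussed above, assuming a plain List<KeyValuePair<string, object?>> as the backing store instead of TagsLinkedList. StringTagsView is a hypothetical name. The filtering fits in MoveNext without allocating, but once the wrapper is handed out through an IEnumerable<KeyValuePair<string, string?>>-typed property, it gets boxed. That boxing is the allocation described above.

```csharp
#nullable enable
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper for illustration; the names are not from the Activity code.
// It yields only the tags whose value is a string, doing the filtering in MoveNext.
public readonly struct StringTagsView : IEnumerable<KeyValuePair<string, string?>>
{
    private readonly List<KeyValuePair<string, object?>> _tags;

    public StringTagsView(List<KeyValuePair<string, object?>> tags) => _tags = tags;

    // Allocation-free when the caller's foreach sees the StringTagsView type directly.
    public Enumerator GetEnumerator() => new Enumerator(_tags);

    // Exposing the view through an IEnumerable<...>-typed property boxes it, and
    // calling GetEnumerator through the interface boxes the enumerator as well.
    IEnumerator<KeyValuePair<string, string?>> IEnumerable<KeyValuePair<string, string?>>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<KeyValuePair<string, string?>>
    {
        private List<KeyValuePair<string, object?>>.Enumerator _inner;
        private KeyValuePair<string, string?> _current;

        internal Enumerator(List<KeyValuePair<string, object?>> tags)
        {
            _inner = tags.GetEnumerator();
            _current = default;
        }

        public KeyValuePair<string, string?> Current => _current;
        object IEnumerator.Current => _current;

        public bool MoveNext()
        {
            // Skip entries whose value is not a string.
            while (_inner.MoveNext())
            {
                KeyValuePair<string, object?> tag = _inner.Current;
                if (tag.Value is string s)
                {
                    _current = new KeyValuePair<string, string?>(tag.Key, s);
                    return true;
                }
            }
            return false;
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() => _inner.Dispose();
    }
}
```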

tarekgh · 2020-08-05T17:32:53Z

> Can you go into a little more detail on that for me? It looks essentially the same as before to me, so I'm just curious what I'm missing.

Sorry, I was not clear. You are not changing the thread safety; I was just calling out the fact that the enumeration was not thread safe even before your change.

For the Tags enumeration, it is OK to keep it like that for now, even if it is allocating.

One last ask here: I am seeing issue #40366, which needs a small change. Can you add that change along with yours here, at least to avoid conflicts? Thanks.

CodeBlanch · 2020-08-05T17:40:39Z

> One last ask here: I am seeing issue #40366, which needs a small change. Can you add that change along with yours here, at least to avoid conflicts? Thanks.

Done

tarekgh · 2020-08-05T20:45:17Z

@noahfalk I am merging this, but let me know if there is anything you want to change and we can still do it.

@CodeBlanch thanks for your help with this issue.

noahfalk · 2020-08-06T03:26:47Z

Usually when we make a perf change, we also have results from BenchmarkDotNet showing the improvement we got from the change. Can we do that here?

Also, just to confirm: the new code still allocates, but I think it allocates less than the previous code did (the old code allocated an IEnumerable and an IEnumerator; the new code only allocates the IEnumerator). Does that sound accurate?

CodeBlanch · 2020-08-06T04:51:27Z

@noahfalk Agreed. For most callers the struct Enumerator will be boxed up/allocated because it is accessed through the IEnumerable/IEnumerator interface. The reason I asked for these is we have a ~~hack~~ helper in OTel which will lookup the struct GetEnumerator and build a DynamicMethod for it. So, if you are determined enough, these do actually allow you to get there.
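The boxing described above can be illustrated with a BCL type instead of the internal Activity lists. The sketch below uses List<T>, whose public GetEnumerator() also returns a struct. BoxingDemo and its methods are hypothetical helpers, and GC.GetAllocatedBytesForCurrentThread is used to count allocations. Exact byte counts depend on the runtime version, and newer JITs may remove some of this boxing in optimized code, so treat the numbers as illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Static type is List<int>: foreach binds to the public List<int>.GetEnumerator(),
    // which returns the List<int>.Enumerator struct, so nothing is heap-allocated.
    public static int SumConcrete(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // Static type is IEnumerable<int>: foreach calls IEnumerable<int>.GetEnumerator(),
    // which returns IEnumerator<int>, so the struct enumerator is boxed.
    public static int SumInterface(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int x in items) sum += x;
        return sum;
    }

    // Bytes allocated on the current thread by one call of 'action', after one warm-up call.
    public static long AllocatedBytes(Action action)
    {
        action();
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Compares both paths on a small non-empty list. The lambdas and their captured
    // state are created before measurement starts, so they are not counted.
    public static (long Concrete, long Boxed) Compare()
    {
        var list = new List<int> { 1, 2, 3 };
        long concrete = AllocatedBytes(() => SumConcrete(list));
        long boxed = AllocatedBytes(() => SumInterface(list));
        return (concrete, boxed);
    }
}
```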

Regarding the perf tests, they won't really look as good without the helper. Do you still want me to do them? Or would you like to see before/after from the OTel perspective, maybe?

noahfalk · 2020-08-06T06:44:16Z

> Regarding the perf tests, they won't really look as good without the helper. Do you still want me to do them?

Yes please. I want to make sure that when people look at this change and it claims "we made perf better" then there is some data to substantiate the claim. It doesn't have to look amazing and you are welcome to include extra OTel specific cases that have even larger gains if you want to show that off : )

CodeBlanch · 2020-08-07T07:23:22Z

@noahfalk Working on the perf tests. It looks like the GetEnumerator I added is being removed. I'm guessing that's because nothing in the code base references it and it is internal?

If I do this, it seems to work:

        [DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(TagsLinkedList))]
        private TagsLinkedList? _tags;
        [DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(LinkedList<ActivityLink>))]
        private LinkedList<ActivityLink>? _links;
        [DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(LinkedList<ActivityEvent>))]
        private LinkedList<ActivityEvent>? _events;

Correct approach? OK to PR this change?

tarekgh · 2020-08-07T11:37:41Z

@eerhardt

@CodeBlanch has been optimizing Activity.TagObjects, which returns IEnumerable<...>. He provided a GetEnumerator there; you can look at this PR. He noticed GetEnumerator is being removed from the code. Is this because of the linker? Note that this is in the System.Diagnostics.DiagnosticSource library.

@CodeBlanch, are you using the official build of this library, or are you building your own?

eerhardt · 2020-08-07T15:59:51Z

> Is this because of the linker?

Yes. You have a non-public, unused method. This is exactly what the linker removes when it is run.

tarekgh · 2020-08-07T16:16:49Z

Thanks @eerhardt. What is the best way to force the linker not to remove them?

eerhardt · 2020-08-07T16:18:35Z

Using DynamicDependency as listed above will work. But honestly, the use of private reflection here is the issue IMO.

tarekgh · 2020-08-07T16:26:23Z

That is why I asked what is the best way to do that. Why don't we have a specific attribute that tells the linker to ignore this type/method/field, instead of depending on reflection?

eerhardt · 2020-08-07T16:37:36Z

The use of private reflection is coming from OpenTelemetry. Even @CodeBlanch calls it a hack above.

> The reason I asked for these is we have a ~~hack~~ helper in OTel which will lookup the struct GetEnumerator and build a DynamicMethod for it.

To answer the question:

> what is the best way to do that.

The "best" way to do that is to make the method you want exposed publicly - that way callers can take advantage of it without using private reflection.

In this specific situation, if I write the code:

Activity a = ...;
foreach (KeyValuePair<string, object?> tag in a.TagObjects)
{
}

I won't be able to take advantage of the perf improvement added here.

If you just want a way to tell the linker not to remove these methods, then the DynamicDependencyAttribute listed above will work.

tarekgh · 2020-08-07T16:59:54Z

Theoretically and practically GetEnumerator code is reachable from the public API.

The TagObjects property is public and returns the _tags object, which is an internal type implementing the IEnumerable<> interface, and part of that implementation is GetEnumerator(). Isn't the linker being too aggressive here?

CodeBlanch · 2020-08-07T17:13:50Z

Let me make this change locally and put up the performance numbers. I think when you see them, you will understand why we are doing the hack. I'm open to doing a more proper fix, which would be to modify the public API to return concrete type(s) that expose the struct GetEnumerator properly (e.g., public TagLinkedList TagObjects { get; }), but I don't think there will be much appetite for such a change.

tarekgh · 2020-08-07T17:21:55Z

I don't see this as a hack; it is a legitimate thing to do. I don't think we can expose any new types for that at this time.

CodeBlanch · 2020-08-07T17:45:19Z

Below are the perf numbers. I'm going to open a PR to get DynamicDependency in here.

@noahfalk

As we expected, minor improvement for callers doing foreach. Massive improvement for OTel using the struct Enumerator helper. No change to EnumerateActivityTags.
Do you want me to PR my perf tests into dotnet/performance?

Without perf improvements (BEFORE):

Method	NumberOfActivities	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EnumerateActivityTags	5000	313.6 us	9.05 us	10.43 us	312.3 us	300.9 us	334.0 us	33.1633	-	-	273.44 KB
EnumerateActivityTagObjects	5000	385.1 us	7.35 us	7.55 us	383.1 us	373.1 us	403.1 us	33.1230	-	-	273.44 KB
OTelHelperActivityTagObjects	5000	467.0 us	9.09 us	9.34 us	465.0 us	449.6 us	488.3 us	-	-	-	273.49 KB
EnumerateActivityLinks	5000	546.1 us	7.17 us	6.36 us	545.1 us	539.0 us	558.2 us	47.4138	-	-	390.63 KB
OTelHelperActivityLinks	5000	618.9 us	4.77 us	4.23 us	618.5 us	613.3 us	625.2 us	45.6731	-	-	390.7 KB
EnumerateActivityEvents	5000	442.1 us	4.00 us	3.55 us	442.3 us	437.2 us	448.8 us	41.6667	-	-	351.56 KB
OTelHelperActivityEvents	5000	500.5 us	8.67 us	8.11 us	500.4 us	488.6 us	515.2 us	41.6667	-	-	351.63 KB

With perf improvements (AFTER):

Method	NumberOfActivities	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EnumerateActivityTags (Note: No change to this code path)	5000	334.5 us	16.09 us	18.53 us	328.2 us	314.3 us	376.7 us	32.6705	-	-	280000 B
EnumerateActivityTagObjects	5000	449.7 us	7.18 us	6.37 us	448.9 us	440.0 us	461.9 us	27.7778	-	-	240000 B
OTelHelperActivityTagObjects	5000	233.1 us	11.44 us	13.17 us	227.6 us	219.5 us	259.3 us	-	-	-	1 B
EnumerateActivityLinks	5000	625.3 us	17.45 us	19.39 us	616.3 us	605.3 us	664.4 us	40.8654	-	-	360000 B
OTelHelperActivityLinks	5000	258.5 us	4.89 us	4.80 us	257.0 us	252.7 us	268.6 us	-	-	-	-
EnumerateActivityEvents	5000	473.1 us	27.85 us	32.07 us	461.3 us	441.2 us	539.3 us	37.5000	-	-	320000 B
OTelHelperActivityEvents	5000	251.7 us	4.69 us	4.16 us	250.9 us	247.0 us	262.6 us	-	-	-	-

eerhardt · 2020-08-07T17:54:09Z

> Theoretically and practically GetEnumerator code is reachable from the public API.

This is not true.

runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs, line 1382 in a778b7e:

        public Enumerator<KeyValuePair<string, object?>> GetEnumerator() => new Enumerator<KeyValuePair<string, object?>>(_first);

That method has no callers.

> The TagObjects property is public and returns the _tags object, which is an internal type implementing the IEnumerable<> interface, and part of that implementation is GetEnumerator(). Isn't the linker being too aggressive here?

TagsLinkedList implements IEnumerable<> explicitly, so the "public Enumerator GetEnumerator()" method doesn't implement that interface.
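The explicit-implementation point can be checked with reflection. In the hypothetical ExplicitOnlyList below (not the real TagsLinkedList), the runtime's interface map shows that IEnumerable<int>.GetEnumerator is implemented by a separate private method, not by the public struct-returning GetEnumerator(). This only illustrates C# interface mapping; it is not a model of the linker's own analysis.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical type mirroring the situation described above: the public
// GetEnumerator() returns a struct, while IEnumerable<int> is implemented
// explicitly by a separate method that never calls the public one.
public sealed class ExplicitOnlyList : IEnumerable<int>
{
    private readonly List<int> _items = new List<int> { 1, 2, 3 };

    public List<int>.Enumerator GetEnumerator() => _items.GetEnumerator();

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        foreach (int item in _items)
        {
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
}

public static class InterfaceMapDemo
{
    // Returns the method that actually implements IEnumerable<int>.GetEnumerator
    // on 'type', as recorded in the runtime's interface map.
    public static MethodInfo ImplementationOfGetEnumerator(Type type)
    {
        InterfaceMapping map = type.GetInterfaceMap(typeof(IEnumerable<int>));
        return map.TargetMethods[0];
    }
}
```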

tarekgh · 2020-08-07T18:11:46Z

thanks @eerhardt for explaining it.

noahfalk · 2020-08-07T20:03:32Z

> As we expected, minor improvement for callers doing foreach

My read of the numbers is showing that callers doing foreach got 6-16% worse, not better?
313.6 -> 334.5 (6.6% slower)
385.1 -> 449.7 (16.7% slower)
546.1 -> 625.3 (14.5% slower)
442.1 -> 473.1 (7% slower)

The OTel gains are great, but I hope not to be in the position where we are making one set of users get worse performance so that only OTel gets better performance. Am I interpreting these numbers correctly, and if so, do we understand why the change is making the foreach case slower?

> Do you want me to PR my perf tests into dotnet/performance?

Yeah that would be helpful to ensure we can keep measuring this in the future and don't accidentally regress the performance.

CodeBlanch · 2020-08-07T20:18:35Z

@noahfalk Sorry, I was talking specifically about the allocations, not speed. I don't know what's up with the differences in the mean. I just ran a new set...

Before:

Method	NumberOfActivities	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EnumerateActivityTags	5000	311.6 μs	13.76 μs	15.85 μs	311.8 μs	291.2 μs	344.9 μs	32.4519	-	-	280001 B
EnumerateActivityTagObjects	5000	413.6 μs	8.77 μs	9.75 μs	414.5 μs	398.8 μs	432.0 μs	31.9865	-	-	280003 B
OTelHelperActivityTagObjects	5000	491.5 μs	24.52 μs	28.23 μs	485.2 μs	460.4 μs	551.7 μs	31.8352	-	-	280057 B
EnumerateActivityLinks	5000	595.1 μs	18.87 μs	21.73 μs	597.7 μs	560.3 μs	637.7 μs	46.2963	-	-	400000 B
OTelHelperActivityLinks	5000	640.8 μs	19.84 μs	22.85 μs	639.6 μs	611.1 μs	686.9 μs	46.1957	-	-	400080 B
EnumerateActivityEvents	5000	452.7 μs	7.69 μs	6.42 μs	453.0 μs	440.6 μs	465.2 μs	42.2794	-	-	360000 B
OTelHelperActivityEvents	5000	556.5 μs	27.76 μs	29.71 μs	562.1 μs	491.9 μs	601.4 μs	42.9688	-	-	360072 B
EnumerateActivityLinkTags	5000	387.8 μs	12.29 μs	13.66 μs	385.4 μs	368.2 μs	417.2 μs	27.9605	-	-	240000 B
OTelHelperActivityLinkTags	5000	196.9 μs	4.99 μs	5.74 μs	195.6 μs	188.7 μs	209.1 μs	-	-	-	-

After:

Method	NumberOfActivities	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated
EnumerateActivityTags	5000	315.9 μs	7.30 μs	8.41 μs	315.4 μs	299.6 μs	332.2 μs	33.0882	-	-	280001 B
EnumerateActivityTagObjects	5000	414.6 μs	4.45 μs	3.95 μs	414.8 μs	406.8 μs	420.3 μs	27.0270	-	-	240000 B
OTelHelperActivityTagObjects	5000	222.6 μs	16.22 μs	18.68 μs	223.2 μs	191.9 μs	255.6 μs	-	-	-	1 B
EnumerateActivityLinks	5000	663.1 μs	9.76 μs	9.13 μs	663.7 μs	647.7 μs	677.7 μs	40.7609	-	-	360000 B
OTelHelperActivityLinks	5000	273.8 μs	5.85 μs	6.51 μs	273.1 μs	264.3 μs	289.4 μs	-	-	-	-
EnumerateActivityEvents	5000	461.4 μs	9.92 μs	11.02 μs	462.0 μs	443.7 μs	484.4 μs	37.1094	-	-	320000 B
OTelHelperActivityEvents	5000	254.4 μs	5.46 μs	6.28 μs	254.6 μs	245.8 μs	270.3 μs	-	-	-	-
EnumerateActivityLinkTags	5000	385.2 μs	5.47 μs	4.57 μs	386.4 μs	373.4 μs	390.0 μs	27.4390	-	-	240000 B
OTelHelperActivityLinkTags	5000	195.5 μs	2.28 μs	1.90 μs	195.7 μs	191.9 μs	199.4 μs	-	-	-	-

Some of them are more or less identical (311.6 μs vs 315.9 μs); some are a bit off. I don't have an explanation for this currently. It could just be what my system is doing while the benchmarks run, or it could be a real issue. The change itself is essentially returning an explicit enumerator vs. letting the compiler make one; any idea why there would be a measurable difference in that?

I'm not an expert in statistics, but if the variance falls within the error, they should be considered ~equal? 🤷
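The Allocated column can be turned into a per-activity figure by dividing by NumberOfActivities. This assumes, which the table itself does not state, that each benchmark invocation enumerates each of the 5000 activities exactly once. Using the second run's before/after numbers for the plain foreach benchmarks:

```latex
\begin{aligned}
\text{TagObjects:} &\quad \frac{280003\ \text{B}}{5000} \approx 56\ \text{B} \;\to\; \frac{240000\ \text{B}}{5000} = 48\ \text{B} \\
\text{Links:}      &\quad \frac{400000\ \text{B}}{5000} = 80\ \text{B} \;\to\; \frac{360000\ \text{B}}{5000} = 72\ \text{B} \\
\text{Events:}     &\quad \frac{360000\ \text{B}}{5000} = 72\ \text{B} \;\to\; \frac{320000\ \text{B}}{5000} = 64\ \text{B}
\end{aligned}
```

Under that assumption, the foreach paths still allocate on every enumeration, about 8 B less per activity than before, while the OTelHelper rows drop to 1 B or nothing.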

tarekgh · 2020-08-07T21:32:33Z

@CodeBlanch, can you share your test code so I can try running it on my machine?

CodeBlanch · 2020-08-07T21:46:23Z

@tarekgh Sure, pushed here: https://github.com/CodeBlanch/performance/tree/activity-perf

Benchmarks: https://github.com/CodeBlanch/performance/blob/activity-perf/src/benchmarks/micro/libraries/System.Diagnostics/Perf_Activity.cs

noahfalk · 2020-08-07T22:58:16Z

> I'm not an expert in statistics, but if the variance falls within the error, they should be considered ~equal? 🤷

If we were being good statisticians, I think a t-test does that, but it's admittedly been a good while since my last stats class ;) In practice, even if there were a statistically significant regression, we still might decide it is small enough that it isn't worth the time to pursue, or that the value of the other perf improvements outweighs it.
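For reference, here is a sketch of Welch's two-sample t statistic computed from summary statistics (mean, standard deviation, sample count), the kind of test mentioned above. WelchTTest is a hypothetical helper, not part of BenchmarkDotNet. The tables above show Mean and StdDev but not the number of measured iterations, so that count would have to come from the BenchmarkDotNet log.

```csharp
using System;

// Welch's two-sample t-test quantities computed from summary statistics.
public static class WelchTTest
{
    // t statistic for the difference of means (sample 2 minus sample 1).
    public static double Statistic(double mean1, double sd1, int n1, double mean2, double sd2, int n2)
    {
        // Standard error of the difference between the two means.
        double standardError = Math.Sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
        return (mean2 - mean1) / standardError;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    public static double DegreesOfFreedom(double sd1, int n1, double sd2, int n2)
    {
        double v1 = sd1 * sd1 / n1;
        double v2 = sd2 * sd2 / n2;
        return (v1 + v2) * (v1 + v2) / (v1 * v1 / (n1 - 1) + v2 * v2 / (n2 - 1));
    }
}
```

The resulting t would then be compared against a t-distribution critical value for the computed degrees of freedom and the chosen significance level.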

> any idea why there would be a measurable difference in that?

A good way to get more data is to use the EtwProfiler or EventPipeProfiler attributes on your benchmark. You can then analyze the traces in PerfView or share them with us. There is also the disassembler (DisassemblyDiagnoser), which is sometimes useful for analyzing and comparing the generated code.

Another thing to be wary of is sources of background noise in the measurements, such as CPUs that throttle performance in response to heat, and background activities such as downloads, installations, virus scans, and other services or VMs. For all that BenchmarkDotNet does to reduce noise, it is still only effective against noise that varies within the time interval it is measuring. Lower-frequency noise, like a background download that runs for 5 minutes, might span an entire BDN run.

In an ideal world we'd be able to eliminate all those sources of noise, but often it is easier to control for it a bit by running the baseline and new versions of the benchmark in alternating order several times in a row. In the easy case, the numbers for each run are repeatable. You might also suddenly see that some runs are notably slower or faster, in which case I usually assume the higher-performing runs are the ones that had the least interference from background load or hardware throttling, and these are the ones that will be most fairly comparable.

Added struct enumerators for TagObjects, Events, & Links.

2e7929a

Dotnet-GitSync-Bot added the area-Diagnostics-coreclr label Aug 5, 2020

CodeBlanch mentioned this pull request Aug 5, 2020: Updating exporters to use TagObjects instead of Tags open-telemetry/opentelemetry-dotnet#1000 (Merged)

tarekgh approved these changes Aug 5, 2020

Fix dotnet#40366.

b177c79

tarekgh mentioned this pull request Aug 5, 2020: Invalid use of Unsafe.As in System.Diagnostics.DiagnosticSource #40366 (Closed)

tarekgh merged commit b62f482 into dotnet:master Aug 5, 2020

CodeBlanch deleted the activity-enumerators branch August 5, 2020 22:27

CodeBlanch mentioned this pull request Aug 7, 2020: System.Diagnostics.Activity Perf Improvement Part 2 #40544 (Merged)

Jacksondr5 pushed a commit to Jacksondr5/runtime that referenced this pull request Aug 10, 2020

System.Diagnostics.Activity Perf Improvement (dotnet#40362)

5285030

karelz added this to the 5.0.0 milestone Aug 18, 2020

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
