Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Diagnostics.Activity Perf Improvement Part 2 #40544

Merged
merged 7 commits into from
Aug 13, 2020

Conversation

CodeBlanch
Copy link
Contributor

@CodeBlanch CodeBlanch commented Aug 7, 2020

Updated the public code to use the internal struct GetEnumerator method to prevent the linker from removing it.

See #40362

/cc @tarekgh @noahfalk @cijothomas @eerhardt

@ghost
Copy link

ghost commented Aug 7, 2020

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@@ -255,9 +265,6 @@ public string DisplayName
/// </summary>
public IEnumerable<KeyValuePair<string, object?>> TagObjects
{
#if ALLOW_PARTIALLY_TRUSTED_CALLERS
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarekgh I removed the SecuritySafeCritical. Assuming this is no longer needed because we removed the Unsafe.As on the last PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. If you run the tests locally should catch any issue with the security attributes when running against NetFX.

@@ -76,10 +76,20 @@ public partial class Activity : IDisposable

private byte _w3CIdFlags;

/*
* Note:
* DynamicDependency is used here to prevent the linker from removing the struct GetEnumerator on the internal types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more at this code, you could also change the Enumerator implementations like the following:

            public Enumerator<T> GetEnumerator() => new Enumerator<T>(_first);
            IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
            IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

And the public Enumerator<T> GetEnumerator() method would be preserved, since now it is being called. This would be more preferrable, since if we were linking an application, preserving ALL public methods on these types is more than necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I updated the code. Should have just done that initially, would have saved myself a lot of time 🤣 Learned a bit about the linker though so not totally wasted 👍

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense to add a test to ensure the public Enumerator<T> GetEnumerator() method is preserved, so OpenTelemetry can invoke it through reflection.

@@ -1273,8 +1270,8 @@ public void Add(T value)
}

public Enumerator<T> GetEnumerator() => new Enumerator<T>(_first);
IEnumerator<T> IEnumerable<T>.GetEnumerator() => new Enumerator<T>(_first);
IEnumerator IEnumerable.GetEnumerator() => new Enumerator<T>(_first);
IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit - you may want to add a comment here and below describing the situation here - "in order to preserve the non-allocating GetEnumerator() so OpenTelemetry can call it..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments & tests.

@tarekgh
Copy link
Member

tarekgh commented Aug 7, 2020

@CodeBlanch could you please re-run the perf tests again with the change? Thanks.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for following up on this.

@CodeBlanch
Copy link
Contributor Author

@tarekgh Here's the perf on the latest changes. I added a benchmark for ActivityLinkTags which is testing the struct Enumerator we added previously on ActivityTagsCollection.

Method NumberOfActivities Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
EnumerateActivityTags 5000 290.1 us 5.32 us 4.44 us 288.4 us 285.5 us 300.8 us 33.0882 - - 280000 B
EnumerateActivityTagObjects 5000 409.5 us 1.57 us 1.39 us 409.1 us 407.6 us 412.9 us 27.9605 - - 240000 B
OTelHelperActivityTagObjects 5000 205.6 us 2.77 us 2.59 us 205.6 us 201.7 us 211.0 us - - - -
EnumerateActivityLinks 5000 586.8 us 7.89 us 6.58 us 586.5 us 574.1 us 601.9 us 42.4107 - - 360000 B
OTelHelperActivityLinks 5000 272.0 us 5.19 us 4.60 us 270.7 us 266.3 us 282.1 us - - - -
EnumerateActivityEvents 5000 487.6 us 33.32 us 38.37 us 473.9 us 445.8 us 566.8 us 37.5000 - - 320000 B
OTelHelperActivityEvents 5000 259.1 us 4.05 us 3.79 us 258.1 us 254.5 us 266.0 us - - - -
EnumerateActivityLinkTags 5000 357.9 us 6.64 us 6.21 us 355.8 us 351.1 us 372.4 us 28.4091 - - 240000 B
OTelHelperActivityLinkTags 5000 195.6 us 6.68 us 7.69 us 191.5 us 189.1 us 213.7 us - - - -


bool foundGetEnumerator = false;
foreach (MethodInfo method in enumerable.GetType().GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.NonPublic))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need foreach here. yu can request the method name directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@CodeBlanch
Copy link
Contributor Author

Also, I would add all needed information to the comment instead of having issue link here. nobody knows in the future this link will become invalid.

@tarekgh I just removed the link. The comment seemed to have enough info already?

@noahfalk
Copy link
Member

noahfalk commented Aug 7, 2020

@CodeBlanch - thanks for doing this! Can you also add some baseline numbers for comparison (or point out which numbers above were baseline if I missed it?). My goal is to demonstrate that the code change improved performance because new numbers are better than baseline numbers.

[EDIT]: nevermind, I see the baseline in the other issue

@tarekgh
Copy link
Member

tarekgh commented Aug 8, 2020

Here is the numbers when running same exact tests on my machine:

Baseline

Method NumberOfActivities Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
EnumerateActivityTags 5000 518.6 us 10.17 us 17.82 us 66.4063 - - 280000 B
EnumerateActivityTagObjects 5000 654.0 us 12.67 us 14.59 us 66.4063 - - 280000 B
OTelHelperActivityTagObjects 5000 809.4 us 15.75 us 21.03 us 66.4063 - - 280056 B
EnumerateActivityLinks 5000 1,009.3 us 19.92 us 27.26 us 93.7500 - - 400000 B
OTelHelperActivityLinks 5000 1,073.3 us 20.82 us 34.79 us 93.7500 - - 400080 B
EnumerateActivityEvents 5000 819.3 us 16.29 us 24.38 us 85.9375 - - 360000 B
OTelHelperActivityEvents 5000 904.5 us 18.05 us 31.62 us 85.9375 - - 360072 B
EnumerateActivityLinkTags 5000 653.0 us 12.47 us 13.86 us 56.6406 - - 240000 B
OTelHelperActivityLinkTags 5000 352.8 us 6.79 us 8.58 us - - - -

Updated

Method NumberOfActivities Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
EnumerateActivityTags 5000 547.8 us 10.81 us 13.67 us 66.4063 - - 280000 B
EnumerateActivityTagObjects 5000 719.8 us 12.64 us 11.82 us 56.6406 - - 240000 B
OTelHelperActivityTagObjects 5000 347.8 us 6.93 us 11.19 us - - - -
EnumerateActivityLinks 5000 1,119.1 us 11.22 us 10.50 us 85.9375 - - 360000 B
OTelHelperActivityLinks 5000 500.4 us 6.90 us 6.46 us - - - -
EnumerateActivityEvents 5000 875.6 us 17.16 us 26.20 us 76.1719 - - 320000 B
OTelHelperActivityEvents 5000 410.3 us 7.28 us 6.81 us - - - -
EnumerateActivityLinkTags 5000 680.4 us 13.51 us 28.49 us 56.6406 - - 240000 B
OTelHelperActivityLinkTags 5000 355.1 us 7.08 us 13.98 us - - - -

Gain & Loss

positive is gain and negative is loss in the perf.

Method CPU Allocations
EnumerateActivityTags -5% 0%
EnumerateActivityTagObjects -9% 14%
OTelHelperActivityTagObjects 133% ?
EnumerateActivityLinks -10% 11%
OTelHelperActivityLinks 115% ?
EnumerateActivityEvents -6% 12%
OTelHelperActivityEvents 120% ?
EnumerateActivityLinkTags -4% 0%
OTelHelperActivityLinkTags -0.7% ?

Looks we are gaining in general in the memory allocation but seeing some regression in the speed. mainly the tags scenario:

| EnumerateActivityTagObjects | -9% | 14% |

It is a good memory allocation gain 14% but looks we are losing almost 10% of the speed. we need to weigh here the cost benefit of this change to decide if we can scrify this 10% speed for the 14% memory? or we may look to see if we can enhance more the enumeration speed somehow?

@CodeBlanch
Copy link
Contributor Author

I was looking at the decompiled code before & after. Before we didn't support Reset of the enumerator, so I dropped that. Micro-micro optimization there, but saves on a field and a set. I ran perf for that, but I think it's just getting lost in the noise at that point.

In @tarekgh's numbers above, check out these:

Baseline:

Method NumberOfActivities Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
EnumerateActivityTags 5000 518.6 us 10.17 us 17.82 us 66.4063 - - 280000 B
EnumerateActivityLinkTags 5000 653.0 us 12.47 us 13.86 us 56.6406 - - 240000 B
OTelHelperActivityLinkTags 5000 352.8 us 6.79 us 8.58 us - - - -

Updated

Method NumberOfActivities Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
EnumerateActivityTags 5000 547.8 us 10.81 us 13.67 us 66.4063 - - 280000 B
EnumerateActivityLinkTags 5000 680.4 us 13.51 us 28.49 us 56.6406 - - 240000 B
OTelHelperActivityLinkTags 5000 355.1 us 7.08 us 13.98 us - - - -

What's interesting about those numbers is, there wasn't a change in those code paths! EnumerateActivityTags before & after both call EnumerateStringValues. EnumerateActivityLinkTags & OTelHelperActivityLinkTags are using the ActivityTagsCollection as it was originally merged in (it has a struct Enumerator already). No change, yet they show as slower?

I'm down to continuing to work on this, but it feels like we're running into noise we can't avoid? @tarekgh @noahfalk Let me know what you want me to do!

Some extra thoughts...

  • The way it's working right now is essentially like List<T> and the other standard/mainstay collections/dictionaries. Should we deviate from that?

  • The tests that show slower CPU, but less memory. In theory, that's less time in GC. So lost CPU time executing, CPU time saved in GC. Comes out a wash?

  • Before it had class Enumerator, now it has struct Enumerator. So it makes sense to me it would be slightly slower for callers not using the helper. Instead of allocating on the heap, it now builds on the stack and boxes to the heap. We could have a class version for the IEnumerator calls, which would skip the boxing...

       public Enumerator<KeyValuePair<string, object?>> GetEnumerator() => new StructEnumerator<KeyValuePair<string, object?>>(_first);
       IEnumerator<KeyValuePair<string, object?>> IEnumerable<KeyValuePair<string, object?>>.GetEnumerator() => new ClassEnumerator<KeyValuePair<string, object?>>(_first);
       IEnumerator IEnumerable.GetEnumerator() => new ClassEnumerator<KeyValuePair<string, object?>>(_first);

    That would essentially make it like it was before. Would need to re-introduce DynamicDependency to prevent the linker from removing the struct, since it would again be considered dead code. This deviates from List<T> et. all, but might be worth considering? Slightly lower CPU time, slightly greater binary size.

@tarekgh
Copy link
Member

tarekgh commented Aug 10, 2020

@CodeBlanch I'll try to re-run it again with your last change. I doubt there is a noises here but this will show up with my next run. Could be the difference between my results and yours is the baseline version. I am using 5.0.100-preview.8.20406.14 as the baseline in my case.

@tarekgh
Copy link
Member

tarekgh commented Aug 10, 2020

I played more with measuring, and I am seeing the noises that @CodeBlanch talked about. So, I have written a test case for EnumerateActivityTagObjects which uses static created Activity object with tags. running my modified test, I am seeing the memory allocations gain is 40%. And I am seeing the regressions in the speed is range from 4% to 8%. I ran it many times and I am getting consistent results. I think this figure may be accurate.

I have a couple of suggestion if we can apply:

  • For the EnumerateActivityTags scenario, we are not getting any memory allocation gain but we are regressing by around 10%. I suggest for now we revert the enumeration of this scenario to old enumeration code to gain back the speed and we'll not be regressing any memory allocations in same time.
  • For EnumerateActivityTagObjects scenario, I think maybe what we have is reasonable enough considering the big memory allocations gain but I'll let @noahfalk comment on that. If we want to look into the details of this scenario to understand where we lost this portion of the speed, we can use the EtwProfiler.

@CodeBlanch
Copy link
Contributor Author

Thanks for the help on this @tarekgh!

  • For the EnumerateActivityTags scenario, we are not getting any memory allocation gain but we are regressing by around 10%. I suggest for now we revert the enumeration of this scenario to old enumeration code to gain back the speed and we'll not be regressing any memory allocations in same time.

This is what's weird with the benchmarks. I didn't change anything for the EnumerateActivityTags code path. No explanation whatsoever for why it is 10% slower. It does this...

        public IEnumerable<KeyValuePair<string, string?>> Tags
        {
            get => _tags?.EnumerateStringValues() ?? s_emptyBaggageTags;
        }

            public IEnumerable<KeyValuePair<string, string?>> EnumerateStringValues()
            {
                LinkedListNode<KeyValuePair<string, object?>>? current = _first;

                while (current != null)
                {
                    if (current.Value.Value is string || current.Value.Value == null)
                    {
                        yield return new KeyValuePair<string, string?>(current.Value.Key, (string?)current.Value.Value);
                    }

                    current = current.Next;
                };
            }

Doesn't use the new struct Enumerator at all.

@tarekgh
Copy link
Member

tarekgh commented Aug 11, 2020

@CodeBlanch you are right, I have looked at the implementation of Activity.Tags in the baseline and your changes and they identical. I even compared the IL code. maybe this could be a JIT change somehow? we may compare without your changes at all and look at the numbers. If you don't have time to try that, I can try it tomorrow. just build a private without your changes and run the perf test on it.

@tarekgh
Copy link
Member

tarekgh commented Aug 11, 2020

@CodeBlanch

I did more experiment but building the runtime without your change (as the baseline) and then built it with your change and now the numbers make more sense:

Baseline

Method NumberOfActivities Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
EnumTagObjectsModified 5000 140.1 ns 2.82 ns 6.35 ns 139.0 ns 0.0114 - - 48 B
EnumerateActivityTags 5000 366,563.9 ns 7,279.27 ns 11,754.67 ns 363,993.9 ns 66.8945 - - 280000 B
EnumerateActivityTagObjects 5000 488,715.0 ns 9,670.10 ns 23,538.35 ns 481,678.6 ns 56.6406 - - 240000 B
OTelHelperActivityTagObjects 5000 661,686.9 ns 27,655.32 ns 79,348.29 ns 628,248.4 ns 56.6406 - - 240000 B
EnumerateActivityLinks 5000 734,792.4 ns 14,645.61 ns 36,472.65 ns 730,422.8 ns 85.9375 - - 360000 B
OTelHelperActivityLinks 5000 811,958.4 ns 16,101.89 ns 39,498.20 ns 804,629.6 ns 85.9375 - - 360000 B
EnumerateActivityEvents 5000 636,692.5 ns 12,550.44 ns 28,836.73 ns 636,810.4 ns 76.1719 - - 320000 B
OTelHelperActivityEvents 5000 759,502.5 ns 19,452.18 ns 54,865.23 ns 753,648.6 ns 76.1719 - - 320000 B
EnumerateActivityLinkTags 5000 471,584.6 ns 9,358.02 ns 18,252.10 ns 469,536.3 ns 57.1289 - - 240000 B
OTelHelperActivityLinkTags 5000 241,946.4 ns 6,225.74 ns 17,661.37 ns 237,177.0 ns - - - -

your change

Method NumberOfActivities Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
EnumTagObjectsModified 5000 135.3 ns 2.74 ns 4.58 ns 0.0095 - - 40 B
EnumerateActivityTags 5000 368,999.7 ns 8,402.93 ns 24,109.59 ns 66.8945 - - 280000 B
EnumerateActivityTagObjects 5000 552,394.7 ns 12,034.74 ns 34,335.78 ns 46.8750 - - 200000 B
OTelHelperActivityTagObjects 5000 257,119.6 ns 4,974.11 ns 7,290.99 ns - - - -
EnumerateActivityLinks 5000 827,523.3 ns 16,451.55 ns 30,494.00 ns 76.1719 - - 320000 B
OTelHelperActivityLinks 5000 343,683.9 ns 9,706.69 ns 28,314.87 ns - - - -
EnumerateActivityEvents 5000 612,119.9 ns 12,075.81 ns 18,800.57 ns 66.4063 - - 280000 B
OTelHelperActivityEvents 5000 306,375.2 ns 5,765.50 ns 5,920.74 ns - - - -
EnumerateActivityLinkTags 5000 468,951.4 ns 9,364.89 ns 20,751.96 ns 56.6406 - - 240000 B
OTelHelperActivityLinkTags 5000 283,948.1 ns 5,674.92 ns 14,850.28 ns - - - -

I think there is no real CPU perf regressions in EnumerateActivityTags and we have a good allocation gain. I believe this change is good to have. I know we still have some CPU regression in EnumerateActivityTagObjects thoug.

@noahfalk will be around tomorrow if he has any comment before we merge this

@CodeBlanch
Copy link
Contributor Author

@tarekgh Awesome! Thanks for the assist. I'll open a PR for the perf tests a bit later. Did you make any changes to the actual benchmarks I should include on that?

@tarekgh
Copy link
Member

tarekgh commented Aug 12, 2020

Did you make any changes to the actual benchmarks I should include on that?

I just added EnumTagObjectsModified to use static created Activity instead of recreating the Activity every time as you are doing in EnumerateActivityTagObjects. looking at the benchmark numbers when using static activity, we are actually faster than the baseline. But with your EnumerateActivityTagObjects which allocating everytime, we are slower by 13%. I created EnumTagObjectsModified only to measure the enumeration without any other operations (e.g. Activity create and start, Listener callbacks, AddTag...etc.)

        [Benchmark]
        public void EnumTagObjectsModified()
        {
            foreach (var t in s_activity.TagObjects)
            {
              bool b = t.Value.GetType() == typeof(string); // I know this is not the best :-(
            }
        }

        private static Activity s_activity = InitActivity();
        
        private static Activity InitActivity()
        {
          ActivitySource aSource = new ActivitySource("TestActivitySource-mine");
            ActivityListener al= new ActivityListener
            {
                ShouldListenTo = s => s.Name == "TestActivitySource-mine",

                GetRequestedDataUsingContext = (ref ActivityCreationOptions<ActivityContext> o) => ActivityDataRequest.AllDataAndRecorded
            };
            ActivitySource.AddActivityListener(al);
            
            Activity a = aSource.StartActivity(
                "TestActivity",
                ActivityKind.Internal,
                parentContext: default,
                tags: new Dictionary<string, object>
                {
                    ["tag1"] = "string1",
                    ["tag2"] = 1,
                    ["tag3"] = "string2",
                    ["tag4"] = false,
                    ["tag5"] = 2.4,
                    ["tag6"] = new Object(),
                });
            
            return a;
        }

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noahfalk will be around tomorrow if he has any comment before we merge this

Thanks for the diligence, I'm happy if you guys are. If we are getting requests to improve perf we can continue tweaking in 6.0. Fwiw when benchmarking at this level controlling for noise is hard, for example code alignment

@CodeBlanch CodeBlanch deleted the activity-enumerators-fix branch August 15, 2020 23:52
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants