-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Diagnostics.Activity Perf Improvement #40362
Conversation
Tagging subscribers to this area: @tommcdon |
Some other areas I noticed:
|
@CodeBlanch thanks for submitting this. in general LGTM.
I don't think using IList will be a good idea. This will restrict the users of this API to some specific collections and will not be able to use the other collections (e.g. Dictionary<,>)
I think this can still be done by having the Enumerator<>.MoveNext to do this logic. no?
agree, we can optimize this later if we need to. |
@noahfalk I'll wait your review before I merge it. |
Can you go into a little more detail on that for me? It looks essentially the same as before to me, so I'm just curious what I'm missing.
Good point. It would be nice to have a solve here, but I can't think of anything. Tags you can apply after creation, so you could workaround it if you were so inclined. But Links, you can only pass on ctor. If we added Add/Remove Link, we could workaround for that too.
It seems like it should be possible, doesn't it? I tried a couple of ways, and just tried again, but I can't get it without an allocation. The method needs to return an IEnumerable which exposes the GetEnumerator. So if I do return |
Sorry I was not clear. you are not changing the thread safety issue. I was just calling out the fact that the enumeration is not thread safe even before your change. For Tags enumeration, it is ok to keep it like that for now even if it is allocating. one last ask here, I am seeing the issue #40366 which need a little change. can you add this change with your here? at least to avoid the conflicts? thanks. |
Done |
@noahfalk I am merging this but let me know if you have anything you want to change so we can still do it. @CodeBlanch thanks for your help with this issue. |
Usually when we make a perf change we also have result from BenchmarkDotNet showing the improvement we got from the change. Can we do that here? Also just to confirm the new code still allocates, but I think it allocates less than the previous did (old code allocated an IEnumerable and an IEnumerator, new code only allocates the IEnumerator). Does that sound accurate? |
@noahfalk Agreed. For most callers the Regarding the perf tests, won't really look as good without the helper. You still want me to do them? Or would you like to see before/after from OTel perspective maybe? |
Yes please. I want to make sure that when people look at this change and it claims "we made perf better" then there is some data to substantiate the claim. It doesn't have to look amazing and you are welcome to include extra OTel specific cases that have even larger gains if you want to show that off : ) |
@noahfalk Working on the perf tests. It looks like the GetEnumerator I added is being removed. I'm guessing because nothing is referencing it in the code base and it is internal? If I do this, it seems to work: [DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(TagsLinkedList))]
private TagsLinkedList? _tags;
[DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(LinkedList<ActivityLink>))]
private LinkedList<ActivityLink>? _links;
[DynamicDependency(DynamicallyAccessedMemberTypes.PublicMethods, typeof(LinkedList<ActivityEvent>))]
private LinkedList<ActivityEvent>? _events; Correct approach? OK to PR this change? |
@CodeBlanch has been optimizing the @CodeBlanch are you using the official build of this library, or you are building yours? |
Yes. You have a non-public, unused method. This is exactly what the linker removes when it is run. |
Thanks @eerhardt. what is the best way to force the linker to not remove them? |
Using |
That is why I asked what is the best way to do that. Why we don't have specific attribute to tell the linker ignore this type/method/field instead of depending on the Reflection? |
The use of private reflection is coming from open telemetry. Even @CodeBlanch calls it a hack above.
To answer the question:
The "best" way to do that is to make the method you want exposed publicly - that way callers can take advantage of it without using private reflection. In this specific situation, if I write the code: Activity a = ...;
foreach (KeyValuePair<string, object?> tag in a.TagObjects)
{
} I won't be able to take advantage of the perf improvement added here. If you just want a way to tell the linker not to remove these methods, then the |
Theoretically and practically
|
Let me make this change locally and put up the performance numbers. I think when you guys see them, you will understand why we are doing the hack. I'm open to doing a more proper fix, which would be to modify the public API to return concrete type(s) that exposes the |
I am not seeing this a hack. this is legitimate thing to do. I don't think we can expose any new types for that in current time. |
Below are the perf numbers. I'm going to open a PR to get
Without perf improvements (BEFORE):
With perf improvements (AFTER):
|
This is not true. runtime/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs Line 1382 in a778b7e
That method has no callers.
|
thanks @eerhardt for explaining it. |
My read of the numbers is showing that callers doing foreach got 6-16% worse, not better? The OTel gains are great but I hope not to be in the position where we are making one set of users get worse performance so that only OTel gets better performance. Am I interpretting these numbers correctly, and if so do we understand why the change is making the foreach case slower?
Yeah that would be helpful to ensure we can keep measuring this in the future and don't accidentally regress the performance. |
@noahfalk Sorry I was talking specifically about the allocations, not speed. I don't know what's up with the differences in mean. I just ran a new set... Before:
After:
Some of them are more or less identical (311.6 μs vs 315.9 μs), some are a bit off. I don't have an explanation for this currently. Could just be what my system is doing while the benchmarks run, could be a real issue? The change itself is essentially returning an explicit enumerator vs letting the compiler make one, any idea why there would be a measurable difference in that? I'm not an expert in statistics, but if the variance falls within the error, they should be considered ~equal? 🤷 |
@CodeBlanch can you share your test code so I may try running on my machine? |
I think if we were being good statisticians a t-test does that, but its been admittedly a good while since my last stats class ; ) In practice even if there was a statistically significant regression we still might decide it is acceptably small that it isn't worth the time to pursue it or that the value of other perf improvements outweighs it.
A good way to get more data is to use the EtwProfiler or EventPipeProfiler attributes on your benchmark. You can then analyze the traces in PerfView or share the traces with us. There is also the Disassembler which is sometimes useful to analyze and compare the code being generated. Another area to be wary of are source of background noise for measurements such as CPUs that throttle performance in response to heat, background activities such as downloads, installations, virus scans, and other services or VMs. For everything that BenchmarkDotNet does trying to reduce noise, it still is only effective against noise that varies within the time interval it is measuring. For lower frequency noise like a background download that runs for 5 minutes it might encompass an entire BDN run. In an ideal world we'd be able to eliminate all those sources of noise, but often it is easier to control for it a bit by running the baseline and new version of the benchmark in alternating order several times in a row. In the easy case the numbers for each run are repeatable. You might also suddenly see the some runs are notably slower/faster in which case I usually assume the higher performing runs are the ones that had least interference from background load or hardware throttling and these are the ones that will be most fairly comparable. |
Added a struct enumerator for the LinkedList used internally by Activity. The idea is performance-sensitive callers can use that to avoid allocations when enumerating TagObjects, Events, & Links.
Performance numbers: #40362 (comment)
/cc @tarekgh @noahfalk @cijothomas