-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove thread contention from Activity Start/Stop #107333
Remove thread contention from Activity Start/Stop #107333
Conversation
@dotnet-policy-service agree |
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
I'm personally not sure if interlocked complexity is worth over a dedicated mutation (Add/Remove) lock? Class will have 1 more field though. |
All those Array.Copy will give the GC some more work to do, I'm curious to know what's the net effect of this change on the affected benchmarks, do you have some numbers already? |
I would actually argue this is simpler than the previous approach which was using some very tricky looking interleaved locking that I couldn't trivially prove were correct. Also, the excessive locking was causing quite a bit of contention, i'll include details in the PR description |
No I mean you don't really need interlocked operations to update array because adding/removing listeners is expected to be rare and shouldn't be a contention cause. Otherwise even with interlocked there will be a contention, just not apparent/reported. The idea here is to remove locks/waits from actual listener iteration (which you did), but it doesn't matter for add/remove (because this behavioral change you made with removing version does not care about that anymore) |
Yes, this will be a performance regression for adding and removing listeners. But with a few caveats
|
I'm not sure I understand. How else besides locking or Interlocked could I guarantee that the Activity Start/Stop thread will not see a torn list of listeners if a listener is added/removed? And the use of locks was causing contention not with the threads that were mutating the collection, but with the various worker threads that were starting/stopping Activities which is on the main path |
Author: algorithmsarecool <algorithmsarecool@gmail.com>
Author: algorithmsarecool<algorithmsarecool@gmail.com>
- Reduce duplication - add comments and make code more obvious - Use IndexOf Author: algorithmsarecool <algorithmsarecool@gmail.com>
ac0212e
to
0f562d3
Compare
+1. Adding/Removing listeners is a very rare operation. |
@AlgorithmsAreCool If you'd like to, you can run the stress test tool built by OpenTelemetry for testing OTel itself! |
Will do! |
In my dev box (16 logical cores), the stress test for Traces is showing a throughput of ~2M/sec. Whereas the one for Logs show 50M/sec. A good chunk of this difference is likely the contention being fixed here. The stress test show 0 contention for Logs as Logs has no feature of "listeners"! Hopefully, with this change, Traces should see a huge jump in throughput! |
I have run the stress test on my local machine and I show a very large jump in performance (1.4M/s -> 15M/s) |
@dotnet/area-system-diagnostics-activity I'm not sure how to proceed, the build failures as far as i can tell do not seem to be related |
Adding @elinor-fung @jkoritzinsky as the failures seem to be COM/binder source gen related |
The com generator tests failure looks like a Helix infra failure from the logs. |
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
CC @noahfalk |
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Show resolved
Hide resolved
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'll defer to @tarekgh for signing off.
src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modulo left minor comments, LGTM. Thanks!
I'm assuming someone else has to merge this? |
@tarekgh, based on what I see it is/will be targeted to .NET10. Do you have also plan to backport to .NET9 or it is to late? |
From previous discussions parties reported this were ok to have in next release 10. Our current plan is to have it in .NET 10 to reduce any risk for .NET 9.0. Is it blocking any real scenario for you? Please elaborate more why you are asking to have the fix in 9.0. |
For my team, it isn't necessarily blocking our use of tracing, but we operate at least 1 service that is very low latency that might notice this change. I'm curious how much of the 15% impact to topline throughput this change wins back. |
@AlgorithmsAreCool This behavior has been present for a few releases and is not a regression. While improving performance is beneficial, we need a strong justification to accept this fix for .NET 9.0, as we aim to minimize risks in the release, especially since this change hasn't been tested in preview builds. I understand this isn't a blocker for your scenario. However, if the low latency issue in your service is significantly impacted by this and you feel it's critical to address it soon, we can consider including the fix in .NET 9.0. |
* Remove thread contention from Activity Start/Stop Author: algorithmsarecool <algorithmsarecool@gmail.com> * Fix ref parameters and whitespace Author: algorithmsarecool<algorithmsarecool@gmail.com> * PR feedback. - Reduce duplication - add comments and make code more obvious - Use IndexOf Author: algorithmsarecool <algorithmsarecool@gmail.com> * PR feedback to simplify locking strategy * PR feedback, final nits
* Remove thread contention from Activity Start/Stop Author: algorithmsarecool <algorithmsarecool@gmail.com> * Fix ref parameters and whitespace Author: algorithmsarecool<algorithmsarecool@gmail.com> * PR feedback. - Reduce duplication - add comments and make code more obvious - Use IndexOf Author: algorithmsarecool <algorithmsarecool@gmail.com> * PR feedback to simplify locking strategy * PR feedback, final nits
/backport to 9.0-staging |
Started backporting to 9.0-staging: https://github.com/dotnet/runtime/actions/runs/11582206026 |
@tarekgh an error occurred while backporting to 9.0-staging, please check the run log for details! The process '/usr/bin/git' failed with exit code 1 |
/backport to release/9.0-staging |
Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/11582621574 |
Fixes #107158
This change tries to resolve a performance issue related to thread contention when starting and stopping Activities. The new approach is a lock-free, copy-on-write array design which eliminates all thread contention and improves the throughput of Activity.Start/Stop (based on local testing).
I would like to add some basic tests to the
SynchronizedList<T>
class, but I'm unsure what the policy/procedure is for testing internal classesI would also like to make sure that this PR has the desired effect of improving request throughput in key benchmarks when telemetry is enabled.
The problem
The existing code was causing lock contention between request threads that were starting and stopping activities. The approach was to lock and unlock for each item in the collection. So if there were 10 listeners, the request thread would have to acquire and release 10 locks, each of which could contend with other request threads.
Proposed solution
Use "immutable" arrays of listeners that can be acquired by a request thread without locking. This should make the common/hot path (starting/stopping activities) faster when the system is under load. It should, however, also have a small negative impact on adding/removing listeners, which now becomes a O(N) operation instead of the O(~1) that it was before. However in practice this should have essentially no impact on application performance or allocations
Local Testing
My methodology was to start multiple threads (16 vCore machine) and to Start/Stop Activities in a hot loop (worst case).
Results Before Changes
Benchmark execution time
00:00:15.1899827
Results After Changes
Benchmark execution time
00:00:01.5148120
Microbenchmarks
Open Telemetry Stress test results
Before
After