Remove thread contention from Activity Start/Stop #107333

Conversation

AlgorithmsAreCool
Contributor

@AlgorithmsAreCool AlgorithmsAreCool commented Sep 4, 2024

Fixes #107158

This change tries to resolve a performance issue related to thread contention when starting and stopping Activities. The new approach is a lock-free, copy-on-write array design which eliminates all thread contention and improves the throughput of Activity.Start/Stop (based on local testing).

I would like to add some basic tests for the SynchronizedList<T> class, but I'm unsure what the policy/procedure is for testing internal classes.

I would also like to make sure that this PR has the desired effect of improving request throughput in key benchmarks when telemetry is enabled.

The problem

The existing code was causing lock contention between request threads that were starting and stopping activities. The approach was to acquire and release a lock for each item in the collection, so if there were 10 listeners, a request thread had to go through 10 lock acquire/release cycles per notification, each of which could contend with other request threads. A sketch of that shape is shown below.
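For context, here is a minimal sketch of the "lock per item" pattern described above. It is illustrative only, with hypothetical names, not the actual dotnet/runtime code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative sketch: every notification re-acquires the same lock once per
// listener, so concurrent request threads contend even when the listener list
// never changes.
internal sealed class LockPerItemBroadcaster
{
    private readonly object _lock = new object();
    private readonly List<ActivityListener> _listeners = new();

    public void Add(ActivityListener listener)
    {
        lock (_lock) { _listeners.Add(listener); }
    }

    // Called on every Activity start: N lock acquisitions per notification.
    public void NotifyStart(Activity activity)
    {
        for (int i = 0; ; i++)
        {
            ActivityListener listener;
            lock (_lock)
            {
                if (i >= _listeners.Count)
                    break;                 // exits the loop; the lock is released
                listener = _listeners[i];
            }
            listener.ActivityStarted?.Invoke(activity);  // callback runs outside the lock
        }
    }
}
```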

Proposed solution

Use "immutable" arrays of listeners that can be acquired by a request thread without locking. This should make the common/hot path (starting/stopping activities) faster when the system is under load. It should, however, also have a small negative impact on adding/removing listeners, which now becomes a O(N) operation instead of the O(~1) that it was before. However in practice this should have essentially no impact on application performance or allocations

Local Testing

My methodology was to start multiple threads (on a 16-vCore machine) and Start/Stop Activities in a hot loop (worst case); a rough sketch of the harness follows the list below.

  • 10 worker/request threads
  • 50,000 start/dispose activities per thread
  • 10 ActivityListeners subscribed up front and not mutated
  • 10 runs of the benchmark
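A rough reconstruction of that harness under the assumptions above; the source name, listener configuration, and structure are guesses, not the author's exact code:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

internal static class ContentionBenchmark
{
    private const int ThreadCount = 10;
    private const int IterationsPerThread = 50_000;
    private const int ListenerCount = 10;

    private static void Main()
    {
        using var source = new ActivitySource("Benchmark.Source");

        // Subscribe the listeners up front; they are never mutated afterwards.
        var listeners = new ActivityListener[ListenerCount];
        for (int i = 0; i < ListenerCount; i++)
        {
            listeners[i] = new ActivityListener
            {
                ShouldListenTo = s => s.Name == "Benchmark.Source",
                Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                    ActivitySamplingResult.AllDataAndRecorded
            };
            ActivitySource.AddActivityListener(listeners[i]);
        }

        // Worker threads start/dispose activities in a hot loop (worst case).
        var workers = new Thread[ThreadCount];
        for (int t = 0; t < ThreadCount; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < IterationsPerThread; i++)
                {
                    using Activity? activity = source.StartActivity("Work");
                }
            });
        }

        var sw = Stopwatch.StartNew();
        foreach (var worker in workers) worker.Start();
        foreach (var worker in workers) worker.Join();
        sw.Stop();

        Console.WriteLine($"Benchmark execution time {sw.Elapsed}");

        foreach (var listener in listeners)
            listener.Dispose();
    }
}
```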

Results Before Changes

Benchmark execution time 00:00:15.1899827


Results After Changes

Benchmark execution time 00:00:01.5148120


Microbenchmarks

  • With Prometheus metrics exporter
    • AddRuntimeInstrumentation
  • 1 synthetic/null tracing/activity processor
  • 1 synthetic activity listener to verify invocation counts (a sketch of such a listener follows the table below)
  • 50,000 activity start/stops per thread
| Thread Count | Methodology | Max Contentions | Min Contentions | Average Contentions | Average Execution Time (s) |
|---|---|---|---|---|---|
| 1 | Baseline | 0 | 0 | 0 | 0.04198 |
| 2 | Baseline | 1 | 0 | 0.125 | 0.05053 |
| 4 | Baseline | 9 | 18 | 3.406 | 0.081563 |
| 8 | Baseline | 2168 | 1472 | 1743.125 | 0.300675 |
| 16 | Baseline | 4436 | 3622 | 4005.375 | 0.6288 |
| 1 | New | 0 | 0 | 0 | 0.039686 |
| 2 | New | 0 | 0 | 0 | 0.041909 |
| 4 | New | 0 | 0 | 0 | 0.047901 |
| 8 | New | 0 | 0 | 0 | 0.063951 |
| 16 | New | 0 | 0 | 0 | 0.073664 |
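For illustration, a counting listener of the kind described above might look like this; the names are hypothetical and the author's actual harness is not shown:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical synthetic listener that counts start/stop invocations.
internal sealed class CountingListener : IDisposable
{
    private readonly ActivityListener _listener;
    private long _started;
    private long _stopped;

    public CountingListener(string sourceName)
    {
        _listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStarted = _ => Interlocked.Increment(ref _started),
            ActivityStopped = _ => Interlocked.Increment(ref _stopped)
        };
        ActivitySource.AddActivityListener(_listener);
    }

    public long Started => Interlocked.Read(ref _started);
    public long Stopped => Interlocked.Read(ref _stopped);

    public void Dispose() => _listener.Dispose();
}
```

In a setup like the harness sketched earlier, Started and Stopped would each be expected to equal the number of worker threads times the iterations per thread.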

Open Telemetry Stress test results

Before

opentelemetry-dotnet\test\OpenTelemetry.Tests.Stress.Traces on  main via .NET v8.0.100 🎯 $(TargetFrameworksForTests) took 1m17s
❯ dotnet run --framework net8.0 --configuration Release -- -d 30
Options: {"Concurrency":16,"PrometheusInternalMetricsPort":9464,"DurationSeconds":30}
Run OpenTelemetry.Tests.Stress.Traces.exe --help to see available options.
Running (concurrency = 16, internalPrometheusEndpoint = http://localhost:9464/metrics/), press <Esc> to stop, press <Spacebar> to toggle statistics in the console...
Loops: 41,878,519, Loops/Second: 1,490,815, CPU Cycles/Loop: 10,920, RemainingTime (Seconds): 0
Stopping the stress test...
* Total Running Time (Seconds) 30
* Total Loops: 42,160,488
* Average Loops/Second: 1,404,928
* Average CPU Cycles/Loop: 11,512
* GC Total Allocated Bytes: 20948591184

After

opentelemetry-dotnet\test\OpenTelemetry.Tests.Stress.Traces on  main via .NET v8.0.100 🎯 $(TargetFrameworksForTests)
❯ dotnet run --framework net8.0 --configuration Release -- -d 30
Options: {"Concurrency":16,"PrometheusInternalMetricsPort":9464,"DurationSeconds":30}
Run OpenTelemetry.Tests.Stress.Traces.exe --help to see available options.
Running (concurrency = 16, internalPrometheusEndpoint = http://localhost:9464/metrics/), press <Esc> to stop, press <Spacebar> to toggle statistics in the console...
Loops: 451,278,899, Loops/Second: 13,309,562, CPU Cycles/Loop: 1,687, RemainingTime (Seconds): 0
Stopping the stress test...
* Total Running Time (Seconds) 30
* Total Loops: 454,080,963
* Average Loops/Second: 15,048,749
* Average CPU Cycles/Loop: 1,665
* GC Total Allocated Bytes: 225228374680

@dotnet-policy-service dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Sep 4, 2024
@AlgorithmsAreCool
Contributor Author

@dotnet-policy-service agree

@En3Tho
Contributor

En3Tho commented Sep 4, 2024

I'm personally not sure the interlocked complexity is worth it over a dedicated mutation (Add/Remove) lock? The class would have one more field, though.

@ilmax

ilmax commented Sep 4, 2024

All those Array.Copy calls will give the GC some more work to do. I'm curious what the net effect of this change is on the affected benchmarks; do you have some numbers already?

@AlgorithmsAreCool
Contributor Author

I'm personally not sure the interlocked complexity is worth it over a dedicated mutation (Add/Remove) lock? The class would have one more field, though.

I would actually argue this is simpler than the previous approach, which used some very tricky-looking interleaved locking that I couldn't trivially prove was correct. Also, the excessive locking was causing quite a bit of contention; I'll include details in the PR description.

@En3Tho
Contributor

En3Tho commented Sep 4, 2024

No, I mean you don't really need interlocked operations to update the array, because adding/removing listeners is expected to be rare and shouldn't be a cause of contention. Even with interlocked operations there will be contention on mutation, just not apparent/reported.

The idea here is to remove locks/waits from the actual listener iteration (which you did), but it doesn't matter for add/remove (because the behavioral change you made by removing the version check means iteration no longer depends on it). A sketch of that simpler variant is below.
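For context, here is a sketch of the simpler variant being suggested: readers stay lock-free, while mutation takes an ordinary lock instead of a compare-and-swap retry loop. Names are hypothetical; this is not the PR's final code. Because the whole array reference is published atomically, readers can never see a torn list either way.

```csharp
using System;
using System.Threading;

internal sealed class CopyOnWriteList<T> where T : class
{
    private T[] _snapshot = Array.Empty<T>();
    private readonly object _writeLock = new object();   // the "one more field"

    public void ForEach(Action<T> action)
    {
        // Hot path is unchanged: a volatile read of an immutable snapshot.
        T[] items = Volatile.Read(ref _snapshot);
        foreach (T item in items)
            action(item);
    }

    public void Add(T item)
    {
        lock (_writeLock)   // writers only serialize among themselves
        {
            T[] current = _snapshot;
            T[] next = new T[current.Length + 1];
            Array.Copy(current, next, current.Length);
            next[current.Length] = item;
            Volatile.Write(ref _snapshot, next);   // atomic reference publish
        }
    }

    public bool Remove(T item)
    {
        lock (_writeLock)
        {
            T[] current = _snapshot;
            int index = Array.IndexOf(current, item);
            if (index < 0)
                return false;

            T[] next = new T[current.Length - 1];
            Array.Copy(current, next, index);
            Array.Copy(current, index + 1, next, index, current.Length - index - 1);
            Volatile.Write(ref _snapshot, next);
            return true;
        }
    }
}
```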

@AlgorithmsAreCool
Contributor Author

All those Array.Copy calls will give the GC some more work to do. I'm curious what the net effect of this change is on the affected benchmarks; do you have some numbers already?

Yes, this will be a performance regression for adding and removing listeners, but with a few caveats:

  • Adding/removing listeners should be relatively infrequent. These allocations should only begin to matter if
    1. Someone is adding/removing listeners more than a few dozen times per second (which sounds crazy)
    2. Someone has more than a few dozen listeners and is still churning the list continuously
  • The hot path is broadcasting Activity notifications to listeners, which should be faster in this approach

@AlgorithmsAreCool
Contributor Author

No, I mean you don't really need interlocked operations to update the array, because adding/removing listeners is expected to be rare and shouldn't be a cause of contention. Even with interlocked operations there will be contention on mutation, just not apparent/reported.

The idea here is to remove locks/waits from the actual listener iteration (which you did), but it doesn't matter for add/remove

I'm not sure I understand. How else, besides locking or Interlocked, could I guarantee that the Activity Start/Stop thread will not see a torn list of listeners when a listener is added or removed? And the use of locks was causing contention not with the threads mutating the collection, but with the various worker threads starting/stopping Activities, which is on the main path.

@cijothomas
Contributor

All those Array.Copy calls will give the GC some more work to do. I'm curious what the net effect of this change is on the affected benchmarks; do you have some numbers already?

Yes, this will be a performance regression for adding and removing listeners, but with a few caveats:

  • Adding/removing listeners should be relatively infrequent. These allocations should only begin to matter if

    1. Someone is adding/removing listeners more than a few dozen times per second (which sounds crazy)
    2. Someone has more than a few dozen listeners and is still churning the list continuously
  • The hot path is broadcasting Activity notifications to listeners, which should be faster in this approach

+1. Adding/Removing listeners is a very rare operation.

@cijothomas
Contributor

@AlgorithmsAreCool If you'd like to, you can run the stress test tool built by OpenTelemetry for testing OTel itself!
https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/test/OpenTelemetry.Tests.Stress.Traces - The stress test shows the throughput, and it should show an improvement when applying the changes from this PR.

@AlgorithmsAreCool
Contributor Author

Will do!

@cijothomas
Contributor

cijothomas commented Sep 4, 2024

@AlgorithmsAreCool If you'd like to, you can run the stress test tool built by OpenTelemetry for testing OTel itself! https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/test/OpenTelemetry.Tests.Stress.Traces - The stress test shows the throughput, and it should show an improvement when applying the changes from this PR.

In my dev box (16 logical cores), the stress test for Traces shows a throughput of ~2M/sec, whereas the one for Logs shows 50M/sec. A good chunk of this difference is likely the contention being fixed here. The stress test shows 0 contention for Logs, as Logs has no "listeners" feature! Hopefully, with this change, Traces should see a huge jump in throughput!

@AlgorithmsAreCool
Contributor Author

@AlgorithmsAreCool If you'd like to, you can run the stress test tool built by OpenTelemetry for testing OTel itself! https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/test/OpenTelemetry.Tests.Stress.Traces - The stress test shows the throughput, and it should show an improvement when applying the changes from this PR.

In my dev box (16 logical cores), the stress test for Traces shows a throughput of ~2M/sec, whereas the one for Logs shows 50M/sec. A good chunk of this difference is likely the contention being fixed here. The stress test shows 0 contention for Logs, as Logs has no "listeners" feature! Hopefully, with this change, Traces should see a huge jump in throughput!

I have run the stress test on my local machine and it shows a very large jump in performance (1.4M/s -> 15M/s).

@AlgorithmsAreCool
Contributor Author

@dotnet/area-system-diagnostics-activity I'm not sure how to proceed; the build failures, as far as I can tell, do not seem to be related.

@tommcdon
Member

tommcdon commented Sep 5, 2024

@dotnet/area-system-diagnostics-activity I'm not sure how to proceed; the build failures, as far as I can tell, do not seem to be related.

Adding @elinor-fung @jkoritzinsky as the failures seem to be COM/binder source gen related

@jkoritzinsky
Member

The COM generator test failure looks like a Helix infra failure, based on the logs.

@tarekgh tarekgh added this to the 10.0.0 milestone Sep 5, 2024
@tarekgh
Member

tarekgh commented Sep 5, 2024

CC @noahfalk

Member

@noahfalk noahfalk left a comment

LGTM! I'll defer to @tarekgh for signing off.

Member

@tarekgh tarekgh left a comment

Modulo the minor comments I left, LGTM. Thanks!

@AlgorithmsAreCool
Contributor Author

I'm assuming someone else has to merge this?

@Kielek

Kielek commented Sep 6, 2024

@tarekgh, based on what I see, this is/will be targeted at .NET 10. Do you also plan to backport to .NET 9, or is it too late?

@tarekgh
Member

tarekgh commented Sep 6, 2024

@Kielek

based on what I see, this is/will be targeted at .NET 10. Do you also plan to backport to .NET 9, or is it too late?

From previous discussions, the parties who reported this were OK with having it in the next release (10). Our current plan is to have it in .NET 10 to reduce any risk for .NET 9.0. Is it blocking any real scenario for you? Please elaborate on why you are asking to have the fix in 9.0.

@tarekgh tarekgh merged commit ad430a1 into dotnet:main Sep 6, 2024
83 of 85 checks passed
@AlgorithmsAreCool
Contributor Author

@Kielek

based on what I see, this is/will be targeted at .NET 10. Do you also plan to backport to .NET 9, or is it too late?

From previous discussions, the parties who reported this were OK with having it in the next release (10). Our current plan is to have it in .NET 10 to reduce any risk for .NET 9.0. Is it blocking any real scenario for you? Please elaborate on why you are asking to have the fix in 9.0.

For my team, it isn't necessarily blocking our use of tracing, but we operate at least one very low-latency service that might notice this change.

I'm curious how much of the 15% impact to topline throughput this change wins back.

@AlgorithmsAreCool AlgorithmsAreCool deleted the algorithmsarecool/reduce-activity-contention branch September 6, 2024 17:50
@tarekgh
Member

tarekgh commented Sep 6, 2024

@AlgorithmsAreCool This behavior has been present for a few releases and is not a regression. While improving performance is beneficial, we need a strong justification to accept this fix for .NET 9.0, as we aim to minimize risks in the release, especially since this change hasn't been tested in preview builds. I understand this isn't a blocker for your scenario. However, if the low latency issue in your service is significantly impacted by this and you feel it's critical to address it soon, we can consider including the fix in .NET 9.0.

jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
* Remove thread contention from Activity Start/Stop

Author:    algorithmsarecool <algorithmsarecool@gmail.com>

* Fix ref parameters and whitespace

Author:    algorithmsarecool<algorithmsarecool@gmail.com>

* PR feedback.
- Reduce duplication
- add comments and make code more obvious
- Use IndexOf

Author:    algorithmsarecool <algorithmsarecool@gmail.com>

* PR feedback to simplify locking strategy

* PR feedback, final nits
sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
* Remove thread contention from Activity Start/Stop

Author:    algorithmsarecool <algorithmsarecool@gmail.com>

* Fix ref parameters and whitespace

Author:    algorithmsarecool<algorithmsarecool@gmail.com>

* PR feedback.
- Reduce duplication
- add comments and make code more obvious
- Use IndexOf

Author:    algorithmsarecool <algorithmsarecool@gmail.com>

* PR feedback to simplify locking strategy

* PR feedback, final nits
@github-actions github-actions bot locked and limited conversation to collaborators Oct 7, 2024
@tarekgh
Member

tarekgh commented Oct 29, 2024

/backport to 9.0-staging

@github-actions github-actions bot unlocked this conversation Oct 29, 2024
Contributor

Started backporting to 9.0-staging: https://github.com/dotnet/runtime/actions/runs/11582206026

Contributor

@tarekgh an error occurred while backporting to 9.0-staging, please check the run log for details!

The process '/usr/bin/git' failed with exit code 1

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2024
@tarekgh
Member

tarekgh commented Oct 29, 2024

/backport to release/9.0-staging

@github-actions github-actions bot unlocked this conversation Oct 29, 2024
Contributor

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/11582621574

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2024
Labels
area-System.Diagnostics.Activity, community-contribution (Indicates that the PR has been added by a community member)
Development

Successfully merging this pull request may close these issues.

Adding OpenTelemetry instrumentation results in 15% RPS degradation
10 participants