Randomized allocation sampling #104955

noahfalk · 2024-07-16T11:51:38Z

This feature allows profilers to do allocation profiling based on randomized samples. It has better theoretical and empirically observed accuracy than our current allocation profiling approaches while maintaining low performance overhead. It is designed for use in production profiling scenarios. For more information about usage and implementation, see the included doc, docs/design/features/RandomizedAllocationSampling.md.
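For readers who want a feel for what "randomized samples" can mean here, the following is a minimal sketch of one common approach, not a description of the code in this PR. It assumes samples are triggered by a byte budget drawn from an exponential distribution, and that totals are estimated by weighting each sampled object by the inverse of its chance of being picked. The constant `MEAN_INTERVAL_BYTES` and all function names are hypothetical and chosen only for illustration; the design doc is the authority on what the runtime actually does.

```python
import math
import random

# Hypothetical mean number of bytes between samples (illustration only).
MEAN_INTERVAL_BYTES = 100 * 1024


def draw_budget(rng, mean=MEAN_INTERVAL_BYTES):
    """Draw how many bytes may be allocated before the next sample fires."""
    # The exponential distribution is memoryless: the chance of a sample
    # firing does not depend on how much was allocated since the last one.
    return rng.expovariate(1.0 / mean)


def sample_allocations(sizes, rng, mean=MEAN_INTERVAL_BYTES):
    """Return the sizes of the allocations that were picked as samples."""
    budget = draw_budget(rng, mean)
    sampled = []
    for size in sizes:
        if size >= budget:
            # The budget runs out inside this object, so it is sampled.
            sampled.append(size)
            budget = draw_budget(rng, mean)
        else:
            budget -= size
    return sampled


def estimate_total_bytes(sampled, mean=MEAN_INTERVAL_BYTES):
    """Estimate total allocated bytes from the sampled objects alone."""
    # Under this model an object of `size` bytes is sampled with
    # probability 1 - exp(-size / mean); dividing by that probability
    # scales each sample up to the bytes it represents.
    return sum(size / (1.0 - math.exp(-size / mean)) for size in sampled)
```

With a seeded `random.Random` and a long list of equally sized allocations, `estimate_total_bytes(sample_allocations(...))` should land close to the true total, and the gap shrinks as more samples are collected.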

[Updated 10/30]
This PR supersedes #100356. It currently includes 2 commits: the first has the (mostly) rebased changes from July, and the second addresses a few things I missed in the rebase and follows up on issues flagged in July.

Most of the testing happened during July but I did re-run some of the functional tests as a smoke test to ensure nothing broke in the few recent changes. We'll also get updated CI regression testing and I'm coordinating with @chrisnas to have him validate it works in a real profiler.

noahfalk · 2024-07-20T00:42:30Z

Functional testing found and fixed an off-by-one error in the RNG code but otherwise things looked fine. I also resynced this PR on top of the latest changes in #104849 and #104851. The last commit, now number 10, remains the interesting one.

I also did some performance testing using GCPerfSim as an allocation benchmark. My default configuration was 4 threads, workstation mode, 500GB of allocations entirely with small objects and no survival. It is intended to put maximum stress on the allocation code paths. GCPerfSim command line: -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -lohsr 100000-2000000 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time. I also ran a few variations that added modest amounts of survival and LOH allocations that are a bit more realistic, though still extremely allocation heavy.

EDIT: Don't rely on these numbers, they are misleading. See #104955 (comment)

Benchmarks - No Tracing enabled

| Scenario | Baseline time (s) | PR time (s) |
| --- | --- | --- |
| Default | 20.8 | 21.4 |
| Default + `sohsi 100` | 40.9 | 41.8 |
| Default + `sohsi 100 -lohar 10` | 42.8 | 43.9 |
| Default + `sohsi 100 -lohar 10 -lohsi 200` | 43.6 | 44.6 |

Benchmarks - Tracing AllocSampling+GC keywords, verbose level

| Scenario | Baseline time (s) | PR time (s) |
| --- | --- | --- |
| Default | 22 | 23.1 |
| Default + `sohsi 100` | 41.6 | 43.0 |
| Default + `sohsi 100 -lohar 10` | 44.7 | 44.5 |
| Default + `sohsi 100 -lohar 10 -lohsi 200` | 44.2 | 45.0 |

Overall it looks like roughly 0.9 additional seconds for the PR to do 500GB of allocations. On a tight microbenchmark it's noticeable, and as other GC or non-allocation costs increase it becomes relatively less noticeable. I'm investigating to see if it can be improved at all.
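The arithmetic behind that figure can be reproduced from the two benchmark tables. The sketch below is only a worked calculation on the numbers as posted (which the EDIT note above says should not be relied on); the dictionary and function names are made up for illustration. The no-tracing rows average 0.9 s of extra time, while the tracing rows average a little lower (about 0.78 s) because one scenario measured slightly faster with the PR.

```python
# (baseline seconds, PR seconds), copied from the benchmark tables.
NO_TRACING = {
    "Default": (20.8, 21.4),
    "Default + sohsi 100": (40.9, 41.8),
    "Default + sohsi 100 -lohar 10": (42.8, 43.9),
    "Default + sohsi 100 -lohar 10 -lohsi 200": (43.6, 44.6),
}
TRACING = {
    "Default": (22.0, 23.1),
    "Default + sohsi 100": (41.6, 43.0),
    "Default + sohsi 100 -lohar 10": (44.7, 44.5),
    "Default + sohsi 100 -lohar 10 -lohsi 200": (44.2, 45.0),
}


def deltas(table):
    """Extra seconds the PR build took per scenario (negative = faster)."""
    return {name: round(pr - base, 1) for name, (base, pr) in table.items()}


def mean_delta(table):
    """Average extra seconds across all scenarios in one table."""
    per_scenario = deltas(table)
    return sum(per_scenario.values()) / len(per_scenario)


def relative_overhead(table):
    """Extra time as a percentage of the baseline, per scenario."""
    return {
        name: round(100.0 * (pr - base) / base, 1)
        for name, (base, pr) in table.items()
    }


if __name__ == "__main__":
    for label, table in (("no tracing", NO_TRACING), ("tracing", TRACING)):
        print(label, deltas(table), "mean:", round(mean_delta(table), 3))
        print(label, "relative %:", relative_overhead(table))
```

The relative view makes the point in the paragraph above concrete: the same absolute delta is a larger share of the short Default run than of the longer runs with survival and LOH allocations.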

noahfalk · 2024-07-24T13:18:13Z

Continued perf investigation+testing has cast my previous results into doubt. After lots of searching for what could have caused the regression, my best explanation is that it actually had nothing to do with the source changes in this PR and instead it is either non-determinism in the build process or some user error on my part. I reached that conclusion by doing the following:

I've had a folder on my machine C:\git\runtime3 that throughout the entire process has been synced here:

commit 42b2b19e883f06af5771b5d85b26af263c62e781 (HEAD)
Author: Matous Kozak <55735845+matouskozak@users.noreply.github.com>
Date:   Fri Jul 12 09:42:55 2024 +0200

This folder has no changes from any of my PRs in it and I've been using the build here for all the baseline measurements. Then I ran the following steps:

move artifacts -> artifacts_backup
build.cmd clr+libs -c release
src\tests\build.cmd generatelayoutonly Release
copy artifacts_backup -> artifacts_backup_2

I can consistently reproduce the same magnitude perf regression using the coreclr built in the artifacts directory, but the regression doesn't appear using the build in the backup or backup_2 directory. I've done many runs on each binary switching between them in a semi-randomized ordering trying to ensure that the results for each binary are repeatable and robust relative to background noise on the machine.
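A minimal sketch of that kind of interleaved comparison is shown below, assuming a caller-supplied `run_once(label)` callback that launches the benchmark against the named build and returns elapsed seconds. The callback and the helper names are hypothetical; this is not the script that was used for the measurements above.

```python
import random
import statistics
from collections import defaultdict


def interleaved_schedule(labels, runs_per_label, seed=None):
    """Build a shuffled run order with the same number of runs per build."""
    schedule = [label for label in labels for _ in range(runs_per_label)]
    random.Random(seed).shuffle(schedule)
    return schedule


def run_schedule(schedule, run_once):
    """Execute the schedule; run_once(label) must return elapsed seconds."""
    timings = defaultdict(list)
    for label in schedule:
        timings[label].append(run_once(label))
    return dict(timings)


def summarize(timings):
    """Return (mean, sample standard deviation) of the timings per build."""
    summary = {}
    for label, values in timings.items():
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[label] = (statistics.mean(values), spread)
    return summary
```

Shuffling the order spreads background noise and warm-up effects across all builds instead of letting them pile up on whichever build happens to run first or last, so a gap that persists in the per-build means is more likely to come from the binaries themselves.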

Beyond that I've also got many other builds that include different subsets of the change but there is no clear relationship between the source and the perf results. During one period I progressively added functionality starting from the baseline without triggering the regression to occur, then during another period I was progressively removing functionality from the final PR and the regression would always occur. Even deleting the entirety of the source changes in that folder and syncing it back to the baseline didn't eliminate the perf overhead. Every build was done in a new folder starting without an artifacts folder to remove opportunity for incremental build problems to play a role.

The only explanations I have that make sense to me are either: (a) non-deterministic builds are giving bi-modal perf results for the same input source code, or (b) I am repeatedly making some other error in my testing methodology.

I'm going to see if I can get another machine to repeat some of the original experiments but at the moment I no longer have any evidence the PR is causing a regression.

noahfalk · 2024-07-29T03:50:57Z

@jkotas @MichalStrehovsky - Functional and perf testing both look good now, all outstanding comments on the PRs have been addressed, and CI is green. From my perspective this is ready to be merged unless any further review is planned?

I could check in #104849, #104851, then this PR in sequence but I'm not sure that gives any advantage over just checking in this PR alone and closing 104849 and 104851 as no longer needed.

noahfalk · 2024-10-30T11:22:42Z

@brianrob @chrisnas - I wanted to give a heads up that I removed the HeapIndex parameter on the AllocationSampling event. The value wasn't readily accessible from outside the GC and @Maoni0 recommended that it probably isn't very useful so it is easiest to drop it. If you have any concerns let me know.

@chrisnas - do you want to give this branch a spin with your profiler to test it out? I'm guessing this is the final source or pretty close to it.

chrisnas · 2024-10-30T22:40:11Z

> @brianrob @chrisnas - I wanted to give a heads up that I removed the HeapIndex parameter on the AllocationSampling event. The value wasn't readily accessible from outside the GC and @Maoni0 recommended that it probably isn't very useful so it is easiest to drop it. If you have any concerns let me know.

Sounds good to me: I never found a usage for it :^)

> @chrisnas - do you want to give this branch a spin with your profiler to test it out? I'm guessing this is the final source or pretty close to it.

I'll run my tests on Monday; the rest of the week is off for me, sorry.

noahfalk · 2024-10-31T04:03:33Z

> I'll run my tests on monday - the rest of the week is off for me sorry

Sounds great! It's not a rush :)

noahfalk · 2024-11-04T23:27:14Z

/ba-g remaining issues are deadletters for which an issue isn't appropriate

noahfalk · 2024-11-04T23:31:48Z

@jkotas - Thanks for the earlier feedback. I think that has been covered now but let me know if you have any other feedback. Thanks!

noahfalk · 2024-11-07T00:05:20Z

/ba-g remaining issues are deadletters for which an issue isn't appropriate

noahfalk · 2024-11-07T00:08:06Z

@chrisnas - just let me know whenever you've had a chance to test.
@jkotas - all the feedback so far should be addressed, let me know if there is anything else.

Thanks!

jkotas · 2024-11-07T00:43:34Z

I do not have any additional feedback. As I have mentioned in #104955 (review), this should get a review from somebody familiar with tracing.

noahfalk · 2024-11-07T23:45:25Z

> As I have mentioned in #104955 (review), this should get a review from somebody familiar with tracing.

@brianrob - would you mind taking a look if you haven't already?

chrisnas · 2024-11-08T10:34:31Z

> @chrisnas - just let me know whenever you've had a chance to test.

I'm back from a conference in Belgium and I'm starting to work on it.
I've validated the numbers with the "manual" tests from src\tests\tracing\eventpipe\randomizedallocationsampling\manual.
I now need to change our profiler to use the new event and then compare its results against the objects actually allocated in a more general scenario.

noahfalk · 2024-11-12T23:42:33Z

Thanks all!

As a heads up, I'll be out for an extended period starting the week after next. I'm hoping that review will be finished this week and we can merge this. That gives me all of next week to respond if any unexpected post-merge regression is discovered. I think we already have a reasonable amount of testing showing that the feature works correctly, but if you do find something in your additional testing, @chrisnas, we'd still have a long runway to get it fixed before .NET 10 ships next year.

brianrob

LGTM. Thanks @noahfalk!

noahfalk · 2024-11-14T00:50:02Z

/ba-g Still getting deadletter issues on Chrome debugger tests, but it appears to be unrelated and pre-existing

MichalStrehovsky · 2024-11-14T22:46:27Z

I'm seeing the newly added test failing in native AOT outerloop in #109842.

/root/helix/work/correlation/nativeaottest.sh /root/helix/work/workitem/e/tracing/eventpipe/randomizedallocationsampling/allocationsampling/ allocationsampling.dll ''
  0.0s: ==TEST STARTING==
  0.0s: Started sending sentinel events...
  0.0s: Connecting to EventPipe...
  0.0s: Creating EventPipeEventSource...
  0.0s: EventPipeEventSource created
  0.0s: Dynamic.All callback registered
  0.0s: Running optional trace validator
  0.0s: Finished running optional trace validator
  0.0s: Starting stream processing...
  0.0s: Saw new provider 'Microsoft-Windows-DotNETRuntime'
  0.1s: Saw sentinel event
  0.1s: Stopped sending sentinel events
  0.1s: Starting event generating action...
  0.5s: Allocated 400000 instances...
  1.0s: Allocated 800000 instances...
  2.0s: Allocated 1200000 instances...
  3.5s: Allocated 1600000 instances...
  4.8s: 2000000 instances allocated
  4.8s: Stopping event generating action
  4.8s: Sending StopTracing command...
  4.9s: Finished StopTracing command
  4.9s: Saw new provider 'Microsoft-DotNETCore-EventPipe'
  4.9s: Stopping stream processing
  4.9s: Dropped 0 events
  4.9s: Reader task finished
  4.9s: Validating optional callback...
  4.9s: AllocationSampled counts validation
  4.9s: Nb events: 2763
  4.9s: Nb object128: 0
  4.9s: ==TEST FINISHED: FAILED!==
Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ
Expected: 100
Actual:   -1
   at Xunit.Assert.Equal[T](T, T, IEqualityComparer`1) + 0x24b
   at __GeneratedMainWrapper.Main() + 0x4f

noahfalk · 2024-11-14T23:37:55Z

> I'm seeing the newly added test failing in native AOT outerloop in #109842.

@jkotas gave me the heads up earlier today and filed #109828. I'm investigating it today.

dotnet-issue-labeler bot added the needs-area-label label Jul 16, 2024

noahfalk self-assigned this Jul 16, 2024

noahfalk added the area-VM-coreclr, community-contribution, and area-NativeAOT-coreclr labels and removed the needs-area-label label Jul 16, 2024

noahfalk force-pushed the randomized_alloc branch from 20be562 to 3a77982 Compare July 16, 2024 12:15

build-analysis bot mentioned this pull request Jul 16, 2024

mono assertion failure in CI: mono_class_get_flags: unexpected GC filler class #104956

Closed

jkotas reviewed Jul 16, 2024

src/coreclr/nativeaot/Runtime/GCHelpers.cpp

noahfalk force-pushed the randomized_alloc branch 2 times, most recently from 62716a0 to 8272401 Compare July 17, 2024 13:16

This was referenced Jul 17, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

noahfalk force-pushed the randomized_alloc branch 2 times, most recently from 3551533 to 91b51be Compare July 20, 2024 00:19

noahfalk marked this pull request as ready for review July 20, 2024 00:20

noahfalk requested a review from MichalStrehovsky as a code owner July 20, 2024 00:20

jkotas reviewed Jul 20, 2024

src/coreclr/nativeaot/Runtime/thread.inl (outdated)

jkotas reviewed Jul 20, 2024

src/coreclr/nativeaot/Runtime/thread.inl (outdated)

build-analysis bot mentioned this pull request Jul 21, 2024

linux-armel checked CoreCLR_NonPortable failing to build with "Unable to find toolchain executable. Name: 'ar', Prefix: 'llvm-'" #105176

Closed

This was referenced Jul 29, 2024

msbuild crashes with "MSB0001: Internal MSBuild Error: must be valid" dotnet/dnceng#3304

Open

[browser][wbt] fails with InvalidOperationException: There is no currently active test #105315

Open

build-analysis bot mentioned this pull request Oct 31, 2024

chrome-DebuggerTests.GetPropertiesTests timing out #109070

Open

jkotas reviewed Nov 4, 2024

src/coreclr/minipal/Unix/CMakeLists.txt (outdated)

PR feedback: Better factoring the minipal make sources

a096a9f

build-analysis bot mentioned this pull request Nov 6, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

Open

jkotas reviewed Nov 7, 2024

...ts/tracing/eventpipe/randomizedallocationsampling/manual/Allocate/AllocateArraysOfDoubles.cs (outdated)

am11 reviewed Nov 7, 2024

src/native/minipal/xoshiro128pp.h (outdated)

noahfalk and others added 2 commits November 12, 2024 15:22

Update src/native/minipal/xoshiro128pp.h

672bb8f

Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>

Add license header on the manual test code

45e1987

brianrob approved these changes Nov 13, 2024

noahfalk merged commit 1c4c009 into dotnet:main Nov 14, 2024
154 of 161 checks passed

LoopedBard3 mentioned this pull request Nov 19, 2024

[Perf] Windows/x64: Multiple (13) Regressions on 11/14/2024 12:52:44 AM #109967

Open
