Add randomized allocation sampling #100356

Closed
wants to merge 95 commits

Conversation

chrisnas (Contributor) commented Mar 27, 2024

This PR adds a new feature, randomized allocation sampling, that can be used for low-overhead allocation sampling with good probabilistic error bounds. We expect APM tools and other scenarios that do memory profiling in development or production to be interested in this. See docs/design/features/RandomizedAllocationSampling.md for more details about what this is and how it works.

This PR is a continuation from work originally in #98167. @Maoni0 and @jkotas had some discussion at that point to agree on the overall development direction.

There are a variety of test results showing some performance comparisons and statistical sampling distributions in src/tests/tracing/eventpipe/randomizedallocationsampling/manual/testing_results/. These are available for reviewers to see but will be deleted prior to merging the PR.

Current status:

  • The CoreCLR support is mostly complete, but we are still tracking down some CI failures now that the code is being built and tested in a wider variety of configs
  • NativeAOT support is in progress: the new event is emitted (currently looking for a better randomizer and measuring the statistical distribution, which is more complicated since the name of the type is not known)

Note:
One option would be to merge the CoreCLR part now to secure the feature in .NET 9 and continue the NativeAOT work, which might not be finished before the cut. Note that the fix to emit AllocationTick in NativeAOT is now merged, so it could help if the new randomized sampling is not available there.

Guide to the changes:

  • The majority of changed files in the product code are simple refactoring to accommodate the new ee_alloc_context struct. This struct wraps a gc_alloc_context and adds one additional field, combined_limit. All of the fast allocation helpers that previously referred to alloc_limit now refer to combined_limit (a rough sketch follows this list).
  • A few more interesting changes are:
    • src/coreclr/vm/gcheaputilities.h - Adds the definition of ee_alloc_context which tracks the state of randomized sampling and generates new random numbers for the sampler.
    • src/coreclr/vm/gchelpers.cpp - Modifications to the Alloc() method check if sampling is enabled, check if a new allocation should be sampled, and fire the sampling event if so. There is also a separate change to AllocateSzArray() that simplifies aligning double arrays and removes a potentially unnecessary min_obj padding allocation there.
    • src/coreclr/vm/gcenv.ee.cpp - Modifies GCToEEInterface::GcEnumAllocContexts() to ensure that the combined_limit field is kept up-to-date with potential alloc_context changes made by the GC.
    • src/coreclr/gc/gc.cpp - GC_ALLOC_ALIGN8 is now always supported. Previously it was only supported when FEATURE_64BIT_ALIGNMENT was enabled, but we wanted to use it more broadly.
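
For orientation, here is a rough sketch of the shape described above: a simplified gc_alloc_context plus the new combined_limit field. Everything other than the alloc_ptr, alloc_limit, and combined_limit names (in particular the UpdateCombinedLimit helper and its parameter) is an illustrative assumption rather than the actual CoreCLR declaration, and the random-number machinery mentioned above is omitted:

```cpp
#include <cstdint>

// Simplified stand-in for the real gc_alloc_context (the real one has more fields).
struct gc_alloc_context
{
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
};

// Sketch of the EE-side wrapper: combined_limit is what the fast allocation
// helpers now compare against. It never exceeds alloc_limit, and with
// sampling enabled it may stop short of it so the helpers drop into the slow
// path exactly at the chosen sampling point.
struct ee_alloc_context
{
    uint8_t*         combined_limit;
    gc_alloc_context gc_allocation_context;

    // Hypothetical helper: recompute combined_limit after the GC changes the
    // allocation context or a new sampling point is chosen. With sampling
    // off, nextSamplePoint is null and the two limits coincide.
    void UpdateCombinedLimit(uint8_t* nextSamplePoint)
    {
        uint8_t* limit = gc_allocation_context.alloc_limit;
        if (nextSamplePoint != nullptr && nextSamplePoint < limit)
        {
            limit = nextSamplePoint;
        }
        combined_limit = limit;
    }
};
```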

noahfalk and others added 17 commits February 8, 2024 07:52
…ampling feature. This commit is only to gather feedback on the direction and is not intended to be checked in.

This is a resumption of some of the original work that occurred in dotnet#85750,
although this is a different implementation approach. The overall goal is to do lightweight allocation
sampling that can produce unbiased approximations of what was actually allocated. Our current ETW AllocationTick
sampling is periodic but not randomized. Extrapolating from the AllocationTick samples can lead to unbounded
estimation errors in theory, and substantial estimation error has been observed in practice.

This commit is primarily to get feedback on adding a new adjustable limit pointer on
the coreclr EE side of allocation contexts. This allows breaking out of the fast path allocation helpers at an
arbitrarily selected sampling point that need not align with the standard GC allocation
quantum for SOH. With sampling off the fast_helper_alloc_limit would always be
identical to alloc_limit, but when sampling is on we would occasionally have them
diverge.

The key changes are:
gcheaputilities.h - defines a new ee_alloc_context which augments the existing gc_alloc_context with the extra fast helper limit field
gcheaputilities.h and threads.h - replace the global and per-thread storage of gc_alloc_context with the expanded ee_alloc_context
  to store the new field.
jithelpers.cpp, jitinterfacex86.cpp, and amd64 asm stuff - refactors fast path alloc helpers to use the new limit field instead
gchelpers.cpp - Updates the Alloc() function to recognize when the fast helper limit was exceeded, perform some kind of sampling callback,
  and update the fast helper limit when needed.

This commit doesn't contain any logic that actually turns the sampling on, determines the limit values needed for sampling,
or issues callbacks for sampled objects. That would come later if folks think this is a reasonable approach to intercept
allocations.
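
To make the intended control flow concrete, here is a minimal standalone sketch of how a fast allocation helper and its slow path could interact with the extra limit, under the assumptions above. TryFastAlloc, AllocSlow, OnSamplingLimitReached, and GetNewAllocContextAndAlloc are placeholder names for illustration only (not the actual CoreCLR helpers), and the structs are repeated in simplified form so the snippet compiles on its own:

```cpp
#include <cstddef>
#include <cstdint>

struct gc_alloc_context { uint8_t* alloc_ptr; uint8_t* alloc_limit; };
struct ee_alloc_context { uint8_t* combined_limit; gc_alloc_context gc; };

// Placeholders for machinery this commit intentionally leaves out; they are
// not implemented in this sketch.
static void  OnSamplingLimitReached(ee_alloc_context*, size_t) {}
static void* GetNewAllocContextAndAlloc(ee_alloc_context*, size_t) { return nullptr; }

// Fast path: bump-pointer allocate only while we stay below combined_limit.
// With sampling off this is the same comparison as before (combined_limit
// equals alloc_limit); with sampling on we may bail out earlier.
static void* TryFastAlloc(ee_alloc_context* ctx, size_t size)
{
    uint8_t* ptr = ctx->gc.alloc_ptr;
    if (size <= static_cast<size_t>(ctx->combined_limit - ptr))
    {
        ctx->gc.alloc_ptr = ptr + size;
        return ptr;
    }
    return nullptr; // fall through to the slow path
}

// Slow path: distinguish "sampling point crossed" from "GC quantum exhausted".
static void* AllocSlow(ee_alloc_context* ctx, size_t size)
{
    if (ctx->combined_limit < ctx->gc.alloc_limit)
    {
        // The GC quantum still has room; we only stopped because a sampling
        // point was crossed. A real implementation would record the sample,
        // advance combined_limit, and then retry the fast path.
        OnSamplingLimitReached(ctx, size);
        void* retried = TryFastAlloc(ctx, size);
        if (retried != nullptr)
        {
            return retried;
        }
    }
    // The real alloc_limit was hit: get a fresh allocation context from the
    // GC and recompute combined_limit (not shown here).
    return GetNewAllocContextAndAlloc(ctx, size);
}
```
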
@dotnet-policy-service bot added the community-contribution label (indicates that the PR has been added by a community member) Mar 27, 2024
jkotas (Member) commented Jul 9, 2024

You may consider peeling off some of the changes into separate PRs to make this easier to land. For example, the changes in the GC proper around the 64-bit aligned allocs can be peeled off into a separate PR.

chrisnas (Contributor, Author) commented Jul 9, 2024

> You may consider peeling off some of the changes into separate PRs to make this easier to land. For example, the changes in the GC proper around the 64-bit aligned allocs can be peeled off into a separate PR.

Split into how many pieces @jkotas?

  • 64-aligned allocs
  • Core implementation
  • NativeAOT implementation
  • ???

jkotas (Member) commented Jul 10, 2024

I think the following split may work better:

  • 64-aligned allocs
  • Introduce the secondary allocation limit (for both CoreCLR and NAOT)
  • Implement the actual sampling event

Smaller PRs are easier to stabilize, review, and merge, which locks in forward progress.

chrisnas (Contributor, Author) commented:

> I think the following split may work better:
>
>   • 64-aligned allocs
>   • Introduce the secondary allocation limit (for both CoreCLR and NAOT)
>   • Implement the actual sampling event
>
> Smaller PRs are easier to stabilize, review, and merge, which locks in forward progress.

I'm not sure I understand the difference between the last two items: emitting the event is a tiny part of the work; the second item is the big one. Also, without the sampling itself, the code will be different (mostly what I did at the beginning of NAOT when I did not want to change the behaviour: combined_limit = alloc_limit).

jkotas (Member) commented Jul 12, 2024

> I'm not sure I understand the difference between the last two items: emitting the event is a tiny part of the work; the second item is the big one.

The second item (Introduce the secondary allocation limit (for both CoreCLR and NAOT)) would be a mechanical change. It does not need any tests. It should not introduce any observable behavior changes. It is just touching a lot of different places and it is easy to miss a few of those somewhere.

I do not have strong opinions about the best way to split this, but I think this will need to be split into multiple PRs to land it, given how long it takes to stabilize.
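
The intermediate state described above would presumably keep the two limits in lock-step everywhere the allocation context is updated, roughly as in this sketch (simplified names, not the actual CoreCLR code):

```cpp
#include <cstdint>

struct gc_alloc_context { uint8_t* alloc_ptr; uint8_t* alloc_limit; };
struct ee_alloc_context { uint8_t* combined_limit; gc_alloc_context gc; };

// Purely mechanical step: wherever the GC hands back an updated allocation
// context, mirror alloc_limit into combined_limit. The fast helpers then
// behave exactly as before, so there is no observable behavior change.
static void SyncCombinedLimit(ee_alloc_context* ctx)
{
    ctx->combined_limit = ctx->gc.alloc_limit;
}
```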

A reviewer (Member) commented on a new file in the diff (@@ -0,0 +1,131 @@, starting with #pragma once):

Should this be moved to src\native\minipal and the CoreCLR impl switched to it as well?

jkotas (Member) commented Jul 12, 2024

In any case, if you agree that the 64-aligned allocs cleanup would be a good independent PR, we can start with that.

A reviewer (Member) commented on this change in the allocation path:

-    _ASSERTE(allocPtr <= allocContext->alloc_limit);
-    if (size > static_cast<SIZE_T>(allocContext->alloc_limit - allocPtr))
+    _ASSERTE(allocPtr <= eeAllocContext->combined_limit);
+    if ((allocPtr == nullptr) || (size > static_cast<SIZE_T>(eeAllocContext->combined_limit - allocPtr)))

We do not want to add extra checks to the allocation helper fast paths. It would cause undesirable performance regressions.


Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

tommcdon (Member) commented:

@noahfalk I believe this PR is superseded by:

  • #104849
  • #104851
  • #104955

If yes, @chrisnas @noahfalk any objections if we close this PR?

noahfalk (Member) commented:

> @noahfalk I believe this PR is superseded by:
>
> #104849
> #104851
> #104955

Yep, that's correct. Closing this PR.

@noahfalk noahfalk closed this Aug 27, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 28, 2024
Labels: area-Diagnostics-coreclr, community-contribution