Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change temporary entrypoints to be lazily allocated #101580

Merged
merged 59 commits into from
Jul 14, 2024

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Apr 25, 2024

This change moves temporary entrypoints to be allocated when they are needed instead of eagerly at type load time

As you can see, with a very small test case the wins are significant, and notably, reducing allocations in the stub heaps is nice as they are difficult to allocate efficiently as they require a bunch of memory mapping operations. In addition, with this change I'm removing the concept of Compact Entrypoints, as they are no longer significantly beneficial as of the advent of tiered compilation, and would be difficult to integrate with this change.

Baseline
Loader Heap:
----------------------------------------
LowFrequencyHeap:     Size: 0xf0000 (983040) bytes total.
HighFrequencyHeap:    Size: 0x16a000 (1482752) bytes total, 0x3000 (12288) bytes wasted.
StubHeap:             Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap:     Size: 0x168000 (1474560) bytes total.
NewStubPrecodeHeap:   Size: 0x18000 (98304) bytes total.
IndirectionCellHeap:  Size: 0x1000 (4096) bytes total.
CacheEntryHeap:       Size: 0x1000 (4096) bytes total.
Total size:           Size: 0x3dd000 (4050944) bytes total, 0x3000 (12288) bytes wasted.


PR
Loader Heap:
----------------------------------------
LowFrequencyHeap:     Size: 0xef000 (978944) bytes total.
HighFrequencyHeap:    Size: 0x1ad000 (1757184) bytes total, 0x3000 (12288) bytes wasted.
StubHeap:             Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap:     Size: 0x38000 (229376) bytes total.
NewStubPrecodeHeap:   Size: 0x8000 (32768) bytes total.
IndirectionCellHeap:  Size: 0x1000 (4096) bytes total.
CacheEntryHeap:       Size: 0x1000 (4096) bytes total.
Total size:           Size: 0x2df000 (3010560) bytes total, 0x3000 (12288) bytes wasted.

What can be seen here, is that the size of the Precode heaps shrinks dramatically, and the size of the HighFrequency heap grows a little bit. With a larger test case (launching powershell with a -c exit command line argument, the gains are still very similar. Also with powershell, I have a simple benchmark which runs pwsh.exe -c exit 100 times and measures the total execution time. The performance improvement is around 1% which I consider to be quite a nice result.

Architecture/Version Total Size Process Execution Time
Baseline X86 9,420,800 bytes 23.33 seconds
PR X86 6,348,800 bytes 23.18 seconds
Baseline X64 13,586,432 bytes 23.23 seconds
PR X64 10,440,704 bytes 23.15 seconds

Baseline
Loader Heap:
----------------------------------------
System Domain:        7ffab916ec00
LoaderAllocator:      7ffab916ec00
LowFrequencyHeap:     Size: 0xf0000 (983040) bytes total.
HighFrequencyHeap:    Size: 0x16a000 (1482752) bytes total, 0x3000 (12288) bytes wasted.
StubHeap:             Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap:     Size: 0x168000 (1474560) bytes total.
NewStubPrecodeHeap:   Size: 0x18000 (98304) bytes total.
IndirectionCellHeap:  Size: 0x1000 (4096) bytes total.
CacheEntryHeap:       Size: 0x1000 (4096) bytes total.
Total size:           Size: 0x3dd000 (4050944) bytes total, 0x3000 (12288) bytes wasted.

Compare
Loader Heap:
----------------------------------------
System Domain:        7ff9eb49dc00
LoaderAllocator:      7ff9eb49dc00
LowFrequencyHeap:     Size: 0xef000 (978944) bytes total.
HighFrequencyHeap:    Size: 0x1b2000 (1777664) bytes total, 0x3000 (12288) bytes wasted.
StubHeap:             Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap:     Size: 0x70000 (458752) bytes total.
NewStubPrecodeHeap:   Size: 0x10000 (65536) bytes total.
IndirectionCellHeap:  Size: 0x1000 (4096) bytes total.
CacheEntryHeap:       Size: 0x1000 (4096) bytes total.
Total size:           Size: 0x324000 (3293184) bytes total, 0x3000 (12288) bytes wasted.

LowFrequencyHeap is 4KB bigger
HighFrequencyHeap is 288KB bigger
FixupPrecodeHeap is 992KB smaller
NewstubPrecodeHeap is 32KB smaller
…y definition the method is defining the slot
@build-analysis build-analysis bot mentioned this pull request May 9, 2024
davidwrighton and others added 4 commits June 28, 2024 11:59
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Try removing EnsureSlotFilled
Implement IsEligibleForTieredCompilation in terms of IsEligibleForTieredCompilation_NoCheckMethodDescChunk
src/coreclr/debug/daccess/request.cpp Show resolved Hide resolved
src/coreclr/debug/daccess/request.cpp Show resolved Hide resolved
src/coreclr/debug/daccess/request.cpp Show resolved Hide resolved
src/coreclr/vm/clsload.cpp Show resolved Hide resolved
@@ -612,7 +612,7 @@ MethodDesc* StubDispatchFrame::GetFunction()
{
if (m_pRepresentativeMT != NULL)
{
pMD = m_pRepresentativeMT->GetMethodDescForSlot(m_representativeSlot);
pMD = m_pRepresentativeMT->GetMethodDescForSlot_NoThrow(m_representativeSlot);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is if this is not supposed to return a NULL and we're calling the "NoThrow" then at the very least we need an assert to convey that assumption.

src/coreclr/vm/methodtable.h Outdated Show resolved Hide resolved
src/coreclr/vm/methodtablebuilder.cpp Show resolved Hide resolved
src/coreclr/vm/method.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/method.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/method.cpp Outdated Show resolved Hide resolved
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
…can return NULL ... and then another thread could produce a stable entrypoint and the assert could lose the race
Comment on lines 3098 to 3105
if (RequiresStableEntryPoint())
{
GetOrCreatePrecode()->SetTargetInterlocked(entryPoint);
}
else
{
SetStableEntryPointInterlocked(entryPoint);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code reads odd. If RequiresStableEntryPoint() returns false, then we set the stable entry point? If this is correct, I think a comment is appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh. It is right, but yeah, this should be better structured. A RequiredStableEntryPoint is currently required have a stable entrypoint which is a Precode. Otherwise, we can generate set a stable entrypoint which isn't based on a Precode. Before this change, we would have reliably hit the HasPrecode flag above, but with making Precode generation lazy, this is a point where the Precode needs to be forced at this point. However, I think I'll fix this by changing the previous condition to be (RequiresStableEntryPoint()) and not use GetPrecode, and instead use GetOrCreatePrecode(). Then this portion of the condition can revert to be what it was before.

@davidwrighton davidwrighton merged commit dacf9db into dotnet:main Jul 14, 2024
90 checks passed
davidwrighton added a commit to dotnet/diagnostics that referenced this pull request Jul 17, 2024
- Support iterating methods from the new ISOSDacInterface15
- Re-plumb the existing logic as a local implementation of
ISOSDacInterface15 on top of apis that have been present for a long time

This is dependent on dotnet/runtime#101580
@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants