System.Reflection.Metadata pinning can generate extensive GC heap fragmentation #50782

Closed
noahfalk opened this issue Apr 6, 2021 · 29 comments · Fixed by #56336

noahfalk commented Apr 6, 2021

System.Reflection.Metadata uses pinned memory buffers to store the contents of embedded PDBs. When the runtime loads and caches these readers to augment stack traces with source line info, it winds up preserving the pinned memory for the lifetime of the app. Long-lived pinned objects can generate large GC heap fragmentation, leading applications to use far more committed VM than they otherwise would. We have another internal team at MS observing 132MB of VM wasted due to this pin. The amount of wasted VM has nothing to do with the size of the pinned object; it is solely based on where the allocated object happens to fall within the global bounds of the GC heap, which varies depending on the overall allocation behavior of all the code in the app.

Although the ideal solution is not to use pinning at all, a likely lower-effort solution is to allocate the byte array on the pinned object heap on .NET 5 and up, where it is available. If fixing this proves not to be viable, the runtime will probably need to pursue alternate solutions, such as adding an app configuration switch so that app developers can disable portable PDB usage.
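For reference, a minimal sketch of that pinned-heap idea, assuming the decompressed size is known up front. GC.AllocateArray with pinned: true is the real .NET 5+ API; the PinnedPdbBuffer class and the deflateStream/decompressedSize names are illustrative, not SRM's actual code (and, as discussed further down the thread, this approach is ultimately retracted):

```csharp
using System;
using System.IO;

static class PinnedPdbBuffer
{
    // Sketch: allocate the decompressed PDB buffer directly on the pinned object heap
    // (POH, .NET 5+) so that taking a byte* later never fragments the normal heap.
    public static byte[] Read(Stream deflateStream, int decompressedSize)
    {
#if NET5_0_OR_GREATER
        byte[] buffer = GC.AllocateArray<byte>(decompressedSize, pinned: true); // POH: the object never moves
#else
        byte[] buffer = new byte[decompressedSize]; // downlevel: a normal array that must be pinned explicitly
#endif
        int offset = 0;
        while (offset < decompressedSize)
        {
            int read = deflateStream.Read(buffer, offset, decompressedSize - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```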

noahfalk added the tenet-performance (Performance related issue) label Apr 6, 2021
The dotnet-issue-labeler bot added the area-GC-coreclr and untriaged (New issue has not been triaged by the area owner) labels Apr 6, 2021

Maoni0 commented Apr 7, 2021

> Although the ideal solution is not to use pinning at all,

Does it need to be pinned? What was the reason this was pinned in the first place? We generally only advise POH to be used to replace mandatory pinning usage.


brianrob commented Apr 7, 2021

If it does need to be pinned, does it need to be pinned for this long? From what I can see in the memory dumps that led to this issue, the object is likely pinned for its lifetime.


tmat commented Apr 7, 2021

The MetadataReader uses pointers for performance, hence pinning is needed while metadata is read.

We essentially implemented ReadOnlySpan<byte> and ReadOnlyMemory<byte> before they existed in the runtime.
https://github.com/dotnet/runtime/tree/main/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks

It might be possible to replace the memory abstractions with ReadOnlySpan<byte> and ReadOnlyMemory<byte>, but SRM is also available for .NET Framework so it might be too complicated to maintain both versions.


brianrob commented Apr 7, 2021

@tmat, do you think it's possible to pin/unpin rather than putting this in the POH?


Maoni0 commented Apr 7, 2021

Also, how frequently would this object be created? Are we talking about one object for the lifetime of the process, or would one get created every time something happens?


tmat commented Apr 7, 2021

So, this is what happens (a minimal sketch against the public API follows the list):

  1. PDB content is requested - e.g. someone calls Exception.ToString or StackTrace(fRequireSources: true). The runtime enumerates the frames and, for those that are in assemblies with an embedded PDB, calls into SRM to load the PDB data. The resulting reader is cached, and AFAIK the entries in the cache are never evicted. This is controlled by code in System.Diagnostics.
  2. Upon request, SRM allocates an array for the PDB data and stream-reads the compressed embedded PDB data from the loaded PE image through DeflateStream into the array.
  3. The array is then pinned so that the MetadataReader can read from it.
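For context, this is roughly what that flow looks like when expressed against the public SRM API; it is not the runtime's actual StackTraceSymbols code, which additionally caches the provider (and therefore the pin) indefinitely:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

static class EmbeddedPdbExample
{
    // Steps 1-3 above with the public API. Here the provider is disposed right away;
    // the runtime's cache keeps it (and the pinned buffer) alive for the process lifetime.
    public static int CountPdbDocuments(string assemblyPath)
    {
        using var peStream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(peStream);

        DebugDirectoryEntry embedded = peReader.ReadDebugDirectory()
            .FirstOrDefault(e => e.Type == DebugDirectoryEntryType.EmbeddedPortablePdb);
        if (embedded.DataSize == 0)
            return 0; // no embedded PDB in this image

        // Step 2: decompresses the embedded PDB (via DeflateStream) into a byte[] held by the provider.
        using MetadataReaderProvider provider = peReader.ReadEmbeddedPortablePdbDebugDirectoryData(embedded);

        // Step 3: pins that byte[] and returns a pointer-based reader over it.
        MetadataReader reader = provider.GetMetadataReader();
        return reader.Documents.Count; // e.g. map frames to documents / sequence points
    }
}
```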


brianrob commented Apr 7, 2021

Is the MetadataReader exposed, or is it an internal implementation detail? I'm wondering if we can't just pin the buffer that contains the PDB only while it is in use. Every time a call comes in, we pin the buffer, get a pointer to it, and then pass it to the MetadataReader as part of the call.
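For illustration, a pin-only-while-reading pattern might look roughly like the sketch below. GCHandle and MetadataReaderProvider.FromPortablePdbImage are real APIs; the wrapper itself is hypothetical, and rebuilding the provider/reader on every call re-parses the metadata, which is exactly the throughput concern raised later in the thread:

```csharp
using System;
using System.Reflection.Metadata;
using System.Runtime.InteropServices;

static class PinPerUse
{
    // Sketch of the pin-only-while-reading idea: the cached byte[] stays movable between
    // calls, and each lookup briefly pins it, reads, and unpins.
    public static unsafe TResult WithReader<TResult>(byte[] pdbData, Func<MetadataReader, TResult> read)
    {
        GCHandle handle = GCHandle.Alloc(pdbData, GCHandleType.Pinned);
        try
        {
            byte* start = (byte*)handle.AddrOfPinnedObject();
            using var provider = MetadataReaderProvider.FromPortablePdbImage(start, pdbData.Length);
            return read(provider.GetMetadataReader());
        }
        finally
        {
            handle.Free(); // unpin: the GC can move/compact around the buffer again
        }
    }
}
```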


tmat commented May 5, 2021

@brianrob Step [2] above creates a MetadataReaderProvider instance that holds on to an ImmutableArray<byte> with the PDB data. At this point the memory is not pinned. Step [3] happens when System.Diagnostics code calls MetadataReaderProvider.GetMetadataReader. At this point the memory is pinned and the returned MetadataReader instance holds on to the pointer. The code is here: https://source.dot.net/#System.Diagnostics.StackTrace/System/Diagnostics/StackTraceSymbols.cs,116

Once the provider returns a MetadataReader, the memory can't be unpinned until we are sure no one is using the reader or any related types that point to the memory. This is up to the user of the reader; the reader can't determine that. The caller calls MetadataReaderProvider.Dispose to free the memory.

tmat removed their assignment May 5, 2021

tmat commented May 5, 2021

@noahfalk We do not pin the array at the point where we allocate it. It's pinned lazily, only when a MetadataReader is requested later on. I guess in most cases the reader is requested right away, so it might make sense to pin it immediately and allocate it directly on the pinned heap. Any thoughts on that?


tmat commented May 5, 2021

Alternatively, we could add a ReadOnlySpan<byte> GetMetadataContent() method on the MetadataReaderProvider that returns the content without pinning it. Then the caller can make a copy into a pinned heap array and open the MetadataReader on the copy.
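A hypothetical sketch of that shape: GetMetadataContent() does not exist today and is exactly the API being proposed here, while GC.AllocateArray (POH, .NET 5+) and FromPortablePdbImage are existing APIs:

```csharp
using System;
using System.Reflection.Metadata;

static class PinnedCopyExample
{
    // Hypothetical: provider.GetMetadataContent() is the proposed API, not a real one.
    // The caller copies the content into a pinned-object-heap array and reopens a reader over it.
    public static unsafe (MetadataReaderProvider Provider, byte[] PinnedBuffer) ReopenOnPinnedCopy(
        MetadataReaderProvider provider)
    {
        ReadOnlySpan<byte> content = provider.GetMetadataContent();          // proposed API, no pinning
        byte[] copy = GC.AllocateArray<byte>(content.Length, pinned: true);  // POH array, never moves
        content.CopyTo(copy);

        fixed (byte* start = copy) // effectively free to pin: the POH array is already immovable
        {
            // The pointer stays valid after the fixed block only because `copy` lives on the POH;
            // the caller must keep `copy` (returned below) alive for as long as the provider is used.
            return (MetadataReaderProvider.FromPortablePdbImage(start, copy.Length), copy);
        }
    }
}
```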


tmat commented May 5, 2021

@tommcdon Added the System.Diagnostics label since the particular usage that's affected is in the StackTrace implementation. Depending on how we implement this in SRM, System.Diagnostics might need an update as well.

tmat self-assigned this May 5, 2021

noahfalk commented May 6, 2021

> so it might make sense to pin it immediately and allocate it directly on the pinned heap. Any thoughts on that?

That was my original idea. @Maoni0 seemed nervous about this and I assume it still has some performance implication, but I don't know what it is. Presumably there is a tradeoff here between "better performance" and "requires fewer changes" that we can weigh once we understand what the implications are.

> It might be possible to replace the memory abstractions with ReadOnlySpan and ReadOnlyMemory, but SRM is also available for .NET Framework so it might be too complicated to maintain both versions.

System.Diagnostics.DiagnosticSource uses Span types and it ships downlevel on .NET Framework. I believe there is a polyfill in place that allows you to maintain only one version of the code (the one using Span), though downlevel performance will probably be worse than the raw pointer code. If losing some perf for downlevel scenarios is acceptable, this might be a really nice solution.

> Then the caller can make a copy into a pinned heap array and open the MetadataReader on the copy.

Do you have any expectation of how this would affect perf? Assume we are formatting the same stack trace in a loop as fast as possible (which I currently think is around ~250,000 frames/sec). If we needed to open a new MetadataReader once per frame before parsing each frame's worth of line information, how much slower do you think it would run? I think we have some tolerance to slow this down, but not a lot. We had a high-severity support case ~5 years back that was caused by a regression in stack trace perf from portable PDB parsing, and adding that MetadataReader cache was what resolved it. I don't want to be poking that hornet's nest a 2nd time : )
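To make the constraint concrete, a rough harness for that kind of measurement might look like the sketch below; the exception depth, iteration count, and helper methods are arbitrary stand-ins, not the original benchmark:

```csharp
using System;
using System.Diagnostics;

static class StackTraceThroughput
{
    // Rough harness for "format the same stack trace in a loop" throughput.
    public static void Main()
    {
        Exception ex = Capture(depth: 30);
        const int iterations = 10_000;
        long frames = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var trace = new StackTrace(ex, fNeedFileInfo: true); // pulls source/line info via portable PDBs
            frames += trace.FrameCount;
            _ = trace.ToString();
        }
        sw.Stop();

        Console.WriteLine($"{frames / sw.Elapsed.TotalSeconds:N0} frames/sec");
    }

    private static Exception Capture(int depth)
    {
        try
        {
            Recurse(depth);
            throw new InvalidOperationException("unreachable");
        }
        catch (Exception ex) { return ex; }
    }

    private static void Recurse(int depth)
    {
        if (depth == 0) throw new InvalidOperationException("deep frame");
        Recurse(depth - 1);
    }
}
```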

@benaadams

Does the ReadOnlyUnmanagedMemoryStream etc need to use a byte* or could it use Memory<byte> and the pinning requirement be dropped entirely?


tmat commented May 6, 2021

@benaadams The entire MetadataReader uses pointers for reading data. So to avoid pinning we would need to make a lot of changes (and #ifdefs).


Maoni0 commented May 6, 2021

Pinning objects that could be anywhere on the heap, don't strictly need to be pinned, and never get unpinned is a scenario that's extremely hard to justify. The "never get unpinned" aspect makes it the hardest for the GC to do its job - as soon as a segment is extended beyond the pinned object it can't shrink any smaller, and thus you see the problem that originated this issue.

The minimum I would do is to unpin it after you are done using it, allowing the GC to compact it when needed.


brianrob commented May 6, 2021

Agreed with @Maoni0. The other thing worth calling out is that the scenario that generated this issue is exception stack trace generation, where the PDB contents get loaded to retrieve source line information and are then cached for the life of the process. Thus, there isn't even an option to unpin the contents currently - they will just live forever. Apps that consume the MetadataReader on their own have the option to dispose of it when it's no longer needed.

Ideally, the application doesn't throw a huge number of exceptions, and so it's possible that the application is paying a memory penalty that is outsized relative to the benefit it's getting from the functionality, with no way to counteract that.


Maoni0 commented May 6, 2021

Just to clarify, since I realize I didn't specifically answer the question about POH: with POH you are again in another bad situation, which is that things on POH can never be unpinned, and since we don't ever decommit memory in the middle of a segment (only at the end of a segment) you can again get into a fragmentation situation on POH segments. And in the scenarios where we care about the reserved range of the GC heap, you'd also be creating POH segments in the middle of that range and thus may prevent the GC from forming larger free spaces to allocate new segments.


noahfalk commented May 6, 2021

Thanks for the info! I'm striking my original proposal to use the pinned object heap since it doesn't work how I anticipated and doesn't appear to resolve the underlying problem. @tmat at this point here are the options I see, in priority order by performance/user experience:

  1. Replace raw pointer usage in S.R.M with Span (or any other solution that eliminates the pinning entirely)
  2. Agree on an S.R.M API that allows the BCL stack trace code to indicate when it is reading and when it is done so that S.R.M can pin and un-pin. Changing between pinned and unpinned states needs to be fast (~1us) because some scenarios will require it to be invoked very frequently. If the only thing the API does is create/delete a pinned handle that perf goal should be easy to meet, but if the API lumps in other considerations such as copying memory, requiring the memory blob to be re-parsed from scratch, or re-creating internal data structures then we'd need to ensure that work fits in the time constraints.
  3. BCL could add a switch that controls whether the runtime will ever load portable PDBs for stack trace and recommend that customers with concerns on VM usage disable the feature.


jkotas commented May 6, 2021

Can we just use unmanaged memory to fix the pinning problem? S.R.M APIs are generally designed to deal with unmanaged memory. For example, there is FromPortablePdbImage(byte* start, int size).
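A sketch of how that could look, assuming the caller owns the native buffer's lifetime and frees it when the corresponding cache entry goes away; the APIs used here exist, but the surrounding shape is illustrative:

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Runtime.InteropServices;

static class NativePdbBuffer
{
    // Sketch of the unmanaged-memory idea: decompress the embedded PDB into native memory
    // and open the reader over it with FromPortablePdbImage, so nothing on the GC heap is
    // ever pinned. Real code would free the buffer only when the cache entry is evicted.
    public static unsafe (MetadataReaderProvider Provider, IntPtr Buffer) Open(Stream deflateStream, int decompressedSize)
    {
        IntPtr buffer = Marshal.AllocHGlobal(decompressedSize);
        try
        {
            using (var target = new UnmanagedMemoryStream((byte*)buffer, 0, decompressedSize, FileAccess.Write))
            {
                deflateStream.CopyTo(target); // fill native memory with the decompressed PDB
            }

            var provider = MetadataReaderProvider.FromPortablePdbImage((byte*)buffer, decompressedSize);
            return (provider, buffer); // caller: Dispose the provider, then Marshal.FreeHGlobal(buffer)
        }
        catch
        {
            Marshal.FreeHGlobal(buffer);
            throw;
        }
    }
}
```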


tmat commented May 6, 2021

@jkotas Good point. I think we can.

tommcdon removed the untriaged (New issue has not been triaged by the area owner) label May 10, 2021
tommcdon added this to the 6.0.0 milestone May 10, 2021

krwq commented Jun 17, 2021

[Triage] @tmat, still planning to fix it in 6.0?


tmat commented Jun 17, 2021

Yes, but only after Hot Reload features are done.

@danmoseley

We should only have one area label.

@brianrob

@tmat, checking in on this issue. Is this on track to make it for .NET 6?


tmat commented Jul 21, 2021

@brianrob I should be able to work on it soon, now that most of Roslyn Hot Reload is feature complete.

@brianrob

Awesome. Thanks @tmat.

@brianrob

Thanks @tmat!

This issue was locked as resolved and limited to collaborators Aug 27, 2021.