System.Reflection.Metadata pinning can generate extensive GC heap fragmentation #50782

Closed
noahfalk opened this issue Apr 6, 2021 · 29 comments · Fixed by #56336

noahfalk commented Apr 6, 2021

System.Reflection.Metadata uses pinned memory buffers to store the contents of embedded PDBs. When the runtime loads and caches these readers to augment stack traces with source line info, it winds up preserving the pinned memory for the lifetime of the app. Long-lived pinned objects can generate large GC heap fragmentation, leading applications to use far more committed VM than they otherwise would. We have another internal team at MS observing 132MB of VM wasted due to this pin. The amount of wasted VM has nothing to do with the size of the pinned object; it is solely based on where the allocated object happens to fall within the global bounds of the GC heap, which varies depending on the overall allocation behavior of all the code in the app.

Although the ideal solution is not to use pinning at all, a likely lower-effort solution is to allocate the byte array on the pinned object heap on .NET 5 and up, where it is available. If fixing this proves not to be viable, the runtime will probably need to pursue alternate solutions, such as adding an app configuration switch so that app developers can disable portable PDB usage.
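For reference, a minimal sketch of that pinned-heap idea, assuming the decompressed size is known up front. GC.AllocateArray with pinned: true is the real .NET 5+ API; the PinnedPdbBuffer class and the deflateStream/decompressedSize names are illustrative, not SRM's actual code (and, as discussed further down the thread, this approach is ultimately retracted):

```csharp
using System;
using System.IO;

static class PinnedPdbBuffer
{
    // Sketch: allocate the decompressed PDB buffer directly on the pinned object heap
    // (POH, .NET 5+) so that taking a byte* later never fragments the normal heap.
    public static byte[] Read(Stream deflateStream, int decompressedSize)
    {
#if NET5_0_OR_GREATER
        byte[] buffer = GC.AllocateArray<byte>(decompressedSize, pinned: true); // POH: the object never moves
#else
        byte[] buffer = new byte[decompressedSize]; // downlevel: a normal array that must be pinned explicitly
#endif
        int offset = 0;
        while (offset < decompressedSize)
        {
            int read = deflateStream.Read(buffer, offset, decompressedSize - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```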

noahfalk added the tenet-performance (Performance related issue) label Apr 6, 2021
The dotnet-issue-labeler bot added the area-GC-coreclr and untriaged (New issue has not been triaged by the area owner) labels Apr 6, 2021

Maoni0 commented Apr 7, 2021

> Although the ideal solution is not to use pinning at all,

Does it need to be pinned? What was the reason this was pinned in the first place? We generally only advise POH to be used to replace mandatory pinning usage.


brianrob commented Apr 7, 2021

If it does need to be pinned, does it need to be pinned for this long? From what I can see in the memory dumps that led to this issue, the object is likely pinned for its lifetime.


tmat commented Apr 7, 2021

The MetadataReader uses pointers for performance, hence pinning is needed while metadata is read.

We essentially implemented ReadOnlySpan<byte> and ReadOnlyMemory<byte> before they existed in the runtime.
https://github.com/dotnet/runtime/tree/main/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks

It might be possible to replace the memory abstractions with ReadOnlySpan<byte> and ReadOnlyMemory<byte>, but SRM is also available for .NET Framework so it might be too complicated to maintain both versions.


brianrob commented Apr 7, 2021

@tmat, do you think it's possible to pin/unpin rather than putting this in the POH?


Maoni0 commented Apr 7, 2021

Also, how frequently would this object be created? Are we talking about one object for the lifetime of the process, or would one get created every time something happens?


tmat commented Apr 7, 2021

So, this is what happens (a minimal sketch against the public API follows the list):

  1. PDB content is requested - e.g. someone calls Exception.ToString or StackTrace(fRequireSources: true). The runtime enumerates the frames and, for those that are in assemblies with an embedded PDB, calls into SRM to load the PDB data. The resulting reader is cached, and AFAIK the entries in the cache are never evicted. This is controlled by code in System.Diagnostics.
  2. Upon request, SRM allocates an array for the PDB data and stream-reads the compressed embedded PDB data from the loaded PE image through DeflateStream into the array.
  3. The array is then pinned so that the MetadataReader can read from it.
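For context, this is roughly what that flow looks like when expressed against the public SRM API; it is not the runtime's actual StackTraceSymbols code, which additionally caches the provider (and therefore the pin) indefinitely:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

static class EmbeddedPdbExample
{
    // Steps 1-3 above with the public API. Here the provider is disposed right away;
    // the runtime's cache keeps it (and the pinned buffer) alive for the process lifetime.
    public static int CountPdbDocuments(string assemblyPath)
    {
        using var peStream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(peStream);

        DebugDirectoryEntry embedded = peReader.ReadDebugDirectory()
            .FirstOrDefault(e => e.Type == DebugDirectoryEntryType.EmbeddedPortablePdb);
        if (embedded.DataSize == 0)
            return 0; // no embedded PDB in this image

        // Step 2: decompresses the embedded PDB (via DeflateStream) into a byte[] held by the provider.
        using MetadataReaderProvider provider = peReader.ReadEmbeddedPortablePdbDebugDirectoryData(embedded);

        // Step 3: pins that byte[] and returns a pointer-based reader over it.
        MetadataReader reader = provider.GetMetadataReader();
        return reader.Documents.Count; // e.g. map frames to documents / sequence points
    }
}
```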


brianrob commented Apr 7, 2021

Is the MetadataReader exposed, or is it an internal implementation detail? I'm wondering if we can't just pin the buffer that contains the PDB only while it is in use. Every time a call comes in, we pin the buffer, get a pointer to it, and then pass it to the MetadataReader as part of the call.
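For illustration, a pin-only-while-reading pattern might look roughly like the sketch below. GCHandle and MetadataReaderProvider.FromPortablePdbImage are real APIs; the wrapper itself is hypothetical, and rebuilding the provider/reader on every call re-parses the metadata, which is exactly the throughput concern raised later in the thread:

```csharp
using System;
using System.Reflection.Metadata;
using System.Runtime.InteropServices;

static class PinPerUse
{
    // Sketch of the pin-only-while-reading idea: the cached byte[] stays movable between
    // calls, and each lookup briefly pins it, reads, and unpins.
    public static unsafe TResult WithReader<TResult>(byte[] pdbData, Func<MetadataReader, TResult> read)
    {
        GCHandle handle = GCHandle.Alloc(pdbData, GCHandleType.Pinned);
        try
        {
            byte* start = (byte*)handle.AddrOfPinnedObject();
            using var provider = MetadataReaderProvider.FromPortablePdbImage(start, pdbData.Length);
            return read(provider.GetMetadataReader());
        }
        finally
        {
            handle.Free(); // unpin: the GC can move/compact around the buffer again
        }
    }
}
```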


tmat commented May 5, 2021

@brianrob Step [2] above creates a MetadataReaderProvider instance that holds on to an ImmutableArray<byte> with the PDB data. At this point the memory is not pinned. Step [3] happens when System.Diagnostics code calls MetadataReaderProvider.GetMetadataReader. At this point the memory is pinned and the returned MetadataReader instance holds on to the pointer. The code is here: https://source.dot.net/#System.Diagnostics.StackTrace/System/Diagnostics/StackTraceSymbols.cs,116

Once the provider returns a MetadataReader, the memory can't be unpinned until we are sure no one is using the reader or any related types that point to the memory. This is up to the user of the reader; the reader can't determine that. The caller calls MetadataReaderProvider.Dispose to free the memory.

tmat removed their assignment May 5, 2021

tmat commented May 5, 2021

@noahfalk We do not pin the array at the point where we allocate it. It's pinned lazily, only when a MetadataReader is requested later on. I guess in most cases the reader is requested right away, so it might make sense to pin it immediately and allocate it directly on the pinned heap. Any thoughts on that?


tmat commented May 5, 2021

Alternatively, we could add a ReadOnlySpan<byte> GetMetadataContent() method on the MetadataReaderProvider that returns the content without pinning it. Then the caller can make a copy into a pinned heap array and open the MetadataReader on the copy.
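A hypothetical sketch of that shape: GetMetadataContent() does not exist today and is exactly the API being proposed here, while GC.AllocateArray (POH, .NET 5+) and FromPortablePdbImage are existing APIs:

```csharp
using System;
using System.Reflection.Metadata;

static class PinnedCopyExample
{
    // Hypothetical: provider.GetMetadataContent() is the proposed API, not a real one.
    // The caller copies the content into a pinned-object-heap array and reopens a reader over it.
    public static unsafe (MetadataReaderProvider Provider, byte[] PinnedBuffer) ReopenOnPinnedCopy(
        MetadataReaderProvider provider)
    {
        ReadOnlySpan<byte> content = provider.GetMetadataContent();          // proposed API, no pinning
        byte[] copy = GC.AllocateArray<byte>(content.Length, pinned: true);  // POH array, never moves
        content.CopyTo(copy);

        fixed (byte* start = copy) // effectively free to pin: the POH array is already immovable
        {
            // The pointer stays valid after the fixed block only because `copy` lives on the POH;
            // the caller must keep `copy` (returned below) alive for as long as the provider is used.
            return (MetadataReaderProvider.FromPortablePdbImage(start, copy.Length), copy);
        }
    }
}
```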


tmat commented May 5, 2021

@tommcdon Added the System.Diagnostics label since the particular usage that's affected is in the StackTrace implementation. Depending on how we implement this in SRM, System.Diagnostics might need an update as well.

tmat self-assigned this May 5, 2021

noahfalk commented May 6, 2021

> so it might make sense to pin it immediately and allocate it directly on the pinned heap. Any thoughts on that?

That was my original idea. @Maoni0 seemed nervous about this and I assume it still has some performance implication, but I don't know what it is. Presumably there is a tradeoff here between "better performance" and "requires fewer changes" that we can weigh once we understand what the implications are.

> It might be possible to replace the memory abstractions with ReadOnlySpan and ReadOnlyMemory, but SRM is also available for .NET Framework so it might be too complicated to maintain both versions.

System.Diagnostics.DiagnosticSource uses Span types and it ships downlevel on .NET Framework. I believe there is a polyfill in place that allows you to maintain only one version of the code (the one using Span), though downlevel performance will probably be worse than the raw pointer code. If losing some perf for downlevel scenarios is acceptable, this might be a really nice solution.

> Then the caller can make a copy into a pinned heap array and open the MetadataReader on the copy.

Do you have any expectation of how this would affect perf? Assume we are formatting the same stack trace in a loop as fast as possible (which I currently think is around ~250,000 frames/sec). If we needed to open a new MetadataReader once per frame before parsing each frame's worth of line information, how much slower do you think it would run? I think we have some tolerance to slow this down, but not a lot. We had a high-severity support case ~5 years back that was caused by a regression in stack trace perf from portable PDB parsing, and adding that MetadataReader cache was what resolved it. I don't want to be poking that hornet's nest a 2nd time : )
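To make the constraint concrete, a rough harness for that kind of measurement might look like the sketch below; the exception depth, iteration count, and helper methods are arbitrary stand-ins, not the original benchmark:

```csharp
using System;
using System.Diagnostics;

static class StackTraceThroughput
{
    // Rough harness for "format the same stack trace in a loop" throughput.
    public static void Main()
    {
        Exception ex = Capture(depth: 30);
        const int iterations = 10_000;
        long frames = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var trace = new StackTrace(ex, fNeedFileInfo: true); // pulls source/line info via portable PDBs
            frames += trace.FrameCount;
            _ = trace.ToString();
        }
        sw.Stop();

        Console.WriteLine($"{frames / sw.Elapsed.TotalSeconds:N0} frames/sec");
    }

    private static Exception Capture(int depth)
    {
        try
        {
            Recurse(depth);
            throw new InvalidOperationException("unreachable");
        }
        catch (Exception ex) { return ex; }
    }

    private static void Recurse(int depth)
    {
        if (depth == 0) throw new InvalidOperationException("deep frame");
        Recurse(depth - 1);
    }
}
```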

@benaadams

Does the ReadOnlyUnmanagedMemoryStream etc need to use a byte* or could it use Memory<byte> and the pinning requirement be dropped entirely?


tmat commented May 6, 2021

@benaadams The entire MetadataReader uses pointers for reading data. So to avoid pinning we would need to make a lot of changes (and #ifdefs).


Maoni0 commented May 6, 2021

Pinning objects that could be anywhere on the heap, don't strictly need to be pinned, and never get unpinned is a scenario that's extremely hard to justify. The "never get unpinned" aspect makes it the hardest for the GC to do its job - as soon as a segment is extended beyond the pinned object it can't shrink any smaller, and thus you see the problem that originated this issue.

The minimum I would do is to unpin it after you are done using it, allowing the GC to compact it when needed.


brianrob commented May 6, 2021

Agreed with @Maoni0. The other thing worth calling out is that the scenario that generated this issue is exception stack trace generation, where the PDB contents get loaded to retrieve source line information and are then cached for the life of the process. Thus, there isn't even an option to unpin the contents currently - they will just live forever. Apps that consume the MetadataReader on their own have the option to dispose of it when it's no longer needed.

Ideally, the application doesn't throw a huge number of exceptions, and so it's possible that the application is paying a memory penalty that is outsized relative to the benefit it's getting from the functionality, with no way to counteract that.


Maoni0 commented May 6, 2021

Just to clarify, since I realize I didn't specifically answer the question about POH: with POH you are again in another bad situation, which is that things on POH can never be unpinned, and since we don't ever decommit memory in the middle of a segment (only at the end of a segment) you can again get into a fragmentation situation on POH segments. And in the scenarios where we care about the reserved range of the GC heap, you'd also be creating POH segments in the middle of that range and thus may prevent the GC from forming larger free spaces to allocate new segments.


noahfalk commented May 6, 2021

Thanks for the info! I'm striking my original proposal to use the pinned object heap since it doesn't work how I anticipated and doesn't appear to resolve the underlying problem. @tmat at this point here are the options I see, in priority order by performance/user experience:

  1. Replace raw pointer usage in S.R.M with Span (or any other solution that eliminates the pinning entirely)
  2. Agree on an S.R.M API that allows the BCL stack trace code to indicate when it is reading and when it is done so that S.R.M can pin and un-pin. Changing between pinned and unpinned states needs to be fast (~1us) because some scenarios will require it to be invoked very frequently. If the only thing the API does is create/delete a pinned handle that perf goal should be easy to meet, but if the API lumps in other considerations such as copying memory, requiring the memory blob to be re-parsed from scratch, or re-creating internal data structures then we'd need to ensure that work fits in the time constraints.
  3. BCL could add a switch that controls whether the runtime will ever load portable PDBs for stack trace and recommend that customers with concerns on VM usage disable the feature.


jkotas commented May 6, 2021

Can we just use unmanaged memory to fix the pinning problem? S.R.M APIs are generally designed to deal with unmanaged memory. For example, there is FromPortablePdbImage(byte* start, int size).
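A sketch of how that could look, assuming the caller owns the native buffer's lifetime and frees it when the corresponding cache entry goes away; the APIs used here exist, but the surrounding shape is illustrative:

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Runtime.InteropServices;

static class NativePdbBuffer
{
    // Sketch of the unmanaged-memory idea: decompress the embedded PDB into native memory
    // and open the reader over it with FromPortablePdbImage, so nothing on the GC heap is
    // ever pinned. Real code would free the buffer only when the cache entry is evicted.
    public static unsafe (MetadataReaderProvider Provider, IntPtr Buffer) Open(Stream deflateStream, int decompressedSize)
    {
        IntPtr buffer = Marshal.AllocHGlobal(decompressedSize);
        try
        {
            using (var target = new UnmanagedMemoryStream((byte*)buffer, 0, decompressedSize, FileAccess.Write))
            {
                deflateStream.CopyTo(target); // fill native memory with the decompressed PDB
            }

            var provider = MetadataReaderProvider.FromPortablePdbImage((byte*)buffer, decompressedSize);
            return (provider, buffer); // caller: Dispose the provider, then Marshal.FreeHGlobal(buffer)
        }
        catch
        {
            Marshal.FreeHGlobal(buffer);
            throw;
        }
    }
}
```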


tmat commented May 6, 2021

@jkotas Good point. I think we can.

tommcdon removed the untriaged (New issue has not been triaged by the area owner) label May 10, 2021
tommcdon added this to the 6.0.0 milestone May 10, 2021

krwq commented Jun 17, 2021

[Triage] @tmat, still planning to fix it in 6.0?


tmat commented Jun 17, 2021

Yes, but only after Hot Reload features are done.

@danmoseley

We should only have one area label.

@brianrob

@tmat, checking in on this issue. Is this on track to make it for .NET 6?


tmat commented Jul 21, 2021

@brianrob I should be able to work on it soon, now that most of Roslyn Hot Reload is feature complete.

@brianrob

Awesome. Thanks @tmat.

@brianrob

Thanks @tmat!

This issue was locked as resolved and limited to collaborators Aug 27, 2021.