Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better BlobBuilder pooling #72383

Merged
merged 12 commits into from
Mar 12, 2024
Merged

Better BlobBuilder pooling #72383

merged 12 commits into from
Mar 12, 2024

Conversation

jaredpar
Copy link
Member

@jaredpar jaredpar commented Mar 4, 2024

Looking at profiles building Compilers.slnf locally found a number of places that we weren't using PooledBlobBuilder that were impacting allocations. This change removes 250MB of byte[] allocations and saves 50MB on LOH.

image

This reduces time spent in GC during build by ~1 second.

Note: profiling the compiler server is imprecise as due to the parallel nature of the server and msbuild profiles can be quite noisy. For these measurements I did several measurements and picked the most average for the comparisons.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Mar 4, 2024
@@ -231,6 +233,12 @@ internal static class PeWriter
var strongNameProvider = context.Module.CommonCompilation.Options.StrongNameProvider;
var corFlags = properties.CorFlags;

if (managedResourceBuilder.Count == 0)
Copy link
Member Author

@jaredpar jaredpar Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are essentially two cases here:

  1. The managedResourceBuilder has content in which case it will be linked, via LinkSuffix, to peBlob below. Thus freeing peBlob frees both.
  2. The managedResourceBuilder has no content thus LinkSuffix does nothing and is discarded. This means it is not returned to the pool.

After some thought decided the easiest approach here was to just discard managedResourceBuilder here if it's unused. It's optional below so this is not a functional change. Happy to consider other approaches.

The fact that LinkSuffix and LinkPrefix don't call Free on 0-length chunks feels like a bug but waiting for @tmat to confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and filed a bug dotnet/runtime#99266

@jaredpar jaredpar marked this pull request as ready for review March 4, 2024 21:25
@jaredpar jaredpar requested a review from a team as a code owner March 4, 2024 21:25
@jaredpar
Copy link
Member Author

jaredpar commented Mar 4, 2024

@dotnet/roslyn-compiler PTAL

src/Compilers/Core/Portable/PEWriter/PooledBlobBuilder.cs Outdated Show resolved Hide resolved
@@ -66,7 +66,7 @@ internal static class PeWriter

var ilBuilder = new BlobBuilder(32 * 1024);
var mappedFieldDataBuilder = new BlobBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't these BlobBuilders pooled as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ilBuilder instance is not pooled because it is significantly larger than the instances that we have pools for. I considered creating a separate pool for ilBuilder but it's complicated. I wanted to get past this change, dig deeper into the profiles and if this proves to be a significant source of allocations then I'll come back to it.

The mappedFieldDataBuilder wasn't shown to be a significant source of allocations hence I hadn't done the work yet to establish the ownership. Unfortunately unlike most other pooling decisions this isn't a simple "it's that type, use the pool" decision. Instead have to sit down and map out which BlobBuilder actually owns the instance.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Mar 5, 2024

                portablePdbToEmbed = portablePdbBlob;

Are we leaking the portablePdbBlob builder on this code path? Why not free it once, at the end of the function? #Closed


Refers to: src/Compilers/Core/Portable/PEWriter/PeWriter.cs:178 in d2e8a20. [](commit_id = d2e8a20, deletion_comment = False)

@@ -66,7 +66,7 @@ internal static class PeWriter

var ilBuilder = new BlobBuilder(32 * 1024);
var mappedFieldDataBuilder = new BlobBuilder();
var managedResourceBuilder = new BlobBuilder(1024);
var managedResourceBuilder = PooledBlobBuilder.GetInstance();
Copy link
Contributor

@AlekseyTs AlekseyTs Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PooledBlobBuilder.GetInstance();

We usually prefer a pattern where, the one getting things from the pool is the one responsible for putting it back. It doesn't look like the pattern is used for this allocation #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern for BlobBuilder is unfortunately more complicated. Yes in most cases it's the case that the initial getter is the one who puts it back in the pool. That is the typical pattern for simple uses like building up signatures, sequence points, etc ...

For the more complicated cases like building up the PE file the BlobBuilder can be linked into a different BlobBuilder via LinkSuffix / LinkPrefix. At that point the BlobBuilder they're linked into becomes the owner and is responsible for freeing the instance. That is the case here. The managedResourceBuilder will be linked into the peBuilder instance and that is the responsibility for freeing.

I'm open to suggestions on how to best express this. I tried a few patterns for asserting that values were properly freed but @333fred rightfully pointed out that they were race conditions. Can try and comment better the ownership at the point of allocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions on how to best express this.

I suggest to try the following. If some API can take an ownership of a builder, the builder should be passed by reference. If ownership is taken, the ref parameter should be set to null. The caller always frees if the builder is not null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I agree with this principal. Unfortunately, many of the API we are dealing with here are in SRM and are existing public API. Can't change them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I have an idea on how to clear this up a bit. Going to try and encapsulate this weird ownership problem for this method into a struct that is hopefully easier to reason about.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Mar 5, 2024

                if (portablePdbStream != null)

Are we leaking portablePdbBlob when we do not get into this if? #Closed


Refers to: src/Compilers/Core/Portable/PEWriter/PeWriter.cs:184 in d2e8a20. [](commit_id = d2e8a20, deletion_comment = False)

var builder = s_chunkPool.Allocate();
if (zero)
{
builder.WriteBytes(0, builder.ChunkCapacity);
Copy link
Contributor

@AlekseyTs AlekseyTs Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder.ChunkCapacity

Is this total memory size behind the builder? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is the size of the underlying byte[] in the BlobBuilder instance.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 4)

@jaredpar
Copy link
Member Author

jaredpar commented Mar 5, 2024

@AlekseyTs

Not sure why some of your comments aren't displaying inline or in a manner I can respond directly to so responding here

Are we leaking the portablePdbBlob builder on this code path? Why not free it once, at the end of the function?

The existing code is attempting to use two locals to represent the difference between writing out an explicit PDB to disk vs. embedding the PDB as a resource. I was trying to not disrupt this pattern with my change but I agree the code isn't as clear as it could be here.

Are we leaking portablePdbBlob when we do not get into this if?

Yes but it's also one of those "if we don't get into that if othre bad things happen to". Stepping back and looking at that if I think that it represents a bit of an existing bug. After all if we don't get into here then we just drop the PDB on the floor entirely which is wrong. I'm going to rework this code a bit better to make the PDB ownership clearer and hopefully that will clear up the question about freeing the pool value too.

@AlekseyTs
Copy link
Contributor

Not sure why some of your comments aren't displaying inline or in a manner I can respond directly to so responding here

That is probably happening when a comment is placed on like that wasn't modified in the PR. CodeFlow can do this. And one should be able to respond to the comment in the CodeFlow.

@AlekseyTs
Copy link
Contributor

The existing code is attempting to use two locals to represent the difference between writing out an explicit PDB to disk vs. embedding the PDB as a resource. I was trying to not disrupt this pattern with my change but I agree the code isn't as clear as it could be here.

I would try the following. Add a dedicated local that is used to store the value of the pooled builder. It is the one initialized with GetInstance. Its value can be assigned to other locals, but the code frees it at the end, rather than trying to do this in many different branches. If specific branch takes ownership of the builder, the local should be explicitly set to null.

@jaredpar
Copy link
Member Author

jaredpar commented Mar 5, 2024

Note: was having trouble keeping track of all the null in PEWriter so enabled NRT checking there to make it clearer.

@@ -3372,7 +3370,7 @@ internal void EnsureAnonymousTypeTemplates(CancellationToken cancellationToken)
}

// produce the secondary output (ref assembly) if needed
if (emitSecondaryAssembly)
if (getMetadataPeStreamOpt is not null)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allowed me to avoid a nullable suppression on getMetadataPeStreamOpt below.

int strongNameSignatureSize,
MethodDefinitionHandle entryPoint,
CorFlags flags,
Func<IEnumerable<Blob>, BlobContentId> deterministicIdProvider,
Func<IEnumerable<Blob>, BlobContentId>? deterministicIdProvider,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes all line up with the declarations on the base type ctor.

// https://github.com/dotnet/runtime/issues/99266
if (mappedCount == 0)
{
MappedFieldDataBlobBuilder.AssertAlive();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous iteration of this change I had a similar method called AssertFreed and @333fred rightfully pointed out it's a race condition. As written now this call is not a race cause the BlobBuilder will be alive. Once the bug mentioned here is fixed this assert also becomes a race condition. However the chance of the race is unlikely enough that this will still serve as an sufficient trip wire to ensure that we don't forget to update this code.

@@ -35,13 +33,69 @@ public PeWritingException(Exception inner)

internal static class PeWriter
{
internal struct EmitBuilders
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Know there was, rightfully, confusion in previous iterations of the change on whose responsibility it was to free values and whether or not that was handled correctly in the code. Tried to respond to that feedback by moving all of the responsibility for the builder ownership into this type. Hoping this makes everything clearer.

s_chunkPool.Free(this);
}
#if DEBUG
IsAlive = false;
Copy link
Contributor

@AlekseyTs AlekseyTs Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsAlive = false;

Is this a race? Someone could get the instance from s_chunkPool after we called s_chunkPool.Free but before we got here. Consider moving this assignment to the beginning of the method, and, perhaps, use VolatileWrite #Closed

AlekseyTs
AlekseyTs previously approved these changes Mar 6, 2024
Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 7)

333fred
333fred previously approved these changes Mar 7, 2024
PortablePdbBlobBuilder = null;
}

internal void Free()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this IDisposable and using it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of our pool freeing in the compiler is not via IDisposable which is why I didn't use that here.

var mappedFieldDataBuilder = new BlobBuilder();
var managedResourceBuilder = new BlobBuilder(1024);

var emitBuilders = new EmitBuilders();
Copy link
Member Author

@jaredpar jaredpar Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlekseyTs, @333fred

Last night I was considering going ahead and fixing the LinkSuffix / LinkPrefix bug and realized my current approach is flawed. Once that bug is fixed and a new version of SRM is used with Roslyn then my code will start throwing cause Count == 0 builders will be double freed. This will very concretely happen in Source Build.

After some thought I decided a more durable fix is to pass null into ExtendedPEBuilder for these two BlobBuilder instead of an empty BlobBuilder. The null is an allowed case and is not impacted by the bug fix. There were two ways to achieve this:

  1. Add a simple check after the BuildAndMetadataAndIL and free + null out the fields if their Count is zero. This is a smaller change but felt a bit off. Like I was fixing the symptom not the cause.
  2. Do not allocate the BlobBuilder unless we're going to put content into it. This change produces bigger code churn but feels more correct.

Ended up going with approach (2) here. If you feel churn is too big can use approach (1) instead.

@jaredpar jaredpar dismissed stale reviews from 333fred and AlekseyTs March 8, 2024 20:19

I made substantial changes to the PR

resource.WriteContentTo(resourceWriter);
resourceWriter.Align(8);
return (uint)result;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are only used in PopulateManifestResourceTableRows. Given the tight relationship that has to the impl details of these methods now I thought it better to make them local functions instead.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 11)

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 12)

@jaredpar jaredpar enabled auto-merge (squash) March 12, 2024 17:36
@jaredpar jaredpar merged commit dec8839 into dotnet:main Mar 12, 2024
24 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Mar 12, 2024
@RikkiGibson RikkiGibson modified the milestones: Next, 17.10 P3 Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants