This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Using AllocateUninitializedArray in array pool #24504

Merged 5 commits on May 28, 2019

Member

@VSadov VSadov commented May 9, 2019

ArrayPool does not guarantee that rented arrays are pre-cleaned.
As such it can use AllocateUninitializedArray when it needs to allocate underlying arrays.

When the pool does need to allocate, there is a measurable performance gain, depending on the size of the array and the overall allocation pattern
(measured with these tests: https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/corefx/System.Buffers/ArrayPoolTests.cs).
When there are no allocations, there is no direct impact.

Also:
While working on this, I noticed that when expanding an allocation context we did not account for the unspent space left over from the previous context. With GC_ALLOC_ZEROING_OPTIONAL this results in extra zeroing of roughly the size of the leftover, which negates some of the advantage of "Uninitialized" allocations.
For a 4K allocation the leftover could be up to 4K.

This was fixed by adjusting contiguous expansions to account for the unused space
(done only for GC_ALLOC_ZEROING_OPTIONAL).
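
The arithmetic behind that extra zeroing can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyAllocContext` and `tail_bytes_to_zero` are hypothetical names, not part of the GC, and the model ignores the min-object-size pad and alignment that the real code handles.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only; these are not the CoreCLR GC's real data structures.
struct ToyAllocContext {
    std::size_t alloc_ptr;    // next free byte in the current context
    std::size_t alloc_limit;  // end of the current context
};

// Returns how many bytes past the new object must be zeroed after the context
// is extended contiguously by `granted` bytes, assuming the object body itself
// may stay dirty. `deduct_unspent` models the adjustment described above.
inline std::size_t tail_bytes_to_zero(const ToyAllocContext& ac,
                                      std::size_t object_size,
                                      std::size_t granted,
                                      bool deduct_unspent)
{
    const std::size_t unspent = ac.alloc_limit - ac.alloc_ptr;
    assert(object_size > unspent);   // otherwise no expansion would be needed
    assert(granted >= object_size);  // the extension was sized for the object

    if (deduct_unspent) {
        granted -= unspent;          // the leftover already covers part of the object
    }
    const std::size_t new_limit  = ac.alloc_limit + granted;
    const std::size_t object_end = ac.alloc_ptr + object_size;
    return new_limit - object_end;   // this tail must be clean for later allocations
}
```

In this model, a context with `alloc_ptr = 1000` and `alloc_limit = 4000` has 3000 unspent bytes. A 4096-byte request granted 4096 new bytes leaves a 3000-byte tail to zero without the deduction, and none with it.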

@VSadov VSadov force-pushed the uninitialized1 branch 5 times, most recently from ca9cd83 to b25329e Compare May 17, 2019 01:27
@VSadov VSadov changed the title [WIP] Using AllocateUninitializedArray in array pool Using AllocateUninitializedArray in array pool May 22, 2019
@VSadov VSadov closed this May 22, 2019
@VSadov VSadov reopened this May 22, 2019
@VSadov VSadov marked this pull request as ready for review May 22, 2019 21:05
@VSadov VSadov requested review from Maoni0, stephentoub and sergiy-k May 22, 2019 21:05
//if (!(flags & GC_ALLOC_ZEROING_OPTIONAL))
//{
// verify_mem_cleared(start - plug_skew, limit_size);
//}
Member

@stephentoub stephentoub May 23, 2019

Why are both the old and new code commented out? #Resolved

Member Author

@VSadov VSadov May 23, 2019

I think this is too slow even for debug.
I used it for investigations and had to adjust for GC_ALLOC_ZEROING_OPTIONAL, but decided to keep the whole thing commented as it was before. #Resolved

 }
 else
 {
 // The request was for a size too large for the pool. Allocate an array of exactly the requested length.
 // When it's returned to the pool, we'll simply throw it away.
-buffer = new T[minimumLength];
+buffer = GC.AllocateUninitializedArray<T>(minimumLength);
Member

@stephentoub stephentoub May 23, 2019

The ArrayPool changes LGTM. #Resolved

@@ -654,6 +654,11 @@ internal static void UnregisterMemoryLoadChangeNotification(Action notification)
// the array is always zero-initialized.
internal static T[] AllocateUninitializedArray<T>(int length)
{
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
@sergiy-k sergiy-k May 23, 2019

Should this be checked only if we are going to call AllocateNewArray? I.e. after the check for min/small size? #Resolved

Member Author

@VSadov VSadov May 23, 2019

IsReferenceOrContainsReferences is a JIT intrinsic. I expect the rest of the method to be short-circuited at JIT time if the condition is true, and then we would just have new T[], perhaps even inlined.
Although I did not check whether that really happens. #Resolved

Member Author

IsReferenceOrContainsReferences is elided when T is a valuetype. When it is a reference type we make a method call to something that returns true.

I wonder why IsReferenceOrContainsReferences does not act as a constant. @jkotas - is that expected?

Member Author

@VSadov VSadov May 23, 2019

That was tiered jitting.
With tiered JIT disabled, reference types go straight to new T[]:

--- E:\CoreClr2\coreclr\src\System.Private.CoreLib\src\System\GC.cs ------------
            if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
00007FF8695FF660  push        rsi  
00007FF8695FF661  sub         rsp,30h  
00007FF8695FF665  xor         eax,eax  
00007FF8695FF667  mov         qword ptr [rsp+20h],rax  
00007FF8695FF66C  mov         qword ptr [rsp+28h],rcx  
00007FF8695FF671  mov         esi,edx  
            {
                return new T[length];
00007FF8695FF673  call        qword ptr [7FF869221A68h]  
00007FF8695FF679  mov         rcx,rax  
00007FF8695FF67C  movsxd      rdx,esi  
00007FF8695FF67F  call        qword ptr [7FF869221098h]  
00007FF8695FF685  nop  
00007FF8695FF686  add         rsp,30h  
00007FF8695FF68A  pop         rsi  
00007FF8695FF68B  ret 


dprintf (3, ("contigous ac: making min obj gap %Ix->%Ix(%Id)",
acontext->alloc_ptr, (acontext->alloc_ptr + pad_size), pad_size));
make_unused_array (acontext->alloc_ptr, pad_size);
Member

@Maoni0 Maoni0 May 23, 2019

make_unused_array (acontext->alloc_ptr, pad_size);

is there any reason for this change? #Resolved

Member Author

@VSadov VSadov May 23, 2019

I used tracing for some investigations, and having the trace from the nested make_unused_array appear before the outer trace was confusing.

I basically moved the "contiguous ac: making gap ..." trace before the call to make_unused_array, so that the reason for making the unused array appears in the trace before the actual make_unused_array, not after.

Member Author

If that is not a general pattern used for tracing, I can move it back. It felt a bit easier to read.

Member

@Maoni0 Maoni0 May 24, 2019

I see; makes sense. I'm fine with keeping the change. #Resolved

{
// In a contiguous AC case with GC_ALLOC_ZEROING_OPTIONAL, deduct unspent space from the limit to clear only what is necessary.
if (flags & GC_ALLOC_ZEROING_OPTIONAL &&
(allocated == acontext->alloc_limit || allocated == acontext->alloc_limit + Align (min_obj_size, align_const)))
Member

@Maoni0 Maoni0 May 23, 2019

allocated == acontext->alloc_limit

nit - as a convention in GC code, please always put ()'s around individual statements -

    if ((flags & GC_ALLOC_ZEROING_OPTIONAL) &&
        ((allocated == acontext->alloc_limit) || (allocated == (acontext->alloc_limit + Align (min_obj_size, align_const)))))

#Resolved

Member Author

I am trying :-) , I think I missed a couple of ( ) here since there is a lot of nesting.

if (flags & GC_ALLOC_ZEROING_OPTIONAL &&
(allocated == acontext->alloc_limit || allocated == acontext->alloc_limit + Align (min_obj_size, align_const)))
{
limit -= (allocated - acontext->alloc_ptr);
Member

@Maoni0 Maoni0 May 24, 2019

limit -= (allocated - acontext->alloc_ptr);

why reduce limit here? we already knew that we could fit the limit we asked for, might as well just give it out anyway. I don't understand why this needs to be done because in adjust_limit_clr we will naturally only advance acontext->alloc_ptr by a min_obj_size when alloc_limit was end of seg, ie, we do not make large free objects (we just made the pad). #Resolved

Member Author

If we give more than necessary, we will have to zero the space beyond start + size. That would be roughly the size of the leftover in the current AC. In a 4K allocation the leftover could be up to 4K, and clearing that much extra is noticeable.
That would be OK for a regular allocation, but for GC_ALLOC_ZEROING_OPTIONAL we are trying not to clear more than needed.

Member Author

I reduce the limit so that we only have to clear the pad at the end (since it is not consumed by the object, we have to clean it).
If I do not reduce it, we will have to clear an additional "allocated - acontext->alloc_ptr" bytes, since that space would not be consumed by the object and whatever is left in the AC needs to be clean, as the next allocation may use that space without clearing.

Member

@Maoni0 Maoni0 May 24, 2019

ahh, I thought you said we were wasting space in the AC when we talked about this. I misunderstood then. this seems like an optimization targeted at a very narrow scenario almost not worth the complexity. I think I'm ok with keeping this if you feel strongly about it but I could definitely go the other way.

also you don't need to check for gen_number in the if block because for LOH allocations the if statement will not be true. #Resolved

Member Author

I was not sure about gen_number. Thanks!

Yeah, I'd like to keep this. According to VTune these extra expenses are noticeable.
That is how I found this: it was not obvious from the code, but profiles indicated that we were spending time in memset even when doing Uninitialized allocations.

I also tried to combine this with the similar logic inside alloc_clr. That promised to make it simpler, but in the end it did not work, because we must update "allocated" here while we are still holding the heap lock.

Member Author

Will have to remove unnecessary gen_number check.

Member

Maoni0 commented May 24, 2019

btw, just in case as a reminder, please make sure to squash your commits into 1 when merging. #Resolved

@@ -100,13 +100,13 @@ public override T[] Rent(int minimumLength)

 // The pool was exhausted for this buffer size. Allocate a new buffer with a size corresponding
 // to the appropriate bucket.
-buffer = new T[_buckets[index]._bufferLength];
+buffer = GC.AllocateUninitializedArray<T>(_buckets[index]._bufferLength);
Member

@Maoni0 Maoni0 May 24, 2019

AllocateUninitializedArray

I may just be paranoid here but could this be a breaking change if folks were expecting the buffers in the pool to be cleared? do we need a config to switch to allocating with the uninit API? #Resolved

Member

@jkotas jkotas May 24, 2019

do we need a config to switch to allocating with the uninit API?

No. We are not maintaining bug-for-bug compatibility in CoreCLR. If somebody was depending on the buffer being zero initialized before and the app breaks on .NET Core 3.0, it is their bug and they need to fix it before upgrading to .NET Core 3.0. #Resolved
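
Why callers could never rely on zeroed buffers can be illustrated with a toy pool. This is a sketch under stated assumptions: `ToyBytePool`, `rent`, and `give_back` are hypothetical names written for this illustration, not System.Buffers.ArrayPool<T> or its implementation. A reused buffer comes back with whatever the previous renter left in it, so a caller that needs zeros has to clear the buffer itself.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy pool for illustration; not System.Buffers.ArrayPool<T>.
class ToyBytePool {
public:
    // Hands back a previously returned buffer if one is large enough.
    // The buffer is never cleared, so it may hold stale data.
    std::vector<unsigned char> rent(std::size_t minimum_length) {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size() >= minimum_length) {
                std::vector<unsigned char> buffer = std::move(free_[i]);
                free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
                return buffer;
            }
        }
        // Nothing suitable is pooled: allocate a fresh buffer.
        // (std::vector happens to zero-fill; the toy contract does not promise it.)
        return std::vector<unsigned char>(minimum_length);
    }

    // Returns a buffer to the pool without clearing it.
    void give_back(std::vector<unsigned char> buffer) {
        free_.push_back(std::move(buffer));
    }

private:
    std::vector<std::vector<unsigned char>> free_;
};
```

Renting 16 bytes, writing to the buffer, giving it back, and then renting 8 bytes yields the same 16-byte buffer with the earlier write still visible.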

Member

Maoni0 commented May 24, 2019

        if (Unsafe.SizeOf<T>() * length < 256 - 3 * IntPtr.Size)

nit

if ((Unsafe.SizeOf<T>() * length) < (256 - 3 * IntPtr.Size))

also were you planning to make a separate PR to increase the size?
#Resolved


Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01.

Member Author

VSadov commented May 24, 2019

        if (Unsafe.SizeOf<T>() * length < 256 - 3 * IntPtr.Size)

I hoped to get a better sense of this number, but ArrayPool is even less sensitive to it than the scenarios we had before.
From the allocation context mechanics we know it must be between 0 and 8k, and I've heard arguments for either extreme: 0 to not take the choice away from the user, 8k to reduce waste due to plug dividers. I can be convinced of either, or of something in between, like 1k, because it is a round number.

I think as long as we do regular and Uninitialized allocations from the same heap, the allocation pattern will have a much larger effect than this threshold.

If we ever introduce a "dirty heap", similar to the proposed "pinned heap", then the threshold may become more interesting. Maybe ... this time I am less sure :-)
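
For concreteness, the quoted size check works out to 232 bytes with 8-byte pointers and 244 bytes with 4-byte pointers. The sketch below only mirrors that arithmetic. `below_small_array_threshold` is a hypothetical name, `pointer_size` stands in for IntPtr.Size, and what the runtime does on either side of the cutoff is not modeled here.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the quoted condition:
//   (Unsafe.SizeOf<T>() * length) < (256 - 3 * IntPtr.Size)
// pointer_size stands in for IntPtr.Size (4 or 8).
inline bool below_small_array_threshold(std::size_t element_size,
                                        std::size_t length,
                                        std::size_t pointer_size)
{
    assert(pointer_size == 4 || pointer_size == 8);
    return (element_size * length) < (256 - (3 * pointer_size));
}
```

With 8-byte pointers, a 231-element byte array is under the cutoff and a 232-element one is not. For 4-byte elements the boundary falls between 57 and 58 elements.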


Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01.

Member Author

VSadov commented May 24, 2019

        if (Unsafe.SizeOf<T>() * length < 256 - 3 * IntPtr.Size)

Let's discuss this when we meet next time. Have a vote on this or something.


Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01.

Member Author

VSadov commented May 24, 2019

        if (Unsafe.SizeOf<T>() * length < 256 - 3 * IntPtr.Size)

I think we should also reserve the possibility of changing the threshold if the implementation changes.

Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01.

@VSadov VSadov requested a review from Maoni0 May 25, 2019 15:27
Member Author

VSadov commented May 28, 2019

Any more suggestions on this PR?

Member

@Maoni0 Maoni0 left a comment

LGTM!

Member Author

VSadov commented May 28, 2019

Thanks!!

@VSadov VSadov merged commit 4ca032d into dotnet:master May 28, 2019
@VSadov VSadov deleted the uninitialized1 branch May 28, 2019 21:29
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request May 29, 2019
* Just use `new T[]` when elements are not pointer-free

* reduce zeroing out when not necessary.

* use AllocateUninitializedArray in ArrayPool

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request May 29, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request May 29, 2019
marek-safar pushed a commit to mono/mono that referenced this pull request May 29, 2019
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request May 30, 2019
stephentoub pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request May 31, 2019
stephentoub pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request May 31, 2019
stephentoub pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Jun 6, 2019
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Jun 7, 2019

ENikS commented Jul 11, 2020

Would be nice if documentation could be updated to specify that arrays are not zero initialized.

Member

jkotas commented Jul 12, 2020

Doc updates are always just two mouse clicks away: dotnet/dotnet-api-docs#4507

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Commit migrated from dotnet/coreclr@4ca032d