-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Using AllocateUninitializedArray in array pool #24504
Conversation
ca9cd83
to
b25329e
Compare
//if (!(flags & GC_ALLOC_ZEROING_OPTIONAL)) | ||
//{ | ||
// verify_mem_cleared(start - plug_skew, limit_size); | ||
//} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is both the old and new code commented out? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too slow even for debug.
I used it for investigations and had to adjust for GC_ALLOC_ZEROING_OPTIONAL
, but decided to keep the whole thing commented as it was before. #Resolved
} | ||
else | ||
{ | ||
// The request was for a size too large for the pool. Allocate an array of exactly the requested length. | ||
// When it's returned to the pool, we'll simply throw it away. | ||
buffer = new T[minimumLength]; | ||
buffer = GC.AllocateUninitializedArray<T>(minimumLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ArrayPool changes LGTM. #Resolved
@@ -654,6 +654,11 @@ internal static void UnregisterMemoryLoadChangeNotification(Action notification) | |||
// the array is always zero-initialized. | |||
internal static T[] AllocateUninitializedArray<T>(int length) | |||
{ | |||
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be checked only if we are going to call AllocateNewArray? I.e. after the check for min/small size? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsReferenceOrContainsReferences
is a JIT intrinsic. I expect the rest of the method be short-circuited at JIT time if the condition is true and then we would just have new T[]
, perhaps even inlined.
Although I did not check if that really happens. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was tiered jitting.
With tiered JIT disabled reference types are passed to new T[]
:
--- E:\CoreClr2\coreclr\src\System.Private.CoreLib\src\System\GC.cs ------------
if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
00007FF8695FF660 push rsi
00007FF8695FF661 sub rsp,30h
00007FF8695FF665 xor eax,eax
00007FF8695FF667 mov qword ptr [rsp+20h],rax
00007FF8695FF66C mov qword ptr [rsp+28h],rcx
00007FF8695FF671 mov esi,edx
{
return new T[length];
00007FF8695FF673 call qword ptr [7FF869221A68h]
00007FF8695FF679 mov rcx,rax
00007FF8695FF67C movsxd rdx,esi
00007FF8695FF67F call qword ptr [7FF869221098h]
00007FF8695FF685 nop
00007FF8695FF686 add rsp,30h
00007FF8695FF68A pop rsi
00007FF8695FF68B ret
In reply to: 287150930 [](ancestors = 287150930,286784972)
dprintf (3, ("contigous ac: making min obj gap %Ix->%Ix(%Id)", | ||
acontext->alloc_ptr, (acontext->alloc_ptr + pad_size), pad_size)); | ||
make_unused_array (acontext->alloc_ptr, pad_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_unused_array (acontext->alloc_ptr, pad_size); [](start = 11, length = 51)
is there any reason for this change? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used tracing for some investigations and having trace form the nested make_unused_array
before the outer trace was confusing.
I basically moved the "contiguous ac: making gap ..." thing before we call make_unused_array so that the reason why we do unused array would appear in the trace before the actual make_unused_array, not after.
In reply to: 287066315 [](ancestors = 287066315)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is not a general pattern used for tracing, I can move it back. It felt a bit easier to read.
In reply to: 287090700 [](ancestors = 287090700,287066315)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; makes sense. I'm fine with keeping the change. #Resolved
{ | ||
// In a contiguous AC case with GC_ALLOC_ZEROING_OPTIONAL, deduct unspent space from the limit to clear only what is necessary. | ||
if (flags & GC_ALLOC_ZEROING_OPTIONAL && | ||
(allocated == acontext->alloc_limit || allocated == acontext->alloc_limit + Align (min_obj_size, align_const))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allocated == acontext->alloc_limit [](start = 13, length = 34)
nit - as a convention in GC code, please always put ()'s around individual statements -
if ((flags & GC_ALLOC_ZEROING_OPTIONAL) &&
((allocated == acontext->alloc_limit) || (allocated == (acontext->alloc_limit + Align (min_obj_size, align_const)))))
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying :-) , I think I missed couple ( ) here since there is a lot of nesting.
In reply to: 287172722 [](ancestors = 287172722)
if (flags & GC_ALLOC_ZEROING_OPTIONAL && | ||
(allocated == acontext->alloc_limit || allocated == acontext->alloc_limit + Align (min_obj_size, align_const))) | ||
{ | ||
limit -= (allocated - acontext->alloc_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit -= (allocated - acontext->alloc_ptr); [](start = 12, length = 43)
why reduce limit here? we already knew that we could fit the limit we asked for, might as well just give it out anyway. I don't understand why this needs to be done because in adjust_limit_clr we will naturally only advance acontext->alloc_ptr by a min_obj_size when alloc_limit was end of seg, ie, we do not make large free objects (we just made the pad). #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we give more than necessary, we will have to zero the space beyond the start + size. That would be roughly the size of the leftover in the current AC. In 4K allocation the leftover could be up to 4K. Clearing that much extra is noticeable.
That would be ok for a regular allocation, but for GC_ALLOC_ZEROING_OPTIONAL we are trying to not clear more than needed.
In reply to: 287178268 [](ancestors = 287178268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduce the limit so we would only have to clear the pad at the end. (since that is not consumed by the object, we have to clean it).
If I do not reduce, we will have to clear additional "allocated - acontext->alloc_ptr" bytes since that would not be consumed by the object and whatever is left in the AC needs to be clean as the next allocation may just use that space without clearing.
In reply to: 287180883 [](ancestors = 287180883,287178268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, I thought you said we were wasting space in the AC when we talked about this. I misunderstood then. this seems like an optimization targeted at a very narrow scenario almost not worth the complexity. I think I'm ok with keeping this if you feel strongly about it but I could definitely go the other way.
also you don't need to check for gen_number in the if block because for LOH allocations the if statement will not be true. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure about gen_number. Thanks!
Yeah, I'd like to keep this. According to VTune these extra expenses are noticeable.
That is how I found this - it was not obvious from the code, but profiles indicated that we are spending time in memset even when doing Uninitialized allocations.
I also tried to combine this with the similar logic inside alloc_clr. That promised to make it simpler, but at the end did not work because we must update "allocated" here while we still holding the heap lock.
In reply to: 287183348 [](ancestors = 287183348)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have to remove unnecessary gen_number check.
In reply to: 287189803 [](ancestors = 287189803,287183348)
btw, just in case as a reminder, please make sure to squash your commits into 1 when merging. #Resolved |
@@ -100,13 +100,13 @@ public override T[] Rent(int minimumLength) | |||
|
|||
// The pool was exhausted for this buffer size. Allocate a new buffer with a size corresponding | |||
// to the appropriate bucket. | |||
buffer = new T[_buckets[index]._bufferLength]; | |||
buffer = GC.AllocateUninitializedArray<T>(_buckets[index]._bufferLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocateUninitializedArray [](start = 28, length = 26)
I may just be paranoid here but could this be a breaking change if folks were expecting the buffers in the pool to be cleared? do we need a config to switch to allocating with the uninit API? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a config to switch to allocating with the uninit API?
No. We are not maintaining bug-for-bug compatibility in CoreCLR. If somebody was depending on the buffer being zero initialized before and the app breaks on .NET Core 3.0, it is their bug and they need to fix it before upgrading to .NET Core 3.0. #Resolved
nit if ((Unsafe.SizeOf() * length) < (256 - 3 * IntPtr.Size)) also were you planning to make a separate PR to increase the size? Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01. [](commit_id = 199fe01, deletion_comment = False) |
I hoped to get a better sense on this number, but ArrayPool is even less sensitive than the scenarios we had before. I think as long as we do regular and Uninitialized allocations from the same heap, the pattern of allocation will have much larger effect than this threshold. If we ever introduce a "dirty heap" - similar to proposed "pinned heap" then the threshold may become more interesting. Maybe ... this time I am less sure :-) In reply to: 495435651 [](ancestors = 495435651) Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01. [](commit_id = 199fe01, deletion_comment = False) |
Let's discuss this when we meet next time. Have a vote on this or something. In reply to: 495464775 [](ancestors = 495464775,495435651) Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01. [](commit_id = 199fe01, deletion_comment = False) |
I think we should also reserve the possibility of changing the threshold if implementation change. In reply to: 495464870 [](ancestors = 495464870,495464775,495435651) Refers to: src/System.Private.CoreLib/src/System/GC.cs:673 in 199fe01. [](commit_id = 199fe01, deletion_comment = False) |
Any more suggestions on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks!! |
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Would be nice if documentation could be updated to specify that arrays are not zero initialized. |
Doc updates are always just two mouse clicks away: dotnet/dotnet-api-docs#4507 |
* Just use `new T[]` when elements are not pointer-free * reduce zeroing out when not necessary. * use AllocateUninitializedArray in ArrayPool Commit migrated from dotnet/coreclr@4ca032d
ArrayPool does not guarantee that rented arrays are pre-cleaned.
As such it can use
AllocateUninitializedArray
when it needs to allocate underlying arrays.In situations when the pool needs to allocate there is measurable performance gain, depending on the size of the array and overall pattern of allocation.
(from running these tests: https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/corefx/System.Buffers/ArrayPoolTests.cs)
When there are no allocations, there is no direct impact.
Also:
While working on this, I noticed that when expanding allocation context, we were not considering unspent space from the previous context in our calculations, which in case of
GC_ALLOC_ZEROING_OPTIONAL
would result in extra zeroing roughly the size of the leftover, thus negating some advantages of "Uninitialized" allocations.In a case of 4K allocation the leftover could be up to 4K.
The above was fixed by adjusting contiguous expansions to account for unused space.
(done only for
GC_ALLOC_ZEROING_OPTIONAL
)