
Change BulkMoveWithWriteBarrier to be GC suspension friendly #27642

Merged: jkotas merged 3 commits into dotnet:master from jkotas:gcmemcpy on Nov 5, 2019

Conversation

@jkotas (Member) commented Nov 3, 2019

No description provided.

@jkotas (Member, Author) commented Nov 3, 2019

Micro-benchmark results (both programs assume using System; and using System.Threading;):

// Benchmark 1: copying a large object[]
// Mean pause time:
// Baseline: 32.9ms
// This change: 21ms
static void Main()
{
    new Thread(() =>
    {
        var src = new object[10_000_000];
        var dst = new object[10_000_000];
        for (;;) Array.Copy(src, dst, src.Length);
    }).Start();
    for (;;) { GC.Collect(); Thread.Sleep(1); }
}

// Benchmark 2: cloning a large byte[]
// Mean pause time:
// Baseline: 33.1ms
// This change: 0.8ms
static void Main()
{
    new Thread(() =>
    {
        var a = new byte[100_000_000];
        for (;;) a.Clone();
    }).Start();
    for (;;) { GC.Collect(); Thread.Sleep(1); }
}
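How numbers like these can be gathered (one plausible approach, not necessarily the one used here): with a blocking, non-concurrent workstation GC, GC.Collect() does not return until the collection, including thread suspension, has finished, so timing it directly approximates the pause. A minimal sketch; PauseProbe and the iteration count are invented for illustration:

using System;
using System.Diagnostics;
using System.Threading;

static class PauseProbe
{
    static void Main()
    {
        // Background thread that keeps bulk-copying object references,
        // mirroring the first benchmark above. IsBackground lets the
        // process exit once the measurement loop completes.
        new Thread(() =>
        {
            var src = new object[10_000_000];
            var dst = new object[10_000_000];
            for (;;) Array.Copy(src, dst, src.Length);
        }) { IsBackground = true }.Start();

        // Time each blocking collection; the elapsed time approximates the
        // pause observed by managed threads (suspension wait + GC work).
        var sw = new Stopwatch();
        double totalMs = 0;
        const int Iterations = 100;
        for (int i = 0; i < Iterations; i++)
        {
            sw.Restart();
            GC.Collect();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
            Thread.Sleep(1);
        }
        Console.WriteLine($"Mean pause: {totalMs / Iterations:F1} ms");
    }
}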

@GrabYourPitchforks (Member) commented:
The managed part seems sound to me, but I'm not familiar enough with QCalls and GC cooperative modes to comment on the native components.

@jkotas added the area-VM label Nov 3, 2019
@jkotas requested a review from VSadov November 3, 2019 07:51
@VSadov (Member) commented Nov 3, 2019

I would expect latencies in object Array.Copy to improve more if that was solely due to copying. Any idea what is the long pole now? Suspension itself?

The byte Clone benchmark would not have anything to suspend most of the time, hence the shorter pauses?

@VSadov (Member) commented Nov 3, 2019

Could be interesting to measure impact on GC.Collect(0) as well.

@@ -714,13 +714,75 @@ void QCALLTYPE Buffer::Clear(void *dst, size_t length)
memset(dst, 0, length);
}

#define BULK_MOVE_WITH_WRITE_BARRIER_BLOCK_SIZE 0x8000

@VSadov (Member) commented on this diff:

I think it could be simpler if “chunking” happens on the managed side. Then it would be regular FCalls without polling and frames. (The move itself does not trigger a GC or throw.)

The bulk move-with-write-barrier would just need to assert that it is not asked to move too much at once.

@jkotas (Member, Author) replied:

It may be simpler, but less efficient. It can either add a new frame that does the chunking (throughput overhead, for small block sizes in particular), or aggressively inline the chunking into every callsite to avoid the frame (code-size overhead). Which tradeoff would you pick if the chunking happens on the managed side?

@VSadov (Member) replied:

I was thinking of something like:

ManagedHelperProbablyInlineable()
{
    // get refs, byte count, and whether we need to copy backwards
    nint count = . . .;
    if (count < CHUNK_SIZE)
    {
        NativeHelper(ref src, ref dst, count, backwards);
    }
    else
    {
        ManagedChunkingHelperDoNotInline(mt, ref src, ref dst, count, backwards);
    }
}

ManagedChunkingHelperDoNotInline(MethodTable* mt, ref byte src, ref byte dst, nint count, bool backwards)
{
    int elementSize = . . .;
    while (count > 0)
    {
        int chunk = Align(min(count, CHUNK_SIZE), elementSize);
        NativeHelper(ref src, ref dst, chunk, backwards);
        count -= chunk;
        src = . . .;
        dst = . . .;
    }
}
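To make the shape of this proposal concrete, a self-contained sketch of the chunking pattern. NativeBulkMove stands in for the FCall and simply forwards to Buffer.MemoryCopy; the names, the 64 KB chunk size, and the forward-only copy direction are assumptions for illustration, not the PR's actual code (compile with unsafe blocks enabled):

using System;

static class ChunkedMove
{
    // 64 KB, a multiple of the pointer size, so chunk boundaries always
    // stay aligned for reference-typed elements. Assumed value.
    const nuint ChunkSize = 0x10000;

    // Stand-in for the native bulk-move FCall (hypothetical name).
    static unsafe void NativeBulkMove(byte* dst, byte* src, nuint byteCount)
        => Buffer.MemoryCopy(src, dst, (long)byteCount, (long)byteCount);

    // Fast path: small copies go straight to the native helper. Large
    // copies are split so the thread reaches a GC-safe point between calls.
    static unsafe void Move(byte* dst, byte* src, nuint byteCount)
    {
        if (byteCount <= ChunkSize)
        {
            NativeBulkMove(dst, src, byteCount);
            return;
        }
        MoveChunked(dst, src, byteCount);
    }

    // Kept out of the fast path, mirroring the DoNotInline helper above.
    // Forward-only for simplicity; the real helper would also need to
    // copy backwards when the ranges overlap.
    static unsafe void MoveChunked(byte* dst, byte* src, nuint byteCount)
    {
        while (byteCount > ChunkSize)
        {
            NativeBulkMove(dst, src, ChunkSize);
            dst += ChunkSize;
            src += ChunkSize;
            byteCount -= ChunkSize;
        }
        NativeBulkMove(dst, src, byteCount);
    }
}

Splitting the size check from the loop reflects the tradeoff discussed above: the common small-copy path stays cheap and inlineable, while the chunking loop lives in a separate, rarely taken helper.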

@jkotas (Member, Author) commented Nov 3, 2019

I would expect latencies in object Array.Copy to improve more if that was solely due to copying. Any idea what is the long pole now? Suspension itself?

The Array.Copy example is copying object references in Gen2. The GC (whether it is a Gen0 GC or a Gen2 GC) has to visit these modified references. The 21ms pause is the cost of visiting these modified references.

In the baseline, the pause is 12ms of waiting for threads to suspend plus 21ms of the GC doing work. With this change, it is just the 21ms of GC work.

I have picked this example specifically to show that this change helps, but copying large arrays of object references still has less-than-ideal performance characteristics.

@VSadov (Member) commented Nov 4, 2019

The sample copies null references. And even if they were real objects, they'd be old. I did not realize we update cards unconditionally. Yes, that introduces a lot of roots to scan and explains the remaining pause.

Checking for copying null pointers is not too hard, but may not be a typical enough scenario. Checking whether writes are cross-generational (even in a conservative way) is not an easy thing to do right now, especially in server GC.
Something to think about.
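A toy model of the "check for null" idea, not runtime code: simulate the card table as a bool[] over the destination and dirty a card only when a non-null reference is actually stored. The 2 KB card granularity and all names are assumptions for illustration:

using System;

static class NullSkippingBarrierModel
{
    const int CardShift = 11; // assume 2 KB per simulated card

    // Copies references and marks a simulated card only for non-null
    // stores; a null store creates no cross-generation pointer, so the
    // corresponding card would not need to be scanned later.
    static void CopySkippingNulls(object[] src, object[] dst, bool[] cards)
    {
        for (int i = 0; i < src.Length; i++)
        {
            object val = src[i];
            dst[i] = val;
            if (val != null)
                cards[(i * IntPtr.Size) >> CardShift] = true;
        }
    }
}

Since the current barrier dirties cards unconditionally, the Array.Copy benchmark above still pays the ~21ms of scanning even though every copied reference is null.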

@jkotas (Member, Author) commented Nov 5, 2019

I think it could be simpler if “chunking” happens on the managed side

Pushed update with this change. @VSadov How does the new version look?

@VSadov (Member) commented Nov 5, 2019

Looks nice!!
Array elements are always aligned when they contain references? (I can't think of a case otherwise, but just in case.)

@jkotas (Member, Author) commented Nov 5, 2019

Array elements are always aligned when contain references?

Yes.

@@ -37,8 +39,53 @@ internal static unsafe void _ZeroMemory(ref byte b, nuint byteLength)
[DllImport(RuntimeHelpers.QCall, CharSet = CharSet.Unicode)]
private static extern unsafe void __ZeroMemory(void* b, nuint byteLength);

// The maximum block size for the __BulkMoveWithWriteBarrier FCall. This is required to avoid GC starvation.
private const uint BulkMoveWithWriteBarrierChunk = 0x10000;
@VSadov (Member) commented on this diff:

65K - not too big?

@jkotas (Member, Author) replied:

Copying 65k should take less than 50 microseconds. I can make it smaller.
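Back-of-the-envelope, assuming rough copy bandwidth rather than measured numbers: 0x10000 bytes is 65,536 B; even at a pessimistic ~1.3 GB/s for a copy going through the write barrier, 65,536 B / 1.3e9 B/s ≈ 50 µs per chunk, and at a more typical ~10 GB/s it is closer to 6.5 µs. So the chunk size bounds each uninterruptible FCall to tens of microseconds at worst.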

@jkotas (Member, Author) commented Nov 5, 2019

/azp run coreclr-ci

@azure-pipelines commented:
Azure Pipelines successfully started running 1 pipeline(s).

@jkotas merged commit 5e1ef69 into dotnet:master Nov 5, 2019
@jkotas deleted the gcmemcpy branch November 5, 2019 16:12
jkotas added a commit to jkotas/coreclr that referenced this pull request Nov 8, 2019
jkotas added a commit that referenced this pull request Nov 8, 2019
jkotas added a commit to jkotas/coreclr that referenced this pull request Nov 8, 2019
stephentoub pushed a commit that referenced this pull request Nov 9, 2019
* Revert "Revert "Change BulkMoveWithWriteBarrier to be GC suspension friendly (#27642)" (#27758)"

This reverts commit b06f8a7.

* Fix wrong argument order for Unsafe.ByteOffset

* Add comment