Remove FileStream long pinning, and simplify synchronization #51462

Merged: 2 commits merged into dotnet:main from the fixfspinning branch on Apr 20, 2021

stephentoub

This PR has two changes for FileStream...

The first is to remove the permanent pinning performed by BufferedFileStreamStrategy. Previously, on Windows when FileStream would allocate its buffer, it would also pin it as part of creating a PreallocatedOverlapped. Recent changes removed the PreallocatedOverlapped wrapping of the buffer, and replaced it with a buffer from the pinned-object heap. But POH allocations are considered to be gen2, which can then have a negative app-wide impact if lots of FileStreams are created and thrown away quickly. Instead, this PR just removes that pinning entirely. BufferedFileStreamStrategy's buffer, like any other buffer, will just be pinned for the duration of each individual read/write async operation, using Memory.Pin. I do not see any degradation in the benchmarks as a result. cc: @jkotas
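
To make the difference concrete, here is a minimal sketch (not the code from this PR) contrasting a buffer allocated on the pinned-object heap with a buffer that is pinned only while a single operation runs. `PinningSketch`, `AllocateOnPinnedHeap`, `ReadInto`, and `FakeNativeRead` are hypothetical names used for illustration; `FakeNativeRead` stands in for a native I/O call. The sketch is synchronous for brevity, whereas a real async operation would hold the `MemoryHandle` until the operation completes. It assumes .NET 5 or later and a project that allows unsafe code.

```csharp
using System;
using System.Buffers;

public static class PinningSketch
{
    // Long-lived pinning: the array lives on the pinned-object heap for its whole lifetime.
    public static byte[] AllocateOnPinnedHeap(int size) =>
        GC.AllocateUninitializedArray<byte>(size, pinned: true);

    // Hypothetical stand-in for a native call that writes into a raw buffer.
    private static unsafe int FakeNativeRead(byte* destination, int count)
    {
        for (int i = 0; i < count; i++)
        {
            destination[i] = (byte)i;
        }
        return count;
    }

    // Per-operation pinning: the buffer is pinned only until the handle is disposed.
    public static unsafe int ReadInto(Memory<byte> buffer)
    {
        using MemoryHandle handle = buffer.Pin();
        return FakeNativeRead((byte*)handle.Pointer, buffer.Length);
    }
}
```

With the second approach an ordinary `new byte[4096]` can serve as the buffer, so it is allocated like any other short-lived array rather than on the pinned-object heap.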

The second change is to significantly simplify how the async completion is handled. There's currently a complicated set of interlocked operations that transition the promise through multiple stages, to account for the possibility of race conditions between various operations. We can instead get rid of all of that, and just use an event to ensure that, in the rare case where it matters, additional configuration of an async operation right after it was initiated (e.g. registering for cancellation) doesn't race with concurrent completion of that same operation.
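
The event-based idea can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `AsyncOperationSketch`, `FinishSetup`, `Complete`, and `Completion` are hypothetical names, and a `ManualResetEventSlim` is assumed as the event. The completion path waits on the event, which only blocks in the rare case where completion arrives before the initiator has finished its post-initiation setup.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncOperationSketch
{
    private readonly TaskCompletionSource<int> _tcs =
        new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly ManualResetEventSlim _setupDone = new ManualResetEventSlim(false);
    private CancellationTokenRegistration _registration;

    public Task<int> Completion => _tcs.Task;

    // Called by the initiator right after the operation has been started.
    public void FinishSetup(CancellationToken token)
    {
        _registration = token.Register(() => _tcs.TrySetCanceled());
        _setupDone.Set(); // setup is finished; completion may now proceed
    }

    // Called from the completion callback, possibly on another thread.
    public void Complete(int bytesTransferred)
    {
        _setupDone.Wait(); // returns immediately unless completion raced ahead of setup
        _registration.Dispose();
        _tcs.TrySetResult(bytesTransferred);
    }
}
```

In this sketch `Complete` must not run on the thread that will later call `FinishSetup`, since it would then wait on itself.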

@adamsitnik, I don't see regressions from these changes, but the benchmarks also appear to be quite noisy, at least on my machine. If you could give them a go, that'd be helpful. Thanks.

@ghost
ghost commented Apr 18, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Author: stephentoub
Assignees: -
Labels:

area-System.IO

Milestone: -

@stephentoub stephentoub force-pushed the fixfspinning branch 2 times, most recently from 0547102 to f3eafd7 Compare April 19, 2021 00:31
@adamsitnik

> If you could give them a go, that'd be helpful.

There are no regressions and the number of Gen 2 collections has dropped (as expected):

| Method | Toolchain | fileSize | userBufferSize | options | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| Read | \after\corerun.exe | 1024 | 1024 | None | 54.75 us | 1.00 | 0.4296 | 0.2148 | - | 4,344 B |
| Read | \before\corerun.exe | 1024 | 1024 | None | 54.52 us | 1.00 | 0.4496 | 0.4496 | 0.4496 | 4,346 B |
| Write | \after\corerun.exe | 1024 | 1024 | None | 434.29 us | 1.02 | - | - | - | 4,344 B |
| Write | \before\corerun.exe | 1024 | 1024 | None | 424.92 us | 1.00 | - | - | - | 4,344 B |
| ReadAsync | \after\corerun.exe | 1024 | 1024 | None | 69.71 us | 0.99 | 0.5708 | 0.2854 | - | 5,088 B |
| ReadAsync | \before\corerun.exe | 1024 | 1024 | None | 70.71 us | 1.00 | 0.2880 | 0.2880 | 0.2880 | 5,088 B |
| WriteAsync | \after\corerun.exe | 1024 | 1024 | None | 435.03 us | 1.01 | - | - | - | 4,432 B |
| WriteAsync | \before\corerun.exe | 1024 | 1024 | None | 430.26 us | 1.00 | - | - | - | 4,432 B |
| ReadAsync | \after\corerun.exe | 1024 | 1024 | Asynchronous | 82.80 us | 0.98 | 0.6614 | 0.3307 | - | 5,256 B |
| ReadAsync | \before\corerun.exe | 1024 | 1024 | Asynchronous | 84.13 us | 1.00 | 0.3324 | 0.3324 | 0.3324 | 5,240 B |
| WriteAsync | \after\corerun.exe | 1024 | 1024 | Asynchronous | 502.49 us | 1.00 | - | - | - | 4,976 B |
| WriteAsync | \before\corerun.exe | 1024 | 1024 | Asynchronous | 502.35 us | 1.00 | - | - | - | 4,960 B |
| OpenClose | \after\corerun.exe | 1024 | ? | None | 43.76 us | 0.99 | - | - | - | 224 B |
| OpenClose | \before\corerun.exe | 1024 | ? | None | 44.36 us | 1.00 | - | - | - | 224 B |
| LockUnlock | \after\corerun.exe | 1024 | ? | None | 84.72 us | 0.99 | - | - | - | 224 B |
| LockUnlock | \before\corerun.exe | 1024 | ? | None | 85.34 us | 1.00 | - | - | - | 224 B |
| SeekForward | \after\corerun.exe | 1024 | ? | None | 55.00 us | 0.99 | - | - | - | 224 B |
| SeekForward | \before\corerun.exe | 1024 | ? | None | 55.59 us | 1.00 | - | - | - | 224 B |
| SeekBackward | \after\corerun.exe | 1024 | ? | None | 58.61 us | 1.00 | - | - | - | 224 B |
| SeekBackward | \before\corerun.exe | 1024 | ? | None | 58.75 us | 1.00 | - | - | - | 224 B |
| ReadByte | \after\corerun.exe | 1024 | ? | None | 55.96 us | 1.00 | 0.4464 | 0.2232 | - | 4,344 B |
| ReadByte | \before\corerun.exe | 1024 | ? | None | 56.03 us | 1.00 | 0.4433 | 0.4433 | 0.4433 | 4,346 B |
| WriteByte | \after\corerun.exe | 1024 | ? | None | 457.12 us | 1.00 | - | - | - | 4,344 B |
| WriteByte | \before\corerun.exe | 1024 | ? | None | 460.25 us | 1.00 | - | - | - | 4,344 B |
| Flush | \after\corerun.exe | 1024 | ? | None | 3,220.51 us | 0.97 | - | - | - | 4,346 B |
| Flush | \before\corerun.exe | 1024 | ? | None | 3,314.11 us | 1.00 | - | - | - | 4,346 B |
| FlushAsync | \after\corerun.exe | 1024 | ? | None | 6,217.04 us | 1.12 | 20.8333 | - | - | 275,067 B |
| FlushAsync | \before\corerun.exe | 1024 | ? | None | 5,558.01 us | 1.00 | 20.8333 | - | - | 275,067 B |
| CopyToFile | \after\corerun.exe | 1024 | ? | None | 484.21 us | 1.00 | - | - | - | 4,569 B |
| CopyToFile | \before\corerun.exe | 1024 | ? | None | 486.41 us | 1.00 | - | - | - | 4,569 B |
| CopyToFileAsync | \after\corerun.exe | 1024 | ? | None | 509.00 us | 0.98 | - | - | - | 5,593 B |
| CopyToFileAsync | \before\corerun.exe | 1024 | ? | None | 518.25 us | 1.00 | - | - | - | 5,593 B |
| OpenClose | \after\corerun.exe | 1024 | ? | Asynchronous | 45.66 us | 1.00 | - | - | - | 264 B |
| OpenClose | \before\corerun.exe | 1024 | ? | Asynchronous | 45.71 us | 1.00 | - | - | - | 264 B |
| LockUnlock | \after\corerun.exe | 1024 | ? | Asynchronous | 89.37 us | 0.99 | - | - | - | 264 B |
| LockUnlock | \before\corerun.exe | 1024 | ? | Asynchronous | 89.95 us | 1.00 | - | - | - | 264 B |
| SeekForward | \after\corerun.exe | 1024 | ? | Asynchronous | 57.77 us | 1.01 | - | - | - | 264 B |
| SeekForward | \before\corerun.exe | 1024 | ? | Asynchronous | 57.21 us | 1.00 | - | - | - | 264 B |
| SeekBackward | \after\corerun.exe | 1024 | ? | Asynchronous | 62.40 us | 1.01 | - | - | - | 264 B |
| SeekBackward | \before\corerun.exe | 1024 | ? | Asynchronous | 61.95 us | 1.00 | - | - | - | 264 B |
| ReadByte | \after\corerun.exe | 1024 | ? | Asynchronous | 74.92 us | 1.01 | 0.5952 | 0.2976 | - | 4,826 B |
| ReadByte | \before\corerun.exe | 1024 | ? | Asynchronous | 74.32 us | 1.00 | 0.3005 | 0.3005 | 0.3005 | 4,809 B |
| WriteByte | \after\corerun.exe | 1024 | ? | Asynchronous | 528.62 us | 1.01 | - | - | - | 4,885 B |
| WriteByte | \before\corerun.exe | 1024 | ? | Asynchronous | 524.48 us | 1.00 | - | - | - | 4,872 B |
| Flush | \after\corerun.exe | 1024 | ? | Asynchronous | 10,750.41 us | 1.00 | - | - | - | 120,208 B |
| Flush | \before\corerun.exe | 1024 | ? | Asynchronous | 10,795.36 us | 1.00 | - | - | - | 119,588 B |
| FlushAsync | \after\corerun.exe | 1024 | ? | Asynchronous | 10,757.45 us | 0.99 | - | - | - | 160,661 B |
| FlushAsync | \before\corerun.exe | 1024 | ? | Asynchronous | 10,930.05 us | 1.00 | - | - | - | 160,626 B |
| CopyToFileAsync | \after\corerun.exe | 1024 | ? | Asynchronous | 539.89 us | 1.00 | - | - | - | 6,352 B |
| CopyToFileAsync | \before\corerun.exe | 1024 | ? | Asynchronous | 541.66 us | 1.00 | - | - | - | 6,336 B |
| Read | \after\corerun.exe | 1048576 | 512 | None | 731.20 us | 1.00 | - | - | - | 4,344 B |
| Read | \before\corerun.exe | 1048576 | 512 | None | 729.33 us | 1.00 | - | - | - | 4,344 B |
| Write | \after\corerun.exe | 1048576 | 512 | None | 3,118.42 us | 1.01 | - | - | - | 4,346 B |
| Write | \before\corerun.exe | 1048576 | 512 | None | 3,185.80 us | 1.00 | - | - | - | 4,346 B |
| ReadAsync | \after\corerun.exe | 1048576 | 512 | None | 1,648.34 us | 1.00 | 6.9444 | - | - | 86,689 B |
| ReadAsync | \before\corerun.exe | 1048576 | 512 | None | 1,646.80 us | 1.00 | 6.9444 | - | - | 86,689 B |
| WriteAsync | \after\corerun.exe | 1048576 | 512 | None | 3,828.83 us | 1.00 | - | - | - | 78,208 B |
| WriteAsync | \before\corerun.exe | 1048576 | 512 | None | 3,915.84 us | 1.00 | - | - | - | 78,211 B |
| ReadAsync | \after\corerun.exe | 1048576 | 512 | Asynchronous | 2,593.54 us | 1.11 | - | - | - | 58,298 B |
| ReadAsync | \before\corerun.exe | 1048576 | 512 | Asynchronous | 2,345.37 us | 1.00 | - | - | - | 58,279 B |
| WriteAsync | \after\corerun.exe | 1048576 | 512 | Asynchronous | 5,342.20 us | 1.08 | - | - | - | 50,052 B |
| WriteAsync | \before\corerun.exe | 1048576 | 512 | Asynchronous | 4,983.02 us | 1.00 | - | - | - | 50,020 B |
| Read | \after\corerun.exe | 1048576 | 4096 | None | 691.65 us | 1.00 | - | - | - | 224 B |
| Read | \before\corerun.exe | 1048576 | 4096 | None | 693.04 us | 1.00 | - | - | - | 224 B |
| Write | \after\corerun.exe | 1048576 | 4096 | None | 3,597.98 us | 1.01 | - | - | - | 226 B |
| Write | \before\corerun.exe | 1048576 | 4096 | None | 3,596.56 us | 1.00 | - | - | - | 226 B |
| ReadAsync | \after\corerun.exe | 1048576 | 4096 | None | 1,085.40 us | 1.01 | - | - | - | 29,321 B |
| ReadAsync | \before\corerun.exe | 1048576 | 4096 | None | 1,080.16 us | 1.00 | - | - | - | 29,321 B |
| WriteAsync | \after\corerun.exe | 1048576 | 4096 | None | 4,219.47 us | 1.03 | - | - | - | 55,971 B |
| WriteAsync | \before\corerun.exe | 1048576 | 4096 | None | 4,105.06 us | 1.00 | - | - | - | 55,971 B |
| ReadAsync | \after\corerun.exe | 1048576 | 4096 | Asynchronous | 2,125.01 us | 0.98 | - | - | - | 929 B |
| ReadAsync | \before\corerun.exe | 1048576 | 4096 | Asynchronous | 2,159.08 us | 1.00 | - | - | - | 913 B |
| WriteAsync | \after\corerun.exe | 1048576 | 4096 | Asynchronous | 4,720.16 us | 1.01 | - | - | - | 27,578 B |
| WriteAsync | \before\corerun.exe | 1048576 | 4096 | Asynchronous | 4,714.79 us | 1.00 | - | - | - | 27,562 B |
| Read_NoBuffering | \after\corerun.exe | 1048576 | 16384 | None | 245.11 us | 0.99 | - | - | - | 160 B |
| Read_NoBuffering | \before\corerun.exe | 1048576 | 16384 | None | 247.25 us | 1.00 | - | - | - | 160 B |
| Write_NoBuffering | \after\corerun.exe | 1048576 | 16384 | None | 2,889.89 us | 1.01 | - | - | - | 162 B |
| Write_NoBuffering | \before\corerun.exe | 1048576 | 16384 | None | 2,897.49 us | 1.00 | - | - | - | 162 B |
| ReadAsync_NoBuffering | \after\corerun.exe | 1048576 | 16384 | None | 393.51 us | 0.98 | - | - | - | 7,664 B |
| ReadAsync_NoBuffering | \before\corerun.exe | 1048576 | 16384 | None | 399.94 us | 1.00 | - | - | - | 7,664 B |
| WriteAsync_NoBuffering | \after\corerun.exe | 1048576 | 16384 | None | 3,104.27 us | 1.04 | - | - | - | 7,666 B |
| WriteAsync_NoBuffering | \before\corerun.exe | 1048576 | 16384 | None | 3,009.15 us | 1.00 | - | - | - | 7,666 B |
| ReadAsync_NoBuffering | \after\corerun.exe | 1048576 | 16384 | Asynchronous | 671.40 us | 0.99 | - | - | - | 776 B |
| ReadAsync_NoBuffering | \before\corerun.exe | 1048576 | 16384 | Asynchronous | 675.89 us | 1.00 | - | - | - | 760 B |
| WriteAsync_NoBuffering | \after\corerun.exe | 1048576 | 16384 | Asynchronous | 3,252.27 us | 1.01 | - | - | - | 778 B |
| WriteAsync_NoBuffering | \before\corerun.exe | 1048576 | 16384 | Asynchronous | 3,235.23 us | 1.00 | - | - | - | 762 B |
| CopyToFile | \after\corerun.exe | 1048576 | ? | None | 2,801.34 us | 1.02 | - | - | - | 452 B |
| CopyToFile | \before\corerun.exe | 1048576 | ? | None | 2,771.88 us | 1.00 | - | - | - | 452 B |
| CopyToFileAsync | \after\corerun.exe | 1048576 | ? | None | 2,442.76 us | 1.01 | - | - | - | 3,244 B |
| CopyToFileAsync | \before\corerun.exe | 1048576 | ? | None | 2,459.22 us | 1.00 | - | - | - | 3,243 B |
| CopyToFileAsync | \after\corerun.exe | 1048576 | ? | Asynchronous | 2,948.48 us | 1.04 | - | - | - | 2,060 B |
| CopyToFileAsync | \before\corerun.exe | 1048576 | ? | Asynchronous | 2,861.54 us | 1.00 | - | - | - | 2,044 B |
| Read | \after\corerun.exe | 104857600 | 4096 | None | 82,530.54 us | 1.01 | - | - | - | 260 B |
| Read | \before\corerun.exe | 104857600 | 4096 | None | 81,469.74 us | 1.00 | - | - | - | 260 B |
| Write | \after\corerun.exe | 104857600 | 4096 | None | 175,214.91 us | 1.01 | - | - | - | 296 B |
| Write | \before\corerun.exe | 104857600 | 4096 | None | 179,875.51 us | 1.00 | - | - | - | 368 B |
| ReadAsync | \after\corerun.exe | 104857600 | 4096 | None | 123,170.40 us | 0.93 | - | - | - | 2,867,992 B |
| ReadAsync | \before\corerun.exe | 104857600 | 4096 | None | 132,888.91 us | 1.00 | - | - | - | 2,867,992 B |
| WriteAsync | \after\corerun.exe | 104857600 | 4096 | None | 278,714.85 us | 1.02 | - | - | - | 5,124,952 B |
| WriteAsync | \before\corerun.exe | 104857600 | 4096 | None | 281,506.86 us | 1.00 | - | - | - | 5,124,952 B |
| ReadAsync | \after\corerun.exe | 104857600 | 4096 | Asynchronous | 226,828.08 us | 1.00 | - | - | - | 1,072 B |
| ReadAsync | \before\corerun.exe | 104857600 | 4096 | Asynchronous | 226,791.58 us | 1.00 | - | - | - | 1,056 B |
| WriteAsync | \after\corerun.exe | 104857600 | 4096 | Asynchronous | 400,963.00 us | 0.99 | - | - | - | 2,258,016 B |
| WriteAsync | \before\corerun.exe | 104857600 | 4096 | Asynchronous | 409,212.21 us | 1.00 | - | - | - | 2,257,976 B |
| Read_NoBuffering | \after\corerun.exe | 104857600 | 16384 | None | 34,544.07 us | 1.00 | - | - | - | 181 B |
| Read_NoBuffering | \before\corerun.exe | 104857600 | 16384 | None | 34,708.25 us | 1.00 | - | - | - | 181 B |
| Write_NoBuffering | \after\corerun.exe | 104857600 | 16384 | None | 66,570.20 us | 1.03 | - | - | - | 196 B |
| Write_NoBuffering | \before\corerun.exe | 104857600 | 16384 | None | 65,361.34 us | 1.00 | - | - | - | 196 B |
| ReadAsync_NoBuffering | \after\corerun.exe | 104857600 | 16384 | None | 48,909.64 us | 1.00 | - | - | - | 717,332 B |
| ReadAsync_NoBuffering | \before\corerun.exe | 104857600 | 16384 | None | 48,770.65 us | 1.00 | - | - | - | 717,332 B |
| WriteAsync_NoBuffering | \after\corerun.exe | 104857600 | 16384 | None | 84,126.94 us | 1.01 | - | - | - | 717,332 B |
| WriteAsync_NoBuffering | \before\corerun.exe | 104857600 | 16384 | None | 83,620.95 us | 1.00 | - | - | - | 717,332 B |
| ReadAsync_NoBuffering | \after\corerun.exe | 104857600 | 16384 | Asynchronous | 74,775.66 us | 1.00 | - | - | - | 812 B |
| ReadAsync_NoBuffering | \before\corerun.exe | 104857600 | 16384 | Asynchronous | 74,992.22 us | 1.00 | - | - | - | 796 B |
| WriteAsync_NoBuffering | \after\corerun.exe | 104857600 | 16384 | Asynchronous | 122,320.47 us | 1.02 | - | - | - | 848 B |
| WriteAsync_NoBuffering | \before\corerun.exe | 104857600 | 16384 | Asynchronous | 120,853.88 us | 1.00 | - | - | - | 832 B |
| CopyToFile | \after\corerun.exe | 104857600 | ? | None | 67,047.15 us | 1.00 | - | - | - | 556 B |
| CopyToFile | \before\corerun.exe | 104857600 | ? | None | 66,893.68 us | 1.00 | - | - | - | 556 B |
| CopyToFileAsync | \after\corerun.exe | 104857600 | ? | None | 74,996.18 us | 1.02 | - | - | - | 180,828 B |
| CopyToFileAsync | \before\corerun.exe | 104857600 | ? | None | 73,810.16 us | 1.00 | - | - | - | 180,828 B |
| CopyToFileAsync | \after\corerun.exe | 104857600 | ? | Asynchronous | 86,122.72 us | 1.00 | - | - | - | 2,236 B |
| CopyToFileAsync | \before\corerun.exe | 104857600 | ? | Asynchronous | 86,598.78 us | 1.00 | - | - | - | 2,220 B |

Some of the benchmarks (FlushAsync and WriteAsync) seemed to have regressed, but I've re-run them and confirmed that they have not:

| Method | Toolchain | fileSize | userBufferSize | options | Mean | Ratio | RatioSD | Allocated |
|---|---|---|---|---|---|---|---|---|
| WriteAsync | \after\corerun.exe | 1024 | 1024 | None | 426.5 us | 1.00 | 0.07 | 4 KB |
| WriteAsync | \before\corerun.exe | 1024 | 1024 | None | 425.6 us | 1.00 | 0.00 | 4 KB |
| WriteAsync | \after\corerun.exe | 1024 | 1024 | Asynchronous | 497.6 us | 0.99 | 0.05 | 5 KB |
| WriteAsync | \before\corerun.exe | 1024 | 1024 | Asynchronous | 505.0 us | 1.00 | 0.00 | 5 KB |
| FlushAsync | \after\corerun.exe | 1024 | ? | None | 5,686.7 us | 0.98 | 0.04 | 269 KB |
| FlushAsync | \before\corerun.exe | 1024 | ? | None | 5,788.7 us | 1.00 | 0.00 | 269 KB |
| FlushAsync | \after\corerun.exe | 1024 | ? | Asynchronous | 10,751.2 us | 1.04 | 0.04 | 157 KB |
| FlushAsync | \before\corerun.exe | 1024 | ? | Asynchronous | 10,359.2 us | 1.00 | 0.00 | 157 KB |
| WriteAsync | \after\corerun.exe | 1048576 | 512 | None | 3,884.7 us | 1.00 | 0.13 | 76 KB |
| WriteAsync | \before\corerun.exe | 1048576 | 512 | None | 3,937.0 us | 1.00 | 0.00 | 76 KB |
| WriteAsync | \after\corerun.exe | 1048576 | 512 | Asynchronous | 4,660.9 us | 1.02 | 0.14 | 49 KB |
| WriteAsync | \before\corerun.exe | 1048576 | 512 | Asynchronous | 4,635.2 us | 1.00 | 0.00 | 49 KB |
| WriteAsync | \after\corerun.exe | 1048576 | 4096 | None | 3,658.0 us | 0.99 | 0.08 | 55 KB |
| WriteAsync | \before\corerun.exe | 1048576 | 4096 | None | 3,711.4 us | 1.00 | 0.00 | 55 KB |
| WriteAsync | \after\corerun.exe | 1048576 | 4096 | Asynchronous | 4,444.7 us | 1.02 | 0.13 | 27 KB |
| WriteAsync | \before\corerun.exe | 1048576 | 4096 | Asynchronous | 4,390.8 us | 1.00 | 0.00 | 27 KB |
| WriteAsync | \after\corerun.exe | 104857600 | 4096 | None | 273,137.2 us | 0.98 | 0.16 | 5,005 KB |
| WriteAsync | \before\corerun.exe | 104857600 | 4096 | None | 284,183.0 us | 1.00 | 0.00 | 5,005 KB |
| WriteAsync | \after\corerun.exe | 104857600 | 4096 | Asynchronous | 405,003.2 us | 1.03 | 0.12 | 2,205 KB |
| WriteAsync | \before\corerun.exe | 104857600 | 4096 | Asynchronous | 396,117.0 us | 1.00 | 0.00 | 2,205 KB |

@carlossanlop carlossanlop left a comment

Looks good so far, pending an answer to the Dispose question and the benchmark results. cc @adamsitnik

@adamsitnik adamsitnik left a comment

LGTM, thank you @stephentoub !

@carlossanlop carlossanlop left a comment

LGTM. The only comment is that I think having a Dispose method without implementing IDisposable seems a bit weird, but I don't think it is a merge blocker.

@stephentoub stephentoub merged commit 508e560 into dotnet:main Apr 20, 2021
@stephentoub stephentoub deleted the fixfspinning branch April 20, 2021 18:27
@adamsitnik

@stephentoub should we backport this PR to the preview4 branch? My main concern is the arrays allocated from the pinned heap and the Gen 2 collections they can cause.

@stephentoub

I don't think it meets the bar.

@ghost ghost locked as resolved and limited conversation to collaborators May 22, 2021