File.CopyTo: try to pre-allocate disk space for large destination files #61676

Closed

adamsitnik
Member

@adamsitnik adamsitnik commented Nov 16, 2021

Using the dotnet/performance#2134 benchmarks I got the following results:

BenchmarkDotNet=v0.13.1.1616-nightly, OS=ubuntu 18.04
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores

| Method | Toolchain | size (bytes) | Mean | Ratio |
|---|---|---:|---:|---:|
| CopyTo | /main/corerun | 512 | 33.81 us | 1.00 |
| CopyTo | /after/corerun | 512 | 33.91 us | 1.00 |
| CopyToOverwrite | /main/corerun | 512 | 37.15 us | 1.00 |
| CopyToOverwrite | /after/corerun | 512 | 37.36 us | 1.01 |
| CopyTo | /main/corerun | 4096 | 34.06 us | 1.00 |
| CopyTo | /after/corerun | 4096 | 33.46 us | 0.98 |
| CopyToOverwrite | /main/corerun | 4096 | 37.17 us | 1.00 |
| CopyToOverwrite | /after/corerun | 4096 | 37.68 us | 1.01 |
| CopyTo | /main/corerun | 1048576 | 499.15 us | 1.00 |
| CopyTo | /after/corerun | 1048576 | 489.84 us | 0.98 |
| CopyToOverwrite | /main/corerun | 1048576 | 1,064.98 us | 1.00 |
| CopyToOverwrite | /after/corerun | 1048576 | 473.70 us | 0.45 |
| CopyTo | /main/corerun | 104857600 | 58,511.27 us | 1.00 |
| CopyTo | /after/corerun | 104857600 | 55,992.31 us | 0.96 |
| CopyToOverwrite | /main/corerun | 104857600 | 102,380.19 us | 1.00 |
| CopyToOverwrite | /after/corerun | 104857600 | 55,508.35 us | 0.54 |

As we can see, there is no regression for small files. For the larger files (1 MiB and 100 MiB) we can see:

  • 2-4% improvement for File.CopyTo where we write to a new file.
  • 46-55% improvement for File.CopyTo(overwrite: true) where we overwrite an existing file. To be honest, I am surprised that the gain is so large.

@adamsitnik adamsitnik added area-System.IO os-linux Linux OS (any supported distro) tenet-performance Performance related issue labels Nov 16, 2021
@adamsitnik adamsitnik added this to the 7.0.0 milestone Nov 16, 2021
@ghost

ghost commented Nov 16, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

I am currently re-running all the dotnet/performance#2134 benchmarks and will post the results soon. From an initial quick run, I was able to get up to a 2x speedup for large files.

Author: adamsitnik
Assignees: -
Labels:

area-System.IO, os-linux, tenet-performance

Milestone: 7.0.0

@tmds
Member

tmds commented Nov 16, 2021

> Using the dotnet/performance#2134 benchmarks I got the following results:

I see these benchmarks do a single copy.
When I ran similar benchmarks, BenchmarkDotNet told me to add a loop to reduce the variance.

What file system is used on the benchmark machine? ext4?

> where we overwrite an existing file. To be honest I am surprised that the gain is so huge.

Looking at the absolute numbers: CopyToOverwrite was twice as slow as CopyTo. That is the real surprise (to me).

@adamsitnik
Member Author

> I see these benchmarks do a single copy.
> When I ran similar benchmarks, BenchmarkDotNet told me to add a loop to reduce the variance.

Is there any chance that your benchmarks were using the [IterationSetup] attribute? (doc)

Or performing an operation that runs very slowly on the first execution but much faster on every subsequent call? (In that case BDN could underestimate the number of invocations per iteration.)

Do you have the source code by any chance?

@adamsitnik
Member Author

adamsitnik commented Nov 17, 2021

> ext4?

Yes, ext4.

> That is the real surprise (to me).

To me as well. @tmds Is there any chance you could run them on your machine and share the results?

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net7.0 --filter '*File.CopyTo*'

@adamsitnik
Member Author

My macOS and Ubuntu 18.04 results:

BenchmarkDotNet=v0.13.1.1616-nightly, OS=macOS Big Sur 11.4 (20F71) [Darwin 20.5.0]
Intel Core i7-5557U CPU 3.10GHz (Broadwell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100-alpha.1.21566.20
  [Host]     : .NET 7.0.0 (7.0.21.56201), X64 RyuJIT
  Job-VREZES : .NET 7.0.0 (7.0.21.56201), X64 RyuJIT

| Method | size (bytes) | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| CopyTo | 512 | 559.5 us | 32.22 us | 95.49 us | 535.2 us | 496.7 us | 1,092.8 us | 130 B |
| CopyToOverwrite | 512 | 353.0 us | 3.66 us | 10.78 us | 350.8 us | 344.2 us | 435.5 us | 129 B |
| CopyTo | 4096 | 530.8 us | 8.70 us | 25.64 us | 530.5 us | 488.9 us | 688.9 us | 130 B |
| CopyToOverwrite | 4096 | 340.7 us | 4.49 us | 13.23 us | 338.0 us | 332.6 us | 460.3 us | 129 B |
| CopyTo | 1048576 | 1,828.3 us | 17.39 us | 51.28 us | 1,812.8 us | 1,780.4 us | 2,164.7 us | 134 B |
| CopyToOverwrite | 1048576 | 1,689.8 us | 42.78 us | 126.82 us | 1,643.4 us | 1,602.5 us | 2,181.6 us | 140 B |
| CopyTo | 104857600 | 155,437.3 us | 11,214.75 us | 33,242.03 us | 152,217.0 us | 94,939.3 us | 252,142.1 us | 872 B |
| CopyToOverwrite | 104857600 | 176,698.7 us | 10,668.87 us | 31,623.98 us | 172,378.7 us | 125,061.9 us | 259,941.0 us | 872 B |

BenchmarkDotNet=v0.13.1.1616-nightly, OS=ubuntu 18.04
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-alpha.1.21566.20
  [Host]     : .NET 7.0.0 (7.0.21.56201), X64 RyuJIT
  Job-GTINUV : .NET 7.0.0 (7.0.21.56201), X64 RyuJIT

| Method | size (bytes) | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| CopyTo | 512 | 33.57 us | 0.958 us | 1.025 us | 33.34 us | 32.19 us | 36.04 us | 128 B |
| CopyToOverwrite | 512 | 37.21 us | 1.421 us | 1.637 us | 36.63 us | 35.51 us | 40.92 us | 128 B |
| CopyTo | 4096 | 34.78 us | 1.372 us | 1.580 us | 34.04 us | 33.08 us | 37.77 us | 128 B |
| CopyToOverwrite | 4096 | 38.26 us | 1.711 us | 1.971 us | 37.71 us | 35.78 us | 41.32 us | 128 B |
| CopyTo | 1048576 | 493.21 us | 3.632 us | 3.033 us | 492.28 us | 490.76 us | 500.19 us | 130 B |
| CopyToOverwrite | 1048576 | 1,069.77 us | 9.813 us | 9.179 us | 1,072.09 us | 1,049.49 us | 1,083.27 us | 133 B |
| CopyTo | 104857600 | 56,528.31 us | 139.447 us | 130.439 us | 56,504.66 us | 56,347.31 us | 56,767.74 us | 538 B |
| CopyToOverwrite | 104857600 | 108,496.90 us | 2,088.556 us | 1,851.450 us | 108,550.01 us | 104,995.72 us | 112,027.34 us | 580 B |

@tmds
Member

tmds commented Nov 17, 2021

On my machine, which runs Fedora 34 with btrfs, CopyTo and CopyToOverwrite perform similarly.

Adding the fallocate will probably regress things a little, but not for copies within the same partition, which will use FICLONE.

| Method | size (bytes) | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| CopyTo | 512 | 16.72 us | 0.312 us | 0.277 us | 16.69 us | 16.30 us | 17.34 us | 128 B |
| CopyToOverwrite | 512 | 10.82 us | 0.143 us | 0.134 us | 10.81 us | 10.62 us | 11.10 us | 128 B |
| CopyTo | 4096 | 16.93 us | 0.202 us | 0.189 us | 16.86 us | 16.71 us | 17.24 us | 128 B |
| CopyToOverwrite | 4096 | 10.84 us | 0.076 us | 0.067 us | 10.82 us | 10.71 us | 10.98 us | 128 B |
| CopyTo | 1048576 | 322.75 us | 6.931 us | 7.704 us | 322.59 us | 309.68 us | 336.30 us | 129 B |
| CopyToOverwrite | 1048576 | 320.77 us | 9.420 us | 10.849 us | 323.43 us | 297.78 us | 334.26 us | 129 B |
| CopyTo | 104857600 | 41,254.24 us | 695.573 us | 616.608 us | 41,229.77 us | 40,074.01 us | 42,357.61 us | 264 B |
| CopyToOverwrite | 104857600 | 41,350.87 us | 798.138 us | 746.579 us | 41,123.53 us | 40,089.23 us | 42,685.22 us | 264 B |

I see the same thing when using tmpfs. tmpfs doesn't support FICLONE, so a small regression is expected from adding fallocate.

| Method | size (bytes) | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| CopyTo | 512 | 16.86 us | 0.238 us | 0.222 us | 16.83 us | 16.43 us | 17.22 us | 128 B |
| CopyToOverwrite | 512 | 10.70 us | 0.182 us | 0.162 us | 10.67 us | 10.50 us | 11.07 us | 128 B |
| CopyTo | 4096 | 17.23 us | 0.215 us | 0.201 us | 17.25 us | 16.90 us | 17.59 us | 128 B |
| CopyToOverwrite | 4096 | 10.91 us | 0.198 us | 0.185 us | 10.87 us | 10.72 us | 11.27 us | 128 B |
| CopyTo | 1048576 | 325.38 us | 7.286 us | 8.391 us | 324.18 us | 306.64 us | 341.47 us | 129 B |
| CopyToOverwrite | 1048576 | 321.10 us | 6.859 us | 7.899 us | 321.20 us | 304.62 us | 333.13 us | 129 B |
| CopyTo | 104857600 | 46,025.87 us | 690.920 us | 646.287 us | 45,981.90 us | 45,194.21 us | 47,076.94 us | 332 B |
| CopyToOverwrite | 104857600 | 44,983.72 us | 598.061 us | 559.427 us | 44,930.80 us | 44,086.34 us | 45,915.32 us | 264 B |

The performance improvement is ext4-specific. From the benchmark results, it seems it is faster to delete and create a new file than it is to overwrite a file. That is surprising.

@adamsitnik
Member Author

> Adding the fallocate will probably regress things a little, but not for copies within the same partition, which will use FICLONE.

The code is placed after the use of FICLONE, and I've also added a check so that it runs only if ioctl(FICLONE) has failed to copy the file. So it should definitely not regress the FICLONE path.
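
The ordering described here (try FICLONE first, preallocate only when the clone fails) can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the helper name `copy_with_clone_or_prealloc`, the 64 KiB buffer and the plain read/write fallback loop are made up for the example, and it presumes Linux with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes fallocate() in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies the whole of src_fd into an empty dst_fd.
 * Both descriptors are assumed to be positioned at offset 0.
 * Returns 0 on success, -1 on failure with errno set. */
static int copy_with_clone_or_prealloc(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) != 0)
        return -1;

    /* 1. Try a reflink clone. If it succeeds the copy is complete, so the
     *    preallocation below is never reached on that path. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 0;

    /* 2. The clone failed: reserve the destination space up front.
     *    Best effort only; a failure here is deliberately ignored. */
    if (st.st_size > 0)
        (void)fallocate(dst_fd, 0, 0, st.st_size);

    /* 3. Fall back to a plain read/write loop. */
    char buf[64 * 1024];
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                return -1;
            }
            off += w;
        }
    }
    return 0;
}
```

Because the preallocation result is discarded, a file system that rejects fallocate still gets a correct copy. A caller that wants to fail early when space is short would inspect the return value instead.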

I'll run the benchmarks against tmpfs and see if I can get the btrfs numbers as well.

@tmds
Member

tmds commented Nov 17, 2021

> it seems it is faster to delete and create a new file than it is to overwrite a file.

It would be interesting to see a flame graph that shows where this unexpected slowness comes from.

If I had to guess, the ftruncate has a cost on ext4 that depends on the size of the file being truncated, and the fallocate added in this PR undoes that cost.
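
For reference, the sequence this guess refers to on the overwrite path looks roughly like the sketch below: the existing destination is truncated to zero and the expected size is then reserved before any data is written. The helper name `open_for_overwrite` and its parameters are hypothetical, and the sketch assumes Linux with glibc; it illustrates the order of the calls only and says nothing about where the time is actually spent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes fallocate() in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` for overwriting and reserves
 * `expected_size` bytes. Returns a descriptor positioned at offset 0,
 * or -1 with errno set. */
static int open_for_overwrite(const char *path, off_t expected_size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
        return -1;

    /* Step 1: drop the old contents, releasing the old data blocks. */
    if (ftruncate(fd, 0) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Step 2: reserve space for the new contents in one request.
     * Best effort: file systems without fallocate support fail here,
     * and the caller simply writes without a reservation. */
    if (expected_size > 0)
        (void)fallocate(fd, 0, 0, expected_size);

    return fd;
}
```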

@tmds
Member

tmds commented Nov 18, 2021

If we add fallocate, we should probably also handle the errno values that indicate there is no space.

The tricky part is that fallocate behavior is file system dependent.
So we're assuming the following holds for any file system:
if the size to be copied is large enough, fallocate is either a gain or its cost is negligible compared to copying the data.
Maybe it is safe to assume that if fallocate were really costly, a file system would not implement it and would return EOPNOTSUPP instead.
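
One way to act on both points (report "no space" early, silently skip file systems that do not implement the call) is to classify the errno that fallocate sets. The sketch below is an assumption-laden illustration rather than code from the PR: the enum and the `try_preallocate` helper are invented names, and it presumes Linux with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes fallocate() in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical result type for the sketch. */
typedef enum {
    PREALLOC_OK,          /* space reserved (or nothing to reserve) */
    PREALLOC_UNSUPPORTED, /* file system does not implement fallocate */
    PREALLOC_NO_SPACE,    /* not enough free space for the copy */
    PREALLOC_ERROR        /* any other failure; errno holds the cause */
} prealloc_result;

/* Hypothetical helper: tries to reserve `length` bytes from offset 0. */
static prealloc_result try_preallocate(int fd, off_t length)
{
    if (length == 0)
        return PREALLOC_OK;

    int rc;
    do {
        rc = fallocate(fd, 0, 0, length);
    } while (rc != 0 && errno == EINTR); /* retry if interrupted */

    if (rc == 0)
        return PREALLOC_OK;

    switch (errno) {
    case ENOSPC:
        return PREALLOC_NO_SPACE;    /* caller can fail before copying */
    case EOPNOTSUPP:
    case ENOSYS:
        return PREALLOC_UNSUPPORTED; /* caller copies without reserving */
    default:
        return PREALLOC_ERROR;
    }
}
```

With this split, a caller could surface `PREALLOC_NO_SPACE` as an I/O error before writing any data, and treat `PREALLOC_UNSUPPORTED` as a signal to proceed with the ordinary copy.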

For ext4 with a target file that gets truncated, the gain is significant.
I think the ext4 behavior may be related to the auto_da_alloc option (https://man7.org/linux/man-pages/man5/ext4.5.html).

@GSPP

GSPP commented Dec 30, 2021

Regardless of the performance delta, preallocation can be highly advantageous on NTFS to reduce fragmentation. NTFS can fragment absolutely hideously, like placing each cluster in a separate fragment (without necessity). I had good experiences with preallocating.

@tmds
Member

tmds commented Jan 4, 2022

> preallocation can be highly advantageous on NTFS

Preallocation performance benefits are file system type dependent.

AFAIK, among Linux file systems, improvements have only been observed with ext4.

@adamsitnik adamsitnik self-assigned this Jan 31, 2022
…ationSize

# Conflicts:
#	src/native/libs/System.Native/pal_io.c
@adamsitnik
Member Author

I can't finish this experiment because I have many other, more important things to deliver, so I am going to close the PR.

To make sure this opportunity is not lost I've created a new up-for-grabs issue: #64539

@adamsitnik adamsitnik closed this Jan 31, 2022
@adamsitnik adamsitnik deleted the fileCopyUsePreallocationSize branch January 31, 2022 15:16
@ghost ghost locked as resolved and limited conversation to collaborators Mar 2, 2022