-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FileSystem.Unix: improve CopyFile. #59695
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsLike the upcoming version of GNU coreutils 'cp' prefer a copy-on-write clone. Eliminate a 'stat' call that is always performed for checking if the target is a directory Eliminate a 'stat' call for retrieving the file size of the source by passing through Create the destination with file permissions that match the source. When performing a manual copy, limit the allocated buffer for small files.
|
Hi @tmds Could you please provide some benchmark results for small, medium and big files? The change is non-trivial, it would be good to make sure it's worth the complexity. Thanks! |
For small files, we see an improvement due to eliminating syscalls. For large files, this doesn't weigh in, and we see a huge gain on the filesystem that support CoW (my home partition uses Btrfs which supports it, while
Benchmark: using System;
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet;
using BenchmarkDotNet.Attributes;
namespace FileCopyBenchmark
{
public class Benchmarks
{
private const int kB = 1 << 10;
private const int MB = kB << 10;
[Params(0, 1, 100, 512, 1 * kB, 4 * kB, 1 * MB, 10 * MB)]
public int SourceSize;
[ParamsSource(nameof(ValuesForBaseDir))]
public string BaseDir { get; set; }
public IEnumerable<string> ValuesForBaseDir =>
new[] { Path.GetTempPath(), Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) };
private string SourceFileName;
private string DestinationFileName;
[GlobalSetup]
public void GlobalSetup()
{
DestinationFileName = Path.Combine(BaseDir, Guid.NewGuid().ToString());
SourceFileName = Path.Combine(BaseDir, Guid.NewGuid().ToString());
using FileStream random = File.OpenRead("/dev/random");
using FileStream sourceFile = File.OpenWrite(SourceFileName);
Span<byte> buffer = stackalloc byte[4 * kB];
int length = SourceSize;
while (length > 0)
{
int read = random.Read(buffer.Slice(0, Math.Min(length, buffer.Length)));
sourceFile.Write(buffer.Slice(0, read));
length -= read;
}
}
[GlobalCleanup]
public void GlobalCleanup()
{
File.Delete(SourceFileName);
File.Delete(DestinationFileName);
}
[Benchmark]
public void FileCopy()
{
for (int i = 0; i < 100; i++)
{
File.Delete(DestinationFileName);
File.Copy(SourceFileName, DestinationFileName);
}
}
}
} |
What is the impact on subsequent writes to the file? |
The copies get avoided by the CoW, but then when you write to the file, you are making those copies again. Updating the benchmark to overwrite half of the file after [Benchmark]
public void FileCopy()
{
Span<byte> buffer = stackalloc byte[4 * kB];
for (int i = 0; i < 100; i++)
{
File.Delete(DestinationFileName);
File.Copy(SourceFileName, DestinationFileName);
using SafeFileHandle writeHandle = File.OpenHandle(DestinationFileName, FileMode.Open, FileAccess.Write, FileShare.Write);
for (int offset = SourceSize / 2; offset < SourceSize; )
{
RandomAccess.Write(writeHandle, buffer, offset);
offset += buffer.Length;
}
}
}
|
Like the upcoming version of GNU coreutils 'cp' prefer a copy-on-write clone. This shares the physical storage between files, which means no data needs to copied. CoW-clones are supported by a number of Linux file systems, like Btrfs, XFS, and overlayfs. Eliminate a 'stat' call that is always performed for checking if the target is a directory by only performing the check when the 'open' syscall reports an error. Eliminate a 'stat' call for retrieving the file size of the source by passing through the length that was retrieved when checking the opened file is not a directory. Create the destination with file permissions that match the source. We still need to fchmod due to umask being applied to the open mode. When performing a manual copy, limit the allocated buffer for small files. And, avoid the last 'read' call by checking when we've copied the expected nr of bytes.
I had messed up the PR by resolving conflicts in the GitHub UI. |
I ran the benchmark once more to ensure there are no regressions. I've included some more sizes.
|
src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.Unix.cs
Outdated
Show resolved
Hide resolved
@stephentoub all feedback is addressed and CI is happy. This is good to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small suggestions about coding style, but nothing important.
LGTM
@stephentoub @dotnet/area-system-io can you merge this? |
Thanks as always, @tmds! |
Interop.Sys.Permissions filePermissions; | ||
using SafeFileHandle src = SafeFileHandle.OpenReadOnly(sourceFullPath, FileOptions.None, out fileLength, out filePermissions); | ||
using SafeFileHandle dst = SafeFileHandle.Open(destFullPath, overwrite ? FileMode.Create : FileMode.CreateNew, | ||
FileAccess.ReadWrite, FileShare.None, FileOptions.None, preallocationSize: 0, openPermissions: filePermissions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it would be beneficial to provide preallocationSize: fileLength
here. @tmds what do you think?
Like the upcoming version of GNU coreutils 'cp' prefer a copy-on-write clone.
This shares the physical storage between files, which means no data needs to copied.
CoW-clones are supported by a number of Linux file systems, like Btrfs, XFS, and overlayfs.
Eliminate a 'stat' call that is always performed for checking if the target is a directory
by only performing the check when the 'open' syscall reports an error.
Eliminate a 'stat' call for retrieving the file size of the source by passing through
the length that was retrieved when checking the opened file is not a directory.
Create the destination with file permissions that match the source.
We still need to fchmod due to umask being applied to the open mode.
When performing a manual copy, limit the allocated buffer for small files.
And, avoid the last 'read' call by checking when we've copied the expected nr of bytes.