This project is implemented in two languages, Rust and C#. It was written to compare the performance of the two languages, and to explore their comparative ergonomics.
The two implementations are in `/CSharp` (implemented in .NET 8 C#) and `/Rust` (in Rust 1.74.1, December 2023). Both have almost identical behaviour. See below for benchmarks.
Both were developed on Windows, but should work on Linux and Mac.
Build the C# version with `dotnet publish -c Release` or the supplied `Publish.cmd` (or use Visual Studio 2022; the Community edition is free).
Build the Rust version with `cargo build -r`.
The program walks the first folder tree (given by `-a`) and records all filenames, sizes and optionally hashes. It does the same with the second folder tree (given by `-b`). Files are compared (using `-c`) and differences listed. Comparison can be via:
Comparison | Description |
---|---|
--comparison Name | Filename only (default, fast) |
--comparison NameSize | Filename and file size (fast) |
--comparison Hash | SHA2 hash, disregarding filenames (slow) |
Comparison by name only checks the filename itself, not the path. E.g. `a/b/file.txt` and `d/e/file.txt` will be considered the same file.
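As a rough illustration of name-only comparison, the sketch below reduces each path to its final component before comparing. The function names `name_key` and `missing_from_b` are hypothetical and are not taken from either implementation; the real code may differ, for example in how it treats letter case.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Returns the final path component (the bare filename), ignoring parent folders.
/// Hypothetical helper, not the project's actual code.
fn name_key(path: &str) -> Option<String> {
    Path::new(path)
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
}

/// Lists the paths from `a` whose bare filename does not occur anywhere in `b`.
fn missing_from_b<'a>(a: &[&'a str], b: &[&str]) -> Vec<&'a str> {
    // Collect every filename seen in tree B, regardless of folder.
    let names_in_b: HashSet<String> = b.iter().filter_map(|p| name_key(p)).collect();
    a.iter()
        .copied()
        .filter(|p| match name_key(p) {
            Some(name) => !names_in_b.contains(&name),
            None => false, // paths with no filename component are skipped
        })
        .collect()
}
```

With these helpers, `a/b/file.txt` and `d/e/file.txt` produce the same key, so only files whose bare name is absent from the second tree are reported.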
folder_compare.exe -a <folder> -b <folder> [-c <comparison>] [-r] [-f]
Eg:
folder_compare.exe -a ./target/debug -b ./target/release -c hash
MANDATORY PARAMETERS:
-a, --foldera First folder to compare
-b, --folderb Second folder to compare
OPTIONS:
-c, --comparison [value] Comparison to use (Name, NameSize or Hash). Default is Name
-r, --raw Raw output, for piping
-f, --first-only Only show files in folder A missing from folder B (default is both)
-o, --one-thread Only use one thread (don't scan the two folders in parallel)
-h, --help Help
Hashing uses SHA256 and is obviously much slower than just comparing on name and/or size.
Implementing pluggable comparers (name / name & size / hash) is more difficult in Rust than in C#. C# allows different implementations of `IEqualityComparer<FileData>`.
In Rust you have to use 'unit structs' to mark the different comparisons, and then implement the `Eq`, `PartialEq` and `Hash` traits on `FileData<..marker struct..>` for each comparison technique.
This does mean that `FileData<a>` isn't type compatible with `FileData<b>`, which is an ugly side effect. An implementation of `HashSet` that took lambdas for hashing and comparison would be useful here!
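The marker-struct approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the marker names `ByName` and `ByNameSize`, the fields of `FileData`, and the helper `count_distinct` are all assumptions made for the example.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

// Unit structs used purely as compile-time markers (hypothetical names).
struct ByName;
struct ByNameSize;

// The marker type C selects which Eq/Hash implementation applies.
struct FileData<C> {
    name: String,
    size: u64,
    _comparer: PhantomData<C>,
}

impl<C> FileData<C> {
    fn new(name: &str, size: u64) -> Self {
        FileData { name: name.to_string(), size, _comparer: PhantomData }
    }
}

// Name-only comparison: size is ignored.
impl PartialEq for FileData<ByName> {
    fn eq(&self, other: &Self) -> bool { self.name == other.name }
}
impl Eq for FileData<ByName> {}
impl Hash for FileData<ByName> {
    fn hash<H: Hasher>(&self, state: &mut H) { self.name.hash(state); }
}

// Name-and-size comparison: both fields must match.
impl PartialEq for FileData<ByNameSize> {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.size == other.size
    }
}
impl Eq for FileData<ByNameSize> {}
impl Hash for FileData<ByNameSize> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.size.hash(state);
    }
}

/// Counts distinct files under whichever comparison the marker selects.
fn count_distinct<C>(files: Vec<FileData<C>>) -> usize
where
    FileData<C>: Eq + Hash,
{
    files.into_iter().collect::<HashSet<_>>().len()
}
```

Two entries named `a.txt` with different sizes count as one file under `FileData<ByName>` and as two under `FileData<ByNameSize>`. The two types cannot be mixed in one `HashSet`, which is the incompatibility noted above.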
Benchmarks from Hyperfine, run on a wheezy old laptop. Code from ver 1.0.6 (8a7eb6c2b949615ec77). The C# version is compiled to a native binary, to improve startup speed. All times in milliseconds (lower is better). Test folders have 800-1200 file differences.
Benchmark | Rust single-thread | C# single-thread | Difference | Rust parallel | C# parallel | Difference |
---|---|---|---|---|---|---|
Comparing by name | 61 | 65 | x1.03 (dead-heat) | 49 | 70 | x1.4 |
Second run | 65 | 65 | | 50 | 71 | |
Comparing by hash | 1808 | 1994 | x1.1 | 1253 | 1390 | x1.12 |
Second run | 1801 | 1974 | | 1246 | 1410 | |
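Hashing is obviously more expensive than comparison by filename. For hashing, the parallel code takes around 30% less time than the single-threaded code in both languages (a maximum of 2 threads is used, and only for the folder enumeration and hashing). For comparison by name the gain is smaller in Rust (about 20%) and absent in C#.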
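One way to scan two folders on at most two threads is with scoped threads from the standard library. The sketch below is an assumption about the general shape, not the project's code: `scan_both` and its `scan` callback are hypothetical, and the callback stands in for whatever enumerates and hashes a folder.

```rust
use std::thread;

/// Runs `scan` on both folders. Unless `one_thread` is set, folder A is
/// scanned on a spawned thread while the calling thread scans folder B,
/// so at most two threads are busy. Hypothetical helper for illustration.
fn scan_both<R, F>(folder_a: &str, folder_b: &str, one_thread: bool, scan: F) -> (R, R)
where
    R: Send,
    F: Fn(&str) -> R + Sync,
{
    if one_thread {
        // Sequential path: scan A, then B, on the calling thread.
        return (scan(folder_a), scan(folder_b));
    }
    thread::scope(|s| {
        let handle_a = s.spawn(|| scan(folder_a));
        let result_b = scan(folder_b);
        // Wait for the spawned scan of folder A to finish.
        (handle_a.join().expect("scan thread panicked"), result_b)
    })
}
```

Scoped threads let the callback borrow the folder paths without requiring `'static` lifetimes, and the `one_thread` flag mirrors the `-o` option.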
The C# code performs surprisingly well, taking only about 1.1 times as long as Rust (112% of Rust's run time in the parallel case) for the heavier workload of hashing. This is impressive given Rust's higher cognitive load.
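The "Difference" figures in the table appear to be the ratio of the mean C# time to the mean Rust time over the two runs. That is an inference from the numbers rather than something stated by the benchmark tooling. A small sketch of the arithmetic, with a hypothetical `slowdown` function:

```rust
/// Mean of a slice of timings in milliseconds.
fn mean(times_ms: &[f64]) -> f64 {
    times_ms.iter().sum::<f64>() / times_ms.len() as f64
}

/// Ratio of mean C# time to mean Rust time (values above 1.0 mean C# is slower).
/// Assumes the table's "Difference" column is computed this way.
fn slowdown(rust_ms: &[f64], csharp_ms: &[f64]) -> f64 {
    mean(csharp_ms) / mean(rust_ms)
}
```

For example, `slowdown(&[1253.0, 1246.0], &[1390.0, 1410.0])` is 2800 / 2499, roughly 1.12, which matches the parallel hashing column.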
`Publish.cmd` is provided to simplify publishing. NativeAOT compilation is used, to build a large but comparatively fast native binary. It contains just:
dotnet publish FolderCompare.csproj -r win-x64 -c Release
Tests are written in PowerShell, so they can be used for both implementations.
PS > cd .\Testing
PS > .\TestCSharp.ps1
PS > .\TestRust.ps1