forked from git-for-windows/git
Sparse index: integrate with the sparse-checkout builtin #421
Merged: derrickstolee merged 9 commits into microsoft:vfs-2.33.0 from derrickstolee:sparse-index/sparse-checkout-vfs on Sep 7, 2021
vdye approved these changes on Sep 3, 2021
My comments pretty much amount to documentation change requests, but I did have some questions about the implications of the updated `sparse_index_mode` on other commands.
In order to allow modifying the sparse-checkout cone using a sparse index without expanding to a full one, we need to be able to replace sparse directory entries with their contained files and subdirectories so other code paths can discover those cache entries and write the corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index entries, so we will use that as our base. Create expand_to_pattern_list() which takes a pattern list, but for now mostly ignores it. The current implementation is only correct when the pattern list is NULL as that does the same as ensure_full_index(). In fact, ensure_full_index() is converted to a shim over expand_to_pattern_list().

A future update will actually implement expand_to_pattern_list() to its full capabilities. For now, it is created and documented. We also start using doc-style comments in sparse-index.h.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
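The shim relationship described above can be sketched roughly as follows. This is a toy model rather than Git's code: `struct toy_index`, `struct toy_pattern_list`, their fields, and the `toy_` function names are hypothetical stand-ins chosen for illustration; only the idea (the full expansion is the NULL-pattern-list case of the more general function) comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's 'struct pattern_list' and 'struct index_state'. */
struct toy_pattern_list { int unused; };
struct toy_index {
	int sparse_dirs;  /* sparse directory entries still collapsed */
	int expansions;   /* how many times a full expansion actually ran */
};

/*
 * Toy counterpart of expand_to_pattern_list(): as in the commit message,
 * the pattern list is accepted but ignored for now, so every call that
 * finds sparse directories expands all of them.
 */
void toy_expand_to_pattern_list(struct toy_index *istate,
				struct toy_pattern_list *pl)
{
	(void)pl; /* at this stage only a NULL list gives correct results */
	if (!istate || !istate->sparse_dirs)
		return; /* already full: nothing to do */
	istate->sparse_dirs = 0;
	istate->expansions++;
}

/* Toy counterpart of ensure_full_index(): a thin shim passing NULL. */
void toy_ensure_full_index(struct toy_index *istate)
{
	toy_expand_to_pattern_list(istate, NULL);
}
```

Keeping the old entry point as a one-line wrapper means existing callers do not need to change while the pattern-aware behavior is filled in later.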
When matching against a generic pattern list, the 'basename' is important for some patterns. However, it and the 'dtype' parameter are irrelevant for cone mode sparse-checkout patterns. If we know that we are working with cone mode patterns from the start, then we can speed up the pattern check slightly by not computing the 'basename'.

In many existing consumers, the 'basename' is already known from context, but for some new consumers we compute it on demand. A future change will add more calls that do not have the 'basename' from context and would need to compute it for many cache entries in a tight loop. Avoid this problem by creating the new path_matches_cone_mode_pattern_list() method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
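To illustrate why cone mode needs only the full path and never the 'basename', here is a minimal sketch of a cone-mode membership check. It is not the implementation of path_matches_cone_mode_pattern_list(): the function name `toy_path_in_cone()`, its signature, and the linear scan over a plain array are assumptions made for the example (Git itself looks patterns up in hashed sets). The rules modeled are the cone-mode ones: files at the root are included, files directly inside an ancestor of an included directory are included, and everything below an included directory is included.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: 'recursive_dirs' holds directories that are included
 * recursively, written without a trailing slash (e.g. "A/B").
 * Returns 1 if 'path' is inside the cone, 0 otherwise.
 */
int toy_path_in_cone(const char *path,
		     const char *const *recursive_dirs, size_t nr)
{
	const char *last_slash = strrchr(path, '/');
	size_t parent_len, i;

	if (!last_slash)
		return 1; /* files at the root are always in the cone */
	parent_len = (size_t)(last_slash - path);

	for (i = 0; i < nr; i++) {
		const char *dir = recursive_dirs[i];
		size_t dir_len = strlen(dir);

		/* path is somewhere below a recursively included directory */
		if (!strncmp(path, dir, dir_len) && path[dir_len] == '/')
			return 1;
		/* path sits directly in an ancestor of an included directory */
		if (parent_len < dir_len &&
		    !strncmp(path, dir, parent_len) && dir[parent_len] == '/')
			return 1;
	}
	return 0;
}
```

With the single recursive directory "A/B", the paths "README", "A/file" and "A/B/C/deep.c" are in the cone, while "A/D/file" is not. Every decision is made from directory prefixes alone, which is why skipping the 'basename' computation is safe in this mode.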
derrickstolee force-pushed the sparse-index/sparse-checkout-vfs branch from cc564ae to 5376361 (September 7, 2021 13:47)
A future change will present a temporary, in-memory mode where the index can contain sparse directory entries yet not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.
* COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible.
* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries.

The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in COMPLETELY_SPARSE mode but to actually do the necessary work when in PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
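The three modes and the short-circuit they enable might look roughly like this. The enum names and values are taken from the commit message; `struct toy_index_state`, its `collapse_runs` counter, and `toy_convert_to_sparse()` are hypothetical simplifications, and the real convert_to_sparse() performs further checks that are omitted here.

```c
/* Mode names and values as listed in the commit message. */
enum sparse_index_mode {
	COMPLETELY_FULL = 0,
	COMPLETELY_SPARSE = 1,
	PARTIALLY_SPARSE = 2
};

/* Hypothetical, minimal stand-in for 'struct index_state'. */
struct toy_index_state {
	enum sparse_index_mode sparse_index;
	int collapse_runs; /* counts how often the collapse work actually ran */
};

/*
 * Toy version of the short-circuit: skip the work when the index is
 * already COMPLETELY_SPARSE, but run it for the other two modes.
 * Returns 1 if work was done, 0 if the call short-circuited.
 */
int toy_convert_to_sparse(struct toy_index_state *istate)
{
	if (istate->sparse_index == COMPLETELY_SPARSE)
		return 0;
	istate->collapse_runs++; /* placeholder for the real collapse */
	istate->sparse_index = COMPLETELY_SPARSE;
	return 1;
}
```

A single bit could not distinguish "some sparse directories exist" from "nothing further can be collapsed", which is exactly the distinction the short-circuit relies on.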
Given a 'struct cache_tree', it may be beneficial to navigate directly to a node within that corresponds to a given path name. Create cache_tree_find_path() for this function. It returns NULL when no such path exists.

The implementation is adapted from do_invalidate_path() which does a similar search but also modifies the nodes it finds along the way. This new method is not currently used, but will be in an upcoming change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
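The kind of lookup described above can be sketched on a simplified tree. This is not cache_tree_find_path() itself: `struct toy_tree` and `toy_tree_find_path()` are hypothetical, and Git's 'struct cache_tree' stores its subtrees differently. The sketch only shows the read-only, component-by-component descent that returns NULL when a component is missing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified tree node; not Git's 'struct cache_tree'. */
struct toy_tree {
	const char *name;            /* one path component, e.g. "B" */
	struct toy_tree **children;
	size_t nr_children;
};

/*
 * Walk 'path' one component at a time starting at 'root'. Returns the
 * node for the final component, or NULL when some component is missing.
 * The tree is never modified during the walk.
 */
struct toy_tree *toy_tree_find_path(struct toy_tree *root, const char *path)
{
	struct toy_tree *node = root;

	while (node && *path) {
		const char *slash = strchr(path, '/');
		size_t len = slash ? (size_t)(slash - path) : strlen(path);
		struct toy_tree *next = NULL;
		size_t i;

		if (!len) { /* skip a separator (also tolerates a trailing '/') */
			path++;
			continue;
		}
		for (i = 0; i < node->nr_children; i++) {
			const char *name = node->children[i]->name;
			if (strlen(name) == len && !strncmp(name, path, len)) {
				next = node->children[i];
				break;
			}
		}
		node = next;
		path += len;
	}
	return node;
}
```

An empty path returns the root itself in this sketch; whether the real function behaves the same way for that input is not stated in the commit message.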
When the --no-sparse-index option is supplied, the sparse-checkout builtin should explicitly ask to expand a sparse index to a full one. This is currently done implicitly due to the command_requires_full_index protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The expand_to_pattern_list() method expands sparse directory entries to their list of contained files when either the pattern list is NULL or the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need to be expanded. If there exists a directory 'A/B/D/', then that directory should not be expanded and instead we can create a sparse directory.

To implement this, we plug into the add_path_to_index() callback for the call to read_tree_at(). Since we now need access to both the index we are writing and the pattern list we are comparing, create a 'struct modify_index_context' to use as a data transfer object. It is important that we use the given pattern list since we will use this pattern list to change the sparse-checkout patterns and cannot use istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
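The 'A/B/C/' versus 'A/B/D/' example above boils down to a per-directory decision while walking the tree. A minimal sketch of that decision for a single recursive cone directory follows; `toy_dir_action_for()` and the `toy_dir_action` enum are hypothetical names, and the real code makes this choice inside the add_path_to_index() callback using the pattern list rather than a single string.

```c
#include <stddef.h>
#include <string.h>

enum toy_dir_action { TOY_KEEP_SPARSE = 0, TOY_EXPAND = 1 };

/*
 * Decide what to do with a directory met while expanding, given one
 * recursively matched cone directory. Both arguments end in '/',
 * e.g. "A/B/", so a prefix match is always a whole-component match.
 */
enum toy_dir_action toy_dir_action_for(const char *dir, const char *cone_dir)
{
	size_t dlen = strlen(dir), clen = strlen(cone_dir);

	/* dir is the cone directory or one of its ancestors: open it up */
	if (dlen <= clen && !strncmp(dir, cone_dir, dlen))
		return TOY_EXPAND;
	/* dir lies inside the recursively matched cone directory */
	if (dlen > clen && !strncmp(dir, cone_dir, clen))
		return TOY_EXPAND;
	/* unrelated sibling such as "A/B/D/": leave it as a sparse directory */
	return TOY_KEEP_SPARSE;
}
```

With the cone directory "A/B/C/", the directory "A/B/" is expanded because it is an ancestor, while its child "A/B/D/" stays collapsed as a new sparse directory entry.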
To complete the implementation of expand_to_pattern_list(), we need to detect when a sparse directory entry should remain sparse. This avoids a full expansion, so we now need to use the PARTIALLY_SPARSE mode to indicate this state.

There still are no callers to this method, but we will add one in the next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
When modifying the sparse-checkout definition, the sparse-checkout builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all cache entries in the index. Before, we needed the index to be fully expanded in order to ensure we had the full list of files that match the new patterns.

Insert a call to reset_sparse_directories() that expands sparse directories that are within the new pattern list, but only far enough that every necessary file path now exists as a cache entry. The remaining logic within update_sparsity() will modify the SKIP_WORKTREE bits appropriately.

This allows us to disable command_requires_full_index within the sparse-checkout builtin. Add tests that demonstrate that we are not expanding to a full index unnecessarily.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee force-pushed the sparse-index/sparse-checkout-vfs branch from 5376361 to d86ac33 (September 7, 2021 14:42)
derrickstolee added a commit that referenced this pull request on Sep 13, 2021
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but works for more read-only operations.

Here is a quick demonstration of how this performance test works so we could have a definitive measure of how your previous updates improved performance. To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410
* #421
* #417
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                                  4bcd533          f9255a5                 f28fc01                 b713582
---------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                          0.19(0.15+0.05)  0.19(0.16+0.05) +0.0%   0.20(0.18+0.03) +5.3%   0.19(0.17+0.04) +0.0%
2000.3: git status (full-v4)                          0.20(0.18+0.04)  0.19(0.15+0.06) -5.0%   0.21(0.18+0.05) +5.0%   0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                        0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05) +0.0%   0.04(0.06+0.04) +0.0%
2000.5: git status (sparse-v4)                        0.04(0.03+0.06)  0.04(0.05+0.05) +0.0%   0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                          0.36(0.29+0.05)  0.38(0.28+0.07) +5.6%   0.36(0.31+0.05) +0.0%   0.37(0.31+0.05) +2.8%
2000.7: git add -A (full-v4)                          0.34(0.27+0.06)  0.34(0.29+0.05) +0.0%   0.34(0.29+0.04) +0.0%   0.35(0.28+0.06) +2.9%
2000.8: git add -A (sparse-v3)                        0.06(0.07+0.04)  0.06(0.05+0.06) +0.0%   0.06(0.09+0.01) +0.0%   0.06(0.08+0.03) +0.0%
2000.9: git add -A (sparse-v4)                        0.05(0.05+0.04)  0.05(0.05+0.07) +0.0%   0.05(0.04+0.06) +0.0%   0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                          0.38(0.31+0.05)  0.37(0.29+0.06) -2.6%   0.37(0.30+0.07) -2.6%   0.37(0.29+0.06) -2.6%
2000.11: git add . (full-v4)                          0.35(0.31+0.04)  0.35(0.29+0.07) +0.0%   0.35(0.29+0.05) +0.0%   0.34(0.29+0.06) -2.9%
2000.12: git add . (sparse-v3)                        0.06(0.06+0.05)  0.06(0.05+0.06) +0.0%   0.06(0.07+0.05) +0.0%   0.06(0.09+0.03) +0.0%
2000.13: git add . (sparse-v4)                        0.06(0.06+0.06)  0.06(0.07+0.04) +0.0%   0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                 0.48(0.37+0.08)  0.45(0.36+0.08) -6.2%   0.45(0.35+0.09) -6.2%   0.44(0.36+0.07) -8.3%
2000.15: git commit -a -m A (full-v4)                 0.45(0.40+0.06)  0.43(0.34+0.07) -4.4%   0.45(0.37+0.06) +0.0%   0.42(0.36+0.05) -6.7%
2000.16: git commit -a -m A (sparse-v3)               0.05(0.05+0.06)  0.05(0.05+0.03) +0.0%   0.05(0.06+0.06) +0.0%   0.05(0.04+0.06) +0.0%
2000.17: git commit -a -m A (sparse-v4)               0.05(0.06+0.03)  0.05(0.06+0.04) +0.0%   0.06(0.07+0.05) +20.0%  0.05(0.04+0.06) +0.0%
2000.18: git checkout -f - (full-v3)                  0.55(0.43+0.08)  0.54(0.46+0.05) -1.8%   0.55(0.46+0.07) +0.0%   0.54(0.40+0.10) -1.8%
2000.19: git checkout -f - (full-v4)                  0.55(0.41+0.09)  0.50(0.40+0.09) -9.1%   0.51(0.46+0.05) -7.3%   0.51(0.44+0.06) -7.3%
2000.20: git checkout -f - (sparse-v3)                0.06(0.09+0.03)  0.06(0.08+0.03) +0.0%   0.06(0.06+0.05) +0.0%   0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03) +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                          0.41(0.32+0.06)  0.40(0.31+0.06) -2.4%   0.41(0.33+0.05) +0.0%   0.42(0.34+0.04) +2.4%
2000.23: git reset (full-v4)                          0.37(0.32+0.05)  0.35(0.30+0.05) -5.4%   0.37(0.30+0.05) +0.0%   0.35(0.31+0.03) -5.4%
2000.24: git reset (sparse-v3)                        0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                        0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                   0.54(0.43+0.07)  0.53(0.43+0.06) -1.9%   0.55(0.46+0.05) +1.9%   0.55(0.44+0.06) +1.9%
2000.27: git reset --hard (full-v4)                   0.50(0.45+0.03)  0.50(0.43+0.05) +0.0%   0.49(0.41+0.06) -2.0%   0.50(0.42+0.05) +0.0%
2000.28: git reset --hard (sparse-v3)                 0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                 0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.01+0.01) +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.03+0.00) +0.0%   0.03(0.02+0.01) +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                           0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03) +0.0%   0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                           0.06(0.05+0.03)  0.06(0.05+0.02) +0.0%   0.06(0.05+0.02) +0.0%   0.06(0.06+0.02) +0.0%
2000.36: git diff (sparse-v3)                         0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                         0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                  0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.02+0.00) +0.0%
2000.39: git diff --staged (full-v4)                  0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)        0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03) -9.5%   0.59(0.48+0.05) -6.3%
2000.43: git sparse-checkout reapply (full-v4)        0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)      0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)      0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
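For readers checking the table by hand, the percentage next to each timing appears to be the relative change of the elapsed time against the first column (4bcd533). The helper below is a sketch under that assumption; `toy_percent_change()` is a hypothetical name and is not part of the perf suite.

```c
/* Relative change of a timing against the baseline column, in percent. */
double toy_percent_change(double baseline, double measured)
{
	return 100.0 * (measured - baseline) / baseline;
}
```

For example, `git reset (sparse-v3)` drops from 0.68 to 0.04 seconds, and 100 * (0.04 - 0.68) / 0.68 rounds to the -94.1% shown in the table.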
dscho pushed a commit that referenced this pull request on Oct 30, 2021
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, and a big reason is the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
dscho pushed a commit that referenced this pull request on Oct 30, 2021
derrickstolee added a commit that referenced this pull request on Oct 30, 2021
derrickstolee added a commit that referenced this pull request on Oct 30, 2021
derrickstolee
added a commit
that referenced
this pull request
Oct 31, 2021
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries. This method does not contract existing files to sparse directories, largely because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, a no-op `git sparse-checkout set` (i.e. setting the sparse-checkout cone to only include files at root, and doing this repeatedly) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
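The idea of expanding "only as far as needed" can be illustrated with a toy model. The sketch below is Python, not Git's C implementation, and everything in it is an assumption made for illustration: the index is a list of paths where a trailing `/` marks a sparse directory entry, `tree` is a hypothetical lookup from a directory to its immediate children, and `cone_dirs` stands in for the recursive directories of a cone-mode pattern list. It does not model `SKIP_WORKTREE` bits or the `.gitignore` checks described above.

```python
def expand_to_cone(index, tree, cone_dirs):
    """Toy partial expansion of sparse directory entries.

    index:     paths in the index; sparse directories end with "/".
    tree:      dict mapping "dir/" to its immediate children
               (child directories also end with "/").
    cone_dirs: set of directories (with trailing "/") in the new cone.
    """

    def in_cone(directory):
        # The directory is one of the cone directories or lies beneath one.
        return any(directory.startswith(c) for c in cone_dirs)

    def leads_to_cone(directory):
        # The directory is a parent of some cone directory.
        return any(c.startswith(directory) for c in cone_dirs)

    def expand(directory):
        entries = []
        for child in tree[directory]:
            path = directory + child
            if child.endswith("/") and (in_cone(path) or leads_to_cone(path)):
                entries.extend(expand(path))   # keep descending
            else:
                # A file, or a directory outside the cone that stays
                # collapsed as a (possibly new) sparse directory entry.
                entries.append(path)
        return entries

    result = []
    for entry in index:
        if entry.endswith("/") and (in_cone(entry) or leads_to_cone(entry)):
            result.extend(expand(entry))
        else:
            result.append(entry)
    return sorted(result)


if __name__ == "__main__":
    tree = {
        "A/": ["B/", "C/", "a.txt"],
        "A/B/": ["b.txt", "D/"],
        "A/B/D/": ["d.txt"],
        "A/C/": ["c.txt"],
        "E/": ["e.txt"],
    }
    # Growing the cone to include A/B/ expands A/ and A/B/, while A/C/ and E/
    # remain sparse directory entries.
    print(expand_to_cone(["A/", "E/", "root.txt"], tree, {"A/B/"}))
```

Note that the model only ever expands; like the method described above, it never collapses existing file entries back into sparse directories.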
derrickstolee
added a commit
that referenced
this pull request
Oct 31, 2021
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410
* #421
* #417
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                     4bcd533          f9255a5                 f28fc01                 b713582
--------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)             0.19(0.15+0.05)  0.19(0.16+0.05) +0.0%   0.20(0.18+0.03) +5.3%   0.19(0.17+0.04) +0.0%
2000.3: git status (full-v4)             0.20(0.18+0.04)  0.19(0.15+0.06) -5.0%   0.21(0.18+0.05) +5.0%   0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)           0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05) +0.0%   0.04(0.06+0.04) +0.0%
2000.5: git status (sparse-v4)           0.04(0.03+0.06)  0.04(0.05+0.05) +0.0%   0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)             0.36(0.29+0.05)  0.38(0.28+0.07) +5.6%   0.36(0.31+0.05) +0.0%   0.37(0.31+0.05) +2.8%
2000.7: git add -A (full-v4)             0.34(0.27+0.06)  0.34(0.29+0.05) +0.0%   0.34(0.29+0.04) +0.0%   0.35(0.28+0.06) +2.9%
2000.8: git add -A (sparse-v3)           0.06(0.07+0.04)  0.06(0.05+0.06) +0.0%   0.06(0.09+0.01) +0.0%   0.06(0.08+0.03) +0.0%
2000.9: git add -A (sparse-v4)           0.05(0.05+0.04)  0.05(0.05+0.07) +0.0%   0.05(0.04+0.06) +0.0%   0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)             0.38(0.31+0.05)  0.37(0.29+0.06) -2.6%   0.37(0.30+0.07) -2.6%   0.37(0.29+0.06) -2.6%
2000.11: git add . (full-v4)             0.35(0.31+0.04)  0.35(0.29+0.07) +0.0%   0.35(0.29+0.05) +0.0%   0.34(0.29+0.06) -2.9%
2000.12: git add . (sparse-v3)           0.06(0.06+0.05)  0.06(0.05+0.06) +0.0%   0.06(0.07+0.05) +0.0%   0.06(0.09+0.03) +0.0%
2000.13: git add . (sparse-v4)           0.06(0.06+0.06)  0.06(0.07+0.04) +0.0%   0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)    0.48(0.37+0.08)  0.45(0.36+0.08) -6.2%   0.45(0.35+0.09) -6.2%   0.44(0.36+0.07) -8.3%
2000.15: git commit -a -m A (full-v4)    0.45(0.40+0.06)  0.43(0.34+0.07) -4.4%   0.45(0.37+0.06) +0.0%   0.42(0.36+0.05) -6.7%
2000.16: git commit -a -m A (sparse-v3)  0.05(0.05+0.06)  0.05(0.05+0.03) +0.0%   0.05(0.06+0.06) +0.0%   0.05(0.04+0.06) +0.0%
2000.17: git commit -a -m A (sparse-v4)  0.05(0.06+0.03)  0.05(0.06+0.04) +0.0%   0.06(0.07+0.05) +20.0%  0.05(0.04+0.06) +0.0%
2000.18: git checkout -f - (full-v3)     0.55(0.43+0.08)  0.54(0.46+0.05) -1.8%   0.55(0.46+0.07) +0.0%   0.54(0.40+0.10) -1.8%
2000.19: git checkout -f - (full-v4)     0.55(0.41+0.09)  0.50(0.40+0.09) -9.1%   0.51(0.46+0.05) -7.3%   0.51(0.44+0.06) -7.3%
2000.20: git checkout -f - (sparse-v3)   0.06(0.09+0.03)  0.06(0.08+0.03) +0.0%   0.06(0.06+0.05) +0.0%   0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)   0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03) +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)           0.41(0.32+0.06)  0.40(0.31+0.06) -2.4%   0.41(0.33+0.05) +0.0%   0.42(0.34+0.04) +2.4%
2000.23: git reset (full-v4)           0.37(0.32+0.05)  0.35(0.30+0.05) -5.4%   0.37(0.30+0.05) +0.0%   0.35(0.31+0.03) -5.4%
2000.24: git reset (sparse-v3)         0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)         0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)    0.54(0.43+0.07)  0.53(0.43+0.06) -1.9%   0.55(0.46+0.05) +1.9%   0.55(0.44+0.06) +1.9%
2000.27: git reset --hard (full-v4)    0.50(0.45+0.03)  0.50(0.43+0.05) +0.0%   0.49(0.41+0.06) -2.0%   0.50(0.42+0.05) +0.0%
2000.28: git reset --hard (sparse-v3)  0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)  0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.01+0.01) +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.03+0.00) +0.0%   0.03(0.02+0.01) +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)             0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03) +0.0%   0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)             0.06(0.05+0.03)  0.06(0.05+0.02) +0.0%   0.06(0.05+0.02) +0.0%   0.06(0.06+0.02) +0.0%
2000.36: git diff (sparse-v3)           0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)           0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.02+0.00) +0.0%
2000.39: git diff --staged (full-v4)    0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)  0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)  0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)    0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03) -9.5%   0.59(0.48+0.05) -6.3%
2000.43: git sparse-checkout reapply (full-v4)    0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)  0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)  0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
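The percentage shown next to each later timing is consistent with the relative change in elapsed time against the first commit column (`4bcd533`); for `git reset (sparse-v3)`, going from 0.68 to 0.04 gives -94.1%. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and the sketch infers the formula from the numbers in the table rather than from the perf tooling's source.

```python
def percent_change(baseline, new):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def format_change(baseline, new):
    """Render the change with an explicit sign and one decimal place."""
    return f"{percent_change(baseline, new):+.1f}%"


if __name__ == "__main__":
    # Elapsed times for "git reset (sparse-v3)" across the four commits.
    baseline, *later = [0.68, 0.55, 0.04, 0.04]
    for elapsed in later:
        print(elapsed, format_change(baseline, elapsed))
```

Because every column is compared with the same baseline, the last two columns for this row show the same -94.1% even though the second column had already dropped to 0.55.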
derrickstolee
added a commit
that referenced
this pull request
Nov 4, 2021
derrickstolee
added a commit
that referenced
this pull request
Nov 4, 2021
ldennington
pushed a commit
to ldennington/git
that referenced
this pull request
Jan 25, 2022
ldennington
pushed a commit
to ldennington/git
that referenced
this pull request
Jan 25, 2022
…tests

One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test isn't helpful for commands like `git merge` that need a particular set of inputs, but it works well for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short SHAs correspond to the merge commits for these PRs:

* microsoft#410
* microsoft#421
* microsoft#417
* microsoft#419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                                  4bcd533          f9255a5                 f28fc01                 b713582
---------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                          0.19(0.15+0.05)  0.19(0.16+0.05) +0.0%   0.20(0.18+0.03) +5.3%   0.19(0.17+0.04) +0.0%
2000.3: git status (full-v4)                          0.20(0.18+0.04)  0.19(0.15+0.06) -5.0%   0.21(0.18+0.05) +5.0%   0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                        0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05) +0.0%   0.04(0.06+0.04) +0.0%
2000.5: git status (sparse-v4)                        0.04(0.03+0.06)  0.04(0.05+0.05) +0.0%   0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                          0.36(0.29+0.05)  0.38(0.28+0.07) +5.6%   0.36(0.31+0.05) +0.0%   0.37(0.31+0.05) +2.8%
2000.7: git add -A (full-v4)                          0.34(0.27+0.06)  0.34(0.29+0.05) +0.0%   0.34(0.29+0.04) +0.0%   0.35(0.28+0.06) +2.9%
2000.8: git add -A (sparse-v3)                        0.06(0.07+0.04)  0.06(0.05+0.06) +0.0%   0.06(0.09+0.01) +0.0%   0.06(0.08+0.03) +0.0%
2000.9: git add -A (sparse-v4)                        0.05(0.05+0.04)  0.05(0.05+0.07) +0.0%   0.05(0.04+0.06) +0.0%   0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                          0.38(0.31+0.05)  0.37(0.29+0.06) -2.6%   0.37(0.30+0.07) -2.6%   0.37(0.29+0.06) -2.6%
2000.11: git add . (full-v4)                          0.35(0.31+0.04)  0.35(0.29+0.07) +0.0%   0.35(0.29+0.05) +0.0%   0.34(0.29+0.06) -2.9%
2000.12: git add . (sparse-v3)                        0.06(0.06+0.05)  0.06(0.05+0.06) +0.0%   0.06(0.07+0.05) +0.0%   0.06(0.09+0.03) +0.0%
2000.13: git add . (sparse-v4)                        0.06(0.06+0.06)  0.06(0.07+0.04) +0.0%   0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                 0.48(0.37+0.08)  0.45(0.36+0.08) -6.2%   0.45(0.35+0.09) -6.2%   0.44(0.36+0.07) -8.3%
2000.15: git commit -a -m A (full-v4)                 0.45(0.40+0.06)  0.43(0.34+0.07) -4.4%   0.45(0.37+0.06) +0.0%   0.42(0.36+0.05) -6.7%
2000.16: git commit -a -m A (sparse-v3)               0.05(0.05+0.06)  0.05(0.05+0.03) +0.0%   0.05(0.06+0.06) +0.0%   0.05(0.04+0.06) +0.0%
2000.17: git commit -a -m A (sparse-v4)               0.05(0.06+0.03)  0.05(0.06+0.04) +0.0%   0.06(0.07+0.05) +20.0%  0.05(0.04+0.06) +0.0%
2000.18: git checkout -f - (full-v3)                  0.55(0.43+0.08)  0.54(0.46+0.05) -1.8%   0.55(0.46+0.07) +0.0%   0.54(0.40+0.10) -1.8%
2000.19: git checkout -f - (full-v4)                  0.55(0.41+0.09)  0.50(0.40+0.09) -9.1%   0.51(0.46+0.05) -7.3%   0.51(0.44+0.06) -7.3%
2000.20: git checkout -f - (sparse-v3)                0.06(0.09+0.03)  0.06(0.08+0.03) +0.0%   0.06(0.06+0.05) +0.0%   0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03) +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                          0.41(0.32+0.06)  0.40(0.31+0.06) -2.4%   0.41(0.33+0.05) +0.0%   0.42(0.34+0.04) +2.4%
2000.23: git reset (full-v4)                          0.37(0.32+0.05)  0.35(0.30+0.05) -5.4%   0.37(0.30+0.05) +0.0%   0.35(0.31+0.03) -5.4%
2000.24: git reset (sparse-v3)                        0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                        0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                   0.54(0.43+0.07)  0.53(0.43+0.06) -1.9%   0.55(0.46+0.05) +1.9%   0.55(0.44+0.06) +1.9%
2000.27: git reset --hard (full-v4)                   0.50(0.45+0.03)  0.50(0.43+0.05) +0.0%   0.49(0.41+0.06) -2.0%   0.50(0.42+0.05) +0.0%
2000.28: git reset --hard (sparse-v3)                 0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                 0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant in the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.01+0.01) +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.03+0.00) +0.0%   0.03(0.02+0.01) +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because microsoft#423 is not merged.

```
2000.34: git diff (full-v3)                           0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03) +0.0%   0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                           0.06(0.05+0.03)  0.06(0.05+0.02) +0.0%   0.06(0.05+0.02) +0.0%   0.06(0.06+0.02) +0.0%
2000.36: git diff (sparse-v3)                         0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                         0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                  0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.02+0.00) +0.0%
2000.39: git diff --staged (full-v4)                  0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)        0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03) -9.5%   0.59(0.48+0.05) -6.3%
2000.43: git sparse-checkout reapply (full-v4)        0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)      0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)      0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrate the performance change by copying the necessary lines from the output table into your commit message.
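For anyone reading the table for the first time: each cell appears to be `elapsed(user+system)` in seconds, and each percentage is the change in elapsed time relative to the first column. Here is a minimal Python sketch under that assumption. The helper names are hypothetical and not part of `t/perf`. It recomputes the percentages for test 2000.24 from the raw timings:

```python
import re

# Assumed cell layout: "0.68(0.65+0.05)" means elapsed(user+system) seconds.
CELL = re.compile(r"(\d+\.\d+)\((\d+\.\d+)\+(\d+\.\d+)\)")


def elapsed_times(row: str) -> list[float]:
    """Return the elapsed seconds of every timing cell in a table row."""
    return [float(m.group(1)) for m in CELL.finditer(row)]


def relative_changes(row: str) -> list[float]:
    """Percent change of each later column relative to the first column."""
    base, *rest = elapsed_times(row)
    return [round((t - base) / base * 100, 1) for t in rest]


row = (
    "2000.24: git reset (sparse-v3) 0.68(0.65+0.05) 0.55(0.52+0.04) -19.1% "
    "0.04(0.05+0.04) -94.1% 0.04(0.05+0.04) -94.1%"
)
print(relative_changes(row))  # [-19.1, -94.1, -94.1]
```

The recomputed values match the percentages printed in the table, which is a quick sanity check when copying rows into a commit message.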
dscho pushed a commit that referenced this pull request Feb 1, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing the patterns to the sparse-checkout file.

The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, and a big reason is the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, a no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
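To illustrate what "expands only as far as needed" could look like, here is a toy Python model of partial expansion in cone mode. It is only a sketch under stated assumptions, not Git's C implementation: the index is a plain list of paths, an entry ending in `/` stands for a sparse directory, `tree` is a hypothetical lookup from a directory to its children, and `cone_dirs` is a hypothetical list of recursively included directories.

```python
# Toy model only. None of these names exist in Git itself.
# Entries ending in "/" represent sparse directory entries.


def needs_expansion(path, cone_dirs):
    """A sparse directory must be expanded if it lies inside a cone
    directory, or if it is a parent of one."""
    return any(path.startswith(d) or d.startswith(path) for d in cone_dirs)


def expand_entry(entry, tree, cone_dirs):
    """Expand one index entry as far as the cone requires."""
    if not entry.endswith("/"):
        return [entry]  # a regular file entry is kept as-is
    if not needs_expansion(entry, cone_dirs):
        return [entry]  # outside the cone: stays a sparse directory
    expanded = []
    for child in tree[entry]:
        # Children outside the cone come back as new sparse directories.
        expanded.extend(expand_entry(entry + child, tree, cone_dirs))
    return expanded


def expand_index(index, tree, cone_dirs):
    """Apply expand_entry to every entry, preserving index order."""
    result = []
    for entry in index:
        result.extend(expand_entry(entry, tree, cone_dirs))
    return result


tree = {
    "A/": ["B/", "C/", "a.txt"],
    "A/B/": ["b.txt"],
    "A/C/": ["c.txt"],
}
index = ["A/", "README"]
print(expand_index(index, tree, ["A/B/"]))
# ['A/B/b.txt', 'A/C/', 'A/a.txt', 'README']
```

In this model, adding `A/B/` to the cone expands `A/` and `A/B/`, while `A/C/` survives as a new sparse directory entry. With an empty cone the index is returned unchanged, which corresponds to the no-op case benchmarked above.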
dscho
pushed a commit
that referenced
this pull request
Jun 17, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, and a big reason why is because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
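To make the "expands only as far as needed" behavior concrete, here is a minimal toy model in Python. It is not Git's implementation: the nested-dict tree, the `toy_expand` and `expand_entry` names, and the simplified cone rule (a directory is expanded if it is inside a cone directory or is an ancestor of one) are all assumptions made for illustration. Entries ending in `/` stand in for sparse directory entries.

```python
# Toy model only: a nested dict stands in for the tree at HEAD
# (None marks a file, a dict marks a directory).
TREE = {
    "README": None,
    "A": {
        "a.txt": None,
        "B": {"b1.txt": None, "b2.txt": None},
        "C": {"c.txt": None},
    },
    "D": {"d.txt": None},
}


def lookup(tree, dir_path):
    """Return the subtree for a directory path such as 'A/B/'."""
    node = tree
    for part in dir_path.strip("/").split("/"):
        node = node[part]
    return node


def in_cone(dir_path, cone):
    """True if the directory is one of the cone directories or below one."""
    d = dir_path.rstrip("/")
    return any(d == c or d.startswith(c + "/") for c in cone)


def is_ancestor(dir_path, cone):
    """True if the directory is a strict parent of some cone directory."""
    return any(c.startswith(dir_path) for c in cone)


def expand_entry(entry, tree, cone):
    """Expand one index entry; sparse directories outside the cone stay collapsed."""
    if not entry.endswith("/"):
        return [entry]  # a regular file entry
    if not in_cone(entry, cone) and not is_ancestor(entry, cone):
        return [entry]  # remains a sparse directory entry
    result = []
    for name, child in sorted(lookup(tree, entry).items()):
        if child is None:
            result.append(entry + name)
        else:
            # May yield a new, deeper sparse directory entry.
            result.extend(expand_entry(entry + name + "/", tree, cone))
    return result


def toy_expand(index, tree, cone):
    """Hypothetical analogue of a partial expansion toward a new cone."""
    expanded = []
    for entry in index:
        expanded.extend(expand_entry(entry, tree, cone))
    return expanded


if __name__ == "__main__":
    sparse_index = ["A/", "D/", "README"]
    for entry in toy_expand(sparse_index, TREE, {"A/B"}):
        print(entry)
```

In this sketch, adding `A/B` to the cone expands `A/` into its files and the contents of `A/B/`, creates a new sparse entry `A/C/`, and leaves `D/` collapsed. The model never contracts file entries back into sparse directories, which mirrors the one-directional behavior described above.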
dscho pushed a commit that referenced this pull request on Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works so we have a definitive measure of how your previous updates improved performance. To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410
* #421
* #417
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                                  4bcd533          f9255a5                 f28fc01                 b713582
---------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                          0.19(0.15+0.05)  0.19(0.16+0.05)  +0.0%  0.20(0.18+0.03)  +5.3%  0.19(0.17+0.04)  +0.0%
2000.3: git status (full-v4)                          0.20(0.18+0.04)  0.19(0.15+0.06)  -5.0%  0.21(0.18+0.05)  +5.0%  0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                        0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05)  +0.0%  0.04(0.06+0.04)  +0.0%
2000.5: git status (sparse-v4)                        0.04(0.03+0.06)  0.04(0.05+0.05)  +0.0%  0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                          0.36(0.29+0.05)  0.38(0.28+0.07)  +5.6%  0.36(0.31+0.05)  +0.0%  0.37(0.31+0.05)  +2.8%
2000.7: git add -A (full-v4)                          0.34(0.27+0.06)  0.34(0.29+0.05)  +0.0%  0.34(0.29+0.04)  +0.0%  0.35(0.28+0.06)  +2.9%
2000.8: git add -A (sparse-v3)                        0.06(0.07+0.04)  0.06(0.05+0.06)  +0.0%  0.06(0.09+0.01)  +0.0%  0.06(0.08+0.03)  +0.0%
2000.9: git add -A (sparse-v4)                        0.05(0.05+0.04)  0.05(0.05+0.07)  +0.0%  0.05(0.04+0.06)  +0.0%  0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                          0.38(0.31+0.05)  0.37(0.29+0.06)  -2.6%  0.37(0.30+0.07)  -2.6%  0.37(0.29+0.06)  -2.6%
2000.11: git add . (full-v4)                          0.35(0.31+0.04)  0.35(0.29+0.07)  +0.0%  0.35(0.29+0.05)  +0.0%  0.34(0.29+0.06)  -2.9%
2000.12: git add . (sparse-v3)                        0.06(0.06+0.05)  0.06(0.05+0.06)  +0.0%  0.06(0.07+0.05)  +0.0%  0.06(0.09+0.03)  +0.0%
2000.13: git add . (sparse-v4)                        0.06(0.06+0.06)  0.06(0.07+0.04)  +0.0%  0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                 0.48(0.37+0.08)  0.45(0.36+0.08)  -6.2%  0.45(0.35+0.09)  -6.2%  0.44(0.36+0.07)  -8.3%
2000.15: git commit -a -m A (full-v4)                 0.45(0.40+0.06)  0.43(0.34+0.07)  -4.4%  0.45(0.37+0.06)  +0.0%  0.42(0.36+0.05)  -6.7%
2000.16: git commit -a -m A (sparse-v3)               0.05(0.05+0.06)  0.05(0.05+0.03)  +0.0%  0.05(0.06+0.06)  +0.0%  0.05(0.04+0.06)  +0.0%
2000.17: git commit -a -m A (sparse-v4)               0.05(0.06+0.03)  0.05(0.06+0.04)  +0.0%  0.06(0.07+0.05) +20.0%  0.05(0.04+0.06)  +0.0%
2000.18: git checkout -f - (full-v3)                  0.55(0.43+0.08)  0.54(0.46+0.05)  -1.8%  0.55(0.46+0.07)  +0.0%  0.54(0.40+0.10)  -1.8%
2000.19: git checkout -f - (full-v4)                  0.55(0.41+0.09)  0.50(0.40+0.09)  -9.1%  0.51(0.46+0.05)  -7.3%  0.51(0.44+0.06)  -7.3%
2000.20: git checkout -f - (sparse-v3)                0.06(0.09+0.03)  0.06(0.08+0.03)  +0.0%  0.06(0.06+0.05)  +0.0%  0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03)  +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                          0.41(0.32+0.06)  0.40(0.31+0.06)  -2.4%  0.41(0.33+0.05)  +0.0%  0.42(0.34+0.04)  +2.4%
2000.23: git reset (full-v4)                          0.37(0.32+0.05)  0.35(0.30+0.05)  -5.4%  0.37(0.30+0.05)  +0.0%  0.35(0.31+0.03)  -5.4%
2000.24: git reset (sparse-v3)                        0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                        0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                   0.54(0.43+0.07)  0.53(0.43+0.06)  -1.9%  0.55(0.46+0.05)  +1.9%  0.55(0.44+0.06)  +1.9%
2000.27: git reset --hard (full-v4)                   0.50(0.45+0.03)  0.50(0.43+0.05)  +0.0%  0.49(0.41+0.06)  -2.0%  0.50(0.42+0.05)  +0.0%
2000.28: git reset --hard (sparse-v3)                 0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                 0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01)  +0.0%  0.03(0.02+0.01)  +0.0%  0.03(0.01+0.01)  +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01)  +0.0%  0.03(0.03+0.00)  +0.0%  0.03(0.02+0.01)  +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                           0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03)  +0.0%  0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                           0.06(0.05+0.03)  0.06(0.05+0.02)  +0.0%  0.06(0.05+0.02)  +0.0%  0.06(0.06+0.02)  +0.0%
2000.36: git diff (sparse-v3)                         0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                         0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                  0.03(0.01+0.01)  0.03(0.02+0.01)  +0.0%  0.03(0.02+0.01)  +0.0%  0.03(0.02+0.00)  +0.0%
2000.39: git diff --staged (full-v4)                  0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)        0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03)  -9.5%  0.59(0.48+0.05)  -6.3%
2000.43: git sparse-checkout reapply (full-v4)        0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)      0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)      0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
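When copying rows into a commit message, it can help to sanity-check the percentage columns. The figures in the table appear consistent with a relative change of each later revision's elapsed time against the first column (`4bcd533`). The following Python sketch recomputes a few of them; `percent_change` is a hypothetical helper written for this illustration and is not part of Git's perf tooling.

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` against `baseline`, in percent (one decimal)."""
    return round((new - baseline) / baseline * 100, 1)


# Elapsed times in seconds, copied from the table: (first column, later column).
samples = {
    "2000.24 git reset (sparse-v3), second column": (0.68, 0.55),
    "2000.24 git reset (sparse-v3), third column": (0.68, 0.04),
    "2000.36 git diff (sparse-v3), fourth column": (0.25, 0.01),
    "2000.44 git sparse-checkout reapply (sparse-v3), second column": (0.91, 0.05),
}

if __name__ == "__main__":
    for label, (baseline, new) in samples.items():
        print(f"{label}: {percent_change(baseline, new):+.1f}%")
```

Because the elapsed times are printed with only two decimals, small absolute differences (for example 0.04 s versus 0.05 s) show up as large percentages such as +25.0%, so rows with times near the measurement resolution are best read as noise.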
dscho
pushed a commit
that referenced
this pull request
Jun 17, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file.

The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, and a big reason why is because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
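The partial expansion described in this commit message can be illustrated with a toy model. The sketch below is not Git's C implementation: the `Tree` mapping, `overlaps_cone()` and `expand_to_patterns_sketch()` are hypothetical names invented for illustration, and cone-mode patterns are reduced to a plain set of directory prefixes. It only shows the idea of replacing a sparse directory entry with its children when that directory overlaps the new cone, while leaving unrelated sparse directories collapsed.

```python
from typing import Dict, List, Set

# Toy stand-in for tree objects (an assumption, not Git's data model):
# directory path (with trailing "/") mapped to its immediate children.
# Child names ending in "/" are subdirectories; anything else is a file.
Tree = Dict[str, List[str]]


def overlaps_cone(dir_path: str, cone_dirs: Set[str]) -> bool:
    """True if dir_path is inside a cone directory or is an ancestor of one."""
    return any(
        dir_path.startswith(cone) or cone.startswith(dir_path)
        for cone in cone_dirs
    )


def expand_to_patterns_sketch(
    index: List[str], tree: Tree, cone_dirs: Set[str]
) -> List[str]:
    """Replace sparse directory entries with their children, only where needed."""
    result: List[str] = []
    pending = list(reversed(index))  # used as a stack; pop() yields entries in order
    while pending:
        entry = pending.pop()
        if entry.endswith("/") and overlaps_cone(entry, cone_dirs):
            # Expand one level; children are re-checked on later iterations.
            children = [entry + name for name in tree[entry]]
            pending.extend(reversed(children))
        else:
            # Files, and sparse directories outside the cone, stay as they are.
            result.append(entry)
    return result


tree: Tree = {
    "A/": ["B/", "X/", "a.txt"],
    "A/B/": ["C/", "b.txt"],
    "A/B/C/": ["c.txt"],
    "A/X/": ["x.txt"],
    "D/": ["d.txt"],
}
sparse_index = ["A/", "D/", "README"]  # only root-level files are expanded
expanded = expand_to_patterns_sketch(sparse_index, tree, {"A/B/"})
print(expanded)
# ['A/B/C/c.txt', 'A/B/b.txt', 'A/X/', 'A/a.txt', 'D/', 'README']
```

In this toy run, `A/` is expanded because it contains the new cone directory `A/B/`, which produces a new sparse directory entry `A/X/`, while `D/` is never touched.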
dscho
pushed a commit
that referenced
this pull request
Jun 17, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of input, but works for more read-only operations.

Here is a quick demonstration of how this performance test works so we could have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410
* #421
* #417
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                                  4bcd533          f9255a5                 f28fc01                 b713582
---------------------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)                          0.19(0.15+0.05)  0.19(0.16+0.05) +0.0%   0.20(0.18+0.03) +5.3%   0.19(0.17+0.04) +0.0%
2000.3: git status (full-v4)                          0.20(0.18+0.04)  0.19(0.15+0.06) -5.0%   0.21(0.18+0.05) +5.0%   0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)                        0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05) +0.0%   0.04(0.06+0.04) +0.0%
2000.5: git status (sparse-v4)                        0.04(0.03+0.06)  0.04(0.05+0.05) +0.0%   0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)                          0.36(0.29+0.05)  0.38(0.28+0.07) +5.6%   0.36(0.31+0.05) +0.0%   0.37(0.31+0.05) +2.8%
2000.7: git add -A (full-v4)                          0.34(0.27+0.06)  0.34(0.29+0.05) +0.0%   0.34(0.29+0.04) +0.0%   0.35(0.28+0.06) +2.9%
2000.8: git add -A (sparse-v3)                        0.06(0.07+0.04)  0.06(0.05+0.06) +0.0%   0.06(0.09+0.01) +0.0%   0.06(0.08+0.03) +0.0%
2000.9: git add -A (sparse-v4)                        0.05(0.05+0.04)  0.05(0.05+0.07) +0.0%   0.05(0.04+0.06) +0.0%   0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)                          0.38(0.31+0.05)  0.37(0.29+0.06) -2.6%   0.37(0.30+0.07) -2.6%   0.37(0.29+0.06) -2.6%
2000.11: git add . (full-v4)                          0.35(0.31+0.04)  0.35(0.29+0.07) +0.0%   0.35(0.29+0.05) +0.0%   0.34(0.29+0.06) -2.9%
2000.12: git add . (sparse-v3)                        0.06(0.06+0.05)  0.06(0.05+0.06) +0.0%   0.06(0.07+0.05) +0.0%   0.06(0.09+0.03) +0.0%
2000.13: git add . (sparse-v4)                        0.06(0.06+0.06)  0.06(0.07+0.04) +0.0%   0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)                 0.48(0.37+0.08)  0.45(0.36+0.08) -6.2%   0.45(0.35+0.09) -6.2%   0.44(0.36+0.07) -8.3%
2000.15: git commit -a -m A (full-v4)                 0.45(0.40+0.06)  0.43(0.34+0.07) -4.4%   0.45(0.37+0.06) +0.0%   0.42(0.36+0.05) -6.7%
2000.16: git commit -a -m A (sparse-v3)               0.05(0.05+0.06)  0.05(0.05+0.03) +0.0%   0.05(0.06+0.06) +0.0%   0.05(0.04+0.06) +0.0%
2000.17: git commit -a -m A (sparse-v4)               0.05(0.06+0.03)  0.05(0.06+0.04) +0.0%   0.06(0.07+0.05) +20.0%  0.05(0.04+0.06) +0.0%
2000.18: git checkout -f - (full-v3)                  0.55(0.43+0.08)  0.54(0.46+0.05) -1.8%   0.55(0.46+0.07) +0.0%   0.54(0.40+0.10) -1.8%
2000.19: git checkout -f - (full-v4)                  0.55(0.41+0.09)  0.50(0.40+0.09) -9.1%   0.51(0.46+0.05) -7.3%   0.51(0.44+0.06) -7.3%
2000.20: git checkout -f - (sparse-v3)                0.06(0.09+0.03)  0.06(0.08+0.03) +0.0%   0.06(0.06+0.05) +0.0%   0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)                0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03) +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)                          0.41(0.32+0.06)  0.40(0.31+0.06) -2.4%   0.41(0.33+0.05) +0.0%   0.42(0.34+0.04) +2.4%
2000.23: git reset (full-v4)                          0.37(0.32+0.05)  0.35(0.30+0.05) -5.4%   0.37(0.30+0.05) +0.0%   0.35(0.31+0.03) -5.4%
2000.24: git reset (sparse-v3)                        0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)                        0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)                   0.54(0.43+0.07)  0.53(0.43+0.06) -1.9%   0.55(0.46+0.05) +1.9%   0.55(0.44+0.06) +1.9%
2000.27: git reset --hard (full-v4)                   0.50(0.45+0.03)  0.50(0.43+0.05) +0.0%   0.49(0.41+0.06) -2.0%   0.50(0.42+0.05) +0.0%
2000.28: git reset --hard (sparse-v3)                 0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)                 0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.01+0.01) +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.03+0.00) +0.0%   0.03(0.02+0.01) +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)                           0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03) +0.0%   0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)                           0.06(0.05+0.03)  0.06(0.05+0.02) +0.0%   0.06(0.05+0.02) +0.0%   0.06(0.06+0.02) +0.0%
2000.36: git diff (sparse-v3)                         0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)                         0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)                  0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.02+0.00) +0.0%
2000.39: git diff --staged (full-v4)                  0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)                0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)                0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)        0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03) -9.5%   0.59(0.48+0.05) -6.3%
2000.43: git sparse-checkout reapply (full-v4)        0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)      0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)      0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
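When copying rows from the table into a commit message, it can help to know how to read the percentage columns. Judging from the numbers, each percentage looks like the relative change in elapsed time against the first column (`4bcd533`). The sketch below checks that reading against a few rows; `relative_change()` is a hypothetical helper written for this note, not part of the Git perf framework.

```python
def relative_change(baseline: float, measured: float) -> str:
    """Format the change of `measured` relative to `baseline` as a signed percentage."""
    return f"{100.0 * (measured - baseline) / baseline:+.1f}%"


# Elapsed times (seconds) copied from the table rows above.
print(relative_change(0.68, 0.55))  # 2000.24 git reset (sparse-v3), f9255a5
print(relative_change(0.68, 0.04))  # 2000.24 git reset (sparse-v3), f28fc01
print(relative_change(0.91, 0.05))  # 2000.44 git sparse-checkout reapply (sparse-v3), f9255a5
print(relative_change(0.19, 0.20))  # 2000.2 git status (full-v3), f28fc01
```

Because the timings are printed with only two decimals, small absolute differences on fast commands (for example 0.04 s versus 0.05 s) show up as large percentages such as +25.0%, so those rows are better read as noise than as regressions.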
dscho
pushed a commit
that referenced
this pull request
Jun 18, 2022
dscho
pushed a commit
that referenced
this pull request
Jun 18, 2022
dscho
pushed a commit
that referenced
this pull request
Jun 22, 2022
dscho
pushed a commit
that referenced
this pull request
Jun 22, 2022
dscho
pushed a commit
that referenced
this pull request
Jun 27, 2022
dscho
pushed a commit
that referenced
this pull request
Jun 27, 2022
One thing I forgot when talking about the sparse index is that we have a performance test: `t/perf/p2000-sparse-operations.sh`. This test wasn't helpful for commands like `git merge` that need a particular set of inputs, but it works for more read-only operations.

Here is a quick demonstration of how this performance test works, so we can have a definitive measure of how your previous updates improved performance.

To get these results, I ran the following command in `t/perf`:

```
./run 4bcd533 f9255a5 f28fc01 b713582 -- p2000-sparse-operations.sh
```

The short-shas correspond to the merge commits for these PRs:

* #410
* #421
* #417
* #419

The test takes a copy of the Git repository and creates several copies within a nested directory hierarchy.

```
Test                                    4bcd533          f9255a5                 f28fc01                 b713582
-------------------------------------------------------------------------------------------------------------------------------
2000.2: git status (full-v3)            0.19(0.15+0.05)  0.19(0.16+0.05) +0.0%   0.20(0.18+0.03) +5.3%   0.19(0.17+0.04) +0.0%
2000.3: git status (full-v4)            0.20(0.18+0.04)  0.19(0.15+0.06) -5.0%   0.21(0.18+0.05) +5.0%   0.18(0.18+0.02) -10.0%
2000.4: git status (sparse-v3)          0.04(0.04+0.04)  0.05(0.07+0.04) +25.0%  0.04(0.04+0.05) +0.0%   0.04(0.06+0.04) +0.0%
2000.5: git status (sparse-v4)          0.04(0.03+0.06)  0.04(0.05+0.05) +0.0%   0.05(0.05+0.04) +25.0%  0.05(0.06+0.04) +25.0%
2000.6: git add -A (full-v3)            0.36(0.29+0.05)  0.38(0.28+0.07) +5.6%   0.36(0.31+0.05) +0.0%   0.37(0.31+0.05) +2.8%
2000.7: git add -A (full-v4)            0.34(0.27+0.06)  0.34(0.29+0.05) +0.0%   0.34(0.29+0.04) +0.0%   0.35(0.28+0.06) +2.9%
2000.8: git add -A (sparse-v3)          0.06(0.07+0.04)  0.06(0.05+0.06) +0.0%   0.06(0.09+0.01) +0.0%   0.06(0.08+0.03) +0.0%
2000.9: git add -A (sparse-v4)          0.05(0.05+0.04)  0.05(0.05+0.07) +0.0%   0.05(0.04+0.06) +0.0%   0.06(0.06+0.05) +20.0%
2000.10: git add . (full-v3)            0.38(0.31+0.05)  0.37(0.29+0.06) -2.6%   0.37(0.30+0.07) -2.6%   0.37(0.29+0.06) -2.6%
2000.11: git add . (full-v4)            0.35(0.31+0.04)  0.35(0.29+0.07) +0.0%   0.35(0.29+0.05) +0.0%   0.34(0.29+0.06) -2.9%
2000.12: git add . (sparse-v3)          0.06(0.06+0.05)  0.06(0.05+0.06) +0.0%   0.06(0.07+0.05) +0.0%   0.06(0.09+0.03) +0.0%
2000.13: git add . (sparse-v4)          0.06(0.06+0.06)  0.06(0.07+0.04) +0.0%   0.05(0.06+0.05) -16.7%  0.05(0.05+0.07) -16.7%
2000.14: git commit -a -m A (full-v3)   0.48(0.37+0.08)  0.45(0.36+0.08) -6.2%   0.45(0.35+0.09) -6.2%   0.44(0.36+0.07) -8.3%
2000.15: git commit -a -m A (full-v4)   0.45(0.40+0.06)  0.43(0.34+0.07) -4.4%   0.45(0.37+0.06) +0.0%   0.42(0.36+0.05) -6.7%
2000.16: git commit -a -m A (sparse-v3) 0.05(0.05+0.06)  0.05(0.05+0.03) +0.0%   0.05(0.06+0.06) +0.0%   0.05(0.04+0.06) +0.0%
2000.17: git commit -a -m A (sparse-v4) 0.05(0.06+0.03)  0.05(0.06+0.04) +0.0%   0.06(0.07+0.05) +20.0%  0.05(0.04+0.06) +0.0%
2000.18: git checkout -f - (full-v3)    0.55(0.43+0.08)  0.54(0.46+0.05) -1.8%   0.55(0.46+0.07) +0.0%   0.54(0.40+0.10) -1.8%
2000.19: git checkout -f - (full-v4)    0.55(0.41+0.09)  0.50(0.40+0.09) -9.1%   0.51(0.46+0.05) -7.3%   0.51(0.44+0.06) -7.3%
2000.20: git checkout -f - (sparse-v3)  0.06(0.09+0.03)  0.06(0.08+0.03) +0.0%   0.06(0.06+0.05) +0.0%   0.07(0.09+0.03) +16.7%
2000.21: git checkout -f - (sparse-v4)  0.06(0.08+0.04)  0.05(0.07+0.05) -16.7%  0.05(0.07+0.04) -16.7%  0.06(0.09+0.03) +0.0%
```

All of the above were already integrated.

```
2000.22: git reset (full-v3)            0.41(0.32+0.06)  0.40(0.31+0.06) -2.4%   0.41(0.33+0.05) +0.0%   0.42(0.34+0.04) +2.4%
2000.23: git reset (full-v4)            0.37(0.32+0.05)  0.35(0.30+0.05) -5.4%   0.37(0.30+0.05) +0.0%   0.35(0.31+0.03) -5.4%
2000.24: git reset (sparse-v3)          0.68(0.65+0.05)  0.55(0.52+0.04) -19.1%  0.04(0.05+0.04) -94.1%  0.04(0.05+0.04) -94.1%
2000.25: git reset (sparse-v4)          0.70(0.65+0.05)  0.54(0.50+0.06) -22.9%  0.04(0.07+0.01) -94.3%  0.03(0.05+0.05) -95.7%
2000.26: git reset --hard (full-v3)     0.54(0.43+0.07)  0.53(0.43+0.06) -1.9%   0.55(0.46+0.05) +1.9%   0.55(0.44+0.06) +1.9%
2000.27: git reset --hard (full-v4)     0.50(0.45+0.03)  0.50(0.43+0.05) +0.0%   0.49(0.41+0.06) -2.0%   0.50(0.42+0.05) +0.0%
2000.28: git reset --hard (sparse-v3)   0.83(0.76+0.06)  0.68(0.62+0.05) -18.1%  0.07(0.05+0.02) -91.6%  0.07(0.05+0.02) -91.6%
2000.29: git reset --hard (sparse-v4)   0.80(0.75+0.05)  0.69(0.62+0.06) -13.8%  0.07(0.04+0.02) -91.2%  0.07(0.04+0.03) -91.2%
```

As expected, `git reset [--hard]` improves with the sparse index integration, but remains constant across the full index case.

```
2000.30: git update-index --add --remove (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.01+0.01) +0.0%
2000.31: git update-index --add --remove (full-v4)    0.03(0.02+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.03+0.00) +0.0%   0.03(0.02+0.01) +0.0%
2000.32: git update-index --add --remove (sparse-v3)  0.57(0.54+0.02)  0.43(0.42+0.00) -24.6%  0.44(0.41+0.03) -22.8%  0.44(0.42+0.01) -22.8%
2000.33: git update-index --add --remove (sparse-v4)  0.56(0.52+0.04)  0.43(0.42+0.01) -23.2%  0.44(0.42+0.02) -21.4%  0.42(0.41+0.01) -25.0%
```

These do not change significantly because #423 is not merged.

```
2000.34: git diff (full-v3)             0.07(0.05+0.03)  0.06(0.05+0.03) -14.3%  0.07(0.05+0.03) +0.0%   0.06(0.05+0.03) -14.3%
2000.35: git diff (full-v4)             0.06(0.05+0.03)  0.06(0.05+0.02) +0.0%   0.06(0.05+0.02) +0.0%   0.06(0.06+0.02) +0.0%
2000.36: git diff (sparse-v3)           0.25(0.23+0.03)  0.17(0.17+0.02) -32.0%  0.18(0.18+0.02) -28.0%  0.01(0.03+0.03) -96.0%
2000.37: git diff (sparse-v4)           0.25(0.22+0.05)  0.16(0.16+0.01) -36.0%  0.18(0.15+0.04) -28.0%  0.01(0.04+0.02) -96.0%
2000.38: git diff --staged (full-v3)    0.03(0.01+0.01)  0.03(0.02+0.01) +0.0%   0.03(0.02+0.01) +0.0%   0.03(0.02+0.00) +0.0%
2000.39: git diff --staged (full-v4)    0.04(0.03+0.01)  0.03(0.02+0.01) -25.0%  0.03(0.03+0.00) -25.0%  0.03(0.03+0.00) -25.0%
2000.40: git diff --staged (sparse-v3)  0.21(0.19+0.01)  0.15(0.13+0.01) -28.6%  0.15(0.14+0.01) -28.6%  0.01(0.01+0.00) -95.2%
2000.41: git diff --staged (sparse-v4)  0.22(0.21+0.01)  0.14(0.11+0.03) -36.4%  0.15(0.13+0.02) -31.8%  0.01(0.01+0.00) -95.5%
```

The `git diff` improvements are measurable.

```
2000.42: git sparse-checkout reapply (full-v3)    0.63(0.54+0.05)  0.56(0.48+0.04) -11.1%  0.57(0.48+0.03) -9.5%   0.59(0.48+0.05) -6.3%
2000.43: git sparse-checkout reapply (full-v4)    0.60(0.54+0.02)  0.51(0.46+0.03) -15.0%  0.54(0.48+0.02) -10.0%  0.50(0.44+0.04) -16.7%
2000.44: git sparse-checkout reapply (sparse-v3)  0.91(0.86+0.05)  0.05(0.05+0.00) -94.5%  0.06(0.05+0.01) -93.4%  0.06(0.06+0.00) -93.4%
2000.45: git sparse-checkout reapply (sparse-v4)  0.92(0.88+0.04)  0.05(0.05+0.00) -94.6%  0.05(0.05+0.01) -94.6%  0.05(0.04+0.01) -94.6%
```

Finally, the `git sparse-checkout` measurements are also present.

This test script is particularly valuable when contributing changes upstream. It can be good to start by adding the lines to the performance test in an early commit, then demonstrating the performance change by copying the necessary lines from the output table into your commit message.
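For readers copying rows into a commit message, it may help to see how the percentage columns relate to the timings. The sketch below assumes each cell reads as `elapsed(user+sys)` in seconds and that every percentage is the change in elapsed time relative to the first column (`4bcd533`); both assumptions are consistent with the rows above but are not stated in the message itself. The helper names `parse_cell` and `percent_change` are made up for this illustration.

```python
import re

# Assumed cell layout: elapsed(user+sys), all in seconds.
CELL = re.compile(r"(\d+\.\d+)\((\d+\.\d+)\+(\d+\.\d+)\)")


def parse_cell(cell):
    """Split a cell such as '0.68(0.65+0.05)' into (elapsed, user, sys)."""
    match = CELL.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    elapsed, user, system = (float(g) for g in match.groups())
    return elapsed, user, system


def percent_change(baseline, value):
    """Relative change of value against baseline, in percent, one decimal."""
    return round((value - baseline) / baseline * 100, 1)


# Row 2000.24, git reset (sparse-v3), copied from the table above.
row = ["0.68(0.65+0.05)", "0.55(0.52+0.04)", "0.04(0.05+0.04)", "0.04(0.05+0.04)"]
elapsed = [parse_cell(cell)[0] for cell in row]
changes = [percent_change(elapsed[0], e) for e in elapsed[1:]]
print(changes)  # [-19.1, -94.1, -94.1]
```

The computed values match the `-19.1%`, `-94.1%`, `-94.1%` shown for that row. Note that with timings this small, a `+25.0%` on a 0.04 s row is a 0.01 s difference and is likely noise.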
derrickstolee
added a commit
that referenced
this pull request
Jun 27, 2022
…ckout` builtin

This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c` then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, and a big reason why is because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, no-op `git sparse-checkout set` (i.e. set the sparse-checkout cone to only include files at root, and do this on repeat) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`:

```
Benchmark #1: baseline
  Time (mean ± σ):     10.465 s ±  0.018 s    [User: 9.885 s, System: 0.573 s]
  Range (min … max):   10.450 s … 10.497 s    5 runs

Benchmark #2: new code
  Time (mean ± σ):      68.9 ms ±   2.9 ms    [User: 45.8 ms, System: 17.1 ms]
  Range (min … max):    63.4 ms …  74.0 ms    41 runs

Summary
  'new code' ran
  151.89 ± 6.30 times faster than 'baseline'
```
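To make "expands only as far as needed, possibly creating new sparse directory entries" concrete, here is a toy model in Python. It is a sketch under loud assumptions, not Git's C implementation: the repository is a nested dict, the index is a plain list of paths where a trailing `/` marks a sparse directory entry, and the cone is a set of recursively included directories. The names `expand_to_cone`, `expand_entry`, `children`, `lookup`, and `all_files` are hypothetical and exist only for this illustration.

```python
def children(node):
    """Yield (name, child) pairs, sorting directories as if they ended in '/'."""
    key = lambda name: name + "/" if isinstance(node[name], dict) else name
    return [(name, node[name]) for name in sorted(node, key=key)]


def lookup(tree, path):
    """Return the subtree for a directory path such as 'A/B/'."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def all_files(node, prefix):
    """List every file under node, fully expanded."""
    out = []
    for name, child in children(node):
        if isinstance(child, dict):
            out.extend(all_files(child, prefix + name + "/"))
        else:
            out.append(prefix + name)
    return out


def expand_entry(tree, sparse_dir, cone_dirs):
    """Expand one sparse directory entry only as far as the cone requires."""
    # Inside the cone: every file below this directory is needed.
    if any(sparse_dir.startswith(cone) for cone in cone_dirs):
        return all_files(lookup(tree, sparse_dir), sparse_dir)
    # Unrelated to the cone: keep it collapsed as a sparse directory.
    if not any(cone.startswith(sparse_dir) for cone in cone_dirs):
        return [sparse_dir]
    # Parent of a cone directory: expand one level and recurse, so sibling
    # subdirectories outside the cone become new sparse directory entries.
    out = []
    for name, child in children(lookup(tree, sparse_dir)):
        if isinstance(child, dict):
            out.extend(expand_entry(tree, sparse_dir + name + "/", cone_dirs))
        else:
            out.append(sparse_dir + name)
    return out


def expand_to_cone(index, tree, cone_dirs):
    """Toy stand-in for partial expansion; file entries pass through as-is."""
    out = []
    for entry in index:
        if entry.endswith("/"):
            out.extend(expand_entry(tree, entry, cone_dirs))
        else:
            out.append(entry)
    return out


tree = {
    "README": "blob",
    "A": {
        "a.txt": "blob",
        "B": {"b.txt": "blob", "C": {"c.txt": "blob"}},
        "D": {"d.txt": "blob"},
    },
    "E": {"e.txt": "blob"},
}
index = ["A/", "E/", "README"]  # cone currently holds only files at root
expanded = expand_to_cone(index, tree, {"A/B/"})
print(expanded)
```

In this model, moving the cone to `A/B/` expands `A/` into `A/B/C/c.txt`, `A/B/b.txt`, and `A/a.txt`, introduces a new sparse directory entry `A/D/`, and leaves `E/` collapsed. Nothing here contracts files back into sparse directories, mirroring the split described above where contraction waits until after the ignored-file check.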
derrickstolee
added a commit
that referenced
this pull request
Jun 27, 2022
dscho
pushed a commit
that referenced
this pull request
Jul 12, 2022
dscho
pushed a commit
that referenced
this pull request
Jul 12, 2022
derrickstolee
added a commit
that referenced
this pull request
Aug 31, 2022
derrickstolee
added a commit
that referenced
this pull request
Aug 31, 2022
This integrates the `sparse-checkout` builtin with the sparse index. The tricky part here is that we need to partially expand the index when we are modifying the sparse-checkout definition.

Note that we modify the pattern list in a careful way: we create a `struct pattern_list` in-memory in `builtin/sparse-checkout.c`, then apply those patterns to the index before writing the patterns to the sparse-checkout file. The `update_sparsity()` method does the work to assign the `SKIP_WORKTREE` bit appropriately, but this doesn't work if the files that are within the new sparse-checkout cone are still hidden behind a sparse directory.

The new `expand_to_pattern_list()` method does the hard work of expanding the sparse directories that are now within the new patterns. This expands only as far as needed, possibly creating new sparse directory entries.

This method does not contract existing files to sparse directories, largely because of the check for ignored files as we delete those directories. The `clean_tracked_sparse_directories()` method is called after `update_sparsity()`, but we need to read the `A/B/.gitignore` file (or lack thereof) before we can delete `A/B/`. If we convert to sparse too quickly, then we lose this information and cause a full expansion.

Most of the correctness is handled by existing tests in `t1092`, but I add checks for `ensure_not_expanded` in some hopefully interesting cases.

As for performance, `git sparse-checkout set` can be slow if it needs to move a lot of files. However, a no-op `git sparse-checkout set` (i.e., setting the sparse-checkout cone to include only files at the root, and doing this repeatedly) has these performance results on Linux in a monorepo with 2+ million files at `HEAD`: