-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sparse Index] Integrate with git commit
and git checkout
#361
[Sparse Index] Integrate with git commit
and git checkout
#361
Conversation
1e1af99
to
b160b38
Compare
a80ef2a
to
8e163eb
Compare
58725bb
to
1eb8541
Compare
git checkout
with sparse indexgit commit
and git checkout
4e35f10
to
46172a8
Compare
As we increase our list of commands to test in p2000-sparse-operations.sh, we will want to have a slightly smaller test repository. Reduce the size by a factor of four by reducing the depth of the step that creates a big index around a moderately-sized repository. Also add a step to run 'git checkout -' on repeat. This requires having a previous location in the reflog, so add that to the initialization steps. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
By using shorter names for the test repos, we will get a slightly more compressed performance summary without comprimising clarity. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Update 'git commit' to allow using the sparse-index in memory without expanding to a full one. The only place that had an ensure_full_index() call was in cache_tree_update(). The recursive algorithm for update_one() was already updated in 2de37c5 (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Most of this change involves testing different command-line options that allow specifying which on-disk changes should be included in the commit. This includes no options (only take currently-staged changes), -a (take all tracked changes), and --include (take a list of specific changes). To simplify testing that these options do not expand the index, update the test that previously verified that 'git status' does not expand the index with a helper method, ensure_not_expanded(). This allows 'git commit' to operate much faster when the sparse-checkout cone is much smaller than the full list of files at HEAD. Here are the relevant lines from p2000-sparse-operations.sh: Test HEAD~1 HEAD ---------------------------------------------------------------------------------- 2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9% 2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3% 2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7% 2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8% It is important to compare the full-index case to the sparse-index case, so the improvement for index version v4 is actually an 88% improvement in this synthetic example. In a real repository with over two million files at HEAD and 60,000 files in the sparse-checkout definition, the time for 'git commit -a' went from 2.61 seconds to 134ms. I compared this to the result if the index only contained the paths in the sparse-checkout definition and found the theoretical optimum to be 120ms, so the out-of-cone paths only add a 12% overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Previous changes did the necessary improvements to unpack-trees.c and diff-lib.c in order to modify a sparse index based on its comparision with a tree. The only remaining work is to remove some ensure_full_index() calls and add tests that verify that the index is not expanded in our interesting cases. Include 'switch' and 'restore' in these tests, as they share a base implementation with 'checkout'. Here are the relevant performance results from p2000-sparse-operations.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------- 2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1% 2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7% 2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7% 2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3% It is important to compare the full index case to the sparse index case, as the previous results for the sparse index were inflated by the index expansion. For index v4, this is an 88% improvement. On an internal repository with over two million paths at HEAD and a sparse-checkout definition containing ~60,000 of those paths, 'git checkout' went from 3.5s to 297ms with this change. The theoretical optimum where only those ~60,000 paths exist was 275ms, so the extra sparse directory entries contribute a 22ms overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
46172a8
to
9cb3301
Compare
Replaces #367, which used the upstream commits directly. This instead includes a merge commit onto `vfs-2.32.0` so we can move forward even if the upstream version is modified. This integrates the sparse-index with `git status`. There is a LOT of code here, and much of it makes future changes (#361 and #364) much simpler than they would be otherwise. This merges into `features/sparse-index` which is intended to be released as an experimental release. The goal of reviewing this PR is to find "good enough for a limited release to dogfooders" and not the same scrutiny of upstream. In particular, I am currently looking for an issue with directory/file conflicts at the sparse-checkout boundary. They don't reveal themselves until `git checkout`, and only in very rare cases that won't affect our dogfooders.
9cb3301
to
77637fd
Compare
Now ready for review. Force-pushed to kick of a new merge and builds now that #374 is merged. |
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
77637fd
to
db3c9e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
I have Trace2 data for the size of the index already being logged.
|
I have some tests that extract the list of cache entries and verify that |
This follows #374 and #361 to complete the necessary functionality for the sparse-index experimental release. This does not include the behavior changes requested upstream. The request is about how `git add` currently ignores changes that match a tracked path outside of the sparse-checkout definition. However, this is an established behavior of Git, so we don't need to change it for our dogfooders.
See gitgitgadget#973 for the upstream submission.
This PR follows #374 which does the heavy lifting.
There is one known issue with this integration: directory/file conflicts at the sparse-checkout boundary. This will not affect our dogfooders so we should ignore that bug for the purpose of the
features/sparse-index
branch. I will fix that upstream.