Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Sparse Index] Integrate git add with sparse-index #364

Conversation

derrickstolee
Copy link
Collaborator

@derrickstolee derrickstolee commented Jun 4, 2021

This follows #374 and #361 to complete the necessary functionality for the sparse-index experimental release.

This does not include the behavior changes requested upstream. The request is about how git add currently ignores changes that match a tracked path outside of the sparse-checkout definition. However, this is an established behavior of Git, so we don't need to change it for our dogfooders.

@derrickstolee derrickstolee self-assigned this Jun 4, 2021
@derrickstolee derrickstolee changed the base branch from vfs-2.31.1-exp to features/sparse-index June 7, 2021 14:01
@derrickstolee derrickstolee changed the title [Experimental] Integrate git add with sparse-index [Sparse Index] Integrate git add with sparse-index Jun 7, 2021
@derrickstolee derrickstolee force-pushed the sparse-index/vfs-add branch 5 times, most recently from e9444fe to e70be92 Compare June 10, 2021 17:22
Disable command_requires_full_index for 'git add'. This does not require
any additional removals of ensure_full_index(). The main reason is that
'git add' discovers changes based on the pathspec and the worktree
itself. These are then inserted into the index directly, and calls to
index_name_pos() or index_file_exists() already call expand_to_path() at
the appropriate time to support a sparse-index.

Add a test to check that 'git add -A' and 'git add <file>' does not
expand the index at all, as long as <file> is not within a sparse
directory. This does not help the global 'git add .' case.

We can measure the improvement using p2000-sparse-operations.sh with
these results:

Test                                  HEAD~1           HEAD
------------------------------------------------------------------------------
2000.6: git add -A (full-index-v3)    0.35(0.30+0.05)  0.37(0.29+0.06) +5.7%
2000.7: git add -A (full-index-v4)    0.31(0.26+0.06)  0.33(0.27+0.06) +6.5%
2000.8: git add -A (sparse-index-v3)  0.57(0.53+0.07)  0.05(0.04+0.08) -91.2%
2000.9: git add -A (sparse-index-v4)  0.58(0.55+0.06)  0.05(0.05+0.06) -91.4%

While the 91% improvement seems impressive, it's important to recognize
that previously we had significant overhead for expanding the
sparse-index. Comparing to the full index case, 'git add -A' goes from
0.37s to 0.05s, which is "only" an 86% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The add_pathspec_matches_against_index() focuses on matching a pathspec
to file entries in the index. This already works correctly for its only
use: checking if untracked files exist in the index.

The compatibility checks in t1092 already test that 'git add <dir>'
works for a directory outside of the sparse cone. That provides coverage
for removing this guard.

This finalizes our ability to run 'git add .' without expanding a sparse
index to a full one. This is evidenced by an update to t1092 and by
these performance numbers for p2000-sparse-operations.sh:

Test                                    HEAD~1            HEAD
--------------------------------------------------------------------------------
2000.10: git add . (full-index-v3)      0.37(0.28+0.07)   0.36(0.27+0.06) -2.7%
2000.11: git add . (full-index-v4)      0.33(0.26+0.06)   0.32(0.28+0.05) -3.0%
2000.12: git add . (sparse-index-v3)    0.57(0.53+0.07)   0.06(0.06+0.07) -89.5%
2000.13: git add . (sparse-index-v4)    0.57(0.53+0.07)   0.05(0.03+0.09) -91.2%

While the ~90% improvement is shown by the test results, it is worth
noting that expanding the sparse index was adding overhead in previous
commits. Comparing to the full index case, we see the performance go
from 0.33s to 0.05s, an 85% improvement.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Jun 21, 2021
Replaces #367, which used the upstream commits directly. This instead includes a merge commit onto `vfs-2.32.0` so we can move forward even if the upstream version is modified.

This integrates the sparse-index with `git status`. There is a LOT of code here, and much of it makes future changes (#361 and #364) much simpler than they would be otherwise.

This merges into `features/sparse-index` which is intended to be released as an experimental release. The goal of reviewing this PR is to find "good enough for a limited release to dogfooders" and not the same scrutiny of upstream.

In particular, I am currently looking for an issue with directory/file conflicts at the sparse-checkout boundary. They don't reveal themselves until `git checkout`, and only in very rare cases that won't affect our dogfooders.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee marked this pull request as ready for review June 21, 2021 16:37
Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@derrickstolee
Copy link
Collaborator Author

I'm going to merge this and leave the "turn it on by default" part as a follow-up because it is doing strange things with empty repos that I think I can handle, but want to isolate the change.

@derrickstolee derrickstolee merged commit 720b22a into microsoft:features/sparse-index Jun 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants