-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse index: checkout-index
#424
Conversation
@@ -59,6 +61,10 @@ OPTIONS | |||
write the content to temporary files. The temporary name | |||
associations will be written to stdout. | |||
|
|||
--sparse:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little odd. From what I understand, we generally use the --sparse
flag to enable sparse behavior, but here we seem to be circumventing it. We may want to consider something more like --ignore-sparse
. Let me know if I'm missing something, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out gitgitgadget#1018 which restricts how Git operates outside of the sparse-checkout. The --sparse
option allows overriding these protections. It's really new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ldennington I definitely see where you're coming from (it does seem somewhat backwards that --sparse
would expand the index/do things outside the sparse checkout). I did base it on @derrickstolee's linked PR (+ls-trees
), and I think this line conveys it best:
a new '--sparse' option is added that ignores the sparse-checkout patterns and the SKIP_WORKTREE bit
The option here is pretty much treating --sparse
as "does things in parts of the repository that should be sparse", rather than ignoring those parts of the repository. @derrickstolee please correct me if I'm using it wrong, though - I'm mostly looking for this to be consistent with the other already-implemented options (and offer a way to use checkout-index --all
without always expanding the full index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdye: I think you have the right approach. We should make Git ignore files outside of the sparse-checkout cone as much as possible, and allow users to override that behavior by specifying --sparse
. We do not expect users to have a legitimate reason to care about files outside of the sparse-checkout, but Git needs to care unless we make these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress. I have a few questions.
# NEEDSWORK: even in sparse checkouts, checkout-index --all will create all | ||
# files (even those outside the sparse definition) on disk. However, these files | ||
# don't appear in the percentage of tracked files in git status. | ||
test_expect_failure 'checkout-index --all' ' | ||
init_repos && | ||
|
||
test_all_match git checkout-index --all && | ||
test_sparse_match test_path_is_missing folder1 | ||
' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Scary that this could completely abolish all performance benefits for a sparse-checkout!
# Note: although all tests fail (as expected), the messaging differs. For | ||
# non-sparse index checkouts, the error is that the "file" does not appear | ||
# in the index; for sparse checkouts, the error is explicitly that the | ||
# entry is a sparse directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make both implementations say "file is outside the sparse-checkout definition"?
- The "file does not appear in the index" is not entirely correct for full indexes.
- The "file is a sparse directory" has a bit too much internal information.
I wonder if using if (!path_in_sparse_checkout(name, &the_index))
somewhere would help here.
It's perhaps a bit tricky how this loop is structured, since it is doing a prefix match on a range of index entries.
In this test script, what happens with this example?
git sparse-checkout set deep/deeper1
git checkout-index -f -- deep/
Note that deep/deeper2/
is a sparse directory within the deep/
directory, but we would still want the other entries to succeed, silently ignoring the deep/deeper2/
entry. This would only be an error if the user ran git checkout-index -f -- deep/deeper2/
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just comparing the names, not performing a prefix match (the first condition in this check makes sure the names are the same length):
if (ce_namelen(ce) != namelen ||
memcmp(ce->name, name, namelen))
The result is that, from your example, git checkout-index -f -- deep/
(or any path that doesn't refer to a file entry in the index) returns git checkout-index: <name> is not in the cache
. It seems like this command expects exact names of cache entries (except for the subcommands using a pathspec, but they handle things differently anyway), in which case <name> is not in the cache
makes some amount of sense? Agreed that it's misleading, though - would that be something that should be clarified in the command doc, or in this message itself?
As for checking whether something is in the sparse checkout, this doesn't actually block checking out files outside the sparse definition (you can successfully run git checkout-index -- folder1/a
without any other repo setup). The issue is that sparse directories do appear as standalone cache entries, so without the S_ISSPARSEDIR
check the entry would be passed along to checkout_entry
(which then fails because it doesn't handle directories). Since there's no support checkout-index
-ing directories otherwise, this probably needs to be blocked rather than worked around. Maybe the message should be the same as whenever anyone tries to checkout-index
a directory in general? I.e., "lie" and say <name> is not in the cache
, or some other error message that's the same between them?
if (include_sparse) | ||
die("git checkout-index: don't mix '--sparse' and explicit filenames"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shouldn't these be combined? What if I want to poke at a file outside of my sparse-checkout? It certainly seems like git checkout-index --sparse -- deep/deeper2/a
would be a good way to get that file on-disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't include --sparse
for individual files is because, as it is right now, individually-specified files will always expand the index if that specified file is not already in the index - I wasn't really sure what the "right" way to handle it was, though. On one hand, not requiring a --sparse
flag to checkout-index
individual files (even those outside the checkout definition) is the most consistent with existing behavior. On the other hand, it is kind of weird that we need a special --sparse
option for files in sparse directories for --all
, but not when you list out individual files.
What do you think would be better (especially based on how you've handled similar situations in other commands)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a plumbing command, let's be cautious and avoid the behavior break by requiring --sparse
for individual files. (This is a change of opinion from my earlier comment.)
Add tests to cover basic `checkout-index` cases related to sparse checkouts: files and folder, both inside and outside of sparse checkout definition. New tests will serve as a baseline for expected behavior when integrating `checkout-index` with the sparse index. Of note is the test demonstrating the behavior of `checkout-index --all`; creating files even outside the sparse checkout definition is somewhat unintuitive, and will cause significant performance issues when run with a sparse index. The goal of later changes will be to change this default behavior and introduce a `--sparse` flag to enable populating files outside the sparse definition. Signed-off-by: Victoria Dye <vdye@github.com>
Change the default behavior of `checkout-index --all` for sparse checkouts to no longer refresh files outside the sparse checkout definition. The newly-added `--sparse` option, when used with `--all`, maintains the "old" behavior and checks out files outside the sparse checkout definition. Signed-off-by: Victoria Dye <vdye@github.com>
Add repository settings to allow usage of the sparse index. In order to prevent unexpected errors when attempting to check out a sparse directory entry, `checkout_file` directly checks whether a found entry is a sparse directory and, if so, exits with an error. The test corresponding to this case now verifies the error message, intentionally differing from the non-sparse index scenarios. Signed-off-by: Victoria Dye <vdye@github.com>
Update `checkout_all` to only run `ensure_full_index` when entries in sparse directories are needed (i.e., `--sparse` is specified). Signed-off-by: Victoria Dye <vdye@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of my prior concerns were misconceptions about how git checkout-index
works. Thanks for clearing them up!
`git checkout-index --all` is a subcommand of `git stash`, so it is helpful to verify the performance of that particular usage. Signed-off-by: Victoria Dye <vdye@github.com>
@derrickstolee + @ldennington I added one last change (performance test for
The huge speedup seems to mostly come from having |
Yeah! Those 75-90s times are definitely from writing too much data to the working tree. It's nice to see exactly why this is such an important change to include and why users would not like that old behavior! |
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Sparse index: `checkout-index`
Changes
t1092
to covercheckout-index
checkout-index --all
to make more sense with sparse checkouts (sparse index or not)--sparse
flag maintaining "old" behaviorcheckout-index
, including correspondingensure_not_expanded
tests