Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse index: checkout-index #424

Merged
merged 5 commits into from
Sep 14, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions Documentation/git-checkout-index.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ SYNOPSIS
'git checkout-index' [-u] [-q] [-a] [-f] [-n] [--prefix=<string>]
[--stage=<number>|all]
[--temp]
[--sparse]
derrickstolee marked this conversation as resolved.
Show resolved Hide resolved
[-z] [--stdin]
[--] [<file>...]

Expand All @@ -37,8 +38,9 @@ OPTIONS

-a::
--all::
checks out all files in the index. Cannot be used
together with explicit filenames.
checks out all files in the index, excluding those outside
any specified sparse checkout patterns (see `--sparse`).
derrickstolee marked this conversation as resolved.
Show resolved Hide resolved
Cannot be used together with explicit filenames.

-n::
--no-create::
Expand All @@ -59,6 +61,10 @@ OPTIONS
write the content to temporary files. The temporary name
associations will be written to stdout.

--sparse::
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little odd. From what I understand, we generally use the --sparse flag to enable sparse behavior, but here we seem to be circumventing it. We may want to consider something more like --ignore-sparse. Let me know if I'm missing something, though!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out gitgitgadget#1018 which restricts how Git operates outside of the sparse-checkout. The --sparse option allows overriding these protections. It's really new.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldennington I definitely see where you're coming from (it does seem somewhat backwards that --sparse would expand the index/do things outside the sparse checkout). I did base it on @derrickstolee's linked PR (+ls-trees), and I think this line conveys it best:

a new '--sparse' option is added that ignores the sparse-checkout patterns and the SKIP_WORKTREE bit

The option here is pretty much treating --sparse as "does things in parts of the repository that should be sparse", rather than ignoring those parts of the repository. @derrickstolee please correct me if I'm using it wrong, though - I'm mostly looking for this to be consistent with the other already-implemented options (and offer a way to use checkout-index --all without always expanding the full index).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdye: I think you have the right approach. We should make Git ignore files outside of the sparse-checkout cone as much as possible, and allow users to override that behavior by specifying --sparse. We do not expect users to have a legitimate reason to care about files outside of the sparse-checkout, but Git needs to care unless we make these changes.

Refresh files outside of the sparse checkout boundary. May
only be used in conjunction with `--all`.

--stdin::
Instead of taking list of paths from the command line,
read list of paths from the standard input. Paths are
Expand Down
26 changes: 22 additions & 4 deletions builtin/checkout-index.c
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "builtin.h"
#include "config.h"
#include "dir.h"
#include "lockfile.h"
#include "quote.h"
#include "cache-tree.h"
Expand Down Expand Up @@ -65,6 +66,7 @@ static int checkout_file(const char *name, const char *prefix)
int namelen = strlen(name);
int pos = cache_name_pos(name, namelen);
int has_same_name = 0;
int is_file = 0;
int did_checkout = 0;
int errs = 0;

Expand All @@ -78,6 +80,9 @@ static int checkout_file(const char *name, const char *prefix)
break;
has_same_name = 1;
pos++;
if (S_ISSPARSEDIR(ce->ce_mode))
break;
is_file = 1;
if (ce_stage(ce) != checkout_stage
&& (CHECKOUT_ALL != checkout_stage || !ce_stage(ce)))
continue;
Expand Down Expand Up @@ -106,6 +111,8 @@ static int checkout_file(const char *name, const char *prefix)
fprintf(stderr, "git checkout-index: %s ", name);
if (!has_same_name)
fprintf(stderr, "is not in the cache");
else if (!is_file)
fprintf(stderr, "is a sparse directory");
else if (checkout_stage)
fprintf(stderr, "does not exist at stage %d",
checkout_stage);
Expand All @@ -116,15 +123,18 @@ static int checkout_file(const char *name, const char *prefix)
return -1;
}

static int checkout_all(const char *prefix, int prefix_length)
static int checkout_all(const char *prefix, int prefix_length, int include_sparse)
{
int i, errs = 0;
struct cache_entry *last_ce = NULL;

/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
if (include_sparse)
ensure_full_index(&the_index);

for (i = 0; i < active_nr ; i++) {
struct cache_entry *ce = active_cache[i];
if (!include_sparse && !path_in_sparse_checkout(ce->name, &the_index))
continue;
if (ce_stage(ce) != checkout_stage
&& (CHECKOUT_ALL != checkout_stage || !ce_stage(ce)))
continue;
Expand Down Expand Up @@ -176,6 +186,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
int i;
struct lock_file lock_file = LOCK_INIT;
int all = 0;
int include_sparse = 0;
int read_from_stdin = 0;
int prefix_length;
int force = 0, quiet = 0, not_new = 0;
Expand All @@ -185,6 +196,8 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
struct option builtin_checkout_index_options[] = {
OPT_BOOL('a', "all", &all,
N_("check out all files in the index")),
OPT_BOOL(0, "sparse", &include_sparse,
N_("do not skip files outside the sparse checkout boundary")),
OPT__FORCE(&force, N_("force overwrite of existing files"), 0),
OPT__QUIET(&quiet,
N_("no warning for existing files and files not in index")),
Expand Down Expand Up @@ -212,6 +225,9 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
git_config(git_default_config, NULL);
prefix_length = prefix ? strlen(prefix) : 0;

prepare_repo_settings(the_repository);
the_repository->settings.command_requires_full_index = 0;

if (read_cache() < 0) {
die("invalid cache");
}
Expand Down Expand Up @@ -247,6 +263,8 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)

if (all)
die("git checkout-index: don't mix '--all' and explicit filenames");
if (include_sparse)
die("git checkout-index: don't mix '--sparse' and explicit filenames");
Comment on lines +266 to +267
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn't these be combined? What if I want to poke at a file outside of my sparse-checkout? It certainly seems like git checkout-index --sparse -- deep/deeper2/a would be a good way to get that file on-disk.

Copy link
Collaborator Author

@vdye vdye Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't include --sparse for individual files is because, as it is right now, individually-specified files will always expand the index if that specified file is not already in the index - I wasn't really sure what the "right" way to handle it was, though. On one hand, not requiring a --sparse flag to checkout-index individual files (even those outside the checkout definition) is the most consistent with existing behavior. On the other hand, it is kind of weird that we need a special --sparse option for files in sparse directories for --all, but not when you list out individual files.

What do you think would be better (especially based on how you've handled similar situations in other commands)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a plumbing command, let's be cautious and avoid the behavior break by requiring --sparse for individual files. (This is a change of opinion from my earlier comment.)

if (read_from_stdin)
die("git checkout-index: don't mix '--stdin' and explicit filenames");
p = prefix_path(prefix, prefix_length, arg);
Expand Down Expand Up @@ -280,7 +298,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
}

if (all)
err |= checkout_all(prefix, prefix_length);
err |= checkout_all(prefix, prefix_length, include_sparse);

if (pc_workers > 1)
err |= run_parallel_checkout(&state, pc_workers, pc_threshold,
Expand Down
1 change: 1 addition & 0 deletions t/perf/p2000-sparse-operations.sh
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ test_perf_on_all git commit -a -m A
test_perf_on_all git checkout -f -
test_perf_on_all git reset
test_perf_on_all git reset --hard
test_perf_on_all git checkout-index -f --all
test_perf_on_all git update-index --add --remove
test_perf_on_all git diff
test_perf_on_all git diff --staged
Expand Down
63 changes: 63 additions & 0 deletions t/t1092-sparse-checkout-compatibility.sh
Original file line number Diff line number Diff line change
Expand Up @@ -883,6 +883,67 @@ test_expect_success 'cherry-pick with conflicts' '
test_all_match test_must_fail git cherry-pick to-cherry-pick
'

test_expect_success 'checkout-index inside sparse definition' '
init_repos &&

run_on_all rm -f deep/a &&
test_all_match git checkout-index -- deep/a &&
vdye marked this conversation as resolved.
Show resolved Hide resolved
test_all_match git status --porcelain=v2 &&

echo test >>new-a &&
run_on_all cp ../new-a a &&
test_all_match test_must_fail git checkout-index -- a &&
test_all_match git checkout-index -f -- a &&
test_all_match git status --porcelain=v2
'

test_expect_success 'checkout-index outside sparse definition' '
init_repos &&

# File does not exist on disk yet for sparse checkouts, so checkout-index
# succeeds without -f
test_sparse_match git checkout-index -- folder1/a &&
test_cmp sparse-checkout/folder1/a sparse-index/folder1/a &&
test_cmp sparse-checkout/folder1/a full-checkout/folder1/a &&

run_on_sparse rm -rf folder1 &&
echo test >new-a &&
run_on_sparse mkdir -p folder1 &&
run_on_all cp ../new-a folder1/a &&

test_all_match test_must_fail git checkout-index -- folder1/a &&
test_all_match git checkout-index -f -- folder1/a &&
test_cmp sparse-checkout/folder1/a sparse-index/folder1/a &&
test_cmp sparse-checkout/folder1/a full-checkout/folder1/a
'

test_expect_success 'checkout-index with folders' '
init_repos &&

# Inside checkout definition
test_all_match test_must_fail git checkout-index -f -- deep/ &&

# Outside checkout definition
# Note: although all tests fail (as expected), the messaging differs. For
# non-sparse index checkouts, the error is that the "file" does not appear
# in the index; for sparse checkouts, the error is explicitly that the
# entry is a sparse directory.
Comment on lines +927 to +930
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make both implementations say "file is outside the sparse-checkout definition"?

  • The "file does not appear in the index" is not entirely correct for full indexes.
  • The "file is a sparse directory" has a bit too much internal information.

I wonder if using if (!path_in_sparse_checkout(name, &the_index)) somewhere would help here.

It's perhaps a bit tricky how this loop is structured, since it is doing a prefix match on a range of index entries.

In this test script, what happens with this example?

git sparse-checkout set deep/deeper1
git checkout-index -f -- deep/

Note that deep/deeper2/ is a sparse directory within the deep/ directory, but we would still want the other entries to succeed, silently ignoring the deep/deeper2/ entry. This would only be an error if the user ran git checkout-index -f -- deep/deeper2/, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just comparing the names, not performing a prefix match (the first condition in this check makes sure the names are the same length):

if (ce_namelen(ce) != namelen ||
		    memcmp(ce->name, name, namelen))

The result is that, from your example, git checkout-index -f -- deep/ (or any path that doesn't refer to a file entry in the index) returns git checkout-index: <name> is not in the cache. It seems like this command expects exact names of cache entries (except for the subcommands using a pathspec, but they handle things differently anyway), in which case <name> is not in the cache makes some amount of sense? Agreed that it's misleading, though - would that be something that should be clarified in the command doc, or in this message itself?

As for checking whether something is in the sparse checkout, this doesn't actually block checking out files outside the sparse definition (you can successfully run git checkout-index -- folder1/a without any other repo setup). The issue is that sparse directories do appear as standalone cache entries, so without the S_ISSPARSEDIR check the entry would be passed along to checkout_entry (which then fails because it doesn't handle directories). Since there's no support checkout-index-ing directories otherwise, this probably needs to be blocked rather than worked around. Maybe the message should be the same as whenever anyone tries to checkout-index a directory in general? I.e., "lie" and say <name> is not in the cache, or some other error message that's the same between them?

run_on_all test_must_fail git checkout-index -f -- folder1/ &&
test_cmp full-checkout-err sparse-checkout-err &&
! test_cmp full-checkout-err sparse-index-err &&
grep "is a sparse directory" sparse-index-err
'

test_expect_success 'checkout-index --all' '
init_repos &&

test_all_match git checkout-index --all &&
test_sparse_match test_path_is_missing folder1 &&

test_all_match git checkout-index --sparse --all &&
test_all_match test_path_exists folder1
'

test_expect_success 'clean' '
init_repos &&

Expand Down Expand Up @@ -983,6 +1044,8 @@ test_expect_success 'sparse-index is not expanded' '
echo >>sparse-index/untracked.txt &&
ensure_not_expanded add . &&

ensure_not_expanded checkout-index -f a &&
ensure_not_expanded checkout-index -f --all &&
for ref in update-deep update-folder1 update-folder2 update-deep
do
echo >>sparse-index/README.md &&
Expand Down