Sparse Index: fix a checkout bug with deep sparse-checkout patterns #1092
Extend the repository data in the setup of t1092 to include more directories within two parent directories. This reproduces a bug found by users of the sparse index feature with suitably-complicated sparse-checkout definitions.

Add a failing test that fails in its first 'git checkout deepest' run in the sparse index case with this error:

    error: Your local changes to the following files would be overwritten by checkout:
            deep/deeper1/deepest2/a
            deep/deeper1/deepest3/a
    Please commit your changes or stash them before you switch branches.
    Aborting

The next change will fix this error, and that fix will make it clear why the extra depth is necessary for revealing this bug.

The assignment of the sparse-checkout definition to include deep/deeper1/deepest as a sibling directory is important to ensure that deep/deeper1 is not a sparse directory entry, but deep/deeper1/deepest2 is.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
/submit |
Submitted as pull.1092.git.1638586534.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
@@ -19,6 +19,8 @@ test_expect_success 'setup' '
 	mkdir folder1 folder2 deep x &&
On the Git mailing list, Elijah Newren wrote (reply to this):
On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The sparse_dir_matches_path() method compares a cache entry that is a
> sparse directory entry against a 'struct traverse_info *info' and a
> 'struct name_entry *p' to see if the cache entry has exactly the right
> name for those other inputs.
>
> This method was introduced in 523506d (unpack-trees: unpack sparse
> directory entries, 2021-07-14), but included a significant mistake. The
> path comparisons used 'info->name' instead of 'info->traverse_path'.
> Since 'info->name' only stores a single tree entry name while
> 'info->traverse_path' stores the full path from root, this method does
> not work when 'info' is in a subdirectory of a directory. Using the
> right strings and their corresponding lengths makes the method work
> properly.
>
> The previous change included a failing test that exposes this issue.
> That test now passes. The critical detail is that as we go deep into
> unpack_trees(), the logic for merging a sparse directory entry with a
> tree entry during 'git checkout' relies on this
> sparse_dir_matches_path() in order to avoid calling
> traverse_trees_recursive() during unpack_callback() in this hunk:
>
>         if (!is_sparse_directory_entry(src[0], names, info) &&
>             traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>                                      names, info) < 0) {
>                 return -1;
>         }
>
> For deep paths, the short-circuit never occurred and
> traverse_trees_recursive() was being called incorrectly and that was
> causing other strange issues. Specifically, the error message from the
> now-passing test previously included this:
>
>     error: Your local changes to the following files would be overwritten by checkout:
>             deep/deeper1/deepest2/a
>             deep/deeper1/deepest3/a
>     Please commit your changes or stash them before you switch branches.
>     Aborting
>
> These messages occurred because the 'current' cache entry in
> twoway_merge() was showing as NULL because the index did not contain
> entries for the paths contained within the sparse directory entries. We
> instead had 'oldtree' given as the entry at HEAD and 'newtree' as the
> entry in the target tree. This led to reject_merge() listing these
> paths.
>
> Now that sparse_dir_matches_path() works the same for deep paths as it
> does for shallow depths, the rest of the logic kicks in to properly
> handle modifying the sparse directory entries as designed.
Eek, sorry for not catching this in my earlier review. Thanks for the
detailed explanation; well analyzed.
>
> Reported-by: Gustave Granroth <gus.gran@gmail.com>
> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> t/t1092-sparse-checkout-compatibility.sh | 2 +-
> unpack-trees.c | 10 +++++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e6aef40e9b3..f04a02c6b20 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -307,7 +307,7 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> -test_expect_failure 'deep changes during checkout' '
> +test_expect_success 'deep changes during checkout' '
>         init_repos &&
>
>         test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 89ca95ce90b..7381c275768 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>         assert(S_ISSPARSEDIR(ce->ce_mode));
>         assert(ce->name[ce->ce_namelen - 1] == '/');
>
> -       if (info->namelen)
> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
> -                      ce->name[info->namelen] == '/' &&
> -                      !strncmp(ce->name, info->name, info->namelen) &&
> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
> +       if (info->pathlen)
> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
> +                      ce->name[info->pathlen - 1] == '/' &&
> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>         return ce->ce_namelen == p->pathlen + 1 &&
>                !strncmp(ce->name, p->path, p->pathlen);
>  }
> --
The comment at the beginning of this function (not shown in this
patch) is now stale and misleading; it should be corrected too.
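The short-circuit described in the commit message can be sketched in isolation. This is only a toy model of the control flow in that hunk: `stub_traverse_recursive()` and `unpack_step()` are hypothetical names, and the `is_sparse_dir` flag stands in for whatever is_sparse_directory_entry() returns. None of this is the actual unpack-trees.c code.

```c
#include <assert.h>

/* Counts how often the stubbed recursion is entered. */
int recursion_calls;

/* Hypothetical stub standing in for traverse_trees_recursive(). */
int stub_traverse_recursive(void)
{
	recursion_calls++;
	return 0;
}

/*
 * Same shape as the quoted hunk: recurse into the tree only when the
 * entry was NOT recognized as a sparse directory entry. Because '&&'
 * short-circuits, a true 'is_sparse_dir' skips the recursion entirely.
 */
int unpack_step(int is_sparse_dir)
{
	assert(is_sparse_dir == 0 || is_sparse_dir == 1);

	if (!is_sparse_dir && stub_traverse_recursive() < 0)
		return -1;
	return 0;
}
```

Calling `unpack_step(1)` leaves `recursion_calls` untouched, while `unpack_step(0)` increments it. In this model, a matcher that wrongly reports "not a sparse directory" for deep paths is what lets the recursion run when it should have been skipped.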
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 12/4/2021 12:42 AM, Elijah Newren wrote:
> On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> @@ -1243,11 +1243,11 @@ static int sparse_dir_matches_path(const struct cache_entry *ce,
>>         assert(S_ISSPARSEDIR(ce->ce_mode));
>>         assert(ce->name[ce->ce_namelen - 1] == '/');
>>
>> -       if (info->namelen)
>> -               return ce->ce_namelen == info->namelen + p->pathlen + 2 &&
>> -                      ce->name[info->namelen] == '/' &&
>> -                      !strncmp(ce->name, info->name, info->namelen) &&
>> -                      !strncmp(ce->name + info->namelen + 1, p->path, p->pathlen);
>> +       if (info->pathlen)
>> +               return ce->ce_namelen == info->pathlen + p->pathlen + 1 &&
>> +                      ce->name[info->pathlen - 1] == '/' &&
>> +                      !strncmp(ce->name, info->traverse_path, info->pathlen) &&
>> +                      !strncmp(ce->name + info->pathlen, p->path, p->pathlen);
>>         return ce->ce_namelen == p->pathlen + 1 &&
>>                !strncmp(ce->name, p->path, p->pathlen);
>>  }
>> --
>
> The comment at the beginning of this function (not shown in this
> patch) is now stale and misleading; it should be corrected too.
Will do! Thanks for catching that.
Thanks,
-Stolee
On the Git mailing list, Elijah Newren wrote (reply to this):
|
This branch is now known as |
This patch series was integrated into seen via git@a028d3d. |
c914219 to aa37168
…parse-checkout paths We got multiple similar reports of failures with the sparse index (#473 as an issue, and another regarding `git checkout` via email). Both were hitting a similar set of paths, which was a hint. The reason we didn't hit this before is because it requires the following: 1. The sparse-checkout definition needs to have recursive inclusion of deep folders (depth 3 or more). 2. _Adjacent_ to those deep folders, we need a deep sparse directory entry that receives changes. 3. In this particular repo, deep directories are only added to the sparse-checkout in rare occasions and those adjacent folders are rarely updated. They happened to update this week and hit our sparse index dogfooders in surprising ways. The first commit adds a test that fails without the fix. It requires modifying our test data to make adjacent, deep sparse directory entries possible. It's a rather simple test after we have that data change. The second commit includes the actual fix. It's really just an error of not understanding the difference between the `name` and `traverse_path` members of the `struct traverse_info` structure. `name` only stores a single tree entry while `traverse_path` actually includes the full name from root. The method we are editing also has an additional `struct name_entry` that fills in the tree entry on top of the `traverse_path`, which explains how this worked to depth two, but not depth three. Resolves #473 See also gitgitgadget#1092
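To make the "depth two works, depth three does not" point concrete, here is a minimal standalone model of the two comparisons from the patch. The `mini_info` and `mini_entry` structs are hypothetical, simplified stand-ins for `struct traverse_info` and `struct name_entry`. The sketch also assumes that `traverse_path` carries a trailing slash, which is what the fixed comparison implies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for 'struct traverse_info'. */
struct mini_info {
	const char *traverse_path; /* full path from root, trailing '/' (assumed) */
	size_t pathlen;            /* length of traverse_path */
	const char *name;          /* last tree entry name only */
	size_t namelen;            /* length of name */
};

/* Hypothetical, simplified stand-in for 'struct name_entry'. */
struct mini_entry {
	const char *path; /* a single tree entry name */
	size_t pathlen;
};

/* Pre-fix comparison: prefixes the entry with info->name only. */
int matches_old(const char *ce_name, const struct mini_info *info,
		const struct mini_entry *p)
{
	size_t ce_namelen = strlen(ce_name);

	/* Sparse directory entries end with a slash. */
	assert(ce_namelen > 0 && ce_name[ce_namelen - 1] == '/');

	if (info->namelen)
		return ce_namelen == info->namelen + p->pathlen + 2 &&
		       ce_name[info->namelen] == '/' &&
		       !strncmp(ce_name, info->name, info->namelen) &&
		       !strncmp(ce_name + info->namelen + 1, p->path, p->pathlen);
	return ce_namelen == p->pathlen + 1 &&
	       !strncmp(ce_name, p->path, p->pathlen);
}

/* Post-fix comparison: prefixes the entry with the full traverse_path. */
int matches_new(const char *ce_name, const struct mini_info *info,
		const struct mini_entry *p)
{
	size_t ce_namelen = strlen(ce_name);

	assert(ce_namelen > 0 && ce_name[ce_namelen - 1] == '/');

	if (info->pathlen)
		return ce_namelen == info->pathlen + p->pathlen + 1 &&
		       ce_name[info->pathlen - 1] == '/' &&
		       !strncmp(ce_name, info->traverse_path, info->pathlen) &&
		       !strncmp(ce_name + info->pathlen, p->path, p->pathlen);
	return ce_namelen == p->pathlen + 1 &&
	       !strncmp(ce_name, p->path, p->pathlen);
}
```

With `traverse_path` "deep/" and `name` "deep", both functions accept the entry "deep/deeper2/" for the tree entry "deeper2", because at that depth `name` plus a slash happens to equal the full path. With `traverse_path` "deep/deeper1/" and `name` "deeper1", only `matches_new()` accepts "deep/deeper1/deepest2/" for "deepest2". The old length check expects 7 + 8 + 2 = 17 characters, but the entry has 22.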
/submit |
Submitted as pull.1092.v2.git.1638799837.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
This patch series was integrated into seen via git@bf57f3a. |
This patch series was integrated into seen via git@42d802e. |
This patch series was integrated into seen via git@a8088e9. |
This patch series was integrated into seen via git@2585dde. |
There was a status update in the "New Topics" section about the branch The sparse-index/sparse-checkout feature had a bug in its use of the matching code to determine which path is in or outside the sparse checkout patterns. Will merge to 'next'. source: <pull.1092.v2.git.1638799837.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@09d3d9c. |
This patch series was integrated into next via git@7b7f742. |
This hit |
This week, we rolled out the sparse index to a large internal monorepo. We got two very similar bug reports that dealt with a strange error that involved the same set of paths. One was during `git pull` (`pull` was a red herring) and the other was `git checkout`. The `git checkout` case gave enough of a reproduction to debug deep into `unpack-trees.c` and find the problem.

This bug dates back to 523506d (unpack-trees: unpack sparse directory entries, 2021-07-14). The reason we didn't hit this before is because it requires the following:

The first patch adds a test that fails without the fix. It requires modifying our test data to make adjacent, deep sparse directory entries possible. It's a rather simple test after we have that data change.

The second patch includes the actual fix. It's really just an error of not understanding the difference between the `name` and `traverse_path` members of the `struct traverse_info` structure. `name` only stores a single tree entry while `traverse_path` actually includes the full name from root. The method we are editing also has an additional `struct name_entry` that fills in the tree entry on top of the `traverse_path`, which explains how this worked to depth two, but not depth three.

Update in v2
Thanks, -Stolee
cc: stolee@gmail.com
cc: vdye@github.com
cc: gitster@pobox.com
cc: newren@gmail.com