Sparse Index: fix a checkout bug with deep sparse-checkout paths #473


@derrickstolee derrickstolee commented Dec 4, 2021

We got multiple similar reports of failures with the sparse index (#473 as an issue, and another regarding `git checkout` via email). Both were hitting a similar set of paths, which was a hint.

We didn't hit this before because it requires the following:

  1. The sparse-checkout definition needs to have recursive inclusion of deep folders (depth 3 or more).
  2. _Adjacent_ to those deep folders, we need a deep sparse directory entry that receives changes.
  3. In this particular repo, deep directories are added to the sparse-checkout only on rare occasions, and the folders adjacent to them are rarely updated. Those folders happened to be updated this week, and the changes hit our sparse index dogfooders in surprising ways.

The first commit adds a test that fails without the fix. It requires modifying our test data to make adjacent, deep sparse directory entries possible. Once that data change is in place, the test itself is simple.

The second commit includes the actual fix. It's really just a misunderstanding of the difference between the `name` and `traverse_path` members of `struct traverse_info`: `name` stores only a single tree entry name, while `traverse_path` includes the full path from the root. The method we are editing also takes a `struct name_entry` that supplies the tree entry on top of `traverse_path`, which explains why this worked to depth two, but not depth three.
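To make the `name` versus `traverse_path` distinction concrete, here is a minimal sketch of a recursive tree walk. The `struct walk_info` type and the `descend()` helper are hypothetical: they only mirror the two roles described above (last path component versus full path from the root, with a trailing `/`), and they are not Git's actual `struct traverse_info` or its traversal code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the two roles described for the
 * members of struct traverse_info: 'name' holds only the last path
 * component, while 'traverse_path' holds the full path from the root,
 * with a trailing '/'. This is not Git's real struct.
 */
struct walk_info {
	const char *name;
	char traverse_path[256];
};

/* Fill 'child' for the directory 'component' directly below 'parent'. */
static void descend(const struct walk_info *parent, const char *component,
		    struct walk_info *child)
{
	int n = snprintf(child->traverse_path, sizeof(child->traverse_path),
			 "%s%s/", parent->traverse_path, component);

	/* This sketch does not handle paths longer than the buffer. */
	assert(n >= 0 && (size_t)n < sizeof(child->traverse_path));
	child->name = component;
}
```

Walking from the root into `deep` gives `name` = `deep` and `traverse_path` = `deep/`, so the two still agree. One level further, in `deeper1`, `name` has lost the `deep/` prefix while `traverse_path` is `deep/deeper1/`, so any path rebuilt from `name` is wrong from that point down.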

Resolves #473
See also gitgitgadget#1092

Edit: an earlier version included some test cleanup that isn't necessary here.

Extend the repository data in the setup of t1092 to include more
directories within two parent directories. This reproduces a bug found
by users of the sparse index feature with suitably complicated
sparse-checkout definitions.

Add a test that fails in its first 'git checkout deepest' run in the
sparse index case with this error:

  error: Your local changes to the following files would be overwritten by checkout:
          deep/deeper1/deepest2/a
          deep/deeper1/deepest3/a
  Please commit your changes or stash them before you switch branches.
  Aborting

The next change will fix this error, and that fix will make it clear why
the extra depth is necessary for revealing this bug. Including
deep/deeper1/deepest, a sibling of deep/deeper1/deepest2, in the
sparse-checkout definition is important: it ensures that deep/deeper1 is
not a sparse directory entry, but deep/deeper1/deepest2 is.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
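Why the sibling matters can be sketched with a simplified model of cone-mode sparse-checkout, assuming that a directory stays expanded when it lies inside a recursively included directory or is an ancestor of one, and otherwise may be collapsed into a sparse directory entry. The helpers `is_at_or_below()` and `collapses_to_sparse_dir()` are hypothetical; Git's real pattern matching handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'path' equals 'dir' or lies below it ("a/b" is below "a"). */
static bool is_at_or_below(const char *path, const char *dir)
{
	size_t len = strlen(dir);

	return !strncmp(path, dir, len) &&
	       (path[len] == '\0' || path[len] == '/');
}

/*
 * Hypothetical, simplified cone-mode model: a directory can be collapsed
 * into a sparse directory entry only if it is neither inside a recursively
 * included directory nor an ancestor of one.
 */
static bool collapses_to_sparse_dir(const char *dir,
				    const char *const *cone, size_t nr)
{
	size_t len = strlen(dir);

	/* Convention of this sketch: non-empty paths without a trailing '/'. */
	assert(len > 0 && dir[len - 1] != '/');

	for (size_t i = 0; i < nr; i++) {
		if (is_at_or_below(dir, cone[i]) || is_at_or_below(cone[i], dir))
			return false;
	}
	return true;
}
```

With only deep/deeper1/deepest in the cone, deep/deeper1 stays expanded because it is an ancestor of an included directory, while its sibling deep/deeper1/deepest2 collapses. That yields a sparse directory entry at depth three, which is exactly the shape the new test needs.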

@ldennington ldennington left a comment


Thank you for the detailed explanations in your commit messages, and the test case that made it clear your fix is working. Great work!

derrickstolee added a commit that referenced this pull request Dec 6, 2021
v2.34.1 was released quickly after v2.34.0. The changes are minor, so we can merge them in without too much worry.

We should delay merging until we know whether we want to include this with the release that includes #473.

The sparse_dir_matches_path() method compares a cache entry that is a
sparse directory entry against a 'struct traverse_info *info' and a
'struct name_entry *p' to see if the cache entry has exactly the right
name for those other inputs.

This method was introduced in 523506d (unpack-trees: unpack sparse
directory entries, 2021-07-14), but included a significant mistake. The
path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while
'info->traverse_path' stores the full path from root, this method does
not work when 'info' describes a directory that is itself inside another
directory. Using the correct strings and their corresponding lengths
makes the method work properly.

The previous change included a failing test that exposes this issue.
That test now passes. The critical detail is that, deep within
unpack_trees(), the logic for merging a sparse directory entry with a
tree entry during 'git checkout' relies on sparse_dir_matches_path() to
avoid calling traverse_trees_recursive() during unpack_callback() in
this hunk:

	if (!is_sparse_directory_entry(src[0], names, info) &&
	    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
					    names, info) < 0) {
		return -1;
	}

For deep paths, the short-circuit never occurred, so
traverse_trees_recursive() was called when it should not have been, and
that caused other strange issues. Specifically, the error message from
the now-passing test previously included this:

      error: Your local changes to the following files would be overwritten by checkout:
              deep/deeper1/deepest2/a
              deep/deeper1/deepest3/a
      Please commit your changes or stash them before you switch branches.
      Aborting

These messages occurred because the 'current' cache entry in
twoway_merge() was NULL: the index did not contain entries for the paths
inside the sparse directory entries. Instead, 'oldtree' was the entry at
HEAD and 'newtree' was the entry in the target tree, which led
reject_merge() to list these paths.
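A much-simplified model of that branch, assuming only the case where the index has no entry for the path ('current' is NULL). The `struct entry` type, the `enum merge_outcome` values, and `merge_without_index_entry()` are hypothetical, and entries are compared by a made-up object ID string alone; Git's twoway_merge() compares more than that and handles many other cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry: just a path and an object ID. */
struct entry {
	const char *path;
	const char *oid;
};

enum merge_outcome {
	MERGE_KEEP,     /* HEAD and target agree: leave the path as it is */
	MERGE_TAKE_NEW, /* write the entry from the target tree */
	MERGE_DELETE,   /* the path is gone from the target tree */
	MERGE_REJECT    /* report "Your local changes ... would be overwritten" */
};

/*
 * Simplified model of one two-way merge step when the index has no entry
 * for the path (current == NULL): a path present in both HEAD and the
 * target looks like a staged deletion, so a change between them is
 * rejected instead of merged.
 */
static enum merge_outcome merge_without_index_entry(const struct entry *oldtree,
						    const struct entry *newtree)
{
	/* At least one side must contain the path. */
	assert(oldtree || newtree);

	if (newtree) {
		if (oldtree)
			return strcmp(oldtree->oid, newtree->oid) ?
			       MERGE_REJECT : MERGE_KEEP;
		return MERGE_TAKE_NEW;
	}
	return MERGE_DELETE;
}
```

Under this model, files inside deep/deeper1/deepest2/ that differ between HEAD and the target are rejected simply because the index held only the sparse directory entry, not per-file entries, which is consistent with the spurious "local changes" error quoted above.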

Now that sparse_dir_matches_path() works the same for deep paths as it
does for shallow depths, the rest of the logic kicks in to properly
handle modifying the sparse directory entries as designed.

Reported-by: Gustave Granroth <gus.gran@gmail.com>
Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Collaborator Author

Just pushed a quick comment update as requested by @newren upstream.

@derrickstolee derrickstolee merged commit ed66b20 into microsoft:vfs-2.34.0 Dec 6, 2021
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022