Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] sparse-checkout: basics of builtin #171

Closed

Conversation

derrickstolee
Copy link
Collaborator

Implements init, add, and list. Save 'remove' for later. Creating the PR to get initial feedback, build an installer, and do some testing in the microsoft/[redacted] repo.

sparse-checkout.c Outdated Show resolved Hide resolved
@kewillford
Copy link
Member

I realize that this is a WIP and not necessarily what will go upstream so this is mainly a suggestion of how we could get it more quickly accepted upstream.

I think we need to help how sparse-checkout works as is and then enhance what the built-in does from there. So the first cut would add and remove entries as is to the sparse-checkout and turn the feature on and off. Not sure if we would turn on as soon as someone adds something and turn off if the last entry is removed or have a different command they have to run.

Turn on could just be setting the core.sparseCheckout to true and add the automatic "checkout" later.

Turning off is a little more tricky because it would be nice if we could help

Another tricky thing is fully repopulating the working directory when you no longer want sparse checkout. You cannot just disable "sparse checkout" because skip-worktree bits are still in the index and your working directory is still sparsely populated. You should re-populate the working directory with the $GIT_DIR/info/sparse-checkout file content as follows:
/*

So we would have to add only /* to the sparse-checkout, run a checkout, and then remove the sparse-checkout and turn off the core.sparseCheckout

I would think upstream would have little push back since it is really helping the sparse-checkout feature. We could then add more "enhancements" on top of it.

@derrickstolee
Copy link
Collaborator Author

I realize that this is a WIP and not necessarily what will go upstream so this is mainly a suggestion of how we could get it more quickly accepted upstream.

I appreciate the feedback, @kewillford!

I think we need to help how sparse-checkout works as is and then enhance what the built-in does from there. So the first cut would add and remove entries as is to the sparse-checkout and turn the feature on and off. Not sure if we would turn on as soon as someone adds something and turn off if the last entry is removed or have a different command they have to run.

You are right that we should give more respect to the feature and how it works right now, then build on it. The problem I'm balancing right now is that we need a "restricted patterns" mode for faster performance. My current focus is on making that work first, then generalizing to the full feature.

Turn on could just be setting the core.sparseCheckout to true and add the automatic "checkout" later.

Perhaps the core.sparseCheckout variable should generalize to have false, true, and cone, where the new cone value means we only allow the patterns that are fast. Then, git sparse-checkout init --cone could work the way I have it right now, while without --cone would work the way you suggest (and include /* by default).

Turning off is a little more tricky because it would be nice if we could help

Another tricky thing is fully repopulating the working directory when you no longer want sparse checkout. You cannot just disable "sparse checkout" because skip-worktree bits are still in the index and your working directory is still sparsely populated. You should re-populate the working directory with the $GIT_DIR/info/sparse-checkout file content as follows:
/*

So we would have to add only /* to the sparse-checkout, run a checkout, and then remove the sparse-checkout and turn off the core.sparseCheckout

I have no problem with this idea: call it git sparse-checkout disable? Would the init instead want to be enable?

I would think upstream would have little push back since it is really helping the sparse-checkout feature. We could then add more "enhancements" on top of it.

Your ideas are a great first step. I do want some feedback on the full vision before getting anything else upstream, so I'll prepare an RFC whose first commits do as you suggest, but later commits give the full feature (including the performance boosts).

return -1; /* undecided */
}

strbuf_addch(&parent_pathname, '/');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilbaker here is where the new "fast" pattern matching logic is.

Implements init, add, and list. Save 'remove' for later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee added a commit that referenced this pull request Aug 22, 2019
I'm creating this PR for a fourth time (see #163, #171, and #178 for earlier versions). This version is tracking my progress to create something that can be sent as an RFC upstream, but also can be used to start the sparse feature branch. This is based on v2.23.0.

Note: I currently have conflicts with the virtual filesystem feature, and I'll resolve those with a merge commit when I'm ready. I'm just creating this for tracking progress at the moment, but can also be a place for early feedback.

* git sparse-checkout disable to disable the sparse-checkout feature and return to a full checkout.
* git sparse-checkout init --cone, to initialize in cone mode.
* git sparse-checkout add when core.sparseCheckout=cone.

This includes performance improvements with the hashset-based matching algorithm, but I'll leave further enhancements as smaller steps on top. Here are a few things I want to try:

Track the maximum depth of a prefix pattern, so we know to not run hashes for deeper paths.
Use the "known exclude" bit in cone mode to stop hashing paths we know will not be included.
Use the "known include" bit in cone mode to stop hashing paths we know will be included. This is more difficult than "known exclude" because we need to distinguish directories from files when doing path matches so we don't give a directory a "known include" when it isn't a recursive pattern match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants