Add support for sparse git checkouts #11165

Open
sdroege opened this issue Sep 30, 2022 · 5 comments
Labels
A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix

Comments

@sdroege
Contributor

sdroege commented Sep 30, 2022

Problem

Currently if a git repository is listed as a dependency in Cargo.toml then the whole git repository is cloned, which might contain a lot of other unnecessary things.

Proposed Solution

It would be nice if cargo supported sparse git checkouts so that only a specific subdirectory of the repository is cloned. This subdirectory would have to be specified in Cargo.toml together with the dependency.

cargo would then use this subdirectory as the root and assume it to be either a plain crate or a workspace, like it now does for the actual repository root.

Notes

No response

@sdroege sdroege added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Sep 30, 2022
@weihanglo
Member

Thank you for the suggestion. I learnt new stuff today!

As I understand it, with sparse-checkout, git still needs to clone the whole repository index. Only when paired with partial-clone can git avoid a full clone. That means Cargo needs support for both features from libgit2, which Cargo currently depends on for all git-related operations. Those two features are still under consideration/development in libgit2, and you can track them from here and here. Without native support in libgit2, there is little Cargo can do.
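
For reference, the pairing described above (partial clone plus sparse checkout) can be sketched with the stock git CLI. This only illustrates what git itself offers, not anything Cargo or libgit2 does today. The helper name `sparse_partial_clone_argv` and the example URL and paths are hypothetical, and the Python wrapper only builds the command lines without running them.

```python
from typing import List


def sparse_partial_clone_argv(url: str, dest: str, subdir: str) -> List[List[str]]:
    """Build (but do not run) git commands for a blobless, sparse clone.

    Hypothetical helper for illustration only. Assumes a git CLI recent
    enough to provide `clone --filter`, `clone --sparse` and the
    `sparse-checkout` subcommand.
    """
    return [
        # Partial clone: download commits and trees, defer blobs until needed.
        # --sparse starts with a sparse worktree instead of a full checkout.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Narrow the worktree to the requested subdirectory; git fetches the
        # blobs it is missing for that directory from the remote on demand.
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
    ]


if __name__ == "__main__":
    # Example values are placeholders, not a real dependency.
    for argv in sparse_partial_clone_argv(
        "https://example.com/big-repo.git", "big-repo", "crates/foo"
    ):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to actually perform the clone. Without the `--filter` option the second command would still narrow the worktree, but the initial clone would download every blob.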

In the meantime, the Cargo team plans to experiment with replacing some of the git2 functionality in Cargo with gitoxide. The short-term goals of gitoxide don't seem to include either sparse-index or partial-clone. You might want to kindly ask them for their opinions on these features.

@weihanglo weihanglo added the A-git Area: anything dealing with git label Sep 30, 2022
@ehuss ehuss added S-blocked S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix labels Sep 30, 2022
anisse added a commit to anisse/advent2023 that referenced this issue Dec 25, 2023
checkout is a bit long (no sparse, see
rust-lang/cargo#11165 ), but it is faster than
fully building z3.

Latest version supports bindgen's pkg-config and builds on Fedora.
@flying-sheep

Here’s the issue for partial clones: Byron/gitoxide#1046

@Byron
Member

Byron commented Apr 1, 2024

As an update, sparse-indices are supported in the sense that they can be read, and every index interaction does consider them. It's a bit of a fringe feature right now as well, but it's on the radar.

And indeed, partial checkouts would only help so much if they weren't accompanied by a way of reducing the initial download size. Today, a shallow clone, i.e. receiving only the data needed for the most recent commit, can already help (presumably, it also takes more time on the remote to generate).

As a next step, I imagine doing a partial clone with blob filter, so only a single commit (or maybe even the whole history) without any blob is downloaded. Despite being custom-generated, it should be fast as blobs should be the most costly here. Finally, gitoxide (partial) checkouts would have to be partial-repository aware, collect the missing blobs, and download them separately as part of the checkout. That pack would only be the subset of blobs actually needed, which should be good for a speed-boost on all sides.
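
The two download-reducing options mentioned above correspond to existing git CLI flags: `--depth=1` for a shallow clone and `--filter=blob:none` for a partial clone without blobs. The sketch below shows them with the git CLI only; it says nothing about how gitoxide would expose this. The helper name `clone_argv` and its parameters are hypothetical, and it only builds the command line.

```python
from typing import List


def clone_argv(url: str, dest: str, shallow: bool = False, blobless: bool = False) -> List[str]:
    """Build (but do not run) a `git clone` command line.

    Hypothetical helper for illustration only.
    shallow  -> history truncated to the most recent commit.
    blobless -> commits and trees are downloaded, blobs are fetched later
                when a checkout actually needs them.
    """
    argv = ["git", "clone"]
    if shallow:
        argv.append("--depth=1")
    if blobless:
        argv.append("--filter=blob:none")
    argv += [url, dest]
    return argv


if __name__ == "__main__":
    # Placeholder URL; compare the three variants side by side.
    url, dest = "https://example.com/big-repo.git", "big-repo"
    print(" ".join(clone_argv(url, dest, shallow=True)))
    print(" ".join(clone_argv(url, dest, blobless=True)))
    print(" ".join(clone_argv(url, dest, shallow=True, blobless=True)))
```

With both flags set, the clone holds a single commit and no blobs, which is close to the "single commit without any blob" starting point described above. The missing blobs are then downloaded by whatever performs the checkout.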

@ibraheemdev
Member

Could Cargo use the git cli for sparse checkouts instead?

@weihanglo
Member

The idea is that Cargo avoids depending on external binaries. We can't control how external binaries evolve. It might become a compatibility issue. That's why net.git-fetch-with-cli is not the default.
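
For readers who do want Cargo to fetch through the git executable, `net.git-fetch-with-cli` can be enabled per invocation via the `CARGO_NET_GIT_FETCH_WITH_CLI` environment variable. The sketch below shows one way to do that from a script; the helper name `env_with_git_cli` is hypothetical. Note that this only changes how Cargo fetches git dependencies; it does not make Cargo perform sparse checkouts.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_git_cli(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with net.git-fetch-with-cli enabled.

    Hypothetical helper for illustration only. The caller's mapping (or
    os.environ when none is given) is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    # Environment-variable form of the `net.git-fetch-with-cli` config key.
    env["CARGO_NET_GIT_FETCH_WITH_CLI"] = "true"
    return env


if __name__ == "__main__":
    print(env_with_git_cli({"PATH": "/usr/bin"}))
```

A script could then run `subprocess.run(["cargo", "fetch"], env=env_with_git_cli(), check=True)` so that only this one invocation uses the git CLI.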

Slightly off-topic, but there is a more general idea: Cargo could provide a plugin interface for fetching sources. I don't remember whether there is already an existing issue for that.

6 participants