-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port all git functionality to use git CLI #3833
Conversation
db17525
to
a8686e2
Compare
Do we introduce a specific git cli minimum version requirement this way? I remember graphite breaking because my linux distro git was to old for their wrapper. |
CodSpeed Performance ReportMerging #3833 will degrade performances by 9.61%Comparing Summary
Benchmarks breakdown
|
3a8b6cb
to
b631841
Compare
@konstin I don't think this relies on anything even relatively new, but I can audit the commands and see the minimum version we require. |
Benchmarks are pretty inconsistent but it seems like overall this change introduces a ~5% regression, which is expected because we have to shell out to Git quite a bit. We may be able to improve this somewhat taking into account the new cost (e.g. some of the |
#[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub struct GitOid { | ||
len: usize, | ||
bytes: [u8; 40], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libgit2
hex encodes object IDs to reduce their size (by half). However we can't use the hex
crate directly because we would need to support odd-length strings, so I left it out for now. We could also just store the IDs as String
s and do a bit of extra cloning.
// Skip anything related to templates, they just call all sorts of issues as | ||
// we really don't want to use them yet they insist on being used. See #6240 | ||
// for an example issue that comes up. | ||
// opts.external_template(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't seem to find a way to do this with the CLI...
crates/uv-git/src/git.rs
Outdated
git_config.set_bool("core.autocrlf", false)?; | ||
} | ||
// if let Ok(mut git_config) = self.repo.config() { | ||
// git_config.set_bool("core.autocrlf", false)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also unsure if we need to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just remove this. We don't support vendoring right? Unless I'm misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What sort of vendoring are you thinking of? core.autocrlf
configures whether git automatically converts line endings (CRLF to LF) for checked out files. I think this can still affect us, but I'm not sure whether it matters (or why it matters for Cargo)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you mean cargo vendoring. Yeah that looks to be why they needed this, it shouldn't affect us. I'll remove it.
} | ||
|
||
let mut out = [0; 40]; | ||
out[..s.len()].copy_from_slice(s.as_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to validate that this a valid hex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, git will end up failing if we try to identify a commit using a malformed/broken hash anyways. I guess we could for better error messages, but it seems unlikely anyways?
@@ -970,10 +458,6 @@ pub(crate) fn fetch( | |||
} | |||
}; | |||
|
|||
maybe_gc_repo(repo)?; | |||
|
|||
clean_repo_temp_files(repo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove these two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those were linked to libgit2 issues. The git cli runs garbage collection automatically on git fetch
, git commit
, and other commands, so it shouldn't be a problem now.
} | ||
let result = match self { | ||
// Resolve the commit pointed to by the tag. | ||
Self::Tag(s) => repo.rev_parse(&format!("refs/remotes/origin/tags/{s}^0")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the ^0
signify here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^0
is short for ^{commit}
, which recursively "peels" to the underlying commit object. e.g. if the tag points to another tag, which then points to a commit. I'm not 100% sure if we need to do this, but the previous code had calls to peel(ObjectType::Commit)
, so this retains the original behavior. I'll add a more detailed comment explaining this.
So just confirming -- the overall scheme here is the same, right? We have one checkout for Git repo, and then we fetch individual commits and clone them into separate directories to build? |
crates/uv-git/src/git.rs
Outdated
ProcessBuilder::new("git") | ||
.arg("init") | ||
.cwd(path) | ||
.exec_with_streaming(&mut silent, &mut silent, true)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any risk of dropping important output here? (What did we do with Git CLI output before, when fetching?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same thing we've always done in fetch_with_cli
. The true
means that stderr output will still be captured in the error message if the command fails, the silent
callbacks just make sure that the stdout/stderr output is not piped to the user's terminal (Command::{stdout,stderr}
are overridden).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we can use exec_with_output
here instead, which is a bit clearer.
Yes, there should be no change in behavior. |
Summary
We currently rely on libgit2 for most git-related functionality. However, libgit2 has long-standing performance issues, as well as lags significantly behind git in terms of new features. For these reasons we now use the git CLI by default for fetching repositories (#1781). This PR completely drops libgit2 in favor of the git CLI for all git-related functionality, which should allow us to use features such as partial clones and sparse checkouts in the future for performance.
There is also a lot of technical debt in the current git code as it's mostly taken from Cargo. Switching to the git CLI vastly simplifies the
uv-git
codebase.Eventually we might want to look into switching to
gitoxide
, but it's currently too immature for our use case.Test Plan