-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor git fetcher #9031
base: master
Are you sure you want to change the base?
Refactor git fetcher #9031
Conversation
It was removed because it cannot handle dirty repos. However dirty repos are now handled by producing a diff and generating a tree from the diff. That way they can be treated like clean repos at the tree.
This is a very laudable project! It's always good when people want to tackle fundamental issues. Unfortunately it is not the only big overhaul in the works: @edolstra also has a rewrite using libgit2. |
I think the current PR for the libgit2 rewrite is #6530 (At least I did not find a different one). It seems like the changes to the git fetcher in that PR are mostly preserving the core logic but use libgit2 instead of the git binary. This PR goes at the core logic and tries to improve it and fix/document bugs. I am open to adjusting this PR to use libgit2 and merge with #6530. Alternatively I can also delay with this PR until #6530 is merged. @edolstra Do you think it is possible to rewrite the git fetcher in a way that only uses libgit2 and does not depend on the git binary or are there any blockers in libgit2? |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-10-20-nix-team-meeting-minutes-96/34557/1 |
Related: #9240 |
Yes the libgit2 rewrite is now merged, so if you would like to continue working on this now would be an excellent time! |
Motivation
Currently, the git fetcher has a problem with a lot of unexpected behavior that is not well documented (#5291, #7195, #8134, #6929, probably more). It is also nearly impossible to adjust because the code is quite complex. This PR rewrites the built-in git fetcher to improve maintainability and performance. It also tries to document the exact behavior of the git fetcher.
Context
The current git fetcher has three relatively independent code paths for dirty local repos, repos with submodules, and repos without submodules. This PR unifies these paths and removes/documents inconsistencies between them. Previously, it was possible to fetch abbreviated tags (
git fetch v0.1.0
instead of full refgit fetch refs/tags/v0.1.0
) from local repositories but not remote repositories. #7195 documents how all three paths result in a slightly different result. I am pretty sure that there are more inconsistencies like this.The current git fetcher only fetches Git references(mostly branches or tags). Fetching a specific revision(commit) always requires knowing and fetching a whole reference that it is part of. The only way to fetch a revision without knowing a reference that contains it is by fetching all references (
allRefs
flag). Before Git version 2.5, Git only supported fetching references. Since Git version 2.5 (2015), it supports fetching by revision. It is supported by GitHub and GitLab (probably most other Git hosting providers too). This PR adjusts the Git fetcher to fetch revisions instead of references. It improves performance when fetching remote repositories, as Git only checks out the necessary commits. It can reduce the amount of data transferred, especially when fetching older commits. It also makes it possible to do shallow fetches, which save a ton of data.The changes should also make the git fetcher easier to maintain and make the code less spaghetti.
This PR probably contains breaking changes; I will document them once the code is done. For remote or local repositories at a specific reference or revision, I attempt to have everything that worked previously produce the exact same results. Some things that caused errors before may work after this PR (like fetching a revision without specifying allRefs). I think it is okay to break some edge cases for dirty local repositories, but most normal usage should produce the same result as before. There are a lot of changes to caching behavior.
This PR uses the Git CLI and not libgit2 or gitoxide.
The best way to read the diff is probably to just view it as a new file.
Some related issues
This should be most issues that are currently open and related to this PR.
git ls-remote url ref
and picking the first result. because fetching from references is unpredictable anyway. Keep in mind that this can lead to weird caching issues when there are refs with the same name as tags. This PR also adds documentation for the exact behavior.sha256:
still fails and assumes sha1 #6496: Support for sha256Nix.fetchGit
surprisingly usesexport-ignore
git attribute #7195: The problem with inconsistent behavior. This PR documents and unifies that. Previously, clone-like behavior was used for everything with submodules and dirty local repos, and archive-like behavior for everything else.sourceInfo
when usinggit+
flake reference vs.github
#6929Tasks
Checklist:
settings.tarballTtl
.Priorities
Add 👍 to pull requests you find important.