Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor git fetcher #9031

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from
Draft

Conversation

zebreus
Copy link

@zebreus zebreus commented Sep 24, 2023

Motivation

Currently, the git fetcher has a problem with a lot of unexpected behavior that is not well documented (#5291, #7195, #8134, #6929, probably more). It is also nearly impossible to adjust because the code is quite complex. This PR rewrites the built-in git fetcher to improve maintainability and performance. It also tries to document the exact behavior of the git fetcher.

Context

The current git fetcher has three relatively independent code paths for dirty local repos, repos with submodules, and repos without submodules. This PR unifies these paths and removes/documents inconsistencies between them. Previously, it was possible to fetch abbreviated tags (git fetch v0.1.0 instead of full ref git fetch refs/tags/v0.1.0) from local repositories but not remote repositories. #7195 documents how all three paths result in a slightly different result. I am pretty sure that there are more inconsistencies like this.

The current git fetcher only fetches Git references(mostly branches or tags). Fetching a specific revision(commit) always requires knowing and fetching a whole reference that it is part of. The only way to fetch a revision without knowing a reference that contains it is by fetching all references (allRefs flag). Before Git version 2.5, Git only supported fetching references. Since Git version 2.5 (2015), it supports fetching by revision. It is supported by GitHub and GitLab (probably most other Git hosting providers too). This PR adjusts the Git fetcher to fetch revisions instead of references. It improves performance when fetching remote repositories, as Git only checks out the necessary commits. It can reduce the amount of data transferred, especially when fetching older commits. It also makes it possible to do shallow fetches, which save a ton of data.

The changes should also make the git fetcher easier to maintain and make the code less spaghetti.

This PR probably contains breaking changes; I will document them once the code is done. For remote or local repositories at a specific reference or revision, I attempt to have everything that worked previously produce the exact same results. Some things that caused errors before may work after this PR (like fetching a revision without specifying allRefs). I think it is okay to break some edge cases for dirty local repositories, but most normal usage should produce the same result as before. There are a lot of changes to caching behavior.

This PR uses the Git CLI and not libgit2 or gitoxide.

The best way to read the diff is probably to just view it as a new file.

Some related issues

This should be most issues that are currently open and related to this PR.

Tasks

Checklist:

  • Caching between commits works (when fetching a commit with already fetched ancestors only the missing commits are fetched).
  • Compared performance to previous implementation
  • Dirty local, submodule and normal repos behave the same way.
  • All three layers of caching respect settings.tarballTtl.
  • Shallow repos works
  • Submodules works
  • Documentation is updated
  • Debug prints are removed
  • Caching behaviour is documented
  • No path does more than one call to remote git
  • Support SHA256 revisions
  • Investigate fetchGit/fetchTree reproducibility problems #5313
  • Squash commits

Priorities

Add 👍 to pull requests you find important.

It was removed because it cannot handle dirty repos.
However dirty repos are now handled by producing a diff
and generating a tree from the diff. That way they can be treated like
clean repos at the tree.
@github-actions github-actions bot added documentation new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority fetching Networking with the outside (non-Nix) world, input locking labels Sep 24, 2023
@zebreus zebreus changed the title Rewrite git fetcher Refactor git fetcher Sep 24, 2023
@Ericson2314
Copy link
Member

This is a very laudable project! It's always good when people want to tackle fundamental issues. Unfortunately it is not the only big overhaul in the works: @edolstra also has a rewrite using libgit2.

@zebreus
Copy link
Author

zebreus commented Sep 25, 2023

I think the current PR for the libgit2 rewrite is #6530 (At least I did not find a different one). It seems like the changes to the git fetcher in that PR are mostly preserving the core logic but use libgit2 instead of the git binary. This PR goes at the core logic and tries to improve it and fix/document bugs.

I am open to adjusting this PR to use libgit2 and merge with #6530. Alternatively I can also delay with this PR until #6530 is merged.

@edolstra Do you think it is possible to rewrite the git fetcher in a way that only uses libgit2 and does not depend on the git binary or are there any blockers in libgit2?

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-10-20-nix-team-meeting-minutes-96/34557/1

@tomberek
Copy link
Contributor

Related: #9240

@Ericson2314
Copy link
Member

Yes the libgit2 rewrite is now merged, so if you would like to continue working on this now would be an excellent time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
4 participants