Make `git fetch` behavior configurable #8938
@yujunz thanks for following up on this! A couple of questions:
@crenshaw-dev thanks for getting back to this topic. The Gerrit model is quite different from GitHub/GitLab: the default checkout command for Gerrit fetches the change ref directly rather than a branch (see the sketch below).

Gerrit creates a new ref for every commit uploaded for review, under `refs/changes/*`. The performance hit of fetching the default refspec first is an extra fetch on every checkout. Do you think running `git gc` periodically would address the disk-usage concern?
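For context, a typical Gerrit change checkout fetches the change ref and checks out FETCH_HEAD; a sketch using the same `ssh://git:29418/repo` remote that appears in the timings below, with placeholder change/patchset numbers:

```sh
# Placeholders: change 12345, patchset 1. Gerrit's scheme is
# refs/changes/<last two digits of change>/<change number>/<patchset>.
git fetch ssh://git:29418/repo refs/changes/45/12345/1
git checkout FETCH_HEAD
```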
Can you quantify that doubling in absolute terms? It'll be easier to advocate for this change if we can say "checkout is taking 1000ms instead of 500ms, and the extra 500ms are a problem because of ____." Without that analysis, we risk pushing for something that looks like premature optimization.
There are a few options I've thought of for gc:
1. A gc after every fetch would 1) slow down GenerateManifest and 2) potentially prevent parallel repo access (slowing down other GenerateManifest calls).
2. A gc after every N fetches would 1) require the user to research their repo's packfile sizes and configure N to keep disk usage in a tolerable range and 2) pick "loser" GenerateManifest requests to take the gc performance hit.
3. A gc every M seconds would lock the repo at potentially inconvenient times (such as periods of high load), and it wouldn't offer any disk usage guarantees (a burst of commits could exceed the tolerable range).

I think all of those are problematic enough to justify looking for alternatives.
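For reference, git itself already implements an "after roughly N objects" policy through auto-gc, which `git fetch` invokes implicitly; a sketch of the relevant knobs (the values shown are git's stock defaults, listed only for illustration):

```sh
# Auto-gc fires once roughly this many loose objects have accumulated...
git config gc.auto 6700
# ...or once more than this many packfiles exist.
git config gc.autoPackLimit 50
# No-op unless one of the thresholds above is exceeded; fetch runs this automatically.
git gc --auto
```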
Yep, that makes sense! I think your initial suggestion is the most compelling: add a field to the Application spec to configure whether we fetch specific commit SHAs or just do a revisionless fetch.
Good point. I just ran a test on my local machine.

Fetch default:

```
❯ time git fetch origin
From ssh://git:29418/repo
   0b92f49d5f..683c914127  main -> origin/main
git fetch origin  0.05s user 0.05s system 1% cpu 6.682 total

❯ time git fetch origin --verbose
From ssh://git:29418/repo
 = [up to date]      main             -> origin/main
...
 = [up to date]      wip/yujunz/topic -> origin/wip/yujunz/topic
git fetch origin --verbose  0.05s user 0.04s system 1% cpu 6.825 total
```

Then fetch revision:

```
❯ time git fetch origin 1b6309b0538f610cf42706b7e45745215e78f6bb
remote: Counting objects: 35576, done
remote: Finding sources: 100% (8/8)
remote: Total 8 (delta 6), reused 8 (delta 6)
Unpacking objects: 100% (8/8), 724 bytes | 2.00 KiB/s, done.
From ssh://git:29418/repo
 * branch            1b6309b0538f610cf42706b7e45745215e78f6bb -> FETCH_HEAD
git fetch origin 1b6309b0538f610cf42706b7e45745215e78f6bb  0.16s user 0.19s system 2% cpu 11.820 total

❯ time git fetch origin 1b6309b0538f610cf42706b7e45745215e78f6bb
From ssh://git:29418/repo
 * branch            1b6309b0538f610cf42706b7e45745215e78f6bb -> FETCH_HEAD
git fetch origin 1b6309b0538f610cf42706b7e45745215e78f6bb  0.10s user 0.07s system 1% cpu 11.798 total
```

This is typical latency for a cross-region fetch. It surprised me that the second fetch is no faster than the first, and that fetching the default refspec does not help the subsequent revision fetch. IIUC, the latency comes from local work, remote work, round trips, and network transfer. Performance should be much better on the server side, since it is co-located with the git repository and much more powerful than my Mac; I'd expect the latency there to be no more than a few seconds.
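As a side note, if it helps to see where those seconds go, git's tracing variables can break a fetch down by subprocess and protocol exchange; a quick debugging sketch (not something requested above):

```sh
# Show subprocess invocations and wire-protocol packets during the fetch.
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin 1b6309b0538f610cf42706b7e45745215e78f6bb

# With a reasonably recent git, trace2 reports per-region timings instead.
GIT_TRACE2_PERF=1 git fetch origin
```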
I was thinking about a weekly job over the weekend or a daily job overnight, i.e. during an idle period. It is not a time-critical job anyway.
Good to know!
I guess there could be a /gc API endpoint, so the user could configure a CronJob that matches their needs. Are you facing significant enough performance issues to justify writing a PR with the new Application config field? Unless it's become a problem for a large number of users, I don't know if I'll be able to dedicate dev time to it any time soon.
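Purely to illustrate that idea (the endpoint and service address below are hypothetical; no such API exists in Argo CD today), the CronJob's container would only need to run something like:

```sh
# Hypothetical /gc endpoint on the repo-server, shown for illustration only.
curl -sf -X POST \
  -H "Authorization: Bearer $ARGOCD_TOKEN" \
  https://argocd-repo-server.argocd.svc/api/v1/gc
```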
Not yet, we are still on the previous release, which fetches the revision and then falls back to the default. So it's not an urgent issue for us right now.
For us this is also an issue, and since we have to run v2.3.4 because of the security fixes, there is no way back to pre-2.3.3 releases. Of the proposed options, 2 and 3 would work for us (2: gc after every N fetches; 3: gc every M seconds). FMPOV the default setup should work for regular cases; if you have a very specific demand for when to run GC, the proposed /gc endpoint would cover the on-demand case.
Would anyone be up for contributing a PR? I'd be happy to review.

Can we please revert this performance improvement? For us it's now not working at all.
@MichaelSp unfortunately I don't think reversion is an option. The change wasn't only a performance improvement; it resolved issues where folks' disks were filling up and crashing the repo-server. At Intuit we had to downgrade an Argo CD instance until this patch was released, because it was causing issues for an internal customer.
My understanding from the conversation above is that the change is only an issue for high-latency (typically cross-region) fetches, because the second call introduces more round trips. I'm not sure what you mean when you say that "it's not working at all." Are fetches completely broken, or is the additional latency causing some issue?
Some repos we fetch by branch and some we fetch by a specific SHA. All SHA fetches no longer work.

Example by branch ✅:

```yaml
project: service
source:
  repoURL: 'https://github/org/service'
  path: service/dev
  targetRevision: main
```

Example by SHA ❌:

```yaml
project: operator
source:
  repoURL: 'https://github/org/service'
  path: operator/one
  targetRevision: 8b5b0099a0298f3e95320b78cd4bd6e54af43f85
```

Error message:
But cloning the repo locally, it is perfectly possible to check out that commit (see the sketch below). Conclusion: the error message is correct, there is no ref like that in the tree because it has not been fetched.
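For comparison, the local check described above amounts to roughly the following (using the redacted repo URL and the SHA from the example spec):

```sh
git clone 'https://github/org/service'
cd service
# This succeeds after a full clone, so the commit exists and is reachable from an advertised ref.
git checkout 8b5b0099a0298f3e95320b78cd4bd6e54af43f85
```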
@MichaelSp can you confirm that this worked before #8897? The exact same actions should be taken, just in a different order.

Before (abbreviated):

```go
err = gitClient.Fetch(revision)
if err != nil {
	err = gitClient.Fetch("")
	err = gitClient.Checkout(revision, submoduleEnabled)
}
```

After (abbreviated):

```go
err = gitClient.Fetch("")
err = gitClient.Checkout(revision, submoduleEnabled)
if err != nil {
	// Looks like this is where it's failing for you. But this would have been the _first_ thing called before.
	err = gitClient.Fetch(revision)
}
```
I can't run particular builds, only releases:
I guess I have some debugging to do. Thanks @crenshaw-dev for putting me on the right track. 😢 I'll come back with my findings.
Apologies 🤦 I was running an outdated version.
Summary

In the latest release, v2.3.3, `git fetch` behavior was changed by #8897 to fetch the default refspecs first. This causes a performance hit for Gerrit servers, which keep in-review commits in `refs/changes/*`, outside the default refspec.

Motivation
The 👍s on #5605 show that the use case of fetching commits outside the default refspec is not negligible.

Our deployment model depends heavily on unmerged commits, especially for frequently deployed staging environments. Fetching the default refspec and then checking out fails on almost every deploy, while fetching the specific ref succeeds whether or not it is in the default refspec. The disk usage is a fair tradeoff for performance for us.

I checked some of our repositories that use the fetch-commit-first mechanism. The frequently used repository does cost more disk space (553M / 11 files vs 6.4G / 41 files). Despite the high ratio (~8x by size, ~4x by files), the absolute disk usage is not huge, and `git gc` effectively reduces it to 540M / 3 files in a few seconds.
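For completeness, the kind of numbers above can be reproduced with stock git commands; a quick sketch (run inside the cached repository):

```sh
du -sh .git/objects        # total size of the object store
git count-objects -v -H    # loose objects, pack count, and pack sizes
git gc                     # repack, then re-run the two commands above to compare
```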
Proposal

1. Make `git fetch` behavior configurable. It doesn't have to be exposed in the UI; a boolean field in the spec would be good enough as a feature flag, since it is a one-time configuration and most admins should be comfortable setting it via the CLI or declarative configuration.
2. Run `git gc` periodically in the repo server to reduce disk usage if that is a concern. This way we get both checkout performance and efficient disk usage, at the relatively low price of a background job during idle periods.