repo-server duplicates git packfiles, filling up disk #8845
Comments
The code itself is a little difficult to follow. Here's the important part.
err = gitClient.Init()
// the old way
//err = gitClient.Fetch("")
//err = gitClient.Checkout(revision, false)
// push a commit here
//err = gitClient.Fetch("")
//err = gitClient.Checkout(revision, false)
// observe that the pack file has not been duplicated
// the new way
err = gitClient.Fetch("some-revision")
err = gitClient.Checkout("FETCH_HEAD", false)
// push a commit here
err = gitClient.Fetch("some-revision")
err = gitClient.Checkout("FETCH_HEAD", false)
// observe that the pack file HAS been duplicated
Basically, if you comment out the new way and uncomment the old way, you won't observe the duplicated pack files. I've created a way to reproduce the bug with just git:
I'm not sure what this teaches us.
Okay. So when you run […] git does however support fetching a specific SHA. That will resolve even refs which are not in the default refspec. So when a user specifies […]
Unfortunately, it seems like git isn't very tidy when you fetch specific SHAs like that.
So here is my proposal. Instead of defaulting to fetching specific commits, first just do a standard […]
For users of non-standard refs, this causes a performance hit, because you run […] For users of standard refs, this improves disk usage, because […]
Will put up a PR tomorrow.
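The proposal above could be sketched roughly as follows. This is only an illustration, not the code from the eventual PR: `checkout_revision` is a hypothetical helper name, and the fallback step (fetching the specific revision only when the checkout after a standard fetch fails) is an assumption inferred from the remark about the performance hit for non-standard refs.

```shell
# Sketch only: checkout_revision is a hypothetical helper, not Argo CD code.
# Run inside a repository that has an "origin" remote configured.
checkout_revision() {
    rev="$1"

    # Standard fetch first. It uses the default refspec and, per the
    # observations in this thread, does not pile up pack files.
    git fetch origin || return 1

    # Branches, tags, and commits reachable from them should now
    # be available locally.
    if git checkout "$rev"; then
        return 0
    fi

    # Assumed fallback for refs outside the default refspec: fetch the
    # specific revision and check out whatever was fetched.
    git fetch origin "$rev" || return 1
    git checkout FETCH_HEAD
}
```

With this shape, a revision covered by the default refspec costs one fetch, while a non-standard ref costs two, which matches the trade-off described in the proposal.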
Another demonstration:
mkdir argo-cd
cd argo-cd/
git init
git remote add origin https://github.com/argoproj/argo-cd.git
git fetch origin
git checkout 497e53b0203638409e3083fa2ffac7d8fb3cce14
git fetch origin
git checkout 32be020af0f8bf6438201ee79b4d2b8037c57154
git fetch origin
git checkout 32d33dedcc70d94177384b235891b99d89497273
git fetch origin
git checkout 2e65b42f05bcc1401d1489e751993ec197f6942c
git fetch origin
git checkout b1ff9dbe1e3e3b2520e94eefc77d0322c765cd75
ls .git/objects/pack # shows two files
du -h . # current directory is 96M
cd ..
mkdir argo-cd-fetch
cd argo-cd-fetch/
git init
git remote add origin https://github.com/argoproj/argo-cd.git
git fetch origin 497e53b0203638409e3083fa2ffac7d8fb3cce14
git checkout FETCH_HEAD
git fetch origin 32be020af0f8bf6438201ee79b4d2b8037c57154
git checkout FETCH_HEAD
git fetch origin 32d33dedcc70d94177384b235891b99d89497273
git checkout FETCH_HEAD
git fetch origin 2e65b42f05bcc1401d1489e751993ec197f6942c
git checkout FETCH_HEAD
git fetch origin b1ff9dbe1e3e3b2520e94eefc77d0322c765cd75
git checkout FETCH_HEAD
ls .git/objects/pack # shows ten files
du -sh . # current directory is 244M
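To compare the two directories without eyeballing `ls` output, a small helper along these lines could count the pack files directly. This is a minimal sketch: `count_packs` is a hypothetical name, and it assumes the usual layout of pack files under `.git/objects/pack`.

```shell
# Sketch only: count_packs is a hypothetical helper name.
# Prints the number of *.pack files under <repo>/.git/objects/pack
# (defaults to the current directory; prints 0 if the directory is missing).
count_packs() {
    repo="${1:-.}"
    find "$repo/.git/objects/pack" -type f -name '*.pack' 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `count_packs argo-cd` and `count_packs argo-cd-fetch` can be run from the parent directory. If each pack comes with one matching .idx file, the two and ten files listed above would correspond to one and five packs. `git count-objects -v` reports a similar number in its `packs:` line.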
I asked on StackOverflow why the packfile behavior is so different and got a really interesting answer: https://stackoverflow.com/questions/71618307/why-would-fetching-specific-git-commits-use-more-disk-space-than-fetching-all
Looks like this is a pretty bad regression. We should cherry-pick the fix into v2.3.
fix: prevent excessive repo-server disk usage for large repos (argoproj#8845) (argoproj#8897)
Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: wojtekidd <wojtek.cichon@protonmail.com>
Describe the bug
In version 2.3.1, each time repo-server fetches a new commit from a repo which has a packfile, the packfile is duplicated. So N commits means N packfiles, when git should only have one packfile. If the file is big, this can fill up the repo-server disk.
This doesn't happen in 2.2.7.
To Reproduce
Create an app using a repo which has a packfile. I've been using a repo which has ~70k commits. When I clone that locally, I can see that there's a packfile in .git/objects/pack.
Remote into the repo-server pod and list the files in /tmp/_argocd-repo//.git/objects/pack. You'll have to chmod +rx some directories to get access. There should be one .idx and one .pack file.
Push a new commit to the repo and do a hard refresh on the app. List pack files again, and you'll see an additional .idx and an additional .pack file.
Expected behavior
I expected git to maintain one pack file.
Version
v2.3.1
Logs
I've manually added --verbose and GIT_TRACE=1 to the git calls. There's nothing interesting in the logs as far as I can tell.
I've also commented out the initializer and closer logic that sets repo directory permissions, as well as manually setting the _argocd-repo permissions to rwx. No effect.
Finally I've tried downgrading git to 2.30.2 by building an image based on Ubuntu 21.04. Same bug.
I'm out of hunches.