
fix(repo-server): excess git requests, add shared cache lock on revisions (Issue #14725) #17109

Merged: 6 commits merged on Mar 25, 2024


@nromriell (Contributor) commented Feb 7, 2024

Additional Fixes #14725

Third part of the split of #16309, and a follow-up to #16410.

Another piece of the fixes for #14725. This resolves the excess ls-remote requests that still occur in environments with a high application count. Because all active processes get their cache invalidated at the same time, they all try to re-fetch the remotes simultaneously and all miss the cache.

The solution is to add a soft, cache-level lock on the key: the first process to access that key on a miss obtains the lock and refreshes the cache.

This change required adding the ability to use "set only if not exists" on the util/cache interface (Redis NX or memcached add). Without this option, all of the processes could still end up believing they own the lock, so there are some minor changes to the util/cache interface:

  • Adds a DisableOverwrite option to represent the NX/add during set option
  • The Item struct now also contains options for the cache set actions. This allows passing Delete and DisableOverwrite in a shared options value, which takes the place of the separate delete option previously passed to the SetItem call

This is the final major logic change needed to remove all the excess git calls, but we can get some good performance improvements by also adding the two-layer cache to the caching layer for revisions, so I'll follow up with another PR for that. Most of the abstraction needed to prepare for it was handled as part of this PR.

Baseline without these changes, with 200 multi-source applications that each have a ref-only source:
[Screenshot from 2023-11-18 18-24-30]

With all the changes up to and including this PR, under the same conditions:
[Screenshot from 2023-11-21 03-19-14]

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

@ishitasequeira (Member) left a comment

@nromriell Thanks for following up with these improvement PRs.

Took an initial quick look at the PR and left some minor comments. I'll test this PR and share feedback.

reposerver/cache/cache.go (outdated, resolved)
util/git/client.go (outdated, resolved)

@ishitasequeira (Member) left a comment

Overall the PR looks great. @nromriell, just minor comments.

@leoluz @crenshaw-dev Could you provide an additional set of eyes to review in case I overlooked anything?

util/cache/client.go (outdated, resolved)
util/git/client.go (outdated, resolved)
reposerver/cache/cache.go (outdated, resolved)
reposerver/cache/cache.go (outdated, resolved)
util/git/client.go (outdated, resolved)
util/git/client.go (outdated, resolved)
reposerver/cache/cache.go (outdated, resolved)
reposerver/cache/cache.go (resolved)
util/git/client.go (outdated, resolved)

@nromriell (Contributor, Author) commented:

With the refactor, I need to re-run the tests with the changes live. I'll try to take a look at that tonight.

reposerver/cache/cache.go (outdated, resolved)
reposerver/repository/repository_test.go (outdated, resolved)
util/cache/cache.go (outdated, resolved)

@ishitasequeira (Member) left a comment

@nromriell left some minor nits.

@agaudreault do you have any other concerns on the PR?

@agaudreault (Member) commented:

No new comments, LGTM :) Good work and good refactor 👍

Signed-off-by: nromriell <nateromriell@gmail.com>

@nromriell (Contributor, Author) commented:

Thanks @ishitasequeira @agaudreault for all your time on these great reviews!

@ishitasequeira, I've addressed that PR feedback, and I was finally able to test this out. It's looking good to me after fixing one more issue I found, where the cmd.Flags binding for revisionCacheLockTimeout was writing to the wrong variable.

I had to change the rollup period to actually show the difference between 2.10.4 and this change, since it includes the previous work, but here are the metrics comparing 2.10.4 and this PR. The switchover is at about the 10:37 mark.
[Screenshot from 2024-03-23 11-03-02]

@ishitasequeira (Member) left a comment

Thanks @nromriell for addressing the comments. I missed this one last check earlier.

Comment on lines 229 to 230:
case err != nil && err != ErrCacheMiss:
	return "", err

Member:

Would it make sense to log the err if it's not ErrCacheMiss?

Contributor Author:

As is, this will always be logged at debug level in util/git/client.go#getRefs, but since it is treated as a critical failure, it definitely seems like a good idea to log it here as well to make sure it gets logged. Added an error log for it.

Signed-off-by: nromriell <nateromriell@gmail.com>

@ishitasequeira (Member) left a comment

Thanks @nromriell, LGTM!!

@agaudreault (Member) left a comment

LGTM! Good job 💪

@ishitasequeira merged commit c4fdc54 into argoproj:master on Mar 25, 2024 (27 checks passed)
lyda pushed a commit to lyda/argo-cd that referenced this pull request Mar 28, 2024
…ions (Issue argoproj#14725) (argoproj#17109)

* fix(repo-server): excess git requests, cache lock on revisions

Signed-off-by: nromriell <nateromriell@gmail.com>

* fix: pr feedback, simplify, add configurable variable

Signed-off-by: nromriell <nateromriell@gmail.com>

* fix: codegen, lint

Signed-off-by: nromriell <nateromriell@gmail.com>

* fix: test print, no opts set, var type nit

Signed-off-by: nromriell <nateromriell@gmail.com>

* chore: add additional logging for unexpected cache error

Signed-off-by: nromriell <nateromriell@gmail.com>

---------

Signed-off-by: nromriell <nateromriell@gmail.com>
Co-authored-by: Ishita Sequeira <46771830+ishitasequeira@users.noreply.github.com>
Signed-off-by: Kevin Lyda <kevin@lyda.ie>
mkieweg pushed a commit to mkieweg/argo-cd that referenced this pull request Jun 11, 2024 (same commit message as above)
Hariharasuthan99 pushed a commit to AmadeusITGroup/argo-cd that referenced this pull request Jun 16, 2024 (same commit message as above)
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Excessive requests to git repository on commit to monorepo w/ multi-source apps
3 participants