Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning file-based package cache is slow, research if/when we need verify step #29795

Closed
4 tasks done
HonkingGoose opened this issue Jun 21, 2024 · 1 comment · Fixed by #29860
Closed
4 tasks done
Labels
priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others regression Issue about a regression bug, or the PR caused it type:refactor Refactoring or improving of existing code

Comments

@HonkingGoose
Copy link
Collaborator

HonkingGoose commented Jun 21, 2024

Describe the proposed change(s).

Intro

Discussion #29767 describes how this PR may have a regression:

The PR appears to cause a major slowdown on self-hosted users, especially for those using long living containers (CE/EE).

Reproduction

Quotes from the discussion:

The bug reporter said:

This performance hit is pretty consistent for all scan. I even have an example where cleaning the file-based package cache took almost 40 seconds for 1200~ cache entries, whereas prior to the fix, the time taken was much more reasonable.

Now this issue becomes even bigger for long living containers such as Renovate CE/EE.
One workaround is to use the redis/sqlite cache.

Main question is: is this considered a bug? should we change the default package cache to sqlite based? Any other suggestions?

@rarkins responded:

#28169 (comment) could be relevant.

If we rm.content then we shouldn't need to do a verify. It might be the verify which takes all the time

Todos:

  • Confirm the verify step is slower now
  • Research if we need to run a verify after rm.content?
  • Research if we still want to do a verify from time to time to stop the cache growing continuously?
  • Open PR to speed up behavior, or close issue if there's no good solution.
@HonkingGoose HonkingGoose added priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:refactor Refactoring or improving of existing code regression Issue about a regression bug, or the PR caused it labels Jun 21, 2024
@RahulGautamSingh
Copy link
Collaborator

RahulGautamSingh commented Jun 22, 2024

Confirm the verify step is slower now

Yes.
In the referenced discussion author did confirm this. I also tested this locally and after removing verify call and using rm.content the time takes was reduced significantly. ( from 2.9s to .3s)

Research if we need to run a verify after rm.content?

No.
rm.content will remove the content from disk, which is what we want. Previously we used rm.entry and it didn't remove content entirely hence we needed the verify call.

Research if we still want to do a verify from time to time to stop the cache growing continuously?

Don't think so.
If the cache already has some unused entries then verify is needed. But if a fresh cache used is along with rm.content, the cache should not keep growing.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others regression Issue about a regression bug, or the PR caused it type:refactor Refactoring or improving of existing code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants