Our CI suffers from cache thrashing #9850

Open
andreasabel opened this issue Mar 27, 2024 · 3 comments

Comments

@andreasabel
Member

At this point in time, GitHub has evicted all CI caches on our default branch except one:

$ gh actions-cache list --branch master
Showing 1 of 1 cache entries in haskell/cabal

Linux-8.8.4-1c1230ca228cc03a9ee68166243af358ab3992fc  237.45 MB  refs/heads/master  2 hours ago

$ date
Wed Mar 27 18:12:40 CET 2024

This means our caching is effectively not working: if master has no caches, every PR has to build all its dependencies from scratch.

Caching has to be engineered so that the latest master caches survive, given the expected number of PRs that are active in parallel (each generating its own caches).

In PRs #9845 and #9849 I have tried to do sensible cache engineering, so that a new cache is only written if the dependencies actually changed.
However, our main workhorse, validate.yml, simply creates a new cache on every run (for each OS and GHC version it tests) by including github.sha in the cache key:

key: ${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}

With each cache at around 300 MB, simple arithmetic shows that our 10 GB limit is exceeded all the time (even 30 GB would not make much of a difference).

For example, these are the 46 caches created by one currently active PR:

$ gh actions-cache list -L 100 --branch refs/pull/9718/merge
Showing 46 of 46 cache entries in haskell/cabal

Linux-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 257.98 MB refs/pull/9718/merge 6 hours ago
macOS-8.6.5-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 212.56 MB refs/pull/9718/merge 6 hours ago
macOS-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 204.08 MB refs/pull/9718/merge 6 hours ago
macOS-8.10.7-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 209.47 MB refs/pull/9718/merge 6 hours ago
macOS-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 206.74 MB refs/pull/9718/merge 6 hours ago
macOS-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 212.92 MB refs/pull/9718/merge 6 hours ago
macOS-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 186.53 MB refs/pull/9718/merge 6 hours ago
macOS-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 174.85 MB refs/pull/9718/merge 6 hours ago
macOS-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 175.46 MB refs/pull/9718/merge 6 hours ago
Windows-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 276.43 MB refs/pull/9718/merge 6 hours ago
Windows-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 270.71 MB refs/pull/9718/merge 6 hours ago
Windows-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 241.21 MB refs/pull/9718/merge 6 hours ago
Windows-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 224.59 MB refs/pull/9718/merge 6 hours ago
Windows-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 223.40 MB refs/pull/9718/merge 7 hours ago
Linux-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 229.11 MB refs/pull/9718/merge 7 hours ago
Linux-8.10.7-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 227.63 MB refs/pull/9718/merge 7 hours ago
Linux-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 227.29 MB refs/pull/9718/merge 7 hours ago
Linux-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 227.26 MB refs/pull/9718/merge 7 hours ago
Linux-8.6.5-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 279.99 MB refs/pull/9718/merge 7 hours ago
Linux-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 216.40 MB refs/pull/9718/merge 7 hours ago
Linux-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 177.89 MB refs/pull/9718/merge 7 hours ago
Linux-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 177.28 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-8.10.7-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 279.37 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.0.2-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 270.39 MB refs/pull/9718/merge 7 hours ago
Linux-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 326.73 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.4.8-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 267.49 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.2.8-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 248.42 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.8.1-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 201.04 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.6.4-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3 202.54 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.8.1-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 201.04 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.2.8-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 248.41 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.6.4-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 202.55 MB refs/pull/9718/merge 7 hours ago
Linux-fix-whitespace-0.1 3.01 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.4.8-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 267.49 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.0.2-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 270.39 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-8.10.7-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97 279.36 MB refs/pull/9718/merge 7 hours ago
bootstrap-Linux-9.8.1-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 201.06 MB refs/pull/9718/merge 8 hours ago
bootstrap-Linux-9.6.4-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 202.54 MB refs/pull/9718/merge 8 hours ago
bootstrap-Linux-8.10.7-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 279.37 MB refs/pull/9718/merge 8 hours ago
bootstrap-Linux-9.2.8-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 248.41 MB refs/pull/9718/merge 8 hours ago
bootstrap-Linux-9.4.8-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 267.48 MB refs/pull/9718/merge 8 hours ago
bootstrap-Linux-9.0.2-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b 270.39 MB refs/pull/9718/merge 8 hours ago
Linux-9.2.8-016216c7e7e7f383ddd2c5497fa737640f0b792b 326.72 MB refs/pull/9718/merge a day ago
bootstrap-Linux-8.10.7-20221115-b171551765807a9594f9724ffec82dd49457491e 279.36 MB refs/pull/9718/merge a day ago
bootstrap-Linux-9.6.4-20221115-b171551765807a9594f9724ffec82dd49457491e 202.55 MB refs/pull/9718/merge a day ago
bootstrap-Linux-9.8.1-20221115-b171551765807a9594f9724ffec82dd49457491e 201.04 MB refs/pull/9718/merge a day ago

These alone bust the 10GB limit.
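Adding up the sizes $s_i$ of the 46 entries listed above (in MB, as reported by `gh actions-cache list`) makes this concrete:

```math
\sum_{i=1}^{46} s_i = 10\,586.93\ \text{MB} \approx 10.6\ \text{GB} > 10\ \text{GB},
\qquad
\bar{s} = \frac{10\,586.93\ \text{MB}}{46} \approx 230\ \text{MB}
```

At the roughly 300 MB per cache quoted above, 10 GB holds only about 33 entries, fewer than this single PR has produced.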

Including github.sha in the cache key is a nice solution to cache aging for small projects that effectively have unlimited cache space (e.g. if your caches are 10 MB and you have 10 GB available), but for a sizeable project like cabal it does not work.
For Agda, I spent days on cache engineering to meet these goals (staying within the limit even with a couple of parallel PRs).

The current caching philosophy seems to originate from https://github.com/haskell/cabal/pull/7952/files#r807953232

@andreabedini
Collaborator

@andreasabel

The problem is that cabal does a fair bit of caching (in the form of the store), and GHA caches cannot be updated. The "trick" I mention in the comment you linked is simply what GitHub recommends for updating a cache. See https://github.com/actions/cache/blob/main/tips-and-workarounds.md#update-a-cache.

The line right after the one you linked is

          key: ${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
          restore-keys: ${{ runner.os }}-${{ matrix.ghc }}-

If a cache with the current sha does not exist (as is expected and intentional), GHA restores the most recent one matching the restore key and then saves a new one under the current sha. The older cache entries fall into disuse and are eventually evicted.

If a cache with exactly that key is present, GHA restores it but does not update it or save it again.

Off the top of my head, an improvement would be to use separate caches for the store and dist-newstyle. They are different things that change at different speeds.

What are your ideas?
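A minimal sketch of such a split, assuming `actions/cache@v4`, a store at `~/.cabal/store`, and that the dependency set is adequately captured by the top-level `.cabal` files plus `cabal.project*` (all assumptions, not what validate.yml currently does). The store key then changes only when those files change, while dist-newstyle keeps the per-commit key:

```yaml
# Sketch only: paths, key prefixes and hashed files are assumptions, not taken from validate.yml.
- name: Cache the cabal store (changes when dependencies change)
  uses: actions/cache@v4
  with:
    # Assumed store location; newer cabal versions may use an XDG path instead.
    path: ~/.cabal/store
    # Assumed proxy for the dependency set: top-level .cabal files and project files.
    # On an exact hit nothing new is saved, so a new entry appears only when these files change.
    key: store-${{ runner.os }}-${{ matrix.ghc }}-${{ hashFiles('*/*.cabal', 'cabal.project*') }}
    restore-keys: store-${{ runner.os }}-${{ matrix.ghc }}-

- name: Cache dist-newstyle (changes on every commit)
  uses: actions/cache@v4
  with:
    path: dist-newstyle
    # Per-commit key as before; the restore key picks up the most recent earlier entry.
    key: dist-${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
    restore-keys: dist-${{ runner.os }}-${{ matrix.ghc }}-
```

dist-newstyle still gets a new entry on every commit, so under this scheme its size, rather than the store's, is what determines how fast the quota fills up.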

PS: I don't think we can protect against GitHub (or someone) deleting all the caches. We just do too many builds. We should cull them.

@andreasabel
Member Author

andreasabel commented Apr 4, 2024

Yes, I understand the mechanism with ${{ github.sha }}; it is the theoretically best solution, but only if it does not make you run out of cache space.
Since our caches are big, we would need a cache store on the order of 100 GB or 1 TB rather than the 10 GB GitHub offers us.

I suggest the following strategy: Only write new big caches when CI runs on master. When we are running on a branch/PR, just reuse the caches from master. This way, we always have a recent enough cache to restore a significant portion of the dependencies, at least for the average PR.

In practice, we separate actions/cache/restore from actions/cache/save. The latter is only executed if CI runs on master. (I have never implemented this strategy, so I don't know off the top of my head what the check for master would look like, but I believe/hope this is possible.)
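The master check can be expressed through `github.ref`. A minimal sketch, assuming `actions/cache/restore@v4` and `actions/cache/save@v4`, a store at `~/.cabal/store`, and a hypothetical step id `restore-store`: every run restores, but only runs on `refs/heads/master` save. It relies on the fact that runs on a PR may restore caches created on the default branch.

```yaml
# Sketch only: the step id, path and key prefix are hypothetical.
- name: Restore cabal store
  id: restore-store
  uses: actions/cache/restore@v4
  with:
    path: ~/.cabal/store   # assumed store location
    key: store-${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
    # PR runs fall back to the newest matching entry, which can come from master.
    restore-keys: store-${{ runner.os }}-${{ matrix.ghc }}-

# ... build and test steps ...

- name: Save cabal store (master only)
  # On a push to master github.ref is 'refs/heads/master'; in a PR it is 'refs/pull/<n>/merge'.
  # Skip the save on an exact hit, since an existing key cannot be overwritten.
  if: github.ref == 'refs/heads/master' && steps.restore-store.outputs.cache-hit != 'true'
  uses: actions/cache/save@v4
  with:
    path: ~/.cabal/store
    key: ${{ steps.restore-store.outputs.cache-primary-key }}
```

With the save gated this way, PR runs write no store entries at all, so the quota is consumed only by master's caches.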

@andreabedini
Collaborator

> In practice, we separate actions/cache/restore from actions/cache/save. The latter is only executed if CI runs on master. (I have never implemented this strategy, so I don't know off the top of my head what the check for master would look like, but I believe/hope this is possible.)

It sounds like a good plan. I don't think we had separate actions/cache/restore and actions/cache/save at the time.

I still think the store and dist-newstyle deserve separate caches. The store can follow what you propose, while dist-newstyle could perhaps be cached per branch. Would that make sense?
