Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save a warmed R package cache for faster installs in failing Actions #695

Merged
merged 16 commits into from
May 8, 2024

Conversation

schloerke
Copy link
Contributor

@schloerke schloerke commented Jan 18, 2023

Goal

When restoring the R package cache, if the primary key is not used, save the currently installed R packages to use as a warmed cache for a followup Action.

If the primary key is used to restore the R package cache, then no secondary cache is saved as there will be a cache hit for followup Actions.

Motivation

actions/cache@v3 is not saved if the Action fails. To me, this is incorrect when installing R packages. The previously installed packages should be available for faster installation in a followup Action.

Cons

  • The secondary cache will not save packages that are installed after calling r-lib/actions/setup-r-dependencies
  • Uses a second cache which will take up more cache storage space

Other approaches

This was not possible before actions/cache had the tag v3.

To remove the cons above, we could

  • Restore package cache with actions/cache/restore@v3
  • then install the R packages
  • then cache the R packages with actions/cache/save@v3

However, packages installed outside of r-lib/actions/setup-r-dependencies will not be saved with this approach.
While not common, rstudio/shinycoreci installs packages after the initial dependency installation and would benefit from caching all installed packages when the Action is complete via actions/cache@v3.

If the Actions are successful, the secondary cache will not be saved or utilized after the initial passing Action.

Example usage


Update: 01/20

Using actions/cache/restore@v3 and actions/cache/save@v3 to restore and save the package library.

Benefit:

  • Cache is warmed on failing runs
  • (Same amount of cache storage)
  • (No confusing message when the cache is tried to be saved twice on an initial successful Action)

Does not capture packages that were installed after the initial installation, but those packages are NOT contained within the cache key. So we should not worry about them.

@codecov-commenter
Copy link

codecov-commenter commented Jan 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.33%. Comparing base (514fd75) to head (1159dc9).

❗ Current head 1159dc9 differs from pull request most recent head 6d13c6a. Consider uploading reports for the commit 6d13c6a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           v2-branch     #695   +/-   ##
==========================================
  Coverage      83.33%   83.33%           
==========================================
  Files              3        3           
  Lines             12       12           
==========================================
  Hits              10       10           
  Misses             2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@schloerke
Copy link
Contributor Author

If the cache is not being updated on a successful Action given a cache hit, I think we should use the actions/cache/restore@v3.

This would get rid of the message and gives a clean solution.

rstudio/shinycoreci can worry about itself later on. I'd rather have a clean/non-confusing solution.

** Implementing with actions/cache/restore@v3


Do you know how to save a multiline step output to $GITHUB_OUTPUT?

I'd like to save the cache path info and use it as an output value.

@schloerke
Copy link
Contributor Author

Request to pass the path value from the restore step to be able to use it in the save step: actions/cache#1082

@dan-knight
Copy link

@schloerke Thanks for this! Great timing - I was just exploring the same issue.

@gaborcsardi
Copy link
Member

The drawback of this is that sometimes the reason for failure is botched packages installation, and we would save a botched cache, that will be potentially picked up across all branches. We have seen this happen for example for mis-specified RSPM repos. So I would be more comfortable if this was opt-in.

@schloerke
Copy link
Contributor Author

The drawback of this is that sometimes the reason for failure is botched packages installation

Do you mean that the restore step works, but then the package installation fails? If this is the case, nothing would be saved.

Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.


I would be more comfortable if this was opt-in.

I do not believe a step's uses field can be dynamic. So this would require a different action file completely. 🙁 It would be very nice if we could avoid duplicating the installation code.

If it can be dynamic, I'm happy with it being opt-in.

@gaborcsardi
Copy link
Member

Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.

Yes.

If it can be dynamic, I'm happy with it being opt-in.

You can have alternative cache steps depending on an input parameter.

@schloerke
Copy link
Contributor Author

schloerke commented Apr 4, 2023

If it can be dynamic, I'm happy with it being opt-in.

You can have alternative cache steps depending on an input parameter.

Will do!


  • Implement different cache steps based on input parameter

Use a cache keys step to create consistent key and path values.
If `cache == 'always'`, use `actions/cache/restore@v3` and `actions/cache/save@v3`.
Else use the regular `actions/cache@v3` to restore and save (given Action was successful).
@schloerke
Copy link
Contributor Author

(Rebased and squashed to remove the long-winded commit history)

@schloerke
Copy link
Contributor Author

If during packing installation, support was added for removing packages that are installed but not contained within the lock file, then it would fit with this approach nicely. This way, the library would never balloon with unnecessary packages.

  • Accept this PR once PR adds ability to remove packages not contained within lock file

@schloerke
Copy link
Contributor Author

Will request a review when the PR is stable. Thank you!

@schloerke schloerke marked this pull request as draft February 14, 2024 17:25
@gaborcsardi
Copy link
Member

Will request a review when the PR is stable. Thank you!

I wonder if it was better and easier to implement this is in pak itself.

@schloerke
Copy link
Contributor Author

Will request a review when the PR is stable. Thank you!

I wonder if it was better and easier to implement this is in pak itself.

It seems fair to implement it all the way in pkgdepends in the $install() method. Where a config value could be prune=TRUE to remove the packages. (But that seemed like a lot of up hill work for me.)


If I can show the PR w/ prune works, I'd be happy to have you relocate it anywhere that makes more sense.

@schloerke
Copy link
Contributor Author

schloerke commented Feb 14, 2024

Proof of pruning packages: https://github.com/ggobi/ggally/actions/runs/7905722423/job/21578966735?pr=493#step:3:7274

Run # Prune packages
Prune unnecessary packages
  Removing unnecessary packages:  httpuv, later, promises, shiny, sourcetools, xtable 
  Removing packages from ‘/home/runner/work/_temp/Library’
  (as ‘lib’ is unspecified)

Proof of not pruning packages: https://github.com/ggobi/ggally/actions/runs/7906986262/job/21583015846?pr=493#step:3:7277

Run # Prune packages
Prune unnecessary packages
  No unnecessary packages to remove

@schloerke
Copy link
Contributor Author

schloerke commented Feb 14, 2024

@gaborcsardi I'm happy to have you relocate the code to the package you think is best. The addition of pruning packages that are not defined in the lock file makes me feel safe when using a cache that will most likely be restored and should always be saved. This prevents the R library from being poisoned from packages installed from earlier runs.

If users do not want to prune the library, they can opt-out. (Pruning only happens when cache="always", which is opt-in.)

@schloerke schloerke marked this pull request as ready for review February 14, 2024 20:34
* v2-branch: (33 commits)
  Try to update tinytex
  Test tinytex version 2024.02
  Try installing in tinytex text workflow
  Upload snapshots in check-release example
  check-r-package artifact names: include explicit id or job index
  Include run & attempt number in check-r-package artifact names
  check-r-package: include arch in artifact name
  R-devel test: add macos-14
  More action version updates
  Update JamesIves/github-pages-deploy-action version in examples
  Update actions versions in examples
  Update setup-tinytex dependencies
  Update pr-push dependencies
  Update pr-fetch dependencies
  Update setup-r dependencies
  Update setup-pandoc dependencies
  Update node actions, switch to node 20
  Upgrade upload-artifact action
  Add `working-directory input to `setup-renv` docs
  Avoid r-project.org URLs
  ...
@gaborcsardi gaborcsardi force-pushed the v2-branch branch 3 times, most recently from fae31a5 to 23fc05b Compare February 23, 2024 14:49
@gaborcsardi gaborcsardi force-pushed the v2-branch branch 2 times, most recently from 835f496 to 890cb24 Compare February 27, 2024 10:58
@schloerke
Copy link
Contributor Author

schloerke commented Mar 28, 2024

@gaborcsardi
Copy link
Member

Update to use actions/cache@v4 as it now has a save-always: flag. Related: actions/cache@b1378c8

Oh, that is awesome!

setup-r-dependencies/action.yaml Show resolved Hide resolved
setup-r-dependencies/action.yaml Outdated Show resolved Hide resolved
setup-r-dependencies/README.md Outdated Show resolved Hide resolved
setup-r-dependencies/README.md Show resolved Hide resolved
@schloerke
Copy link
Contributor Author

Thank you! ✅

@gaborcsardi gaborcsardi merged commit 96b1dc6 into r-lib:v2-branch May 8, 2024
26 checks passed
@gaborcsardi
Copy link
Member

Thank you, and sorry for the long wait!

@schloerke
Copy link
Contributor Author

schloerke commented May 8, 2024

@gaborcsardi If you could update the v2 tag (and others?) for the repo, that'd be great! It'll allow me to delete my branch and have a consistent step location. Thank you!

@gaborcsardi
Copy link
Member

Yep, I am preparing a release, either today or tomorrow.

schloerke added a commit to rstudio/shiny-workflows that referenced this pull request May 20, 2024
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this pull request.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants