Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Migrate gha cache to opendal based #1528

Merged
merged 7 commits into from
Jan 8, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions .github/workflows/integration-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -189,3 +189,52 @@ jobs:
${SCCACHE_PATH} --show-stats

${SCCACHE_PATH} --show-stats | grep -e "Cache hits\s*[1-9]"

gha:
runs-on: ubuntu-latest
needs: build

env:
SCCACHE_GHA_VERSION: "sccache_integration_tests"
RUSTC_WRAPPER: /home/runner/.cargo/bin/sccache

steps:
- name: Clone repository
uses: actions/checkout@v3

- name: Configure Cache Env
uses: actions/github-script@v6
with:
script: |
core.exportVariable('ACTIONS_CACHE_URL', process.env.ACTIONS_CACHE_URL || '');
core.exportVariable('ACTIONS_RUNTIME_TOKEN', process.env.ACTIONS_RUNTIME_TOKEN || '');

- name: Install rust
uses: ./.github/actions/rust-toolchain
with:
toolchain: "stable"

- uses: actions/download-artifact@v3
with:
name: integration-tests
path: /home/runner/.cargo/bin/
- name: Chmod for binary
run: chmod +x ${SCCACHE_PATH}

- name: Test
run: cargo clean && cargo build

- name: Output
run: |
${SCCACHE_PATH} --show-stats

${SCCACHE_PATH} --show-stats | grep gha

- name: Test Twice for Cache Read
run: cargo clean && cargo build

- name: Output
run: |
${SCCACHE_PATH} --show-stats

${SCCACHE_PATH} --show-stats | grep -e "Cache hits\s*[1-9]"
152 changes: 6 additions & 146 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 2 additions & 3 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ bincode = "1"
blake3 = "1"
byteorder = "1.0"
bytes = "1"
opendal = { version= "0.22.6", optional=true }
opendal = { version= "0.24", optional=true }
reqsign = {version="0.7.3", optional=true}
chrono = { version = "0.4.23", optional = true }
clap = { version = "4.0.29", features = ["derive", "env", "wrap_help"] }
Expand All @@ -38,7 +38,6 @@ filetime = "0.2"
flate2 = { version = "1.0", optional = true, default-features = false, features = ["rust_backend"] }
futures = "0.3"
futures-locks = "0.7"
gha-toolkit = { version = "0.3.1", optional = true }
gzp = { version = "0.11.1", default-features = false, features = ["deflate_rust"] }
http = "0.2"
hyper = { version = "0.14.10", optional = true, features = ["server"] }
Expand Down Expand Up @@ -123,7 +122,7 @@ all = ["dist-client", "redis", "s3", "memcached", "gcs", "azure", "gha"]
azure = ["opendal","reqsign"]
s3 = ["opendal","reqsign"]
gcs = ["opendal","reqsign", "url", "reqwest/blocking"]
gha = ["gha-toolkit"]
gha = ["opendal"]
memcached = ["memcached-rs"]
native-zlib = []
redis = ["url", "opendal/services-redis"]
Expand Down
18 changes: 2 additions & 16 deletions docs/GHA.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# GitHub Actions

To use the [GitHub Actions cache](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows), you need to set the `SCCACHE_GHA_CACHE_URL`/`ACTIONS_CACHE_URL` and `SCCACHE_GHA_RUNTIME_TOKEN`/`ACTIONS_RUNTIME_TOKEN` environmental variables. The `SCCACHE_` prefixed environmental variables override the variables without the prefix.
To use the [GitHub Actions cache](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows), you need to set the `SCCACHE_GHA_VERSION` which is a namespace for the whole cache set.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how this simplifies GHA :)


In a GitHub Actions workflow, you can set these environmental variables using the following step.
This cache type will needs token like `ACTIONS_CACHE_URL` and `ACTIONS_RUNTIME_TOKEN` to work. You can set these environmental variables using the following step in a GitHub Actions workflow.

```yaml
- name: Configure sccache
Expand All @@ -12,17 +12,3 @@ In a GitHub Actions workflow, you can set these environmental variables using th
core.exportVariable('ACTIONS_CACHE_URL', process.env.ACTIONS_CACHE_URL || '');
core.exportVariable('ACTIONS_RUNTIME_TOKEN', process.env.ACTIONS_RUNTIME_TOKEN || '');
```

To write to the cache, set `SCCACHE_GHA_CACHE_TO` to a cache key, for example
`sccache-latest`. To read from cache key prefixes, set `SCCACHE_GHA_CACHE_FROM`
to a comma-separated list of cache key prefixes, for example `sccache-`.

In contrast to the [`@actions/cache`](https://github.com/actions/cache) action, which saves a single large archive per cache key, `sccache` with GHA cache storage saves each cache entry separately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph was added to explain why sccache will create many small caches instead of the usual single large caches. #1433 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our users don't need to care about those service internal details.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it isn't really internal as it shows on github

I am trying with another project and I can see caches of several gb to a few kb
https://github.com/sylvestre/coreutils/actions/caches

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jakelee8 - documenting visible behavior is a must.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed by #1533


GHA cache storage will create many small caches with the same cache key, e.g. `SCCACHE_GHA_CACHE_TO` and `SCCACHE_GHA_CACHE_FROM`. These GHA caches are differentiated by their [_version_](https://github.com/actions/cache#cache-version). The GHA cache implementation in `sccache` calculates the cache version from the [`sccache` entry key](docs/Caching.md), e.g. the source file path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the cache from/to options makes it impossible to configure the cache to write-only. Write-only cache is useful for allowing PR builds to use the main branch cache, but forcing the main branch build to build without caching.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write-only seems useful. Can you raise an issue of this feature?

We can implement the same cache_to and cache_from (if needed) by using two ghac clients with different versions.

Also, OpenDAL is open to any contributions. Welcome for any comments and discussion ❤️


For example, if a cache entry has the version `main.rs` and has GHA cache entries for the `sccache-1` and `sccache-2` keys, then `SCCACHE_GHA_CACHE_FROM=sccache-` will match both and [return the most recent entry](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key).

This behavior is useful for scoping caches from different versions of Rust or for cross-platform builds (`rust-sdk-{RUST_TOOLKIT}-{TARGET_TRIPLE}-`), and to allow newer commits to override older caches by adding the Git SHA as a suffix (`-{GITHUB_SHA}`), as in the following screenshot.
Copy link
Contributor

@jakelee8 jakelee8 Jan 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the option of reading caches from a cache key prefix, PR builds cannot use caches from the main branch and will never use caches from the main branch.

Copy link
Collaborator Author

@Xuanwo Xuanwo Jan 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User can specify the prefix by setting root.

And sccache will always calculate the same path (cache key) for the same given input, so we don't need to set cache key by hand. Leaving them in the same prefix (even empty) is good enough to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not now GitHub Actions cache works...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not now GitHub Actions cache works...

Can explain a bit to me? I'm not very familiar with github action cache services.

So far, I just treat github action cache as a storage service which means:

  • We don't care about how github action cache works (especially for actions/cache)
  • We don't want to work along with caches that written by others.

Sccache will calculate the unique path for every object file, and we just use the same path as the cache key. Will this behavior break our cache service?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakelee8 sorry for the dumb question but what is wrong here? thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it worth to start a discussion for this~

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this action as an example https://github.com/mozilla/sccache/actions/runs/3868518097/jobs/6594049385

The first run on this PR will have cache hits which means we are reading cache from main branch.


<img width="718" src="https://user-images.githubusercontent.com/19253212/205356799-deedc465-e534-4ef6-a249-fc15121fdfd9.png">
12 changes: 4 additions & 8 deletions src/cache/cache.rs
Original file line number Diff line number Diff line change
Expand Up @@ -394,6 +394,7 @@ impl Storage for opendal::Operator {

let can_write = match self.object(path).write("Hello, World!").await {
Ok(_) => true,
Err(err) if err.kind() == ErrorKind::ObjectAlreadyExists => true,
Err(err) if err.kind() == ErrorKind::ObjectPermissionDenied => false,
Err(err) => bail!("cache storage failed to write: {:?}", err),
};
Expand Down Expand Up @@ -485,15 +486,10 @@ pub fn storage_from_config(
return Ok(Arc::new(storage));
}
#[cfg(feature = "gha")]
CacheType::GHA(config::GHACacheConfig {
ref url,
ref token,
ref cache_to,
ref cache_from,
}) => {
debug!("Init gha cache with url {url}");
CacheType::GHA(config::GHACacheConfig { ref version }) => {
debug!("Init gha cache with version {version}");

let storage = GHACache::new(url, token, cache_to.clone(), cache_from.clone())
let storage = GHACache::build(version)
.map_err(|err| anyhow!("create gha cache failed: {err:?}"))?;
return Ok(Arc::new(storage));
}
Expand Down
Loading