Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-local runner uses of CAS should degrade gracefully in remote cache mode #11455

Closed
tdyas opened this issue Jan 13, 2021 · 0 comments · Fixed by #11468
Closed

non-local runner uses of CAS should degrade gracefully in remote cache mode #11455

tdyas opened this issue Jan 13, 2021 · 0 comments · Fixed by #11468
Assignees

Comments

@tdyas
Copy link
Contributor

tdyas commented Jan 13, 2021

In remote cache mode, a local-only Store is passed to the process_execution::local::CommandRunner to isolate it from failures in a configured remote CAS. The remote cache code then wraps the upload of digests so that errors in accessing the remote CAS only warn and do not error.

As it turns out, however, some intrinsics use the Store on Core to access the CAS and that Store is the instance configured with access to the remote CAS. Thus, certain code paths that run outside of the context of process_execution::local::CommandRunner are not covered by the warning code in the remote cache code.

A particular example is the download_file_to_digest intrinsic. See

async fn load_or_download(
&self,
core: Arc<Core>,
url: Url,
digest: hashing::Digest,
) -> Result<store::Snapshot, String> {
let file_name = url
.path_segments()
.and_then(Iterator::last)
.map(str::to_owned)
.ok_or_else(|| format!("Error getting the file name from the parsed URL: {}", url))?;
let path = RelativePath::new(&file_name).map_err(|e| {
format!(
"The file name derived from {} was {} which is not relative: {:?}",
&url, &file_name, e
)
})?;
let maybe_bytes = core.store().load_file_bytes_with(digest, |_| ()).await?;
if maybe_bytes.is_none() {
DownloadedFile::download(core.clone(), url, file_name, digest).await?;
}
core.store().snapshot_of_one_file(path, digest, true).await
}
where a call is made to core.store() to get the Store stored on the Core struct.

Note in

// If remote caching is used with eager_fetch, we do not want to use the remote store
// with the local command runner. This reduces the surface area of where the remote store is
// used to only be the remote cache command runner.
let store_for_local_runner = if remote_caching_used && remoting_opts.cache_eager_fetch {
store.clone().into_local_only()
} else {
store.clone()
};
that the local-only Store is passed only to the process_execution::local::CommandRunner. Intrinsics are not covered by this behavior.

The solution should audit for store accesses that are not covered by the local-only store and that should be. Manually fixing each call site is likely to be a game of whack-a-mole, so a generic solution that intrinics can use is probably preferred.

@tdyas tdyas changed the title non-remote-cache uses of CAS should degrade gracefully in remote cache mode non-local runner uses of CAS should degrade gracefully in remote cache mode Jan 13, 2021
@stuhood stuhood assigned tdyas and unassigned Eric-Arellano Jan 13, 2021
tdyas pushed a commit that referenced this issue Jan 15, 2021
### Problem

As described in #11455, in remote cache mode with eager fetching, the only accesses to the remote CAS should be through the remote cache code; the remainder of the Pants code base should only see the local store. Intrinsics, such as download_file_to_digest, still have access to the remote CAS currently, which means that any issues with the remote CAS will cause an error instead of being a warning if the access to the remote CAS had occurred through the remote cache code. The intended behavior is for all problems with the remote CAS to be warnings when in remote cache mode.

### Solution

In remote cache mode with eager fetching, only expose the local store to most of the Pants code base.

### Result

Added an integration test to test that warnings are generated in remote cache mode.

Fixes #11455
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants