Execlog parser should output each action's digest #16741

tbaing · 2022-11-11T04:35:47Z

Description of the feature request:

While trying to determine why a given target results in a remote cache miss, I used the execlog parser to gather detailed logs about what was happening. I wanted to get the action digest for the problematic action, to check for the existence of that object in the remote cache, but didn't see anything I recognized as that. The output contains hashes for each individual input and for the output, but nothing that appeared to be the digest for the aggregate set of all inputs, env variables, command, etc.

Is there some option for adding this information to the output that's not documented on https://bazel.build/remote/cache-remote? If not, can we please add that information to the output from the execlog parser?

What underlying problem are you trying to solve with this feature?

It's difficult to correlate the output from the execlog parser with the artifacts in the cache, which makes it harder to be certain whether a cache miss is due to the artifact not being present or is due to something else going wrong.

Which operating system are you running Bazel on?

gLinux

What is the output of `bazel info release`?

release 6.0.0-pre.20221012.2

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

fatal: not a git repository (or any of the parent directories): .git

Have you found anything relevant by searching the web?

Nothing relevant found.

Any other information, logs, or outputs that you want to share?

No response

The text was updated successfully, but these errors were encountered:

tjgq · 2022-11-11T09:51:01Z

You're looking for the SpawnExec.digest field. This is a new addition in Bazel 6.

On a separate note: you might want to use --execution_log_json_file to produce the log in JSON format instead of binary and avoid using the execlog parser.

tbaing · 2022-11-11T22:58:09Z

I switched to the JSON format (thanks for the suggestion!) and it's easier to read. I do see the SpawnExec.digest field there, but I'm noticing it's not present in some cases like local execution, which is exactly the situation where I'd want to see its value. Is there a reason it's not present for local execution?

tjgq · 2022-11-14T18:54:48Z

You are correct - the field is missing for locally executed actions because the corresponding data structure (RemoteCacheClient.ActionKey) is only created for actions that run remotely.

I agree that producing a digest for locally executed actions would be helpful, but I'm not sure about the best way to do it; creating an ActionKey for an action that won't run remotely might be unnecessarily expensive.

tbaing · 2022-11-14T20:05:47Z

For the action to run locally, wouldn't we have had to compute the ActionKey and use it to check the remote cache, and only execute locally after a cache miss? Shouldn't the ActionKey already exist by this point?

tjgq · 2022-11-14T20:08:52Z

Yes, the ActionKey is definitely computed at some point if you have either remote caching or remote execution enabled. But those code paths are both separate from local execution, so it isn't necessarily straightforward to plumb it into the right place.

tjgq · 2022-11-14T20:10:20Z

There's also the question of whether/how we should populate the field if neither remote caching nor remote execution are enabled.

tbaing · 2022-11-14T23:43:58Z

The difficulty of "we have it but not here" makes sense.

Maybe alternatively we could include information about the cache interactions (checking and writing) in the raw log for the run? That would allow us to see the ActionKey being used and the success/failure results for each interaction with the cache, eliminating both the problem of not having the necessary info at the time we construct the content for the actual execution and the problem of what to do if no remote cache interaction is occurring.

tjgq · 2022-11-22T11:28:31Z

@tbaing Bazel has its own concept of an "action key" [1] which every action has, independently of the strategy it ends up running under. What do you think about adding it to a separate SpawnExec.action_key field? This would let you correlate executions of "the same action" across invocations; the drawback is that there's no way to convert it to and from the Remote Execution key (but it's not clear to me that's a requirement).

[1] https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/actions/ActionAnalysisMetadata.java;l=98;drc=14e32e7f5d1492f7ca4ffc17c40b913ec5799603

tbaing · 2022-11-22T16:32:48Z

Would the remote cache key be 1:1 with that internal action key, even though not identical/computable from one another? Or does one of them consider information that the other does not, potentially resulting in multiple values on one side being mapped to a single value on the other?

If it's 1:1 then that sounds great. If it's many:1, I'd need more information about what differs between the keys to know whether it would give what I'm asking for here. If a single internal action key could map to multiple remote cache keys, then it doesn't help answer the question of "should I have found a remote cache hit for this action" because it might be that the remote cache key would be different from one invocation to the next and remote cache misses would be expected. (But if that's the case, I'd be very interested in understanding why those attributes aren't part of Bazel's internal action key.)

tjgq · 2023-05-03T12:52:19Z

Would the remote cache key be 1:1 with that internal action key, even though not identical/computable from one another? Or does one of them consider information that the other does not, potentially resulting in multiple values on one side being mapped to a single value on the other?

It's definitely possible for two internally distinct actions to map to the same remote cache entry. I don't believe the reverse is possible.

tbaing · 2023-05-03T14:56:53Z

Thanks for the confirmation that it's not 1:many; that would be very unexpected!

I wrote this FR because I wasn't seeing remote cache hits for some actions in our graph when used as inputs by other actions, and I wanted to confirm whether the problem was that Bazel wasn't writing the output to the cache or that Bazel was failing to find/use the remote content. I've since identified why that was happening so this isn't a current problem, but it did highlight to me that it's very hard to examine Bazel's interactions with the remote cache to understand why we're getting a particular behavior.

I think what would be most useful would be for the exec log to list:

for each input, what was the status of the search of each location (execroot, disk cache, remote cache, etc.) including a full identifier/path for the artifact within that location, including the HTTP response code/message where applicable
for each output, did we attempt to write it to remote caches and if so what was the status of the attempt to write the artifact to the remote location, again including a full identifier/path for the artifact within that location

From that info, you'd be able to see exactly what remote interactions occurred and whether there were problems with any of them, and you could see where to look for the artifact in remotes that allow browsing of the content (GS, --disk_cache). I recognize that that info may not be readily available in the parts of the code that will be writing this file, and that it might be more effort than it's worth since I've gotten past the problem that prompted me to open this FR and maybe most people don't need to troubleshoot remote/disk cache interactions.

brentleyjones · 2023-10-30T11:44:48Z

@bazel-io flag

keertk · 2023-10-30T15:53:50Z

@bazel-io fork 7.0.0

Specifically, introduce a getter/setter pair to store the digest in the SpawnExecutionContext when looking up the disk or remote cache, and retrieve it later when constructing the SpawnResult. Fixes bazelbuild#16741. PiperOrigin-RevId: 577799982 Change-Id: I570db74f4dbeee95e00853883eda68b25a84934c

Specifically, introduce a getter/setter pair to store the digest in the SpawnExecutionContext when looking up the disk or remote cache, and retrieve it later when constructing the SpawnResult. Fixes #16741. Commit 40d6780 PiperOrigin-RevId: 577799982 Change-Id: I570db74f4dbeee95e00853883eda68b25a84934c Co-authored-by: Googler <tjgq@google.com> Co-authored-by: Ian (Hee) Cha <heec@google.com>

…isses. Specifically, introduce a getter/setter pair to store the digest in the SpawnExecutionContext when looking up the disk or remote cache, and retrieve it later when constructing the SpawnResult. Fixes bazelbuild#16741. PiperOrigin-RevId: 577799982 Change-Id: I570db74f4dbeee95e00853883eda68b25a84934c

iancha1992 · 2023-12-01T18:39:04Z

A fix for this issue has been included in Bazel 7.0.0 RC5. Please test out the release candidate and report any issues as soon as possible. Thanks!

sgowroji added type: feature request untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Nov 11, 2022

coeuvre assigned tjgq Nov 22, 2022

coeuvre added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Nov 22, 2022

meisterT mentioned this issue Jun 20, 2023

Implement a way to be able to debug cache misses that can be always on without too much overhead #18643

Closed

copybara-service bot closed this as completed in 40d6780 Oct 30, 2023

bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Oct 30, 2023

bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Oct 30, 2023

This was referenced Oct 30, 2023

[7.0.0] Execlog parser should output each action's digest #19983

Closed

[7.0.0] Write the action digest to the execution log also for cache misses. #19984

Merged

tjgq mentioned this issue Nov 6, 2023

[7.0.0] Write the action digest to the execution log also for cache misses. #20059

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execlog parser should output each action's digest #16741

Execlog parser should output each action's digest #16741

tbaing commented Nov 11, 2022

tjgq commented Nov 11, 2022

tbaing commented Nov 11, 2022

tjgq commented Nov 14, 2022

tbaing commented Nov 14, 2022

tjgq commented Nov 14, 2022

tjgq commented Nov 14, 2022

tbaing commented Nov 14, 2022

tjgq commented Nov 22, 2022 •

edited

Loading

tbaing commented Nov 22, 2022

tjgq commented May 3, 2023

tbaing commented May 3, 2023

brentleyjones commented Oct 30, 2023

keertk commented Oct 30, 2023

iancha1992 commented Dec 1, 2023

Execlog parser should output each action's digest #16741

Execlog parser should output each action's digest #16741

Comments

tbaing commented Nov 11, 2022

Description of the feature request:

What underlying problem are you trying to solve with this feature?

Which operating system are you running Bazel on?

What is the output of bazel info release?

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

tjgq commented Nov 11, 2022

tbaing commented Nov 11, 2022

tjgq commented Nov 14, 2022

tbaing commented Nov 14, 2022

tjgq commented Nov 14, 2022

tjgq commented Nov 14, 2022

tbaing commented Nov 14, 2022

tjgq commented Nov 22, 2022 • edited Loading

tbaing commented Nov 22, 2022

tjgq commented May 3, 2023

tbaing commented May 3, 2023

brentleyjones commented Oct 30, 2023

keertk commented Oct 30, 2023

iancha1992 commented Dec 1, 2023

What is the output of `bazel info release`?

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

tjgq commented Nov 22, 2022 •

edited

Loading