Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ComputeOutputsToUpload to build directory tree consistently #411

Merged
merged 6 commits into from
Mar 10, 2022

Conversation

andusy
Copy link
Collaborator

@andusy andusy commented Mar 4, 2022

Currently the output directory tree's child nodes are appended randomly in ComputeOutputsToUpload. This changes the ComputeOutputsToUpload to build the directory trees based on the lexical order of the sub directories so that we will get a consistent output.

The new test case added in TestComputeOutputsToUploadDirectories would give the following result before the changes:

INFO: Found 1 test target...
--- FAIL: TestComputeOutputsToUploadDirectories (0.05s)
    --- FAIL: TestComputeOutputsToUploadDirectories/Directory_tree_preserves_natural_order (0.01s)
        tree_test.go:1813: ComputeOutputsToUpload(...) directory digests are not consistent got:[cbf9cdd9a275d4417d46f98b673db3010b6ed50b740a5a0b728b16d86270c51c cf74177e895f8aad5f2dd165cc269113bee533671ad25ee25ae9234033ad0570 f3889072a2b2d805fe69e3feeb8eee508f883da81593527a61acc5f829c34032 d9062ce85ddceda61f49d9ef54007cb718425ff86c00a465a55e4b15993c4bb7]

@andusy andusy self-assigned this Mar 4, 2022
@@ -556,8 +557,18 @@ func (c *Client) ComputeOutputsToUpload(execRoot, workingDir string, paths []str
}
outs[ue.Digest] = ue
treePb.Root = rootDir
for _, c := range childDirs {
treePb.Children = append(treePb.Children, c)
err = filepath.Walk(absPath, func(path string, info fs.FileInfo, err error) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you want treePb.Children to be ordered and you are using filepath.Walk() to guarantee the ordering? (certainly an approach I didn't really expect :D)

An alternative is to put the keys of childDirs into an array, sort the array and then iterate the array - that would also guarantee ordering. I think sorting the strings might be more efficient (agreed we use extra space) since filepath.Walk() has to issue stat calls inorder to walk the directory? Wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think sorting is probably the better approach. Although childDirs will contain all the child nodes. These are not necessarily the direct children of root, so I sorted the order in which we iterate the directories in packageDirectories and saved that order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, I removed children from packageDirectories and appended the child nodes directly to the tree that is now being passed in.

The new test case I added should test that they are being appended in lexical order according to path. Swapping line 1671 with 1672 or 1680 with 1681 (want output) will fail the test because ComputeOutputsToUpload would output the digest with the correct lexical directory structure.

Failure example when swapping 1671 and 1672

tree_test.go:1867: ComputeOutputsToUpload(...) gave diff (-want +got) on tree children:
              map[digest.Digest]*remoteexecution.Directory{
              	s"7478e0d4249ff4975c89427103939e36dc4d0a3078422aa53d53a9f339bcc064/156": s`files:{name:"bar"  digest:{hash:"fcde2b2edba56bf408601fb721fe9b5`...,
              	s"778eefa0c2ce4f6c10fa950be47e0c930c91d21eff989c0d7d98eaae25d8f8ea/157": s`directories:{name:"dirD"  digest:{hash:"7bcce45c82c21e91851bcc49`...,
              	s"78dc0850b3947e960b72ab23bcaa2c9506d6fa46c5ea88f9427c1c75d9e28d41/79":  s`files:{name:"foo"  digest:{hash:"2c26b46b68ffc68ff99b453c1d30413`...,
            - 	s"7a2254e528f733780bd310d336c828ebdfc3f11dfe7be0bbfa68d136a8a0f51d/156": s`directories:{name:"dirF"  digest:{hash:"78dc0850b3947e960b72ab23bcaa2c9506d6fa46c5ea88f9427c1c75d9e28d41"  size_bytes:79}}  directories:{name:"dirC"  digest:{hash:"d5cadb0a45f1af229e10ba007aa6c68f8889bfe47a0e0b6b730833de52b199d6"  size_bytes:77}}`,
              	s"7bcce45c82c21e91851bcc49b2e69953159e11d087f6996c6a385e0ebe1fc0df/77":  s`files:{name:"baz"  digest:{hash:"baa5a0964d3320fbc0c6a922140453c`...,
              	s"d5cadb0a45f1af229e10ba007aa6c68f8889bfe47a0e0b6b730833de52b199d6/77":  s`files:{name:"bar"  digest:{hash:"fcde2b2edba56bf408601fb721fe9b5`...,
            + 	s"ec3f070535f82e8c26fee750cb23d47141bc96f31119e37716dbe24313dd3a09/156": s`directories:{name:"dirC"  digest:{hash:"d5cadb0a45f1af229e10ba007aa6c68f8889bfe47a0e0b6b730833de52b199d6"  size_bytes:77}}  directories:{name:"dirF"  digest:{hash:"78dc0850b3947e960b72ab23bcaa2c9506d6fa46c5ea88f9427c1c75d9e28d41"  size_bytes:79}}`,

The want and got digest contents here are the same except the got digest has dirC appended before dirF.

Copy link
Collaborator

@gkousik gkousik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets also test this with an Android build if possible before merging and ensure that for some sample action that mismatches, we get a consistent output directory digest.

go/pkg/client/tree.go Outdated Show resolved Hide resolved
go/pkg/client/tree_test.go Outdated Show resolved Hide resolved
go/pkg/client/tree_test.go Show resolved Hide resolved
Copy link
Collaborator

@ramymedhat ramymedhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm after comments and lint errors are addressed

@andusy
Copy link
Collaborator Author

andusy commented Mar 10, 2022

Lets also test this with an Android build if possible before merging and ensure that for some sample action that mismatches, we get a consistent output directory digest.

I ran multiple Android builds and it looks like it's working. I don't see any mismatches across multiple builds with local/remote reruns set to 5. I also logged the digests to make sure we're not seeing anything weird and it looks like it's matching the remote digests.

@andusy
Copy link
Collaborator Author

andusy commented Mar 10, 2022

Some additional tests I did:

test command:

 RBE_log_path=text://`pwd`/testlogs/reproxy_log.txt RBE_log_dir=`pwd`/testlogs ./scripts/rbe_action.sh --output_directories=out --inputs=script.sh --num_local_reruns=5 --num_remote_reruns=5 --compare=true /bin/bash -c ./script.sh

script.sh:

#!/bin/bash

mkdir -p out/a/b
mkdir -p out/b/a
echo foo > out/a/b/a.txt
echo bar > out/b/a/b.txt

With SDK changes in re-client:
no mismatches

Without SDK changes in re-client:
remote digests:1
local digests:3

The test above shows that without the sdk changes the local digests are being produced inconsistently due to the subdirectories even though the file contents are the same. With the sdk changes, there doesn't seem to be any mismatches.

#!/bin/bash

mkdir -p out/a/b
mkdir -p out/b/a
echo $RANDOM >> out/a/b/a.txt
echo $RANDOM >> out/b/a/b.txt

With SDK changes in re-client:
remote digests:5
local digests:5

Without SDK changes in re-client:
remote digests:5
local digests:5

This test was just to make sure that the mismatch detection is working as intended. Seems like the different mismatches are still being picked up and logged to reproxy.ERROR.

@gkousik
Copy link
Collaborator

gkousik commented Mar 10, 2022

Thanks for the comprehensive testing!

@andusy andusy merged commit edee8d7 into master Mar 10, 2022
@andusy andusy deleted the computeOutputs branch March 10, 2022 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants