feature: experimental_hash_all_targets #47

Closed
jonahgeorge opened this issue Jan 17, 2021 · 9 comments

jonahgeorge commented Jan 17, 2021

Hey @tinder-maxwellelliott, I encountered an interesting issue which appears to be coming from the hash function. I've put together a small repro below showing that the hashes from generate-hashes are not changing even when the source files are changing. Have you seen this before? Based on my understanding of the rule_implementation_hash bug, I don't think this is related but my exposure to this is pretty limited right now.

https://github.com/jonahgeorge/bazel-diff-repro

 ± git rev-parse HEAD
0d42f9d1b0b514382825ddf1272a226ec5a6bff0

 ± bazel run //:bazel-diff -- \
  --workspacePath $(pwd) --bazelPath $(which bazel) generate-hashes /dev/stdout | grep repro

INFO: Invocation ID: d4a882ba-027c-4361-a6c8-fb66c2e269ea
Loading: 
Loading: 0 packages loaded
Analyzing: target //:bazel-diff (0 packages loaded, 0 targets configured)
INFO: Analyzed target //:bazel-diff (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
[1 / 6] [Prepa] BazelWorkspaceStatusAction stable-status.txt
Target //:bazel-diff up-to-date:
  bazel-bin/bazel-diff.jar
  bazel-bin/bazel-diff
INFO: Elapsed time: 0.169s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/bazel-diff --workspacePath /Users/jonahgeorge/Workspace/button/bazel-diff-repro --bazelPath /usr/local/bin/bazel generate-hashes /dev/stdout
INFO: Build completed successfully, 1 total action
  "//:repro": "5bc3def9bc1f63785fe2b84dee005d62e8d20de42be50e796b6c57e45ed58632",
  "//:repro.go": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "//:repro_lib": "7e462ed495dd9d56d742eab9d831ff19ef18d5bdb9fba0e105f5ce24914c70b1",

 ± echo '\nfunc init() { fmt.Println("should change the hash") }' >> repro.go

 ± git add repro.go

 ± git commit -m 'modify repro.go'
[master bf6d672] modify repro.go
 1 file changed, 2 insertions(+)

 ± bazel run //:bazel-diff -- \
  --workspacePath $(pwd) --bazelPath $(which bazel) generate-hashes /dev/stdout | grep repro

INFO: Invocation ID: fb25ffac-0adc-4806-bf1e-1ee7fabd30d2
Loading: 
Loading: 0 packages loaded
Analyzing: target //:bazel-diff (0 packages loaded, 0 targets configured)
INFO: Analyzed target //:bazel-diff (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
[1 / 7] [Prepa] BazelWorkspaceStatusAction stable-status.txt
Target //:bazel-diff up-to-date:
  bazel-bin/bazel-diff.jar
  bazel-bin/bazel-diff
INFO: Elapsed time: 0.168s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/bazel-diff --workspacePath /Users/jonahgeorge/Workspace/button/bazel-diff-repro --bazelPath /usr/local/bin/bazel generate-hashes /dev/stdout
INFO: Build completed successfully, 1 total action
  "//:repro": "5bc3def9bc1f63785fe2b84dee005d62e8d20de42be50e796b6c57e45ed58632",
  "//:repro.go": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "//:repro_lib": "7e462ed495dd9d56d742eab9d831ff19ef18d5bdb9fba0e105f5ce24914c70b1",

tinder-maxwellelliott (Collaborator) commented Jan 17, 2021

Hello @jonahgeorge,

I was actually able to make your example work with the example script provided in the repo.

I have made a fork of your repo above to show how to get this working:

  1. Clone down https://github.com/tinder-maxwellelliott/bazel-diff-repro
  2. cd into the directory, then run git fetch origin modify_go_file && git checkout modify_go_file
  3. Run sh bazel-diff-example.sh $PWD bazel main modify_go_file
  4. bazel-diff now runs, and you will see the following output:
Impacted Targets between modify_go_file and main:
//:repro //:repro_lib

The issue is that for bazel-diff to work, it must be told which files changed via the -m flag; you can see this in the example script above. By using the list of modified filepaths, we avoid having to read every single source file to determine whether it changed.

jonahgeorge (Author) commented Jan 18, 2021

Ah, I think I tracked it down... looking at your script I see that repro.go has a new hash in the final hashes output:

 ± cat /tmp/starting_hashes.json | grep repro
  "//:repro": "5a9e8afc5f16068ed022e508cee1dbdf6f80704ca2bd311f2adaf583a761eb42",
  "//:repro.go": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "//:repro_lib": "ea7e08316a3d2e988162cca450eed1ea906265720e6b31cba0f4436f3467a1c6",
 ± cat /tmp/final_hashes.json | grep repro
  "//:repro": "04061b86f238ddd24b44ac8e10165bfb13f71be62e8456f4aeb978619c695804",
  "//:repro.go": "8e581d2418038586912dd4df426854b6086275131ba07bc52b051f0476a4f252",
  "//:repro_lib": "008119f5e9030a1ffe85a1b5169851407a34a0c5ce6027b74fb9dafc6421c23c",

I had assumed that generate-hashes would hash all files, all the time, but it appears that it only considers files passed in via modifiedFilepaths. This matches what I'm seeing in my repro, where repro.go produces the hash of empty input (e3b0c4...) both times when this flag is not passed.
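
For readers following along, here is a small Python check of the observation above: the value reported for //:repro.go is the SHA-256 digest of zero bytes. The `source_file_digest` helper is a hypothetical toy model of "only hash files listed as modified"; it is an assumption for illustration, not bazel-diff's actual code.

```python
import hashlib

# SHA-256 of zero bytes. This is the digest shown for //:repro.go in the repro.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def source_file_digest(path, content, modified_paths):
    """Toy model (assumption, not bazel-diff's implementation):
    hash the content only when the path is listed as modified,
    otherwise hash empty input."""
    data = content if path in modified_paths else b""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    print(EMPTY_SHA256)
    # With no modified paths given, the file looks "empty" however it changes.
    print(source_file_digest("repro.go", b"package main\n", set()) == EMPTY_SHA256)
```

Under this model, two runs without a modified-files list would produce identical hashes regardless of what was committed in between, which is consistent with the repro output.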

This makes sense and seems significantly faster when you know which files have changed, but it poses some problems in an environment where you can't feasibly execute modified-filepaths (say, a CI agent which does not have git history). I thought that calling generate-hashes twice without the flag would be acceptable, albeit slow. My plan involved uploading the hashes for each commit to S3 and downloading the hashes for the previous revision when calculating impacted targets.
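
A minimal sketch of the "compare two stored hash files" step, assuming each file is a flat JSON object mapping target labels to hex digests (which is what the grep output in this thread suggests). The function names are hypothetical, and this is not how bazel-diff itself computes impacted targets.

```python
import json


def load_hashes(path):
    """Read a hashes file, assumed to be a flat {label: digest} JSON object."""
    with open(path) as f:
        return json.load(f)


def impacted_targets(starting, final):
    """Return labels that are new in `final` or whose digest changed, sorted."""
    return sorted(
        label for label, digest in final.items() if starting.get(label) != digest
    )


if __name__ == "__main__":
    # Digests copied from the starting/final hashes shown earlier in this thread.
    starting = {
        "//:repro": "5a9e8afc5f16068ed022e508cee1dbdf6f80704ca2bd311f2adaf583a761eb42",
        "//:repro.go": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "//:repro_lib": "ea7e08316a3d2e988162cca450eed1ea906265720e6b31cba0f4436f3467a1c6",
    }
    final = {
        "//:repro": "04061b86f238ddd24b44ac8e10165bfb13f71be62e8456f4aeb978619c695804",
        "//:repro.go": "8e581d2418038586912dd4df426854b6086275131ba07bc52b051f0476a4f252",
        "//:repro_lib": "008119f5e9030a1ffe85a1b5169851407a34a0c5ce6027b74fb9dafc6421c23c",
    }
    print(" ".join(impacted_targets(starting, final)))
```

Note that this sketch reports every label whose digest differs, including the source file //:repro.go, whereas the example script's output earlier in the thread listed only //:repro and //:repro_lib; no attempt is made here to reproduce that filtering.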

Would you consider a patch to generate-hashes which hashes all files when provided a new flag? I could see this behavior being necessary for implementation of the query service.

tinder-maxwellelliott (Collaborator) commented

@jonahgeorge I like the idea of allowing users to scan all input files via a flag, what flag name makes sense to you?

jonahgeorge (Author) commented Jan 20, 2021

I'd lean towards something like --experimental_hash_all_targets until we know for sure this is a good idea. I'm poking around at implementing this right now, but running into some issues with retrieving the filepaths of source files.

Potentially related:

jonahgeorge changed the title from "Potential issue with hash function" to "feature: experimental_hash_all_targets" on Jan 20, 2021

tinder-maxwellelliott (Collaborator) commented

@jonahgeorge can you try out #52 locally and see if that works for you?

tinder-maxwellelliott (Collaborator) commented

@jonahgeorge Did this end up working for you?

jonahgeorge (Author) commented

Unfortunately trying it out of the box with my company's primary monorepo yielded bad results: After 20 minutes the generate-hashes command was still running. I'll try to carve out some time to try on a smaller repo and also investigate whether the hash generation could be parallelized.
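
As a rough illustration of the parallelization idea, here is a sketch that hashes many files concurrently with a thread pool. It is written in Python purely for brevity; the function names and the `max_workers` default are assumptions, and it says nothing about how bazel-diff reads or hashes files internally.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files_parallel(paths, max_workers=8):
    """Return {path: sha256 hex digest}, hashing files on a thread pool."""
    paths = [str(p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its digest.
        return dict(zip(paths, pool.map(sha256_file, paths)))
```

Whether this helps in practice depends on where the time actually goes (file reads, query evaluation, or something else), which would need profiling on the monorepo in question.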

tinder-maxwellelliott (Collaborator) commented

> Unfortunately trying it out of the box with my company's primary monorepo yielded bad results: After 20 minutes the generate-hashes command was still running. I'll try to carve out some time to try on a smaller repo and also investigate whether the hash generation could be parallelized.

Thank you @jonahgeorge. I'll admit this is not how we use the tool; we use the -m flag to drastically reduce the number of file reads.

tinder-maxwellelliott (Collaborator) commented

Closing this in favor of #54; hash all targets was merged in #52.
