Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag to include URLs in repository cache key when canonical_id not specified #14128

Closed
kmicklas opened this issue Oct 18, 2021 · 2 comments
Closed
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: feature request

Comments

@kmicklas
Copy link
Contributor

Description of the problem / feature request:

It would be nice to have a flag, e.g. --experimental_use_urls_as_default_canonical_id, which includes the URLs in the repository cache key if canonical_id isn't set manually in the call to download/download_and_extract. This can be used in a CI system to verify that invalid repository updates don't get checked in.

Feature requests: what underlying problem are you trying to solve with this feature?

It seems that users frequently run into issues with repository rules using cached downloads based on SHA256 despite some other detail (especially URLs) being changed and making it invalid. This isn't noticed until building without that cache entry. See #10630, #10612, #10353, #5144.

The way to prevent this is to use canonical_id to force redownloading, for example by setting it to the URL. The problem is that requiring developers to set canonical_id is prone to error. You could make custom wrappers for e.g. http_file to automatically use the URL as canonical_id, but in a large codebase it's possible this will be accidentally circumvented with the upstream rule definition.

What operating system are you running Bazel on?

macOS 11.6

What's the output of bazel info release?

release 4.2.1
kmicklas added a commit to kmicklas/bazel that referenced this issue Nov 12, 2021
…elp detect broken repository URLs

This new flag can be used to force redownloading when repository URLs are changed. Otherwise, it's possible broken URLs can be masked by the presence of a repository cache entry with the same hash. Specifying a `canonical_id` works as before, overriding this behavior.

Closes bazelbuild#14128.
@gregestren gregestren added team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: feature request untriaged labels Nov 18, 2021
@aiuto
Copy link
Contributor

aiuto commented Jan 7, 2022

cc: @davido @meteorcloudy

@davido
Copy link
Contributor

davido commented Jan 8, 2022

@aiuto Thanks for the pointer.

@kmicklas Thanks, this is really good idea and is overdue (and should be even flipped to true per default in later Bazel releases).

I cherry-picked your PR and applied it to Gerrit Code Review project.

This worked as expected and flaged the incomplete dependency update and fail the build (and this is great).

Consider this (bad) diff:

diff --git a/WORKSPACE b/WORKSPACE
index bbc236780b..b031d60130 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -56,10 +56,18 @@ load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")
 
 protobuf_deps()
 
+#http_archive(
+#    name = "build_bazel_rules_nodejs",
+#    canonical_id = "https://github.com/bazelbuild/rules_nodejs/releases/download/3.5.0/rules_nodejs-3.5.0.tar.gz",
+#    sha256 = "10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6",
+#    urls = ["https://github.com/bazelbuild/rules_nodejs/releases/download/3.5.0/rules_nodejs-3.5.0.tar.gz"],
+#)
+
 http_archive(
     name = "build_bazel_rules_nodejs",
+    #canonical_id = "https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz",
     sha256 = "10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6",
-    urls = ["https://github.com/bazelbuild/rules_nodejs/releases/download/3.5.0/rules_nodejs-3.5.0.tar.gz"],
+    urls = ["https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz"],
 )

The external dependency build_bazel_rules_nodejs was bumped from 3.5.0 to 4.6.0 but sha256 was missed to be updated. In real life, WORKSPACE file currently doesn't use canonical_id attribute, unfortunately. See here: [1].

With latest Bazel@HEAD (d67dc88), build Gerrit :release with the above incomplete/broken update would succeed (and this is really worrying):

  $  bazeldev build --experimental_repository_cache_urls_as_default_canonical_id=false :release
INFO: Invocation ID: 4a614779-24b7-4561-8638-4fecd228283b
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=186
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit2/.bazelrc:
  'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=11 --java_runtime_version=remotejdk_11 --tool_java_language_version=11 --tool_java_runtime_version=remotejdk_11 --incompatible_strict_action_env --announce_rc
INFO: Reading rc options for 'build' from /home/davido/.bazelrc:
  'build' options: --experimental_generate_json_trace_profile --experimental_profile_cpu_usage
INFO: Analyzed target //:release (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:release up-to-date:
  bazel-bin/release.war
INFO: Elapsed time: 0.963s, Critical Path: 0.67s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action  

With your PR and with option flipped: --experimental_repository_cache_urls_as_default_canonical_id=true the build is failing, as expected, and that's great:

  $ bazeldev build --experimental_repository_cache_urls_as_default_canonical_id=true :release
INFO: Invocation ID: 0a232066-336e-4013-8217-7d6086a45339
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=186
INFO: Reading rc options for 'build' from /home/davido/projects/gerrit2/.bazelrc:
  'build' options: --workspace_status_command=python3 ./tools/workspace_status.py --repository_cache=~/.gerritcodereview/bazel-cache/repository --action_env=PATH --disk_cache=~/.gerritcodereview/bazel-cache/cas --java_language_version=11 --java_runtime_version=remotejdk_11 --tool_java_language_version=11 --tool_java_runtime_version=remotejdk_11 --incompatible_strict_action_env --announce_rc
INFO: Reading rc options for 'build' from /home/davido/.bazelrc:
  'build' options: --experimental_generate_json_trace_profile --experimental_profile_cpu_usage
INFO: Repository build_bazel_rules_nodejs instantiated at:
  /home/davido/projects/gerrit2/WORKSPACE:66:13: in <toplevel>
Repository rule http_archive defined at:
  /home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/bazel_tools/tools/build_defs/repo/http.bzl:364:31: in <toplevel>
WARNING: Download from https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Checksum was ddb78717b802f8dd5d4c01c340ecdc007c8ced5c1df7db421d0df3d642ea0580 but wanted 10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6
ERROR: An error occurred during the fetch of repository 'build_bazel_rules_nodejs':
   Traceback (most recent call last):
	File "/home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/bazel_tools/tools/build_defs/repo/http.bzl", line 111, column 45, in _http_archive_impl
		download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz] to /home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/build_bazel_rules_nodejs/temp5604337858586509616/rules_nodejs-4.6.0.tar.gz: Checksum was ddb78717b802f8dd5d4c01c340ecdc007c8ced5c1df7db421d0df3d642ea0580 but wanted 10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6
ERROR: /home/davido/projects/gerrit2/WORKSPACE:66:13: fetching http_archive rule //external:build_bazel_rules_nodejs: Traceback (most recent call last):
	File "/home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/bazel_tools/tools/build_defs/repo/http.bzl", line 111, column 45, in _http_archive_impl
		download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz] to /home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/build_bazel_rules_nodejs/temp5604337858586509616/rules_nodejs-4.6.0.tar.gz: Checksum was ddb78717b802f8dd5d4c01c340ecdc007c8ced5c1df7db421d0df3d642ea0580 but wanted 10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6
ERROR: no such package '@build_bazel_rules_nodejs//': java.io.IOException: Error downloading [https://github.com/bazelbuild/rules_nodejs/releases/download/4.6.0/rules_nodejs-4.6.0.tar.gz] to /home/davido/.cache/bazel/_bazel_davido/5c01f4f713b675540b8b424c5c647f63/external/build_bazel_rules_nodejs/temp5604337858586509616/rules_nodejs-4.6.0.tar.gz: Checksum was ddb78717b802f8dd5d4c01c340ecdc007c8ced5c1df7db421d0df3d642ea0580 but wanted 10f534e1c80f795cffe1f2822becd4897754d18564612510c59b3c73544ae7c6
INFO: Elapsed time: 0.934s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/WORKSPACE#60

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed untriaged labels Jan 10, 2022
brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Mar 7, 2022
…elp detect broken repository URLs

This new flag can be used to force redownloading when repository URLs are changed. Otherwise, it's possible broken URLs can be masked by the presence of a repository cache entry with the same hash. Specifying a `canonical_id` works as before, overriding this behavior.

Closes bazelbuild#14128.

Closes bazelbuild#14268.

PiperOrigin-RevId: 420976730
(cherry picked from commit f9882e4)
Wyverald pushed a commit that referenced this issue Mar 7, 2022
…elp detect broken repository URLs (#14989)

This new flag can be used to force redownloading when repository URLs are changed. Otherwise, it's possible broken URLs can be masked by the presence of a repository cache entry with the same hash. Specifying a `canonical_id` works as before, overriding this behavior.

Closes #14128.

Closes #14268.

PiperOrigin-RevId: 420976730
(cherry picked from commit f9882e4)

Co-authored-by: Ken Micklas <kmicklas@uber.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: feature request
Projects
None yet
5 participants