Skip to content

Commit

Permalink
Make get_default_canonical_id public
Browse files Browse the repository at this point in the history
When `repository_ctx.download` or `repository_ctx.download_and_extract` are not given an explicit `canonical_id` the default behaviour can lead to some counterintuitive results (e.g. URL changed but old asset restored from cache due to unchanged checksum).

This PR seeks to bring greater attention to `canonical_id` in these low level API (relative to `http_archive` which uses `get_default_canonical_id` by default).

URLs are usually the most appropriate `canonical_id` choice, so `get_default_canonical_id` has been added to the public API and sample usage added to documentation.

Related to #22652

## TODO

I need some pointers to make these happen.

- [x] Add `get_default_canonical_id` to the Bazel docs (I believe there are other `load`ed API that is missing from docs in one way or another).
- [x] Confirm examples render correctly in docs.

Closes #22742.

PiperOrigin-RevId: 662822225
Change-Id: I1617aa16a62da2d8dff2034fef8ca19aecd33d58
  • Loading branch information
Silic0nS0ldier authored and copybara-github committed Aug 14, 2024
1 parent b03dfeb commit 6052ad6
Show file tree
Hide file tree
Showing 5 changed files with 88 additions and 16 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -602,11 +602,14 @@ private StructImpl completeDownload(PendingDownload pendingDownload)
name = "download",
doc =
"""
Downloads a file to the output path for the provided url and returns a struct \
containing <code>success</code>, a flag which is <code>true</code> if the \
download completed successfully, and if successful, a hash of the file \
with the fields <code>sha256</code> and <code>integrity</code>.
""",
Downloads a file to the output path for the provided url and returns a struct \
containing <code>success</code>, a flag which is <code>true</code> if the \
download completed successfully, and if successful, a hash of the file \
with the fields <code>sha256</code> and <code>integrity</code>. \
When <code>sha256</code> or <code>integrity</code> is user specified, setting an explicit \
<code>canonical_id</code> is highly recommended. e.g. \
<a href='/rules/lib/repo/cache#get_default_canonical_id'><code>get_default_canonical_id</code></a>
""",
useStarlarkThread = true,
parameters = {
@Param(
Expand Down Expand Up @@ -801,11 +804,14 @@ public Object download(
name = "download_and_extract",
doc =
"""
Downloads a file to the output path for the provided url, extracts it, and returns a \
struct containing <code>success</code>, a flag which is <code>true</code> if the \
download completed successfully, and if successful, a hash of the file with the \
fields <code>sha256</code> and <code>integrity</code>.
""",
Downloads a file to the output path for the provided url, extracts it, and returns a \
struct containing <code>success</code>, a flag which is <code>true</code> if the \
download completed successfully, and if successful, a hash of the file with the \
fields <code>sha256</code> and <code>integrity</code>. \
When <code>sha256</code> or <code>integrity</code> is user specified, setting an explicit \
<code>canonical_id</code> is highly recommended. e.g. \
<a href='/rules/lib/repo/cache#get_default_canonical_id'><code>get_default_canonical_id</code></a>
""",
useStarlarkThread = true,
parameters = {
@Param(
Expand Down
44 changes: 44 additions & 0 deletions src/test/shell/bazel/external_integration_test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3036,6 +3036,50 @@ EOF
test -h "$execroot/external/+_repo_rules+ext" || fail "Expected symlink to external repo."
}

function test_default_canonical_id_enabled() {
cat > repo.bzl <<EOF
load("@bazel_tools//tools/build_defs/repo:cache.bzl", "get_default_canonical_id")
def _impl(rctx):
print("canonical_id", repr(get_default_canonical_id(rctx, ["url-1", "url-2"])))
rctx.file("BUILD", "")
dummy_repository = repository_rule(_impl)
EOF
touch BUILD
cat > MODULE.bazel <<EOF
dummy_repository = use_repo_rule('//:repo.bzl', 'dummy_repository')
dummy_repository(name = 'foo')
EOF

# NOTE: Test environment modifies defaults, so --repo_env must be explicitly set
bazel query @foo//:all --repo_env=BAZEL_HTTP_RULES_URLS_AS_DEFAULT_CANONICAL_ID=1 \
2>$TEST_log || fail 'Expected fetch to succeed'
expect_log "canonical_id \"url-1 url-2\""
}

function test_default_canonical_id_disabled() {
cat > repo.bzl <<EOF
load("@bazel_tools//tools/build_defs/repo:cache.bzl", "get_default_canonical_id")
def _impl(rctx):
print("canonical_id", repr(get_default_canonical_id(rctx, ["url-1", "url-2"])))
rctx.file("BUILD", "")
dummy_repository = repository_rule(_impl)
EOF
touch BUILD
cat > MODULE.bazel <<EOF
dummy_repository = use_repo_rule('//:repo.bzl', 'dummy_repository')
dummy_repository(name = 'foo')
EOF

# NOTE: Test environment modifies defaults, so --repo_env must be explicitly set
bazel query @foo//:all --repo_env=BAZEL_HTTP_RULES_URLS_AS_DEFAULT_CANONICAL_ID=0 \
2>$TEST_log || fail 'Expected fetch to succeed'
expect_log "canonical_id \"\""
}

function test_environ_incrementally() {
# Set up workspace with a repository rule to examine env vars. Assert that undeclared
# env vars don't trigger reevaluations.
Expand Down
8 changes: 4 additions & 4 deletions src/test/tools/bzlmod/MODULE.bazel.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions tools/build_defs/repo/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ genrule(
)

REPO_BZL_FILES = [
"cache",
"git",
"http",
"local",
Expand Down
25 changes: 23 additions & 2 deletions tools/build_defs/repo/cache.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@

"""Returns the default canonical id to use for downloads."""

visibility("private")
visibility("public")

DEFAULT_CANONICAL_ID_ENV = "BAZEL_HTTP_RULES_URLS_AS_DEFAULT_CANONICAL_ID"

Expand All @@ -37,7 +37,28 @@ machines without the file in the cache. This behavior can be disabled with
""".format(env = DEFAULT_CANONICAL_ID_ENV)

def get_default_canonical_id(repository_ctx, urls):
"""Returns the default canonical id to use for downloads."""
"""Returns the default canonical id to use for downloads.
Returns `""` (empty string) when Bazel is run with
`--repo_env=BAZEL_HTTP_RULES_URLS_AS_DEFAULT_CANONICAL_ID=0`.
e.g.
```python
load("@bazel_tools//tools/build_defs/repo:cache.bzl", "get_default_canonical_id")
# ...
repository_ctx.download_and_extract(
url = urls,
integrity = integrity
canonical_id = get_default_canonical_id(repository_ctx, urls),
),
```
Args:
repository_ctx: The repository context of the repository rule calling this utility
function.
urls: A list of URLs matching what is passed to `repository_ctx.download` and
`repository_ctx.download_and_extract`.
"""
if repository_ctx.os.environ.get(DEFAULT_CANONICAL_ID_ENV) == "0":
return ""

Expand Down

0 comments on commit 6052ad6

Please sign in to comment.