http_archive: hash of content file different from origin file #20090

Closed
honwen opened this issue Nov 8, 2023 · 14 comments
Labels: team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website), type: bug, untriaged

honwen commented Nov 8, 2023

Description of the bug:

The SHA-256 hash of a file extracted by http_archive differs from the hash of the same file extracted from the archive with tar.

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

WORKSPACE:

workspace(
    name = "demo",
)

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "ruff_macos",
    build_file_content = """exports_files(["ruff"], visibility = ["//visibility:public"])""",
    sha256 = "27a2800606b417d8f3102354efd0c09b8e08c94f754aaea9809a74a76e7456da",
    urls = [
        "https://github.com/astral-sh/ruff/releases/download/v0.1.4/ruff-x86_64-apple-darwin.tar.gz",
    ],
)

Flow:

  1. bazel build @ruff_macos//:ruff

  2. sha256sum $(bazel info output_base)/external/ruff_macos/ruff

    cbb610d58da995d6cee14804e088be7e2ece50f85b7b3afef3fa7e617af522de

However, the SHA-256 hash of the original ruff binary (extracted with tar) is c644ed5d190a4db8b153e84b25e45ae4ed00be7e690d802feeecee22876f4a22.

Which operating system are you running Bazel on?

Linux (x86-64), macOS (M1)

What is the output of bazel info release?

release 6.4.0

sgowroji (Member) commented Nov 8, 2023

Hi @honwen, could you please elaborate on your query with complete details? Thanks!

sgowroji added the team-OSS label Nov 8, 2023

honwen (Author) commented Nov 8, 2023

@sgowroji Here is a way to reproduce the problem:

#!/bin/bash

ws_root=/tmp/demo

url=https://github.com/astral-sh/ruff/releases/download/v0.1.4/ruff-x86_64-apple-darwin.tar.gz

mkdir -p $ws_root
cd $ws_root

wget -q $url
sha256=$(sha256sum $(basename $url) | awk '{print $1}')
tar -zxf $(basename $url)

echo "# Hash of origin"
sha256sum ruff

cat <<EOF >WORKSPACE
workspace(
    name = "demo",
)

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "ruff_macos",
    build_file_content = """exports_files(["ruff"], visibility = ["//visibility:public"])""",
    sha256 = "$sha256",
    url = "$url",
)
EOF

bazel build @ruff_macos//:ruff

echo "# Hash of bazel http_archive"
sha256sum $(bazel info output_base)/external/ruff_macos/ruff

Log:

# Hash of origin
c644ed5d190a4db8b153e84b25e45ae4ed00be7e690d802feeecee22876f4a22  ruff
Starting local Bazel server and connecting to it...
INFO: Analyzed target @ruff_macos//:ruff (1 packages loaded, 1 target configured).
INFO: Found 1 target...
WARNING: /root/.cache/bazel/_bazel_root/b9375f222b2f27c6ebce6859ade299db/external/ruff_macos/BUILD.bazel:1:14: @ruff_macos//:ruff is a source file, nothing will be built for it. If you want to build a target that consumes this file, try --compile_one_dependency
INFO: Elapsed time: 2.546s, Critical Path: 0.04s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
# Hash of bazel http_archive
cbb610d58da995d6cee14804e088be7e2ece50f85b7b3afef3fa7e617af522de  /root/.cache/bazel/_bazel_root/b9375f222b2f27c6ebce6859ade299db/external/ruff_macos/ruff

Problem:

cbb610d58da995d6cee14804e088be7e2ece50f85b7b3afef3fa7e617af522de != c644ed5d190a4db8b153e84b25e45ae4ed00be7e690d802feeecee22876f4a22

tjgq (Contributor) commented Nov 8, 2023

$(bazel info output_base)/external/ruff_macos contains the decompressed contents of ruff-x86_64-apple-darwin.tar.gz. $(bazel info output_base)/external/ruff_macos/ruff is one of the files inside the .tar.gz, not the .tar.gz itself.

honwen (Author) commented Nov 8, 2023

> $(bazel info output_base)/external/ruff_macos contains the decompressed contents of ruff-x86_64-apple-darwin.tar.gz. $(bazel info output_base)/external/ruff_macos/ruff is one of the files inside the .tar.gz, not the .tar.gz itself.

The file ruff was extracted by tar from ruff-x86_64-apple-darwin.tar.gz.

The file external/ruff_macos/ruff was extracted by Bazel.

The problem is that these two files' hashes are not the same.
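As a cross-check that does not depend on either extractor's on-disk output, the member can be hashed straight out of the archive. The sketch below uses Python's standard `tarfile` module, an illustrative choice that has nothing to do with how Bazel extracts archives; `tarfile` is intended to handle GNU sparse members by filling holes with zeros. `sha256_of_member` is a hypothetical helper name.

```python
import hashlib
import tarfile


def sha256_of_member(archive_path: str, member_name: str) -> str:
    """Return the SHA-256 hex digest of one member of a tar archive.

    Mode "r:*" lets tarfile detect gzip (or other) compression on its own.
    extractfile() returns the member's logical contents; for sparse members,
    tarfile is expected to fill the holes with zero bytes.
    """
    digest = hashlib.sha256()
    with tarfile.open(archive_path, mode="r:*") as tar:
        member = tar.getmember(member_name)
        fileobj = tar.extractfile(member)
        if fileobj is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        # Read in 1 MiB chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: fileobj.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_member("ruff-x86_64-apple-darwin.tar.gz", "ruff")` would be expected to match the `c644ed5d...` digest reported for tar above, although this has not been verified against that archive here.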

tjgq (Contributor) commented Nov 8, 2023

My apologies, you are entirely correct. It appears that the file in external has been corrupted - it contains a prefix of 512 bytes followed by (what I presume are) the correct file contents:

$ hexdump -C $(bazel info output_base)/external/ruff_macos/ruff | head -n 5
00000000  32 0a 30 0a 31 34 36 30  36 33 33 36 0a 31 34 36  |2.0.14606336.146|
00000010  31 34 35 32 38 0a 38 33  39 36 38 0a 00 00 00 00  |14528.83968.....|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200  cf fa ed fe 07 00 00 01  03 00 00 00 02 00 00 00  |................|

(Note that cf fa ed fe is the magic number of a 64-bit Mach-O header.)
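The 512-byte prefix appears to be a GNU sparse format 1.0 map: newline-terminated decimal numbers giving the number of data regions, then an offset/size pair for each region, zero-padded to a full 512-byte tar block. A minimal parser sketch follows; `parse_sparse_map_1_0` is a hypothetical name, and this is not Bazel or Commons Compress code.

```python
BLOCK_SIZE = 512  # tar block size


def parse_sparse_map_1_0(data: bytes) -> tuple[list[tuple[int, int]], int]:
    """Parse a GNU sparse 1.0 map from the beginning of an entry's data.

    Returns the (offset, size) pairs and the number of bytes the map occupies,
    rounded up to a whole tar block (the packed data starts right after it).
    """
    pos = 0

    def read_number() -> int:
        nonlocal pos
        end = data.index(b"\n", pos)  # ValueError if the map is truncated
        value = int(data[pos:end])    # int() accepts ASCII digits as bytes
        pos = end + 1
        return value

    count = read_number()
    # Tuple elements are evaluated left to right: offset first, then size.
    regions = [(read_number(), read_number()) for _ in range(count)]
    map_len = -(-pos // BLOCK_SIZE) * BLOCK_SIZE  # ceiling to a block boundary
    return regions, map_len
```

Applied to the bytes in the hexdump above, this yields two regions, (0, 14606336) and (14614528, 83968), with an 8192-byte hole between them. The last region ends at 14614528 + 83968 = 14698496, which matches GNU.sparse.realsize in the PAX header shown further below, and 512 + 14606336 + 83968 = 14690816 matches the stored size (octal 00070025000) in the ustar header of the GNUSparseFile.0/ruff entry.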

Interestingly, these bytes also appear in the decompressed .tar file:

$ gunzip ruff-x86_64-apple-darwin.tar.gz
$ hexdump -C ruff-x86_64-apple-darwin.tar | head -n 50
00000000  50 61 78 48 65 61 64 65  72 2f 72 75 66 66 00 00  |PaxHeader/ruff..|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 37  35 35 20 00 30 30 30 37  |....000755 .0007|
00000070  36 35 20 00 30 30 30 30  32 34 20 00 30 30 30 30  |65 .000024 .0000|
00000080  30 30 30 30 32 30 32 20  31 34 35 32 31 32 36 31  |0000202 14521261|
00000090  31 36 35 20 30 31 34 33  37 30 00 20 78 00 00 00  |165 014370. x...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 75 73 74 61 72 00 30  30 72 75 6e 6e 65 72 00  |.ustar.00runner.|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 73 74 61 66 66 00 00  |.........staff..|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 20  |.........000000 |
00000150  00 30 30 30 30 30 30 20  00 00 00 00 00 00 00 00  |.000000 ........|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200  33 30 20 6d 74 69 6d 65  3d 31 36 39 39 30 34 36  |30 mtime=1699046|
00000210  30 30 35 2e 35 30 35 37  32 39 38 31 38 0a 32 32  |005.505729818.22|
00000220  20 47 4e 55 2e 73 70 61  72 73 65 2e 6d 61 6a 6f  | GNU.sparse.majo|
00000230  72 3d 31 0a 32 32 20 47  4e 55 2e 73 70 61 72 73  |r=1.22 GNU.spars|
00000240  65 2e 6d 69 6e 6f 72 3d  30 0a 32 34 20 47 4e 55  |e.minor=0.24 GNU|
00000250  2e 73 70 61 72 73 65 2e  6e 61 6d 65 3d 72 75 66  |.sparse.name=ruf|
00000260  66 0a 33 32 20 47 4e 55  2e 73 70 61 72 73 65 2e  |f.32 GNU.sparse.|
00000270  72 65 61 6c 73 69 7a 65  3d 31 34 36 39 38 34 39  |realsize=1469849|
00000280  36 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |6...............|
00000290  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400  47 4e 55 53 70 61 72 73  65 46 69 6c 65 2e 30 2f  |GNUSparseFile.0/|
00000410  72 75 66 66 00 00 00 00  00 00 00 00 00 00 00 00  |ruff............|
00000420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000460  00 00 00 00 30 30 30 37  35 35 20 00 30 30 30 37  |....000755 .0007|
00000470  36 35 20 00 30 30 30 30  32 34 20 00 30 30 30 37  |65 .000024 .0007|
00000480  30 30 32 35 30 30 30 20  31 34 35 32 31 32 36 31  |0025000 14521261|
00000490  31 36 35 20 30 31 35 31  37 36 00 20 30 00 00 00  |165 015176. 0...|
000004a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000500  00 75 73 74 61 72 00 30  30 72 75 6e 6e 65 72 00  |.ustar.00runner.|
00000510  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000520  00 00 00 00 00 00 00 00  00 73 74 61 66 66 00 00  |.........staff..|
00000530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000540  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 20  |.........000000 |
00000550  00 30 30 30 30 30 30 20  00 00 00 00 00 00 00 00  |.000000 ........|
00000560  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000600  32 0a 30 0a 31 34 36 30  36 33 33 36 0a 31 34 36  |2.0.14606336.146|
00000610  31 34 35 32 38 0a 38 33  39 36 38 0a 00 00 00 00  |14528.83968.....|
00000620  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

Given the appearance of the string GNUSparseFile, my suspicion is that the tar decompressor is failing to handle sparse files properly.
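If that suspicion is right, a sparse-aware extractor has to skip the map block, then copy each stored region to its offset and leave the gaps zero-filled. An extractor that treats the entry as an ordinary file instead writes the map block followed by the packed regions, which is consistent with the corrupted output above (a 512-byte prefix and no hole). A minimal sketch of the correct reconstruction, with `expand_sparse` as a hypothetical helper:

```python
def expand_sparse(regions: list[tuple[int, int]], packed: bytes, real_size: int) -> bytes:
    """Rebuild a sparse file's logical contents from its packed data regions.

    regions:   (offset, size) pairs from the sparse map, in file order.
    packed:    the stored regions concatenated, with the map block already removed.
    real_size: the logical file size (GNU.sparse.realsize).
    """
    out = bytearray(real_size)  # every byte starts as zero, so holes stay zero
    cursor = 0
    for offset, size in regions:
        if offset + size > real_size:
            raise ValueError("sparse region extends past the real size")
        chunk = packed[cursor:cursor + size]
        if len(chunk) != size:
            raise ValueError("packed data is shorter than the sparse map claims")
        out[offset:offset + size] = chunk
        cursor += size
    return bytes(out)
```

For the ruff entry, the regions would be (0, 14606336) and (14614528, 83968) with a real size of 14698496, so a correct extraction is 8192 bytes longer than the packed data and, unlike the naive output, has no 512-byte map prefix.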

tjgq added a commit to tjgq/bazel that referenced this issue Nov 8, 2023
By upgrading to the latest version of the Apache Commons compress library.

Fixes bazelbuild#20090.
alexeagle (Contributor) commented

Awesome, thank you @tjgq ! I was struggling to diagnose this since it seemed very unlikely that the "bug is in the compiler". Really appreciate your fast turnaround on a fix.

tjgq added a commit to tjgq/bazel that referenced this issue Nov 8, 2023
By upgrading the Apache Commons Compress library to 1.20. I'm
deliberately not upgrading to the most recent one (1.24.0)
because it would require an additional JDK module (java.desktop)
and significantly regress the binary size.

Fixes bazelbuild#20090.
keertk pushed a commit that referenced this issue Nov 9, 2023
By upgrading the Apache Commons Compress library to 1.20. I'm
deliberately not upgrading to the most recent one (1.24.0)
because it would require an additional JDK module (java.desktop)
and significantly regress the binary size.

Fixes #20090.

Closes #20110.

PiperOrigin-RevId: 580935354
Change-Id: I6c9728ac3fd925432f44a55efaef8f5b52d428c0
keertk added a commit that referenced this issue Nov 9, 2023
By upgrading the Apache Commons Compress library to 1.20. I'm
deliberately not upgrading to the most recent one (1.24.0) because it
would require an additional JDK module (java.desktop) and significantly
regress the binary size.

Fixes #20090.

Closes #20110.

Commit 93729f4

PiperOrigin-RevId: 580935354
Change-Id: I6c9728ac3fd925432f44a55efaef8f5b52d428c0

---------

Co-authored-by: Tiago Quelhas <tjgq@google.com>
honwen (Author) commented Nov 10, 2023

Awesome

FrancoisPoinsot commented Nov 20, 2023

I am running into the same issue.
I tested an http_archive rule similar to the one originally posted, with the Bazel version set to 7.0.0rc4, which should contain the fix above.
I now get a new error:

	download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error extracting /private/var/tmp/_bazel_francois/6d431d294d8a3653c13ebe30b5a7858d/external/ruff-bin_darwin_amd64/temp11509240615786190344/ruff-x86_64-apple-darwin.tar.gz to /private/var/tmp/_bazel_francois/6d431d294d8a3653c13ebe30b5a7858d/external/ruff-bin_darwin_amd64/temp11509240615786190344: Truncated TAR archive

Same result with the last three versions of ruff.

meteorcloudy (Member) commented

@FrancoisPoinsot Can you please file a new issue for this? We'll look into this asap.

meteorcloudy (Member) commented

I can reproduce this locally, filed #20269

Wyverald pushed a commit that referenced this issue Dec 13, 2023
By upgrading the Apache Commons Compress library to 1.20. I'm
deliberately not upgrading to the most recent one (1.24.0)
because it would require an additional JDK module (java.desktop)
and significantly regress the binary size.

Fixes #20090.

Closes #20110.

PiperOrigin-RevId: 580935354
Change-Id: I6c9728ac3fd925432f44a55efaef8f5b52d428c0
iancha1992 pushed a commit that referenced this issue Dec 14, 2023
By upgrading the Apache Commons Compress library to 1.20. I'm
deliberately not upgrading to the most recent one (1.24.0) because it
would require an additional JDK module (java.desktop) and significantly
regress the binary size.

Fixes #20090.

Closes #20110.

PiperOrigin-RevId: 580935354
Change-Id: I6c9728ac3fd925432f44a55efaef8f5b52d428c0

Co-authored-by: Tiago Quelhas <tjgq@google.com>
honwen (Author) commented Jan 15, 2024

@iancha1992

6.5.0rc1 does not fix this:

$ bazel version
Bazelisk version: v1.19.0
Build label: 6.5.0rc1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jan 12 16:16:29 2024 (1705076189)
Build timestamp: 1705076189
Build timestamp as int: 1705076189
$ bazel build @ruff_macos//:ruff

INFO: Repository ruff_macos instantiated at:
  /dummy/WORKSPACE:7:13: in <toplevel>
Repository rule http_archive defined at:
  /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/bazel_tools/tools/build_defs/repo/http.bzl:372:31: in <toplevel>
INFO: repository @ruff_macos' used the following cache hits instead of downloading the corresponding file.
 * Hash '27a2800606b417d8f3102354efd0c09b8e08c94f754aaea9809a74a76e7456da' for https://github.com/astral-sh/ruff/releases/download/v0.1.4/ruff-x86_64-apple-darwin.tar.gz
If the definition of 'repository @ruff_macos' was updated, verify that the hashes were also updated.
ERROR: An error occurred during the fetch of repository 'ruff_macos':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error extracting /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188/ruff-x86_64-apple-darwin.tar.gz to /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188: Truncated TAR archive
ERROR: /codematrix/release/WORKSPACE:7:13: fetching http_archive rule //external:ruff_macos: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error extracting /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188/ruff-x86_64-apple-darwin.tar.gz to /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188: Truncated TAR archive
ERROR: java.io.IOException: Error extracting /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188/ruff-x86_64-apple-darwin.tar.gz to /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos/temp11281371888047318188: Truncated TAR archive
INFO: Elapsed time: 0.080s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    Fetching /root/.cache/bazel/_bazel_root/055f5651039f56d0c617210297f70629/external/ruff_macos; Extracting ruff-x86_64-apple-darwin.tar.gz

alexeagle (Contributor) commented

The associated bug isn't fixed, so this should probably be reopened.
Note that rules_lint has a workaround for this (it just calls the system tar on macOS).
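The idea behind that workaround, delegating extraction to the host's tar binary instead of Bazel's built-in extractor, can be sketched in Python as follows. This is not rules_lint's actual code (which lives in Starlark); `extract_with_system_tar` is a hypothetical helper, and it assumes a tar on PATH that accepts `-xzf` and `-C` (GNU tar and bsdtar both do).

```python
import shutil
import subprocess


def extract_with_system_tar(archive_path: str, dest_dir: str) -> None:
    """Extract a .tar.gz with the host's tar binary instead of an embedded library."""
    tar = shutil.which("tar")
    if tar is None:
        raise RuntimeError("no tar executable found on PATH")
    # -x extract, -z gunzip, -f archive file, -C change into dest_dir first.
    subprocess.run([tar, "-xzf", archive_path, "-C", dest_dir], check=True)
```

The trade-off is a dependency on whatever tar the host provides, in exchange for its (generally mature) handling of sparse entries.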

tjgq (Contributor) commented Jan 16, 2024

We might want to try upgrading to Apache Commons Compress 1.25 instead of 1.20 (I chose 1.20 for binary size reasons; I no longer recall if I actually tried 1.25 first, sorry) and file a bug against them with the offending file if it doesn't work.

Wyverald (Member) commented

1.25 doesn't work either. I have an open issue against them at https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-654

> The associated bug isn't fixed, so this should probably be reopened.

Let's not reopen this one. This issue is about sparse archives not being supported at all, and was fixed by upgrading Apache Commons Compress. #20269 is about Apache Commons Compress running into trouble with certain valid sparse archives.
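To check whether a given archive is affected at all, that is, whether it contains sparse entries, one option is to list them with Python's `tarfile`. This relies on `TarInfo.issparse()`, which is present in CPython's `tarfile` module but little documented; `sparse_members` is a hypothetical helper.

```python
import tarfile


def sparse_members(archive_path: str) -> list[str]:
    """Return the names of members that tarfile parsed as GNU sparse files."""
    with tarfile.open(archive_path, mode="r:*") as tar:
        # issparse() is True when tarfile found a sparse map for the member.
        return [member.name for member in tar.getmembers() if member.issparse()]
```

An empty list suggests the sparse-file code paths discussed here are not involved; a non-empty one (presumably `["ruff"]` for the ruff archive) means the extractor must support GNU sparse entries.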
