unstable/unstable-small boot tests blocked due to alsa-firmware failing nix verify #132286

Closed
JJJollyjim opened this issue Aug 1, 2021 · 18 comments
Labels: 0.kind: bug (Something is broken), 1.severity: channel blocker (Blocks a channel)

@JJJollyjim
Member

Describe the bug

nix verify fails in the boot tests, like so:

machine # path '/nix/store/7d37hb96a4m0z7kx0crh7vqyi24r1iwz-alsa-firmware-1.2.1' was modified! expected hash 'sha256:04051aw1fpag65s5nrvaj8bf55x8nzckq6610lz0vfxy7sh9cp7n', got 'sha256:154zsxs2d2zxqgr0mia3yj26m5szwygab3z122ws9ydzq9ilglkw'
machine: output: 
error: 
Traceback (most recent call last):
  File "/nix/store/mb9lhv3n6h20ybbpr2j8xdhbv5jgijwb-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 943, in run_tests
    exec(tests, globals())
  File "<string>", line 1, in <module>
  File "<string>", line 10, in <module>
  File "/nix/store/mb9lhv3n6h20ybbpr2j8xdhbv5jgijwb-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 483, in succeed
    raise Exception(
Exception: command `nix verify -r --no-trust /run/current-system` failed (exit code 1)
cleaning up
(0.00 seconds)

See e.g. the history here: https://hydra.nixos.org/job/nixos/unstable-small/nixos.tests.boot.biosCdrom.x86_64-linux/all

Steps To Reproduce

On master, nix-build nixos/release.nix -A tests.boot.biosCdrom.x86_64-linux. This failed for me on the first try:

machine # path '/nix/store/fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1' was modified! expected hash 'sha256:04051aw1fpag65s5nrvaj8bf55x8nzckq6610lz0vfxy7sh9cp7n', got 'sha256:154zsxs2d2zxqgr0mia3yj26m5szwygab3z122ws9ydzq9ilglkw'

Interestingly, that path is fine on my host machine:

$ nix verify -r --no-trust /nix/store/fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1
$ echo $?
0
$
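
For anyone comparing several failing runs, here is a minimal sketch that pulls the store path and the two hashes out of the "was modified!" lines. It assumes the message format shown in the logs above; the function name `parse_modified` is made up for illustration and is not part of Nix or the test driver.

```python
import re

# Matches the "was modified!" message as it appears in the test log above.
MODIFIED_RE = re.compile(
    r"path '(?P<path>/nix/store/[^']+)' was modified! "
    r"expected hash '(?P<expected>[^']+)', got '(?P<got>[^']+)'"
)


def parse_modified(log):
    """Return a list of (path, expected_hash, got_hash) tuples found in the log text."""
    return [(m["path"], m["expected"], m["got"]) for m in MODIFIED_RE.finditer(log)]
```

Running it over the two logs in this report shows the same expected/got hash pair under both store paths.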

Expected behavior

The boot tests succeed, and channels are unblocked.

Notify maintainers

Unclear who to ping here, since I'm not sure if this is a nix daemon bug, a corruption issue on cache.nixos.org, or an alsa-firmware problem...

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute: alsa-firmware
# a list of nixos modules affected by the problem
module:
JJJollyjim added the 0.kind: bug and 1.severity: channel blocker labels Aug 1, 2021
@FRidh
Member

FRidh commented Aug 1, 2021

@roberth I think you referred to a related Nix issue earlier when I asked, but I can't recall what it was.

@vcunat
Member

vcunat commented Aug 1, 2021

It was this PR: #123943 (but not in the sense that it fixes this issue)

@vcunat
Member

vcunat commented Aug 1, 2021

It would be nice to understand why the modification happens.

I tried a plain bisection. The problem reproduces reliably on my machine, and I arrived at c5114b3 (I retried multiple times on that commit and its parent). But that result doesn't really make sense, and reverting that merge on master does not fix it for me anyway.

@JJJollyjim
Member Author

I grabbed the nar out of a running test VM with nix-store --dump /nix/store/fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1.

Here's the diff-of-xxds:
vimdiff <(xxd fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1.nar-from-vm) <(curl https://cache.nixos.org/nar/1qsqiayqgg7y3pvzrsg1mkxbq2y33mbbwv6wc315b434l9vh002i.nar.xz | unxz - | xxd)

[image: screenshot of the vimdiff output]

@JJJollyjim
Member Author

Checking edolstra's PhD thesis to find out what this actually means :)

@vcunat
Member

vcunat commented Aug 1, 2021

Maybe it would be easier to somehow "unpack" it into a directory and compare those.

@JJJollyjim
Member Author

JJJollyjim commented Aug 1, 2021

It's the sizes of the files /nix/store/fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1/share/alsa/firmware/usx2yloader/us224.prepad and /nix/store/fyx0rpdshfb9np7sb2dqr50rm72xzwkh-alsa-firmware-1.2.1/share/alsa/firmware/usx2yloader/us428.prepad.

In the cache version, they're 70 and 72 bytes respectively; in the VM version, they're 71 bytes each.
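
A minimal sketch of the "unpack and compare" approach suggested earlier, assuming both versions of the store path have already been unpacked into two directories. The helper names `file_sizes` and `size_differences` are hypothetical, and only file sizes are compared, not contents.

```python
import os


def file_sizes(root):
    """Map each regular file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are skipped; only regular files are compared
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def size_differences(root_a, root_b):
    """Return {relative_path: (size_in_a, size_in_b)} for files whose sizes differ.

    A file missing on one side is reported with None for that side.
    """
    a, b = file_sizes(root_a), file_sizes(root_b)
    return {
        rel: (a.get(rel), b.get(rel))
        for rel in sorted(set(a) | set(b))
        if a.get(rel) != b.get(rel)
    }
```

Equal sizes do not prove equal contents, so this is only a quick first pass before hashing or diffing individual files.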

@JJJollyjim
Member Author

JJJollyjim commented Aug 1, 2021

If I'm understanding correctly, this is a simple non-reproducible build, and due to some messy nix behavior it manifests as a failed verify, since the narinfos come from two places with two different ideas of what the final output hash should be?

I don't believe enforcing true reproducibility as a channel blocker is the intention of this test, right?

@vcunat
Member

vcunat commented Aug 1, 2021

I had checked that first: on my machine the nix-build -QA alsa-firmware --check always succeeds (i.e. reproduces the hash from cache.nixos.org). Of course, in the VM it might be different, but so far I suspect the problem is somewhere else.

@JJJollyjim
Member Author

Ah, of course: the VM isn't actually doing the build; it's presumably getting the path from the same place as the host. Which means something spookier is going on...

@vcunat
Member

vcunat commented Aug 1, 2021

Maybe it's some bug related to those files containing just zero bytes.
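
To test that hypothesis on other store paths, a rough sketch that lists files made up entirely of zero bytes under a directory. The function names are hypothetical, and empty files are deliberately not counted as all-zero.

```python
import os


def is_all_zero(path, chunk_size=65536):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            seen_data = True
            # Stripping NUL bytes leaves something only if a non-zero byte exists.
            if chunk.strip(b"\0"):
                return False
    return seen_data


def all_zero_files(root):
    """Return sorted relative paths of regular files under root that contain only zero bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full) and is_all_zero(full):
                hits.append(os.path.relpath(full, root))
    return sorted(hits)
```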

@JJJollyjim
Member Author

Ah yeah, looking at the alsa-firmware sources, these files do seem to just be static files, so a truly non-reproducible build seems unlikely. The correct sizes are:

.rw-r--r-- 71 jamie 14 Nov  2019 us122.prepad
.rw-r--r-- 70 jamie 14 Nov  2019 us224.prepad
.rw-r--r-- 72 jamie 14 Nov  2019 us428.prepad

So cache.nixos.org is correct and the VM is incorrect. I assume the VM is getting it from the ISO file it's booting? Will check that next.

Cosmic rays? :3

@JJJollyjim
Member Author

Yep, I have mounted the squashfs from the ISO and it is indeed wrong:

-r--r--r-- 1 root root 71 Jan  1  1970 share/alsa/firmware/usx2yloader/us122.prepad
-r--r--r-- 1 root root 71 Jan  1  1970 share/alsa/firmware/usx2yloader/us224.prepad
-r--r--r-- 1 root root 71 Jan  1  1970 share/alsa/firmware/usx2yloader/us428.prepad
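
The same check can be scripted against a mounted image. This is a sketch only: the expected sizes are the ones from the upstream source listing above, and `wrong_sizes` is a hypothetical helper that takes the path to the usx2yloader directory.

```python
import os

# Sizes in bytes, taken from the listing of the upstream sources above.
EXPECTED_SIZES = {"us122.prepad": 71, "us224.prepad": 70, "us428.prepad": 72}


def wrong_sizes(firmware_dir, expected=EXPECTED_SIZES):
    """Return {name: (expected_size, actual_size)} for files whose size differs.

    actual_size is None when the file does not exist.
    """
    bad = {}
    for name, want in expected.items():
        path = os.path.join(firmware_dir, name)
        have = os.path.getsize(path) if os.path.exists(path) else None
        if have != want:
            bad[name] = (want, have)
    return bad
```

Against the squashfs contents listed above, this would flag us224.prepad and us428.prepad, since all three files there are 71 bytes.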

@vcunat
Member

vcunat commented Aug 1, 2021

I suspect plougher/squashfs-tools@19b161c. I'll verify.

@ncfavier
Member

ncfavier commented Aug 1, 2021

That commit doesn't seem to be part of squashfs-tools 4.5 though.

@JJJollyjim
Member Author

JJJollyjim commented Aug 1, 2021

No, the bug was introduced in 4.5, and the fix is not yet part of a release, see the top of the readme:

NEWS
----

2021-07-25 Important bug found in release.

A new point release will be forthcomming in the
next couple of days.  Sooner if no other release
bugs are reported.

@ncfavier
Member

ncfavier commented Aug 1, 2021

Ah, thanks

vcunat closed this as completed in bc3416a Aug 1, 2021
@vcunat
Member

vcunat commented Aug 1, 2021

Yes, that patch fixed the bug. At least for me.

I assume it was elusive because sparsity of files isn't perfectly reproducible (especially when you have /nix/store on a different FS, or maybe the order of files mattered).

JJJollyjim added a commit to JJJollyjim/nixpkgs that referenced this issue Aug 1, 2021
This is the same test which blocks nixos-unstable-small. It recently
caused a long blockage, due to a regression in squashfsTools itself
corrupting the iso image, see NixOS#132286.