bootstrapFiles should be updated periodically to avoid arcane bugs in stale binaries #253713

Open
trofi opened this issue Sep 6, 2023 · 38 comments
Labels
0.kind: bug 6.topic: bootstrap Bootstrapping, avoiding pre-built binaries. Often overlaps with cross-compilation.

Comments

@trofi
Contributor

trofi commented Sep 6, 2023

Current bootstrapFiles in pkgs/stdenv/linux/bootstrap-files are not that current. Some of them come from 2019, some are more recent.

This causes various issues when newer packages (usually toolchains like gcc) fail to build against old tools.

A few recent examples:

Suggestion: update bootstrapFiles as part of the NixOS release process, at least for the platforms supported by Hydra. That way we would get a guaranteed 6-month update cadence.

@trofi
Contributor Author

trofi commented Sep 7, 2023

Another example where a bootstrapTools update is worth it: the current ones are missing the files needed to produce PIE and static-PIE binaries:

@trofi trofi changed the title bootstrapFiles should be updated periorically to avoid arcane bugs in stalie binaries bootstrapFiles should be updated periorically to avoid arcane bugs in stale binaries Sep 7, 2023
@FliegendeWurst FliegendeWurst added the 6.topic: bootstrap Bootstrapping, avoiding pre-built binaries. Often overlaps with cross-compilation. label Sep 7, 2023
@trofi
Contributor Author

trofi commented Sep 7, 2023

Yet another example is the future transition to 64-bit time_t, where we would need to rebuild bootstrapFiles a few more times as individual upstream packages switch to 64-bit time_t internally:

@trofi trofi changed the title bootstrapFiles should be updated periorically to avoid arcane bugs in stale binaries bootstrapFiles should be updated periodically to avoid arcane bugs in stale binaries Sep 8, 2023
@Artturin
Member

Artturin commented Sep 9, 2023

@lovesegfault

I have created this script for you to hopefully help you update the bootstrap files.

#!/usr/bin/env bash

# Before running, git checkout the latest eval revision from https://hydra.nixos.org/jobset/nixpkgs/cross-trunk
# hydra-check only checks the newest evaluation

crossSystems=()
mapfile -t crossSystems < <(nix eval --impure --expr 'let rel-cross = (import ./pkgs/top-level/release-cross.nix {}).bootstrapTools; in builtins.attrNames rel-cross' --json | jq '.[]' --raw-output)

whatToGet=(bootstrapFiles.bootstrapTools bootstrapFiles.busybox)

for system in "${crossSystems[@]}"; do
    echo ""
    echo "$system"
    for what in "${whatToGet[@]}"; do
        if nix run "github:nix-community/hydra-check" -- --arch "" --jobset "nixpkgs/cross-trunk" "bootstrapTools.${system}.${what}" 1>/dev/null; then
            echo -n "${what}:"
            output=$(nix build --quiet --no-link --json -f pkgs/top-level/release-cross.nix "bootstrapTools.${system}.${what}" | jq '.[] | .outputs | .out' --raw-output)
            sha256sum "$output"
        else
            echo "${what} for ${system} did not build in this evaluation and may have to be built manually"
            echo "\$ nix build --quiet --no-link --json -f pkgs/top-level/release-cross.nix \"bootstrapTools.${system}.${what}\" | jq '.[] | .outputs | .out' --raw-output"
            echo ""
        fi
    done
done

In #151399 (comment) you asked for the sha256sum of all the on-server components. However, shouldn't it be okay to directly upload the packages from bootstrapFiles?

bootstrapFiles = {
# Make them their own store paths to test that busybox still works when the binary is named /nix/store/HASH-busybox
busybox = runCommand "busybox" {} "cp ${build}/on-server/busybox $out";
bootstrapTools = runCommand "bootstrap-tools.tar.xz" {} "cp ${build}/on-server/bootstrap-tools.tar.xz $out";

Please join https://matrix.to/#/#stdenv:nixos.org so we can figure out a way to make the process more convenient for you.

@Ma27
Member

Ma27 commented Sep 11, 2023

cc @NixOS/nixos-release-managers and @vcunat

@vcunat
Member

vcunat commented Sep 12, 2023

For the record, I believe it is practical for us to regenerate bootstrap tools once in a while, at least for the better supported platforms. Though twice a year might be an excessive rate IMO; every other year would probably still be OK.

@vcunat
Member

vcunat commented Sep 12, 2023

I know that some people in the community would prefer to go the other way and reduce the binaries as much as possible. (quick example)

However as it is now, making the bootstrap sequence much longer would make it very painful to change some parts like stdenv's setup.sh, even if the early bootstrap phases don't need them. If people are interested in that, I think we'd need to do those steps "separately"... basically like an independent method of producing good-enough tools that would then be plugged into our current bootstrapping instead of the binaries.

@Artturin
Member

I know that some people in the community would prefer to go the other way and reduce the binaries as much as possible. (quick example)

However as it is now, making the bootstrap sequence much longer would make it very painful to change some parts like stdenv's setup.sh, even if the early bootstrap phases don't need them. If people are interested in that, I think we'd need to do those steps "separately"... basically like an independent method of producing good-enough tools that would then be plugged into our current bootstrapping instead of the binaries.

It'll eventually be done once minimal-bootstrap is ready #227914

@vcunat
Member

vcunat commented Sep 12, 2023

Ah, thanks. I missed or forgot that one.

@trofi
Contributor Author

trofi commented Sep 15, 2023

Another casualty of stale binaries:

gcc-8 (used in bootstrapTools) passes response files to the linker as soon as there are response files in its inputs.
gcc-12 does not exhibit this behaviour. Normally that would not be an issue if expand-response-params were always available, but it is not available in early bootstrap.

This makes it harder to use response files unconditionally for things like #255192
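
For readers unfamiliar with response files: a response file is a text file holding command-line arguments, passed to a tool as `@file`. Below is a minimal bash sketch of the kind of expansion a helper like expand-response-params performs. It is a deliberate simplification (one argument per line, no quoting, no nesting) and not the nixpkgs implementation; the function name `expand_response_args` is hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative only: replace each "@file" argument with the arguments
# listed in that file, one per line. Real response files also allow
# quoting and whitespace-separated arguments, which this sketch ignores.
expand_response_args() {
    local arg line
    for arg in "$@"; do
        if [[ $arg == @* && -f ${arg#@} ]]; then
            while IFS= read -r line || [[ -n $line ]]; do
                if [[ -n $line ]]; then
                    printf '%s\n' "$line"
                fi
            done < "${arg#@}"
        else
            printf '%s\n' "$arg"
        fi
    done
}

# Usage: write two linker flags to a response file and expand them.
rsp=$(mktemp)
printf '%s\n' '-lfoo' '-L/some/dir' > "$rsp"
expand_response_args -o out "@$rsp" main.o
rm -f "$rsp"
```

When no such helper exists yet, as in early bootstrap, any tool that receives `@file` has to understand it natively, which is why a compiler that starts emitting response files on its own is a problem.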

@lovesegfault
Member

Alright, taking this weekend to handle this, also I re-installed Element on my devices so that folks can reach me on Matrix.

Sorry for all the delays, work has been insane :)

@alyssais
Member

Oh, convenient that this is going on — I've just got Hydra to start building native bootstrap binaries for x86_64-unknown-linux-musl and aarch64-unknown-linux-musl (previously these were built, uh, not by Hydra), so if those could also be included in any bootstrap update, that'd be great! The process should be exactly the same as for all the other platforms with native Hydra builds.

@alyssais
Member

Do we still have time to get this done for 23.11?

@RaitoBezarius
Member

It's unclear to me what should be done to unblock this, are we waiting on OP or a merge?

@alyssais
Member

AIUI it's a matter of somebody with the permissions to upload to tarballs.nixos.org having the time and knowledge to do that. Usually that's @lovesegfault — I don't know if there's anybody else who meets those requirements.

@trofi
Contributor Author

trofi commented Oct 21, 2023

I think we need a process to be:

  • documented
  • automated with a script
  • executed periodically

@RaitoBezarius
Member

I think we need a process to be:

  • documented
  • automated with a script
  • executed periodically

Agreed. I am trying to understand whether we would like to get this cycle done, even manually, and who would be up to build this process and submit it to the infrastructure people so we can implement it, probably for 24.05.

@Artturin
Member

Artturin commented Oct 22, 2023

One of the native bootstrap-files is failing to build
#258032
Would fix it but I don't know if it's the correct way
(A cross job should be added for it, but it'll still fail)

One of the cross bootstrap-files is failing to build

nix build --impure --expr 'with import ./pkgs/top-level/release-cross.nix {supportedSystems = [builtins.currentSystem];}; builtins.mapAttrs (k: v: v.bootstrapFiles.bootstrapTools) bootstrapTools'

Fails at the check phase of diffutils-x86_64-unknown-linux-musl

@trofi

This comment was marked as off-topic.

@RaitoBezarius

This comment was marked as off-topic.

@ghost

This comment was marked as off-topic.

@trofi

This comment was marked as off-topic.

@ghost

This comment was marked as off-topic.

@ghost

This comment was marked as off-topic.

@zimbatm

This comment was marked as abuse.

@trofi

This comment was marked as off-topic.

@ghost

ghost commented Oct 24, 2023

I see libgcc failures on ppc32 and alpha.

@trofi, it's not a regression if there is no test coverage for it. Please add a test if you care about these platforms. Nobody else has, so it stands to reason that nobody else does. You can be the first!

You've been dredging up a sequence of increasingly obscure platforms in order to argue for reverting a project I invested an extraordinary amount of time in. Two of these platforms are no longer available for sale and the third is a multi-million-dollar mainframe. I find it very hard to believe that you own an S390 or use a DEC Alpha on a regular basis.

Taking into account all of the above, you seem to be going out of your way to make this personal. Please don't do that.

@trofi
Contributor Author

trofi commented Oct 25, 2023

There is nothing personal about it. Debian occasionally asks me to debug bugs on ports they care about. I think it's reasonable to try and help them. I only pointed out that these platforms used to work and proposed a possible solution to make them work again.

But I can stop responding to any threads you are in on github (or anywhere else) if you think it's personal and it will make you feel better.

@vcunat
Member

vcunat commented Oct 25, 2023

I agree we shouldn't block progress of anything on regressions for very "exotic" platforms (e.g. alpha, s390*). Some basic guidance was agreed on in https://github.com/nixos/rfcs/blob/master/rfcs/0046-platform-support-tiers.md

@trofi
Contributor Author

trofi commented Oct 25, 2023

Absolutely. I would like to see at least a subset of bootstrapFiles to be updated on a regular basis. The to-be-written update script might need to be adapted to skip broken unsupported targets.
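
One way such a script could skip broken targets is to treat each target independently and report the failures at the end instead of aborting. A minimal sketch follows; `refresh_target` is a hypothetical stand-in for whatever per-target update step is eventually written, and here it only simulates a failure for targets named in `BROKEN_TARGETS`.

```bash
#!/usr/bin/env bash
# Hypothetical placeholder for the real per-target update step.
# It "fails" for any target named in the space-separated BROKEN_TARGETS.
refresh_target() {
    local target=$1
    case " ${BROKEN_TARGETS:-} " in
        *" $target "*) return 1 ;;
    esac
    echo "refreshed $target"
}

# Refresh every target, continuing past failures and listing them on stderr.
refresh_all() {
    local target
    local -a failed=()
    for target in "$@"; do
        if ! refresh_target "$target"; then
            failed+=("$target")
        fi
    done
    if (( ${#failed[@]} > 0 )); then
        echo "skipped (failed): ${failed[*]}" >&2
    fi
}

# Usage: the second target is marked broken and is skipped, not fatal.
BROKEN_TARGETS="alpha-unknown-linux-gnu"
refresh_all x86_64-unknown-linux-gnu alpha-unknown-linux-gnu
```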

@lovesegfault
Member

FWIW, even without a script, I am willing to take a whole day to update all bootstrap files on each release of NixOS for the foreseeable future.

If that's what folks want, I can propose a process that would allow me to do it in a single day. I don't think what we're lacking is automation, primarily, but coordination.

@alyssais
Member

What coordination is lacking? Isn't it a case of downloading the latest builds of stdenvBootstrapTools (and maybe the cross versions for non-Hydra architectures), uploading them to tarballs.nixos.org, and updating the URLs and hashes? What is it that you need from other people?

@lovesegfault
Member

Well, that's roughly it, but there are two important bits missing:

  1. I can't keep up with everything that's going on. Someone needs to prepare a "bootstrapFiles manifest" with everything that needs uploading and the associated info (see below for what that means)
  2. I can upload bootstrap tarballs, but I cannot remove them, so I am cautious when performing the upload as I cannot undo mistakes.

Here's the process I follow to upload a new tarball, @amjoseph-nixpkgs is probably the person most familiar with this, since I've uploaded a number of tarballs for them:

  1. Look at the Hydra build linked on GH, make sure nothing is off (built the
    right thing, on the right pkgs instance/channel/etc)
  2. Check the build logs on Hydra, make sure there are no glaring issues that did
    not end up failing the build
  3. Get the nix-store path, and sha256's from GH
  4. nix build -L $storePath
  5. Get the nixpkgs commit which generated the tarball
  6. Check it out, build it, make sure it's reproducible
  7. Figure out the correct subpath for this tarball
    1. x86_64-darwin -> stdenv-darwin/x86_64
  8. Check that the hashes within match the ones from GH
  9. Construct the upload path
    1. s3://nixpkgs-tarballs/$os/$arch/$nixpkgs_commit
    2. e.g. s3://nixpkgs-tarballs/stdenv-darwin/x86_64/05ef940b94fe76e7ac06ea45a625adc8e4be96f9
  10. Upload it
    1. e.g. aws s3 cp --recursive --acl public-read /nix/store/9h4d7s313wv3gkfwi493yr1wvdsz9lf2-stdenv-bootstrap-tools/on-server/ s3://nixpkgs-tarballs/stdenv-darwin/x86_64/05ef940b94fe76e7ac06ea45a625adc8e4be96f9
  11. Download the uploaded tarballs, check hashes from Hydra match the hashes from the local build and match the hashes that were downloaded.
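
Steps 7 to 11 are mechanical enough to sketch. The snippet below only builds the destination path and compares checksums; it prints the `aws s3 cp` command rather than running it. The function names are hypothetical, and the subpath mapping covers only the single example given in the list.

```bash
#!/usr/bin/env bash
# Step 7: map a Nix system to its tarball subpath. Only the mapping given
# in the list above is known here; anything else is rejected.
subpath_for() {
    case $1 in
        x86_64-darwin) echo "stdenv-darwin/x86_64" ;;
        *) echo "unknown system: $1" >&2; return 1 ;;
    esac
}

# Step 9: s3://nixpkgs-tarballs/$os/$arch/$nixpkgs_commit
upload_uri() {
    local system=$1 commit=$2 subpath
    subpath=$(subpath_for "$system") || return 1
    echo "s3://nixpkgs-tarballs/${subpath}/${commit}"
}

# Steps 8 and 11: succeed only if the file has the expected sha256 (hex).
check_sha256() {
    local file=$1 expected=$2 actual
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [[ $actual == "$expected" ]]
}

# Step 10 as a dry run: print the upload command instead of executing it.
print_upload() {
    local src=$1 uri
    uri=$(upload_uri "$2" "$3") || return 1
    echo "aws s3 cp --recursive --acl public-read ${src} ${uri}"
}

print_upload "/nix/store/...-stdenv-bootstrap-tools/on-server/" \
    x86_64-darwin 05ef940b94fe76e7ac06ea45a625adc8e4be96f9
```

Keeping the upload as a printed command fits the constraint that uploads cannot be undone: the command can be reviewed before anyone runs it.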

Here's an example of this process being followed: #188334

This is very manual, and is a lot of work for me.

I think the coordination I'd like to see is:

  1. A bootstrapTarballManifest file, which contains all the info above for all the tarballs we want to upload.
  2. A review process where, after the manifest is created for a given nixpkgs revision, folks share the burden of validating the tarballs (i.e. the work I do of looking through build logs, re-building the tarballs and verifying that they are reproducible, etc.)
  3. A tool I can use to ingest said manifest after it's been reviewed, and upload all the tarballs

I am happy to hack on (3), but I can't do the work of figuring out what (1) and (2) should look like, I think the stakeholders here should collaborate to define that.
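
To make (1) and (3) a little more concrete, here is one possible shape for such a manifest together with a dry-run ingester. Everything in it is an assumption offered for discussion: the tab-separated format, the column order, and the function name `ingest_manifest` are hypothetical, and the commit and hash values are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical manifest format, one file per line, tab-separated:
#   <target> <nixpkgs commit> <store path> <file name> <sha256>
# Lines starting with '#' are ignored.

# Dry-run ingester: validate each row and print the upload that would
# happen. Returns non-zero if any row is missing fields.
ingest_manifest() {
    local manifest=$1 status=0
    local target commit store_path name sha256
    while IFS=$'\t' read -r target commit store_path name sha256; do
        if [[ -z $target || $target == \#* ]]; then
            continue
        fi
        if [[ -z $sha256 ]]; then
            echo "malformed row for '$target'" >&2
            status=1
            continue
        fi
        echo "upload ${store_path}/on-server/${name} (sha256 ${sha256}) for ${target} at ${commit}"
    done < "$manifest"
    return "$status"
}

# Usage with placeholder values:
manifest=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
    i686-unknown-linux-gnu "<nixpkgs-commit>" \
    "/nix/store/...-stdenv-bootstrap-tools" busybox "<sha256>" > "$manifest"
ingest_manifest "$manifest"
rm -f "$manifest"
```

A plain-text manifest like this can be reviewed line by line in a PR, which is where the shared validation in (2) would happen.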

@lovesegfault
Member

A follow-on thought, one thing that deserves consideration is the structure of the files in s3, it's kind of nonsensical right now.

If the plan for the future is "we always update all the tarballs periodically", perhaps it should be:

bootstrap/<nixpkgs-commit>/<arch>/<contents>

So, for example, we'd have:

bootstrap/05ef940b94fe76e7ac06ea45a625adc8e4be96f9/x86_64-apple-darwin/<stuff>
bootstrap/05ef940b94fe76e7ac06ea45a625adc8e4be96f9/aarch64-apple-darwin/<stuff>
bootstrap/05ef940b94fe76e7ac06ea45a625adc8e4be96f9/x86_64-unknown-linux-gnu/<stuff>
...

(cc. @Ericson2314, @amjoseph-nixpkgs b/c of platform naming)
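
Under that layout the object key is a pure function of commit, platform, and file name, which a small helper can capture. This is a sketch only: `bootstrap_key` is a hypothetical name, and the layout is just the proposal from this comment.

```bash
#!/usr/bin/env bash
# Build an object key following the proposed layout:
#   bootstrap/<nixpkgs-commit>/<arch>/<contents>
bootstrap_key() {
    local commit=$1 platform=$2 file=$3
    # Require a full 40-character lowercase hex commit hash.
    if [[ ! $commit =~ ^[0-9a-f]{40}$ ]]; then
        echo "not a full commit hash: $commit" >&2
        return 1
    fi
    echo "bootstrap/${commit}/${platform}/${file}"
}

bootstrap_key 05ef940b94fe76e7ac06ea45a625adc8e4be96f9 \
    x86_64-unknown-linux-gnu bootstrap-tools.tar.xz
```

Rejecting abbreviated hashes keeps every upload for one nixpkgs revision under the same prefix.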

@sdht0
Contributor

sdht0 commented Nov 1, 2023

Any chance this is still happening for 23.11?

@trofi
Contributor Author

trofi commented Dec 7, 2023

Filed #272750 to provide a mechanism to override bootstrapFiles on user's side.

trofi added a commit to trofi/nixpkgs that referenced this issue Jan 26, 2024
This matches jobsets for cross-jobs. This way it will be a bit easier
to automatically extract `bootstrapTools` for mass updates in
NixOS#253713
@trofi
Contributor Author

trofi commented Jan 26, 2024

I wrote a tiny PoC to extract all the info from hydra release for a single target (I'll try to document one-target-at-a-time).

I found a few inconsistencies across cross/native jobsets. #284090 should fix one of them.

The PoC itself
cat pkgs/stdenv/linux/bootstrap-files/refresh-tarballs.bash
#!/usr/bin/env bash

# How it works:
#
# For a given <target>:
# 1. fetch latest successful `.dist` job that contains references to busybox and bootstrapFiles
# 2. fetch oldest evaluation that contained that path, extract nixpkgs commit
# 3. fetch the `.dist` artifacts
# 4. calculate hashes and craft the commit

usage() {
    echo "Usage: $0 [ --commit ] <target>" >&2
    echo "Examples:" >&2
    echo "    $0 --commit i686-unknown-linux-gnu" >&2
    echo "    $0 aarch64-unknown-linux-gnu" >&2
    exit 1
}

die() {
    echo "ERROR: $*" >&2
    exit 1
}

info() {
    echo "INFO: $*" >&2
}

[[ ${#@} -eq 0 ]] && usage

# NOTE: this PoC does not parse '--commit' yet; the first argument is
# taken as the target.
target=$1

# Native and cross jobsets differ a bit. We'll have to pick the concrete
# one based on target.
jobset=unknown-jobset
job=unknown-job
s3_prefix=unknown-s3-prefix

case $target in
    # native targets:
    i686-unknown-linux-gnu)
        jobset="nixpkgs/trunk"
        job="stdenvBootstrapTools.${target}.dist"
        s3_prefix="stdenv-linux/i686"
        ;;

    # cross targets:
    riscv64-unknown-linux-gnu)
        jobset="nixpkgs/cross-trunk"
        job="bootstrapTools.${target}.dist"
        s3_prefix="stdenv-linux/riscv64"
        ;;

    *)
        die "Unsupported '$target': please extend '$0' to support it"
        ;;
esac

latest_build_uri="https://hydra.nixos.org/job/$jobset/$job/latest-finished"
latest_build="$target.latest-build"
info "Fetching latest successful build from '${latest_build_uri}'"
curl -s -H "Content-Type: application/json" -L "$latest_build_uri" > "$latest_build" \
    || die "Failed to fetch latest successful build"

# We pick oldest instead of latest to make the result more stable
# across unrelated updates. Ideally two subsequent runs should produce
# the same output (provided there are no bootstrapTools updates
# committed between the two).
latest_build_id=$(jq '.id' < "$latest_build")
oldest_eval_id=$(jq '.jobsetevals|min' < "$latest_build")
# Link to the concrete build by its numeric id, not the 'latest-finished' URI.
build_uri="https://hydra.nixos.org/build/${latest_build_id}"

eval_uri="https://hydra.nixos.org/eval/${oldest_eval_id}"
eval_meta="$target.eval-meta"
info "Fetching oldest eval details from '${eval_uri}' (can take a minute)"
curl -s -H "Content-Type: application/json" -L "${eval_uri}" > "$eval_meta" \
    || die "Failed to fetch eval metadata"

nixpkgs_revision=$(jq --raw-output ".jobsetevalinputs.nixpkgs.revision" < "$eval_meta")

drvpath=$(jq --raw-output '.drvpath' < "${latest_build}")
outpath=$(jq --raw-output '.buildoutputs.out.path' < "${latest_build}")
build_timestamp=$(jq --raw-output '.timestamp' < "${latest_build}")
build_time=$(TZ=UTC LANG=C date --date="@${build_timestamp}" --rfc-email)

info "Fetching bootstrap tools manifest to calculate hashes from '${outpath}'"
bootstrap_manifest=$(nix-store --realize "${outpath}")/nix-support/hydra-build-products

# The file format is the following:
#    file tarball /nix/store/...-stdenv-bootstrap-tools/on-server/bootstrap-tools.tar.xz
#    file busybox /nix/store/...-stdenv-bootstrap-tools/on-server/busybox

tools_dir=unknown-tools-dir
bootstrap_tools_hash=unknown-bootstrap-tools-hash
busybox_hash=unknown-busybox-hash
while read -r type flavour path; do
    case "$type:$flavour" in
        file:tarball)
            tools_dir=$(dirname "$path")
            bootstrap_tools_hash=$(nix-hash --to-sri "sha256:$(nix-prefetch-url --name bootstrap-tools.tar.xz "file://$path")")
            ;;
        file:busybox)
            busybox_hash=$(nix-hash --to-sri "sha256:$(nix-prefetch-url --executable --name busybox "file://$path")")
            ;;
        *)
            die "'$type:$flavour' is unhandled file type"
            ;;
    esac
done < "$bootstrap_manifest"

# Calculate hashes:
#   $ nix-hash --to-sri sha256:...

target_file="${target}.nix"
info "Writing '${target_file}'"
cat > "${target_file}" <<EOF
# Autogenerated by pkgs/stdenv/linux/bootstrap-files/refresh-tarballs.bash as:
# $ ./refresh-tarballs.bash ${target}
#
# Metadata:
# - latest hydra build: ${latest_build_uri}
# - resolved hydra build: ${build_uri}
# - build time: ${build_time}
# - nixpkgs revision: ${nixpkgs_revision}
# - manifest derivation: ${drvpath}
# - manifest output: ${outpath}
# - tools directory: ${tools_dir}
{
  busybox = import <nix/fetchurl.nix> {
    url = "http://tarballs.nixos.org/${s3_prefix}/${nixpkgs_revision}/busybox";
    hash = "${busybox_hash}";
    executable = true;
  };

  bootstrapTools = import <nix/fetchurl.nix> {
    url = "http://tarballs.nixos.org/${s3_prefix}/${nixpkgs_revision}/bootstrap-tools.tar.xz";
    hash = "${bootstrap_tools_hash}";
  };
}
EOF

I'll send the PoC itself for review if it's a reasonable direction, and will update at least the i686 and musl targets to check the procedure with @lovesegfault.

trofi added a commit to trofi/nixpkgs that referenced this issue Jan 28, 2024
…date tarballs

This script attempts to document the exact procedure used to upload
bootstrap binaries used previously. I modeled it after most recent
NixOS#282517 upload.

There is one deviation from it to make it easier to handle mass updates
for NixOS#253713:

The binaries are expected to be stored in `stdenv/$target` (and not
something like `stdenv-linux/i686`).

The script handles both native and cross- linux targets. `darwin` will
need a bit more work to fit into this scheme, but it should be easy.

Example run to generate `i686-linux` update:

    $ maintainers/scripts/bootstrap-files/refresh-tarballs.bash --commit --targets=i686-unknown-linux-gnu
@trofi
Contributor Author

trofi commented Jan 28, 2024

Proposed a script that automatically generates bootstrap script update as:

Example PR for i686-unknown-linux-gnu:

trofi added a commit to trofi/nixpkgs that referenced this issue Jan 28, 2024
…es/ directory

The change moves definition of bootstrap files slightly closer to
`linux` structure to eventually allow those to update in bulk:
NixOS#253713
@trofi
Contributor Author

trofi commented Jan 28, 2024

Proposed a small unification on darwin side to get closer to linux WRT seed file location:
