
[RFC 0087] Promote aarch64-linux to Tier 1 support #87

Closed
wants to merge 16 commits into from


vikanezrimaya
Member

@vikanezrimaya vikanezrimaya commented Mar 9, 2021

Suggested in nixos-org-configurations#142.

Has aarch64-linux gathered enough attention to be promoted to a supported architecture? Perhaps it should now block channels: many users are requesting a channel with aarch64 builds, and support keeps improving. The open question is whether aarch64-linux is stable enough, and its build times short enough, not to become dead weight on x86_64-linux-based channels.

Rendered: https://github.com/kisik21/rfcs/blob/patch-1/rfcs/0087-aarch64-tier1.md

Accepting this RFC will make NixOS/infra#142 and NixOS/nixpkgs#83049 obsolete.

@vikanezrimaya vikanezrimaya changed the title Create 0087-aarch64-tier1.md [RFC 0087] Promote aarch64-linux to Tier 1 support Mar 9, 2021
@vikanezrimaya vikanezrimaya marked this pull request as draft March 9, 2021 18:14
Thanks @grahamc

Co-authored-by: Graham Christensen <graham@grahamc.com>
rfcs/0087-aarch64-tier1.md Outdated
rfcs/0087-aarch64-tier1.md Outdated
rfcs/0087-aarch64-tier1.md Outdated
vikanezrimaya and others added 4 commits March 9, 2021 19:01
It's mostly perceived coverage due to channel bumps waiting for stuff as @samueldr noted
Co-authored-by: Graham Christensen <graham@grahamc.com>
Thank you for helping me with this RFC ✨
Co-authored-by: Graham Christensen <graham@grahamc.com>
to block channel advances in case of failures.

Merging this RFC should happen simultaneously with the merging of documentation and perhaps
a NixOS module for configuring qemu-binfmt as an aarch64 builder on x86_64 machines.
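For reference, a minimal sketch of what such a setup could look like on an x86_64-linux NixOS machine, assuming the documentation would point to NixOS's existing `boot.binfmt.emulatedSystems` option rather than a new module. This is illustrative only: as discussed below, builds under emulation have shown sporadic failures, so it is a convenience for local builds, not an approved way of testing aarch64-linux.

```nix
# configuration.nix on an x86_64-linux host (sketch; relies on the
# existing boot.binfmt NixOS module, which registers qemu-user through
# the kernel's binfmt_misc mechanism)
{ ... }:
{
  # Allow this machine to execute aarch64-linux binaries, and therefore
  # run aarch64-linux builds, through qemu-user emulation.
  boot.binfmt.emulatedSystems = [ "aarch64-linux" ];
}
```

After a rebuild, a package can then be built for aarch64-linux locally with, for example, `nix-build '<nixpkgs>' -A hello --argstr system aarch64-linux`.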
Member

At one time in the past, building with qemu-user+binfmt was iffy. It might have been specific to armv6 though. I personally wouldn't consider suggesting qemu-user+binfmt until it has been shown that a full rebuild without nixos.org cache works for aarch64.

Member Author

Sometimes it is iffy for me on aarch64 or armv7l IIRC. I remember not being able to build some packages with emulation, having to resort to natively building them. Sadly I don't remember which ones 😢

Member Author

I can add a note about setting up some sort of sub-project to track down and debug these issues, but I'm afraid this might be out-of-scope for this RFC. Should I?

Member

Maybe a recommendation that is unrelated to the RFC?

Relatedly, I would prefer that the wording doesn't suggest qemu-user+binfmt as an approved method of testing aarch64-linux. But I don't know how to phrase any of this :/.

Member Author

Ugh, I'm also unsure about the precise phrasing, but I'll probably need to change the wording somehow to reduce the prominence of the binfmt bit in the RFC manuscript. Let me try something...

Member

No worries, I don't know either, let's hear what other people think about how to approach suggesting-but-not-ratifying-qemu+binfmt.

Member Author

@vikanezrimaya vikanezrimaya Mar 9, 2021

I added a note in Future work suggesting setting up an effort to track down emulation errors: 6c46f9e

@samueldr
Member

samueldr commented Mar 9, 2021

Note that we have a strong precedent in favor of this:

In 2017:
NixOS/nixpkgs@74c4e30

In 2018:
NixOS/nixpkgs#52534
NixOS/nixpkgs@36a0c13

This was later changed to limited support due to technical issues with the Hydra evaluator.

NixOS/nixpkgs@1bfe8f1

Limited support is our current status on the situation.

Note that the Hydra evaluator has since been rewritten. It is extremely likely that it will now evaluate fine as a supported system.

Build failures have been sighted when using emulation; these need to be tracked down (but that's out of scope for this RFC).
@vikanezrimaya
Member Author

@samueldr I added the prior art to the Motivation section. Please tell me if it's more appropriate elsewhere.

# Alternatives
[alternatives]: #alternatives

Create an aarch64-focused channel that would build the same things the current `unstable` channel does, but for aarch64 only. This has a significant drawback: the x86_64 and aarch64 channels might never both pass on the same commit, making deployment to a heterogeneous cluster of x86_64 and aarch64 machines very challenging.
Member

I don't find this argument very persuasive. Track master instead and let your own CI perform tests that are relevant for you. The amount of rebuilding you get on master will typically be small. Same goes for stable.

Member Author

Where would one run this CI? Not everyone has a Raspberry Pi, let alone a whole cluster connected to SSDs for fast builds. I don't build on my Raspberry Pi for this reason: builds there are slower than building on my laptop via emulation, and Hydra would simply be a lot faster than any laptop in existence. Raspberry Pi users who run just one or two boards are part of the target audience of this RFC.

Member

Asking people to run their own CI for what is supposedly still a Linux distribution for users is not a good argument. Many of us who are involved in the project do that for our personal stuff. I started the discussion about having the channel because someone who had just started out with NixOS (on a RPi) was having issues with lots and lots of rebuilds on their little machine.
Making adoption easier is one of the goals we should be aiming for. Asking for huge leaps (no FP background to full custom Nix CI...) is not a good way.

rfcs/0087-aarch64-tier1.md Outdated
rfcs/0087-aarch64-tier1.md Outdated

# Alternatives
[alternatives]: #alternatives

Member

I am of the opinion that a separate channel should be the first step. Show that there are enough resources and commitment to keep it green for half a year to a year. If that is the case, we can upgrade it to Tier 1. I don't think there is generally enough understanding of how much day-to-day effort goes into actually keeping the channels green. At the same time, I have no idea how little breakage there is with aarch64 nowadays.

Member Author

I added your suggestion to the manuscript.

Member

The channel has already blocked on aarch64-linux for the limited support set since late 2018. So I guess we're ready to upgrade it to Tier 1, since things have been kept green for half a year to a year.

It's all about having the jobs attempted, not about adding new blockers.

Member

It might be worth specifying which channel we are talking about here. If you mean the unstable channel: it mostly works. If you mean the stable channel: it had been blocked for a few days when I opened the channel PR, and probably nobody had noticed yet.

A separate channel could let us gauge how much work we'll need to keep the channel green.

Co-authored-by: Frederik Rietdijk <fridh@fridh.nl>
rfcs/0087-aarch64-tier1.md Outdated
Added a note that we didn't fully disable builds, just demoted the architecture to partial support.

Co-authored-by: Samuel Dionne-Riel @samueldr
I think the note in Detailed design about getting AWS instances in case we need burst capacity addresses the question of having enough CPU power.
@vikanezrimaya vikanezrimaya marked this pull request as ready for review March 10, 2021 11:45
@lheckemann lheckemann added status: open for nominations Open for shepherding team nominations and removed status: new labels Apr 1, 2021
@vikanezrimaya
Member Author

@dhess The "advance notice" is part of the amended manuscript I'm working on right now. I'm planning to add a proposal to ping everyone on the PR that would implement the RFC, but we could also bring the people in charge on board right here in this pull request by pinging them, if that is acceptable. (My first thought was to ping our maintainer team, but then I realized it has 1630 people in it and quickly scrapped that idea; targeted pings to the maintainers of stdenv components and of packages usually included in a core NixOS install could be more effective.)

@vikanezrimaya
Member Author

A small detour into Nixpkgs maintainer lists has given me the following snippet, which prints a list of people we might need to ping first: the maintainers of packages I remember being part of stdenv, plus those of all packages included in a default NixOS install (see nixos/modules/config/system-path.nix).

Nix snippet (run with `nix eval --impure -f ./core-maintainers.nix`):

```nix
# core-maintainers.nix
# --impure is required because builtins.getFlake is given an unlocked
# flake reference ("nixpkgs", resolved through the flake registry).
let
  nixpkgs = builtins.getFlake "nixpkgs";
  pkgs = nixpkgs.legacyPackages.aarch64-linux;
  lib = nixpkgs.lib;
  # Deduplicate a list of strings by using them as attribute names
  # (the result comes back sorted alphabetically).
  dedupList = list: lib.attrNames (lib.listToAttrs (map (v: { name = v; value = null; }) list));
  # GitHub handles of a package's maintainers, falling back to the
  # maintainer's name when no handle is recorded.
  maintainersOf = package: map (m: m.github or m.name or "unknown") (package.meta.maintainers or [ ]);
in
dedupList (lib.concatMap maintainersOf (with pkgs; [
  # Notable packages from stdenv or its dependencies. Incomplete! This should probably
  # include the whole closure of stdenv, if I knew how to compute it and its maintainers.
  stdenv.cc
  stdenv.cc.libc
  binutils
  coreutils
  findutils
  diffutils
  patch
  gnumake
  autoconf
  perl
  gnum4
  libidn2
  libunistring
  dejagnu
  gmp mpfr libmpc
  gettext linuxHeaders
  bison
  zlib
  texinfo
] ++ [
  # Packages included by default in a NixOS install (see nixos/modules/config/system-path.nix).
  linuxPackages.kernel
  acl
  attr
  bashInteractive
  bzip2
  coreutils-full
  cpio
  curl
  diffutils
  findutils
  gawk
  getent
  getconf
  gnugrep
  gnupatch
  gnused
  gnutar
  gzip
  xz
  less
  libcap
  ncurses
  netcat
  openssh
  mkpasswd
  procps
  su
  time
  util-linux
  which
  zstd
  nano
  perl
  rsync
  strace
]))
```

Evaluating this Nix file should produce a list of maintainers of notable or important packages who might need advance notice. @dhess, is this list sufficient, and should we ping them here or in a future implementation RFC?

 - Added suggestions on how to alleviate increased work on maintainers of critical packages in case of a platform-specific breakage based on prior art in RFCs
 - Added an alternative way to solve the problem discussed in NixOS#87 (comment)
 - Added counterarguments to the drawback section discussed in the first meeting
@dhess

dhess commented Nov 20, 2021

I'm certainly not the expert! Maybe @domenkozar or @grahamc can weigh in?

My non-expert opinion: it seems likely that most of the packages you've identified there won't be significantly impacted by aarch64-specific issues, so you could probably make this a more manageable set of packages and maintainers by eliminating those from the first pass.

The ones I would personally start with are compilers, languages, and associated tooling (looks like cc, libc, perl, and binutils in your list there), and the Linux kernel: basically, anything that seems likely to have an architecture-specific component in its source code.

@vikanezrimaya
Member Author

First round of updates based on meeting notes seems to be done. Please pester me if I missed anything.

@dhess language-specific components do seem like a good starting point. I will see if the list can be updated to include things like Rust and compilers for other languages I know of and Nixpkgs has in its collection.

@dhess

dhess commented Nov 20, 2021

> @dhess language-specific components do seem like a good starting point. I will see if the list can be updated to include things like Rust and compilers for other languages I know of and Nixpkgs has in its collection.

Yeah, that's a good point. Even though Python, Rust, Go, & Haskell (just to name a few examples) aren't part of stdenv or the installer release, they're important enough that if aarch64-linux is promoted to Tier 1, they'll be expected to work just as well as they do on x86_64-linux. Therefore, we should be cognizant of the support burden we'll be adding to the maintainers of those packages if aarch64-linux is promoted, and if aarch64-linux has any significant issues in those language ecosystems right now, that should be taken into account in the decision.

@vikanezrimaya
Member Author

Given the silence, may I suggest that the shepherd team (@samueldr @kloenk @dhess @grahamc) hold the next meeting sometime soon? The agenda will probably include polishing the list of critical packages and their maintainers.

I took the liberty of creating a when2meet page for it: https://www.when2meet.com/?13805346-cScpg. My availability is slightly uncertain, and I will probably adjust if people agree on some other time as preferable.

I will also advertise the RFC in the relevant chatroom to bring a bit more attention to it; I'm afraid some might have forgotten about its existence due to my unfortunate radio silence (sorry!)

@vikanezrimaya
Member Author

Oh. Additionally, I've had a brilliant idea and I need to write it down so 1) I don't forget it; 2) so we may discuss it!

How about holding a one-off Zero Hydra Failures (ZHF) event targeted at aarch64-linux just before stabilizing? Suddenly including aarch64 in the channel blockers might cause a huge channel blockage if it turns out that something critical hasn't been working all along (for example, a MariaDB update introduced a new dependency that broke the build on aarch64-linux, where that dependency doesn't exist: NixOS/nixpkgs#147205). An officially organized ZHF event would help catch those failures and attract more attention before we pull the trigger on aarch64-linux blocking channel advances.

@vikanezrimaya
Member Author

vikanezrimaya commented Dec 8, 2021

To the shepherd team (@samueldr @grahamc @kloenk @dhess): the when2meet page shows some good options for a meeting on Dec 11, 12, 18 and 19, between 4:30 PM and 9:00 PM (all times UTC+03:00). Do any of these dates and times work for us? We probably need to pick one for the second meeting (not urgent, but it would be nice to know for planning).

@dhess

dhess commented Dec 8, 2021

Any of those times work for me.

@edolstra
Member

Did the meeting happen?

@dhess

dhess commented Jan 12, 2022

Nope.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/future-of-channels-and-channels-nixos-org-in-a-flakes-world/11563/13

@tomberek
Contributor

tomberek commented Mar 9, 2022

Any updates?

@dhess

dhess commented Mar 9, 2022

If there's been any progress on this issue, I'm unaware of it.

@vikanezrimaya
Member Author

vikanezrimaya commented Mar 11, 2022 via email

@kevincox
Contributor

If anyone is willing to take over this RFC, please feel free to do so, especially @grahamc, as you are listed as a co-author.

Note that there may be some permission issues due to GitHub write access. If that is the case, it may be easier to close this PR and open a new one.

If no one steps up the best option is probably to close this PR for now and it can always be used as a base if someone finds time to drive it.

@dhess

dhess commented Mar 23, 2022

I think this RFC is really important, and I'm happy to continue to participate as a shepherd, but unfortunately I don't have the time to drive it.

@edolstra
Member

edolstra commented Apr 6, 2022

Hi, we've decided to close this RFC for now. If somebody wants to step up to drive this RFC forward, please let us know and we can reopen it!
