[RFC 0087] Promote aarch64-linux to Tier 1 support #87

Status: Closed. Wants to merge 16 commits (changes shown from 10 commits).

File: `rfcs/0087-aarch64-tier1.md` (96 additions, 0 deletions)

---
feature: aarch64-tier1
start-date: 2021-03-09
author: Vika Shleina
co-authors: Graham Christensen
shepherd-team: TBD
shepherd-leader: TBD
related-issues: TBD
---

# Summary
[summary]: #summary

Move `aarch64-linux` from Tier 2 to Tier 1, using the platform support tiers defined in [RFC 0046](/rfcs/0046-platform-support-tiers.md).

# Motivation
[motivation]: #motivation

`aarch64-linux` support in Nixpkgs and NixOS has matured over time and become
increasingly stable, and more and more devices can now run NixOS on ARM.
Moving it to Tier 1 will allow us to block release channels on
aarch64-related build failures, making it easier and safer for ARM users
to upgrade their systems. It will also help keep software versions in sync
across architectures, because `x86_64-linux` and `aarch64-linux` builds will
share a channel.

`aarch64-linux` will also benefit from increased perceived binary cache
coverage, because channel bumps will wait for aarch64 builds to finish.
This saves build time for end users.

## Prior art
There were prior attempts at the same feat, but they failed due to technical
limitations of Hydra:
- NixOS/nixpkgs@74c4e30 - disabled in 2017 because of memory issues
- NixOS/nixpkgs#52534, NixOS/nixpkgs@36a0c13 - re-enabled in 2018 to pre-build important outputs
- NixOS/nixpkgs@1bfe8f1 - disabled again due to hydra-evaluator issues

Since then, hydra-evaluator has been rewritten, which will probably make
these concerns obsolete.

# Detailed design
[design]: #detailed-design

If this RFC is accepted, `aarch64-linux` builds will be added to the `tested`
aggregate jobs of the stable and unstable channels on Hydra, giving them the
ability to block channel advances. Hydra will then build aarch64 packages and
run aarch64-based tests as part of the stable and unstable channels. The
resulting outputs will land in the binary cache, increasing its coverage.

The availability of aarch64 builders from Equinix Metal may at times be
reduced, limiting aarch64 build capacity and delaying builds. To cover these
shortfalls, we will extend the hydra-provisioner implementation in
nixos-org-configurations to dynamically allocate aarch64 builders on AWS.

# Examples and Interactions
[examples-and-interactions]: #examples-and-interactions


In [nixos/release-combined.nix](https://github.com/NixOS/nixpkgs/blob/master/nixos/release-combined.nix)
`aarch64-linux` will be moved to `supportedSystems`, enabling NixOS tests
to block channel advances in case of failures.
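The following hypothetical excerpt shows roughly what that change could look like. It assumes `release-combined.nix` keeps taking its system lists as function arguments. Per the review discussion, aarch64-linux currently sits in a separate limited-support set. The name of that second argument, and everything else in the real file, are assumptions and are elided here.

```nix
# Hypothetical excerpt of nixos/release-combined.nix after the change;
# the other arguments and all job definitions of the real file are elided.
{ stableBranch ? false
  # Tier 1: failures in these systems' jobs block channel advances.
, supportedSystems ? [ "x86_64-linux" "aarch64-linux" ]
  # aarch64-linux would be removed from the limited-support list
  # (assumed to be a separate argument, e.g. `limitedSupportedSystems`).
, ...
}:
{
  # ... job definitions elided ...
}
```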

This RFC should be merged together with documentation on configuring
qemu-binfmt as a fallback method for building aarch64 packages on x86_64
machines. Additionally, a separate sub-project, out of scope for this RFC, may
be established to track down the build failures that have been reported when
building under emulation.
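As a rough illustration of the fallback such documentation would cover, an x86_64-linux NixOS host can register QEMU user-mode emulation for aarch64 through the `boot.binfmt.emulatedSystems` option. This is a minimal configuration fragment. The rest of the host's configuration is assumed.

```nix
# NixOS configuration fragment for an x86_64-linux host.
# Registers qemu-user emulation via binfmt_misc and lets the local
# Nix daemon accept aarch64-linux builds as a fallback.
{ ... }:
{
  boot.binfmt.emulatedSystems = [ "aarch64-linux" ];
}
```

After `nixos-rebuild switch`, an aarch64 package can be built locally, for example with `nix-build '<nixpkgs>' --argstr system aarch64-linux -A hello`. Expect such builds to be considerably slower than native ones.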

The list of NixOS AMIs on NixOS.org will also be extended to include aarch64 images.

# Drawbacks
[drawbacks]: #drawbacks

aarch64 build failures could unnecessarily delay channel advances, and with them critical updates for users of every platform sharing the channel.

# Alternatives
[alternatives]: #alternatives

> **Review comment (Member):** I am of the opinion that a separate channel should be the first step. Show that there are enough resources and commitment to keep it green for half a year to a year; if that is the case, we can upgrade it to Tier 1. I don't think there is generally enough understanding of how much day-to-day effort goes into actually keeping the channels green. At the same time, I have no idea how much breakage there is with aarch64 nowadays.

> **Review comment (Member Author):** I added your suggestion to the manuscript.

> **Review comment (Member):** The channel has already blocked on aarch64-linux for the limited support set since late 2018. So I guess we're ready to upgrade it to Tier 1, since things have been kept green for half a year to a year.
>
> It's all about having Hydra attempt to build the jobs, not about adding new blockers.

> **Review comment (Member):** It might be worth specifying which channel we are talking about here. If you mean the unstable channel: it mostly works. If you mean the stable channel: it had been blocked for a few days when I opened the channel PR, and probably nobody had noticed yet.

Create an aarch64-focused channel that would build the same things the current `unstable` channel does, but for aarch64 only. This has a significant drawback: the x86_64 and aarch64 channels might never pass on the same commit, making deployment to a heterogeneous cluster of x86_64 and aarch64 machines very challenging.

> **Review comment (Member):** I don't find this argument very persuasive. Track master instead and let your own CI perform the tests that are relevant to you. The amount of rebuilding you get on master will typically be small. The same goes for stable.

> **Review comment (Member Author):** Where would one run this CI? Not everyone has a Raspberry Pi, or a whole cluster connected to SSDs for fast builds. I don't build on my Raspberry Pi for this reason: builds there are slower than building on my laptop via emulation, and Hydra is simply a lot faster than any laptop in existence, speeding things up further. Raspberry Pi users who run just one or two devices would be part of the target audience of this RFC.

> **Review comment (Member):** Asking people to run their own CI for what is supposedly still a Linux distribution for users is not a good argument. Many of us who are involved in the project do that for our personal stuff. I started the discussion about having the channel because someone who had just started out with NixOS (on a Raspberry Pi) was having issues with lots and lots of rebuilds on their little machine.
>
> Making adoption easier is one of the goals we should be aiming for. Asking for huge leaps (from no functional-programming background to a full custom Nix CI...) is not a good way to get there.


# Unresolved questions
[unresolved]: #unresolved-questions

Do we have enough machines to handle aarch64 builds without delaying `x86_64-linux` builds?

# Future work
[future]: #future-work

Track down build failures when using `boot.binfmt.emulatedSystems` and qemu-binfmt to build
aarch64 packages on `x86_64-linux` machines (e.g. by building a minimal closure entirely under
emulation, without binary caches).
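One way to run such an experiment is sketched below. It assumes the x86_64-linux host already has aarch64 emulation enabled (e.g. via `boot.binfmt.emulatedSystems`). Pick a small package and build it with substitution disabled, so that every dependency missing from the local store, including the stdenv bootstrap, is built locally under qemu-binfmt. The file name and the choice of `hello` are hypothetical.

```nix
# minimal-aarch64.nix (hypothetical file name)
# Build with binary caches disabled, e.g.:
#   nix-build minimal-aarch64.nix --option substitute false
# Any dependency not already in the local store is then built for
# aarch64-linux on this machine, running under qemu-binfmt emulation.
let
  pkgs = import <nixpkgs> { system = "aarch64-linux"; };
in
pkgs.hello
```

A failure here that does not reproduce on native aarch64 hardware would point at an emulation-specific problem.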