
k3s: add packaging README regarding release versioning #224483

Merged
1 commit merged on May 22, 2023

euank
Member

@euank commented Apr 3, 2023

Description of changes

This adds a k3s packaging README, and starts it off with a section on how I think we can reason about k3s versions.

This strategy is pretty much entirely copied from what @mweinelt described over here: #222604 (comment)

I should add that I copied it because I think it's a good idea.

Fortunately, upstream Kubernetes extended its support window to roughly a year per minor release (starting with 1.19), which should let us keep NixOS release users on supported k8s/k3s versions.

However, there's one thing I'm very much not sure about:

This document implicitly calls for removing versions of k3s from release branches when we cut a release.

However, I do not see a good place in the release checklist to make sure that bit actually happens.

I think the closest we have is the checklist in "branch off" here: https://github.com/NixOS/release-wiki/blob/d6d9e732904c8e002c65e333b5e4e5c956064596/src/Branch-Off.md

However, I'd feel kinda weird putting k3s in that checklist. Is there a better place to do this? Is there some way to make sure the k3s maintainers get pinged during the release process to verify there's only 1 version, it's the newest, and it will definitely have an appropriate duration of support?

I think even without that, this document is useful to have, but this issue seems like it's also a totally reasonable place to discuss the mechanics of that bit. I'd appreciate any insight!

@euank changed the title from "K3s readme" to "k3s: add packaging README regarding release versioning" on Apr 3, 2023
@RaitoBezarius
Member

Such documentation should probably end up in the NixOS manual; see prior art for Nextcloud, Garage, etc.

@RaitoBezarius
Member

> This document implicitly calls for removing versions of k3s from release branches when we cut a release.

Probably in "pre-release cleanup": https://nixos.github.io/release-wiki/Feature-Freeze-Announcement.html

@RaitoBezarius
Member

Also, you should probably create a @NixOS/k3s team or join the @NixOS/kubernetes team, so that you get pinged during feature freeze.

Review thread on this excerpt of the README:

> release is cut, since it goes EOL before the NixOS 23.11 release is made, we would
> not want to include it. Similarly, k3s 1.25 would go EOL before NixOS 23.11.
>
> As such, we should only include k3s 1.26 in the 23.05 release.
Contributor

I'm a k3s maintainer. I opened the original issue about the need for several k3s versions to coexist during a release's lifecycle. All the details are in #213943 (comment).

As you can see there, the very first requirement for a properly supported k8s upgrade is:

Upgrade your server nodes to the latest patch version available. One node at a time.

We need to keep the different versions alive to comply with this very first requirement.

For example, if we drop k3s 1.25.8+k3s1 before releasing NixOS 23.05 and then, one month later, a new k3s 1.25.9+k3s1 release comes out, users would be unable to upgrade unless:

  1. They upgrade ignoring the recommended supported upstream procedure.
  2. They package 1.25 themselves.

Neither option is nice, so we should keep the versions around.

FWIW, sometimes you can't upgrade k8s directly, not because the new release isn't out yet, but because some operator you run doesn't support it yet. Upgrading a cluster is a very delicate operation, so I think we should keep packages available for every upstream-supported version, for as long as upstream supports it.
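A minimal Python sketch of the upgrade rule quoted above: move to the newest patch of the minor line you are on, then to the newest patch of each following minor, never skipping one. The `packaged` list and the `parse`, `latest_patch` and `upgrade_path` helpers are hypothetical illustrations, not a nixpkgs or k3s API; the point is only that removing a minor line from the package set makes the path impossible to build.

```python
# Sketch only: `packaged` stands in for whatever k3s versions nixpkgs happens
# to provide; these helpers are hypothetical, not a real nixpkgs/k3s API.


def parse(version: str) -> tuple[int, int, int, int]:
    """Turn "1.25.8+k3s1" into (1, 25, 8, 1) so versions compare numerically."""
    core, _, suffix = version.partition("+k3s")
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch, int(suffix or 0)


def latest_patch(packaged: list[str], major: int, minor: int) -> str:
    """Newest packaged release of one minor line; fails if the line was dropped."""
    line = [v for v in packaged if parse(v)[:2] == (major, minor)]
    if not line:
        raise LookupError(f"k3s {major}.{minor}.x is not packaged")
    return max(line, key=parse)


def upgrade_path(current: str, target_minor: int, packaged: list[str]) -> list[str]:
    """Versions to roll through: newest patch of the current minor, then of each
    following minor up to `target_minor`, never skipping a minor release."""
    major, minor, _, _ = parse(current)
    path = []
    for step in range(minor, target_minor + 1):
        newest = latest_patch(packaged, major, step)
        if newest != current:  # already on the newest patch: nothing to do
            path.append(newest)
    return path


packaged = ["1.24.10+k3s1", "1.24.12+k3s1", "1.25.3+k3s1", "1.25.8+k3s1", "1.26.3+k3s1"]
print(upgrade_path("1.24.10+k3s1", 26, packaged))
# ['1.24.12+k3s1', '1.25.8+k3s1', '1.26.3+k3s1']

# With the 1.25 line removed, there is no supported path from 1.24 to 1.26.
without_125 = [v for v in packaged if not v.startswith("1.25.")]
try:
    upgrade_path("1.24.10+k3s1", 26, without_125)
except LookupError as err:
    print(err)  # k3s 1.25.x is not packaged
```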

Member

We are reshuffling which k3s releases we keep in which NixOS release, so that the supported k3s releases overlap with the NixOS support cycle.

Can you point out where this plan comes out short?

Contributor

I thought that's what I just did 😅. I guess I didn't explain myself well... an example should make it clearer.

A real-world example: dealing with Rancher

Let's say it's 2023-05-01. NixOS 23.05 is released. My servers are using NixOS 22.11 with k3s-v1.24.10+k3s1. I want to upgrade them.

Why am I using that K3s version instead of 1.25 or 1.26? Because I use Rancher 2.7.1 on that cluster, and according to Rancher's support matrix, the highest k3s version Rancher 2.7.1 supports is 1.24:

[Screenshot: Rancher's support matrix]

Also, 1.24 remains a supported k8s and k3s release until 2023-07-28. So everything is supported if I stay on 1.24; if I update to 1.25, Rancher is not. Thus, I stay on 1.24.

When will Rancher support 1.25? According to rancher/rancher#38701, quite soon in the 2.7.2 release. But I still don't know when that'll be released.

What am I to expect from NixOS? Well, I expect it still has K3s 1.24 releases available, because that's still supported upstream. Let's say NixOS is nice to me and does that. I upgrade my servers to NixOS 23.05 but keep K3s running on the 1.24 derivation.

Time goes by, a couple of weeks pass, and we're at 2023-05-15. It turns out Rancher 2.7.2 got released. It supports k3s 1.25. Cool! Let's upgrade. I install Rancher through the helm chart, so it has nothing to do with NixOS. Let's say I do that and it upgrades without problems.

Ok, time to upgrade my cluster! How? Following #213943 (comment). As explained above, step number 1 is to upgrade the cluster to the latest patch release of the minor release I'm currently using. Which one is it? k3s v1.24.12+k3s1 is already available (although in this future scenario, it could be something even newer).

Since I maintain K3s and just noticed nixpkgs is a few versions behind upstream, I open a PR to nixpkgs, we merge it, I update my servers, and get the latest 1.24 patch release (which, BTW, includes CVE fixes). Task done: I'm on the latest K3s 1.24.x release. 🏆

Now I must update to the latest K3s 1.25.x patch release. Let's take a look. Currently, NixOS has 1.25.3+k3s1, but upstream is already on v1.25.8+k3s1. Just like before, I update it in nixpkgs before proceeding to the next step.

The next step is a bit more delicate. I have to upgrade my cluster to 1.25 in order (servers first, one by one; then workers in no particular order). K3s 1.25.8 is already in nixpkgs, so I upgrade my servers following that process. Cool, finished!
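The ordering in that step can be sketched as follows, assuming each node is labelled either "server" or "agent" (the labels, the `Node` type and the `rollout_batches` helper are hypothetical, not k3s configuration). Each server is its own batch, so servers go one at a time; all agents form a final batch that can be handled in any order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    role: str  # "server" or "agent"; hypothetical labels for this sketch


def rollout_batches(nodes: list[Node]) -> list[list[str]]:
    """Order a minor-version upgrade: each server is its own batch (one at a
    time), then all agents form a final batch that can go in any order."""
    unknown = {n.role for n in nodes} - {"server", "agent"}
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    batches = [[n.name] for n in nodes if n.role == "server"]
    agents = [n.name for n in nodes if n.role == "agent"]
    if agents:
        batches.append(agents)
    return batches


cluster = [
    Node("srv-a", "server"),
    Node("wrk-1", "agent"),
    Node("srv-b", "server"),
    Node("wrk-2", "agent"),
]
print(rollout_batches(cluster))
# [['srv-a'], ['srv-b'], ['wrk-1', 'wrk-2']]
```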

Now, should I take this chance to update to 1.26? Well, I'll have to start over again:

  1. Make sure Rancher supports k3s 1.26
  2. If so, make sure NixOS is on the latest k3s 1.25.x and 1.26.x releases (This time I have more chances to get a "yes" because there's an automated update script).
  3. When that's done, do the update.

How does the example matter to NixOS?

The example shows that upgrading K3s in production is complex and delicate. It also shows that a sysadmin may need to stay on older but still-supported releases for a while, for good reasons.

If nixpkgs drops support for K3s < 1.26 while upstream still supports them, then the required step of upgrading to the newest patch release of the minor release you're currently running can't be done (with official packages).

NixOS users should be able to predict k3s support based on the upstream calendar, because the other in-cluster tools that they are using use that calendar, not NixOS'.

My proposal

So IMHO, to make NixOS the best OS for running k3s, it should:

  • Provide one K3s package per minor version supported upstream at the date of the launch of NixOS.
  • Automated update scripts for all those minor versions, so that our dear update bot keeps each of them on the latest upstream patch release. This is already done for 1.26, so if you prefer, we can keep updating older versions manually and only automate >= 1.26.
  • Once upstream drops support of a minor version, NixOS does too. But not before.

Member

@mweinelt commented Apr 10, 2023

> As such, we should only include k3s 1.26 in the 23.05 release.

I think this is what got you spooked. The idea going forward is:

  • Include only the latest release in a new NixOS release
    • so that k3s versions don't go EOL during a NixOS release support lifecycle
  • Backport all minor releases into the previous release, except for the latest
    • so that the oldest k3s release in the previous NixOS release is just one minor release before the one in the new NixOS release
  • Backport all newer releases of k3s into the new release
    • until a k3s release support covers the full release cycle of NixOS n+1

Basically, we're flipping the order in which things are done. Instead of stuffing the new NixOS release with end-of-life releases that users need to pass through when upgrading, we provide the upgrade path on the previous NixOS release.

Contributor

So, do you mean that the new plan would be this?

  • NixOS 23.05:
    • Released with k3s 1.26
    • Gets k3s 1.27 and 1.28 when they're published.
  • NixOS 22.11:
    • Has only k3s 1.24-1.25
    • Will keep getting patches for 1.24.x and 1.25.x for the whole lifecycle of NixOS 23.05. (Pay attention here because I don't think this will be true).

That last point is the pain point for me. According to https://endoflife.date/nixos, NixOS 22.11 goes EOL on 2023-06-30, so NixOS 22.11 will go EOL before k3s 1.25 does.

So, does that mean that if Rancher takes longer than NixOS 22.11's remaining lifetime to support 1.26, I won't have an upgrade path by the time I can move the cluster to 1.26? 😵

Member Author

I understand what you're saying, @yajo, and I think it's sorta a real concern, but I don't think NixOS actually wants to support it.

The observation, if I understand it correctly, is that the following points conflict:

  1. Old unsupported NixOS releases will not receive updates (naturally)
  2. K8s requires you to update to the latest patch release before upgrading to the next minor release
  3. Therefore, updating from an unsupported NixOS release to a newer one will be unsupported if there are any k8s patch releases

That seems true, but I also think that you can only encounter that issue if you're running an unsupported NixOS release. It seems totally expected that an unsupported NixOS release isn't supported, which I think sums up the issue there.

Said another way, having a "correct" path to upgrade is a moving target, and while the plan described in this document makes it so we hit that target while NixOS releases are still supported, upstream changes may make it so we no longer meet that target.

In your example, if you updated to NixOS 23.05 / k3s 1.26 before 22.11 went out of support, you would remain in a supported configuration by NixOS and k3s/k8s the whole time.
I think that's totally fine. Stay on supported NixOS releases and things can work; stay on an unsupported release and you're on an unsupported path (that still probably works! it was supported in the past!).

Which brings us to the other point you're discussing - Rancher.
It seems like Rancher's support matrix lags behind quite a bit.

I think the actionable thing you're requesting here is to update the policy from "NixOS's supported releases attempt to have the latest k3s release when it is cut, ensuring it is supported for the NixOS release lifecycle" to "NixOS's supported releases have the latest k3s release and a k3s release supported by Rancher".

I think if we change to that statement, the rest naturally falls out of that correctly.

That said, I personally don't want to support older k3s versions. I don't use Rancher, and their support matrix and updates seem to be at a pace which doesn't really align that well with NixOS's release lifecycle, so I'm wary it's not a great fit.

Is there some factor that makes tying our supported versions to Rancher's slower support matrix compelling?
Do I understand the issue you're seeing here correctly?

Member

> That seems true, but I also think that you can only encounter that issue if you're running an unsupported NixOS release. It seems totally expected that an unsupported NixOS release isn't supported, which I think sums up the issue there.

I mean, if someone wants to run an unsupported NixOS release and get backports of k3s patch releases, it's not really hard to do (in my experience I'd even call it fairly trivial), but you have to do it yourself or pay someone to do it.

Contributor

Well, this is unexpected and confusing, TBH.

I just used Rancher as an example. I run a mix of operators, apps and custom deployments in K8s, each evolving at a different pace. I just picked the first one that would cause a problem: Rancher, in this case. But it's easy to see that any of them could become a problem because of this choice on the NixOS side.

It is expected that all of them support the currently supported k8s versions. But we can't expect all of them to support the latest k8s version by the launch date of the latest NixOS version, even less so when there are 2 NixOS releases and 3 k8s releases per year. There'll always be some drift.

With the proposed "solution", you force NixOS users to choose between:

  1. Running on an unsupported K8s version.
  2. Running on an unsupported NixOS version.
  3. Upgrading using an unsupported process.
  4. Running apps / operators on unsupported platforms.

Not a very pleasant choice to make.

There are other cases where NixOS ships several supported versions of the same app. On NixOS 22.11 alone, you can use python37, python38, python39, python310 and python311. Ain't that the magic of NixOS? Why can't we do the same for k3s?

Member Author

@euank commented May 20, 2023

Apologies for the slow response!

I guess I don't really know the best thing to do here. I agree that other apps/operators can lag behind some amount, which can make it more difficult to upgrade promptly.

Basically, I think the options we have on the NixOS side are:

  1. For each release, maintain the maximum number of versions of k3s that will be supported throughout the whole release
  2. For each release, maintain the minimum number that will allow a safe upgrade path
  3. For each release, maintain all k3s versions that were supported when the release was cut, even if they may go EOL during it.

I'm arguing for 2 because it's less maintenance work, and because in practice I haven't run into the issues you speak of. Everything I use has worked "fine" when upgrading, even if I upgrade before they announce official support or such. The k8s project's backwards compatibility story means that's supposed to typically be the case.

I believe you're arguing for 3, right?
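A rough Python sketch comparing options 1 and 3 for the 23.05 case discussed in this thread. The EOL and release dates below are assumptions based on upstream schedules as we understand them (check endoflife.date before relying on them), and `minors_supported_through` is a hypothetical helper. Option 2 isn't modelled, because it also depends on which versions the previous NixOS release carries.

```python
from datetime import date

# Assumed upstream end-of-life dates for the k3s minor lines packaged around
# the 23.05 branch-off; verify against endoflife.date before relying on them.
K3S_EOL = {
    "1.24": date(2023, 7, 28),
    "1.25": date(2023, 10, 28),
    "1.26": date(2024, 2, 28),
}
NIXOS_2305_RELEASE = date(2023, 5, 31)  # assumed release date
NIXOS_2305_EOL = date(2023, 12, 31)  # assumed end of support


def minor_key(minor: str) -> tuple[int, ...]:
    """Sort "1.9" before "1.24" by comparing numbers, not strings."""
    return tuple(int(part) for part in minor.split("."))


def minors_supported_through(eol: dict[str, date], day: date) -> list[str]:
    """k3s minor lines whose upstream support still covers `day`."""
    return sorted((m for m, end in eol.items() if end >= day), key=minor_key)


# Option 1: only versions supported for the whole NixOS release.
print(minors_supported_through(K3S_EOL, NIXOS_2305_EOL))  # ['1.26']
# Option 3: every version still supported on the release date.
print(minors_supported_through(K3S_EOL, NIXOS_2305_RELEASE))  # ['1.24', '1.25', '1.26']
```

Calling the same helper with today's date gives the rule @yajo proposed above: drop a minor version only once upstream drops it.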

@euank
Member Author

euank commented Apr 14, 2023

Thank you for the comments earlier @RaitoBezarius.

I see prior art for these documents in package-specific directories too (e.g. chromium).

I think it makes sense to describe the intended support and upgrade path in the manual, and keep a separate README for NixOS-maintainer-specific details. I'll update the PR in line with that.

> Probably in "pre-release cleanup":

Thanks! I'll add a k3s-specific bullet point there. It feels a little weird, but perhaps other projects will eventually join in on a "trim all old releases for X" list.

I'll make that PR after this PR is merged so I can be sure we're all in agreement.

> Also, probably, you should build a @NixOS/k3s or be part of the https://github.com/orgs/NixOS/teams/kubernetes team to be pinged during feature freeze.

I'm happy with either approach, though I don't have permissions to change membership afaik.
Is there good prior art here, or is it all fairly ad-hoc?
I'm happy with whichever makes the most sense, but I don't know enough to know which that is.

@zowoq
Contributor

zowoq commented Apr 14, 2023

> > Also, probably, you should build a @NixOS/k3s or be part of the https://github.com/orgs/NixOS/teams/kubernetes team to be pinged during feature freeze.
>
> I'm happy with either approach, though I don't have permissions to change membership afaik. Is there good prior art here, or is it all fairly ad-hoc? I'm happy with whichever makes the most sense, but I don't know enough to know which that is.

There really isn't any overlap between the k3s and kubernetes packaging and modules so it should be a separate team.

@RaitoBezarius
Member

> Thank you for the comments earlier @RaitoBezarius.
>
> I see prior art for these documents in package-specific directories too (i.e. chromium and so on).
>
> I think it makes sense to describe the intended support and upgrade path in the manual, and have a separate README with any nixos-maintainer specific stuff. I'll update it in-line with that.
>
> > Probably in "pre-release cleanup":
>
> Thanks! I'll add a bullet point to that that's k3s specific. It feels a little weird, but perhaps other projects will join in on the "trim all old releases for X" bullet point list eventually.

Joining is very much welcomed because RMs have to be aware of all packages that can be in scope, and it's not always easy.
Ideally, I'd like automation that checks our packages against endoflife.date, with manual triage of some of them with respect to upgrade paths.

> I'll make that PR after this PR is merged so I can be sure we're all in agreement.
>
> > Also, probably, you should build a @NixOS/k3s or be part of the https://github.com/orgs/NixOS/teams/kubernetes team to be pinged during feature freeze.
>
> I'm happy with either approach, though I don't have permissions to change membership afaik. Is there good prior art here, or is it all fairly ad-hoc? I'm happy with whichever makes the most sense, but I don't know enough to know which that is.

As per zowoq's last message, I just created https://github.com/orgs/NixOS/teams/k3s with you as a maintainer, you can add new team members. :)

@RaitoBezarius
Member

What is the state here? We are close to branch-off.

@euank
Member Author

euank commented May 20, 2023

Apologies for not pushing forward this issue for a bit! Life's been busy.

It doesn't feel like we have consensus on the general approach to take here, per the discussion thread above.

For this release specifically, I think our two options are basically:

  1. 23.05 with k3s 1.26, backport 1.27 to it once it's packaged
  2. 23.05 with k3s 1.25 + 1.26, backport 1.27 once it's packaged.

The argument against 2 is that 1.25 will go EOL in October, before NixOS 23.05 does.

The argument for 2 is that some parts of the k8s ecosystem may still require 1.25 (and it is still supported, even though it goes EOL before the NixOS release does), so if you want both k3s updates and a supported NixOS version, that's the only solution.

@yajo brought up Rancher as an example of software that still only supports up to 1.25.

I don't think software lagging that far behind on k8s versions is common, though, and I haven't personally had issues requiring me to stick to an old version within 6 months of its EOL date yet.

My personal preference remains to do option 1 above. I personally don't think we want to ship software that will go EOL as part of a NixOS release branch, and if someone wants to maintain 1.25 on nixpkgs-unstable right up until it goes EOL, I think that might be a better compromise.

@yajo does the above work for you, and seem like a reasonable path forward for this release?

@yajo
Contributor

yajo commented May 22, 2023

> My personal preference remains to do option 1 above. I personally don't think we want to ship software that will go EOL as part of a NixOS release branch, and if someone wants to maintain 1.25 on nixpkgs-unstable right up until it goes EOL, I think that might be a better compromise.

Seems reasonable.

@RaitoBezarius
Member

@euank If we are good to go, let me know, I can merge this and we can follow up with whatever is needed.

@euank
Member Author

euank commented May 22, 2023

Thanks for the followup @yajo and @RaitoBezarius!

I think the main pending things for this are:

  1. Getting the user-important details of the release policy in the manual (basically "Here's how to upgrade k3s + nixos stable releases in a supported way")
  2. Updating the release-wiki to call out a ping to the k3s team / trimming of versions.

I think both of those can be done after merging this, so I think merging this still makes sense.

@RaitoBezarius merged commit e215adf into NixOS:master on May 22, 2023
@RaitoBezarius
Member

Note that branch-off is scheduled for today, probably in CEST evening.

@euank mentioned this pull request on May 23, 2023
euank added a commit to euank/nixpkgs that referenced this pull request May 23, 2023
In-line with the policy described
[here](https://github.com/NixOS/nixpkgs/blob/30b82a186bc585872624a298a5169d1d237ce6a4/pkgs/applications/networking/cluster/k3s/README.md#versions-in-nixos-releases)
(xref NixOS#224483), drop versions of k3s that will not be supported for the
full duration of the NixOS release.

Since 22.11 has k3s 1.25, that means we must have k3s 1.26 at least.

Both k3s 1.24 and 1.25 will lose support before the 23.05 NixOS release
goes out of support, so we should drop them. Respectively, 1.24 loses
support in July 2023, and 1.25 loses support in October 2023. NixOS 23.05 is
supported through December 2023.
teggotic pushed a commit to teggotic/nixpkgs that referenced this pull request Sep 17, 2023