[release-1.27] Backports for 2023-10 release #8615

Merged: 21 commits, Oct 13, 2023
Commits
044f6b5
Disable HTTP on main etcd client port
brandond Sep 22, 2023
ff3f810
Don't ignore assets in home dir if system assets exist
brandond Sep 26, 2023
fece87a
Pass SystemdCgroup setting through to nvidia runtime options
brandond Sep 27, 2023
e2cdecd
Bump containerd to v1.7.7-k3s1
brandond Oct 12, 2023
ddc7f01
Bump busybox to v1.36.1
brandond Oct 12, 2023
5cc1d41
Add ADR for etcd snapshot CRD migration
brandond Jul 28, 2023
2fe9c4b
Minor updates as per design review discussion
brandond Aug 14, 2023
d72e35b
Add new CRD for etcd snapshots
brandond Sep 8, 2023
b5fa6dd
Move etcd snapshot code into separate file
brandond Sep 28, 2023
ef02cd1
Elide old snapshot data when apiserver rejects configmap with ErrRequ…
brandond Sep 29, 2023
7b960b2
Tidy s3 upload functions
brandond Sep 29, 2023
b3f9969
Consistently set snapshotFile timestamp
brandond Sep 29, 2023
299bf2b
Move s3 snapshot list functionality to s3.go
brandond Sep 30, 2023
d0fac5a
Store extra metadata and cluster ID for snapshots
brandond Oct 2, 2023
865769b
Sort snapshots by time and key in tabwriter output
brandond Oct 10, 2023
75e0491
Move snapshot delete into local/s3 functions
brandond Oct 5, 2023
b7471c9
Switch to managing ETCDSnapshotFile resources
brandond Oct 3, 2023
42bb670
Add server token hash to CR and S3
brandond Oct 10, 2023
825bd81
Fix etcd snapshot integration tests
brandond Oct 10, 2023
7c727f2
Switch build target from main.go to a package. (#8342)
dlorenc Oct 12, 2023
d1ed5c0
Bump traefik, golang.org/x/net, google.golang.org/grpc
brandond Oct 13, 2023
16 changes: 10 additions & 6 deletions cmd/k3s/main.go
@@ -220,16 +220,20 @@ func getAssetAndDir(dataDir string) (string, string) {
 // extract checks for and if necessary unpacks the bindata archive, returning the unique path
 // to the extracted bindata asset.
 func extract(dataDir string) (string, error) {
-	// first look for global asset folder so we don't create a HOME version if not needed
-	_, dir := getAssetAndDir(datadir.DefaultDataDir)
+	// check if content already exists in requested data-dir
+	asset, dir := getAssetAndDir(dataDir)
 	if _, err := os.Stat(filepath.Join(dir, "bin", "k3s")); err == nil {
 		return dir, nil
 	}

-	asset, dir := getAssetAndDir(dataDir)
-	// check if target content already exists
-	if _, err := os.Stat(filepath.Join(dir, "bin", "k3s")); err == nil {
-		return dir, nil
+	// check if content exists in default path as a fallback, prior
+	// to extracting. This will prevent re-extracting into the user's home
+	// dir if the assets already exist in the default path.
+	if dataDir != datadir.DefaultDataDir {
+		_, defaultDir := getAssetAndDir(datadir.DefaultDataDir)
+		if _, err := os.Stat(filepath.Join(defaultDir, "bin", "k3s")); err == nil {
+			return defaultDir, nil
+		}
 	}

 	// acquire a data directory lock
60 changes: 60 additions & 0 deletions docs/adrs/etcd-snapshot-cr.md
@@ -0,0 +1,60 @@
# Store etcd snapshot metadata in a Custom Resource

Date: 2023-07-27

## Status

Accepted

## Context

K3s currently stores a list of etcd snapshots and associated metadata in a ConfigMap. Other downstream
projects and controllers consume the content of this ConfigMap in order to present cluster administrators with
a list of snapshots that can be restored.

On clusters with more than a handful of nodes, and reasonable snapshot intervals and retention periods, the snapshot
list ConfigMap frequently reaches the maximum size allowed by Kubernetes, and fails to store any additional information.
The snapshots are still created, but they cannot be discovered by users or accessed by tools that consume information
from the ConfigMap.

When this occurs, the K3s service log shows errors such as:
```
level=error msg="failed to save local snapshot data to configmap: ConfigMap \"k3s-etcd-snapshots\" is invalid: []: Too long: must have at most 1048576 bytes"
```

A side-effect of this is that snapshot metadata is lost if the ConfigMap cannot be updated, as the list is the only place that it is stored.

Reference:
* https://github.com/rancher/rke2/issues/4495
* https://github.com/k3s-io/k3s/blob/36645e7311e9bdbbf2adb79ecd8bd68556bc86f6/pkg/etcd/etcd.go#L1503-L1516

### Existing Work

Rancher already has a `rke.cattle.io/v1 ETCDSnapshot` Custom Resource that contains the same information after it's been
imported by the management cluster:
* https://github.com/rancher/rancher/blob/027246f77f03b82660dc2e91df6bf2cd549163f0/pkg/apis/rke.cattle.io/v1/etcd.go#L48-L74

It is unlikely that we would want to use this custom resource in its current package; we may be able to negotiate moving
it into a neutral project for use by both projects.

## Decision

1. Instead of populating snapshots into a ConfigMap using the JSON serialization of the private `snapshotFile` type, K3s
will manage creation of a new Custom Resource Definition with similar fields.
2. Metadata on each snapshot will be stored in a distinct Custom Resource.
3. The new Custom Resource will be cluster-scoped, as etcd and its snapshots are a cluster-level resource.
4. Snapshot metadata will also be written alongside snapshot files created on disk and/or uploaded to S3. The metadata
files will have the same basename as their corresponding snapshot file.
5. A hash of the server token will be stored as an annotation on the Custom Resource, and stored as metadata on snapshots uploaded to S3.
This hash should be compared to a current etcd snapshot's token hash to determine if the server token must be rolled back as part of the
snapshot restore process.
6. Downstream consumers of etcd snapshot lists will migrate to watching Custom Resource types, instead of the ConfigMap.
7. K3s will observe a three minor version transition period, during which both the new Custom Resources and the existing
ConfigMap will be used.
8. During the transition period, older snapshot metadata may be removed from the ConfigMap while those snapshots still
exist and are referenced by new Custom Resources, if the ConfigMap exceeds a preset size or key count limit.

## Consequences

* Snapshot metadata will no longer be lost when the number of snapshots exceeds what can be stored in the ConfigMap.
* There will be some additional complexity in managing the new Custom Resource, and working with other projects to migrate to using it.
37 changes: 19 additions & 18 deletions go.mod
@@ -6,7 +6,7 @@ replace (
 	github.com/Microsoft/hcsshim => github.com/Microsoft/hcsshim v0.11.0
 	github.com/Mirantis/cri-dockerd => github.com/k3s-io/cri-dockerd v0.3.4-k3s1 // k3s/release-1.27
 	github.com/cloudnativelabs/kube-router/v2 => github.com/k3s-io/kube-router/v2 v2.0.0-20230925161250-364f994b140b
-	github.com/containerd/containerd => github.com/k3s-io/containerd v1.7.6-k3s1.27
+	github.com/containerd/containerd => github.com/k3s-io/containerd v1.7.7-k3s1.27
 	github.com/coreos/go-systemd => github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e
 	github.com/docker/distribution => github.com/docker/distribution v2.8.2+incompatible
 	github.com/docker/docker => github.com/docker/docker v24.0.0-rc.2.0.20230801142700-69c9adb7d386+incompatible
@@ -43,11 +43,11 @@ replace (
 	go.opentelemetry.io/otel/trace => go.opentelemetry.io/otel/trace v1.13.0
 	go.opentelemetry.io/proto/otlp => go.opentelemetry.io/proto/otlp v0.19.0
 	golang.org/x/crypto => golang.org/x/crypto v0.1.0
-	golang.org/x/net => golang.org/x/net v0.8.0
+	golang.org/x/net => golang.org/x/net v0.17.0
 	golang.org/x/sys => golang.org/x/sys v0.6.0
 	google.golang.org/api => google.golang.org/api v0.60.0
 	google.golang.org/genproto => google.golang.org/genproto v0.0.0-20220502173005-c8bf987b8c21
-	google.golang.org/grpc => google.golang.org/grpc v1.51.0
+	google.golang.org/grpc => google.golang.org/grpc v1.58.3
 	gopkg.in/square/go-jose.v2 => gopkg.in/square/go-jose.v2 v2.6.0
 	k8s.io/api => github.com/k3s-io/kubernetes/staging/src/k8s.io/api v1.27.6-k3s1
 	k8s.io/apiextensions-apiserver => github.com/k3s-io/kubernetes/staging/src/k8s.io/apiextensions-apiserver v1.27.6-k3s1
@@ -141,10 +141,10 @@ require (
 	go.etcd.io/etcd/etcdutl/v3 v3.5.9
 	go.etcd.io/etcd/server/v3 v3.5.9
 	go.uber.org/zap v1.24.0
-	golang.org/x/crypto v0.11.0
+	golang.org/x/crypto v0.14.0
 	golang.org/x/net v0.14.0
 	golang.org/x/sync v0.3.0
-	golang.org/x/sys v0.11.0
+	golang.org/x/sys v0.13.0
 	google.golang.org/grpc v1.57.0
 	gopkg.in/yaml.v2 v2.4.0
 	inet.af/tcpproxy v0.0.0-20200125044825-b6bb9b5b8252
@@ -165,7 +165,7 @@ require (
 )

 require (
-	cloud.google.com/go/compute v1.18.0 // indirect
+	cloud.google.com/go/compute v1.21.0 // indirect
 	cloud.google.com/go/compute/metadata v0.2.3 // indirect
 	github.com/AdaLogics/go-fuzz-headers v0.0.0-20230811130428-ced1acdcaa24 // indirect
 	github.com/AdamKorcz/go-118-fuzz-build v0.0.0-20230306123547-8075edf89bb0 // indirect
@@ -184,7 +184,7 @@ require (
 	github.com/JeffAshton/win_pdh v0.0.0-20161109143554-76bb4ee9f0ab // indirect
 	github.com/MakeNowJust/heredoc v1.0.0 // indirect
 	github.com/Microsoft/go-winio v0.6.1 // indirect
-	github.com/Microsoft/hcsshim v0.11.0 // indirect
+	github.com/Microsoft/hcsshim v0.11.1 // indirect
 	github.com/NYTimes/gziphandler v1.1.1 // indirect
 	github.com/Rican7/retry v0.1.0 // indirect
 	github.com/antlr/antlr4/runtime/Go/antlr v1.4.10 // indirect
@@ -211,7 +211,8 @@ require (
 	github.com/containerd/go-cni v1.1.9 // indirect
 	github.com/containerd/go-runc v1.0.0 // indirect
 	github.com/containerd/imgcrypt v1.1.7 // indirect
-	github.com/containerd/nri v0.3.0 // indirect
+	github.com/containerd/log v0.1.0 // indirect
+	github.com/containerd/nri v0.4.0 // indirect
 	github.com/containerd/stargz-snapshotter/estargz v0.14.3 // indirect
 	github.com/containerd/ttrpc v1.2.2 // indirect
 	github.com/containerd/typeurl v1.0.2 // indirect
@@ -267,12 +268,12 @@ require (
 	github.com/google/gofuzz v1.2.0 // indirect
 	github.com/google/pprof v0.0.0-20230323073829-e72429f035bd // indirect
 	github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
-	github.com/googleapis/gax-go/v2 v2.7.0 // indirect
+	github.com/googleapis/gax-go/v2 v2.11.0 // indirect
 	github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7 // indirect
 	github.com/grpc-ecosystem/go-grpc-middleware v1.3.0 // indirect
 	github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
 	github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
-	github.com/grpc-ecosystem/grpc-gateway/v2 v2.7.0 // indirect
+	github.com/grpc-ecosystem/grpc-gateway/v2 v2.11.3 // indirect
 	github.com/hanwen/go-fuse/v2 v2.3.0 // indirect
 	github.com/hashicorp/errwrap v1.1.0 // indirect
 	github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
@@ -340,7 +341,7 @@ require (
 	github.com/pmezard/go-difflib v1.0.0 // indirect
 	github.com/pquerna/cachecontrol v0.1.0 // indirect
 	github.com/prometheus/client_golang v1.16.0 // indirect
-	github.com/prometheus/client_model v0.3.0 // indirect
+	github.com/prometheus/client_model v0.4.0 // indirect
 	github.com/prometheus/common v0.42.0 // indirect
 	github.com/prometheus/procfs v0.10.1 // indirect
 	github.com/rs/xid v1.5.0 // indirect
@@ -382,17 +383,17 @@ require (
 	go.starlark.net v0.0.0-20200306205701-8dd3e2ee1dd5 // indirect
 	go.uber.org/atomic v1.10.0 // indirect
 	go.uber.org/multierr v1.9.0 // indirect
-	golang.org/x/mod v0.10.0 // indirect
-	golang.org/x/oauth2 v0.8.0 // indirect
-	golang.org/x/term v0.11.0 // indirect
-	golang.org/x/text v0.12.0 // indirect
+	golang.org/x/mod v0.11.0 // indirect
+	golang.org/x/oauth2 v0.10.0 // indirect
+	golang.org/x/term v0.13.0 // indirect
+	golang.org/x/text v0.13.0 // indirect
 	golang.org/x/time v0.3.0 // indirect
-	golang.org/x/tools v0.9.3 // indirect
+	golang.org/x/tools v0.10.0 // indirect
 	golang.zx2c4.com/wireguard v0.0.0-20230325221338-052af4a8072b // indirect
 	golang.zx2c4.com/wireguard/wgctrl v0.0.0-20230429144221-925a1e7659e6 // indirect
-	google.golang.org/api v0.108.0 // indirect
+	google.golang.org/api v0.126.0 // indirect
 	google.golang.org/appengine v1.6.7 // indirect
-	google.golang.org/genproto v0.0.0-20230526161137-0005af68ea54 // indirect
+	google.golang.org/genproto v0.0.0-20230711160842-782d3b101e98 // indirect
 	google.golang.org/protobuf v1.31.0 // indirect
 	gopkg.in/gcfg.v1 v1.2.0 // indirect
 	gopkg.in/inf.v0 v0.9.1 // indirect