This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

etcd3 #286

Merged 68 commits on May 23, 2018
Commits
bca28fa
port etcd3 branch to latest master
Apr 10, 2018
42bc200
dep ensure
Apr 10, 2018
02f65e1
dep update etcd, grpc, and use go 1.10.1 for builds
Apr 10, 2018
5925c83
use systemctl instead of dbus which appears to be unreliable
Apr 13, 2018
ecbd282
use systemctl instead of dbus which appears to be unreliable
Apr 13, 2018
f8b1e69
ignore error when trying to restart service
Apr 13, 2018
372b7d3
ignore error when trying to restart service
Apr 13, 2018
c6d664c
have etcd-upgrade service closer to full etcd configuration
Apr 13, 2018
0052cd4
re-organize upgrade steps to make it easier to call from gravity
Apr 13, 2018
95721e4
have wait-for-etcd.sh optionally take a parameter for the etcd url
Apr 16, 2018
1aa0668
run etcd based on symlink and not env variable
Apr 16, 2018
b900c10
run execstartpre as root
Apr 16, 2018
d7ad830
Set systemctl to not block on operations
Apr 16, 2018
0b805d1
use TLS when connecting to the upgrade service
Apr 16, 2018
473d527
have apiserver select backend based on variable
Apr 17, 2018
515ce4b
use systemd-dropin for setting up etcd gateway during upgrade
Apr 18, 2018
dd30793
actually add etcd gateway
Apr 18, 2018
b8a1134
etcd gateway can only bind to a single port
Apr 18, 2018
2f5fa44
consistently use port 2379 for etcd
Apr 18, 2018
62c9592
set execute bit on etcdctl3 script
Apr 19, 2018
a722b5c
unify master/proxy upgrade process, as proxy doesn't need upgrade
Apr 24, 2018
83e86a0
Don't start etcd-upgrade service on proxy nodes
Apr 25, 2018
5115a8b
Use latest etcd on first run
Apr 25, 2018
4c553a8
create etcd directory if it doesn't exist
Apr 25, 2018
738582b
create etcd directory if it doesn't exist
Apr 25, 2018
ee01273
Use planet to initialize etcd version
Apr 26, 2018
a84ea72
only write dropin symlink if it doesn't already exist
Apr 26, 2018
6bfe435
fix symlink for systemd drop-in on repeat starts
Apr 26, 2018
8fa291f
store version state on persistent etcd volume
Apr 27, 2018
5262773
update etcd environment variables
Apr 27, 2018
62cbd53
refactor etcd build to be easier to maintain
Apr 27, 2018
aae1f8d
fix: assumed etcd version to match etcd versioning
Apr 27, 2018
bd48451
revert etcd variable renaming, as they are needed by etcd2 cli
Apr 27, 2018
c24e484
test embedding version information into orbit.manifest.json
Apr 30, 2018
9016853
test embedding version information into orbit.manifest.json
May 1, 2018
023505b
revendor to include etcd backup/restore for v3 data
May 2, 2018
adb3a7a
test etcd3 -> etcd3 upgrade
May 2, 2018
1d6c3a7
revendor to include etcd backup/restore for v3 data
May 3, 2018
c9e3231
wait when shutting down etcd
May 3, 2018
1dee8f8
wait when shutting down etcd
May 3, 2018
eb4972b
add etcd rollback support
May 4, 2018
b730aa3
fix promotion logic to work with new upgrade
May 4, 2018
a234c6c
fix promotion logic to work with new upgrade
May 4, 2018
3fb3817
fix promotion logic to work with new upgrade
May 4, 2018
00e4045
trigger systemd daemon-reload during etcd promotion
May 4, 2018
092d61f
restore --no-block behaviour on systemctl commands
May 4, 2018
7cdb6b1
Merge remote-tracking branch 'origin/master' into kevin/etcd3-2
May 6, 2018
713d441
write etcd latest version to orbit manifest
May 7, 2018
8c80eef
cleanup
May 7, 2018
925dc17
debug planet build error
May 7, 2018
bec3770
revert Makefile debug
May 7, 2018
7d97789
start addressing review comments
May 9, 2018
3135aa9
adjust etcd upgrade process to be resilient against repeated runs
May 9, 2018
0799cc3
fix assumed etcd version during init; and path selection during upgrade
May 9, 2018
ca0fd82
clear etcd latest symlink, so chown -r doesn't follow bad path outsid…
May 10, 2018
a6c5f4d
ignore service in failed status
May 10, 2018
c9e8832
ensure upgrade service also use new data directory path
May 10, 2018
b7f5c4d
set etcd directory ownership during init
May 10, 2018
e6834af
use env variable to locate etcd data directory
May 11, 2018
3478ef6
also add environment to etcd-upgrade service
May 11, 2018
8f0168a
use octal permissions
May 11, 2018
f607dc9
fix unit tests
May 11, 2018
1159a90
address review comments
May 11, 2018
a3c669e
don't write etcd version env variable when rolling back to 2.3.8
May 11, 2018
8196fb6
testing, set latest version to 3.3.3
May 11, 2018
182dff9
use notfound error when failing to get etcd version
May 11, 2018
5a36748
revert testing etcd versions
May 23, 2018
2634907
address review comments
May 23, 2018
51 changes: 42 additions & 9 deletions Gopkg.lock

Some generated files are not rendered by default.

12 changes: 11 additions & 1 deletion Gopkg.toml
@@ -26,6 +26,10 @@ ignored = ["github.com/Sirupsen/logrus"]
go-tests = true
unused-packages = true

[[override]]
name = "github.com/coreos/go-systemd"
version = "v16"

[[override]]
name = "bitbucket.org/ww/goautoneg"
# Bitbucket doesn't support git protocol, which causes dep to timeout trying to connect
@@ -36,13 +40,19 @@ ignored = ["github.com/Sirupsen/logrus"]
name = "github.com/hashicorp/serf"
revision = "11bb88abf7b17f0b794b51416a9107d781e95f35"

[[override]]
# target codec version from etcd, since dep doesn't seem to grab it correctly
# https://github.com/coreos/etcd/issues/8715#issuecomment-377013707
name = "github.com/ugorji/go"
version = "v1.1.1"

[[constraint]]
name = "github.com/cenkalti/backoff"
version = "2.0.0"

[[constraint]]
name = "github.com/coreos/etcd"
version = "3.2.4"
version = "3.3.3"

[[constraint]]
name = "github.com/docker/docker"
7 changes: 6 additions & 1 deletion Makefile
@@ -48,9 +48,14 @@ SECCOMP_VER := 2.3.1-2.1
DOCKER_VER := 17.03.2
# we currently use our own flannel fork: gravitational/flannel
FLANNEL_VER := master
ETCD_VER := v2.3.8
HELM_VER := v2.8.1

# ETCD Versions to include in the release
# This list needs to include every version of etcd that we can upgrade from + latest
ETCD_VER := v2.3.8 v3.3.4
# This is the version of etcd we should upgrade to (from the version list)
ETCD_LATEST_VER := v3.3.4
Contributor

I'd swap the variable definitions to make it less error-prone:

# This is the version of etcd we should upgrade to (from the version list)
ETCD_LATEST_VER := v3.3.4
# ETCD Versions to include in the release
# This list needs to include every version of etcd that we can upgrade from + latest
ETCD_VER := v2.3.8 $(ETCD_LATEST_VER)

Contributor Author

I'm not sure I would embed $(ETCD_LATEST_VER) in ETCD_VER. To me it adds some risk: if someone bumps etcd to, say, v3.3.5, they may forget to update the ETCD_VER list. Everything would appear to work, but upgrades from that version may not.

As it stands, if someone bumps ETCD_LATEST_VER to v3.3.5 but forgets to update ETCD_VER, the new release will clearly not work.
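
The trade-off in this thread could also be made explicit with a build-time guard. The following is a minimal sketch and not part of this PR: `version_in_list` is a hypothetical helper, and the two variables simply mirror the Makefile values shown above. It checks that the version named as latest is one of the versions bundled in the release.

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): succeeds only if the first argument
# appears in the space-separated version list given as the second argument.
version_in_list() {
    local latest="$1" list="$2" v
    # $list is intentionally unquoted so it splits on whitespace.
    for v in $list; do
        [ "$v" = "$latest" ] && return 0
    done
    return 1
}

# Values copied from the Makefile hunk above.
ETCD_VER="v2.3.8 v3.3.4"
ETCD_LATEST_VER="v3.3.4"

if version_in_list "$ETCD_LATEST_VER" "$ETCD_VER"; then
    echo "ok: $ETCD_LATEST_VER is in the release list"
else
    echo "error: $ETCD_LATEST_VER is missing from ETCD_VER" >&2
fi
```

A check of this kind fails loudly when the two variables drift apart, which is the property the author wants to keep. It does not catch the other case raised in the thread, where an older version is dropped from the list by accident.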


PUBLIC_IP := 127.0.0.1
export
PLANET_PACKAGE_PATH=$(PWD)
2 changes: 1 addition & 1 deletion build.assets/docker/buildbox.dockerfile
@@ -3,7 +3,7 @@ FROM planet/base
ENV GOPATH /gopath
ENV GOROOT /opt/go
ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin
ENV GOVERSION 1.8.3
ENV GOVERSION 1.10.1
Contributor

a-palchikov, May 7, 2018

Do you need go1.10 for anything specific, or is it just a timely update?

Also, do you know whether go1.10 is an umbrella binary for 1.10.x or just 1.10.0? They could release from release-branch.go1.10 to capture the latest patch (i.e. 1.10.2 at the moment), but I'm not sure they actually do that.

Contributor Author

Etcd requires go 1.9, but I bumped to 1.10 since I was updating the version anyways: etcd-io/etcd#8548 (comment)

As for an umbrella binary, I don't know. I stuck with the full version because I suspect that makes it less likely for an upstream change to affect us.


# Have our own /etc/passwd with users populated from 990 to 1000
COPY passwd /etc/passwd
4 changes: 2 additions & 2 deletions build.assets/makefiles/base/network/flanneld.service
@@ -18,11 +18,11 @@ ExecStartPre=/usr/bin/etcdctl \
--cert-file=/var/state/etcd.cert \
--key-file=/var/state/etcd.key \
--ca-file=/var/state/root.cert \
--peers https://127.0.0.1:4001 set /coreos.com/network/config '{"Network":"${KUBE_POD_SUBNET}", "Backend": {"Type": "${FLANNEL_BACKEND}", "RouteTableFilter": ["tag:KubernetesCluster=${KUBE_CLUSTER_ID}"]}}'
--peers https://127.0.0.1:2379 set /coreos.com/network/config '{"Network":"${KUBE_POD_SUBNET}", "Backend": {"Type": "${FLANNEL_BACKEND}", "RouteTableFilter": ["tag:KubernetesCluster=${KUBE_CLUSTER_ID}"]}}'
ExecStartPre=/bin/systemctl is-active etcd.service
ExecStart=/usr/bin/flanneld \
--ip-masq=true \
--etcd-endpoints https://127.0.0.1:4001,https://127.0.0.1:2379 \
--etcd-endpoints https://127.0.0.1:2379 \
--etcd-cafile=/var/state/root.cert \
--etcd-certfile=/var/state/etcd.cert \
--etcd-keyfile=/var/state/etcd.key \
4 changes: 3 additions & 1 deletion build.assets/makefiles/base/network/wait-for-etcd.sh
@@ -1,5 +1,7 @@
#!/bin/bash

PEERS=${1:-https://127.0.0.1:2379}

n=0
until [ $n -ge 10 ]
do
@@ -9,7 +11,7 @@ do
--ca-file=/var/state/root.cert \
--timeout="5s" \
--total-timeout="30s" \
--peers https://127.0.0.1:4001 cluster-health && exit 0
--peers ${PEERS} cluster-health && exit 0
n=$[$n+1]
sleep 3
done
4 changes: 3 additions & 1 deletion build.assets/makefiles/buildbox.mk
@@ -30,14 +30,16 @@ build:
make -e \
KUBE_VER=$(KUBE_VER) \
FLANNEL_VER=$(FLANNEL_VER) \
ETCD_VER=$(ETCD_VER) \
ETCD_VER="$(ETCD_VER)" \
ETCD_LATEST_VER=$(ETCD_LATEST_VER) \
-C /assets/makefiles -f $(TARGET)-docker.mk
ifeq ($(TARGET),master)
$(MAKE) -C $(ASSETS)/makefiles/master/k8s-master -e -f containers.mk
endif

planet-image:
cp $(ASSETS)/orbit.manifest.json $(TARGETDIR)
sed -i "s/REPLACE_ETCD_LATEST_VERSION/$(ETCD_LATEST_VER)/g" $(TARGETDIR)/orbit.manifest.json
cp $(ASSETDIR)/planet $(ROOTFS)/usr/bin/
cp $(ASSETDIR)/docker-import $(ROOTFS)/usr/bin/
@echo -e "\n---> Moving current symlink to $(TARGETDIR)\n"
9 changes: 9 additions & 0 deletions build.assets/makefiles/etcd/etcd-gateway.dropin
@@ -0,0 +1,9 @@
# This systemd drop in file, will change the etcd unit to run a gateway
# instead of the etcd service

[Service]
ExecStart=
Contributor

Extra ExecStart?

Contributor

This resets the previous value. For a list-type setting such as ExecStart, the inherited entries have to be cleared with an empty assignment before a new one is set (the linked documentation gives a faint hint at this; look at the end of the document).
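
As a sketch of the reset behaviour described in the reply above (an illustration under assumptions, not planet's code): the helper `write_dropin`, the temporary directory, and the file name `override.conf` are all hypothetical. It writes a drop-in whose empty `ExecStart=` clears the command inherited from the main unit before the replacement is set.

```shell
#!/bin/bash
# Sketch: generate a systemd drop-in that replaces a unit's ExecStart.
# write_dropin, the directory, and override.conf are placeholders for
# illustration; they are not the paths used by this PR.
write_dropin() {
    local dir="$1" cmd="$2"
    mkdir -p "$dir"
    {
        echo "[Service]"
        # Empty assignment: clears the ExecStart inherited from the main unit.
        echo "ExecStart="
        # The replacement command.
        echo "ExecStart=$cmd"
    } > "$dir/override.conf"
}

dropin_dir="$(mktemp -d)/etcd.service.d"
write_dropin "$dropin_dir" "/usr/bin/etcd gateway start"
cat "$dropin_dir/override.conf"
```

Without the empty assignment, the merged unit would carry two `ExecStart=` commands, which systemd rejects for any service type other than `oneshot`.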

ExecStart=/usr/bin/etcd gateway start \
--endpoints=${PLANET_ETCD_GW_ENDPOINTS} \
--listen-addr=0.0.0.0:2379 \
--trusted-ca-file=/var/state/root.cert
37 changes: 37 additions & 0 deletions build.assets/makefiles/etcd/etcd-upgrade.service
@@ -0,0 +1,37 @@
[Unit]
Description=Temporary Etcd Service used for upgrades
Conflicts=etcd.service

# This works by launching etcd, but bound to a non-default loopback interface.
# This is to prevent etcd from being used, while it is being upgraded, and the
# database is inconsistent

[Service]
Restart=always
RestartSec=5
StartLimitInterval=3600
StartLimitBurst=720
Type=notify
TimeoutStartSec=0
EnvironmentFile=/etc/container-environment
EnvironmentFile=-/ext/etcd/etcd-version.txt
ExecStartPre=/usr/bin/planet etcd init
ExecStart=/usr/bin/etcd \
--name=${PLANET_ETCD_MEMBER_NAME} \
--data-dir=/ext/etcd/${PLANET_ETCD_VERSION} \
--initial-advertise-peer-urls=https://${PLANET_PUBLIC_IP}:2380 \
--advertise-client-urls=https://127.0.0.2:2379,https://127.0.0.2:4001 \
--listen-client-urls=https://127.0.0.2:2379,https://127.0.0.2:4001 \
--listen-peer-urls=https://${PLANET_PUBLIC_IP}:2380,https://${PLANET_PUBLIC_IP}:7001 \
--cert-file=/var/state/etcd.cert \
--key-file=/var/state/etcd.key \
--trusted-ca-file=/var/state/root.cert \
--client-cert-auth \
--peer-cert-file=/var/state/etcd.cert \
--peer-key-file=/var/state/etcd.key \
--peer-trusted-ca-file=/var/state/root.cert \
--peer-client-cert-auth $ETCD_OPTS \
--initial-cluster-state new
User=planet
Group=planet
PermissionsStartOnly=true
40 changes: 28 additions & 12 deletions build.assets/makefiles/etcd/etcd.mk
@@ -1,19 +1,35 @@
.PHONY: all

ARCH := amd64
TARGET := etcd-$(ETCD_VER)-linux-$(ARCH)
TARGET_TARBALL := $(TARGET).tar.gz

DOWNLOAD:=$(ASSETDIR)/$(TARGET_TARBALL)

all: $(DOWNLOAD)
@echo "\n---> Building etcd:\n"
cd $(ASSETDIR) && tar -xzf $(ASSETDIR)/$(TARGET_TARBALL)
mkdir -p $(ROOTFS)/var/etcd
cp -afv $(ASSETDIR)/$(TARGET)/etcd $(ROOTFS)/usr/bin
cp -afv $(ASSETDIR)/$(TARGET)/etcdctl $(ROOTFS)/usr/bin
all: $(ETCD_VER)
@echo -e "\n---> Building etcd:\n"

@echo -e "\n---> Setup etcd services:\n"
cd $(ASSETDIR)
cp -afv ./etcd.service $(ROOTFS)/lib/systemd/system/
ln -sf /lib/systemd/system/etcd.service $(ROOTFS)/lib/systemd/system/multi-user.target.wants/
cp -afv ./etcd-upgrade.service $(ROOTFS)/lib/systemd/system/
cp -afv ./etcd-gateway.dropin $(ROOTFS)/lib/systemd/system/
cp -afv ./etcdctl3 $(ROOTFS)/usr/bin/etcdctl3
chmod +x $(ROOTFS)/usr/bin/etcdctl3
ln -sf /lib/systemd/system/etcd.service $(ROOTFS)/lib/systemd/system/multi-user.target.wants/

# mask the etcd-upgrade service so that it can only be run if intentionally unmasked
ln -sf /dev/null $(ROOTFS)/etc/systemd/system/etcd-upgrade.service

# Write to the release file to indicate the latest release
echo PLANET_ETCD_VERSION=$(ETCD_LATEST_VER) >> $(ROOTFS)/etc/planet-release
Contributor

Is appending to planet-release intentional?

Contributor Author

Yes. Right now it's only for etcd, but if we were to embed version information for other dependencies, we could include it here as well.
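
To illustrate the append-based approach discussed here, the following is a minimal sketch under assumptions: the temporary file stands in for `$(ROOTFS)/etc/planet-release`, and `EXAMPLE_DEP_VERSION` and `read_release_key` are made up for the example. It appends two `KEY=VALUE` entries and reads one back.

```shell
#!/bin/bash
# Sketch: append KEY=VALUE pairs to a release file and read one back.
# The temporary file stands in for $(ROOTFS)/etc/planet-release.
release_file="$(mktemp)"

# ">>" appends, so entries already in the file are preserved.
echo "PLANET_ETCD_VERSION=v3.3.4" >> "$release_file"
# EXAMPLE_DEP_VERSION is a hypothetical entry for a second dependency.
echo "EXAMPLE_DEP_VERSION=v1.0.0" >> "$release_file"

# Hypothetical reader: print the value of key $1 from file $2.
# If a key was appended more than once, the last occurrence is used.
read_release_key() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

read_release_key PLANET_ETCD_VERSION "$release_file"
```

Because `>>` never truncates, running the same step twice against one rootfs would leave duplicate keys in the file; taking the last occurrence on read is one way to tolerate that.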


.PHONY: $(ETCD_VER)
$(ETCD_VER):
@echo -e "\n---> $@ - Downloading etcd\n"
curl -L https://github.com/coreos/etcd/releases/download/$@/etcd-$@-linux-$(ARCH).tar.gz \
-o $(ASSETDIR)/$@.tar.gz;

@echo -e "\n---> $@ - Extracting etcd\n"
cd $(ASSETDIR)
tar -xzf $(ASSETDIR)/$@.tar.gz

$(DOWNLOAD):
curl -L https://github.com/coreos/etcd/releases/download/$(ETCD_VER)/$(TARGET_TARBALL) -o $(DOWNLOAD)
cp -afv etcd-$@-linux-$(ARCH)/etcd $(ROOTFS)/usr/bin/etcd-$@
cp -afv etcd-$@-linux-$(ARCH)/etcdctl $(ROOTFS)/usr/bin/etcdctl-$@
8 changes: 6 additions & 2 deletions build.assets/makefiles/etcd/etcd.service
@@ -1,5 +1,6 @@
[Unit]
Description=Etcd Service
Conflicts=etcd-upgrade.service

[Service]
Restart=always
@@ -9,9 +10,11 @@ StartLimitBurst=720
Type=notify
TimeoutStartSec=0
EnvironmentFile=/etc/container-environment
EnvironmentFile=-/ext/etcd/etcd-version.txt
ExecStartPre=/usr/bin/planet etcd init
ExecStart=/usr/bin/etcd \
--name=${ETCD_MEMBER_NAME} \
--data-dir=/ext/etcd \
--name=${PLANET_ETCD_MEMBER_NAME} \
--data-dir=/ext/etcd/${PLANET_ETCD_VERSION} \
--initial-advertise-peer-urls=https://${PLANET_PUBLIC_IP}:2380 \
--advertise-client-urls=https://${PLANET_PUBLIC_IP}:2379,https://${PLANET_PUBLIC_IP}:4001 \
--listen-client-urls=https://0.0.0.0:2379,https://0.0.0.0:4001 \
@@ -26,3 +29,4 @@ ExecStart=/usr/bin/etcd \
--peer-client-cert-auth $ETCD_OPTS
User=planet
Group=planet
PermissionsStartOnly=true
7 changes: 7 additions & 0 deletions build.assets/makefiles/etcd/etcdctl3
@@ -0,0 +1,7 @@
#!/bin/bash
#
# This is a helper script, to make it easier to access the etcd3 datastore
#

ETCDCTL_API=3 ETCDCTL_CERT_FILE="" ETCDCTL_KEY_FILE="" ETCDCTL_CA_FILE="" ETCDCTL_PEERS="" \
/usr/bin/etcdctl --key /var/state/etcd.key --cert /var/state/etcd.cert --cacert /var/state/root.cert "$@"
@@ -4,6 +4,11 @@ Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Wants=etcd.service

[Service]
# Default to etcd storage backend
# TODO - remove when we no longer support etcd2 upgrade paths
Environment=KUBE_STORAGE_BACKEND=etcd2
# Override KUBE_STORAGE_BACKEND if we're using a version 3 compatible etcd
EnvironmentFile=-/ext/etcd/etcd-version.txt
EnvironmentFile=/etc/container-environment
ExecStartPre=/usr/bin/scripts/wait-for-etcd.sh
ExecStartPre=/bin/systemctl is-active etcd.service
@@ -30,7 +35,7 @@ ExecStart=/usr/bin/kube-apiserver \
--etcd-cafile=/var/state/root.cert \
--etcd-certfile=/var/state/etcd.cert \
--etcd-keyfile=/var/state/etcd.key \
--storage-backend=etcd2 \
--storage-backend=${KUBE_STORAGE_BACKEND} \
--event-ttl=24h0m0s \
--bind-address=${PLANET_PUBLIC_IP} \
--logtostderr=true $KUBE_CLOUD_FLAGS \
3 changes: 2 additions & 1 deletion build.assets/orbit.manifest.json
@@ -1,7 +1,8 @@
{
"version": "0.0.1",
"labels": [
{"name": "os", "value": "linux"}
{"name": "os", "value": "linux"},
{"name": "version-etcd", "value": "REPLACE_ETCD_LATEST_VERSION"}
],
"commands": [
{"name": "start", "args": ["rootfs/usr/bin/planet", "start"]},
2 changes: 2 additions & 0 deletions tool/planet/cfg.go
@@ -70,6 +70,8 @@ type Config struct {
// EtcdInitialCluster configures the value of ETCD_INITIAL_CLUSTER environment variable
// inside the container
EtcdInitialCluster string
// EtcdGatewayList is a list of etcd endpoints that the etcd gateway can use to reach the cluster
EtcdGatewayList string
// EtcdInitialClusterState configures the value of ETCD_INITIAL_CLUSTER_STATE environment variable
// inside the container
EtcdInitialClusterState string