This repository has been archived by the owner on Feb 12, 2021. It is now read-only.

etcd/etcd-live-*: replace etcd2 with etcd-member.service #1132

Merged: radhikapc merged 1 commit into coreos:master from etcd2-etcd3 on Aug 21, 2017

radhikapc
Contributor

/etcd/etcd-live*: deprecate etcd2 and etcd
Ref: coreos/bugs#1877

@@ -1,225 +1,236 @@
# Enabling HTTPS in an existing etcd cluster

same contents as etcd/etcd-live-cluster-reconfiguration.md?

@radhikapc
Contributor Author

radhikapc commented Aug 4, 2017

fixed it.

@heyitsanthony heyitsanthony left a comment

Two bits that aren't clear on the CL side:

  1. How to get the listen client URLs from a ct-compiled unit file: the old doc assumes ETCD_LISTEN_CLIENT_URLS, whereas current CL generates --listen-client-urls.
  2. The preferred way to launch v3 etcdctl.


```sh
-$ sudo systemd-tmpfiles --create /usr/lib64/tmpfiles.d/etcd2.conf
+$ sudo systemd-tmpfiles --create /usr/lib64/tmpfiles.d/etcd.conf

/usr/lib64/tmpfiles.d/etcd-wrapper.conf


```sh
-$ sudo systemctl stop etcd2
+$ sudo systemctl stop etcd-member.service
```

If you have etcd proxy nodes, they should update members list automatically according to the [`--proxy-refresh-interval`][proxy-refresh] configuration option.

Then, on one of the *member* nodes, run the following command to backup the current [data directory][data-dir]:

```sh

the backup command only saves v2 data; should this document include instructions for v3 backup (presumably using rkt to launch etcdctl)?

What is the preferred way to launch containerized etcdctl? This works for v3 save/restore, but doesn't seem ideal:

$ sudo mkdir /var/lib/etcd_backup
$ sudo rkt run quay.io/coreos/etcd --net=host \
    --mount volume=backup,target=/var/lib/etcd_backup \
    --volume=backup,kind=host,source=/var/lib/etcd_backup,readOnly=false \
    --environment=ETCDCTL_API=3 \
    --exec=/usr/local/bin/etcdctl -- snapshot save /var/lib/etcd_backup/backup.db
$ sudo rkt run quay.io/coreos/etcd --net=host \
    --mount volume=backup,target=/var/lib/etcd_backup \
    --volume=backup,kind=host,source=/var/lib/etcd_backup/,readOnly=false \
    --exec=/usr/local/bin/etcdctl -- snapshot restore --data-dir=/var/lib/etcd_backup/datadir /var/lib/etcd_backup/backup.db
$ sudo rm -rf /var/lib/etcd
$ sudo mv /var/lib/etcd_backup/datadir /var/lib/etcd
$ sudo chown etcd -R /var/lib/etcd

@@ -17,14 +17,14 @@ $ grep ETCD_LISTEN_CLIENT_URLS /run/systemd/system/etcd-member.service.d/20-clct
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
```

This seems outdated; ETCD_LISTEN_CLIENT_URLS isn't set in the drop-in by default. Should it be

$ grep listen-client-urls /run/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf
  --listen-client-urls="http://0.0.0.0:2379" \

?
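
A minimal sketch of how the lookup could tolerate both forms discussed here, assuming the drop-in carries either the `--listen-client-urls` flag or the `ETCD_LISTEN_CLIENT_URLS` environment variable. The helper name `listen_client_urls` is hypothetical and not part of etcd or ct:

```sh
#!/bin/sh
# Hypothetical helper (not part of etcd or ct): print the listen client URLs
# set in a systemd drop-in, whichever of the two forms the file uses.
listen_client_urls() {
    # Flag form: --listen-client-urls="http://0.0.0.0:2379"
    # Env form:  Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
    sed -n \
        -e 's/.*--listen-client-urls="\{0,1\}\([^" ][^" ]*\).*/\1/p' \
        -e 's/.*ETCD_LISTEN_CLIENT_URLS=\([^" ][^" ]*\).*/\1/p' \
        "$1"
}

# Demo against a throwaway file that mimics one line of a generated drop-in.
sample=$(mktemp)
printf '%s\n' '  --listen-client-urls="http://0.0.0.0:2379" \' > "$sample"
listen_client_urls "$sample"   # prints http://0.0.0.0:2379
rm -f "$sample"
```

On a real host the argument would be the drop-in path quoted above, `/run/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf`.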

@@ -17,14 +17,14 @@ $ grep ETCD_LISTEN_CLIENT_URLS /run/systemd/system/etcd-member.service.d/20-clct
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
```

-In this case etcd is listening only on port 2379. We'll add port 4001 with a systemd [drop-in][drop-ins] unit file. Create the file `/etc/systemd/system/etcd2.service.d/25-insecure_localhost.conf`. In this file, write an excerpt that appends the new URL on port 4001 to the existing value we retrieved in the step above:
+In this case etcd is listening only on port 2379. We'll add port 4001 with a systemd [drop-in][drop-ins] unit file. Create the file `/etc/systemd/system/etcd-member.service.d/25-insecure_localhost.conf`. In this file, write an excerpt that appends the new URL on port 4001 to the existing value we retrieved in the step above:

```
[Service]
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://127.0.0.1:4001"

Here too: what if the unit file has --listen-client-urls? I've seen reports where ETCD_LISTEN_CLIENT_URLS conflicts with --listen-client-urls.
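
One way to spot that situation before restarting the service is to check whether both forms appear across the unit's drop-ins. This is only a sketch: the function name `has_listen_client_urls_conflict` is hypothetical, and it only detects that both settings are present, not which one etcd ends up honoring:

```sh
#!/bin/sh
# Hypothetical check: succeeds (exit 0) when the given drop-in files contain
# both the environment-variable form and the command-line-flag form.
has_listen_client_urls_conflict() {
    grep -q -e 'ETCD_LISTEN_CLIENT_URLS=' -- "$@" &&
        grep -q -e '--listen-client-urls' -- "$@"
}

# Example invocation on a host (paths taken from this thread):
# has_listen_client_urls_conflict /etc/systemd/system/etcd-member.service.d/*.conf \
#     && echo "both ETCD_LISTEN_CLIENT_URLS and --listen-client-urls are set"
```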

@@ -202,7 +202,7 @@ $ etcdctl cluster-health
Check etcd status and availability of the insecure port on the loopback interface:

```sh
-$ systemctl status etcd2
+$ systemctl status etcd-member.service
$ curl http://127.0.0.1:4001/v2/stats/self

more modern:

$ systemctl status etcd-member.service
$ curl http://127.0.0.1:4001/metrics
$ curl http://127.0.0.1:4001/health

@euank
Contributor

euank commented Aug 7, 2017

preferred way to launch etcdctl3 cli

Our answer right now isn't great. torcx will help with this a little if we start shipping it that way.

The three basic options are:

  1. alias etcdctl="sudo -E rkt enter $(cat /var/lib/coreos/etcd-member-wrapper.uuid) /usr/local/bin/etcdctl"
  2. Download and install etcdctl from the etcd release page (into e.g. /opt/bin)
  3. run a container for running it (e.g. sudo rkt run --net=host --insecure-options=image --interactive docker://quay.io/coreos/etcd:v3.1.10 or docker run --net=host -it quay.io/coreos/etcd:v3.1.10)

Options 1 and 3 can be made more ergonomic by writing them to a profile file in /etc/profile.d/etcdctl3.

Option 1 has the problem that it won't be portable when changing from etcd-member to an etcd3 provided by torcx.

Edit: Actually, come to think of it, I'm not even sure why the etcdctl in our path isn't just etcd-v3 yet.
Before it was because the old etcd was still in the image, but now that it's not...

Edit2: and none of the above fix backups being awkward

@euank
Contributor

euank commented Aug 7, 2017

After thinking about the above, I think the right move is to download etcdctl into /opt for now and update the docs once we have a "less bad" solution (likely torcx).
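
A rough sketch of what the "download into /opt" option could look like. The release URL pattern, the tarball layout, and both function names are assumptions rather than anything stated in this thread, so check them against the etcd release page; the version is the v3.1.10 mentioned above:

```sh
#!/bin/sh
# Assumption: release tarballs follow this GitHub URL pattern.
etcd_tarball_url() {
    printf 'https://github.com/coreos/etcd/releases/download/%s/etcd-%s-linux-amd64.tar.gz\n' "$1" "$1"
}

# Hypothetical installer: fetch the tarball and copy only etcdctl into the
# destination directory ($2, defaulting to /opt/bin).
install_etcdctl() {
    ver="$1"
    dest="${2:-/opt/bin}"
    tmp=$(mktemp -d) || return 1
    if curl -fsSL "$(etcd_tarball_url "$ver")" | tar -xz -C "$tmp"; then
        sudo mkdir -p "$dest" &&
            sudo install -m 0755 "$tmp/etcd-$ver-linux-amd64/etcdctl" "$dest/etcdctl"
        status=$?
    else
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

# Example (needs network access and sudo):
# install_etcdctl v3.1.10 && ETCDCTL_API=3 /opt/bin/etcdctl version
```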

@radhikapc radhikapc requested a review from cgonyeo August 7, 2017 23:18
@cgonyeo
Contributor

cgonyeo commented Aug 7, 2017

how to get listen client urls from a ct compiled unit file-- old doc assumes ETCD_LISTEN_CLIENT_URLS vs current cl generates --listen-client-urls

@heyitsanthony do you mean how the etcd-wrapper script gets the listen client URLs from a ct-generated unit? It'll pass the --listen-client-urls argument on the command line. All etcd arguments specified in a ct config are passed to etcd via command line arguments.

@heyitsanthony

@dgonyeo OK, so the expected behavior is the generated unit file will have --listen-client-urls and that should be the grep target instead of ETCD_LISTEN_CLIENT_URLS like the docs suggest now?

@cgonyeo
Contributor

cgonyeo commented Aug 7, 2017

@heyitsanthony yes

@cgonyeo
Contributor

cgonyeo commented Aug 8, 2017

@radhikapc, @heyitsanthony, and I had a quick video call about the etcd-live-cluster-reconfiguration.md file and agreed on the following:

  • change etcd cluster size section
    • this section should be rewritten to not describe modifying an existing, misconfigured node, and instead describe how to create a new node that joins an existing etcd cluster (thereby growing the cluster)
  • replace a failed node section
    • this needs to be rewritten to not use environment variables to override etcd options, as ct doesn't provide arguments to etcd via environment variables. The document should probably instruct users to modify ct's generated systemd drop-in (at /etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf)
  • disaster recovery section
    • this section is for etcd v2, and needs to have a section written for etcd v3

@radhikapc radhikapc force-pushed the etcd2-etcd3 branch 6 times, most recently from 5836cdb to 41a60e4 Compare August 14, 2017 20:51
ETCD_NAME="node2"
ETCD_INITIAL_CLUSTER="52d2c433e31d54526cf3aa660304e8f1=http://0.0.0.1:2380,node2=http://0.0.0.2:2380,2cb7bb694606e5face87ee7a97041758=http://0.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
```

GitHub markdown isn't catching the end of this block, probably because of the leading spaces.

listen_client_urls: http://0.0.0.0:2379,http://0.0.0.0:4001
listen_peer_urls: http://0.0.0.0:2380
discovery: https://discovery.etcd.io/<token>
initial_cluster: demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380

0.0.0.{1,2,3,...} fall in the reserved 0.0.0.0/8 range and aren't usable host addresses; it would be clearer to use 10.240.0.{1,2,3,4}.

initial_cluster: demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380
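
The same renumbering has to be repeated in several examples, so a mechanical substitution may help avoid missing one. The sketch below assumes peer URLs of the form `scheme://0.0.0.N:port`; the function name `renumber_cluster` is hypothetical. It leaves the scheme alone and does not touch the `0.0.0.0` listen addresses:

```sh
#!/bin/sh
# Hypothetical helper: rewrite 0.0.0.N (N >= 1) host addresses in an
# initial-cluster string to 10.240.0.N. 0.0.0.0 is intentionally skipped.
renumber_cluster() {
    printf '%s\n' "$1" | sed 's#//0\.0\.0\.\([1-9][0-9]*\):#//10.240.0.\1:#g'
}

renumber_cluster 'demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380'
# prints demo-etcd-1=https://10.240.0.1:2380,demo-etcd-2=https://10.240.0.2:2380,demo-etcd-3=https://10.240.0.3:2380
```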

advertise_client_urls: http://<PEER_ADDRESS>:2379
listen_peer_urls: http://0.0.0.0:2380
initial_advertise_peer_urls: http://<PEER_ADDRESS>:2380

s/PEER_ADDRESS/10.240.0.1 since it should match demo-etcd-1 in the initial-cluster command

--name="demo-etcd-1" \
--listen-peer-urls="http://0.0.0.0:2380" \
--listen-client-urls="https://10.240.0.1:2379,http://0.0.0.0:4001" \
--initial-advertise-peer-urls="http://<PEER_ADDRESS>:2380" \

s/PEER_ADDRESS/10.240.0.1

--listen-peer-urls="http://0.0.0.0:2380" \
--listen-client-urls="https://10.240.0.1:2379,http://0.0.0.0:4001" \
--initial-advertise-peer-urls="http://<PEER_ADDRESS>:2380" \
--initial-cluster="demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380" \

--initial-cluster="demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380" \


`$ sudo rm -rf /var/lib/etcd`

6. Move the backup file into `/var/lib/etcd`:

Move the restored member directory to /var/lib/etcd:


`$ sudo ETCDCTL_API=3 /opt/bin/etcdctl snapshot save /var/lib/etcd_backup/backup.db`

4. Restore the snapshot file:

Restore the snapshot file into a new member directory /var/lib/etcd_backup/etcd:


`$ etcdctl cluster-health`

10. Add the new member on each nodes:

The member add advice isn't very clear (e.g., how to spin up new hosts); instead, it can skip that and point back to the change cluster size guide:

  1. The restored cluster is now running with a single node. See the etcd cluster resizing guide above to add more nodes.


```sh
$ etcdctl member list
e6c2bda2aa1f2dcf: name=1be6686cc2c842db035fdc21f56d1ad0 peerURLs=http://10.0.1.2:2380 clientURLs=http://10.0.1.2:2379

s/10.0.1.2/10.240.0.2 to be consistent


10. Spin up new nodes. Follow the instruction given in section [add-new-node]. Ensure that the version is given in the config file: For example:

```yaml container-linux-config

this doesn't need to be repeated if it's already in the guide above

@radhikapc radhikapc force-pushed the etcd2-etcd3 branch 4 times, most recently from 43c2c40 to d225ba3 Compare August 14, 2017 23:52
discovery: https://discovery.etcd.io/<token>
name: demo-etcd-1
listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001
advertise_client_urls: http://0.240.0.4:2379


Run `sudo systemctl daemon-reload` to parse the new and edited units. Check whether the new [drop-in][drop-in] is valid by checking the service's journal: `sudo journalctl _PID=1 -e -u etcd2`. If everything is ok, run `sudo systemctl restart etcd2` to activate your changes. You will see that the former proxy node has become a cluster member:
ETCD_NAME="node4"
ETCD_INITIAL_CLUSTER="demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380,node4=http://10.0.1.4:2380"

s/10.0.1.4/10.240.1.4

$ sudo rm -rf /var/lib/etcd2/*
```
```sh
$ etcdctl member add node2 http://0.0.0.2:2380

$ etcdctl member add demo-etcd-2 http://10.240.0.2:2380


Check that the `/var/lib/etcd2/` directory exists and is empty. If you removed this directory accidentally, you can recreate it with the proper modes by using:
ETCD_NAME="node2"

ETCD_NAME="demo-etcd-2"


Check that the `/var/lib/etcd2/` directory exists and is empty. If you removed this directory accidentally, you can recreate it with the proper modes by using:
ETCD_NAME="node2"
ETCD_INITIAL_CLUSTER="demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,node3=http://10.240.0.3:2380"

advertise_client_urls: http://10.240.0.1:2379
listen_peer_urls: http://0.0.0.0:2380
initial_advertise_peer_urls: http://10.240.0.1:2380
initial_cluster: ddemo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380,node4=http://10.240.0.4:2380

# ETCD_INITIAL_CLUSTER from etcdctl member add
initial_cluster: demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380

initial_advertise_peer_urls: http://10.240.0.1:2380
initial_cluster: ddemo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380,demo-etcd-3=http://10.240.0.3:2380,node4=http://10.240.0.4:2380
initial_cluster_token: demo-etcd-token
initial_cluster_state: new

initial_cluster_state: existing
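
Several of the comments above come down to the same consistency rules: the member name must appear in the initial cluster string, and a node joining a running cluster needs the state `existing`. A small sanity check along those lines could look like the sketch below; `check_join_config` is a hypothetical name, and the check is only an illustration, not something etcd or ct provides:

```sh
#!/bin/sh
# Hypothetical pre-flight check for a node that joins an existing cluster.
# $1: member name, $2: initial cluster string, $3: initial cluster state
check_join_config() {
    name="$1"
    cluster="$2"
    state="$3"
    # The name must match a "name=" entry exactly, so a typo such as
    # "ddemo-etcd-1" does not satisfy "demo-etcd-1".
    case ",$cluster," in
        *",$name="*) ;;
        *) echo "name '$name' not found in initial cluster" >&2; return 1 ;;
    esac
    if [ "$state" != "existing" ]; then
        echo "initial cluster state is '$state', expected 'existing'" >&2
        return 1
    fi
}

# Example:
# check_join_config demo-etcd-2 \
#     'demo-etcd-1=http://10.240.0.1:2380,demo-etcd-2=http://10.240.0.2:2380' existing
```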

```sh
$ etcdctl cluster-health
```
`$ sudo ETCDCTL_API=3 /opt/bin/etcdctl snapshot --data-dir /var/lib/etcd_backup/etcd restore backup.db \

use ```? single-` renders as one line


`$ etcdctl cluster-health`

10. The restored cluster is now running with a single node. For information on adding more nodes, see [Change etcd cluster size][change-etcd-cluster-size].

change-cluster-size? broken link


If the output contains no errors, remove the `/run/systemd/system/etcd2.service.d/98-force-new-cluster.conf` drop-in file, and reload systemd services: `sudo systemctl daemon-reload`. It is not necessary to restart the `etcd2` service after this step.
10. Spin up new nodes. Ensure that the version is given in the config file.
For information on adding more nodes, see [Change etcd cluster size][change-etcd-cluster-size].

change-cluster-size?

@heyitsanthony heyitsanthony left a comment

lgtm after ip fix. Thanks!

discovery: https://discovery.etcd.io/<token>
name: demo-etcd-1
listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001
advertise_client_urls: http://10.240.0.4:2379

s/10.240.0.4/10.240.0.1 to match the corresponding --advertise-client-urls=http://10.240.0.1:2379 below

@radhikapc radhikapc merged commit 9f62a2f into coreos:master Aug 21, 2017
@radhikapc radhikapc deleted the etcd2-etcd3 branch August 21, 2017 18:13
Contributor

@euank euank left a comment

Apologies I didn't get to this until a bit late

```

will generate the following [drop-in][drop-in]:
The config file is first validated and transformed into a machine-readable form, which is then sent directly to a Container Linux provisioning target. The [drop-in][drop-in] generated from the example config file is given below:

"sent directly to a Container Linux provisioning target" is wishful thinking.

Maybe something like

"The above Container Linux config file can be used to provision a machine. Provisioning with it will result in the following drop-in being generated:"

listen_peer_urls: http://0.0.0.0:2380
discovery: https://discovery.etcd.io/<token>
name: demo-etcd-1
listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001

Why is one https, the other http? Shouldn't these both be http?

4 participants