
How should I configure DNS for etcd in Tectonic Installer for DigitalOcean? #632

Closed
aknuds1 opened this issue May 10, 2017 · 16 comments

@aknuds1
Contributor

aknuds1 commented May 10, 2017

I asked this originally on the mailing list, but was asked to raise an issue instead in order to reach the right people:

In implementing the Tectonic Installer for DigitalOcean, I've come across the problem that only the first etcd node comes up successfully, since the other nodes are not given a DNS address. I've copied this aspect from the AWS etcd module's DNS configuration.

Even though this is how it's already done in the AWS setup, I just cannot see how it's supposed to work, since each etcd node is configured as if it had a corresponding DNS address. Can someone please enlighten me as to how etcd nodes should be configured with regard to DNS? I would appreciate any help!

@aknuds1
Contributor Author

aknuds1 commented May 10, 2017

Update: I've tried assigning a DNS address to my second node as well, but it fails to join the cluster. I see messages like the following in its log:

May 10 19:29:32 socialfoodie-etcd-1 etcd-wrapper[966]: 2017-05-10 19:29:32.469561 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[a70dce350307d365]=5ed5cd3650f7baa8, local=4191bc921b1dbd02)

What am I doing wrong?

@alekssaul
Contributor

Do the DNS records get created correctly? You should have A records for all your etcd nodes and SRV records for etcd-server and etcd-client.
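
For illustration, here is a rough sketch of what such records might look like if created in Route 53 via Terraform; the zone ID, hostnames, and IP are hypothetical, and this is not the installer's actual module:

# Hypothetical sketch only — one A record per etcd node, plus the two SRV records.
resource "aws_route53_record" "etcd_0_a" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "etcd-0.example.com"
  type    = "A"
  ttl     = 300
  records = ["203.0.113.10"]
}

# SRV values use the "priority weight port target" format;
# 2380 is the etcd peer port, 2379 the client port.
resource "aws_route53_record" "etcd_server_srv" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "_etcd-server._tcp.example.com"
  type    = "SRV"
  ttl     = 300
  records = ["0 0 2380 etcd-0.example.com"]
}

resource "aws_route53_record" "etcd_client_srv" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "_etcd-client._tcp.example.com"
  type    = "SRV"
  ttl     = 300
  records = ["0 0 2379 etcd-0.example.com"]
}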

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I think they should be working. I have A records socialfoodie-etcd-0.socialfoodie.club and socialfoodie-etcd-1.socialfoodie.club as well as SRV records _etcd-client._tcp.socialfoodie.club and _etcd-server._tcp.socialfoodie.club.

What do you think of the error message? Before I assigned a DNS record to the second etcd node, it would complain about not being able to resolve its address (socialfoodie-etcd-1.socialfoodie.club), so at least that issue disappeared.

@alekssaul
Contributor

The code in the PR looks OK to me, and the SRV records do too. Generally, you want an odd number of etcd nodes (3 and 5 are typical HA recommendations). Did you tear down the environment and re-create it when you made the DNS changes?

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I guess I didn't, although I rebooted the etcd nodes.

@alekssaul
Contributor

Can you please try running terraform destroy and terraform apply again?

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul OK, I'm in the process of doing that. It might take a bit of time though, since I must update the Route 53 domain registration to point to the newly created hosted zone's name servers. I wish the latter weren't necessary, but I haven't seen any way of avoiding it :(

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I've brought up a new cluster, and now the roles are reversed. The second member reports a healthy cluster (etcdctl cluster-health), whereas the first one says it's "unavailable or misconfigured".

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul The node that doesn't work logs messages like these:

May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.005802 I | raft: a70dce350307d365 is starting a new election at term 113
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006289 I | raft: a70dce350307d365 became candidate at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006604 I | raft: a70dce350307d365 received MsgVoteResp from a70dce350307d365 at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006888 I | raft: a70dce350307d365 [logterm: 1, index: 2] sent MsgVote request to 6e32e8c6cb8345ac at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.010193 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[6e32e8c6cb8345ac]=f76182f7ad1e367c, local=4191bc921b1dbd02)

@alekssaul
Contributor

You may also want to consider using https://www.terraform.io/docs/providers/do/r/domain.html instead of Route 53, since DO seems to have a DNS-as-a-service solution. The challenge is that the DO domain provider doesn't seem to support SRV records. You can work around that by using static discovery, like:

Environment="ETCD_IMAGE_TAG={{.etcd_image_tag}}"
Environment="ETCD_NAME={{.etcd_name}}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://{{.domain_name}}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://{{.domain_name}}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
Environment="ETCD_INITIAL_CLUSTER={{.etcd_initial_cluster}}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul Thanks, a DO-native solution would be better, but I don't think we're seeing any issues due to DNS, are we? I mean, etcd complains of a cluster ID mismatch.

@alekssaul
Contributor

alekssaul commented May 11, 2017

The cluster ID mismatch seems to be related to a configuration difference between the etcd cluster nodes, per etcd-io/etcd#3710 (each member records the cluster ID it was bootstrapped with in its data dir, so members bootstrapped from differing configurations reject each other's requests).

alekssaul added a commit to alekssaul/tectonic-installer that referenced this issue May 11, 2017
- remove Route53 dependency
- fixes coreos#632 by changing the `--name=etcd` flag.
- uses DO DNS for DNS record creation
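
For the "uses DO DNS for DNS record creation" part, a rough sketch of what DigitalOcean-managed records could look like with Terraform's digitalocean provider (domain, record name, and IP are hypothetical, and argument names may vary by provider version):

# Hypothetical sketch only; the actual PR's Terraform may differ.
resource "digitalocean_domain" "etcd" {
  name       = "example.com"
  ip_address = "203.0.113.10"
}

resource "digitalocean_record" "etcd_0_a" {
  domain = "${digitalocean_domain.etcd.name}"
  type   = "A"
  name   = "etcd-0"
  value  = "203.0.113.10"
}
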
@alekssaul
Contributor

It looks like you had the --name=etcd flag set for the etcd service. I believe this was causing all the etcd members to register with the same name. I've submitted a PR to your repo to remove the Route 53 dependency and fix the etcd issue.
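
For illustration (hypothetical values), the point is that each member advertises its own name rather than the shared --name=etcd, and that name has to match the member's key in ETCD_INITIAL_CLUSTER:

# Sketch only — a unique name per member, matching the ETCD_INITIAL_CLUSTER keys.
# On node 0:
Environment="ETCD_NAME=etcd-0"
# On node 1:
Environment="ETCD_NAME=etcd-1"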

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

@alekssaul Thanks :) Trying your PR now. It's funny that the etcd nodes should be misconfigured, since I copied the configuration from the AWS setup, but what you're saying about the name being non-unique makes sense.

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

@alekssaul The PR works perfectly! I made some minor modifications, but overall it led to a healthy etcd cluster up and running on DO, without the need for Route 53.

I guess the static discovery approach for etcd cluster members could be problematic, though, if one were to auto-scale the cluster? It's not a concern for me at the moment, but it's worth thinking about.

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

Closing this for now as it works pretty well.

@aknuds1 aknuds1 closed this as completed May 12, 2017