
How should I configure DNS for etcd in Tectonic Installer for DigitalOcean? #632

Closed
aknuds1 opened this issue May 10, 2017 · 16 comments

@aknuds1
Contributor

aknuds1 commented May 10, 2017

I asked this originally on the mailing list, but was asked to raise an issue instead in order to reach the right people:

In implementing the Tectonic Installer for DigitalOcean, I've come across the problem that only the first etcd node comes up successfully, since the other nodes are not given a DNS address. I've copied this aspect from the AWS etcd module's DNS configuration.

Even though this is how it's already done in the AWS setup, I just cannot see how it's supposed to work, since each etcd node is configured as if it had a corresponding DNS address. Can someone please enlighten me as to how etcd nodes should be configured with regard to DNS? I would appreciate any help!

@aknuds1
Contributor Author

aknuds1 commented May 10, 2017

Update: I've tried assigning a DNS address to my second node as well, but it fails to join the cluster. I see messages like the following in its log:

May 10 19:29:32 socialfoodie-etcd-1 etcd-wrapper[966]: 2017-05-10 19:29:32.469561 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[a70dce350307d365]=5ed5cd3650f7baa8, local=4191bc921b1dbd02)

What am I doing wrong?

@alekssaul
Contributor

Do the DNS records get created correctly? You should have A records for all your etcd nodes and SRV records for etcd-server and etcd-client.
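
For illustration, here is a rough sketch of what such records might look like if created in Route 53 via Terraform; the zone ID, hostnames, and IP are hypothetical, and this is not the installer's actual module:

# Hypothetical sketch only — one A record per etcd node, plus the two SRV records.
resource "aws_route53_record" "etcd_0_a" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "etcd-0.example.com"
  type    = "A"
  ttl     = 300
  records = ["203.0.113.10"]
}

# SRV values use the "priority weight port target" format;
# 2380 is the etcd peer port, 2379 the client port.
resource "aws_route53_record" "etcd_server_srv" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "_etcd-server._tcp.example.com"
  type    = "SRV"
  ttl     = 300
  records = ["0 0 2380 etcd-0.example.com"]
}

resource "aws_route53_record" "etcd_client_srv" {
  zone_id = "HYPOTHETICAL_ZONE_ID"
  name    = "_etcd-client._tcp.example.com"
  type    = "SRV"
  ttl     = 300
  records = ["0 0 2379 etcd-0.example.com"]
}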

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I think they should be working. I have A records socialfoodie-etcd-0.socialfoodie.club and socialfoodie-etcd-1.socialfoodie.club as well as SRV records _etcd-client._tcp.socialfoodie.club and _etcd-server._tcp.socialfoodie.club.

What do you think of the error message? Before I assigned a DNS record to the second etcd node, it would complain about not being able to resolve its address (socialfoodie-etcd-1.socialfoodie.club), so at least that issue disappeared.

@alekssaul
Contributor

The code in the PR looks OK to me, and the SRV records do too. Generally, you want an odd number of etcd nodes (3 and 5 are typical HA recommendations). Did you tear down the environment and re-create it when you made the DNS changes?

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I guess I didn't, although I rebooted the etcd nodes.

@alekssaul
Contributor

Can you please try running terraform destroy and terraform apply again?

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul OK, I'm in the process of doing that. It might take a bit of time though, since I must update the Route 53 domain registration to point to the newly created hosted zone's name servers. I wish the latter weren't necessary, but I haven't seen any way of avoiding it :(

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul I've brought up a new cluster, and now the roles are reversed. The second member reports a healthy cluster (etcdctl cluster-health), whereas the first one says it's "unavailable or misconfigured".

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul The node that doesn't work logs messages like these:

May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.005802 I | raft: a70dce350307d365 is starting a new election at term 113
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006289 I | raft: a70dce350307d365 became candidate at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006604 I | raft: a70dce350307d365 received MsgVoteResp from a70dce350307d365 at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.006888 I | raft: a70dce350307d365 [logterm: 1, index: 2] sent MsgVote request to 6e32e8c6cb8345ac at term 114
May 11 01:57:35 socialfoodie-etcd-0.socialfoodie.club etcd-wrapper[2524]: 2017-05-11 01:57:35.010193 E | rafthttp: request sent was ignored (cluster ID mismatch: remote[6e32e8c6cb8345ac]=f76182f7ad1e367c, local=4191bc921b1dbd02)

@alekssaul
Contributor

You may also want to consider using https://www.terraform.io/docs/providers/do/r/domain.html instead of Route 53, since DO seems to have a DNS-as-a-service solution. The challenge is that the DO domain provider doesn't seem to support SRV records. You can work around that by using static discovery, like:

Environment="ETCD_IMAGE_TAG={{.etcd_image_tag}}"
Environment="ETCD_NAME={{.etcd_name}}"
Environment="ETCD_ADVERTISE_CLIENT_URLS=http://{{.domain_name}}:2379"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://{{.domain_name}}:2380"
Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
Environment="ETCD_INITIAL_CLUSTER={{.etcd_initial_cluster}}"
Environment="ETCD_STRICT_RECONFIG_CHECK=true"

@aknuds1
Contributor Author

aknuds1 commented May 11, 2017

@alekssaul Thanks, a DO-native solution would be better, but I don't think we're seeing any issues due to DNS, are we? I mean, etcd complains of a cluster ID mismatch.

@alekssaul
Contributor

alekssaul commented May 11, 2017

The cluster ID mismatch seems to be related to a configuration difference between the etcd cluster nodes, per etcd-io/etcd#3710 (each member records the cluster ID it was bootstrapped with in its data dir, so members bootstrapped from differing configurations reject each other's requests).

alekssaul added a commit to alekssaul/tectonic-installer that referenced this issue May 11, 2017
- remove Route53 dependency
- fixes coreos#632 by changing the `--name=etcd` flag.
- uses DO DNS for DNS record creation
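
For the "uses DO DNS for DNS record creation" part, a rough sketch of what DigitalOcean-managed records could look like with Terraform's digitalocean provider (domain, record name, and IP are hypothetical, and argument names may vary by provider version):

# Hypothetical sketch only; the actual PR's Terraform may differ.
resource "digitalocean_domain" "etcd" {
  name       = "example.com"
  ip_address = "203.0.113.10"
}

resource "digitalocean_record" "etcd_0_a" {
  domain = "${digitalocean_domain.etcd.name}"
  type   = "A"
  name   = "etcd-0"
  value  = "203.0.113.10"
}
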
@alekssaul
Contributor

It looks like you had the --name=etcd flag set for the etcd service. I believe this was causing all the etcd members to register with the same name. I've submitted a PR to your repo to remove the Route 53 dependency and fix the etcd issue.
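
For illustration (hypothetical values), the point is that each member advertises its own name rather than the shared --name=etcd, and that name has to match the member's key in ETCD_INITIAL_CLUSTER:

# Sketch only — a unique name per member, matching the ETCD_INITIAL_CLUSTER keys.
# On node 0:
Environment="ETCD_NAME=etcd-0"
# On node 1:
Environment="ETCD_NAME=etcd-1"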

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

@alekssaul Thanks :) Trying your PR now. It's funny that the etcd nodes should be misconfigured, since I copied the configuration from the AWS setup, but what you're saying about the name being non-unique makes sense.

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

@alekssaul The PR works perfectly! I made some minor modifications, but overall it led to a healthy etcd cluster up and running on DO, without the need for Route 53.

I guess the static discovery approach for etcd cluster members could be problematic, though, if one were to auto-scale the cluster? It's not a concern for me at the moment, but it's worth thinking about.

@aknuds1
Contributor Author

aknuds1 commented May 12, 2017

Closing this for now as it works pretty well.

@aknuds1 aknuds1 closed this as completed May 12, 2017