Split-brain? #2733

CSharpRU · 2017-05-16T05:52:07Z

Hi there,

I have Vault running with the etcd v3 storage backend in HA mode. Somehow I've ended up in a situation where all nodes are online in standby mode, but none of them is trying to become the active node.

How can I fix this?

vishalnayak · 2017-05-16T10:46:48Z

@CSharpRU Can you please share the Vault version, config file and the server logs?

CSharpRU · 2017-05-16T12:19:16Z

Version: 0.7.0

Config:

listener "tcp" {
  address = "ip:8200"

  cluster_address = "ip:8201"

  tls_disable = "false"
  tls_cert_file = "/etc/vault/cert.pem"
  tls_key_file = "/etc/vault/key.pem"
}

storage "etcd" {
  address = "https://localhost:2379"
  etcd_api = "v3"

  ha_enabled = "true"

  tls_ca_file = "/etc/ssl/ca.pem"
  tls_cert_file = "/etc/ssl/cert.pem"
  tls_key_file = "/etc/ssl/key.pem"

  cluster_addr = "hostname:8201"
  disable_clustering = "false"
  redirect_addr = "https://hostname:8200"
}

Logs:

2017/05/16 08:32:30.598868 [TRACE] physical/cache: creating LRU cache: size=32768
2017/05/16 08:32:30.605254 [TRACE] cluster listener addresses synthesized: cluster_addresses=
2017/05/16 08:32:41.092506 [TRACE] physical/cache: creating LRU cache: size=32768
2017/05/16 08:32:41.098427 [TRACE] cluster listener addresses synthesized: cluster_addresses=
2017/05/16 08:32:55.128303 [INFO ] core: vault is unsealed
2017/05/16 08:32:55.128412 [INFO ] core: entering standby mode
2017/05/16 08:32:55.131117 [TRACE] core: clearing forwarding clients
2017/05/16 08:32:55.131121 [TRACE] core: done clearing forwarding clients
2017/05/16 08:32:58.846141 [TRACE] core: found new active node information, refreshing
2017/05/16 08:32:58.848357 [TRACE] core: parsing information for new active node: 
2017/05/16 08:32:58.848496 [TRACE] core: refreshing forwarding connection
2017/05/16 08:32:58.848501 [TRACE] core: clearing forwarding clients
2017/05/16 08:32:58.848505 [TRACE] core: done clearing forwarding clients
2017/05/16 08:32:58.848536 [TRACE] core: done refreshing forwarding connection
2017/05/16 08:41:02.456065 [INFO ] core: acquired lock, enabling active operation

vishalnayak · 2017-05-16T14:40:39Z

@CSharpRU The node for which you have attached the logs seems to have become an active node. How many nodes are in the cluster and what is the output of vault status on each?

CSharpRU · 2017-05-16T14:44:39Z

@vishalnayak I've already fixed it by removing the leader and lock keys from etcd. Before that, vault status on every node reported Mode: standby with the same leader every time (even after restarting the whole cluster); the rest of the output looked normal. There are 3 nodes in the cluster.
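For anyone who hits the same state, here is a minimal sketch of picking out the HA keys before deleting anything. It assumes the default etcd path `/vault/` and that Vault keeps its HA lock entries under `core/lock/` and its leader advertisements under `core/leader/`. Neither the thread nor the config confirms these names, so check them against your own keyspace first. The input is the kind of listing `ETCDCTL_API=3 etcdctl get /vault/core/ --prefix --keys-only` prints. The helper name `ha_keys` is hypothetical.

```python
from typing import Iterable, List

# ASSUMPTION: Vault's etcd storage path is the default "/vault/", and Vault core
# stores its HA lock under "core/lock/" and leader entries under "core/leader/".
VAULT_PREFIX = "/vault/"
HA_KEY_PREFIXES = ("core/lock/", "core/leader/")


def ha_keys(keys: Iterable[str], vault_prefix: str = VAULT_PREFIX) -> List[str]:
    """Return only the keys that look like Vault HA lock/leader state.

    Blank lines (etcdctl separates keys with them) and all other Vault data
    are ignored, so the result can be reviewed before anything is deleted.
    """
    wanted = tuple(vault_prefix + p for p in HA_KEY_PREFIXES)
    return sorted(k.strip() for k in keys if k.strip().startswith(wanted))


if __name__ == "__main__":
    listing = [
        "/vault/core/lock/694d5c1a2b3c4d5e",
        "",
        "/vault/core/leader/4f1c-uuid",
        "/vault/core/mounts",
    ]
    for key in ha_keys(listing):
        print(key)
```

Each key it returns could then be removed individually with `etcdctl del <key>`, which is the manual cleanup described in the comment.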

vishalnayak · 2017-05-16T14:48:56Z

@CSharpRU Glad to know that it's working. If you happen to find out what caused the lock keys to get into that state, please let us know. Closing this issue for now.

jefferai · 2017-05-16T14:50:29Z

@xiang90 do you want to look into this?

CSharpRU · 2017-05-16T14:54:16Z

@vishalnayak I think it was caused by an etcd and Vault outage (the processes were killed after running out of memory). But I can't find anything in the logs (maybe because the log level was set to err), and I can't explain why those keys stayed in place and a new "election" wasn't started.
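Why leftover keys can stall an election is easier to see in a toy model. etcd v3's `concurrency.Mutex` writes each waiter's key under a shared prefix and attaches it to that waiter's session lease; the waiter whose key has the lowest create revision holds the lock. When a holder dies, its lease expires, etcd deletes the key, and the next waiter takes over. The sketch below is a hypothetical in-memory model of that idea, not Vault's or etcd's code. The `/vault/core/lock/` prefix, the readable key names (`node-a` and so on, where real keys use lease IDs), and the case of a key that never expires are illustrative assumptions; the thread does not establish why the real keys survived.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    value: str
    create_revision: int
    lease: Optional[int]  # None models a key that is not bound to any lease


class ToyEtcd:
    """Tiny in-memory stand-in for an etcd v3 keyspace (illustration only)."""

    def __init__(self) -> None:
        self.revision = 0
        self.keys: Dict[str, Entry] = {}

    def put(self, key: str, value: str, lease: Optional[int] = None) -> None:
        self.revision += 1
        existing = self.keys.get(key)
        # Overwriting a key keeps its original create revision, as in etcd.
        create_rev = existing.create_revision if existing else self.revision
        self.keys[key] = Entry(value, create_rev, lease)

    def delete_prefix(self, prefix: str) -> None:
        for key in [k for k in self.keys if k.startswith(prefix)]:
            del self.keys[key]

    def expire_lease(self, lease: int) -> None:
        # When a lease expires, every key attached to it is deleted.
        for key in [k for k, e in self.keys.items() if e.lease == lease]:
            del self.keys[key]

    def lock_holder(self, prefix: str) -> Optional[str]:
        # The key with the lowest create revision under the prefix owns the lock.
        candidates = [(e.create_revision, k) for k, e in self.keys.items() if k.startswith(prefix)]
        return min(candidates)[1] if candidates else None


LOCK = "/vault/core/lock/"  # ASSUMED key layout


def holder_after_crash(key_has_lease: bool, manual_cleanup: bool = False) -> Optional[str]:
    kv = ToyEtcd()
    # node-a is the active node; node-b and node-c are standbys queued behind it.
    kv.put(LOCK + "node-a", "a", lease=1 if key_has_lease else None)
    kv.put(LOCK + "node-b", "b", lease=2)
    kv.put(LOCK + "node-c", "c", lease=3)
    kv.expire_lease(1)  # node-a is killed and its session lease times out
    if manual_cleanup:
        kv.delete_prefix(LOCK + "node-a")  # deleting the stale key by hand
    return kv.lock_holder(LOCK)


if __name__ == "__main__":
    print(holder_after_crash(key_has_lease=True))
    print(holder_after_crash(key_has_lease=False))
    print(holder_after_crash(key_has_lease=False, manual_cleanup=True))
```

In the healthy case the lock passes to `node-b` once `node-a`'s lease expires. With a key that never expires, the dead node keeps "holding" the lock and every other node stays in standby until the stale key is deleted by hand.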

jefferai · 2017-05-16T14:56:25Z

It should recover; xiang90 maintains the etcdv3 backend, hence my ping.

CSharpRU · 2017-05-16T14:58:05Z

@jefferai Thanks, waiting for @xiang90's answer :)

xiang90 · 2017-05-16T14:59:40Z

@CSharpRU Can you reproduce it? Could you provide a step-by-step guide or a script so that we can look into it further?

CSharpRU · 2017-05-16T16:27:05Z

@xiang90 Nope, that was done by our devops guy. I'll ask him tomorrow; maybe he can give some info about it.

raoofm · 2017-05-16T20:29:20Z

@CSharpRU @xiang90 @jefferai

This should be fixed by #2526

The Vault version used in this issue is 0.7.0, so upgrading to 0.7.1 or later should fix it.

CSharpRU · 2017-05-17T03:53:27Z

Thanks, I'll update and try to reproduce it!

vishalnayak closed this as completed May 16, 2017

jefferai reopened this May 16, 2017

CSharpRU closed this as completed May 17, 2017
