an election requires a node with id [STALE_STATE_CONFIG] #53734

admidelu · 2020-03-18T15:04:37Z

Elasticsearch version (bin/elasticsearch --version):
7.6.1 ELK on k8s

Description of the problem including expected versus actual behavior:

master not discovered or elected yet, an election requires a node with id [STALE_STATE_CONFIG]

Steps to reproduce:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ig-cluster
spec:
  version: 7.6.1
  nodeSets:
  - name: masternode
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
      cluster.remote.connect: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: local-storage
    podTemplate:
      spec:
        nodeSelector:
          server: datababe
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 20Gi
              cpu: 4
            requests:
              memory: 20Gi
              cpu: 1
          env:
          - name: ES_JAVA_OPTS
            value: -Xms10g -Xmx10g
  - name: datanode
    count: 9
    config:
      node.master: false
      node.data: true
      node.ingest: true
      cluster.remote.connect: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 750Gi
        storageClassName: local-storage
    podTemplate:
      spec:
        nodeSelector:
          server: datababe
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 28Gi
              cpu: 5
            requests:
              memory: 28Gi
              cpu: 1
          env:
          - name: ES_JAVA_OPTS
            value: -Xms14g -Xmx14g

Provide logs (if relevant):
{"type": "server", "timestamp": "2020-03-18T14:54:03,323Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "ig-cluster", "node.name": "ig-cluster-es-masternode-1", "message": "master not discovered or elected yet, an election requires a node with id [STALE_STATE_CONFIG], have discovered [{ig-cluster-es-masternode-1}{xZ7cq3T5QlOb_FfvxHAqIg}{d8bAOYWLTuexRQQWpbQ8qg}{10.38.128.2}{10.38.128.2:9300}{lm}{ml.machine_memory=21474836480, xpack.installed=true, ml.max_open_jobs=20}, {ig-cluster-es-masternode-2}{85-4QFMIRyG4BGwYKELtNw}{M1u0PE58TzWKy291KGcMig}{10.38.0.1}{10.38.0.1:9300}{lm}{ml.machine_memory=21474836480, ml.max_open_jobs=20, xpack.installed=true}, {ig-cluster-es-masternode-0}{4PRMeSNYSUS2lPvdfL-LVg}{zOjwCE8LT0uJQsWG_rIByg}{10.41.64.1}{10.41.64.1:9300}{lm}{ml.machine_memory=21474836480, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.38.0.1:9300, 10.41.64.1:9300] from hosts providers and [{ig-cluster-es-masternode-1}{xZ7cq3T5QlOb_FfvxHAqIg}{d8bAOYWLTuexRQQWpbQ8qg}{10.38.128.2}{10.38.128.2:9300}{lm}{ml.machine_memory=21474836480, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 3, last-accepted version 34 in term 3" }



elasticmachine · 2020-03-20T15:40:05Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

DaveCTurner · 2020-03-20T15:44:01Z

This node wasn't master-eligible the last time it joined the cluster, so its on-disk cluster state is stale. It needs to find a node that was master-eligible the last time it joined the cluster. I opened #53878 to improve the log message in this case. I propose:

... an election requires one or more nodes that have already participated as master-eligible nodes in the cluster, but this node was not master-eligible the last time it joined the cluster ...
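To make the explanation above concrete, here is a simplified model of the majority rule: a node can only win an election if the nodes it has discovered make up more than half of its last-known voting configuration. This is a sketch under stated assumptions, not Elasticsearch's actual code; the class and method names are hypothetical. The three node ids are copied from the log above, while the "healthy" voting configuration is a hypothetical one added for contrast.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model (not Elasticsearch's real implementation) of
 * why a persisted voting configuration of {STALE_STATE_CONFIG} can never
 * reach a quorum: no real node has that id.
 */
class StaleStateQuorum {
    // Placeholder id, as it appears in the log message.
    static final String STALE_STATE_CONFIG = "STALE_STATE_CONFIG";

    // Majority rule: more than half of the voting configuration must be discovered.
    static boolean hasQuorum(Set<String> votingConfig, Set<String> discoveredIds) {
        Set<String> present = new HashSet<>(votingConfig);
        present.retainAll(discoveredIds);
        return present.size() * 2 > votingConfig.size();
    }

    public static void main(String[] args) {
        // Node ids of the three master-eligible nodes discovered in the log.
        Set<String> discovered = Set.of(
            "xZ7cq3T5QlOb_FfvxHAqIg", "85-4QFMIRyG4BGwYKELtNw", "4PRMeSNYSUS2lPvdfL-LVg");

        // Hypothetical healthy configuration made of those same nodes.
        Set<String> healthy = Set.copyOf(discovered);
        System.out.println("healthy config has quorum: " + hasQuorum(healthy, discovered));

        // Stale placeholder: no discovered node can ever match it.
        Set<String> stale = Set.of(STALE_STATE_CONFIG);
        System.out.println("stale config has quorum: " + hasQuorum(stale, discovered));
    }
}
```

Running it prints `true` for the healthy configuration and `false` for the stale one. This matches the "which is not a quorum" in the log even though all three master nodes were discovered.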

We mark cluster states persisted on master-ineligible nodes as potentially stale using the voting configuration `{STALE_STATE_CONFIG}`, which prevents these nodes from being elected as master if they are restarted as master-eligible. Today we do not handle this special voting configuration differently in the `ClusterFormationFailureHelper`, leading to the mysterious message `an election requires a node with id [STALE_STATE_CONFIG]` if the election does not succeed. This commit adds a special-case description for this situation to better explain why this node cannot win an election. Closes #53734
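The special case described above can be sketched as a small message-building function. It is a hypothetical illustration under the assumption that the voting configuration is available as a set of node ids; it is not the code merged in #53878, and the names `ElectionRequirementMessage` and `describeQuorum` are made up for this example. The wording of the stale-state branch follows the message proposed in this thread.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not the merged Elasticsearch code): describe what an
 * election needs, special-casing the STALE_STATE_CONFIG placeholder instead of
 * asking for a node with that non-existent id.
 */
class ElectionRequirementMessage {
    static final String STALE_STATE_CONFIG = "STALE_STATE_CONFIG";

    static String describeQuorum(Set<String> votingConfig) {
        if (votingConfig.equals(Set.of(STALE_STATE_CONFIG))) {
            // Wording proposed in the issue thread.
            return "an election requires one or more nodes that have already participated as "
                + "master-eligible nodes in the cluster, but this node was not master-eligible "
                + "the last time it joined the cluster";
        }
        if (votingConfig.size() == 1) {
            // Set.toString() renders as "[id]", reproducing the original log text.
            return "an election requires a node with id " + votingConfig;
        }
        int required = votingConfig.size() / 2 + 1;
        return "an election requires at least " + required + " nodes with ids from " + votingConfig;
    }

    public static void main(String[] args) {
        System.out.println(describeQuorum(Set.of("xZ7cq3T5QlOb_FfvxHAqIg")));
        System.out.println(describeQuorum(Set.of(STALE_STATE_CONFIG)));
    }
}
```

The first call prints `an election requires a node with id [xZ7cq3T5QlOb_FfvxHAqIg]`. The second prints the explanatory message instead of the confusing `[STALE_STATE_CONFIG]` text.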

DaveCTurner mentioned this issue Mar 20, 2020

Describe STALE_STATE_CONFIG in ClusterFormationFH #53878

Merged

DaveCTurner added :Distributed Coordination/Cluster Coordination and >enhancement labels Mar 20, 2020

DaveCTurner self-assigned this Mar 20, 2020

DaveCTurner closed this as completed in #53878 Mar 20, 2020

codebrain mentioned this issue Apr 1, 2020

7.7.0 meta ticket (Part 2) elastic/elasticsearch-net#4533

Closed
