an election requires a node with id [STALE_STATE_CONFIG] #53734

Closed
admidelu opened this issue Mar 18, 2020 · 2 comments · Fixed by #53878
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement

Comments

@admidelu

Elasticsearch version (bin/elasticsearch --version):
7.6.1 ELK on k8s

Description of the problem including expected versus actual behavior:

master not discovered or elected yet, an election requires a node with id [STALE_STATE_CONFIG]

Steps to reproduce:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ig-cluster
spec:
  version: 7.6.1
  nodeSets:
  - name: masternode
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
      cluster.remote.connect: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: local-storage
    podTemplate:
      spec:
        nodeSelector:
          server: datababe
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 20Gi
              cpu: 4
            requests:
              memory: 20Gi
              cpu: 1
          env:
          - name: ES_JAVA_OPTS
            value: -Xms10g -Xmx10g
  - name: datanode
    count: 9
    config:
      node.master: false
      node.data: true
      node.ingest: true
      cluster.remote.connect: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 750Gi
        storageClassName: local-storage
    podTemplate:
      spec:
        nodeSelector:
          server: datababe
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 28Gi
              cpu: 5
            requests:
              memory: 28Gi
              cpu: 1
          env:
          - name: ES_JAVA_OPTS
            value: -Xms14g -Xmx14g

Provide logs (if relevant):
{"type": "server", "timestamp": "2020-03-18T14:54:03,323Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "ig-cluster", "node.name": "ig-cluster-es-masternode-1", "message": "master not discovered or elected yet, an election requires a node with id [STALE_STATE_CONFIG], have discovered [{ig-cluster-es-masternode-1}{xZ7cq3T5QlOb_FfvxHAqIg}{d8bAOYWLTuexRQQWpbQ8qg}{10.38.128.2}{10.38.128.2:9300}{lm}{ml.machine_memory=21474836480, xpack.installed=true, ml.max_open_jobs=20}, {ig-cluster-es-masternode-2}{85-4QFMIRyG4BGwYKELtNw}{M1u0PE58TzWKy291KGcMig}{10.38.0.1}{10.38.0.1:9300}{lm}{ml.machine_memory=21474836480, ml.max_open_jobs=20, xpack.installed=true}, {ig-cluster-es-masternode-0}{4PRMeSNYSUS2lPvdfL-LVg}{zOjwCE8LT0uJQsWG_rIByg}{10.41.64.1}{10.41.64.1:9300}{lm}{ml.machine_memory=21474836480, ml.max_open_jobs=20, xpack.installed=true}] which is not a quorum; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.38.0.1:9300, 10.41.64.1:9300] from hosts providers and [{ig-cluster-es-masternode-1}{xZ7cq3T5QlOb_FfvxHAqIg}{d8bAOYWLTuexRQQWpbQ8qg}{10.38.128.2}{10.38.128.2:9300}{lm}{ml.machine_memory=21474836480, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 3, last-accepted version 34 in term 3" }

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Mar 20, 2020
We mark cluster states persisted on master-ineligible nodes as
potentially-stale using the voting configuration `{STALE_STATE_CONFIG}` which
prevents these nodes from being elected as master if they are restarted as
master-eligible. Today we do not handle this special voting configuration
differently in the `ClusterFormationFailureHelper`, leading to a mysterious
message `an election requires a node with id [STALE_STATE_CONFIG]` if the
election does not succeed.

This commit adds a special case description for this situation to explain
better why this node cannot win an election.

Closes elastic#53734
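
To illustrate why a placeholder voting configuration blocks elections, here is a minimal sketch in Python (not the actual Elasticsearch code). It assumes that a quorum means a strict majority of the ids in the voting configuration are among the discovered nodes; `has_quorum` is a hypothetical helper, and the node ids are copied from the log line in this issue.

```python
# Placeholder id taken from the log message in this issue.
STALE_STATE_CONFIG = "STALE_STATE_CONFIG"


def has_quorum(voting_config: set, discovered_ids: set) -> bool:
    """Assumed rule: a strict majority of the voting configuration is present."""
    votes = len(voting_config & discovered_ids)
    return votes * 2 > len(voting_config)


# Ids of the three master nodes listed as discovered in the log line.
discovered = {
    "xZ7cq3T5QlOb_FfvxHAqIg",
    "85-4QFMIRyG4BGwYKELtNw",
    "4PRMeSNYSUS2lPvdfL-LVg",
}

# No real node has the placeholder id, so the stale configuration can never
# reach a quorum, however many nodes are discovered.
stale_can_elect = has_quorum({STALE_STATE_CONFIG}, discovered)

# A configuration that lists real node ids is satisfied by the same nodes.
real_can_elect = has_quorum(set(discovered), discovered)
```

Under that assumption, discovering all three masters is still "not a quorum" because none of them matches the single id in the stale configuration.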
@DaveCTurner DaveCTurner added :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement labels Mar 20, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@DaveCTurner DaveCTurner self-assigned this Mar 20, 2020
@DaveCTurner
Contributor

This node wasn't master-eligible last time it joined the cluster, so its on-disk cluster state is stale. It needs to find a node that was master-eligible the last time it joined the cluster. I opened #53878 to improve the log message in this case -- I propose:

... an election requires one or more nodes that have already participated as master-eligible nodes in the cluster, but this node was not master-eligible the last time it joined the cluster ...
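
The special case could be sketched roughly as follows. This is an illustration in Python, not the change made in #53878: `describe_election_requirement` is a hypothetical name, and the wording of the multi-node fallback branch is invented for the example. Only the first two messages are taken from this thread.

```python
STALE_STATE_CONFIG = "STALE_STATE_CONFIG"


def describe_election_requirement(voting_config: set) -> str:
    """Pick a human-readable description for a voting configuration."""
    if voting_config == {STALE_STATE_CONFIG}:
        # Special case: explain the placeholder instead of printing it as an id.
        return (
            "an election requires one or more nodes that have already "
            "participated as master-eligible nodes in the cluster, but this "
            "node was not master-eligible the last time it joined the cluster"
        )
    if len(voting_config) == 1:
        (node_id,) = voting_config
        return f"an election requires a node with id [{node_id}]"
    # Hypothetical wording for larger configurations (not from the source).
    ids = ", ".join(sorted(voting_config))
    return f"an election requires a quorum of nodes with ids from [{ids}]"
```

Checking for the placeholder first keeps the generic single-id branch from ever printing `STALE_STATE_CONFIG` as if it were a node id.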

DaveCTurner added a commit that referenced this issue Mar 20, 2020