
Elasticsearch master election during upgrade #344

Open
cehoffman opened this issue May 6, 2018 · 1 comment

@cehoffman

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:
Upgrading the Elasticsearch cluster resulted in multiple master elections.

What you expected to happen:
Only one master election occurs, at the end of the upgrade.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster at version X with multiple masters
  2. Cause the last master in the StatefulSet to become the leader
  3. Update the cluster to version Y

Anything else we need to know?:
When performing an upgrade, the controller-manager should delete all master pods except the current leader first. The current leader should be the last pod deleted and updated.
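
A rough sketch of that ordering (all names hypothetical, current client-go API; discovering the elected master, e.g. via Elasticsearch's `_cat/master` endpoint, is assumed to happen elsewhere):

```go
package upgrade

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteMastersLeaderLast deletes each master pod and, by ordering, leaves the
// currently elected master until the very end, so that only one re-election
// should be needed. Hypothetical sketch, not navigator's actual upgrade code.
func deleteMastersLeaderLast(ctx context.Context, client kubernetes.Interface, namespace, currentLeader string, masterPods []string) error {
	ordered := make([]string, 0, len(masterPods))
	for _, name := range masterPods {
		if name != currentLeader {
			ordered = append(ordered, name) // non-leaders go first
		}
	}
	ordered = append(ordered, currentLeader) // elected master deleted and updated last

	for _, name := range ordered {
		if err := client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
			return err
		}
		// A real controller would wait here for the replacement pod to become
		// Ready and rejoin the Elasticsearch cluster before continuing.
	}
	return nil
}
```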

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:
@munnerz
Contributor

munnerz commented May 8, 2018

This is currently fairly difficult for us to do, as we rely upon StatefulSet for the upgrade functionality under the hood, and use the RollingUpdate strategy.

If we switch to OnDelete we lose the 'partition' functionality, which we currently rely upon to ensure updates to nodes in a cluster aren't triggered early if their pods are deleted. For example, if a k8s node fails during an upgrade, any pods running on that failed node would be immediately upgraded the next time they start (potentially breaking delicate upgrade procedures).
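
For context, the partition behaviour referred to here is part of the StatefulSet RollingUpdate strategy; a minimal sketch (not navigator's actual code) of setting it via the apps/v1 Go types:

```go
package example

import appsv1 "k8s.io/api/apps/v1"

// rollingUpdateWithPartition returns an update strategy where only pods whose
// ordinal is >= the partition value are updated. This is the gating behaviour
// that would be lost by switching to the OnDelete strategy.
func rollingUpdateWithPartition(partition int32) appsv1.StatefulSetUpdateStrategy {
	return appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateStatefulSetStrategy{
			Partition: &partition,
		},
	}
}
```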

Therefore, the only way we can do this is to implement our own alternative to StatefulSet, which chooses which replica to update based on some database-specific predicate.
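
Very roughly, such a predicate could be sketched like this (purely illustrative, hypothetical names):

```go
package example

// UpdatePredicate is a hypothetical interface for a StatefulSet alternative:
// given the set of out-of-date pods, it picks the next one to update using
// database-specific knowledge rather than ordinal order.
type UpdatePredicate interface {
	NextPodToUpdate(outOfDate []string) (podName string, ok bool)
}

// electedMasterLast is a hypothetical Elasticsearch predicate that updates the
// elected master only after every other master pod has been updated.
type electedMasterLast struct {
	currentLeader string // looked up from the cluster, e.g. via _cat/master
}

func (p electedMasterLast) NextPodToUpdate(outOfDate []string) (string, bool) {
	if len(outOfDate) == 0 {
		return "", false
	}
	for _, name := range outOfDate {
		if name != p.currentLeader {
			return name, true // non-leader masters go first
		}
	}
	return p.currentLeader, true // only the elected master remains
}
```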

There has already been discussion over on the Elastic GitHub and forums about triggering manual re-elections as a stop-gap to make this process more graceful: elastic/elasticsearch#17493.

Their line seems to be "it shouldn't take that long to re-elect" - but as you say, it'd be nice if we could minimise interruptions. It might be possible to achieve this with a custom discovery plugin, but right now we use the in-built SRV record discovery mechanism, so this would be a new component entirely.

@wallrj modified the milestone: v0.2 May 15, 2018