
Force change cluster's elected master node #17493

Closed · redserpent7 opened this issue Apr 3, 2016 · 17 comments

Labels: discuss, :Distributed/Distributed

Comments

@redserpent7

Hi,

Today I've been trying to increase the storage of my Elasticsearch nodes. The nodes are hosted on AWS EC2, each with an attached 10 GB EBS volume. I was trying to increase the EBS size to 20 GB for each node, and it all went fine until I restarted the cluster master.

It took about 30 seconds for the cluster to elect a new master, and during that time all requests failed and all the other nodes gave me a 503 error when I tried to check their status.

I am wondering if there is a way to change the cluster master to a specific node instantly without having to wait for the nodes to elect a new master.

For example, let's say my cluster has three nodes:

Node1 (Cluster Master)
Node2
Node3

What I would like to do is change the size of the drive for, let's say, Node2; then, once that node rejoins the cluster and all shards are reallocated, I would force-elect it as the cluster master, so I can safely change the configuration for the other two nodes.

Is this possible in ES? If not, how can I go about reducing the time it takes for the nodes to elect a new master?
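For context, here is a rough sketch of the elasticsearch.yml discovery settings I believe govern how quickly master loss is detected and a new master is elected in 1.x (the values are illustrative rather than my exact config, and the noted defaults are as far as I know):

```yaml
# elasticsearch.yml -- zen discovery settings relevant to master election (ES 1.x)
discovery.zen.ping.timeout: 3s         # how long an election round waits for ping responses (default 3s)
discovery.zen.fd.ping_interval: 1s     # how often nodes ping the elected master (default 1s)
discovery.zen.fd.ping_timeout: 30s     # how long to wait for each fault-detection ping (default 30s)
discovery.zen.fd.ping_retries: 3       # failed pings before the master is considered gone (default 3)
discovery.zen.minimum_master_nodes: 2  # quorum for a 3-node cluster
```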

@clintongormley

You don't mention what version of Elasticsearch you're using. @bleskes why would it take 30s to elect a master?

@redserpent7
Author

@clintongormley I am running 1.7.2. I'm not sure why it took 30 seconds the first time; when I tried it again later, the election did not take that long.

BTW, my ping timeout is set to 5s and the ping retries are left at the default (not specified in the yml).

@bleskes
Contributor

bleskes commented Apr 5, 2016

It should take 3s for a clean restart. It might take longer if the network is slow or the nodes are so overloaded that they fail to process the master loss quickly.

The only way to remove a master from its position is to restart it. In theory it is possible to implement a clean mastership transfer, but it's very tricky and there are things we should do first. For now, I will close the issue.

@redserpent7 - if you keep running into a 30s master election, please open up an issue with the relevant details (logs, timing, and the output of _cat/master on all the nodes; this is a handy program for that).
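A quick sketch of collecting that _cat/master output from every node (the hostnames are placeholders for your own):

```sh
# Ask each node which node it currently believes is the elected master
for host in node1 node2 node3; do
  echo "== $host =="
  curl -s "http://$host:9200/_cat/master?v"
done
```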

bleskes closed this as completed on Apr 5, 2016
@redserpent7
Author

@bleskes I did try restarting the master node several more times and it did not take long for the other nodes to elect a new master; it probably had something to do with AWS EC2 at the time of my initial restart.

It's a non-issue for me really, as the cluster in question is my testing environment, while the production nodes have large enough drives attached that such increases will be very infrequent.

I would like it, though, if you could consider implementing a clean mastership transfer in a future version.

@bleskes
Contributor

bleskes commented Apr 5, 2016

Thank you for letting us know.

I would like it, though, if you could consider implementing a clean mastership transfer in a future version.

I agree. We just have bigger fish to fry first.

@munnerz

munnerz commented Mar 22, 2017

Hey @bleskes - I'm currently working on some automation for running Elasticsearch in a clustered fashion on top of Kubernetes, and would love to be able to manually trigger a master re-election (or, alternatively, disallow the current master from being master, similar to setting cluster.routing.allocation.exclude). Right now, on a scale-down event involving the master node, the cluster can turn red for up to 30s (thus serving no requests).

Are there any plans to implement this? Is it something you'd still consider adding?

(FYI, I am using ES 5.2.2 here)
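For comparison, this is roughly how a data node is drained today via the cluster settings API; what I'm asking for is an analogous knob for master eligibility, which as far as I can tell doesn't exist in 5.2.2 (the node name below is a placeholder):

```sh
# Existing mechanism: exclude a node from shard allocation so it can be removed safely
curl -s -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-2"
  }
}'
```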

@bleskes
Contributor

bleskes commented Mar 22, 2017

@munnerz can you open a topic on discuss.elastic.co and link it here? We can continue talking there. I have a feeling this will become a discussion ;)

@munnerz

munnerz commented Mar 22, 2017

clintongormley added the :Distributed/Distributed label and removed the :Cluster label on Feb 13, 2018
@devoncrouse

Now that the issue is closed and the discussion is archived, I'm curious whether this is being tracked anywhere. Running on ephemeral infrastructure in AWS/OCI, I can completely reprovision data and client nodes without users noticing, but when I get to the last (active) master, I must still endure a stressful ~30 seconds of cluster unavailability. Just as with shard allocation settings, I'd like to be able to exclude one or more masters, have the cluster complete any queued operations against the active one, and then gracefully elect a new master without rejecting operations outright. Any thoughts? Seems like a big fish.

@bleskes
Contributor

bleskes commented Feb 15, 2018

@devoncrouse during master re-election the cluster remains available for search, and indexing waits until a new master is elected. This should take 3 seconds plus a little overhead. If it takes 30 seconds, something else is not going right.

@DaveCTurner
Contributor

@devoncrouse how do you shut down the elected master? Do you terminate the Elasticsearch process with a signal, or do you simply pull the plug on the machine?

I ask this because if you simply pull the plug then the established connections to the master are not actively dropped, so it looks like a networking blip, and Elasticsearch waits for the network to be restored for a while before starting a new election. If you terminate Elasticsearch first then the connections are actively dropped and a master should be elected more quickly.
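In practice that means making sure the Elasticsearch process receives SIGTERM before the instance disappears, along these lines (the unit name and pid-file path are typical defaults and may differ on your setup):

```sh
# Stop Elasticsearch gracefully so the other nodes see the connections drop immediately,
# instead of treating the silence as a network blip and waiting it out
sudo systemctl stop elasticsearch

# or, for a tarball install, send SIGTERM to the process directly
kill -SIGTERM "$(cat /var/run/elasticsearch/elasticsearch.pid)"  # pid file location is setup-specific
```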

@devoncrouse

Aha, that's probably my issue; I assumed it would have been getting signaled on machine shutdown, but I see now what's happening. Thanks for the reply.

@michalm86

Hi, I am running a 3-node cluster using docker-compose and I experience ~30s (or even 45s) of unavailability during re-election (to trigger a re-election I run 'docker stop master_node').
I reported this here: https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590

Could anyone please take a look?
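For reference, my understanding is that 'docker stop' sends SIGTERM first and only SIGKILLs the container after the grace period (10s by default), so I'd expect a clean shutdown; this is roughly the relevant fragment of my compose file, with the grace period raised just in case (the service name matches my setup):

```yaml
# docker-compose.yml fragment -- give Elasticsearch more time to shut down after SIGTERM
services:
  master_node:
    stop_grace_period: 1m   # default is 10s; time allowed between SIGTERM and SIGKILL
```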

@michalm86

Thank you @DaveCTurner

@luyuncheng
Contributor

If a master leaves but the cluster settings have not changed, what about changing the cluster block level to METADATA_WRITE?
I wonder whether a data node could keep writing when only the leader changes and the metadata does not.

@DaveCTurner
Contributor

@luyuncheng this issue was closed over 3 years ago, so this isn't a good place to ask a question like yours. I see you've asked the same question on another closed issue too. I recommend not doing this. If you would like to discuss your question, please open a thread on the discussion forum instead.

@luyuncheng
Contributor

@DaveCTurner Sorry about this, I opened a new thread on the discussion forum: Link
