[Platform] Allow promoting a t-server only node to master+t-server #5831

Closed
streddy-yb opened this issue Sep 25, 2020 · 2 comments

@streddy-yb
Contributor

If a master node goes down permanently in an AZ, allow promoting an existing t-server only node in that AZ to become master.

@streddy-yb streddy-yb added kind/enhancement This is an enhancement of an existing feature area/platform Yugabyte Platform labels Sep 25, 2020
@streddy-yb streddy-yb added this to the v2.3 milestone Sep 25, 2020
@chirag-yb chirag-yb modified the milestones: v2.3, 2.2.4 Oct 9, 2020
SergeyPotachev added a commit that referenced this issue Oct 16, 2020
Summary:
  - Implemented the new "Start Master" node action;
  - Fixed issues in Util.needMasterQuorumRestore and Util.areMastersUnderReplicated;
  - Added JUnit tests for Util.java covering Util.needMasterQuorumRestore and Util.areMastersUnderReplicated;
  - Adjusted the behaviour of NodeManager.getConfigureSubCommand: when UniverseDefinitionTaskBase.createConfigureServerTasks is called with isShell = true and updateMasterAddrs = true, the configuration file is updated with an //**empty master_addresses**// setting (without installing software/packages). [This change is shared with D9629.]
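
For readers unfamiliar with the last item, here is a minimal sketch of the one case the summary describes. The class and method names are hypothetical and are not the NodeManager API; only the flag name master_addresses and the two boolean parameters come from the commit message. What happens for other parameter combinations is not described there, so the sketch does not model it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual NodeManager.getConfigureSubCommand code.
class ShellConfigureSketch {

  // Returns the config overrides to write for the case described in the
  // summary, or an empty map when that case does not apply (other cases
  // are intentionally not modeled here).
  static Map<String, String> shellMasterOverrides(boolean isShell, boolean updateMasterAddrs) {
    Map<String, String> overrides = new LinkedHashMap<>();
    if (isShell && updateMasterAddrs) {
      // The master is configured with an empty master_addresses value.
      overrides.put("master_addresses", "");
    }
    return overrides;
  }

  public static void main(String[] args) {
    System.out.println(shellMasterOverrides(true, true));
    System.out.println(shellMasterOverrides(true, false));
  }
}
```

The point of the sketch is only that this path rewrites a single setting and does not trigger a software or package install.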

Additional information:
  If the number of running masters is less than the replication factor (RF), some of the other nodes (those running only a t-server) can be promoted to master. Not every such node qualifies: eligibility depends on the RF, the number of AZs, and the distribution of the masters that are currently running.

If a node is eligible for promotion from t-server to master+t-server, the new action "Start Master" appears in its list of actions:

{F14208}
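
The eligibility rule described above could look roughly like the following sketch. It is not the code from Util.java: every class, field, and method name is made up for illustration, and the per-AZ target (masters spread evenly across AZs, remainder to the first AZs by name) is an assumption, since the commit message only says the result depends on RF, AZ count, and the current master distribution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative only and are not taken from
// the YugabyteDB Platform code base.
class MasterPromotionSketch {

  // Minimal stand-in for a universe node.
  static final class Node {
    final String name;
    final String az;
    final boolean masterRunning;
    final boolean tserverRunning;

    Node(String name, String az, boolean masterRunning, boolean tserverRunning) {
      this.name = name;
      this.az = az;
      this.masterRunning = masterRunning;
      this.tserverRunning = tserverRunning;
    }
  }

  // True when fewer masters are running than the replication factor requires.
  static boolean mastersUnderReplicated(List<Node> nodes, int rf) {
    return nodes.stream().filter(n -> n.masterRunning).count() < rf;
  }

  // ASSUMPTION: masters are spread evenly over AZs (sorted by name), with any
  // remainder going to the first AZs. The real placement policy may differ.
  static Map<String, Integer> targetMastersPerAz(List<Node> nodes, int rf) {
    Map<String, Integer> targets = new TreeMap<>();
    for (Node n : nodes) {
      targets.put(n.az, 0);
    }
    int azCount = targets.size();
    int index = 0;
    for (Map.Entry<String, Integer> entry : targets.entrySet()) {
      entry.setValue(rf / azCount + (index < rf % azCount ? 1 : 0));
      index++;
    }
    return targets;
  }

  // Names of nodes that would be offered "Start Master" under the assumptions above.
  static List<String> promotableNodes(List<Node> nodes, int rf) {
    List<String> result = new ArrayList<>();
    if (!mastersUnderReplicated(nodes, rf)) {
      return result;
    }
    Map<String, Integer> targets = targetMastersPerAz(nodes, rf);
    for (Node candidate : nodes) {
      if (candidate.masterRunning || !candidate.tserverRunning) {
        continue; // only live t-server-only nodes are considered
      }
      long runningInAz = nodes.stream()
          .filter(n -> n.az.equals(candidate.az) && n.masterRunning)
          .count();
      if (runningInAz < targets.get(candidate.az)) {
        result.add(candidate.name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // RF = 3, 3 AZs, 6 nodes; the node that hosted the az1 master is stopped.
    List<Node> nodes = Arrays.asList(
        new Node("n1", "az1", false, false),
        new Node("n2", "az1", false, true),
        new Node("n3", "az2", true, true),
        new Node("n4", "az2", false, true),
        new Node("n5", "az3", true, true),
        new Node("n6", "az3", false, true));
    System.out.println(promotableNodes(nodes, 3));
  }
}
```

In this layout only n2, the t-server-only node in the AZ that lost its master, is reported as eligible; while all three masters are running, the list is empty.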

Test Plan:
Common scenario:
  1. Create a universe with RF = 3, 3 AZs and 6 nodes.
  2. Check that no tserver-only node has the 'Start Master' action.
  3. Stop a node that runs a master.
  4. Check that a tserver-only node in the same zone now has the 'Start Master' action.
  5. Run 'Start Master' on that node (the one from step 4).
  6. Check that the master process is up and running correctly.
  7. Repeat steps 1-6 for different nodes.

- Repeat the scenario for different combinations of RF, AZ count and node count (for example, RF=3 with 2 AZs and 6 nodes; RF=1 with 1 AZ and 3 nodes) and check on which nodes the 'Start Master' action appears.
- Repeat the scenario for different provider types.

Reviewers: daniel, sshevchenko, sanketh

Reviewed By: sanketh

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D9614
@streddy-yb
Contributor Author

@SergeyPotachev - Can you backport this to 2.2?

@SergeyPotachev
Contributor

SergeyPotachev commented Oct 19, 2020

@streddy-yb Yes.

SergeyPotachev added a commit that referenced this issue Oct 24, 2020
…de to master+t-server

Summary:
  - Implemented the new "Start Master" node action;
  - Fixed issues in Util.needMasterQuorumRestore and Util.areMastersUnderReplicated;
  - Added JUnit tests for Util.java covering Util.needMasterQuorumRestore and
Util.areMastersUnderReplicated;
  - Adjusted the behaviour of NodeManager.getConfigureSubCommand: when
UniverseDefinitionTaskBase.createConfigureServerTasks is called with
isShell = true and updateMasterAddrs = true, the configuration file
is updated with an //**empty master_addresses**// setting (without
installing software/packages). [This change is shared with D9629.];
  - Additional fix: 'Start Master' is no longer shown on nodes of a Read Replica cluster.
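
The Read Replica fix in the last item amounts to an extra filter on the candidate list. The sketch below illustrates that idea only: the enum, class, and method names are hypothetical, and the per-AZ distribution check is deliberately left out to keep the example short.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; names are illustrative and not from the Platform code.
class ReadReplicaFilterSketch {

  enum ClusterKind { PRIMARY, READ_REPLICA }

  // Minimal stand-in for a universe node.
  static final class Node {
    final String name;
    final ClusterKind kind;
    final boolean masterRunning;

    Node(String name, ClusterKind kind, boolean masterRunning) {
      this.name = name;
      this.kind = kind;
      this.masterRunning = masterRunning;
    }
  }

  // Nodes that may be offered "Start Master": nothing when enough masters are
  // running, otherwise only primary-cluster nodes without a master.
  // NOTE: the AZ distribution check is omitted in this sketch.
  static List<String> startMasterCandidates(List<Node> nodes, int rf) {
    long runningMasters = nodes.stream().filter(n -> n.masterRunning).count();
    if (runningMasters >= rf) {
      return Collections.emptyList();
    }
    return nodes.stream()
        .filter(n -> n.kind == ClusterKind.PRIMARY && !n.masterRunning)
        .map(n -> n.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("p1", ClusterKind.PRIMARY, true),
        new Node("p2", ClusterKind.PRIMARY, true),
        new Node("p3", ClusterKind.PRIMARY, false),
        new Node("r1", ClusterKind.READ_REPLICA, false));
    System.out.println(startMasterCandidates(nodes, 3));
  }
}
```

With two of three masters running, the read replica node r1 is never listed even though it has no master; only p3 is.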

Additional information:
  If the number of running masters is less than the replication factor (RF),
some of the other nodes (those running only a t-server) can be promoted to
master. Not every such node qualifies: eligibility depends on the RF, the
number of AZs, and the distribution of the masters that are currently running.

If a node is eligible for promotion from t-server to master+t-server,
the new action "Start Master" appears in its list of actions:

{F14208}

Test Plan:
Jenkins: rebase: 2.2

Common scenario:
  1. Create a universe with RF = 3, 3 AZs and 6 nodes.
  2. Check that no tserver-only node has the 'Start Master' action.
  3. Stop a node that runs a master.
  4. Check that a tserver-only node in the same zone now has the 'Start Master' action.
  5. Run 'Start Master' on that node (the one from step 4).
  6. Check that the master process is up and running correctly.
  7. Repeat steps 1-6 for different nodes.

- Repeat the scenario for different combinations of RF, AZ count and node
count (for example, RF=3 with 2 AZs and 6 nodes; RF=1 with 1 AZ and 3 nodes)
and check on which nodes the 'Start Master' action appears.
- Repeat the scenario for different provider types.
- Check that t-server nodes in a Read Replica cluster do not get the
'Start Master' action when the number of masters is less than RF.

Reviewers: sanketh

Reviewed By: sanketh

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D9719