Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix size of rolling-upgrade bootstrap config #38031

Conversation

DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Jan 30, 2019

Zen2 nodes will bootstrap themselves once they believe there to be no remaining
Zen1 master-eligible nodes in the cluster, as long as minimum_master_nodes is
satisfied.

Today the bootstrap configuration comprises just the ids of the known
master-eligible nodes, and this might be too small to be safe. For instance, if
there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the
bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum,
and this does not intersect other quorums that might arise, leading to a
split-brain.

This commit fixes this by expanding the bootstrap configuration so that its
quorums satisfy minimum_master_nodes, by adding some of the IDs of the other
master-eligible nodes in the last-published cluster state.

@DaveCTurner DaveCTurner added >bug v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jan 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Today a Zen2 node will bootstrap itself once it believes there to be no
remaining Zen1 master-eligible nodes in the cluster as long as
minimum_master_nodes is satisfied.

However the bootstrap configuration comprises just the ids of the known
master-eligible nodes, and this might be too small to be safe. For instance, if
there are 5 master-eligible nodes (so that minimum_master_nodes is 3) then the
bootstrap configuration could comprise just 3 nodes, of which 2 form a quorum,
and this does not intersect other quorums that might arise, leading to a
split-brain.

This commit fixes this by expanding the bootstrap configuration so that its
quorums satisfy minimum_master_nodes, using known node IDs if possible and
placeholders if not.
@DaveCTurner DaveCTurner force-pushed the 2019-01-30-discovery-upgrade-placeholders branch from 97dfa43 to d93012e Compare January 30, 2019 13:11
@DaveCTurner DaveCTurner merged commit eadcb5f into elastic:master Jan 31, 2019
@DaveCTurner DaveCTurner deleted the 2019-01-30-discovery-upgrade-placeholders branch January 31, 2019 08:00
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 31, 2019
* master: (100 commits)
  Push primary term to replication tracker (elastic#38044)
  Introduce ability to minimize round-trips in CCS (elastic#37828)
  Don't Assert Ack on when Publish Timeout is 0 in Test (elastic#38077)
  Reduce object creation in Rounding class (elastic#38061)
  Treat put-mapping calls with `_doc` as a top-level key as typed calls. (elastic#38032)
  Fix test bug when testing the merging of mappings and templates. (elastic#38021)
  spelling: java script -- not JavaScript (elastic#37057)
  Enable SSL in reindex with security QA tests (elastic#37600)
  Disable BWC tests during backport (elastic#38074)
  SQL: Added SSL configuration options tests (elastic#37875)
  Minor fixes in the release notes script. (elastic#37967)
  Fix typo in docs. (elastic#38018)
  Update Lucene repo for 7.0.0-alpha2 (elastic#37985)
  Fix size of rolling-upgrade bootstrap config (elastic#38031)
  fix DateIndexNameProcessorTests offset pattern (elastic#38069)
  Speed up converting of temporal accessor to zoned date time (elastic#37915)
  Work around JDK8 timezone bug in tests (elastic#37968)
  Correct arg names when update mapping/settings from leader (elastic#38063)
  Introduce ssl settings to reindex from remote (elastic#37527)
  Mute testRetentionLeasesSyncOnExpiration
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants