Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add information when master node left to DiscoveryNodes' shortSummary() #28197

Merged
merged 4 commits into from
Jan 22, 2018

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Jan 12, 2018

This commit changes DiscoveryNodes.Delta.shortSummary() in order to
add information to the summary when the master node left.

Related to #28169

@tlrx tlrx added >enhancement review :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure v7.0.0 v6.2.0 labels Jan 12, 2018
@tlrx tlrx requested a review from ywelsch January 12, 2018 14:10
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some questions / suggestions. let me know what you think.

@@ -448,7 +452,7 @@ public boolean hasChanges() {
}

public boolean masterNodeChanged() {
return newMasterNode != null;
return newMasterNode != previousMasterNode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use ephimeral ids, not object identity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's a bit scary, sorry.

@@ -400,10 +400,14 @@ public Delta delta(DiscoveryNodes other) {
DiscoveryNode previousMasterNode = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just set them both and figure things out in the delta class?

if (newMasterNode.getId().equals(localNodeId)) {
// we are the master, no nodes we removed, we are actually the first master
sb.append("new_master ").append(newMasterNode());
if (newMasterNode() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need all this string crafting? can we just check for change and say "its was x, now it's y"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would simplify the summary, indeed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is some effort involved with string crafting. But I think it helps the end user understand what happened, for example,

  • "stepped down as master" (if localNode == previousMaster && newMaster == null)
  • "became master" (if localNode == newMaster)
  • "lost master" (if localNode != previousMaster && newMaster == null)
  • "detected master" (newMaster != null && newMaster != localNode)
  • ...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good if you guys think it's worth it. My only concern is that people will start thinking stuff that are necessarily true - (or might be not true in the future) - like are we guaranteed to always have an even with a master stepped down? Maybe we have a master swap, or a restart and the intermediate cluster state was lost in batching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked with @ywelsch and we agreed on logging a simple message with previous/current master values for now, as crafting more detailed messages could lead the user to expect a specific message while the related event could be lost because of cluster state batching etc.

For now it will log messages like:

[node_tm0] master node changed {previous [], current [{node_tm1}{...}
[node_tm2] master node changed {previous [{node_tm2}{...}], current []}, reason: not enough master nodes

but we may revisit this in the future.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also look at the DiscoveryNodes.delta(...), which has a pretty unconventional definition of previousMasterNode and newMasterNode.

@tlrx
Copy link
Member Author

tlrx commented Jan 15, 2018

Let's also look at the DiscoveryNodes.delta(...), which has a pretty unconventional definition of previousMasterNode and newMasterNode.

I guess this is because the Delta reports null values for previous and master nodes when it didn't change.

I changed and simplified things, please let me know if that's too much. Thanks

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left two more suggestions

return newMasterNode != null;
String newMasterId = (newMasterNode != null) ? newMasterNode.getEphemeralId() : null;
String previousMasterId = (previousMasterNode != null) ? previousMasterNode.getEphemeralId() : null;
return Objects.equals(newMasterId, previousMasterId) == false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this is equivalent to the simpler

return Objects.equals(newMasterNode, previousMasterNode) == false;

(see equals implementation of DiscoveryNode)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks

DiscoveryNode previousMasterNode = null;
if (other.masterNodeId != null) {
previousMasterNode = other.getMasterNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMasterNode() already returns null if masterNodeId is null.
Maybe cleaner to make that explicit in the getMasterNode() method (and not rely on the map implementation to do that for us) and also use @Nullable on the return type. The effect is that we won't need these conditions here and can just write return new Delta(other.getMasterNode(), getMasterNode(), ...

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @tlrx

This commit changes `DiscoveryNodes.Delta.shortSummary()` in order to
add information to the summary when the master node left.
@tlrx tlrx merged commit 119b1b5 into elastic:master Jan 22, 2018
jasontedor added a commit that referenced this pull request Jan 22, 2018
* master:
  Trim down usages of `ShardOperationFailedException` interface (#28312)
  Do not return all indices if a specific alias is requested via get aliases api.
  [Test] Lower bwc version for rank-eval rest tests
  CountedBitSet doesn't need to extend BitSet. (#28239)
  Calculate sum in Kahan summation algorithm in aggregations (#27807) (#27848)
  Remove the `update_all_types` option. (#28288)
  Add information when master node left to DiscoveryNodes' shortSummary() (#28197)
  Provide explanation of dangling indices, fixes #26008 (#26999)
jasontedor added a commit to matarrese/elasticsearch that referenced this pull request Jan 24, 2018
* master: (94 commits)
  Completely remove Painless Type from AnalyzerCaster in favor of Java Class. (elastic#28329)
  Fix spelling error
  Reindex: Wait for deletion in test
  Reindex: log more on rare test failure
  Ensure we protect Collections obtained from scripts from self-referencing (elastic#28335)
  [Docs] Fix asciidoc style in composite agg docs
  Adds the ability to specify a format on composite date_histogram source (elastic#28310)
  Provide a better error message for the case when all shards failed (elastic#28333)
  [Test] Re-Add integer_range and date_range field types for query builder tests (elastic#28171)
  Added Put Mapping API to high-level Rest client (elastic#27869)
  Revert change that does not return all indices if a specific alias is requested via get alias api. (elastic#28294)
  Painless: Replace Painless Type with Java Class during Casts (elastic#27847)
  Notify affixMap settings when any under the registered prefix matches (elastic#28317)
  Trim down usages of `ShardOperationFailedException` interface (elastic#28312)
  Do not return all indices if a specific alias is requested via get aliases api.
  [Test] Lower bwc version for rank-eval rest tests
  CountedBitSet doesn't need to extend BitSet. (elastic#28239)
  Calculate sum in Kahan summation algorithm in aggregations (elastic#27807) (elastic#27848)
  Remove the `update_all_types` option. (elastic#28288)
  Add information when master node left to DiscoveryNodes' shortSummary() (elastic#28197)
  ...
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jan 24, 2018
The DiscoveryNodes.Delta was changed in elastic#28197. Previous/Master nodes
are now always set in the `Delta` (before the change they were set only
if the master changed) and the `masterChanged()` method is now based on
object equality and nodes ephemeral ids (before the change it was based
on nodes id).

This commit adapts the DiscoveryNodesTests.testDeltas() to reflect the
changes.
tlrx added a commit that referenced this pull request Jan 25, 2018
The DiscoveryNodes.Delta was changed in #28197. Previous/Master nodes
are now always set in the `Delta` (before the change they were set only
if the master changed) and the `masterChanged()` method is now based on
object equality and nodes ephemeral ids (before the change it was based
on nodes id).

This commit adapts the DiscoveryNodesTests.testDeltas() to reflect the
changes.
@tlrx tlrx added the v6.3.0 label Jan 25, 2018
@tlrx
Copy link
Member Author

tlrx commented Feb 21, 2018

I don't think it worth backporting this to 6.2.3

@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
@tlrx tlrx deleted the log-master-left branch July 1, 2019 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >enhancement v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants