[BUG] _cluster/stats API returning incorrect cluster_manager count #6103
Comments
The bug occurs because the code iterates over all the roles of each node (master and cluster_manager are two roles associated with the same node), so it ends up counting such nodes twice.
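To make the double count concrete, here is a minimal sketch. It is not the OpenSearch source: the class name, the helper, and the assumption that each cluster-manager-type role bumps both counters are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: a simplified stand-in for role counting, not the OpenSearch code.
class DoubleCountSketch {
    // Hypothetical helper: either role name marks a cluster-manager-eligible node.
    static boolean isClusterManagerRole(String role) {
        return role.equals("master") || role.equals("cluster_manager");
    }

    // Iterates over every role of every node. Each cluster-manager-type role
    // increments both counters, so a node carrying both names is counted twice.
    static Map<String, Integer> countByIteratingRoles(List<Set<String>> nodes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Set<String> roles : nodes) {
            for (String role : roles) {
                if (isClusterManagerRole(role)) {
                    counts.merge("master", 1, Integer::sum);
                    counts.merge("cluster_manager", 1, Integer::sum);
                } else {
                    counts.merge(role, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

In this sketch, three nodes that each carry both role names produce a count of 6 for each key instead of the expected 3.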
@sandeshkr419 do you want to pick up the fix?
I assigned the issue to you @sandeshkr419. Thanks in advance for the help! If for some reason you can't get to this then please provide an update here.
Ouch. If someone wants to start with a failing unit test that'd be awesome.
I started working on this and noticed this difference in 'master' role in single node vs dedicated cluster-manager setup: Single node / localhost:
Dedicated cluster-manager cluster:
Quick question: Any reason, why is the deprecated |
Making this change actually counts correctly based on roles. So for the above output of Single node setup (
Multi-node setup (master count is 3 because
So making change to |
It sounds reasonable. Consider whether there's a breaking change here or just a bug fix. Write a bunch of specs that describe the various scenarios, and let's see if any existing ones break. |
@tlfeng Can you provide any insight into why the code behaves this way? |
I will carefully look into the problem and provide insight. I'm surprised to see the REST API response is different for single node and multi-node cluster. The current test may only cover the situation of single-node cluster. |
Looking into @tlfeng's changes in #2424, it seems the original idea was just to replace the "master" role with the "cluster_manager" role, not to have both roles assigned to a node. Present output for
Looks like the bug is with the roles attached to cluster-manager nodes in a dedicated setup. I'll find out the code fixes for this - how to have only
This is where we are adding deprecated master role: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/node/Node.java#L454 I have verified the changes by removing this line and checking the output of Let me verify any other impact of this change before raising the PR. |
@sandeshkr419 Please raise the [DRAFT] PR, so it is easy to comment and track. |
Raised a draft PR to get early feedback on changes. Getting correct responses for The test fails now as node-roles are not updated when cluster is initialized with deprecated
I propose we remove this test as we have already deprecated the Although the node role count and cluster stats still feature a count for @shwetathareja @andrross @dblock @tlfeng Seeking comments on this! |
Not part of this issue, I'd open a separate issue to deprecate the |
The problem with removing
Also plugins accessing in-memory node.roles will not get master as the role. We need to keep both around but not double count. |
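One way to keep both role names around without double counting is to count each node at most once per reported key. The sketch below assumes the stats response keeps reporting both a master and a cluster_manager key; the class and method names are hypothetical and this is not the fix that was merged.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustration only: counts each node once per key, whichever alias it carries.
class PerNodeCountSketch {
    static Map<String, Integer> countOncePerNode(List<Set<String>> nodes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> roles : nodes) {
            // Collect the keys this node contributes to; a set removes duplicates.
            Set<String> keys = new HashSet<>();
            for (String role : roles) {
                if (role.equals("master") || role.equals("cluster_manager")) {
                    // Assumption: both keys are reported for a cluster-manager-eligible node.
                    keys.add("master");
                    keys.add("cluster_manager");
                } else {
                    keys.add(role);
                }
            }
            for (String key : keys) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With this approach a node that carries only master, only cluster_manager, or both contributes exactly 1 to each of the two keys.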
@shwetathareja The master role doesn't show up in a non-dedicated setup even before my changes; it appears only in a dedicated setup, so ideally this should not be a breaking change. To avoid any breakage, though, we can just fix the count in The concern with that test is that it starts up a domain with deprecated
If we remove setDeprecatedMasterRole() - then above settings fail to initialize nodes with
Shouldn't we start the cluster with
Node role counts still get master roles correctly in this case.
Thanks @sandeshkr419. Understood your concern. Looks like we first need to be on the same page with @tlfeng about what backward compatibility we are aiming for here. IMHO we shouldn't break any API behavior in 2.x.
@shwetathareja Thanks for your opinion! Backwards compatibility is a critical aspect when making API changes. According to the description in the PR, by default,
Although About the Cluster Stats API,
Looks like the API response for a "dedicated cluster_manager" cluster is incorrect, and In addition, setDeprecatedMasterRole() used to load |
@shwetathareja @tlfeng - Are we on the same page that the non-dedicated setup should also have Presently, non-dedicated setup only have |
@sandeshkr419 After reading the description in my PR 2424 which introduced the |
I'm concerned about consistency in behavior between a dedicated and a non-dedicated setup. Either we should have Role count logic can be corrected accordingly once we reach a consensus on what node roles should be assigned in dedicated and non-dedicated setups.
I agree with this. Here is the behavior I would expect:
This behavior shouldn't change if these nodes have other roles assigned to them. @tlfeng What do you think? It looks like there is a bug with double-counting in the stats API, right? |
@andrross Thank you for your opinion! The responses of the "Node Info" API and the "Cluster stats" API that you expect match what I intended in my PR #2424 (introduce cluster_manager role). But "Case 2" where assigning both
@tlfeng Indeed!
So the expected behavior is:
Do we just need to fix the double-counting happening today in "Case 2" then? |
@andrross Yes, the double-counting needs to be fixed. A cause for the problem seems is |
@andrross Both roles Because, in @tlfeng's PR #2424, it was already decided that But if we remove this deprecated master role, then, as @shwetathareja pointed out, we may not be backward compatible.
@sandeshkr419 I think we don't want both roles |
No, we definitely do not want both The way compatibility is handled is based on the node configuration. If the user specifies
+1 to @andrross's suggestion, where the API responds with master/cluster_manager depending on user configuration.
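As a rough reading of that suggestion, the reported role name could simply follow whatever name the node was configured with. The sketch below is an assumption about the intended behavior, with hypothetical names; how a node configured with both names should be treated is not settled here, and the preference shown is arbitrary.

```java
import java.util.List;
import java.util.Optional;

// Illustration only: choose the reported role name from the node's own configuration.
class ConfiguredRoleSketch {
    static Optional<String> reportedClusterManagerRole(List<String> configuredRoles) {
        // Arbitrary choice for this sketch: prefer the new name if both are present.
        if (configuredRoles.contains("cluster_manager")) {
            return Optional.of("cluster_manager");
        }
        // A node configured with the legacy name keeps reporting the legacy name.
        if (configuredRoles.contains("master")) {
            return Optional.of("master");
        }
        // Not a cluster-manager-eligible node.
        return Optional.empty();
    }
}
```

Under this reading, a node configured with the legacy role keeps showing master in API responses and a node configured with the new role shows cluster_manager, so neither node ends up carrying both names.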
Narrowed down the issue. This issue does not occur when I have modified the fix, wherein I remove I have added test cases for better understanding of the scenarios - asserting both the While I'm adding more test cases, I'm seeking early comments on the draft code changes. @shwetathareja @andrross @tlfeng Also, in response to @andrross's comments:
Whatever
Since the changes were in getting roles from |
Modified the changes as per @andrross suggestions - legacy Any concerns @shwetathareja @tlfeng or difference in opinions? |
@andrross @shwetathareja @tlfeng Gentle reminder to review the PR and let me know whether any additional steps are required for merging.
Describe the bug
The _cluster/stats API returns the wrong count of nodes with the cluster_manager role.
To Reproduce
Steps to reproduce the behavior:
cluster_manager role.
_cat/nodes - which should show the correct roles of each node.
_cluster/stats
Expected behavior
The count of cluster_manager and master should be 3 in the above case.
Plugins
None
Screenshots
None
Host/Environment (please complete the following information):
Additional context
The above response was correct till OS 1.3.x