Allocation of new shards on remote store nodes #145
base: mixed-mode
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
LGTM overall. Please add the extra set of assertions to the new ITs.
if (direction.equals(Direction.REMOTE_STORE)) {
    if (!primaryShardNode.isRemoteStoreNode() && targetNode.isRemoteStoreNode()) {
        return allocation.decision(Decision.NO, NAME,
            "can not allocate replica shard on a remote node when primary shard is not already active on some remote node");
nit: Can we change the message to something like below:
cannot allocate replica shard copy on a remote node since primary shard copy is not yet migrated to remote
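To make the rule under discussion concrete, here is a minimal, self-contained sketch of the replica check with the reworded message. The classes `Node`, `Decision`, and the two enums are simplified stand-ins invented for illustration, not the real OpenSearch types, and the fall-through to YES is an assumption rather than what the PR necessarily does.

```java
/** Simplified stand-ins for the types in the diff; these are not the real OpenSearch classes. */
class ReplicaAllocationSketch {
    enum Direction { REMOTE_STORE, DOCREP }
    enum Type { YES, NO }

    static final class Decision {
        final Type type;
        final String explanation;
        Decision(Type type, String explanation) {
            this.type = type;
            this.explanation = explanation;
        }
    }

    static final class Node {
        final String id;
        final boolean remoteStoreNode;
        Node(String id, boolean remoteStoreNode) {
            this.id = id;
            this.remoteStoreNode = remoteStoreNode;
        }
    }

    // Wording proposed in the review comment.
    static final String REPLICA_NO_REASON =
        "cannot allocate replica shard copy on a remote node since primary shard copy is not yet migrated to remote";

    static Decision canAllocateReplica(Direction direction, Node primaryShardNode, Node targetNode) {
        // Mirrors the condition in the diff: while migrating to remote store, a replica
        // may not land on a remote node if its primary still sits on a non-remote node.
        if (direction == Direction.REMOTE_STORE && !primaryShardNode.remoteStoreNode && targetNode.remoteStoreNode) {
            return new Decision(Type.NO, REPLICA_NO_REASON);
        }
        // Assumption: every other case is allowed; the real decider may add more rules.
        return new Decision(Type.YES, "no restriction modeled in this sketch");
    }

    public static void main(String[] args) {
        Node docrepNode = new Node("docrep-1", false);
        Node remoteNode = new Node("remote-1", true);
        Decision d = canAllocateReplica(Direction.REMOTE_STORE, docrepNode, remoteNode);
        System.out.println(d.type + ": " + d.explanation);
    }
}
```

Keeping the message in a single constant would also let the ITs assert on the exact string the decider publishes.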
logger.info(" --> verify non-allocation of primary shard");
RoutingTable routingTable = client.admin().cluster().prepareState().execute().actionGet().getState().getRoutingTable();
ShardRouting primaryShardRouting = routingTable.index(TEST_INDEX).shard(0).primaryShard();
assertFalse(primaryShardRouting.active());
Can we also add an extra set of assertions for this test:
- Cluster health should be RED (since the primary shard is not allocated yet)
- Run the _cluster/allocation/explain API and assert on the reason for shard allocation. The reason should match the string we publish from the decider on a NO decision.
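The health expectations in these comments follow the usual rule: RED when a primary copy is not active, YELLOW when every primary is active but some replica is not, GREEN otherwise. A small self-contained sketch of that mapping is below; `ShardCopy` and `Health` are hypothetical types for illustration only, not the OpenSearch health or routing classes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of how shard copy states map to a health color. */
class ClusterHealthSketch {
    enum Health { GREEN, YELLOW, RED }

    static final class ShardCopy {
        final boolean primary;
        final boolean active;
        ShardCopy(boolean primary, boolean active) {
            this.primary = primary;
            this.active = active;
        }
    }

    static Health healthOf(List<ShardCopy> copies) {
        boolean replicaInactive = false;
        for (ShardCopy copy : copies) {
            if (!copy.active) {
                if (copy.primary) {
                    return Health.RED; // an inactive primary always means RED
                }
                replicaInactive = true;
            }
        }
        return replicaInactive ? Health.YELLOW : Health.GREEN;
    }

    public static void main(String[] args) {
        // Primary not yet allocated, as in the first test case.
        System.out.println(healthOf(Arrays.asList(new ShardCopy(true, false), new ShardCopy(false, false))));
        // Primary active, replica blocked by the decider, as in the second test case.
        System.out.println(healthOf(Arrays.asList(new ShardCopy(true, true), new ShardCopy(false, false))));
    }
}
```

In the actual ITs the same expectations would be asserted against the cluster health response, together with the explanation text returned by the allocation explain API.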
logger.info(" --> verify non-allocation of replica shard");
routingTable = client.admin().cluster().prepareState().execute().actionGet().getState().getRoutingTable();
replicaShardRouting = routingTable.index(TEST_INDEX).shard(0).replicaShards().get(0);
assertFalse(replicaShardRouting.active());
Same as above. The assertion should be on the reason for the UNASSIGNED shard, and the cluster health should be YELLOW.
logger.info(" --> verify allocation of replica shard");
routingTable = client.admin().cluster().prepareState().execute().actionGet().getState().getRoutingTable();
replicaShardRouting = routingTable.index(TEST_INDEX).shard(0).replicaShards().get(0);
assertTrue(replicaShardRouting.active());
Add the same assertions here as well, as mentioned above.
    clusterSettings.addSettingsUpdateConsumer(RemoteStoreNodeService.DIRECTION_SETTING, this::setDirection);
}

private void setDirection (Direction direction) {
Can we rename Direction to something more explicit, maybe MigrationRoute?
Ack.
 *
 * @opensearch.internal
 */
public class RemoteStoreAllocationDecider extends AllocationDecider {
Do we need version checks as well, to ensure remote store nodes are not on a higher version than the rest of the existing nodes?
+1. Let's name it RemoteStoreMigrationAllocationDecider.
55a8ef9 to 7161727
    .filter(nd -> nd.nodeId().equals(primaryShardRouting.currentNodeId()))
    .findFirst().get().node();

if (direction.equals(Direction.REMOTE_STORE)) {
Let's handle the other direction, DOCREP, as well.