HDDS-11083. Avoid duplicate creation of RunningDatanodeState #6886
CI: https://github.com/jianghuazhu/ozone/actions/runs/9717049941 The failures all seem to have the same cause. @adoroszlai @hemantk-12 @errose28, could you help review this PR?
These tests are not encountering |
750d650 to ac48935
Thanks @adoroszlai for the comments and review.
30126e0 to 61ad24c
CI: https://github.com/jianghuazhu/ozone/actions/runs/9974167102
/**
 * Clean up some resources.
 */
void clear();
Please add a default no-op implementation.
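As a rough illustration of what a default no-op implementation buys, here is a minimal sketch. The names `Task`, `NoCleanupTask`, and `CachingTask` are hypothetical stand-ins, not the Ozone types; only the `clear()` method name comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Task, NoCleanupTask and CachingTask are hypothetical names.
public class DefaultNoOpExample {

  /** Stand-in for the interface that gains a clear() method. */
  public interface Task {
    void execute();

    /** Clean up some resources. Does nothing unless overridden. */
    default void clear() {
      // Intentionally empty: only implementations that hold resources override this.
    }
  }

  /** Holds no resources, so it simply inherits the no-op clear(). */
  public static class NoCleanupTask implements Task {
    @Override
    public void execute() {
      // nothing to do in this sketch
    }
  }

  /** Holds a buffer, so it overrides clear() to release it. */
  public static class CachingTask implements Task {
    public final List<String> buffer = new ArrayList<>();

    @Override
    public void execute() {
      buffer.add("work");
    }

    @Override
    public void clear() {
      buffer.clear();
    }
  }

  public static void main(String[] args) {
    CachingTask caching = new CachingTask();
    caching.execute();
    caching.clear();
    System.out.println("buffer empty after clear: " + caching.buffer.isEmpty());

    // Compiles and runs without an explicit clear() override.
    new NoCleanupTask().clear();
  }
}
```

With the default in place, implementations that have nothing to clean up need no empty override, which is why the files flagged below would need no change.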
@Override
public void clear() {

}
No change needed in this file after adding default no-op implementation.
@Override
public void clear() {

}
No change needed in this file after adding default no-op implementation.
I have updated it.
Do you think anything else in this PR needs to be improved? If so, I will work on it. @adoroszlai @hemantk-12
@jianghuazhu Thanks for working on this. Sorry, I have no time to review it, please ask others.
@jianghuazhu , thanks for working on this! Please see the comment/question inlined.
@@ -655,41 +659,45 @@ public void execute(ExecutorService service, long time, TimeUnit unit)
// we called stop DatanodeStateMachine, this sets state to SHUTDOWN, and
// there is a chance of getting task as null.
if (task != null) {
Let's change the if-statement to return early instead of adding a new level of indentation.
@@ -654,7 +658,11 @@ public void execute(ExecutorService service, long time, TimeUnit unit)
// Adding not null check, in a case where datanode is still starting up, but
// we called stop DatanodeStateMachine, this sets state to SHUTDOWN, and
// there is a chance of getting task as null.
- if (task != null) {
+ if (task == null) {
+ return;
+ }
+
+ try {
if (this.isEntering()) {
task.onEnter();
}
@@ -691,6 +699,8 @@ public void execute(ExecutorService service, long time, TimeUnit unit)
// that we can terminate the datanode.
setShutdownOnError();
}
+ } finally {
+ task.clear();
}
}
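The shape of the suggested change (a guard clause followed by `try`/`finally`) can be tried in isolation. The following is a simplified sketch, not the actual `execute` method: the `Task` interface, `RecordingTask`, and the `entering` flag are hypothetical, and the timeout and error handling of the real method are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a stripped-down model of "return early, then clean up in finally".
public class EarlyReturnExample {

  /** Hypothetical stand-in for the state task. */
  public interface Task {
    void onEnter();

    void run();

    default void clear() {
    }
  }

  /** Records the order of calls so the control flow can be inspected. */
  public static class RecordingTask implements Task {
    public final List<String> events = new ArrayList<>();
    private final boolean failOnRun;

    public RecordingTask(boolean failOnRun) {
      this.failOnRun = failOnRun;
    }

    @Override
    public void onEnter() {
      events.add("onEnter");
    }

    @Override
    public void run() {
      events.add("run");
      if (failOnRun) {
        throw new IllegalStateException("simulated failure");
      }
    }

    @Override
    public void clear() {
      events.add("clear");
    }
  }

  public static void execute(Task task, boolean entering) {
    // Guard clause: nothing to run and nothing to clean up.
    if (task == null) {
      return;
    }

    try {
      if (entering) {
        task.onEnter();
      }
      task.run();
    } finally {
      // Runs on both normal completion and exceptions.
      task.clear();
    }
  }

  public static void main(String[] args) {
    execute(null, true); // returns immediately

    RecordingTask ok = new RecordingTask(false);
    execute(ok, true);
    System.out.println(ok.events);
  }
}
```

Because the null check returns before the `try` block is entered, `task.clear()` in the `finally` clause can never be reached with a null task.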
I'll update later.
Callable<EndPointStates> endpointTask = null;
for (EndpointStateMachine endpointStateMachine : connectionManager.getValues()) {
  if (endpointStateMachine.getAddressString().equals(endpoint.getAddressString())) {
    if (endpoint.getState().getValue() == EndPointStates.GETVERSION.getValue()) {
      endpointTask = new VersionEndpointTask(endpoint, conf,
          context.getParent().getContainer());
    } else if (endpoint.getState().getValue() == EndPointStates.REGISTER.getValue()) {
      endpointTask = RegisterEndpointTask.newBuilder()
          .setConfig(conf)
          .setEndpointStateMachine(endpoint)
          .setContext(context)
          .setDatanodeDetails(context.getParent().getDatanodeDetails())
          .setOzoneContainer(context.getParent().getContainer())
          .build();
    } else if (endpoint.getState().getValue() == EndPointStates.HEARTBEAT.getValue()) {
      endpointTask = HeartbeatEndpointTask.newBuilder()
          .setConfig(conf)
          .setEndpointStateMachine(endpoint)
          .setDatanodeDetails(context.getParent().getDatanodeDetails())
          .setContext(context)
          .build();
    }
What is the reason to create an endpointTask each time instead of using the endpointTasks map?
Thanks @szetszwo .
There are two reasons for this:
- If endpointTasks is kept and initialized only once, some CI checks unfortunately fail.
- If endpointTasks is rebuilt each time execute is called, a VersionEndpointTask, a RegisterEndpointTask, and a HeartbeatEndpointTask are all created. However, HeartbeatEndpointTask is the one used most often, so creating only the task that matches the current endpoint state is less expensive.
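The cost difference described in the second point can be made concrete with a small sketch. Everything here is hypothetical: the `State` enum mirrors only the three state names mentioned above, and `newTask` is a placeholder that counts constructions rather than building real endpoint tasks.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: compares building every task up front with building one on demand.
public class OnDemandTaskExample {

  /** Hypothetical stand-in for the endpoint states named in the discussion. */
  public enum State { GETVERSION, REGISTER, HEARTBEAT }

  /** Counts how many task objects have been constructed. */
  public static final AtomicInteger CREATED = new AtomicInteger();

  /** Placeholder factory; a real task would do RPC work in call(). */
  static Callable<State> newTask(State state) {
    CREATED.incrementAndGet();
    return () -> state;
  }

  /** Eager variant: one task per state, whether or not it will be used. */
  public static Map<State, Callable<State>> buildAll() {
    Map<State, Callable<State>> tasks = new EnumMap<>(State.class);
    for (State state : State.values()) {
      tasks.put(state, newTask(state));
    }
    return tasks;
  }

  /** On-demand variant: only the task matching the current state. */
  public static Callable<State> buildFor(State current) {
    return newTask(current);
  }

  public static void main(String[] args) throws Exception {
    CREATED.set(0);
    buildAll();
    System.out.println("eager tasks created: " + CREATED.get());

    CREATED.set(0);
    Callable<State> task = buildFor(State.HEARTBEAT);
    System.out.println("on-demand tasks created: " + CREATED.get()
        + ", result: " + task.call());
  }
}
```

In this model the eager variant constructs one object per state on every cycle, while the on-demand variant constructs exactly one.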
@jianghuazhu , thanks for the updates and the explanation! Please see the comments inlined.
for (EndpointStateMachine endpointStateMachine : connectionManager.getValues()) {
  if (endpointStateMachine.getAddressString().equals(endpoint.getAddressString())) {
This for loop and the if-statement are not needed since endpointStateMachine is not used to build the task. Also, we already have it in the execute(..) method.
Line 142 in a6b3392
for (EndpointStateMachine endpoint : connectionManager.getValues()) {
Callable<EndPointStates> endpointTask = null;
for (EndpointStateMachine endpointStateMachine : connectionManager.getValues()) {
  if (endpointStateMachine.getAddressString().equals(endpoint.getAddressString())) {
    if (endpoint.getState().getValue() == EndPointStates.GETVERSION.getValue()) {
Use switch-case instead.
private Callable<EndPointStates> getEndPointTask(
    EndpointStateMachine endpoint) {
Let's rename it to buildEndPointTask.
private Callable<EndPointStates> buildEndPointTask(EndpointStateMachine endpoint) {
switch (endpoint.getState()) {
case GETVERSION:
return new VersionEndpointTask(endpoint, conf, context.getParent().getContainer());
case REGISTER:
return RegisterEndpointTask.newBuilder()
.setConfig(conf)
.setEndpointStateMachine(endpoint)
.setContext(context)
.setDatanodeDetails(context.getParent().getDatanodeDetails())
.setOzoneContainer(context.getParent().getContainer())
.build();
case HEARTBEAT:
return HeartbeatEndpointTask.newBuilder()
.setConfig(conf)
.setEndpointStateMachine(endpoint)
.setDatanodeDetails(context.getParent().getDatanodeDetails())
.setContext(context)
.build();
default:
return null;
}
}
Thanks @szetszwo .
I'll update soon.
I've updated it, @szetszwo .
+1 the change looks good.
I have just restarted the failed tests.
Thanks @jianghuazhu for the patch, @szetszwo for the review.
What changes were proposed in this pull request?
While a datanode (DN) is running, a new RunningDatanodeState is created every time a heartbeat is sent to the SCM. Because this happens very frequently, the existing instance should be reused as much as possible.
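The general idea of reusing a state object across heartbeat cycles can be sketched as follows. This is only an illustration of the caching pattern under assumed names (`Context`, `RunningState`, `getRunningState`); it is not the actual change made in this PR.

```java
// Sketch only: hypothetical classes showing one state object reused across cycles.
public class ReuseStateExample {

  /** Hypothetical stand-in for a per-cycle state object. */
  public static class RunningState {
    public int runs;

    public void execute() {
      runs++;
    }

    /** Reset per-cycle resources so the instance can be used again. */
    public void clear() {
      // nothing to release in this sketch
    }
  }

  /** Hypothetical owner that hands out the state for each cycle. */
  public static class Context {
    public int created;
    private RunningState running;

    /** Creates the state lazily on first use and returns the same instance afterwards. */
    public RunningState getRunningState() {
      if (running == null) {
        running = new RunningState();
        created++;
      }
      return running;
    }
  }

  public static void main(String[] args) {
    Context context = new Context();
    for (int cycle = 0; cycle < 5; cycle++) {
      RunningState state = context.getRunningState();
      try {
        state.execute();
      } finally {
        state.clear();
      }
    }
    System.out.println("instances created: " + context.created);
  }
}
```

This single-threaded sketch ignores concurrency; a shared instance in a real state machine would also need any per-cycle data reset between runs, which is the role a cleanup method such as `clear()` can play.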
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-11083
How was this patch tested?
Make sure that your unit tests pass.