-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] release orphan replicator after replicator is closed. #21203
Conversation
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/AbstractReplicatorTest.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java
Outdated
Show resolved
Hide resolved
@@ -116,6 +118,13 @@ public String getRemoteCluster() { | |||
// This method needs to be synchronized with disconnects else if there is a disconnect followed by startProducer | |||
// the end result can be disconnect. | |||
public synchronized void startProducer() { | |||
// This method comes from some actives call and may be call again after disconnect | |||
// so here we will first mark isClosed is false | |||
isClosed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the stat is set to closed
, it can no longer be recovered to no-closed
. If the replicator needs to keep working, it shouldn't be set to closed
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally this is the case. But here, we can call the method startProducer again after call the method disconnect.
In fact, this class can be closed and opened again.
In fact, this class can be closed and opened again.
So we should only set the stat of Replicator to closed
when it is exactly is closed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the stat is set to
closed
, it can no longer be recovered tono-closed
. If the replicator needs to keep working, it shouldn't be set toclosed
, right?
Normally this is the case. But here, we can call the method startProducer
again after call the method disconnect
.
In fact, this class can be closed and opened again.
Essentially, what we want to do here is to allow the retry thread (checkTopicActiveAndRetryStartProducer
) to end normally when the replicator is closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we add a stat, it must be meaningful, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codelipenghui @Technoboy- What do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, this class can be closed and opened again.
There is already a state [Stopped, Starting, Started, Stopping]
is working on this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is currently no clear indication that this replicator will not be used again after it is disconnect
.
Mabey it is not very reasonable to use the close
status identifier here, but it can work normally.
I think it is also reasonable to use some other markers as judgment conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a definition for this new state isClosed
? How is it different from State.Stopped
when it is true
?
Add a context: This PR is trying to fix the issue below
Once the producer creates success, the loop will be stopped. Explain: After the cursor was closed. The method if (exception instanceof CursorAlreadyClosedException) {
log.error("[{}] Error reading entries because replicator is"
+ " already deleted and cursor is already closed {}, ({})",
replicatorId, ctx, exception.getMessage(), exception);
// replicator is already deleted and cursor is already closed so, producer should also be stopped
closeProducerAsync();
return;
} So this PR only trying to fix the scenario that the producer always creates fails. |
YES That's right |
@poorbarcode PTAL. |
@hanmz Please check the CI failure. |
/pulsarbot rerun-failure-checks |
Current PR is pending on this concern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this new state is not meaningful in the current design.
#21946 will fix the issue that the current PR tries to fix. |
Motivation
When the replicator.producer start failed the method does not exit the loop(checkTopicActiveAndRetryStartProducer) even if the replicator is disconnect. You can reproduce by the test testExitRetryStartProducerAfterReplicatorDisconnect.
Modifications
If the replicator is already disconnected, stop to retryStartProducer.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: hanmz#2