[fix] [broker] Fix brokers still retry start replication after closed the topic #23237

Merged
2 commits merged into apache:master on Sep 2, 2024

poorbarcode
Contributor

Motivation

Steps to reproduce:

  • Create a topic.
  • Enable replication, but the replicator fails to start.
    • The broker schedules a delayed (60s) task to retry.
  • Unload the topic.
  • Issue: the retry task keeps trying to restart the replicator again and again, even though the topic has been closed.
2024-08-30T11:41:11,955 - INFO  - [pulsar-io-64-13:PersistentTopic] - [persistent://public/ns1/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba] Starting replicator to remote: r2
2024-08-30T11:41:11,957 - ERROR - [pulsar-io-64-13:PersistentTopic] - [persistent://public/ns1/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba] Policies update failed org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedLedger public/ns1/persistent/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba has already been closed, scheduled retry in 60 seconds
java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedLedger public/ns1/persistent/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba has already been closed
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1527) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2419) ~[?:?]
	at org.apache.pulsar.common.util.FutureUtil.waitForAll(FutureUtil.java:60) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkReplication$69(PersistentTopic.java:1959) ~[classes/:?]
	at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309) ~[?:?]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1916) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplicationAndRetryOnFailure(PersistentTopic.java:1861) ~[classes/:?]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:96) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) [netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.111.Final.jar:4.1.111.Final]
	at java.base/java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedLedger public/ns1/persistent/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba has already been closed
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$8.openCursorFailed(PersistentTopic.java:2119) ~[classes/:?]
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncOpenCursor(ManagedLedgerImpl.java:942) ~[classes/:?]
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncOpenCursor(ManagedLedgerImpl.java:930) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.startReplicator(PersistentTopic.java:2104) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkReplication$69(PersistentTopic.java:1942) ~[classes/:?]
	... 14 more
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedLedger public/ns1/persistent/tp_-0f9aa339-4e8e-404c-8afd-ea49aad60dba has already been closed

Modifications

Do not retry starting the replicator once the topic has been closed.
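
The idea can be illustrated with a minimal, self-contained sketch. This is not the diff from this PR and does not use Pulsar's classes: `ManualScheduler`, `SketchTopic`, `tryStartReplicator`, and the `closed` flag are hypothetical stand-ins, and the 60-second delay is replaced by a queue that is drained by hand so the behavior is deterministic. It only shows the guard: a retry task checks whether the topic is closed before attempting to start the replicator and before rescheduling itself.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a close-aware retry loop; not the actual PersistentTopic code. */
public class RetryGuardSketch {

    /** Stand-in for the broker's delayed executor: tasks are queued and run on demand. */
    static final class ManualScheduler {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void schedule(Runnable task) {
            pending.add(task);
        }

        /** Runs the next queued task; returns false if nothing was queued. */
        boolean runNext() {
            Runnable task = pending.poll();
            if (task == null) {
                return false;
            }
            task.run();
            return true;
        }
    }

    /** Stand-in for a topic whose replicator can never be started. */
    static final class SketchTopic {
        private final ManualScheduler scheduler;
        private final AtomicBoolean closed = new AtomicBoolean(false);
        final AtomicInteger startAttempts = new AtomicInteger();

        SketchTopic(ManualScheduler scheduler) {
            this.scheduler = scheduler;
        }

        void close() {
            closed.set(true);
        }

        /** Always fails, mimicking a replicator that cannot start. */
        private boolean tryStartReplicator() {
            startAttempts.incrementAndGet();
            return false;
        }

        void checkReplicationAndRetryOnFailure() {
            if (closed.get()) {
                // Topic is closed: skip the attempt and do not reschedule.
                return;
            }
            if (!tryStartReplicator()) {
                // Start failed while the topic is still open: queue another retry.
                scheduler.schedule(this::checkReplicationAndRetryOnFailure);
            }
        }
    }

    public static void main(String[] args) {
        ManualScheduler scheduler = new ManualScheduler();
        SketchTopic topic = new SketchTopic(scheduler);

        topic.checkReplicationAndRetryOnFailure(); // attempt 1 fails, retry queued
        scheduler.runNext();                       // attempt 2 fails, retry queued
        topic.close();                             // "unload" the topic
        scheduler.runNext();                       // queued retry sees the closed topic and stops

        // Prints: attempts=2 moreRetries=false
        System.out.println("attempts=" + topic.startAttempts.get()
                + " moreRetries=" + scheduler.runNext());
    }
}
```

Without the `closed` check, the last queued task would make a third attempt and queue a fourth, which matches the endless loop described in the motivation.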

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: x

poorbarcode added the type/bug, release/3.0.7, and release/3.3.2 labels on Aug 30, 2024
poorbarcode added this to the 4.0.0 milestone on Aug 30, 2024
poorbarcode self-assigned this on Aug 30, 2024
github-actions bot added the doc-not-needed label on Aug 30, 2024
Member

@lhotari lhotari left a comment

LGTM, there's some duplication of code in isClose, but that's not critical.

@lhotari
Member

lhotari commented Aug 30, 2024

@poorbarcode please fix checkstyle

@poorbarcode
Contributor Author

> @poorbarcode please fix checkstyle

Thanks @lhotari for mentioning this to me ❤️

@codecov-commenter

codecov-commenter commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 58.82353% with 7 lines in your changes missing coverage. Please review.

Project coverage is 74.59%. Comparing base (bbc6224) to head (0958ce5).
Report is 560 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...sar/broker/service/persistent/PersistentTopic.java    58.82%    2 Missing and 5 partials ⚠️

@@             Coverage Diff              @@
##             master   #23237      +/-   ##
============================================
+ Coverage     73.57%   74.59%   +1.01%     
- Complexity    32624    34281    +1657     
============================================
  Files          1877     1924      +47     
  Lines        139502   144934    +5432     
  Branches      15299    15855     +556     
============================================
+ Hits         102638   108108    +5470     
+ Misses        28908    28563     -345     
- Partials       7956     8263     +307     
Flag        Coverage Δ
inttests    27.89% <5.88%> (+3.30%) ⬆️
systests    24.67% <5.88%> (+0.35%) ⬆️
unittests   73.94% <58.82%> (+1.09%) ⬆️

Files with missing lines                                 Coverage Δ
...sar/broker/service/persistent/PersistentTopic.java    79.43% <58.82%> (+0.97%) ⬆️

... and 552 files with indirect coverage changes

@poorbarcode poorbarcode requested review from gaoran10 and removed request for gaoran10 August 31, 2024 18:02
@Technoboy- Technoboy- merged commit aee2ee5 into apache:master Sep 2, 2024
51 checks passed
grssam pushed a commit to grssam/pulsar that referenced this pull request Sep 4, 2024
@poorbarcode poorbarcode deleted the fix/replication_after_closed branch September 5, 2024 03:25
poorbarcode added a commit that referenced this pull request Sep 5, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 5, 2024
…the topic (apache#23237)

(cherry picked from commit aee2ee5)
(cherry picked from commit 311b6af)
lhotari pushed a commit that referenced this pull request Sep 5, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 5, 2024
…the topic (apache#23237)

(cherry picked from commit aee2ee5)
(cherry picked from commit 311b6af)
Labels
cherry-picked/branch-3.0 cherry-picked/branch-3.3 doc-not-needed Your PR changes do not impact docs release/3.0.7 release/3.3.2 type/bug The PR fixed a bug or issue reported a bug
6 participants