[Bug][broker] MessageDeduplication replay timeout would cause topic loading stuck and become unavailable #23003
Closed
Labels
type/bug
Search before asking
Read release policy
Version
master
Minimal reproduce step
The broker enters the dead loop described under "What did you see instead?", and topic loading keeps failing.
What did you expect to see?
The topic becomes available.
What did you see instead?
The root cause is as follows:
#21540: this PR changed topic loading so that the topic is closed if loading times out after 60s.
#22479: this PR added logic to call takeSnapshot after MessageDeduplication replay, so that topic loading won't time out.
#22860: this PR refactored the topic loading process. Topic loading is no longer concurrent. If topic loading times out, the loading process runs sequentially as "create -> timeout -> close -> create".
However, topic loading still gets stuck. When topic loading times out, the topic is closed, but topic close and takeSnapshot execute concurrently, so takeSnapshot may throw an exception because the topic has already been closed. As a result, no snapshot is persisted, every retry has to replay the same entries in MessageDeduplication, and every retry hits the 60s timeout again.
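The loop described above can be illustrated with a small, self-contained model. This is only a sketch and does not use any Pulsar code: the class `DedupReplayLoopSketch`, the method `simulate`, and all of its parameters are hypothetical names invented for illustration. It assumes that the number of entries to replay is the backlog minus the last persisted snapshot position, that a load attempt succeeds only if that number fits within what can be replayed before the timeout, and that the snapshot taken after replay is persisted only if it does not race with topic close.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the retry loop; not Pulsar's implementation.
class DedupReplayLoopSketch {

    /**
     * Simulates repeated topic load attempts.
     *
     * @param backlog               entries that deduplication must replay (assumed value)
     * @param entriesPerTimeout     entries that can be replayed before the load timeout (assumed value)
     * @param snapshotSurvivesClose whether the snapshot after replay is persisted even though
     *                              the topic was closed on timeout (false models the bug)
     * @param maxAttempts           number of load attempts to simulate
     * @return the number of entries each attempt had to replay
     */
    static List<Integer> simulate(int backlog, int entriesPerTimeout,
                                  boolean snapshotSurvivesClose, int maxAttempts) {
        int snapshotPosition = 0; // last persisted dedup snapshot, kept across attempts
        List<Integer> replayedPerAttempt = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int toReplay = backlog - snapshotPosition;
            replayedPerAttempt.add(toReplay);
            if (toReplay <= entriesPerTimeout) {
                return replayedPerAttempt; // replay finished in time, topic is loaded
            }
            // Timeout: the topic is closed while replay is still running.
            // The snapshot that follows replay races with the close.
            if (snapshotSurvivesClose) {
                snapshotPosition = backlog; // progress is persisted for the next attempt
            }
            // Otherwise the snapshot fails and the next attempt starts from the same position.
        }
        return replayedPerAttempt;
    }

    public static void main(String[] args) {
        // Snapshot lost on every timeout: each attempt replays the full backlog.
        System.out.println(simulate(1000, 100, false, 3)); // prints [1000, 1000, 1000]
        // Snapshot persisted despite the close: the second attempt has nothing to replay.
        System.out.println(simulate(1000, 100, true, 3));  // prints [1000, 0]
    }
}
```

In this model the only way out of the loop is for replay progress to survive the timeout, which is why the interaction between topic close and takeSnapshot matters.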
The error log is:
Anything else?
No response
Are you willing to submit a PR?