-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the robustness of old ZooKeeper log removal #1040
Conversation
Motivation: `OldLogRemover` in `ZooKeeperCommandExecutor` currently catches a `Throwable` when deleting an old log or its log blocks. However, it has two issues doing so: - It doesn't handle an exception that's raised when reading the metadata of the old log. - `Throwable` is way too wide exception to catch. Catching a `KeeperException` whose code is `NONODE` will be enough. - Note that the failure will only transfer the leadership to other replica, rather than stopping the whole replication process. Modifications: - `OldLogRemover` now catches `KeeperException` whose code is `NONODE` only. - An attempt to read a missing log node's metadata is now handled properly. - Added more detail to the log messages about missing nodes - Split `deleteLog()` into `deleteLog()` and `deleteLogBlock()` Result: - The leadership is not transferred anymore when `OldLogRemover` attempts to retrieve a missing log node's metadata, which is not really a critical issue. - Instead, the leadership will be transferred when an exception occurs not because of a missing node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice. Thanks, @trustin! 🙇♂️🙇♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I'm not sure how a null LogMeta
is possible, but agree with the isolation/additional logging 👍
Question: Is there any chance that the replica, which receives the leadership, raises the same exception? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…rnings (#1070) **Motivation** With the improvement introduced by #1040, we have been observing logs where CentralDogma attempts to delete logs that don't exist. In order to 1) verify which nodes are being prematurely deleted and 2) to observe trends as to why this may occur, I propose that `LogMeta#toString` be implemented.
Motivation:
OldLogRemover
inZooKeeperCommandExecutor
currently catches aThrowable
when deleting an old log or its log blocks. However, it has two issues doing so:Throwable
is way too wide exception to catch. Catching aKeeperException
whose code isNONODE
will be enough.Modifications:
OldLogRemover
now catchesKeeperException
whose code isNONODE
only.deleteLog()
intodeleteLog()
anddeleteLogBlock()
Result:
OldLogRemover
attempts to retrieve a missing log node's metadata, which is not really a critical issue.