[fix][broker] Fix broker cache eviction of entries read by active cursors #17273
LGTM
great work
I marked this as draft until I have fixed the non-durable cursor behavior. The intended behavior is described in #6787, where this was added. It seems that broker caching should take non-durable cursors into account, but ledger trimming should not. #12045 fixed the behavior for non-durable cursor caching as a side effect, although that wasn't explained in the PR itself.
This is ready for review now.
// if removed subscription was the slowest subscription : update cursor and let it clear cache:
// till new slowest-cursor's read-position
discardEntriesFromCache((ManagedCursorImpl) activeCursors.getSlowestReader(),
        getPreviousPosition((PositionImpl) activeCursors.getSlowestReader().getReadPosition()));
@michaeljmarshall @cdbartholomew These are the lines that I think were completely invalid previously and could lead to the cache being evicted unintentionally.
…and apache#16605 - apache#17195 changed the method signature that apache#16605 depended upon
LGTM. I like the updated design @lhotari.
Hello @lhotari
Great work!!! @lhotari
…sors (#17273)
* [fix][broker] Fix broken build caused by conflict between #17195 and #16605
  - #17195 changed the method signature that #16605 depended upon
* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false
  Fixes #16054
  - calculate the sorted list of when a read position gets updated
  - this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
  - remove invalid tests
  - fix tests
  - add more tests to handle corner cases
* Address review comment
* Handle durable & non-durable in the correct way
* Fix cache tests since now entries get evicted reactively
* Address review comment about method names
* Change signature for add method so that position must be passed
  - this is more consistent with cursorUpdated method where the position is passed
* Update javadoc for ManagedCursorContainer
* Address review comment
* Simplify ManagedCursorContainer
* Clarify javadoc
* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally
* Prevent race in updating activeCursors
@lhotari hi, could you please cherry-pick this PR to branch-2.9? Thanks.
…sors (#17273)
* [fix][broker] Fix broken build caused by conflict between #17195 and #16605
  - #17195 changed the method signature that #16605 depended upon
* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false
  Fixes #16054
  - calculate the sorted list of when a read position gets updated
  - this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
  - remove invalid tests
  - fix tests
  - add more tests to handle corner cases
* Address review comment
* Handle durable & non-durable in the correct way
* Fix cache tests since now entries get evicted reactively
* Address review comment about method names
* Change signature for add method so that position must be passed
  - this is more consistent with cursorUpdated method where the position is passed
* Update javadoc for ManagedCursorContainer
* Address review comment
* Simplify ManagedCursorContainer
* Clarify javadoc
* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally
* Prevent race in updating activeCursors
(cherry picked from commit 856ef15)
Fixes #16054
Fixes #9958
Motivation
Broker cache eviction of entries read by active cursors has been broken in Pulsar since version 2.8.2.
The regression was introduced by PR #12045, a change intended to optimize broker cache behavior because the eviction task had a high overhead (reported as issue #9958 and also by multiple users on Pulsar Slack at the time).
The effect of PR #12045 was partially mitigated by a new feature, "cacheEvictionByMarkDeletedPosition", added by PR #14985. However, that feature introduces its own issue: entries can stay cached unnecessarily and for too long.
The main motivation of this PR is to fix and restore broker cache eviction of entries based on the earliest read position of the active cursors (consumers). It also does this efficiently, so that the original intention of PR #12045, resolving the performance issue reported as #9958, is still covered.
Modifications
Add a new ManagedCursorContainer instance to ManagedLedgerImpl that keeps a sorted list of cursors ordered by read position.
Change ManagedLedgerImpl and ManagedCursorImpl so that the ledger instance is notified when a cursor's read position gets updated. This makes it possible to react to the changes and re-sort the cursors only when the data used for sorting has changed. The same reactive approach is already used for the mark-delete position. It fixes the performance issue in the most efficient way.
Some hard-to-understand tests were added in #14985 ("Add a cache eviction policy: Evicting cache data by the slowest markDeletedPosition"). Those tests have been replaced by simpler tests that will prevent future regressions.
Some tests contained invalid assertions, perhaps caused by earlier bugs in eviction by read position. These have been fixed.
More tests have been added to cover possible corner cases.
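The reactive approach described above can be illustrated with a minimal sketch. It is not the actual ManagedCursorContainer code: the class `SortedCursorsSketch`, its method bodies, and the use of plain `long` values as positions are assumptions made for illustration only. The sketch keeps cursors ordered by read position, re-sorts only when a position is reported as changed, and tells the caller whether the slowest read position moved, so the caller can evict cache entries that every active cursor has already read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a container of cursors sorted by read position.
class SortedCursorsSketch {
    // cursor name -> last reported read position (a plain long in this sketch)
    private final Map<String, Long> positionByCursor = new HashMap<>();
    // read position -> cursors currently at that position, ordered by position
    private final TreeMap<Long, TreeSet<String>> cursorsByPosition = new TreeMap<>();

    // Registers a cursor; the position must be passed explicitly.
    synchronized void add(String cursor, long readPosition) {
        remove(cursor);
        positionByCursor.put(cursor, readPosition);
        cursorsByPosition.computeIfAbsent(readPosition, p -> new TreeSet<>()).add(cursor);
    }

    synchronized void remove(String cursor) {
        Long old = positionByCursor.remove(cursor);
        if (old != null) {
            TreeSet<String> atOld = cursorsByPosition.get(old);
            atOld.remove(cursor);
            if (atOld.isEmpty()) {
                cursorsByPosition.remove(old);
            }
        }
    }

    // Called when a tracked cursor's read position changes.
    // Returns true only if the slowest read position changed as a result.
    synchronized boolean cursorUpdated(String cursor, long newReadPosition) {
        if (!positionByCursor.containsKey(cursor)) {
            return false; // untracked cursors are ignored in this sketch
        }
        long before = slowestReadPosition();
        add(cursor, newReadPosition);
        return slowestReadPosition() != before;
    }

    // Returns the smallest read position, or -1 when no cursor is tracked.
    synchronized long slowestReadPosition() {
        return cursorsByPosition.isEmpty() ? -1 : cursorsByPosition.firstKey();
    }

    public static void main(String[] args) {
        SortedCursorsSketch cursors = new SortedCursorsSketch();
        TreeMap<Long, String> cache = new TreeMap<>();
        for (long p = 0; p < 20; p++) {
            cache.put(p, "entry-" + p);
        }
        cursors.add("sub-a", 5);
        cursors.add("sub-b", 10);
        // Evict only when the slowest read position actually moved.
        if (cursors.cursorUpdated("sub-a", 12)) {
            // headMap is exclusive: entries strictly before the slowest read position are dropped
            cache.headMap(cursors.slowestReadPosition()).clear();
        }
        System.out.println("first cached position: " + cache.firstKey()); // prints 10
    }
}
```

In this sketch the eviction cost is paid only when the minimum read position changes, which is the property the reactive design aims for; a periodic task that re-scans all cursors is not needed.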