Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix broker cache eviction of entries read by active cursors #17273

Merged
merged 15 commits into from
Aug 31, 2022

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Aug 25, 2022

Fixes #16054
Fixes #9958

Motivation

The broker cache eviction of entries read by active cursors has been broken in Pulsar since 2.8.2 version.
This broke in the PR #12045 changes. It was a change to optimize the broker cache behavior where there was a high over head of the eviction task (issue reported as #9958 and also reported on Pulsar Slack by multiple users during that time).
PR #12045 change was partially mitigated by a new feature "cacheEvictionByMarkDeletedPosition" added by PR #14985. However that introduces an issue that entries could get cached for too long and unnecessarily.

The main motivation of this PR is to fix and restore broker cache eviction of entries based on the earliest read position of active cursors (consumers) and also handle this in an efficient and performant way so that the original intention of PR #12045 will be covered, the performance issue reported as #9958.

Modifications

  • add a new ManagedCursorContainer instance to ManagedLedgerImpl that keeps a sorted list of cursors ordered by the read position

  • make changes to ManagedLedgerImpl and ManagedCursorImpl to notify the ledger instance when the cursor's read position gets updated. This makes it possible to react to the changes and sort the cursors only when there has been changes to the data that is used for sorting. This reactive approach has been used for the mark delete position and this exists. It fixes the performance issue in the most efficient way.

  • There were some hard to understand tests added in Add a cache eviction policy:Evicting cache data by the slowest markDeletedPosition #14985. Those tests have been replaced by simpler tests that will prevent future regressions.

  • Some tests contained invalid assertions perhaps caused by earlier bugs in the eviction by read position. This has been fixed.

  • More tests have been added to cover possible corner cases

@lhotari lhotari added type/bug The PR fixed a bug or issue reported a bug area/broker doc-not-needed Your PR changes do not impact docs labels Aug 25, 2022
@lhotari lhotari self-assigned this Aug 25, 2022
@lhotari lhotari requested a review from hangc0276 August 25, 2022 07:57
@lhotari lhotari added this to the 2.12.0 milestone Aug 25, 2022
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

great work

@lhotari lhotari marked this pull request as draft August 25, 2022 10:51
@lhotari
Copy link
Member Author

lhotari commented Aug 25, 2022

I marked this as draft until I have fixed the non-durable cursor behavior. The intended behavior is described in #6787 where this was added.

It seems that broker caching should take non-durable cursors into account, but it shouldn't be taken into account for trimming ledgers.

#12045 fixed the behavior for non-durable cursor caching as a side-effect. However, that wasn't explained in the PR itself.

@lhotari lhotari marked this pull request as ready for review August 25, 2022 19:19
@lhotari
Copy link
Member Author

lhotari commented Aug 25, 2022

This is ready for review now.

Comment on lines -3529 to -3537
// if removed subscription was the slowest subscription : update cursor and let it clear cache:
// till new slowest-cursor's read-position
discardEntriesFromCache((ManagedCursorImpl) activeCursors.getSlowestReader(),
getPreviousPosition((PositionImpl) activeCursors.getSlowestReader().getReadPosition()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaeljmarshall @cdbartholomew These are the lines that I think that were completely invalid previously and could lead to the cache being evicted unintentionally.

eolivelli added a commit to datastax/pulsar that referenced this pull request Aug 29, 2022
Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I like the updated design @lhotari.

@lhotari lhotari merged commit 856ef15 into apache:master Aug 31, 2022
@mattisonchao
Copy link
Member

Hello @lhotari
Could you help to cherry-pick this PR to branch-2.9 (To avoid cherry-picking involving bugs)

@HQebupt
Copy link
Contributor

HQebupt commented Sep 13, 2022

Great work!!! @lhotari

@HQebupt
Copy link
Contributor

HQebupt commented Sep 15, 2022

Hello @lhotari
Could you help to cherry-pick this PR to branch-2.9 (To avoid cherry-picking involving bugs)

I can do it if you agree. @lhotari

Technoboy- pushed a commit that referenced this pull request Oct 31, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors
@congbobo184
Copy link
Contributor

congbobo184 commented Nov 15, 2022

@lhotari hi, could you please cherry-pick this PR to branch-2.9? thanks.

congbobo184 pushed a commit that referenced this pull request Nov 17, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors

(cherry picked from commit 856ef15)
@congbobo184 congbobo184 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Nov 17, 2022
congbobo184 pushed a commit that referenced this pull request Dec 7, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors

(cherry picked from commit 856ef15)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants