Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [ml] fix wrong msg backlog of non-durable cursor after trim ledgers #21250

Merged

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Sep 25, 2023

Background

  • But after trimming ledgers, ml.lastConfirmedPosition relies on a deleted ledger when the current ledger of ML is empty.
  • Cursor prevents setting markDeletedPosition to a value larger than ml.lastConfirmedPosition, but there are no entries to read[1].
  • The code description in the method advanceCursors said: do not make cursor.markDeletedPosition larger than ml.lastConfirmedPosition[2]

Issue

If there is no durable cursor, the markDeletedPosition might be set to {current_ledger, -1}, and async mark delete will be prevented by the rule-2 above. So he backlog, readPosition, and markDeletedPosition of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test testTrimLedgerIfNoDurableCursor


[1]: https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1979-L2000

if (((PositionImpl) ledger.getLastConfirmedEntry()).compareTo(newPosition) < 0) {
    boolean shouldCursorMoveForward = false;
    try {
        long ledgerEntries = ledger.getLedgerInfo(markDeletePosition.getLedgerId()).get().getEntries();
        Long nextValidLedger = ledger.getNextValidLedger(ledger.getLastConfirmedEntry().getLedgerId());
        shouldCursorMoveForward = nextValidLedger != null
                && (markDeletePosition.getEntryId() + 1 >= ledgerEntries)
                && (newPosition.getLedgerId() == nextValidLedger);
    } catch (Exception e) {
        log.warn("Failed to get ledger entries while setting mark-delete-position", e);
    }

    if (shouldCursorMoveForward) {
        log.info("[{}] move mark-delete-position from {} to {} since all the entries have been consumed",
                ledger.getName(), markDeletePosition, newPosition);
    } else {
        if (log.isDebugEnabled()) {
            log.debug("[{}] Failed mark delete due to invalid markDelete {} is ahead of last-confirmed-entry {}"
                     + " for cursor [{}]", ledger.getName(), position, ledger.getLastConfirmedEntry(), name);
        }
        callback.markDeleteFailed(new ManagedLedgerException("Invalid mark deleted position"), ctx);
        return;
    }
}

[2]: https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2861-L2863

// move the mark delete position to the highestPositionToDelete only if it is smaller than the add confirmed
// to prevent the edge case where the cursor is caught up to the latest and highestPositionToDelete may be
// larger than the last add confirmed

Modifications

Do not make cursor.markDeletedPosition larger than ml.lastConfirmedPosition when advancing non-durable cursors.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Sep 25, 2023
@poorbarcode poorbarcode added type/bug The PR fixed a bug or issue reported a bug release/3.0.2 release/2.11.3 release/2.10.6 labels Sep 25, 2023
@poorbarcode poorbarcode added this to the 3.2.0 milestone Sep 25, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 25, 2023
@poorbarcode poorbarcode changed the title [draft] [fix] [broker] fix wrong msg backlog of non-durable cursor after trim ledgers [fix] [broker] fix wrong msg backlog of non-durable cursor after trim ledgers Sep 26, 2023
@poorbarcode poorbarcode changed the title [fix] [broker] fix wrong msg backlog of non-durable cursor after trim ledgers [fix] [ml] fix wrong msg backlog of non-durable cursor after trim ledgers Sep 26, 2023
@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@poorbarcode poorbarcode force-pushed the fix/reader_invalid_read_pos2 branch from 035be8e to 01d9e37 Compare September 28, 2023 15:56
@poorbarcode
Copy link
Contributor Author

rebase master

@codelipenghui
Copy link
Contributor

/pulsarbot run-failure-checks

@poorbarcode poorbarcode force-pushed the fix/reader_invalid_read_pos2 branch from 01d9e37 to ef2f8a0 Compare October 8, 2023 05:55
@poorbarcode
Copy link
Contributor Author

rebase master

@poorbarcode poorbarcode force-pushed the fix/reader_invalid_read_pos2 branch from ef2f8a0 to f0c7d02 Compare October 8, 2023 09:44
@poorbarcode
Copy link
Contributor Author

rebase master

@codecov-commenter
Copy link

Codecov Report

Merging #21250 (f0c7d02) into master (dbb1577) will increase coverage by 36.52%.
Report is 5 commits behind head on master.
The diff coverage is 77.77%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21250       +/-   ##
=============================================
+ Coverage     36.69%   73.21%   +36.52%     
- Complexity    12193    32512    +20319     
=============================================
  Files          1698     1887      +189     
  Lines        130509   140233     +9724     
  Branches      14261    15440     +1179     
=============================================
+ Hits          47889   102675    +54786     
+ Misses        76281    29463    -46818     
- Partials       6339     8095     +1756     
Flag Coverage Δ
inttests 24.08% <9.59%> (-0.05%) ⬇️
systests 24.74% <8.08%> (+0.05%) ⬆️
unittests 72.51% <77.77%> (+40.63%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 78.81% <100.00%> (+33.64%) ⬆️
.../bookkeeper/mledger/impl/NonDurableCursorImpl.java 85.41% <100.00%> (ø)
...ookie/rackawareness/BookieRackAffinityMapping.java 80.71% <100.00%> (+10.71%) ⬆️
...n/java/org/apache/pulsar/broker/PulsarService.java 82.69% <100.00%> (+14.01%) ⬆️
...ker/loadbalance/extensions/models/TopKBundles.java 90.78% <100.00%> (+90.78%) ⬆️
...org/apache/pulsar/broker/loadbalance/LoadData.java 91.66% <0.00%> (+25.00%) ⬆️
...ache/pulsar/broker/namespace/NamespaceService.java 71.94% <75.00%> (+28.30%) ⬆️
...xtensions/channel/ServiceUnitStateChannelImpl.java 84.48% <58.82%> (+83.94%) ⬆️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 80.71% <57.89%> (+32.78%) ⬆️
...dbalance/extensions/ExtensibleLoadManagerImpl.java 77.75% <82.39%> (+75.66%) ⬆️

... and 1445 files with indirect coverage changes

@poorbarcode poorbarcode merged commit ca77982 into apache:master Oct 8, 2023
liangyuanpeng pushed a commit to liangyuanpeng/pulsar that referenced this pull request Oct 11, 2023
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty. 
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.
poorbarcode added a commit that referenced this pull request Oct 11, 2023
…gers (#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
poorbarcode added a commit that referenced this pull request Oct 11, 2023
…gers (#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
poorbarcode added a commit that referenced this pull request Oct 11, 2023
…gers (#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
vinayakmalik95 pushed a commit to tmdc-io/pulsar that referenced this pull request Oct 12, 2023
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty. 
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 15, 2024
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…gers (apache#21250)

### Background
- But after trimming ledgers, `ml.lastConfirmedPosition` relies on a deleted ledger when the current ledger of ML is empty.
- Cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, but there are no entries to read<sup>[1]</sup>.
- The code description in the method `advanceCursors` said: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by the `rule-2` above. So he `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledger. You can reproduce it by the test `testTrimLedgerIfNoDurableCursor`

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants