[fix][broker] Key-shared subscription must follow consumer redelivery as per shared sub semantic #21657

Open. Wants to merge 5 commits into base: master.
Conversation

@rdhabalia (Contributor) commented Dec 2, 2023

Fixes #21656

Motivation

A SHARED or KEY_SHARED subscription must dispatch redelivered messages in every scenario: any shared subscription should dispatch already-delivered unacked messages. The broker can follow strict ordering for new messages it reads for the first time by advancing the cursor's readPosition, but it can still dispatch already-delivered unacked messages whenever required, without restricting any scenario.

However, the key-shared subscription handles redelivered messages incorrectly: it keeps reading redelivered messages, discards them, and does not dispatch a single message to the consumer, which incorrectly changes the semantics of consumer delivery ordering. The broker does not dispatch a redelivered message if that message-id is smaller than the offset-message-id assigned to the consumer when it joined. The broker assigns the cursor's current read position as the consumer's minimum message-id offset to manage ordering, but a previously delivered message-id can be smaller than that position, and redelivery should not be restricted by ordering, per the shared-subscription semantics discussed above. Because the broker handles this incorrectly for key-shared, topics with key-shared subscriptions whose connected consumers have positive permits cannot receive any messages and dispatching is stuck. The broker also keeps performing the same cold reads across those stuck topics, wasting storage and CPU resources by discarding the messages it reads. This impacts the application, the broker, and the bookies, and such handling is both semantically and practically invalid.

Right now, multiple topics with key-shared subscriptions and redelivered messages can significantly impact brokers and bookies by repeatedly reading large numbers of messages without dispatching them, while client applications cannot consume any messages, which impacts those applications significantly as well.
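As a hedged illustration of the gating described above, here is a simplified, hypothetical model of the "recently joined consumers" rule (names are illustrative, not Pulsar's actual API): a consumer that joins while older messages are still unacked records the cursor's read position at join time, and entries at or beyond that position are withheld from it until everything before that position is acknowledged, which is how replayed entries can end up read and discarded in a loop.

```java
/**
 * Hypothetical, simplified model of the "recently joined consumers" gate in a
 * key-shared dispatcher. Names are illustrative; this is not Pulsar's code.
 */
public class RecentlyJoinedGateModel {

    /**
     * Returns true if a message at messagePosition may be dispatched to a
     * consumer whose recorded join position is joinedPosition (null means the
     * consumer is not gated, i.e. it was not "recently joined").
     */
    static boolean canDispatch(Long joinedPosition, long messagePosition) {
        return joinedPosition == null || messagePosition < joinedPosition;
    }

    public static void main(String[] args) {
        // An ungated consumer can take any entry.
        assert canDispatch(null, 12L);
        // A gated consumer (joined at position 10) may receive earlier entries...
        assert canDispatch(10L, 5L);
        // ...but entries at or after its join position are withheld. When the
        // replay set contains only such entries for its keys, the dispatcher
        // reads and discards the same entries repeatedly: the stuck state
        // this PR describes.
        assert !canDispatch(10L, 12L);
        System.out.println("gate model ok");
    }
}
```

The sketch only models the filter decision; the real dispatcher combines it with key-to-consumer assignment and the replay queue.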

Modifications

Allow dispatching of redelivered messages, avoid reading and discarding duplicate messages, and fix the dispatcher getting stuck on unacked messages.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@rdhabalia rdhabalia added area/broker doc-not-needed Your PR changes do not impact docs ready-to-test labels Dec 2, 2023
@rdhabalia rdhabalia self-assigned this Dec 2, 2023
@github-actions github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Dec 2, 2023

github-actions bot commented Dec 2, 2023

@rdhabalia Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Dec 2, 2023

@joeCarf joeCarf left a comment


Hi, it seems there are test failures.

Producer<Integer> producer = createProducer(topic, enableBatch);
int count = 0;
for (int i = 0; i < 10; i++) {
// Send the same key twice so that we'll have a batch message
Contributor:

I see that enableBatch is false; did you want to also add that case?

Contributor Author:

removed those comments.

}

@Cleanup
Consumer<Integer> consumer2 = createConsumer(topic);
Contributor:

consumer2 will be closed by Lombok after its last usage.

The point in time at which you close the consumer may alter the execution of the test. What about closing the consumers explicitly instead of using Lombok?

Contributor Author:

consumer2 needs to stay open for the test to consume all messages, and it can then be cleaned up by Lombok; we don't have to close it explicitly, and it won't impact the test.

@@ -1630,4 +1630,63 @@ public void testContinueDispatchMessagesWhenMessageDelayed() throws Exception {
log.info("Got {} other messages...", sum);
Assert.assertEquals(sum, delayedMessages + messages);
}

@Test
public void test()
Contributor:

Can we add a more meaningful test?

Contributor Author:

Oops, sorry. I just added this test while creating the issue; let me fix the tests and naming.

Contributor Author:

done.

@codelipenghui codelipenghui added this to the 3.2.0 milestone Dec 5, 2023
// message [2,3] is lower than the recentJoinedPosition 4,
// so the message [2,3] will be dispatched to consumer2,
// but the message [2,3] should not be dispatched to consumer2.

if (readType == ReadType.Replay) {
@poorbarcode (Contributor) commented Dec 5, 2023:

However, the key-shared subscription is incorrectly handling redelivered messages by reading redelivered messages, discarding them, and not dispatching any single messages to the consumer by incorrectly changing the semantics of the consumer delivery order. Broker doesn't dispatch redelivery message if that message-id is smaller than the consumer's assigned offset-message-id when joined.

This code guarantees message ordering in the scenario below:

Initial state: Consumer 1 handles k1,k2 (recent-join: null); Consumer 2 handles k3,k4 (recent-join: null).

1. Consumer 1 receives M1(k1), M2(k2); Consumer 2 receives 1000 messages (M3(k3)...M1002(k3)).
2. Consumer 3 is added and is assigned k2 (taken from Consumer 1).
   State: Consumer 1 handles k1; Consumer 2 handles k3,k4; Consumer 3 handles k2 (recent-join: M1002).
3. Consumer 2 is closed; k3 is reassigned to Consumer 1 and k4 to Consumer 3.
   State: Consumer 1 handles k1,k3 (recent-join: null); Consumer 3 handles k2,k4 (recent-join: M1002).
4. Consumer 1 receives M3(k3)...M1000(k3); its incoming queue is now full.
5. Consumer 4 is added and is assigned k3 (taken from Consumer 1).
   State: Consumer 1 handles k1 (recent-join: null); Consumer 3 handles k2,k4 (recent-join: M1002); Consumer 4 handles k3 (recent-join: M1002).
6. Consumer 4 receives M1001(k3)...M1002(k3).

I think we should solve the issue above first, then try to improve here.

Related to #20776, please take a look

Contributor Author:

@poorbarcode can you share the URL where we have defined the contract of the key-shared sub?

Contributor:

@poorbarcode can you share the URL where we have defined the contract of the key-shared sub?

I don't know which URL you wanted; do you mean the Key_Shared Subscription doc?

Contributor Author:

@poorbarcode I mean, do we have any document where we state what kind of ordering guarantee we provide to users? Because, as I said in the issue, once a consumer is closed, the broker can redeliver that consumer's unacked messages without considering ordering, instead of blocking forever.
So, I just want to see if we have any doc that defines what a user can expect in terms of ordering for the key-shared sub. I checked the URL you shared earlier, but it doesn't talk about ordering or redelivery ordering.
With the latest commit, this PR maintains the ordering guarantee, but it also handles redelivery of a closed consumer's unacked messages without blocking the dispatcher forever.

So, if we have a contract defined, we can check whether this PR violates the user contract for the key-shared subscription, because right now the key-shared sub is not usable and it is wasting a lot of broker/bookie resources.


@rdhabalia (Contributor, author) commented Dec 6, 2023:

Thanks @Technoboy- for sharing the link; I was searching for this documentation about the message-ordering guarantee for key-shared.

The broker will start delivering messages to the new consumer only when all messages up to the read position have been acknowledged. This will guarantee that a certain key is processed by a single consumer at any given time. The trade-off is that if one of the existing consumers is stuck and no time-out was defined (acknowledging for you), the new consumer won't receive any messages until the stuck consumer resumes or gets disconnected.

As documented, the new consumer won't receive any messages until the stuck consumer resumes or gets disconnected. So it must receive messages once the other consumer gets disconnected.
However, right now dispatching gets stuck when a consumer gets disconnected, and this PR has a test to reproduce that. This PR fixes exactly that issue: it unblocks dispatching when a consumer disconnects and redelivers that consumer's unacked messages.

So, this PR should fix that fundamental issue and unblock consumers that should not be stuck.

Member:

So, this PR should fix that fundamental issue and unblock consumers that should not be stuck.

Great points @rdhabalia. We can continue to resolve this issue as part of PIP-379; #23309 is the PIP document.

Contributor Author:

@lhotari you updated a contract similar to this PR and adopted the same behavior in #23309. Then why was this PR blocked for 5 months before you closed it?
I really don't have words for what's going on. Does it look good for the people who blocked this PR to come up with a similar approach and not let other people's work move forward? You also know that the same thing keeps happening in other PRs, and you have witnessed this kind of thing in Pulsar very recently.
@lhotari I don't want to target anyone here, but I want to ask a simple question: does it look good to take such actions? Does it make any difference to their lives? Because I really don't understand what's been going on in this project recently.

@codecov-commenter commented Dec 6, 2023

Codecov Report

Attention: Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.35%. Comparing base (2bf1354) to head (f419701).
Report is 793 commits behind head on master.

Files with missing lines Patch % Lines
...ersistentStickyKeyDispatcherMultipleConsumers.java 72.72% 1 Missing and 2 partials ⚠️
Additional details and impacted files


@@             Coverage Diff              @@
##             master   #21657      +/-   ##
============================================
+ Coverage     73.24%   73.35%   +0.11%     
- Complexity    32752    32759       +7     
============================================
  Files          1893     1893              
  Lines        140730   140760      +30     
  Branches      15500    15504       +4     
============================================
+ Hits         103071   103260     +189     
+ Misses        29563    29394     -169     
- Partials       8096     8106      +10     
Flag Coverage Δ
inttests 24.11% <0.00%> (?)
systests 24.70% <0.00%> (+0.03%) ⬆️
unittests 72.66% <75.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...a/org/apache/bookkeeper/mledger/ManagedCursor.java 42.85% <ø> (ø)
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 78.95% <ø> (-0.18%) ⬇️
...ava/org/apache/pulsar/broker/service/Consumer.java 86.30% <100.00%> (+0.02%) ⬆️
...ersistentStickyKeyDispatcherMultipleConsumers.java 81.73% <72.72%> (-2.93%) ⬇️

... and 103 files with indirect coverage changes

@codelipenghui (Contributor) commented Dec 7, 2023

Hi @rdhabalia,

I can confirm this is an issue that we need to fix.
I tried on my laptop to find a simpler solution for it.

The main idea of this solution is to remove the consumer from the recently joined consumer map if there are redelivered messages (from the redeliver method or from a disconnect) greater than the joined position of the consumer. PTAL.

diff --git a/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java
index 8f05530f58..b1ffe596b8 100644
--- a/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java
+++ b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java
@@ -53,6 +53,7 @@ import org.apache.pulsar.common.api.proto.CommandSubscribe.SubType;
 import org.apache.pulsar.common.api.proto.KeySharedMeta;
 import org.apache.pulsar.common.api.proto.KeySharedMode;
 import org.apache.pulsar.common.util.FutureUtil;
+import org.apache.pulsar.common.util.collections.ConcurrentLongLongPairHashMap;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -138,6 +139,9 @@ public class PersistentStickyKeyDispatcherMultipleConsumers extends PersistentDi
 
     @Override
     public synchronized void removeConsumer(Consumer consumer) throws BrokerServiceException {
+        consumer.getPendingAcks().keys().stream().max(ConcurrentLongLongPairHashMap.LongPair::compareTo).ifPresent(longPair -> {
+            removeConsumerFromRecentlyJoinedConsumersByPosition(PositionImpl.get(longPair.first, longPair.second));
+        });
         // The consumer must be removed from the selector before calling the superclass removeConsumer method.
         // In the superclass removeConsumer method, the pending acks that the consumer has are added to
         // redeliveryMessages. If the consumer has not been removed from the selector at this point,
@@ -327,7 +331,7 @@ public class PersistentStickyKeyDispatcherMultipleConsumers extends PersistentDi
             isDispatcherStuckOnReplays = true;
             return true;
         }  else if (currentThreadKeyNumber == 0) {
-            return true;
+            return totalBytesSent != 0;
         }
         return false;
     }
@@ -404,26 +408,37 @@ public class PersistentStickyKeyDispatcherMultipleConsumers extends PersistentDi
         });
     }
 
-    private boolean removeConsumersFromRecentJoinedConsumers() {
-        Iterator<Map.Entry<Consumer, PositionImpl>> itr = recentlyJoinedConsumers.entrySet().iterator();
+    @Override
+    public synchronized void redeliverUnacknowledgedMessages(Consumer consumer, List<PositionImpl> positions) {
+        positions.stream().max(PositionImpl::compareTo).ifPresent(this::removeConsumerFromRecentlyJoinedConsumersByPosition);
+        super.redeliverUnacknowledgedMessages(consumer, positions);
+    }
+
+    private boolean removeConsumerFromRecentlyJoinedConsumersByPosition(PositionImpl position) {
         boolean hasConsumerRemovedFromTheRecentJoinedConsumers = false;
-        PositionImpl mdp = (PositionImpl) cursor.getMarkDeletedPosition();
-        if (mdp != null) {
-            PositionImpl nextPositionOfTheMarkDeletePosition =
-                    ((ManagedLedgerImpl) cursor.getManagedLedger()).getNextValidPosition(mdp);
-            while (itr.hasNext()) {
-                Map.Entry<Consumer, PositionImpl> entry = itr.next();
-                if (entry.getValue().compareTo(nextPositionOfTheMarkDeletePosition) <= 0) {
-                    itr.remove();
-                    hasConsumerRemovedFromTheRecentJoinedConsumers = true;
-                } else {
-                    break;
-                }
+        PositionImpl positionToRemove =
+                ((ManagedLedgerImpl) cursor.getManagedLedger()).getNextValidPosition(position);
+        Iterator<Map.Entry<Consumer, PositionImpl>> itr = recentlyJoinedConsumers.entrySet().iterator();
+        while (itr.hasNext()) {
+            Map.Entry<Consumer, PositionImpl> entry = itr.next();
+            if (entry.getValue().compareTo(positionToRemove) <= 0) {
+                itr.remove();
+                hasConsumerRemovedFromTheRecentJoinedConsumers = true;
+            } else {
+                break;
             }
         }
         return hasConsumerRemovedFromTheRecentJoinedConsumers;
     }
 
+    private boolean removeConsumersFromRecentJoinedConsumers() {
+        PositionImpl mdp = (PositionImpl) cursor.getMarkDeletedPosition();
+        if (mdp != null) {
+            return removeConsumerFromRecentlyJoinedConsumersByPosition(mdp);
+        }
+        return false;
+    }
+
     @Override
     protected synchronized NavigableSet<PositionImpl> getMessagesToReplayNow(int maxMessagesToRead) {
         if (isDispatcherStuckOnReplays) {
diff --git a/pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java b/pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java
index 18fb141be3..1d462ee884 100644
--- a/pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java
+++ b/pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java
@@ -48,6 +48,8 @@ import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
+
+import io.swagger.util.Json;
 import lombok.Cleanup;
 import org.apache.bookkeeper.mledger.impl.PositionImpl;
 import org.apache.pulsar.broker.service.Topic;
@@ -1630,4 +1632,67 @@ public class KeySharedSubscriptionTest extends ProducerConsumerBase {
         log.info("Got {} other messages...", sum);
         Assert.assertEquals(sum, delayedMessages + messages);
     }
+
+    @Test(invocationCount = 10)
+    public void test()
+            throws Exception {
+        String topic = "persistent://public/default/key_shared-" + UUID.randomUUID();
+        boolean enableBatch = false;
+        Set<Integer> values = new HashSet<>();
+
+        @Cleanup
+        Consumer<Integer> consumer1 = createConsumer(topic);
+
+        @Cleanup
+        Producer<Integer> producer = createProducer(topic, enableBatch);
+        int count = 0;
+        for (int i = 0; i < 10; i++) {
+            // Send the same key twice so that we'll have a batch message
+            String key = String.valueOf(random.nextInt(NUMBER_OF_KEYS));
+            producer.newMessage().key(key).value(count++).send();
+        }
+
+        @Cleanup
+        Consumer<Integer> consumer2 = createConsumer(topic);
+
+        for (int i = 0; i < 10; i++) {
+            // Send the same key twice so that we'll have a batch message
+            String key = String.valueOf(random.nextInt(NUMBER_OF_KEYS));
+            producer.newMessage().key(key).value(count++).send();
+        }
+
+        @Cleanup
+        Consumer<Integer> consumer3 = createConsumer(topic);
+
+        consumer2.redeliverUnacknowledgedMessages();
+
+        for (int i = 0; i < 10; i++) {
+            // Send the same key twice so that we'll have a batch message
+            String key = String.valueOf(random.nextInt(NUMBER_OF_KEYS));
+            producer.newMessage().key(key).value(count++).send();
+        }
+        consumer1.close();
+
+        for(int i = 0; i < count; i++) {
+            Message<Integer> msg = consumer2.receive(10, TimeUnit.SECONDS);
+            if (msg!=null) {
+                values.add(msg.getValue());
+            } else {
+                break;
+            }
+        }
+        for(int i = 0; i < count; i++) {
+            Message<Integer> msg = consumer3.receive(10, TimeUnit.SECONDS);
+            if (msg!=null) {
+                values.add(msg.getValue());
+            } else {
+                break;
+            }
+        }
+        System.out.println(Json.pretty(admin.topics().getStats(topic)));
+        System.out.println(Json.pretty(admin.topics().getInternalStats(topic)));
+
+        assertEquals(values.size(), count);
+
+    }
 }

All the Key_Shared subscription tests pass.


cc @poorbarcode @Technoboy-

@rdhabalia (Contributor, author) commented:

@codelipenghui
Simply adding a removed consumer's unacked messages into the redelivery/replay list will not work, because there are scenarios where the key-shared dispatcher adds additional filtered messages (due to the max-message limit or ordering) into the replay list that are associated with already-connected consumers. We have to differentiate those messages from actually unacked messages, and this PR addresses that differentiation.

@rdhabalia (Contributor, author) commented Dec 7, 2023:

@poorbarcode
Can we remove the blocker if there are no concerns about the PR, and can we merge it? It's really needed for various production systems.

@codelipenghui (Contributor) left a comment:

Simply adding a removed consumer's unacked messages into the redelivery/replay list will not work, because there are scenarios where the key-shared dispatcher adds additional filtered messages (due to the max-message limit or ordering) into the replay list that are associated with already-connected consumers. We have to differentiate those messages from actually unacked messages, and this PR addresses that differentiation.

@rdhabalia Could you please update the test to reflect your concern? Also, the solution I provided is not "adding removed consumer's unacked messages into redelivery/replay list", and I think my solution is essentially the same idea as yours: if a consumer calling redeliver, or a connection close, triggers redelivery of a message greater than the position at which the consumer joined, that is what differentiates the "additionally filtered messages" from the "messages sent to consumers before".

And could you please revert all the test-related changes except for your newly added tests, or provide an explanation for why the tests should be changed? I'm afraid the change will introduce new regressions even if it fixes some issues.

I will leave a change request here to make sure everything is clear on this PR before merging it.

Comment on lines +379 to +388
private boolean isEntryPendingAck(long ledgerId, long entryId) {
int size = consumerList.size();
for (int i = 0; i < size; i++) {
Consumer consumer = consumerList.get(i);
if (consumer != null && consumer.isPendingAck(ledgerId, entryId)) {
return true;
}
}
return false;
}
Contributor:

It's super expensive if you have high traffic and many consumers, no?

Contributor Author:

Checking pending-ack is not expensive, as it's a lookup in a map. Also, it's not executed for every single message: it is checked only when the broker sees filtered messages, and only if the message is not deleted. So it won't be expensive or frequently executed.

Regarding the test cases, I had to change them because most of them assumed consumers were stuck on a closed consumer's unacked messages, and some of them are flaky, as this test class is part of the flaky-test pipeline.

I'm afraid the change will introduce new regressions even if it can fix some issues. I will leave change request here to make sure everything is clear on this PR before merge it.

Sure, let's make sure there are no regression issues and that the change doesn't violate the user contract. This seems like a fundamental fix that avoids all the hacks and stuck issues, but you can verify and merge it if you don't see any issue.
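To make the cost argument above concrete, here is a hedged sketch (illustrative class names, not the broker's real ones) of why the pending-ack check is a constant-time hash lookup per consumer, so a check like isEntryPendingAck scales with the number of consumers rather than the number of messages:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the pending-ack check discussed above. Each consumer
 * keeps its unacked entries in a hash map keyed by (ledgerId, entryId), so one
 * membership test is an O(1) lookup, and scanning all consumers is
 * O(number of consumers). Names are illustrative; this is not Pulsar's code.
 */
public class PendingAckSketch {

    /** Composite key for a ledger entry. */
    record EntryKey(long ledgerId, long entryId) {}

    /** Minimal stand-in for a consumer with a pending-ack map. */
    static class ConsumerModel {
        final Map<EntryKey, Boolean> pendingAcks = new HashMap<>();

        boolean isPendingAck(long ledgerId, long entryId) {
            return pendingAcks.containsKey(new EntryKey(ledgerId, entryId));
        }
    }

    /** True if any connected consumer still has this entry pending. */
    static boolean isEntryPendingAck(List<ConsumerModel> consumers, long ledgerId, long entryId) {
        for (ConsumerModel consumer : consumers) {
            if (consumer.isPendingAck(ledgerId, entryId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ConsumerModel c1 = new ConsumerModel();
        c1.pendingAcks.put(new EntryKey(3L, 7L), Boolean.TRUE); // entry 3:7 delivered, unacked
        List<ConsumerModel> consumers = List.of(c1, new ConsumerModel());
        assert isEntryPendingAck(consumers, 3L, 7L);   // still pending on c1
        assert !isEntryPendingAck(consumers, 3L, 8L);  // not pending anywhere
        System.out.println("pending-ack sketch ok");
    }
}
```

The per-entry cost is one hash lookup per connected consumer, which matches the argument that the check is cheap as long as it is only run for filtered, not-yet-deleted entries.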

Contributor:

Simply adding a removed consumer's unacked messages into the redelivery/replay list will not work because there are scenarios where the key-shared dispatcher adds additional filtered messages (due to the max-message limit or ordering) into the replay list

@rdhabalia How about this one? I do not fully understand the exact issue that my approach can't resolve. It changes less code, doesn't need to add APIs to the managed ledger, and is good for performance because we don't check any message-level status. It also doesn't require changing any existing tests. IMO, it's safer for us, even if it cannot resolve all the potential issues; we can fix those case by case instead of fixing potential issues with a more complex solution that carries potential risk (tests are changed). WDYT?

Contributor:

If you agree with my approach to fix only #21656, I can also create a PR, and we can keep this PR open to find a solution for the potential issues that you pointed out (simply adding a removed consumer's unacked messages into the redelivery/replay list will not work because there are scenarios where the key-shared dispatcher adds additional filtered messages, due to the max-message limit or ordering, into the replay list).

Contributor Author:

Sure, let's move forward with the best approach, and let's make sure to fix the stuck dispatching, as it's impacting many production systems right now and not giving a positive experience of Pulsar.

* @throws Exception
*/
@Test
public void testKeySharedMessageRedeliveryWithoutStuck()
Contributor:

Can we remove the blocker if there are no concerns about the PR, and can we merge it? It's really needed for various production systems.

I created a channel to trace the context more easily.


Menu:

  • 1. The concerns of the test testKeySharedMessageRedeliveryWithoutStuck
  • 2. Clear the problem the PR is supposed to solve
  • 3. This PR will bring issues

1. The concerns of the test testKeySharedMessageRedeliveryWithoutStuck

The test works like this:

  • create consumer1
  • send 10 msgs
  • create consumer2
  • send 10 msgs
  • create consumer3
  • redeliver all messages of consumer2
  • send 10 msgs
  • close consumer1
  • receive all messages for consumer2
  • receive all messages for consumer3

Concerns

  • The step "redeliver all messages of consumer2" is not meaningful; it does nothing.
  • We should run "receive all messages for consumer2" and "receive all messages for consumer3" in different threads so that they can receive all messages (acknowledging received messages in time).

What the test (without my suggestion) proves:

  • It proves that one stuck consumer can get another one stuck.
  • It does not prove that, ultimately, some messages will never be consumed (see Concern 2 above).

2. Clear the problem the PR is supposed to solve

  • You just want to improve performance and prevent idle reading, right? (The test cannot prove the subscription would be stuck; see Concern 2 above.)

3. This PR will bring issues

Initial state: Consumer 1 handles k1,k2 (recent-join: null); Consumer 2 handles k3,k4 (recent-join: null).

1. Consumer 1 receives M1(k1), M2(k2).
2. Consumer 3 is added and is assigned k2 (taken from Consumer 1).
   State: Consumer 1 handles k1; Consumer 2 handles k3,k4; Consumer 3 handles k2 (recent-join: M2).
3. Consumer 2 receives 1000 messages (M3(k3)...M1002(k3)).
4. Consumer 2 is closed; k3 is reassigned to Consumer 1 and k4 to Consumer 3.
   State: Consumer 1 handles k1,k3 (recent-join: null); Consumer 3 handles k2,k4 (recent-join: M2).
5. Consumer 1 receives M3(k3)...M1000(k3); its incoming queue is now full.
6. Consumer 4 is added and is assigned k3 (taken from Consumer 1).
   State: Consumer 1 handles k1 (recent-join: null); Consumer 3 handles k2,k4 (recent-join: M2); Consumer 4 handles k3 (recent-join: M1002).
7. Since message M2 has not been acked, M1001(k3)...M1002(k3) will be filtered out.

In Step 7, the code you're removing alleviates some of the message-ordering problems, even if it doesn't solve all cases. #20776 is trying to solve all the cases of this problem. But if you remove this mechanism before #20776 is complete, the out-of-order problem will get worse.

@eolivelli (Contributor) left a comment:

LGTM

Comment on lines +1676 to +1691
for(int i = 0; i < count; i++) {
Message<Integer> msg = consumer2.receive(100, TimeUnit.MILLISECONDS);
if (msg!=null) {
values.add(msg.getValue());
} else {
break;
}
}
for(int i = 0; i < count; i++) {
Message<Integer> msg = consumer3.receive(1, TimeUnit.MILLISECONDS);
if (msg!=null) {
values.add(msg.getValue());
} else {
break;
}
}
Member:

This test is currently invalid: messages would need to be acknowledged and consumed concurrently. The test passes at least on branch-3.3 when making those changes.

@lhotari (Member) commented Sep 16, 2024

The problem that this PR is addressing will be covered in PIP-379: Key_Shared Draining Hashes for Improved Message Ordering.
To address this issue, I'll update PIP-379 so that the updated contract will cover how negative acknowledgements and explicit redeliveries such as redeliverUnacknowledgedMessages are handled in Key_Shared in the updated design.
Currently, it's not properly documented what happens when an application uses those methods with a Key_Shared subscription.
If an application uses "nacks", it should either fail with an exception or there should be a clearly defined behavior. Since failing with an exception on the client side isn't a real option, the remaining option is to clearly define the behavior and make it consistent.

@lhotari (Member) commented Sep 16, 2024

I'm closing this PR since the problem will be covered by PIP-379 and its implementation. Please see the previous comment.

@lhotari lhotari closed this Sep 16, 2024
@rdhabalia rdhabalia reopened this Oct 7, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 11, 2024
9 participants