[PIP-38] Support batch receive in java client. #4621

Merged
merged 20 commits into from
Nov 19, 2019

Contributor

@codelipenghui codelipenghui commented Jun 27, 2019

Motivation

Supporting batch receiving of messages makes some application scenarios simpler. Users often increase application throughput through batch operations, for example batch inserts or updates to a database.

At present, we only provide the ability to receive a single message at a time. Users who want the benefits of batch operations need to implement a message collector themselves. This proposal therefore aims to provide a universal interface and mechanism for batch receiving messages.

For example:

Messages messages = consumer.batchReceive();
insertToDB(messages);
consumer.acknowledge(messages);
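
For comparison, here is a rough idea of the "message collector" an application would otherwise have to write by hand. This is a minimal sketch in plain Java, assuming an in-memory BlockingQueue stands in for the consumer; the class ManualCollector and its collect method are hypothetical and not part of the Pulsar client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for what an application has to write itself
// when only single-message receive is available.
class ManualCollector {

    // Collects up to maxMessages items, or returns early once timeoutMs has elapsed.
    static <T> List<T> collect(BlockingQueue<T> source, int maxMessages, long timeoutMs) {
        List<T> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (batch.size() < maxMessages) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // timeout reached, return what we have
                }
                T msg = source.poll(remaining, TimeUnit.NANOSECONDS);
                if (msg == null) {
                    break; // nothing arrived before the deadline
                }
                batch.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            queue.add("msg-" + i);
        }
        System.out.println(collect(queue, 3, 100)); // batch is full after three messages
        System.out.println(collect(queue, 3, 100)); // remaining two, returned on timeout
    }
}
```

Every application that wants batching ends up re-implementing this loop, which is the duplication the proposal is trying to remove.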

Verifying this change

Added new unit tests to verify this change.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (yes)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs and JavaDocs)

@codelipenghui codelipenghui self-assigned this Jun 27, 2019
@codelipenghui codelipenghui added this to the 2.4.1 milestone Jun 27, 2019
@codelipenghui codelipenghui added the area/client and type/feature labels Jun 27, 2019
Member

@sijie sijie left a comment

@codelipenghui this is a great change. left a few comments.


Additionally it might be good to have a few more followup changes to optimize this further.

The current Pulsar client breaks a message batch into individual messages and then collects multiple messages into a Messages. That involves a lot of unnecessary object conversions.

Ideally the pulsar client implementation should

a) keep a queue of Messages. Each Messages is a message batch or multiple message batches.
b) on receiving an individual message, poll a Messages from the queue, and poll a message out of that Messages.

This allows lazy deserialization and object creation, and it should increase the throughput of the batch receive API because fewer CPU cycles are spent.
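
The queue-of-batches idea in (a) and (b) can be sketched as follows. This is only an illustration of the queueing structure under simplifying assumptions (single-threaded, no deserialization step shown); the class BatchQueue and its methods are hypothetical names, not the Pulsar implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: the receiver queue stores whole batches, and a
// single-message receive peels one message at a time off the head batch.
class BatchQueue<T> {
    private final Deque<Deque<T>> batches = new ArrayDeque<>();

    // Enqueue a batch as one unit instead of one queue entry per message.
    void addBatch(List<T> batch) {
        if (!batch.isEmpty()) {
            batches.addLast(new ArrayDeque<>(batch));
        }
    }

    // Single receive: take the next message from the head batch and
    // drop the batch once it is drained. Returns null when nothing is queued.
    T receive() {
        Deque<T> head = batches.peekFirst();
        if (head == null) {
            return null;
        }
        T msg = head.pollFirst();
        if (head.isEmpty()) {
            batches.pollFirst();
        }
        return msg;
    }

    // Batch receive: hand out whatever is left of the head batch in one step.
    List<T> batchReceive() {
        Deque<T> head = batches.pollFirst();
        return head == null ? new ArrayList<>() : new ArrayList<>(head);
    }
}
```

In this shape a batch receive does not have to re-assemble messages that were previously split apart, which is where the saved work would come from.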

/**
 * Max size of message for a single batch receive, 0 or negative means no limit.
 */
private long maxSizeOfMessages;
Member

maxSizeOfMessages sounds a bit confusing.

I would suggest maxNumMessages and maxNumBytes to replace maxNumberOfMessages and maxSizeOfMessages. thoughts?

Contributor Author

fixed

* client.newConsumer().batchReceivePolicy(BatchReceivePolicy.builder()
* .maxNumberOfMessages(100)
* .maxSizeOfMessages(5 * 1024 * 1024)
* .timeout(100)
Member

I think it is better to have a single builder method timeout(long timeout, TimeUnit timeoutUnit) rather than two separate methods.

Contributor Author

fixed
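
To illustrate the two suggestions in this thread (the maxNumMessages / maxNumBytes names and a single timeout(long, TimeUnit) builder method), here is a minimal, self-contained sketch. The class ReceivePolicySketch, its field types, and its accessors are assumptions made for illustration and are not the actual class from this PR.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified policy object used only to show the builder shape.
class ReceivePolicySketch {
    private final int maxNumMessages;   // 0 or negative: no limit
    private final int maxNumBytes;      // 0 or negative: no limit
    private final long timeout;
    private final TimeUnit timeoutUnit;

    private ReceivePolicySketch(int maxNumMessages, int maxNumBytes,
                                long timeout, TimeUnit timeoutUnit) {
        this.maxNumMessages = maxNumMessages;
        this.maxNumBytes = maxNumBytes;
        this.timeout = timeout;
        this.timeoutUnit = timeoutUnit;
    }

    static Builder builder() {
        return new Builder();
    }

    int getMaxNumMessages() { return maxNumMessages; }
    int getMaxNumBytes() { return maxNumBytes; }
    long getTimeoutMs() { return timeoutUnit.toMillis(timeout); }

    static class Builder {
        private int maxNumMessages;
        private int maxNumBytes;
        private long timeout;
        private TimeUnit timeoutUnit = TimeUnit.MILLISECONDS;

        Builder maxNumMessages(int maxNumMessages) {
            this.maxNumMessages = maxNumMessages;
            return this;
        }

        Builder maxNumBytes(int maxNumBytes) {
            this.maxNumBytes = maxNumBytes;
            return this;
        }

        // Value and unit are set together, so they cannot get out of sync.
        Builder timeout(long timeout, TimeUnit timeoutUnit) {
            this.timeout = timeout;
            this.timeoutUnit = timeoutUnit;
            return this;
        }

        ReceivePolicySketch build() {
            return new ReceivePolicySketch(maxNumMessages, maxNumBytes, timeout, timeoutUnit);
        }
    }
}
```

A call site would then read, for example, ReceivePolicySketch.builder().maxNumMessages(100).maxNumBytes(5 * 1024 * 1024).timeout(100, TimeUnit.MILLISECONDS).build().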

@merlimat
Contributor

@codelipenghui Since this is a significant addition to the client API, can you create a PIP with a description and code examples?
Also, I'd mark this for 2.5 rather than 2.4.1, which should only contain bug fixes (or very small improvements).

@codelipenghui codelipenghui modified the milestones: 2.4.1, 2.5.0 Jun 28, 2019
@codelipenghui
Contributor Author

@merlimat I have already moved it to 2.5.0 and will create a PIP soon.

@codelipenghui
Contributor Author

@merlimat @sijie I have already created PIP-38 in the wiki and as a Google doc, please take a look.

I hope to get your feedback on the Google doc; I will sync the updates to the PIP wiki.

@codelipenghui codelipenghui changed the title from "Support batch receive in java client." to "[PIP-38] Support batch receive in java client." Jul 1, 2019
@codelipenghui
Contributor Author

@sijie I have addressed your comments, please review again.

@codelipenghui
Contributor Author

run java8 tests
run Integration Tests

@sijie
Member

sijie commented Jul 3, 2019

I have already created PIP-38 in the wiki and as a Google doc, please take a look.

@codelipenghui can you send an email to the dev@ mailing list to start the discussion?

@codelipenghui
Contributor Author

@sijie Oh, sorry, I forgot. I will send an email soon.

@codelipenghui
Contributor Author

@merlimat
Please take a look at PIP-38 when you have time. Here is the discussion thread: https://lists.apache.org/thread.html/3e2a87d31bf8a98142bd68545714cdbf5d87011b4ae3909c5c9f43b9@%3Cdev.pulsar.apache.org%3E
Thanks.

@codelipenghui
Contributor Author

run java8 tests
run Integration Tests

@codelipenghui codelipenghui force-pushed the batch_message_receiving branch from fee1af0 to d201d8a Compare August 10, 2019 01:44
Member

@sijie sijie left a comment

@codelipenghui overall looks good. left some comments there.

 * Max number of bytes: 10MB
 * Timeout: 100ms
 */
public static final BatchReceivePolicy DEFAULT_POLICY = new BatchReceivePolicy(
Member

In general I am not in favor of using "number of messages" in any configuration or policy. In a multi-tenant system, message size varies between tenants and applications, so I would actually remove the limit on the number of messages and rely only on the number of bytes for the default policy.

Hence my recommendation would be:

BatchReceivePolicy DEFAULT_POLICY = new BatchReceivePolicy(-1, 10 * 1024 * 1024, 100, TimeUnit.MILLISECONDS);

Contributor Author

fix it
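
As a rough illustration of what "0 or negative means no limit" implies for the recommended default (-1 messages, 10 MB), the admission check for a batch could look like the sketch below. The class BatchLimits and the method canAdd are hypothetical; the check actually used in the PR may differ.

```java
// Hypothetical helper showing how non-positive limits can be treated as "unlimited".
class BatchLimits {

    // Returns true if a batch that already holds `count` messages totalling
    // `bytes` bytes may accept one more message of `msgBytes` bytes.
    static boolean canAdd(int maxNumMessages, int maxNumBytes,
                          int count, long bytes, int msgBytes) {
        if (maxNumMessages > 0 && count + 1 > maxNumMessages) {
            return false; // message-count limit is enabled and would be exceeded
        }
        if (maxNumBytes > 0 && bytes + msgBytes > maxNumBytes) {
            return false; // byte limit is enabled and would be exceeded
        }
        return true;
    }

    public static void main(String[] args) {
        int tenMb = 10 * 1024 * 1024;
        // With -1 as the message limit, only the byte budget matters.
        System.out.println(canAdd(-1, tenMb, 1_000_000, 1024, 1024));
        System.out.println(canAdd(-1, tenMb, 10, tenMb, 1));
    }
}
```

With a byte-only default, a tenant sending many tiny messages and a tenant sending a few large ones are bounded by the same memory budget per batch.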

/**
 * Max number of messages for a single batch receive, 0 or negative means no limit.
 */
private int maxNumMessages;
Member

since you already used builder pattern, just make these variables final

Contributor Author

fix it

/**
 * Get the list {@link Message}
 */
List<Message<T>> getMessageList();
Member

do we need to expose this? my feeling is to avoid exposing such method until it is really needed.

Contributor Author

@codelipenghui codelipenghui Aug 11, 2019

remove

@@ -314,6 +315,14 @@ public ConsumerBuilderImpl(PulsarClientImpl client, Schema<T> schema) {
    return this;
}

@Override
public ConsumerBuilder<T> batchReceivePolicy(BatchReceivePolicy batchReceivePolicy) {
    checkArgument(batchReceivePolicy != null, "batchReceivePolicy must not be null.");
Member

@codelipenghui we have already moved all the validation to ConsumerConfigurationData. Can you move the validation there?

Contributor Author

I see all the validations are still in ConsumerBuilderImpl.

/**
 * Max bytes of messages for a single batch receive, 0 or negative means no limit.
 */
private long maxNumBytes;
Member

I don't think we need long here. an int should be good enough. Because you cannot really hold a "long"-sized buffer in memory.

Contributor Author

fix it

} catch (InterruptedException | ExecutionException e) {
    State state = getState();
    if (state != State.Closing && state != State.Closed) {
        stats.incrementNumReceiveFailed();
Member

do we need a new metric for batchReceive?

Contributor Author

Yes, I will add metrics for batch receive.

protected CompletableFuture<Messages<T>> internalBatchReceiveAsync() {
    CompletableFuture<Messages<T>> result = new CompletableFuture<>();
    try {
        lock.writeLock().lock();
Member

should we use a writeLock or a readLock?

Contributor Author

Using a read lock may cause a Messages to be returned early (before it has reached capacity), so a write lock is used here.
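
The general idea can be sketched as follows: draining a queue into a batch is a multi-step operation, so it is made exclusive, while single appends can share a lock. This is a simplified illustration, not the code in this PR; the class ExclusiveDrain and the choice of the read lock for enqueue are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of exclusive draining with a ReentrantReadWriteLock.
class ExclusiveDrain<T> {
    private final Queue<T> incoming = new ConcurrentLinkedQueue<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Appending a single element is safe to do concurrently with other
    // appends, so enqueuers share the read lock (assumption of this sketch).
    void enqueue(T msg) {
        lock.readLock().lock();
        try {
            incoming.add(msg);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Draining polls repeatedly until the batch is full. Holding the write
    // lock keeps a second drainer from taking messages in between, which
    // would leave both batches under-filled.
    List<T> drain(int maxNumMessages) {
        lock.writeLock().lock();
        try {
            List<T> batch = new ArrayList<>();
            while (batch.size() < maxNumMessages) {
                T msg = incoming.poll();
                if (msg == null) {
                    break; // queue is empty
                }
                batch.add(msg);
            }
            return batch;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```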

try {
    msg = incomingMessages.poll(0L, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
    // ignore
Member

@codelipenghui Can we just use poll()?

Contributor Author

yes fix it
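
For context on this suggestion: on a java.util.concurrent.BlockingQueue, the no-argument poll() returns immediately with null when the queue is empty and declares no checked exception, so the try/catch for InterruptedException is not needed. A small sketch with a hypothetical PollDemo class:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; only the JDK queue API is real here.
class PollDemo {

    // Non-blocking take: returns the head element, or null if the queue is empty.
    // Unlike poll(long, TimeUnit), this overload cannot throw InterruptedException.
    static <T> T takeIfPresent(BlockingQueue<T> queue) {
        return queue.poll();
    }

    public static void main(String[] args) {
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
        System.out.println(takeIfPresent(incoming)); // queue is empty
        incoming.add("m1");
        System.out.println(takeIfPresent(incoming)); // head element is returned
    }
}
```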

void notifyPendingBatchReceivedCallBack(OpBatchReceive<T> opBatchReceive) {
    MessagesImpl<T> messages = new MessagesImpl<>(batchReceivePolicy.getMaxNumMessages(),
            batchReceivePolicy.getMaxNumBytes());
    Message<T> msgPeeked = incomingMessages.peek();
Member

I see a similar pattern in a few places. Can we make a function for it?

Contributor Author

fixed

@@ -80,6 +91,11 @@ protected ConsumerBase(PulsarClientImpl client, String topic, ConsumerConfigurat
this.pendingReceives = Queues.newConcurrentLinkedQueue();
this.schema = schema;
this.interceptors = interceptors;
this.batchReceivePolicy = conf.getBatchReceivePolicy();
this.pendingBatchReceives = Queues.newConcurrentLinkedQueue();
Member

Can you delay the creation of this queue until the first batchReceive is called? I am trying to avoid creating the queue if the consumer is not using batchReceive.

Contributor Author

fix it
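
One common way to defer such an allocation is lazy initialization with double-checked locking. The sketch below shows the pattern in isolation; the class LazyPendingQueue and its methods are hypothetical, and the PR may have implemented the deferral differently.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: the queue is only allocated the first time it is needed.
class LazyPendingQueue<T> {
    // volatile so that a fully constructed queue is visible to other threads
    private volatile Queue<T> pendingBatchReceives;

    // Returns the queue, creating it on first use.
    Queue<T> pending() {
        Queue<T> q = pendingBatchReceives;
        if (q == null) {
            synchronized (this) {
                q = pendingBatchReceives;
                if (q == null) {
                    q = new ConcurrentLinkedQueue<>();
                    pendingBatchReceives = q;
                }
            }
        }
        return q;
    }

    // True once some caller has actually asked for the queue.
    boolean isCreated() {
        return pendingBatchReceives != null;
    }
}
```

Callers that never use batch receive never call pending(), so no queue is allocated for them; code that inspects the queue elsewhere has to tolerate it being absent.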

@codelipenghui codelipenghui force-pushed the batch_message_receiving branch from 4e7a9ec to 5f5692f Compare November 13, 2019 06:38
@codelipenghui
Contributor Author

@sijie rebased

@codelipenghui
Contributor Author

run java8 tests

@jiazhai jiazhai merged commit 56517d5 into apache:master Nov 19, 2019
@ssrkr

ssrkr commented Nov 10, 2020

Is anyone working on adding support for batchReceive to the Golang client?
People apparently are expecting it already. Please check https://stackoverflow.com/questions/62173028/reading-messages-in-bulk-through-a-pulsar-consumer
