Reduce producer lookups and connections in partitioned producers #11496

Closed
Vanlightly opened this issue Jul 29, 2021 · 2 comments · Fixed by #11570
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@Vanlightly
Contributor

Vanlightly commented Jul 29, 2021

Is your enhancement request related to a problem? Please describe.
Producers that send messages to partitioned topics start a producer per partition, even when using single partition routing. For topics that have the combination of a large number of producers and a large number of partitions, this can put strain on the brokers. With, say, 1000 partitions and single partition routing with non-keyed messages, 999 topic owner lookups and producer registrations are performed that could be avoided.

Describe the solution you'd like

Option 1 - Strict Single Partition Routing
The problem is that, upon producer creation, we have no way of knowing which partitions will be involved, even when using Single Partition routing, because user code can still send keyed messages that may route to more than one partition.

Solution: offer a strict single partition routing mode where we guarantee that all messages will only be sent to a single partition, keyed or not. This would allow us to only start a single producer on the creation of the partitioned producer.
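The difference between the two routing modes can be sketched as follows. This is an illustrative model only: the `Router` interface and the class names are invented for this sketch and are not Pulsar's actual API. The default single-partition behaviour pins non-keyed messages to one partition but still hashes keyed messages across all partitions; the strict variant ignores keys entirely, so at most one per-partition producer would ever be needed.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative stand-in for a message router; not Pulsar's actual interface.
interface Router {
    int choosePartition(String key, int numPartitions);
}

// Default single-partition behaviour: non-keyed messages all go to one
// randomly chosen partition, but keyed messages hash to any partition,
// so every per-partition producer must be ready.
class SinglePartitionRouter implements Router {
    private final int chosen;

    SinglePartitionRouter(int numPartitions) {
        this.chosen = ThreadLocalRandom.current().nextInt(numPartitions);
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        if (key != null) {
            return Math.abs(key.hashCode() % numPartitions);
        }
        return chosen;
    }
}

// Strict variant: keys are ignored, so exactly one partition (and one
// producer) is ever used, and the other lookups can be skipped.
class StrictSinglePartitionRouter implements Router {
    private final int chosen;

    StrictSinglePartitionRouter(int numPartitions) {
        this.chosen = ThreadLocalRandom.current().nextInt(numPartitions);
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        return chosen; // keyed or not, always the same partition
    }
}
```

With the strict router, the partitioned producer could safely start only the one producer it will ever use.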

Option 2 - Lazy Producer Start
Allow producers in the partitioned producer class to be started lazily, upon the first message being sent to their particular partition. This would be controlled via a new producer configuration, as this behaviour only benefits those who:

  • produce to topics with a large number of partitions
  • use single partition routing
  • send non-keyed messages
  • potentially run a large number of producers against the topic

Messages will be buffered while the connection to the topic owner broker is established.

The downside is extra latency on the first messages published to each partition. The send timeout timer is only started once the producer is connected, so timeouts should not trigger during the lazy start; only a very low send timeout combined with a high number of pending messages could cause send timeouts because of this change.
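The lazy start plus buffering can be sketched like this. It is a simplified toy model, not the client's actual code: `PartitionProducer` and `LazyPartitionedProducer` are invented names, and the connection step is simulated by an explicit `onConnected()` call rather than a real lookup and registration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy per-partition producer: messages sent before the (simulated)
// lookup/registration completes are buffered, then flushed on connect.
class PartitionProducer {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean connected = false;

    synchronized void send(String msg) {
        if (connected) {
            delivered.add(msg);
        } else {
            buffer.add(msg); // buffered while the owner lookup is in flight
        }
    }

    synchronized void onConnected() {
        connected = true;
        delivered.addAll(buffer); // flush the backlog in order
        buffer.clear();
    }

    synchronized List<String> delivered() { return new ArrayList<>(delivered); }
}

// Partitioned producer that creates per-partition producers only on first
// use, instead of eagerly starting one per partition at creation time.
class LazyPartitionedProducer {
    private final Map<Integer, PartitionProducer> producers = new ConcurrentHashMap<>();

    void send(int partition, String msg) {
        // Only the partitions actually written to ever get a producer.
        producers.computeIfAbsent(partition, p -> new PartitionProducer()).send(msg);
    }

    int startedProducers() { return producers.size(); }
    PartitionProducer producer(int partition) { return producers.get(partition); }
}
```

In this model, sending only to partition 4 of a 1000-partition topic starts one producer instead of 1000, which is the saving the option aims for.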

Describe alternatives you've considered
Just options 1 and 2.

I have an implementation of lazy producer start for the C++ client in case option 2 is preferred. I can contribute the work to both Java and C++ clients whether it is ultimately option 1 or 2 that we select.

@Vanlightly added the type/enhancement label Jul 29, 2021
@equanz
Contributor

equanz commented Aug 4, 2021

I have published a similar solution in PIP-79. Could you please review or discuss it in the PR?

@Vanlightly
Contributor Author

@equanz I have created a PR with my C++ changes, which are very similar to your Java client changes and fit nicely inside your PIP (it's a subset of the PIP). There are some differences, such as lazy producers being configurable and how the lazy start is kicked off.

codelipenghui pushed a commit that referenced this issue Aug 16, 2021
Fixes #11496; also matches part of PIP 79.

C++ implementation that closely matches the proposed Java client changes for reducing partitioned producer connections and lookups: PR 10279

### Motivation

Producers that send messages to partitioned topics start a producer per partition, even when using single partition routing. For topics that have the combination of a large number of producers and a large number of partitions, this can put strain on the brokers. With say 1000 partitions and single partition routing with non-keyed messages, 999 topic owner lookups and producer registrations are performed that could be avoided.

PIP 79 also describes this; I wrote this implementation before realising that PIP 79 covers it. It can be reviewed and contrasted with the Java client implementation in #10279.

### Modifications

Allows partitioned producers to start producers for individual partitions lazily. Starting a producer involves a topic owner lookup to find out which broker is the owner of the partition, then registering the producer for that partition with the owner broker. For topics with many partitions, when using SinglePartition routing without keyed messages, all of these lookups and producer registrations are wasted except for the single chosen partition.

This change allows the user to control whether a producer on a partitioned topic uses this lazy start or not, via a new config in ProducerConfiguration. When ProducerConfiguration.setLazyStartPartitionedProducers(true) is set, PartitionedProducerImpl.start() becomes a synchronous operation that only does housekeeping (no network operations). The producer of any given partition is started (which includes a topic owner lookup and registration) upon sending the first message to that partition. While the producer starts, messages are buffered.

The sendTimeout timer is only activated once a producer has been fully started, which should give enough time for any buffered messages to be sent. For very short send timeouts, this setting could cause send timeouts during the start phase; the default of 30s should, however, not cause this issue.
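The timer behaviour described above can be modelled roughly as follows. This is a simplified sketch with invented names, not the client's internals: the send-timeout clock for a message effectively starts only once the partition's producer is connected, so time spent buffering during the lazy start does not count against the timeout.

```java
// Toy model of the sendTimeout rule: the timeout clock for buffered
// messages starts at connect time, not at send time.
class TimeoutModel {
    private final long sendTimeoutMs;
    private Long connectedAtMs = null; // null until the producer is connected

    TimeoutModel(long sendTimeoutMs) { this.sendTimeoutMs = sendTimeoutMs; }

    void onConnected(long nowMs) { connectedAtMs = nowMs; }

    // A message sent at sentAtMs has timed out at nowMs only if the
    // producer has connected and sendTimeout has elapsed since the later
    // of (connect time, send time).
    boolean hasTimedOut(long sentAtMs, long nowMs) {
        if (connectedAtMs == null) {
            return false; // timer not armed while still starting
        }
        long armedAt = Math.max(connectedAtMs, sentAtMs);
        return nowMs - armedAt > sendTimeoutMs;
    }
}
```

Under this model, only a very short configured timeout combined with a large startup backlog could expire messages shortly after connect; the default 30s leaves ample room to drain the buffer.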
codelipenghui pushed a commit that referenced this issue Sep 10, 2021

(cherry picked from commit 9577b84)
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this issue Mar 18, 2022