Reduce producer lookups and connections in partitioned producers #11496

Closed
Vanlightly opened this issue Jul 29, 2021 · 2 comments · Fixed by #11570
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@Vanlightly
Contributor

Vanlightly commented Jul 29, 2021

Is your enhancement request related to a problem? Please describe.
Producers that send messages to partitioned topics start a producer per partition, even when using single partition routing. For topics that have the combination of a large number of producers and a large number of partitions, this can put strain on the brokers. With, say, 1000 partitions and single partition routing with non-keyed messages, 999 topic owner lookups and producer registrations are performed that could be avoided.

Describe the solution you'd like

Option 1 - Strict Single Partition Routing
The problem is that, upon producer creation, we have no way of knowing which partitions will be involved, even when using Single Partition routing, because user code can still send keyed messages that may route to more than one partition.

Solution: offer a strict single partition routing mode where we guarantee that all messages will only be sent to a single partition, keyed or not. This would allow us to only start a single producer on the creation of the partitioned producer.
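The difference between the two routing modes can be sketched as follows. This is an illustrative model only: the `Router` interface and the class names are invented for this sketch and are not Pulsar's actual API. The default single-partition behaviour pins non-keyed messages to one partition but still hashes keyed messages across all partitions; the strict variant ignores keys entirely, so at most one per-partition producer would ever be needed.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative stand-in for a message router; not Pulsar's actual interface.
interface Router {
    int choosePartition(String key, int numPartitions);
}

// Default single-partition behaviour: non-keyed messages all go to one
// randomly chosen partition, but keyed messages hash to any partition,
// so every per-partition producer must be ready.
class SinglePartitionRouter implements Router {
    private final int chosen;

    SinglePartitionRouter(int numPartitions) {
        this.chosen = ThreadLocalRandom.current().nextInt(numPartitions);
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        if (key != null) {
            return Math.abs(key.hashCode() % numPartitions);
        }
        return chosen;
    }
}

// Strict variant: keys are ignored, so exactly one partition (and one
// producer) is ever used, and the other lookups can be skipped.
class StrictSinglePartitionRouter implements Router {
    private final int chosen;

    StrictSinglePartitionRouter(int numPartitions) {
        this.chosen = ThreadLocalRandom.current().nextInt(numPartitions);
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        return chosen; // keyed or not, always the same partition
    }
}
```

With the strict router, the partitioned producer could safely start only the one producer it will ever use.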

Option 2 - Lazy Producer Start
Allow producers in the partitioned producer class to be started lazily, upon the first message being sent to their particular partition. This would be controlled via a new producer configuration, as this behaviour only benefits those who:

  • produce to topics with a large number of partitions
  • use single partition routing
  • send non-keyed messages
  • potentially run a large number of producers against the topic

Messages will be buffered while the connection to the topic owner broker is established.

The downside is extra latency on the first messages published to each partition. The send timeout timer is only started once the producer is connected, so timeouts should not trigger during the lazy start; only a very low send timeout combined with a high number of pending messages could cause send timeouts because of this change.
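The lazy start plus buffering can be sketched like this. It is a simplified toy model, not the client's actual code: `PartitionProducer` and `LazyPartitionedProducer` are invented names, and the connection step is simulated by an explicit `onConnected()` call rather than a real lookup and registration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy per-partition producer: messages sent before the (simulated)
// lookup/registration completes are buffered, then flushed on connect.
class PartitionProducer {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean connected = false;

    synchronized void send(String msg) {
        if (connected) {
            delivered.add(msg);
        } else {
            buffer.add(msg); // buffered while the owner lookup is in flight
        }
    }

    synchronized void onConnected() {
        connected = true;
        delivered.addAll(buffer); // flush the backlog in order
        buffer.clear();
    }

    synchronized List<String> delivered() { return new ArrayList<>(delivered); }
}

// Partitioned producer that creates per-partition producers only on first
// use, instead of eagerly starting one per partition at creation time.
class LazyPartitionedProducer {
    private final Map<Integer, PartitionProducer> producers = new ConcurrentHashMap<>();

    void send(int partition, String msg) {
        // Only the partitions actually written to ever get a producer.
        producers.computeIfAbsent(partition, p -> new PartitionProducer()).send(msg);
    }

    int startedProducers() { return producers.size(); }
    PartitionProducer producer(int partition) { return producers.get(partition); }
}
```

In this model, sending only to partition 4 of a 1000-partition topic starts one producer instead of 1000, which is the saving the option aims for.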

Describe alternatives you've considered
Just options 1 and 2.

I have an implementation of lazy producer start for the C++ client in case option 2 is preferred. I can contribute the work to both Java and C++ clients whether it is ultimately option 1 or 2 that we select.

@Vanlightly added the type/enhancement label Jul 29, 2021
@equanz
Contributor

equanz commented Aug 4, 2021

I have published a similar solution in PIP-79. Could you please review or discuss it in the PR?

@Vanlightly
Contributor Author

@equanz I have created a PR with my C++ changes, which are very similar to your Java client changes and fit nicely inside your PIP (it's a subset of the PIP). There are some differences, such as lazy producers being configurable and how the lazy start is kicked off.

codelipenghui pushed a commit that referenced this issue Aug 16, 2021
Fixes #11496; also matches part of PIP 79.

C++ implementation that closely matches the proposed Java client changes for reducing partitioned producer connections and lookups: PR 10279

### Motivation

Producers that send messages to partitioned topics start a producer per partition, even when using single partition routing. For topics that have the combination of a large number of producers and a large number of partitions, this can put strain on the brokers. With say 1000 partitions and single partition routing with non-keyed messages, 999 topic owner lookups and producer registrations are performed that could be avoided.

PIP 79 also describes this; I wrote this implementation before realising that PIP 79 covers it. It can be reviewed and contrasted with the Java client implementation in #10279.

### Modifications

Allows partitioned producers to start producers for individual partitions lazily. Starting a producer involves a topic owner lookup to find out which broker is the owner of the partition, then registering the producer for that partition with the owner broker. For topics with many partitions, when using SinglePartition routing without keyed messages, all of these lookups and producer registrations are wasted except for the single chosen partition.

This change allows the user to control whether a producer on a partitioned topic uses this lazy start or not, via a new config in ProducerConfiguration. When ProducerConfiguration.setLazyStartPartitionedProducers(true) is set, PartitionedProducerImpl.start() becomes a synchronous operation that only does housekeeping (no network operations). The producer of any given partition is started (which includes a topic owner lookup and registration) upon sending the first message to that partition. While the producer starts, messages are buffered.

The sendTimeout timer is only activated once a producer has been fully started, which should give enough time for any buffered messages to be sent. For very short send timeouts, this setting could cause send timeouts during the start phase; the default of 30s should, however, not cause this issue.
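The timer behaviour described above can be modelled roughly as follows. This is a simplified sketch with invented names, not the client's internals: the send-timeout clock for a message effectively starts only once the partition's producer is connected, so time spent buffering during the lazy start does not count against the timeout.

```java
// Toy model of the sendTimeout rule: the timeout clock for buffered
// messages starts at connect time, not at send time.
class TimeoutModel {
    private final long sendTimeoutMs;
    private Long connectedAtMs = null; // null until the producer is connected

    TimeoutModel(long sendTimeoutMs) { this.sendTimeoutMs = sendTimeoutMs; }

    void onConnected(long nowMs) { connectedAtMs = nowMs; }

    // A message sent at sentAtMs has timed out at nowMs only if the
    // producer has connected and sendTimeout has elapsed since the later
    // of (connect time, send time).
    boolean hasTimedOut(long sentAtMs, long nowMs) {
        if (connectedAtMs == null) {
            return false; // timer not armed while still starting
        }
        long armedAt = Math.max(connectedAtMs, sentAtMs);
        return nowMs - armedAt > sendTimeoutMs;
    }
}
```

Under this model, only a very short configured timeout combined with a large startup backlog could expire messages shortly after connect; the default 30s leaves ample room to drain the buffer.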
codelipenghui pushed a commit that referenced this issue Sep 10, 2021

(cherry picked from commit 9577b84)
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this issue Mar 18, 2022