[improve][broker] add switch for enable/disable distribute bundles evenly in LoadManager #16059

Merged

@HQebupt HQebupt commented Jun 14, 2022

Motivation

When ModularLoadManager is used as the LoadManager in a Pulsar cluster and the load balancer shedding strategy is ThresholdShedder, we found that unloaded bundles might be assigned to another broker whose resource usage is already above the average load. That broker then becomes overloaded in turn, which causes bundles to be unloaded frequently.

loadManagerClassName=org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
loadBalancerBrokerThresholdShedderPercentage=10

We found that load shedding in the cluster went on for more than 10 hours and ran 400+ times. The cluster also keeps unloading bundles when some topics receive a lot of traffic in a short time. The unload metric is shown below.
image
image

The key reason is that some brokers with lower load but more bundles cannot be candidates, because the LoadManager forces bundles to be distributed evenly. Most brokers are filtered out by this strategy; only 1 or 2 of the 136 brokers remain as candidates, as shown below.
image

It would be much better to allow disabling even bundle distribution in the LoadManager, so that it can select a broker from those with resource usage below the average load. This prevents the least loaded broker from quickly becoming heavily loaded.
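The effect described above can be illustrated with a small, self-contained sketch. It is not the Pulsar implementation: the `BrokerLoad` class, the method names, and the sample numbers are hypothetical, and the sketch assumes that the even-distribution filter keeps only the brokers owning the fewest bundles before the least loaded candidate is picked.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CandidateFilterSketch {

    /** Hypothetical per-broker snapshot; not a Pulsar class. */
    static final class BrokerLoad {
        final double usagePercent; // resource usage in percent
        final int bundleCount;     // bundles currently owned

        BrokerLoad(double usagePercent, int bundleCount) {
            this.usagePercent = usagePercent;
            this.bundleCount = bundleCount;
        }
    }

    /** Assumed even-distribution filter: keep only brokers owning the fewest bundles. */
    static Map<String, BrokerLoad> keepFewestBundles(Map<String, BrokerLoad> brokers) {
        int minBundles = Integer.MAX_VALUE;
        for (BrokerLoad load : brokers.values()) {
            minBundles = Math.min(minBundles, load.bundleCount);
        }
        Map<String, BrokerLoad> kept = new LinkedHashMap<>();
        for (Map.Entry<String, BrokerLoad> e : brokers.entrySet()) {
            if (e.getValue().bundleCount == minBundles) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    /** Picks the candidate with the lowest resource usage, or null if there is none. */
    static String selectBroker(Map<String, BrokerLoad> brokers, boolean distributeBundlesEvenly) {
        Map<String, BrokerLoad> candidates =
                distributeBundlesEvenly ? keepFewestBundles(brokers) : brokers;
        String best = null;
        double bestUsage = Double.MAX_VALUE;
        for (Map.Entry<String, BrokerLoad> e : candidates.entrySet()) {
            if (e.getValue().usagePercent < bestUsage) {
                best = e.getKey();
                bestUsage = e.getValue().usagePercent;
            }
        }
        return best;
    }

    /** Made-up sample data: average usage is 40%. */
    static Map<String, BrokerLoad> sampleBrokers() {
        Map<String, BrokerLoad> brokers = new LinkedHashMap<>();
        brokers.put("broker-a", new BrokerLoad(70.0, 10)); // busy, few bundles
        brokers.put("broker-b", new BrokerLoad(20.0, 40)); // idle, many bundles
        brokers.put("broker-c", new BrokerLoad(30.0, 35)); // idle, many bundles
        return brokers;
    }

    public static void main(String[] args) {
        Map<String, BrokerLoad> brokers = sampleBrokers();
        System.out.println("even distribution on:  " + selectBroker(brokers, true));
        System.out.println("even distribution off: " + selectBroker(brokers, false));
    }
}
```

With this made-up data, the filter leaves only broker-a as a candidate even though its usage is above the average, while skipping the filter lets the selection fall on broker-b.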

Therefore, we recommend letting users enable or disable even bundle distribution among all brokers according to their scenario.

After disabling even bundle distribution in the LoadManager, brokers with lower load but more bundles can be candidates. This reduced the number of unloads, and the cluster is stable with an even load on each broker, as shown below.
image
image

Modifications

  • Add a configuration property loadBalancerDistributeBundlesEvenlyEnabled for the load balancer in ServiceConfiguration; it defaults to true.
  • Add a switch in ModularLoadManager to enable or disable distributing bundles evenly.
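As a rough illustration of how such a switch behaves, the sketch below reads the property from properties-style text and falls back to true when it is absent, matching the default stated above. The class and method names are hypothetical and stand in for ServiceConfiguration; only the property name comes from this PR.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class EvenDistributionSwitchSketch {

    // Property name introduced by this PR.
    static final String KEY = "loadBalancerDistributeBundlesEvenlyEnabled";

    /** Parses properties-style text (hypothetical helper, not Pulsar's config loader). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the switch value, defaulting to true when the property is not set. */
    static boolean isEvenDistributionEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties defaults = parse("");
        Properties disabled = parse(KEY + "=false\n");
        System.out.println("unset:    " + isEvenDistributionEnabled(defaults));
        System.out.println("disabled: " + isEvenDistributionEnabled(disabled));
    }
}
```

Keeping the default at true means existing clusters see no behavior change unless the operator explicitly sets the property to false.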

Verifying this change

  • Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • no-need-doc

  • doc-not-needed

@Anonymitaet Anonymitaet added the doc-not-needed Your PR changes do not impact docs label Jun 15, 2022
@HQebupt HQebupt force-pushed the loadBalancerDistributeBundlesEvenlyEnabled branch from 31b1977 to 448404d Compare June 15, 2022 03:17

HQebupt commented Jun 15, 2022

/pulsarbot run-failure-checks

@Jason918 Jason918 added this to the 2.11.0 milestone Jun 16, 2022
@Jason918 Jason918 added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels Jun 16, 2022

@gaozhangmin gaozhangmin left a comment


Good work. We encountered this issue earlier too.

@Jason918

@hangc0276 PTAL


@hangc0276 hangc0276 left a comment


Good job.

@heesung-sn

Discussing 2.10 cherry-pick here: https://lists.apache.org/thread/2mq3h9gpqv1b4zyyp2cddfltlqz3wtg0
