This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-18598: Discuss about reverting #16937 "skip mis-configured resource usage(>100%) in load balancer" #5175

Open
sijie opened this issue Nov 25, 2022 · 0 comments


sijie commented Nov 25, 2022

Original Issue: apache#18598


Search before asking

  • I searched in the issues and found nothing similar.

Motivation

apache#16937 skips misconfigured resource usage (usage above 100%) in the load balancer. But if the user configures a wrong value, the error log is printed continuously. See the logs below:

[Screenshot: repeated error logs]

After looking into the change more closely, we found that it is a breaking change.
Before apache#16937 the test below passed; after apache#16937 it fails:


    @Test
    public void testBrokerThreshold() {
        LoadData loadData = new LoadData();
        LocalBrokerData broker1 = new LocalBrokerData();
        broker1.setCpu(new ResourceUsage(1000, 100));    // Mock data.  It means we can set `loadBalancerCPUResourceWeight=100`, but the current CPU usage is only 10%.
        broker1.setMemory(new ResourceUsage(10, 100));
        broker1.setDirectMemory(new ResourceUsage(10, 100));
        broker1.setBandwidthIn(new ResourceUsage(500, 1000));
        broker1.setBandwidthOut(new ResourceUsage(500, 1000));
        broker1.setBundles(Sets.newHashSet("bundle-1", "bundle-2"));
        broker1.setMsgThroughputIn(Double.MAX_VALUE);

        LocalBrokerData broker2 = new LocalBrokerData();
        broker2.setCpu(new ResourceUsage(10, 100));
        broker2.setMemory(new ResourceUsage(10, 100));
        broker2.setDirectMemory(new ResourceUsage(10, 100));
        broker2.setBandwidthIn(new ResourceUsage(500, 1000));
        broker2.setBandwidthOut(new ResourceUsage(500, 1000));
        broker2.setBundles(Sets.newHashSet("bundle-3", "bundle-4"));

        BundleData bundleData = new BundleData();
        TimeAverageMessageData timeAverageMessageData = new TimeAverageMessageData();
        timeAverageMessageData.setMsgThroughputIn(1000);
        timeAverageMessageData.setMsgThroughputOut(1000);
        bundleData.setShortTermData(timeAverageMessageData);
        loadData.getBundleData().put("bundle-1", bundleData);

        loadData.getBrokerData().put("broker-1", new BrokerData(broker1));
        loadData.getBrokerData().put("broker-2", new BrokerData(broker2));

        assertFalse(thresholdShedder.findBundlesForUnloading(loadData, conf).isEmpty());
    }

This means that if the user configures a wrong weight, the load balancer may stop working.

Moreover, apache#6772 added support for configurable resource weights, and apache#16937 breaks the use case that apache#6772 describes.
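To make the conflict between weights and the 100% cut-off concrete, here is a minimal sketch. It assumes a simplified model in which a broker's load is the maximum over resources of `usage * 100 / limit` multiplied by that resource's weight, and in which the apache#16937 behavior is approximated as "ignore any value above 100%". The class and method names are hypothetical; this is not Pulsar's actual `LocalBrokerData` or `ThresholdShedder` code. The inputs mirror the test comment above: 10% CPU with a CPU weight of 100.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model; not Pulsar's actual load-balancer code.
public class WeightedUsageSketch {

    // Max weighted usage in percent: max over resources of (usage * 100 / limit) * weight.
    // When skipAbove100 is true, weighted values above 100% are ignored, which
    // approximates the "skip misconfigured usage (>100%)" behavior described in this issue.
    static double maxWeightedUsage(Map<String, double[]> usages,
                                   Map<String, Double> weights,
                                   boolean skipAbove100) {
        double max = 0.0;
        for (Map.Entry<String, double[]> e : usages.entrySet()) {
            double usage = e.getValue()[0];
            double limit = e.getValue()[1];
            double percent = usage * 100.0 / limit;
            double weighted = percent * weights.getOrDefault(e.getKey(), 1.0);
            if (skipAbove100 && weighted > 100.0) {
                continue; // treated as misconfigured and dropped from the max
            }
            max = Math.max(max, weighted);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, double[]> usages = new LinkedHashMap<>();
        usages.put("cpu", new double[] {10, 100});           // 10% CPU
        usages.put("bandwidthIn", new double[] {500, 1000}); // 50% bandwidth in
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("cpu", 100.0); // assumed CPU weight of 100, as in the test comment

        System.out.println(maxWeightedUsage(usages, weights, false)); // 1000.0
        System.out.println(maxWeightedUsage(usages, weights, true));  // 50.0
    }
}
```

Under this model, the heavily weighted CPU dominates the broker's load when nothing is skipped, but once values above 100% are dropped the broker looks no busier than a broker at 50% bandwidth, so the configured weight silently stops having any effect.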

It is hard to choose the threshold value; the default threshold is 85%. But a broker's maximum resource usage rarely reaches 85%, which leads to unbalanced traffic between brokers, and the read cache hit rate of the heavily loaded brokers decreases.

When most brokers of a Pulsar cluster are restarted at the same time, all of the cluster's traffic goes to the remaining brokers. The restarted brokers then receive no traffic for a long time, because the maximum resource usage of the remaining brokers never reaches the threshold.
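The restart scenario can be illustrated with a fixed-threshold overload check. This is a minimal sketch assuming a hypothetical model where a broker is shed only if its maximum resource usage exceeds 85%; the broker names and usage numbers are made up for illustration and are not measurements, and the class is not Pulsar's shedder implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixed-threshold overload check; not Pulsar's actual shedder.
public class OverloadThresholdSketch {

    // Returns the brokers whose max resource usage (percent) exceeds the threshold.
    static List<String> overloadedBrokers(Map<String, Double> maxUsageByBroker,
                                          double thresholdPercent) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : maxUsageByBroker.entrySet()) {
            if (e.getValue() > thresholdPercent) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative numbers: two brokers carry all traffic after the others restart.
        Map<String, Double> usage = new LinkedHashMap<>();
        usage.put("broker-1", 70.0);
        usage.put("broker-2", 65.0);
        usage.put("broker-3", 0.0); // restarted, no traffic
        usage.put("broker-4", 0.0); // restarted, no traffic
        System.out.println(overloadedBrokers(usage, 85.0)); // prints []
    }
}
```

No broker crosses 85%, so nothing is unloaded and the restarted brokers stay idle even though the load is clearly unbalanced.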

So I think we need to revert apache#16937.

Solution

No response

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!