-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discuss about reverting #16937 "skip mis-configured resource usage(>100%) in load balancer" #18598
Comments
Unfortunately #16937 has been released also with 2.10.2 ! Do you mean that now you cannot configure a weight that makes the usage over 100% ? |
Ah, suppose the real CPU usage is only 70%, but we configure |
#16937 added some useful logs. The "problem" you are pointing out is here I am not sure that this is a real problem or not, it looks like that setting loadBalancerCPUResourceWeight= 100 and see CPU usage as 2000% is like a hack. Maybe we can add a flag to allow the old behaviour. Do you know some legit usecase ? Currently I lean toward keeping the current version, and at most add a flag to allow the previous behaviour if you are aware of some user who can be hurt by this change. |
I discussed this issue with Hang a few days ago. Hang is the initial designer of the threshold shedder. The weight is not required to <= 1. I think we mistakenly merged #16937. Users can use any non-negative number for the weight. I support reverting the PR first and revisiting the issue that @heesung-sn wants to fix. |
I think the demo test is not clear, I have updated the test. |
In the Pulsar load balance strategy, including OverloadShedder and ThresholdShedder, the weight of each resource is not ensured in [0, 1]. The total
For the motivation of #16937, the weight of the resource is not misconfigured, and it will break the old behavior, and lead to load balance not working after applying this PR in their cluster. I support reverting this PR. |
Sure. If resourceUsage * weight is expected to be more than 100%, we should revert this change. The motivation does not hold. Thanks for taking care of this. |
Note: If resourceUsage > 100% becomes the winner, the moving avg function will decay it more slowly than others( unfair signal treatment). But I assume this is the intention of the weighted configs too. |
+1 for reverting the PR. And I don't think this feature (set the weight over 1.0) is widely used, So let's just do this the right way. Please do the same with the 2.10 branch and we should add an |
Thanks for your explanations. I agree to revert the patch here and in 2.10 |
Search before asking
Motivation
#16937 has corrected the misconfigured resource usage. But if the user configs the wrong one, the error log will print all the time. See the below logs:
And after diving into the modification, we find out that it's a breaking change.
Before #16937, the below test could pass, but after #16937, the below test fails
This means the real CPU usage is only 70%, but we configure loadBalancerCPUResourceWeight= 2, so the current CPU usage is 140%. This will cause the broker to unload some bundles before #16937. But now, it won't.
And since #6772 has supported configured resources weight, #16937 breaks the case #6772 mentioned
So I think we need to revert #16937
Solution
No response
Alternatives
No response
Anything else?
No response
Are you willing to submit a PR?
The text was updated successfully, but these errors were encountered: