bulk: increased split-after size can result in replica imbalance #75664
Comments
cc: @dt
cc @cockroachdb/bulk-io
Reverting in #75882 for now, will try to find some time next week to poke at this, and see why the closer-to-default split size does so badly / what the extremely low split size was doing that was better for the allocator. Maybe the bigger range counts were covering for scatter's known padding issues? Might need @aayushshah15 to help me poke at this when I get to it.
I'd still expect the allocator to fix straight-up replica count divergence though. Very curious why this was happening. Feel free to put something on my calendar if you want to look together, but I can also just try repro-ing on my own with the split size bumped.
I am playing with this a little bit (i.e. running with [...]).
The cluster is at http://104.196.120.245:26258/#/metrics/replication/cluster
The main things that I've noticed so far are that, with [...]
Additionally, as far as I can tell, the allocator is continuously rebalancing. The rate at which we're converging is just slower than the rate at which newer splits are being created. After the import is done, I'd expect this to fully converge.
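The "converging slower than splits are created" observation can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not CockroachDB's allocator, and `simulate`, its parameters, and the "all new replicas land on node 0" rule are hypothetical stand-ins for poorly scattered splits. It shows the replica-count spread growing while splits outpace rebalancing moves, then shrinking once the import stops.

```go
package main

import "fmt"

// simulate is a hypothetical toy model, not the real allocator.
// For the first importTicks ticks, splitsPerTick new replicas land on node 0.
// On every tick the rebalancer moves up to movesPerTick replicas from the
// fullest node to the emptiest one. It returns the max-min replica spread
// recorded at the end of each tick.
func simulate(nodes, importTicks, totalTicks, splitsPerTick, movesPerTick int) []int {
	counts := make([]int, nodes)
	spread := make([]int, 0, totalTicks)
	for t := 0; t < totalTicks; t++ {
		if t < importTicks {
			counts[0] += splitsPerTick // new splits pile up on one node
		}
		for m := 0; m < movesPerTick; m++ {
			hi, lo := 0, 0
			for i, c := range counts {
				if c > counts[hi] {
					hi = i
				}
				if c < counts[lo] {
					lo = i
				}
			}
			if counts[hi]-counts[lo] <= 1 {
				break // already balanced as well as integers allow
			}
			counts[hi]--
			counts[lo]++
		}
		maxC, minC := counts[0], counts[0]
		for _, c := range counts {
			if c > maxC {
				maxC = c
			}
			if c < minC {
				minC = c
			}
		}
		spread = append(spread, maxC-minC)
	}
	return spread
}

func main() {
	// 4 nodes, 10 ticks of import at 5 splits/tick, 2 rebalance moves/tick.
	s := simulate(4, 10, 200, 5, 2)
	fmt.Println("spread after first tick:", s[0])
	fmt.Println("spread at end of import:", s[9])
	fmt.Println("spread at end of run:", s[len(s)-1])
}
```

In this model the spread keeps growing for as long as splits arrive faster than the rebalancer can move replicas, and only converges after the import ends, which matches the expectation stated in the comment above. It says nothing about why the real allocator was slow.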
Fixed by #77588.
Describe the problem
Spin-off from #68303 - provided a distilled version. Context on original, here.
It appears as though e12c9e6 causes replica imbalance which doesn't play nicely with large imports.
With that commit, on clearrange/checks=true, the import fails as a number of nodes run out of disk due to the imbalance:
Without that commit the import succeeds:
Jira issue: CRDB-12773