Roach test for disk space usage #24795
Comments
This issue overlaps with (or is perhaps a duplicate of) #24278. Cc @a-robinson.
The decommissioning test does aggressive rebalancing. Not suggesting that this should be folded in there per se, but at the very least it's an interesting example to look at.
Also, the drop test verifies space reclamation, though only after a
@a-robinson, we are planning to work on this issue — is anybody actively working on #24278? Thanks @tschottdorf, I will peek into these tests. Is there a workload with a steady number of inserts and deletes that I can use, and that would survive on a cluster with high disk usage and a low TTL?
Thanks for the update, @kaavee315. I'm not currently working on #24278. |
Closed by #25051.
Older versions of CockroachDB did not reclaim unused disk space promptly. We plan to add a test that simulates the conditions and scenarios in which older versions of CockroachDB misbehaved, to ensure that newer versions do not have these bugs. We need to test at least the following scenarios:
Planned implementation:
Create a framework to fill the various nodes in the cluster up to a given percentage (assume > 70%). This will first run inserts through a cockroach workload until the cluster is around 50% full, then create a ballast file to consume the rest of the space. We can then drive a steady rate of inserts and deletes using the same workload (though I didn't see any workload of this kind, so we may just run inserts and deletes concurrently), and drop or truncate the tables used by the workload, to test various issues. We can also take down various nodes to simulate failures.
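To make the fill step concrete, here is a minimal sketch of the ballast-size arithmetic: given how full the workload has already made a disk, how large a ballast file is needed to reach the target fill percentage. The function name and parameters are illustrative assumptions, not part of any cockroach tooling.

```python
# Sketch (assumed helper, not cockroach API): bytes of ballast needed to
# bring a disk from its current usage up to a target fill percentage.
def ballast_size(disk_total: int, disk_used: int, target_pct: int) -> int:
    """Return the ballast-file size in bytes needed so that
    used/total reaches target_pct percent; 0 if already there."""
    target_used = disk_total * target_pct // 100  # integer math, no rounding drift
    return max(0, target_used - disk_used)

GIB = 1 << 30
# Example: 100 GiB disk, workload filled 50 GiB, target 70% full overall.
print(ballast_size(100 * GIB, 50 * GIB, 70) // GIB)  # -> 20
```

The ballast file itself could then be created with something like `fallocate -l <size>` on the node; integer arithmetic is used above so the computed size is exact rather than subject to floating-point truncation.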
In one phase, when one of the nodes is made nearly full (95%) and the others are left relatively free (80%), we would like to check that rebalancing is triggered and that compaction later frees up space (to check #21178 and #22235).
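The "compaction should free up space" check above amounts to polling disk usage on the full node until it drops below a threshold, or failing on timeout. A minimal sketch, assuming a caller-supplied `probe` that returns the node's current fill fraction (the names, thresholds, and injection points are all assumptions for illustration):

```python
import time

# Sketch (assumed helper): wait for compaction/rebalancing to reclaim space.
def wait_for_reclaim(probe, threshold: float, timeout_s: float,
                     poll_s: float = 1.0,
                     clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll probe() until it reports a fill fraction below threshold.
    Returns True on success, False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe() < threshold:
            return True
        sleep(poll_s)
    return False

# Usage with a fake probe whose readings shrink as space is "reclaimed":
readings = iter([0.95, 0.93, 0.88, 0.79])
ok = wait_for_reclaim(lambda: next(readings), threshold=0.80,
                      timeout_s=10, poll_s=0, sleep=lambda _: None)
print(ok)  # -> True
```

Injecting `clock` and `sleep` keeps the helper unit-testable without real waiting; in the actual test they would be left at their defaults.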
In the second phase, we can delete the ballast file on some of the nodes and check whether rebalancing leads to even disk usage across all nodes.
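"Even disk usage" needs a concrete acceptance criterion; one simple option is to bound the spread between the fullest and emptiest node. The tolerance below is an illustrative assumption, not a documented CockroachDB invariant:

```python
# Sketch (assumed check): consider rebalancing successful if the fullest
# and emptiest nodes differ by at most `tolerance` in fill fraction.
def usage_is_even(fill_fractions, tolerance: float = 0.10) -> bool:
    """True if max and min node fill fractions are within tolerance."""
    return max(fill_fractions) - min(fill_fractions) <= tolerance

print(usage_is_even([0.61, 0.58, 0.64, 0.60]))  # -> True  (spread 0.06)
print(usage_is_even([0.95, 0.60, 0.62, 0.61]))  # -> False (spread 0.35)
```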
For the replica transfer loop, we can fill all the nodes close to full, with minor differences between them (90% and 95%), and check whether this causes the steady reads/writes to halt (to check #21400).
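Detecting that the steady read/write load has halted can be done by sampling a cumulative operations counter and flagging windows in which it stopped advancing. The `(timestamp, ops_completed)` sample shape and the idle threshold are assumptions sketched for illustration:

```python
# Sketch (assumed check): find windows where the workload's cumulative op
# counter did not advance for longer than max_idle_s — candidate stalls.
def stalled_windows(samples, max_idle_s: float):
    """samples: list of (timestamp_s, ops_completed), in time order.
    Yields (t_start, t_end) for each window with no progress > max_idle_s."""
    for (t0, ops0), (t1, ops1) in zip(samples, samples[1:]):
        if ops1 == ops0 and (t1 - t0) > max_idle_s:
            yield (t0, t1)

samples = [(0, 100), (5, 180), (10, 180), (40, 180), (45, 260)]
print(list(stalled_windows(samples, max_idle_s=20)))  # -> [(10, 40)]
```

In the test, any non-empty result while the nodes are near-full would be evidence of the #21400 behavior (writes halting under disk pressure).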
We would also like to check #21260, where range operations increase due to rebalancing in high disk usage scenarios.