Roach test for disk space usage #24795

Closed
kaavee315 opened this issue Apr 13, 2018 · 6 comments

Comments

@kaavee315
Contributor

kaavee315 commented Apr 13, 2018

Older versions of CockroachDB did not seem to release unused disk space promptly. We plan to add a test that simulates the conditions and scenarios in which older versions misbehaved, to ensure that newer versions of CockroachDB do not have these bugs. We need to test at least the following scenarios:

  1. Running a cluster close to 95% full with a steady rate of inserts and deletes works.
  2. Rebalancing (due to node down, fullness, large number of inserts, etc.) reclaims space from the rebalancing source node.
  3. The issue where multiple nodes being almost full leads to a replica transfer loop.

Planned implementation:
Create a framework to fill the nodes in a cluster up to a given percentage (assume > 70%). It will first run inserts through a cockroach workload until cockroach has filled the disk to around 50%, then create a ballast file to take up the rest of the space. We can then run a steady rate of inserts and deletes using the same workload (I didn't see any workload of this kind, so maybe just run inserts and deletes concurrently), and drop or truncate the tables used by the workload, to test the various issues. We can also take down nodes to simulate failures.
In the first phase, one of the nodes is made nearly full (95%) while the others are relatively free (80%). We would like to check that rebalancing is triggered and that later compaction frees up space (to check #21178 and #22235).
In the second phase, we can delete the ballast file on some of the nodes and check whether rebalancing leads to even disk usage across all nodes.
For the replica transfer loop, we can fill all the nodes close to full, with minor differences (90% and 95%), and check whether it causes the steady reads/writes to halt (to check #21400).
We would also like to check #21260, when range operations increase due to rebalancing in high disk usage scenarios.
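
To make the fill step and the evenness check concrete, here is a rough sketch of the arithmetic involved. It is only an illustration: `ballastBytes` and `usageSpread` are hypothetical helper names, not existing roachtest functions, and the whole-percent units are an assumption made to keep the math in integers.

```go
package main

import (
	"errors"
	"fmt"
)

// ballastBytes (hypothetical helper) returns the size a ballast file needs so
// that a disk with capacityBytes total and usedBytes already occupied ends up
// targetPercent full. It returns 0 if the disk is already at or past the target.
func ballastBytes(capacityBytes, usedBytes, targetPercent int64) (int64, error) {
	if capacityBytes <= 0 || usedBytes < 0 || usedBytes > capacityBytes {
		return 0, errors.New("invalid capacity or usage")
	}
	if targetPercent <= 0 || targetPercent > 100 {
		return 0, errors.New("target percent must be in (0, 100]")
	}
	// Divide before multiplying to avoid overflow on very large disks;
	// this rounds the target down by at most targetPercent bytes.
	target := capacityBytes / 100 * targetPercent
	if target <= usedBytes {
		return 0, nil
	}
	return target - usedBytes, nil
}

// usageSpread (hypothetical helper) returns the difference between the most
// and least full nodes, in percentage points. A small spread after deleting
// ballast files would suggest rebalancing evened out disk usage.
func usageSpread(percentUsed []int64) int64 {
	if len(percentUsed) == 0 {
		return 0
	}
	lo, hi := percentUsed[0], percentUsed[0]
	for _, p := range percentUsed[1:] {
		if p < lo {
			lo = p
		}
		if p > hi {
			hi = p
		}
	}
	return hi - lo
}

func main() {
	// Example: the workload has filled half the disk; the ballast takes it to 95%.
	size, err := ballastBytes(1000, 500, 95)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ballast bytes:", size)
	fmt.Println("spread (points):", usageSpread([]int64{95, 80, 80}))
}
```

In a real test the capacity and usage numbers would come from the nodes themselves, and the acceptable spread would be a threshold chosen by the test author.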

@petermattis
Collaborator

This issue overlaps with (or is perhaps a duplicate of) #24278. Cc @a-robinson.

@tbg
Member

tbg commented Apr 15, 2018

The decommissioning test does aggressive rebalancing. Not suggesting that this should be folded in there per se, but at the very least it's an interesting example to look at.

@tbg
Member

tbg commented Apr 15, 2018

Also, the drop test verifies space reclamation, though only after a DROP TABLE, which under the hood is different from the rebalancing case.

@kaavee315
Contributor Author

@a-robinson, we are planning to work on this issue. Is anybody actively working on #24278? Thanks @tschottdorf, I will look into these tests. Is there a workload with a steady number of inserts and deletes that I can use, and that can survive on a cluster with high disk usage and a low TTL?

@a-robinson
Contributor

Thanks for the update, @kaavee315. I'm not currently working on #24278.

@kaavee315
Contributor Author

Closing by #25051

@tbg tbg closed this as completed May 29, 2018