Roach test for disk space usage #24795

kaavee315 · 2018-04-13T20:22:54Z

Older versions of CockroachDB did not seem to release unused disk space promptly. We plan to add a test that simulates the conditions and scenarios in which older versions misbehaved, to ensure that newer versions of CockroachDB do not have these bugs. We need to test at least the following scenarios:

Running a cluster close to 95% full with a steady rate of inserts and deletes works.
Rebalancing (due to node down, fullness, large number of inserts, etc.) reclaims space from the rebalancing source node.
The issue where multiple nodes being almost full leads to a replica transfer loop.
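
As a rough illustration of how the second scenario could become a pass/fail check, here is a minimal Python sketch. The function name `reclaimed_fraction` and its inputs are hypothetical and not part of roachtest or CockroachDB. It assumes the test can sample the source node's used bytes before and after rebalancing and knows roughly how many bytes were moved away.

```python
def reclaimed_fraction(used_before: int, used_after: int, bytes_moved: int) -> float:
    """Return the share of moved bytes that the source node actually gave back.

    All three inputs are assumed to be measured by the test harness:
    used_before / used_after are the source node's used bytes around the
    rebalance, and bytes_moved is the size of the replicas that left it.
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    # Usage can also grow (ongoing inserts), so clamp the freed amount at zero.
    freed = max(0, used_before - used_after)
    # Cap at 1.0: compaction may free more than the moved replicas alone.
    return min(1.0, freed / bytes_moved)
```

A test could then require, say, that the fraction exceeds some threshold within a time limit. Both the threshold and the time limit would need tuning and are not specified here.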

Planned implementation:
Create a framework to fill various nodes in the cluster up to a given percentage (assume > 70%). It will first run inserts through a cockroach workload until the node is around 50% full, then create a ballast file to take up the rest of the space. We can then run a steady rate of inserts and deletes using the same workload (I didn't see any workload of this kind, so maybe just run inserts and deletes concurrently), and drop or truncate the tables used by the workload, to test various issues. We can also take down nodes to simulate failures.
In one phase, when one of the nodes is made nearly full (95%) and the others are relatively free (80%), we would like to check that rebalancing is triggered and that compaction later frees up space (to check #21178 and #22235).
In a second phase, we can delete the ballast file on some of the nodes and check whether rebalancing leads to even disk usage across all nodes.
For the replica transfer loop, we can fill all the nodes close to full, with minor differences (90% and 95%), and check whether this causes the steady reads/writes to halt (to check #21400).
We would also like to check #21260, when range operations increase due to rebalancing in high disk usage scenarios.
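
To make the fill step and the evenness check above concrete, here is a minimal Python sketch of the arithmetic involved. The helper names (`ballast_bytes`, `ballast_for_path`, `usage_spread`) are hypothetical and only for illustration. The only library call used is the standard `shutil.disk_usage`. Percentages are kept as integers to avoid floating-point rounding.

```python
import shutil


def ballast_bytes(total: int, used: int, target_percent: int) -> int:
    """Bytes a ballast file needs so that the disk reaches target_percent full."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    target_used = total * target_percent // 100
    # If the disk is already past the target, no ballast is needed.
    return max(0, target_used - used)


def ballast_for_path(path: str, target_percent: int) -> int:
    """Same calculation, reading the current usage of the filesystem at path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return ballast_bytes(usage.total, usage.used, target_percent)


def usage_spread(used_percents: list[int]) -> int:
    """Gap between the fullest and emptiest node, in percentage points."""
    return max(used_percents) - min(used_percents)
```

For example, a node with a 1000-byte disk that is 500 bytes full after the workload phase would need a 450-byte ballast to reach 95%. After the ballast files are deleted in the second phase, the test could assert that `usage_spread` across nodes falls below some tolerance, which would have to be chosen empirically.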

petermattis · 2018-04-13T20:31:11Z

This issue overlaps with (or is perhaps a duplicate of) #24278. Cc @a-robinson.

tbg · 2018-04-15T12:32:08Z

The decommissioning test does aggressive rebalancing. Not suggesting that this should be folded in there per se, but at the very least it's an interesting example to look at.

tbg · 2018-04-15T12:32:33Z

Also, the drop test verifies space reclamation, though only after a DROP TABLE, which under the hood is different from the rebalancing case.

kaavee315 · 2018-04-17T14:32:16Z

@a-robinson, we are planning to work on this issue; is anybody actively working on #24278? Thanks @tschottdorf, I will look into these tests. Is there a workload with a steady rate of inserts and deletes that I can use, and that can survive on a cluster with high disk usage and a low TTL?

a-robinson · 2018-04-17T14:37:08Z

Thanks for the update, @kaavee315. I'm not currently working on #24278.

kaavee315 · 2018-05-22T16:04:27Z

Closing by #25051

tbg closed this as completed May 29, 2018
