roachtest: test recovery from out of disk after removing ballast file #22387

Closed
a-robinson opened this issue Feb 5, 2018 · 7 comments
Labels
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@a-robinson
Contributor

In #22235 (comment), I tested out how some new code behaved in a small cluster that was running out of disk. In order to make some nodes fuller than others, I put 5GB ballast files on their disks. After the cluster filled up, I expected to be able to recover it by shrinking the ballast file down to progressively smaller sizes, but was surprised that even removing 1GB from each node's file wasn't enough for the cluster to recover.
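
For anyone reproducing this setup, here is a minimal sketch of creating a ballast file and later shrinking it to hand space back to the node. It is an illustration only, not the procedure used in #22235: `create_ballast` and `shrink_ballast` are hypothetical helper names, and the sketch assumes a filesystem where writing zero bytes actually allocates blocks (no transparent compression or deduplication).

```python
import os


def create_ballast(path, size_bytes):
    """Write size_bytes of zeros to path so the file occupies real disk space."""
    chunk = b"\0" * (1 << 20)  # write in 1 MiB pieces to keep memory use small
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the blocks are really on disk


def shrink_ballast(path, release_bytes):
    """Truncate the ballast by release_bytes and return its new size."""
    current = os.path.getsize(path)
    new_size = max(0, current - release_bytes)
    os.truncate(path, new_size)
    return new_size
```

Under these assumptions, releasing 1GB from a node's ballast would be `shrink_ballast(path, 1 << 30)`.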

See the comment linked above for more detail, but my takeaway was that RocksDB compactions require more scratch disk space than I would have expected. They filled up a node's disk that had hundreds of megabytes of free space, even though no writes were happening on the cluster beyond the normal background load of timeseries data and node liveness heartbeats.

We should better understand this behavior and see if there's a good way to make RocksDB more conscious of the available space on disk when compacting.

I've saved tarballs of two of the data directories that filled up during RocksDB compactions, in case anyone wants to inspect them. Beware, they're 7.6GB each:
https://storage.googleapis.com/cockroach-alex/alex-disk-0005.tar.gz
https://storage.googleapis.com/cockroach-alex/alex-disk-0006.tar.gz

a-robinson added this to the 2.0 milestone on Feb 5, 2018
@petermattis
Collaborator

My expectation is that a RocksDB compaction may temporarily require as much disk space as its input sstables occupy. It sounds like more was being required, which is surprising to me as well.
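
As a rough way to check that expectation against a real store, the sketch below sums the `.sst` files in a directory and reports how much free space would remain if a compaction wrote that many bytes of output before its inputs were deleted. It assumes the worst case where the output is as large as the input, and that sstables sit directly in the given directory with a `.sst` suffix. All function names are hypothetical.

```python
import os
import shutil


def sstable_bytes(data_dir):
    """Sum the sizes of files ending in .sst directly inside data_dir."""
    total = 0
    with os.scandir(data_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".sst"):
                total += entry.stat().st_size
    return total


def compaction_headroom(free_bytes, input_bytes):
    """Bytes left free at the compaction's peak, assuming output == input size.

    A negative result means the disk would fill before the inputs are deleted.
    """
    return free_bytes - input_bytes


def check_store(data_dir):
    """Worst case: every sstable in data_dir is compacted at once."""
    free = shutil.disk_usage(data_dir).free
    return compaction_headroom(free, sstable_bytes(data_dir))
```

A negative value from `check_store` would suggest that a full compaction cannot complete without freeing space first; a small positive value leaves little room for concurrent flushes and logging.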

@bdarnell
Contributor

bdarnell commented Feb 8, 2018

Are we going to be able to do anything here for 2.0 or should we bump it to 2.1?

@petermattis
Collaborator

I doubt we'll be able to fix anything, but that's unclear until we do some investigation. I think a small bit of investigation is warranted before 2.0 is released.

@neeral
Contributor

neeral commented Feb 12, 2018

@a-robinson how big is the cockroach data directory partition, in comparison to the 5GB ballast file?

@a-robinson
Contributor Author

Each data partition was only 10GB.

a-robinson changed the milestone from 2.0 to 2.1 on Mar 12, 2018
tbg added the A-storage label on May 15, 2018
tbg changed the title from "storage: bad behavior when attempting to recover from full disk failures" to "roachtest: test recovery from out of disk after removing ballast file" on May 21, 2018
@a-robinson
Contributor Author

From https://github.com/facebook/rocksdb/releases/tag/v5.13.1:

SstFileManager now can cancel compactions if they will result in max space errors. SstFileManager users can also use SetCompactionBufferSize to specify how much space must be leftover during a compaction for auxiliary file functions such as logging and flushing.
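
The behavior described in that release note can be modeled roughly as follows. This is a toy Python model of the idea, not RocksDB's implementation or API: the `SpaceBudget` class and its field names are hypothetical, and it assumes a compaction's output can be as large as its inputs.

```python
from dataclasses import dataclass


@dataclass
class SpaceBudget:
    max_allowed_space: int       # cap on tracked bytes; 0 means "no cap"
    compaction_buffer_size: int  # bytes kept free for flushes, logging, etc.
    total_files_size: int = 0    # bytes currently held by sstables
    reserved: int = 0            # bytes promised to in-flight compactions

    def try_reserve(self, input_bytes):
        """Admit a compaction only if its worst-case output still fits."""
        projected = (self.total_files_size + self.reserved
                     + input_bytes + self.compaction_buffer_size)
        if self.max_allowed_space and projected > self.max_allowed_space:
            return False  # would hit the space cap: cancel the compaction
        self.reserved += input_bytes
        return True

    def finish(self, input_bytes, output_bytes):
        """Release the reservation and account for inputs being replaced."""
        self.reserved -= input_bytes
        self.total_files_size += output_bytes - input_bytes
```

The point of the buffer term is that a compaction is refused slightly before the cap is reached, so background writes that cannot be postponed still have room.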

tbg added the C-enhancement label on Jul 22, 2018
petermattis removed this from the 2.1 milestone on Oct 5, 2018
@tbg
Member

tbg commented Oct 11, 2018

Folding into #7882.

tbg closed this as completed on Oct 11, 2018
5 participants