
mv pacing

Matthew Von-Maszewski edited this page Jul 9, 2014 · 5 revisions

Status

  • merged to master -
  • code complete -
  • development started - July 9, 2014

History / Context

A large amount of Basho's leveldb work has focused on increasing write throughput. The efficiency of Basho's code has reached the point that writes can overpower Read and Iterator (Riak 2i) operations. This branch makes two subtle changes that give background compaction writes the opportunity to spread out, leaving more disk capacity for Read and Iterator operations. The code still maintains the full write capacity to process heavy inbound write volumes.

Previous branches introduced the concept of "grooming" versus "non-grooming" compactions. Non-grooming compactions get priority over grooming compactions. Non-grooming compactions will wait in a queue if all compaction threads are already utilized. Grooming compactions do not queue: either there is capacity immediately or the compaction is dropped. This branch adds one more restriction to grooming compactions: there is only one thread available for grooming. The grooming compaction is again dropped if there is some other compaction already running on the single grooming thread.

The second change creates grooming compaction requests for the overlapped levels. Previously, the overlapped levels (levels 0 and 1) only created non-grooming compaction requests. The non-grooming requests occurred once the level had 6 or more .sst table files. The new change creates a grooming compaction request earlier, at 4 or 5 .sst table files.

Combined, the two changes spread out the compaction timing of levels 0 and 1. Fewer databases (Riak vnodes) now have overlapping compactions. Overlapping compactions are brutal on systems with low RAM to database ratios (below 2.5 to 3 Gbytes of RAM per database). The overlapping compactions on these machines not only cause heavy disk activity but also flush both the operating system page cache and the leveldb block cache. Every disk read needed for user Get or Iterator operations must then compete for the raw, uncached disk. This branch greatly reduces the overlapping compactions, primarily benefitting moderately loaded systems with low RAM to database ratios.

Branch Description

util/hot_threads.cc & .h

HotThreadPool::FindWaitingThread() now takes a second parameter, OkToQueue, that indicates whether or not this is a grooming compaction. OkToQueue is true for non-grooming compactions and false for grooming compactions. When it is false, the function only tests thread 0 for availability and exits without assigning the compaction if anything is already executing on thread 0.

HotThreadPool::Submit() is the user of FindWaitingThread(). This function already knows to not add a compaction to the work queue if OkToQueue is false. The only change to this function is to have it pass its OkToQueue parameter to FindWaitingThread().
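
The interaction between the two functions can be illustrated with a small single-threaded model. This is only a sketch of the behavior described above, not the code in util/hot_threads.cc: the PoolModel type, its members, and the Result enum are hypothetical names, and the real pool manages actual worker threads rather than a vector of flags.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical single-threaded model of the submit logic. The real
// HotThreadPool runs worker threads; here a "thread" is just a busy flag.
struct PoolModel {
    enum class Result { kAssigned, kQueued, kDropped };

    std::vector<bool> busy;   // busy[i] is true while thread i runs a compaction
    std::deque<int> queue;    // ids of queued non-grooming compactions

    explicit PoolModel(std::size_t thread_count) : busy(thread_count, false) {
        assert(thread_count > 0);
    }

    // Returns the index of the thread that accepted the work, or -1.
    // Grooming work (ok_to_queue == false) only ever tests thread 0.
    int FindWaitingThread(bool ok_to_queue) {
        const std::size_t limit = ok_to_queue ? busy.size() : 1;
        for (std::size_t i = 0; i < limit; ++i) {
            if (!busy[i]) {
                busy[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Non-grooming work waits in the queue when no thread is free;
    // grooming work is dropped instead of being queued.
    Result Submit(int compaction_id, bool ok_to_queue) {
        if (FindWaitingThread(ok_to_queue) >= 0) return Result::kAssigned;
        if (ok_to_queue) {
            queue.push_back(compaction_id);
            return Result::kQueued;
        }
        return Result::kDropped;
    }
};
```

In this model, a second grooming request is dropped while thread 0 is busy even if other threads are idle, which is the property that keeps grooming compactions from piling up alongside each other.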

db/version_set.cc

VersionSet::Finalize() now has an additional rule for overlapped levels (levels 0 and 1). The new rule will create a grooming compaction if the overlapped level has 4 or more .sst table files and the original rule has not created a normal, non-grooming compaction.
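
The threshold rule can be sketched as a small classification function. This is an illustration under stated assumptions rather than the logic in db/version_set.cc: the function, enum, and constant names are hypothetical, the thresholds 4 and 6 are taken from the description on this page, and the real Finalize() does considerably more than this single decision.

```cpp
// Hypothetical names; only the thresholds come from the text above.
enum class CompactionRequest { kNone, kGrooming, kNormal };

constexpr int kGroomingTrigger = 4;  // grooming request at 4 or 5 .sst files
constexpr int kNormalTrigger = 6;    // normal (non-grooming) request at 6 or more

// Levels 0 and 1 are the overlapped levels.
inline bool IsOverlappedLevel(int level) {
    return level == 0 || level == 1;
}

// Decides which kind of request an overlapped level should raise.
// Other levels are outside the scope of this sketch and yield kNone.
inline CompactionRequest ClassifyOverlappedLevel(int level, int sst_file_count) {
    if (!IsOverlappedLevel(level)) return CompactionRequest::kNone;
    // The original rule is checked first, so it always wins.
    if (sst_file_count >= kNormalTrigger) return CompactionRequest::kNormal;
    // New rule: applies only when the original rule did not fire.
    if (sst_file_count >= kGroomingTrigger) return CompactionRequest::kGrooming;
    return CompactionRequest::kNone;
}
```

Checking the normal trigger first mirrors the condition that the grooming rule only applies when the original rule has not already created a non-grooming compaction.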
