mv timed grooming

Matthew Von-Maszewski edited this page Sep 30, 2015 · 8 revisions

Status

  • merged to master -
  • code complete - August 20, 2015
  • development started - August 20, 2015

History / Context

Basho's changes to leveldb's compaction strategy have previously focused on heavy write loads. This branch instead focuses on light to medium write loads. Its logic has no opportunity to activate during heavy write loads, so this branch is an additional strategy, not a replacement.

The existing strategy creates more efficient compactions during heavy write loads by waiting until roughly six overlapping .sst table files exist at level 0, then compacting all of them into one overlapping .sst table file at level 1. Similarly the strategy waits for six .sst table files at level 1 before compacting into level 2. The strategy is very effective under both single database (vnode) loads and multiple database loads.

The existing strategy has a downside for light and medium write loads. Read performance temporarily drops noticeably if there have been no compactions and then one or more databases (vnodes) suddenly start a six-file compaction. Read performance would be more consistent if light and medium write loads compacted smaller sets of files more often. This branch initiates smaller compaction sets based upon elapsed time to improve read latencies.

Branch Description

.gitignore

This change is a correction to exclude sst_rewrite from git's checkin analysis. The change is to fix a previous branch. It is not directly related to timed compactions.

db/db_impl.cc

Unit testing uncovered a race condition relating to shutdown and background processing of DBImpl::BackgroundCall2() and DBImpl::BackgroundImmCompactCall(). The two routines now contain a test for a shutdown scenario and no longer report an error to the LOG file in that situation.

DBImpl::MakeRoomForWrite() contains logic for accelerating when write buffers flush. THIS LOGIC WAS CHECKED IN TO FACILITATE TESTING. IT PERFORMED POORLY IN TESTING AND MUST BE REMOVED. All changes within this function will be reverted.

db/dbformat.h

This file defines constants for the two time-based grooming triggers. The strategy will attempt a grooming compaction if:

  • a database (vnode) has had no write activity for 10 minutes and level 0/1 contains 2 or more files, or
  • a database (vnode) has had no write activity for 20 minutes and level 0/1 contains 1 or more files

db/version_set.cc

VersionSet::Finalize() implements Basho's compaction selection strategy. The new code for timed compactions is within this function. Previously config::kL0_GroomingTrigger was the sole threshold for initiating a grooming compaction. Now it is the default. kL0_GroomingTrigger10min and kL0_GroomingTrigger20min may override that default when 10 or 20 minutes respectively have passed since a previous compaction on the level.
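The threshold selection described above can be sketched as follows. This is a minimal illustration, not the code from VersionSet::Finalize(): the helper names GroomingThreshold and ShouldGroom, the microsecond time unit, and passing the default trigger as a parameter are all assumptions made for the example. Only the 10-minute/2-file and 20-minute/1-file pairs come from this page.

```cpp
#include <cstdint>

namespace sketch {

// File-count thresholds taken from the page text:
// 2 or more files after 10 minutes, 1 or more files after 20 minutes.
constexpr int kL0_GroomingTrigger10min = 2;
constexpr int kL0_GroomingTrigger20min = 1;

// Assumption: times are tracked in microseconds.
constexpr uint64_t kMicrosPerMinute = 60ULL * 1000000ULL;

// Hypothetical helper: pick the grooming threshold for a level given the
// default trigger and the time elapsed since the last compaction there.
inline int GroomingThreshold(int default_trigger,
                             uint64_t now_micros,
                             uint64_t last_compaction_micros) {
  // Guard against a clock that appears to run backwards.
  const uint64_t elapsed = now_micros > last_compaction_micros
                               ? now_micros - last_compaction_micros
                               : 0;
  if (elapsed >= 20 * kMicrosPerMinute) return kL0_GroomingTrigger20min;
  if (elapsed >= 10 * kMicrosPerMinute) return kL0_GroomingTrigger10min;
  return default_trigger;
}

// Hypothetical helper: a level qualifies for grooming once its file count
// reaches the threshold selected above.
inline bool ShouldGroom(int file_count,
                        int default_trigger,
                        uint64_t now_micros,
                        uint64_t last_compaction_micros) {
  return file_count >=
         GroomingThreshold(default_trigger, now_micros, last_compaction_micros);
}

}  // namespace sketch
```

With a hypothetical default trigger of 4, a level holding a single file would not be groomed 15 minutes after its last compaction but would be after 25 minutes.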

db/version_set.h

struct CompactionStatus_s now contains an additional member variable, m_LastCompaction. This member variable holds the time of the most recent compaction, or defaults to the time the database was opened. SetCompactionDone() maintains the new member variable. The CompactionStatus_s() constructor initializes it.
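A rough sketch of that bookkeeping is shown below. The struct and class here are simplified stand-ins, not the declarations from db/version_set.h: the names CompactionStatus and LevelStatus, the microsecond unit, the level count, and the accessor LastCompaction() are assumptions for illustration. Only m_LastCompaction and SetCompactionDone() are names from this page, and their real signatures may differ.

```cpp
#include <cstdint>

namespace sketch {

// Assumption: an illustrative number of levels for this example.
constexpr int kNumLevels = 7;

// Simplified stand-in for the per-level status struct.
struct CompactionStatus {
  uint64_t m_LastCompaction;  // time of the most recent compaction (micros)
  CompactionStatus() : m_LastCompaction(0) {}
};

// Hypothetical owner of the per-level status array.
class LevelStatus {
 public:
  // Until a level compacts, its "last compaction" is the database open time.
  explicit LevelStatus(uint64_t open_time_micros) {
    for (int level = 0; level < kNumLevels; ++level) {
      m_Status[level].m_LastCompaction = open_time_micros;
    }
  }

  // Record that a compaction on `level` finished at `now_micros`.
  void SetCompactionDone(int level, uint64_t now_micros) {
    m_Status[level].m_LastCompaction = now_micros;
  }

  uint64_t LastCompaction(int level) const {
    return m_Status[level].m_LastCompaction;
  }

 private:
  CompactionStatus m_Status[kNumLevels];
};

}  // namespace sketch
```

Defaulting to the open time means a freshly opened, idle database still becomes eligible for timed grooming once the 10 or 20 minute window passes.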

util/flexcache.h

This change is unrelated to timed grooming. It is a community patch from leveldb PR #152. The patch is applied here so that it gets the benefit of a full build and functional test before it is accepted.

util/thread_tasks.cc

Previously, only user databases were polled for grooming opportunities after each compaction. This adds internal databases to the grooming poll.

util/throttle.cc

These new lines of code cause all databases, user and internal, to be polled once every 60 seconds for potential grooming. This polling enables VersionSet::Finalize() to activate within a database based upon elapsed time. Previously it was only called when a write buffer filled or another compaction in that database completed.
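The polling behavior can be illustrated with a small sketch. It is not the code in util/throttle.cc: the GroomingPoller class, the Tick() method, and the callback type are hypothetical, and each callback stands in for whatever path causes a database to re-evaluate its compaction state. Only the 60-second interval and the inclusion of both user and internal databases come from this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

namespace sketch {

// 60 seconds, per the text; the microsecond unit is an assumption.
constexpr uint64_t kPollIntervalMicros = 60ULL * 1000000ULL;

// Hypothetical stand-in for "ask this database to check for grooming work".
using CheckFn = std::function<void()>;

class GroomingPoller {
 public:
  explicit GroomingPoller(uint64_t start_micros) : last_poll_(start_micros) {}

  // Intended to be called from a periodic loop. Polls every database once
  // the interval has elapsed and returns how many databases were polled.
  std::size_t Tick(uint64_t now_micros,
                   const std::vector<CheckFn>& user_dbs,
                   const std::vector<CheckFn>& internal_dbs) {
    if (now_micros < last_poll_ ||
        now_micros - last_poll_ < kPollIntervalMicros) {
      return 0;  // not yet time for the next poll
    }
    last_poll_ = now_micros;

    std::size_t polled = 0;
    for (const auto& check : user_dbs) {
      check();
      ++polled;
    }
    for (const auto& check : internal_dbs) {
      check();
      ++polled;
    }
    return polled;
  }

 private:
  uint64_t last_poll_;
};

}  // namespace sketch
```

Without a periodic poll like this, an idle database would never revisit its compaction decision, because neither a full write buffer nor a finished compaction would occur to trigger it.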