
mv iterator refresh

Matthew Von-Maszewski edited this page Dec 21, 2013 · 15 revisions

Status

  • merged to develop December 20, 2013
  • code complete December 19, 2013
  • development started December 18, 2013

History / Context

The first version of this feature is implemented in basho/eleveldb, Basho's Erlang to leveldb transfer layer. The feature can and should be implemented in the leveldb layer at some future date. The eleveldb layer was chosen because of a rushed schedule and the author's greater familiarity with iterators in that layer (i.e. yet another hack).

This feature is only needed in very rare situations. Because of its side effects (listed below), its use is strongly discouraged unless the situation absolutely demands it.

Basho has a test that populates 64 databases (Riak vnodes) with a total of 2 Terabytes of data (2 billion short binary keys with 1K values). Riak's active anti-entropy (AAE) service scans the growing databases as part of its data protection activities. The AAE scan uses a leveldb iterator to walk each leveldb database to create AAE hash trees. AAE's iterator has scanned nicely in this scenario with both Riak 1.3 and Riak 1.4. The problem started with Riak 2.0 where leveldb Write operations are much more efficient and the AAE iterator gets too little time to make progress.

All leveldb iterators implicitly create a checkpoint in the leveldb files. All files that exist at the time of the checkpoint are frozen: leveldb will not delete files within the checkpoint until the iterator is released / destroyed. Normally iterators come and go quickly. But with AAE and Riak 2.0, the iterator is very long lived, on the order of hours. The database (vnode) that AAE was scanning was also receiving a heavy volume of new data. That database bulged to 52 GBytes of disk space, whereas the remaining databases held only 32 GBytes. Worse, the database overflowed its file cache and started thrashing. AAE's iterator went from 1.5K operations per second to 300 operations per second. Performance across all 64 databases suffered.
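The pinning behavior described above can be illustrated with a toy model. This is only a sketch: `ToyDb`, `FileSet`, and the file names are hypothetical and do not correspond to leveldb's internal types. It shows just one thing: a file superseded by compaction cannot be removed while an iterator still references it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model, not leveldb code: an "iterator" is a copy of the
// live file list, so it shares ownership of every file that existed when it
// was created (the checkpoint).
using FileSet = std::vector<std::shared_ptr<const std::string>>;

class ToyDb {
public:
    void AddFile(const std::string& name) {
        live_.push_back(std::make_shared<const std::string>(name));
    }

    // Creating an iterator freezes the current set of files.
    FileSet NewIterator() const { return live_; }

    // Compaction replaces all live files with one output file. The inputs
    // become obsolete but stay on disk until nothing references them.
    void Compact(const std::string& output) {
        for (const auto& f : live_) obsolete_.push_back(f);
        live_.clear();
        AddFile(output);
    }

    // Drop obsolete files that no iterator pins. Returns files left on disk.
    std::size_t CollectGarbage() {
        FileSet still_pinned;
        for (const auto& f : obsolete_) {
            // use_count() == 1 means only obsolete_ itself holds the file.
            if (f.use_count() > 1) still_pinned.push_back(f);
        }
        obsolete_.swap(still_pinned);
        return live_.size() + obsolete_.size();
    }

private:
    FileSet live_;
    FileSet obsolete_;
};
```

In this model a long-lived `FileSet` returned by `NewIterator()` keeps every pre-compaction file alive, which is the disk and file cache growth described above in miniature.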

Summary of key problem factors:

  • large existing database
  • heavy volume of database Write operations in parallel to iteration
  • iterator held for a long time, causing the file cache to overflow

These problem factors do not occur together very often. When they do, the mv-iterator-refresh branch is a potential benefit.

The mv-iterator-refresh branch only works with forward iterators, i.e. iterators only using Next() operations. The branch establishes a 5 minute timer. When an iterator Next() operation occurs and the timer has expired, the iterator is automatically closed and a new one is established via a Seek() to the key returned by the previous Next() operation. This technique works, but it also has some side effects that may or may not be acceptable:

mv-iterator-refresh side effects:

  • the iteration no longer represents a single point-in-time view of the database
  • establishing the replacement iterator can fail if the key used in Seek() no longer exists

The first side effect must be weighed against the assumptions of the iterator user. Where used in Riak, there are already assumptions made that the dataset from the iterator may not contain all of the most recent Write operations, and there are no assumptions about what portion of the dataset is or is not missing. A couple of very good coding alternatives exist that could reduce, and likely eliminate, the second side effect. But that is code for a future release. (Namely, continue to use the existing iterator until a key is found that will succeed with a replacement iterator.)
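To make the first side effect concrete, here is a small simulation using `std::map` in place of leveldb. Everything in it is hypothetical: `ScanWithOneRefresh` and its parameters are illustrative names, a key count stands in for the 5 minute timer, and the two tables stand in for the database as seen by the original and the replacement iterator. For simplicity the sketch resumes at the next larger key if the last key has disappeared, which is not what the branch does (the branch reports the iterator as closed).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::string>;

// Scan `view_at_start` for `keys_before_refresh` keys, then continue in
// `view_at_refresh`, resuming just after the last key already returned.
std::vector<std::string> ScanWithOneRefresh(const Table& view_at_start,
                                            const Table& view_at_refresh,
                                            std::size_t keys_before_refresh) {
    assert(keys_before_refresh >= 1);  // a resume point needs at least one key
    std::vector<std::string> seen;
    auto it = view_at_start.begin();
    while (it != view_at_start.end() && seen.size() < keys_before_refresh) {
        seen.push_back(it->first);
        ++it;
    }
    // The old view ran out before the "timer" fired: no refresh happens.
    if (seen.size() < keys_before_refresh) return seen;

    // Replacement iterator: position at the last key returned, then step
    // past it if it is still present so it is not reported twice.
    auto resume = view_at_refresh.lower_bound(seen.back());
    if (resume != view_at_refresh.end() && resume->first == seen.back()) ++resume;
    for (; resume != view_at_refresh.end(); ++resume) seen.push_back(resume->first);
    return seen;
}
```

If the start view holds keys a, b, d and the refreshed view additionally holds aa, c, e, a refresh after two keys yields a, b, c, d, e. Keys c and e were written after the scan began and are returned, while aa, written in the same period, is missed because it sorts behind the resume point. The result matches no single point in time.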

Branch description

basho/leveldb:include/leveldb/options.h

Add the member variable iterator_refresh. Set the default to false. All other usage of the variable happens within the basho/eleveldb repository.
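As a rough illustration of the change, the new member might look like the fragment below. The struct name `OptionsSketch` is an assumption made for this sketch; only the member name and its default value come from the description above.

```cpp
// Hypothetical excerpt: the enclosing struct name is an assumption.
struct OptionsSketch {
    // Off by default, so existing iterators keep their single
    // point-in-time view unless a caller explicitly opts in.
    bool iterator_refresh = false;
};
```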

basho/eleveldb:c_src/refobjects.h

Removed the LevelSnapshotWrapper class and moved its active data members into LevelIteratorWrapper. LevelSnapshotWrapper was part of a design model used to improve throughput during eleveldb's transition from synchronous leveldb calls to asynchronous leveldb calls. That particular design model did not yield performance gains, but the class was never removed. Removing it now simplified the mv-iterator-refresh logic.

The LevelIteratorWrapper class now manages both the leveldb::Iterator and leveldb::Snapshot objects. The new member functions PurgeIterator() and RebuildIterator() consolidate the common create and delete activities so that other member functions, the constructor, and the destructor can share them.

Member variables m_Options and itr_ref are copies of the members of the same name in the Erlang ItrObject. ItrObject is responsible for deleting the LevelIteratorWrapper object as well as these two variables. So the lifecycle of the variables should be safe as copies.
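The shape of that consolidation can be sketched as follows. This is not the eleveldb class: `IteratorWrapperSketch`, `SketchSnapshot`, and `SketchIterator` are hypothetical stand-ins built on `std::map`, and the release order shown (iterator first, then snapshot) is an assumption. Only the names PurgeIterator() and RebuildIterator() come from the branch.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

using Table = std::map<std::string, std::string>;

// Hypothetical stand-ins for leveldb::Snapshot and leveldb::Iterator.
struct SketchSnapshot { Table frozen; };
struct SketchIterator { Table::const_iterator pos; };

class IteratorWrapperSketch {
public:
    explicit IteratorWrapperSketch(const Table& db) : db_(db) { RebuildIterator(); }
    ~IteratorWrapperSketch() { PurgeIterator(); }

    // Single place that releases both objects: iterator first, then the
    // snapshot it reads from.
    void PurgeIterator() {
        iterator_.reset();
        snapshot_.reset();
    }

    // Single place that creates both objects; drops any old pair first.
    void RebuildIterator() {
        PurgeIterator();
        snapshot_ = std::make_unique<SketchSnapshot>(SketchSnapshot{db_});
        iterator_ = std::make_unique<SketchIterator>(
            SketchIterator{snapshot_->frozen.begin()});
    }

    bool HasIterator() const { return iterator_ != nullptr; }
    std::size_t VisibleKeys() const { return snapshot_ ? snapshot_->frozen.size() : 0; }

private:
    const Table& db_;                          // the "database" being read
    std::unique_ptr<SketchSnapshot> snapshot_;
    std::unique_ptr<SketchIterator> iterator_;
};
```

Routing the constructor, the destructor, and the refresh path through the same two functions means there is only one create sequence and one delete sequence to get right.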

basho/eleveldb:c_src/workitems.cc

The key logic for this branch is contained in two logic blocks added to MoveTask::operator(). The first logic block decides whether or not a new iterator is needed before making the user's request against leveldb. The sequence is straightforward:

  • check time to see if current iterator is more than 5 minutes old
  • delete old iterator and snapshot objects
  • create new snapshot and iterator objects
  • Seek() to the most recent iterator position (on failure, close the iterator)
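The four steps above can be sketched as below. This is a simplified model, not the MoveTask code: `ToyIterator`, `OpenAt`, `RefreshIfStale`, and `kRefreshInterval` are hypothetical names, a copied `std::map` stands in for the snapshot and iterator pair, and an exact-match lookup models the statement that the Seek() fails when the saved key no longer exists. The current time is passed in as a parameter so the logic can be exercised without waiting 5 minutes.

```cpp
#include <chrono>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;
using Table = std::map<std::string, std::string>;

// Interval from the text; the constant name is hypothetical.
constexpr std::chrono::minutes kRefreshInterval{5};

struct ToyIterator {
    Table snapshot;               // frozen copy: stands in for snapshot + iterator
    Table::const_iterator pos;    // current position inside `snapshot`
    Clock::time_point created;    // when this iterator was established
};

// Create a fresh view of `db` positioned at `key`. Returns nullptr when the
// key is gone, modeling "on failure, close the iterator".
std::unique_ptr<ToyIterator> OpenAt(const Table& db, const std::string& key,
                                    Clock::time_point now) {
    auto it = std::make_unique<ToyIterator>();
    it->snapshot = db;
    it->pos = it->snapshot.find(key);
    it->created = now;
    if (it->pos == it->snapshot.end()) return nullptr;
    return it;
}

std::unique_ptr<ToyIterator> RefreshIfStale(std::unique_ptr<ToyIterator> it,
                                            const Table& db,
                                            const std::string& last_key,
                                            Clock::time_point now) {
    // Step 1: keep the iterator unless it is more than 5 minutes old.
    if (!it || now - it->created <= kRefreshInterval) return it;
    // Step 2: delete the old iterator and snapshot.
    it.reset();
    // Steps 3 and 4: new snapshot and iterator, then seek to the saved key.
    return OpenAt(db, last_key, now);
}
```

A caller would hand over ownership with `std::move` and treat a null result as a closed iterator.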

The existing logic for executing the user's request then runs, followed by the second new logic block:

  • if user's request succeeds, save key found for potential new iterator creation
  • if the request fails, purge the iterator/snapshot objects now to free the leveldb checkpoint as early as possible
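The post-request bookkeeping described in these two bullets might look like the following sketch. `ScanState` and `NextAndRecord` are hypothetical names, and a heap-allocated `std::map` copy stands in for the iterator and snapshot objects that pin the leveldb checkpoint.

```cpp
#include <map>
#include <memory>
#include <string>

using Table = std::map<std::string, std::string>;

struct ScanState {
    std::unique_ptr<Table> snapshot;  // stands in for the iterator + snapshot objects
    Table::const_iterator pos;        // valid only while `snapshot` is held
    std::string last_key;             // saved for a possible replacement Seek()
    bool started = false;
};

// Perform one Next() and then apply the second logic block.
bool NextAndRecord(ScanState& s) {
    if (!s.snapshot) return false;    // already purged by an earlier failure
    if (!s.started) {
        s.pos = s.snapshot->begin();
        s.started = true;
    } else {
        ++s.pos;                      // safe: pos never rests at end() while held
    }
    if (s.pos != s.snapshot->end()) {
        s.last_key = s.pos->first;    // success: remember the key just returned
        return true;
    }
    s.snapshot.reset();               // failure: release the checkpoint right away
    return false;
}
```

Purging on the first failed request, rather than waiting for the caller to close the iterator, is what lets the frozen files be reclaimed as early as possible.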

The Seek() used to recreate the iterator can produce an ATOM_ITERATOR_CLOSED error return value. This is a new behavior. All previous error returns are unchanged. Here is a summary of error return values:

  • ATOM_ITERATOR_CLOSED occurs when the recreation Seek() fails. It also occurs when something else weird happens to the iterator object (this is previous behavior).
  • ATOM_INVALID_ITERATOR occurs when the iterator reaches the end of the database.
  • ATOM_BADARG occurs when parameter checking finds a bad user parameter.
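One way to read the summary above is as a small decision function. The sketch below is illustrative only: `MoveResult`, `Classify`, and the four boolean inputs are hypothetical, and the order in which the conditions are checked is an assumption rather than something taken from the eleveldb source.

```cpp
// Hypothetical mirror of the three error atoms plus a success value.
enum class MoveResult { Ok, BadArg, IteratorClosed, InvalidIterator };

MoveResult Classify(bool args_ok, bool iterator_open, bool reseek_ok,
                    bool position_valid) {
    // Parameter checking found a bad user parameter.
    if (!args_ok) return MoveResult::BadArg;
    // The iterator object is unusable, or the recreation Seek() failed.
    if (!iterator_open || !reseek_ok) return MoveResult::IteratorClosed;
    // The iterator moved past the last key in the database.
    if (!position_valid) return MoveResult::InvalidIterator;
    return MoveResult::Ok;
}
```

A caller that enables iterator_refresh should be prepared for the closed-iterator result in the middle of a scan, not only at teardown.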

basho/eleveldb: src/eleveldb.erl c_src/eleveldb.cc c_src/workitems.h

These files contain edits to account for the LevelSnapshotWrapper class going away and/or for adding the iterator_refresh option to app.config.
