mv aae iterator hang

Matthew Von-Maszewski edited this page Apr 15, 2016 · 6 revisions

Status

  • merged to develop - April 15, 2016
  • code complete - April 9, 2016
  • development started - April 8, 2016

History / Context

This is a discussion of changes made to eleveldb, Basho's Erlang-to-leveldb interface. By tradition, changes to either eleveldb or leveldb are documented in this wiki.

This is a continuation of previous work here:

https://github.com/basho/leveldb/wiki/mv-aae-segfault

As mentioned within the wiki link above, the reuse of an iterator after it reached the end of the key space was never tested. This branch corrects the negotiation of whether the Erlang thread or the eleveldb worker thread sets an essential iterator management flag. Without this correction, it is possible for the Erlang thread to overwrite the worker thread's action, which leaves Erlang waiting forever for another iterator entry that will never arrive.

Branch Description

c_src/eleveldb.cc

async_iterator_move() executes on an Erlang scheduler thread. The modified block of code is the one place in this function where this thread and the eleveldb worker thread could change the m_PrefetchStarted flag simultaneously. Previously this code block made a simple assignment to m_PrefetchStarted. Now an atomic test and set operation sets the variable only if it still holds the value it had before entering this code block.

The key consideration is that this code block starts with an atomic test and set of m_HandoffAtomic. The only way both threads could simultaneously set m_PrefetchStarted is if the scheduler thread "won" the setting of m_HandoffAtomic. That implies the worker thread will traverse its "loser" code block, which contains the one place it sets m_PrefetchStarted. The new atomic operation for m_PrefetchStarted ensures that if the worker thread changes the variable first, the worker's value overrules the scheduler thread's value.
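
The scheduler-side logic described above can be sketched as follows. This is a minimal illustration and not the code from the branch: it models both flags with std::atomic, which is an assumption, and only the names m_HandoffAtomic and m_PrefetchStarted come from the text. The struct IteratorFlags, the function scheduler_try_set(), and its parameters are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the iterator object. Only the two member names
// come from the text; the types and initial values are assumptions.
struct IteratorFlags {
    std::atomic<int>  m_HandoffAtomic{0};
    std::atomic<bool> m_PrefetchStarted{false};
};

// Scheduler-thread side (sketch). 'seen' is the value of m_PrefetchStarted
// read before entering the block; 'desired' is what this thread wants stored.
// Returns true only if this thread's value was actually stored.
bool scheduler_try_set(IteratorFlags& f, bool seen, bool desired) {
    int unclaimed = 0;
    // Step 1: try to "win" the handoff. If the swap fails, this thread did
    // not win and it never reaches the contested assignment.
    if (!f.m_HandoffAtomic.compare_exchange_strong(unclaimed, 1)) {
        return false;
    }
    // Step 2: store 'desired' only if the flag still holds 'seen'. If the
    // worker changed the flag in the meantime, the swap fails and the
    // worker's value is left in place.
    return f.m_PrefetchStarted.compare_exchange_strong(seen, desired);
}
```

With a plain assignment in step 2, a scheduler thread that observed true and then re-asserted true would silently erase a false written by the worker in between. With the conditional swap, the same sequence leaves the worker's false untouched.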

c_src/workitems.cc

MoveTask::DoWork() executes on an eleveldb worker thread. The modified block of code is the one place in this function where this thread and the Erlang scheduler thread could change the m_PrefetchStarted flag simultaneously. The atomic test and set operation is a hack to create a compiler fence and processor lock around the m_PrefetchStarted flag. The test portion is meaningless. It is the atomic set that is being used (and guaranteed).
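
The idea of using a test and set purely for its atomic set can be illustrated like this. It is a sketch under assumptions, not the code from workitems.cc: the flag is modeled as std::atomic<bool>, and worker_force_set() and demo_worker_force_set() are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Worker-thread side (sketch). The comparison is fed the flag's current
// value, so it can never permanently reject the write; the operation is
// used only for its atomic, fenced store.
inline void worker_force_set(std::atomic<bool>& flag, bool value) {
    bool expected = flag.load();
    // On failure compare_exchange_weak reloads 'expected' with the current
    // value, so the loop ends as soon as one swap succeeds.
    while (!flag.compare_exchange_weak(expected, value)) {
    }
}

// Usage sketch: whatever the flag held before, it holds 'value' afterwards.
inline void demo_worker_force_set() {
    std::atomic<bool> flag{true};
    worker_force_set(flag, false);
    assert(flag.load() == false);
}
```

In code written directly against std::atomic, flag.store(value) or flag.exchange(value) would express the same unconditional atomic write without the redundant comparison.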

src/eleveldb.erl and test/iterator.erl

These files contain updates for unit tests. The iterator.erl unit test does NOT reproduce the race condition corrected by this branch.
