This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Ethereum capability work loop can be cancelled prematurely #5508

Closed
halfalicious opened this issue Feb 27, 2019 · 3 comments · Fixed by #5523
Comments

@halfalicious (Contributor)

Ethereum capabilities use Boost.Asio deadline timers to schedule their work loops via Host::scheduleExecution - for example:

void EthereumCapability::onStarting()
{
    m_backgroundWorkEnabled = true;
    m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
}

...and in EthereumCapability::doBackgroundWork:

if (m_backgroundWorkEnabled)
    m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });

Note that Host's usage of deadline timers is susceptible to a race condition - Host::scheduleExecution creates a new timer, adds it to a list, and performs an async_wait on it. The host then "garbage collects" expired timers in Host::run, and it determines if a timer is expired by checking its expiration time:

aleth/libp2p/Host.cpp

Lines 687 to 692 in 9e5a853

DEV_GUARDED(x_networkTimers)
{
    m_networkTimers.remove_if([](unique_ptr<io::deadline_timer> const& t) {
        return t->expires_from_now().total_milliseconds() < 0;
    });
}

However, when a timer expires its handler is put into a queue rather than executed immediately. This creates a race condition: the handler can run after the timer has been deleted, in which case it is invoked with the Boost error code operation_aborted and exits immediately without executing any of its core logic.


halfalicious commented Mar 10, 2019

Anything which uses Host::scheduleExecution or CapabilityHost::scheduleExecution (which in turn calls Host::scheduleExecution) is susceptible to this bug. Currently it looks like only the Capability implementations use it. Here is the code which needs to be updated:

  • Any place where a capability background work loop is scheduled:

    • EthereumCapability::onStarting:
      void EthereumCapability::onStarting()
      {
          m_backgroundWorkEnabled = true;
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
      }
    • EthereumCapability::doBackgroundWork:
      if (m_backgroundWorkEnabled)
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
    • WarpCapability::onStarting:
      void WarpCapability::onStarting()
      {
          m_backgroundWorkEnabled = true;
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
      }
    • WarpCapability::doBackgroundWork:
      if (m_backgroundWorkEnabled)
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
  • Other capability code:

    • EthereumCapability::reset:
      void EthereumCapability::reset()
      {
          m_sync->abortSync();
          // reset() can be called from RPC handling thread,
          // but we access m_latestBlockSent and m_transactionsSent only from the network thread
          m_host->scheduleExecution(0, [this]() {
              m_latestBlockSent = h256();
              m_transactionsSent.clear();
          });
      }
    • EthereumCapability::onTransactionImported:
      void EthereumCapability::onTransactionImported(
          ImportResult _ir, h256 const& _h, h512 const& _nodeId)
      {
          m_host->scheduleExecution(0, [this, _ir, _h, _nodeId]() {
              auto itPeerStatus = m_peers.find(_nodeId);
              if (itPeerStatus == m_peers.end())
                  return;
              auto& peer = itPeerStatus->second;
              peer.markTransactionAsKnown(_h);
              switch (_ir)
              {
              case ImportResult::Malformed:
                  m_host->updateRating(_nodeId, -100);
                  break;
              case ImportResult::AlreadyKnown:
                  // if we already had the transaction, then don't bother sending it on.
                  m_transactionsSent.insert(_h);
                  m_host->updateRating(_nodeId, 0);
                  break;
              case ImportResult::Success:
                  m_host->updateRating(_nodeId, 100);
                  break;
              default:;
              }
          });
      }

I think we can create dedicated Boost.Asio deadline timers to manage the EthereumCapability and WarpCapability work loops. The other code doesn't need actual timers, since it only uses scheduleExecution to ensure that code is executed on the network thread; in those cases we can simply use io_service::post.

@gumb0 (Member)

gumb0 commented Mar 11, 2019

In any case I'd like for Capability implementations to stay decoupled from Host internals, in particular they shouldn't care about io_service or timers.

I guess we could create a timer in Host per each registered Capability somehow.

@halfalicious (Contributor, Author)

> In any case I'd like for Capability implementations to stay decoupled from Host internals, in particular they shouldn't care about io_service or timers.

Yea, off the top of my head I think we need to do something like that so that Capability implementations can continue to run on the network thread; otherwise we'd need to have Capability implementations inherit from Worker, which might complicate things.
