This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Ethereum capability work loop can be cancelled prematurely #5508

Closed
halfalicious opened this issue Feb 27, 2019 · 3 comments · Fixed by #5523
Comments

@halfalicious (Contributor)

Ethereum capabilities use Boost.Asio deadline timers to schedule their work loops via Host::scheduleExecution - for example:

void EthereumCapability::onStarting()
{
    m_backgroundWorkEnabled = true;
    m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
}

...and in EthereumCapability::doBackgroundWork:

if (m_backgroundWorkEnabled)
    m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });

Note that Host's usage of deadline timers is susceptible to a race condition - Host::scheduleExecution creates a new timer, adds it to a list, and performs an async_wait on it. The host then "garbage collects" expired timers in Host::run, and it determines if a timer is expired by checking its expiration time:

aleth/libp2p/Host.cpp

Lines 687 to 692 in 9e5a853

DEV_GUARDED(x_networkTimers)
{
    m_networkTimers.remove_if([](unique_ptr<io::deadline_timer> const& t) {
        return t->expires_from_now().total_milliseconds() < 0;
    });
}

However, when a timer expires its handler is put into a queue rather than executed immediately. This creates a race condition: the handler can run after the timer has been deleted, in which case it is invoked with the Boost error code operation_aborted and exits immediately without executing any of its core logic.


halfalicious commented Mar 10, 2019

Anything which uses Host::scheduleExecution or CapabilityHost::scheduleExecution (which in turn calls Host::scheduleExecution) is susceptible to this bug. Currently it looks like only the Capability implementations use it. Here is the code which needs to be updated:

  • Any place where a capability background work loop is scheduled:

    • EthereumCapability::onStarting:
      void EthereumCapability::onStarting()
      {
          m_backgroundWorkEnabled = true;
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
      }
    • EthereumCapability::doBackgroundWork:
      if (m_backgroundWorkEnabled)
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
    • WarpCapability::onStarting:
      void WarpCapability::onStarting()
      {
          m_backgroundWorkEnabled = true;
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
      }
    • WarpCapability::doBackgroundWork:
      if (m_backgroundWorkEnabled)
          m_host->scheduleExecution(c_backroundWorkPeriodMs, [this]() { doBackgroundWork(); });
  • Other capability code:

    • EthereumCapability::reset:
      void EthereumCapability::reset()
      {
          m_sync->abortSync();
          // reset() can be called from RPC handling thread,
          // but we access m_latestBlockSent and m_transactionsSent only from the network thread
          m_host->scheduleExecution(0, [this]() {
              m_latestBlockSent = h256();
              m_transactionsSent.clear();
          });
      }
    • EthereumCapability::onTransactionImported:
      void EthereumCapability::onTransactionImported(
          ImportResult _ir, h256 const& _h, h512 const& _nodeId)
      {
          m_host->scheduleExecution(0, [this, _ir, _h, _nodeId]() {
              auto itPeerStatus = m_peers.find(_nodeId);
              if (itPeerStatus == m_peers.end())
                  return;
              auto& peer = itPeerStatus->second;
              peer.markTransactionAsKnown(_h);
              switch (_ir)
              {
              case ImportResult::Malformed:
                  m_host->updateRating(_nodeId, -100);
                  break;
              case ImportResult::AlreadyKnown:
                  // if we already had the transaction, then don't bother sending it on.
                  m_transactionsSent.insert(_h);
                  m_host->updateRating(_nodeId, 0);
                  break;
              case ImportResult::Success:
                  m_host->updateRating(_nodeId, 100);
                  break;
              default:;
              }
          });
      }

I think we can create dedicated Boost.Asio deadline timers to manage the EthereumCapability and WarpCapability work loops. The other code doesn't need actual timers, since it only uses scheduleExecution to ensure that code is executed on the network thread; in those cases we can simply use io_service::post.

@gumb0 (Member)

gumb0 commented Mar 11, 2019

In any case I'd like for Capability implementations to stay decoupled from Host internals, in particular they shouldn't care about io_service or timers.

I guess we could create a timer in Host per each registered Capability somehow.

@halfalicious (Contributor, Author)

> In any case I'd like for Capability implementations to stay decoupled from Host internals, in particular they shouldn't care about io_service or timers.

Yea, off the top of my head I think we need to do something like that so that Capability implementations can continue to run on the network thread; otherwise we'd need to have Capability implementations inherit from Worker, which might complicate things.
