This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Replay/Resync Optimizations: Low Hanging Fruit #5130

Merged: 10 commits, Aug 10, 2018

wanderingbort
Contributor

@wanderingbort wanderingbort commented Aug 9, 2018

Identified and implemented some low-hanging fruit for replays, where we assume the block.log is trusted.

Replay

The optimizations

  • don't create database undo sessions when replaying blocks from the block.log unless they are necessary for consensus (as they are when processing failing scheduled transactions)
  • don't add transactions to the deduplication set if they will have expired before the replay ends
  • disable transaction-level checks (such as resource billing enforcement) that we can infer will always pass in replay
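
As a rough illustration of the second optimization, here is a minimal sketch of a deduplication set that declines to record transactions which will already have expired when the replay finishes. Every name here (`dedup_set`, `transaction`, `replay_end_time`) is hypothetical and is not taken from the EOSIO code; it also assumes the timestamp of the last block in the block.log is known before the replay starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins for illustration only; not the EOSIO types.
using time_point_sec = std::uint32_t;  // seconds since epoch

struct transaction {
    std::string    id;
    time_point_sec expiration;
};

class dedup_set {
public:
    // replay_end_time: timestamp of the last block in the trusted block.log
    // (assumed to be known before the replay starts).
    explicit dedup_set(time_point_sec replay_end_time)
        : replay_end_time_(replay_end_time) {}

    // Records the transaction id unless the transaction expires before the
    // replay ends. Returns true if the id was recorded.
    bool record(const transaction& trx) {
        if (trx.expiration < replay_end_time_) {
            // A duplicate could never be accepted after the replay, because
            // the transaction is already expired by then; skip the insert.
            return false;
        }
        ids_.insert(trx.id);
        return true;
    }

    std::size_t size() const { return ids_.size(); }

private:
    time_point_sec                  replay_end_time_;
    std::unordered_set<std::string> ids_;
};
```

With a replay that ends at time 1000, a transaction expiring at 900 is skipped while one expiring at 1100 is still tracked, so the set only holds entries that can matter once the node goes live.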

The caveats

A new config/CLI option, disable-replay-opts, is provided. It disables all of the above optimizations and restores the current replay processing. There are a few reasons why this option may be necessary:

  • these optimizations are provided as "best-effort" and they may have bugs. For users that need guarantees on replay, disabling optimizations may be appropriate.
  • these optimizations leave the fork-database in a state where it thinks it may "rewind" a block BUT we have not created the proper database states to do so. This should not happen as we have previously established trust in the block.log as irreversible. The ability to fix this desync depends on a future fork-database audit/refact

The Results

A very basic test of v1.1.1, v1.1.3 and the new optimizations on my development machine yields:

[Screenshot, 2018-08-09: replay test results for v1.1.1, v1.1.3 and the new optimizations]

(re)sync

Light validation mode

This mode enables skipping of auth checks, transaction signature recovery and other tests that can be inferred as passed if the node implicitly trusts the active set of producers to maintain correctness. This is essentially the same trust semantics that a header-validating light client would have.

The Caveats

This is not safe for use on any node that does not implicitly trust the elected set of producers. That includes the other elected producers: this mode is considered invalid if you are running a producing node.

This mode is intended to help nodes (re)syncing from trusted peers and any other node that can outsource trust to a set of independent validators. Examples include nodes that maintain MongoDB integration behind a layer of peers running in full validation mode, or a new node that is syncing into an established blockchain and has well-known checkpoints to enforce validity.
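
The trust rule described above can be sketched as a single predicate. This is only an illustration under stated assumptions: the `validation_mode` enum, the `node_config` fields and the `may_skip_auth_checks` helper are hypothetical names, not the identifiers used in this PR.

```cpp
// Hypothetical names for illustration; not the identifiers from this PR.
enum class validation_mode { full, light };

struct node_config {
    validation_mode mode         = validation_mode::full;
    bool            is_producing = false;  // true on a block-producing node
};

// Returns true when auth checks and signature recovery may be skipped for a
// transaction. in_producer_signed_block is true only when the transaction
// arrives inside a block signed by the active set of producers.
inline bool may_skip_auth_checks(const node_config& cfg,
                                 bool in_producer_signed_block) {
    if (cfg.is_producing) {
        return false;  // producers must never outsource validation
    }
    if (cfg.mode != validation_mode::light) {
        return false;  // full validation checks everything
    }
    // Light mode trusts producers, not arbitrary senders: a transaction that
    // is not part of a producer-signed block still gets the full checks.
    return in_producer_signed_block;
}
```

Keeping the decision in one function makes the caveat explicit: the checks are skipped only when both the node's configuration and the origin of the transaction justify the trust.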

…ss necessary

- dont add transactions to dedup if they will expire before replay ends
- dont do various checks that cannot fail inside trx_context
maybe_session( maybe_session&& other)
:_session(move(other._session))
{
}
Contributor

For consistency with operator=(), add other._session.reset().

Contributor Author

The move constructor of _session handles that.

https://github.com/EOSIO/fc/blob/df5a17ef0704d7dd96c444bfd9a70506bcfbc057/include/fc/optional.hpp#L49

Unfortunately, the move assignment operator for fc::optional is broken, so I had to replicate some of its features (just not the broken part).

Contributor

I wonder what, if anything, fixing the fc::optional move assignment operator would break.

Contributor

Technically neither is broken, as the moved-from state is unspecified; it just must allow any methods with no preconditions.
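
To make the moved-from discussion concrete, here is a minimal sketch of a maybe_session-style wrapper. It is built on std::optional rather than fc::optional, which is a deliberate swap: std::optional stays engaged after being moved from, so the explicit reset() is needed here, whereas the thread above says fc::optional's move constructor already clears the source. The `undo_session` type and its live counter are hypothetical stand-ins for a database undo session.

```cpp
#include <optional>
#include <utility>

// Hypothetical stand-in for a database undo session. It bumps a counter
// while it is live so that ownership transfers can be observed.
struct undo_session {
    int* live_count;

    explicit undo_session(int& counter) : live_count(&counter) { ++*live_count; }
    undo_session(undo_session&& o) noexcept : live_count(o.live_count) {
        o.live_count = nullptr;  // the source no longer owns the session
    }
    undo_session& operator=(undo_session&& o) noexcept {
        if (this != &o) {
            release();
            live_count   = o.live_count;
            o.live_count = nullptr;
        }
        return *this;
    }
    undo_session(const undo_session&)            = delete;
    undo_session& operator=(const undo_session&) = delete;
    ~undo_session() { release(); }

private:
    void release() {
        if (live_count) { --*live_count; live_count = nullptr; }
    }
};

// Sketch of a wrapper that may or may not hold a session.
class maybe_session {
public:
    maybe_session() = default;  // no session: the replay fast path
    explicit maybe_session(int& counter) : session_(std::in_place, counter) {}

    maybe_session(maybe_session&& other) noexcept
        : session_(std::move(other.session_)) {
        // std::optional remains engaged after a move, so empty it explicitly.
        other.session_.reset();
    }
    maybe_session& operator=(maybe_session&& other) noexcept {
        if (this != &other) {
            session_ = std::move(other.session_);
            other.session_.reset();
        }
        return *this;
    }

    bool has_session() const { return session_.has_value(); }

private:
    std::optional<undo_session> session_;
};
```

Resetting the source in both the move constructor and the move assignment operator gives the two operations the same observable post-condition, which is the consistency the first review comment asks for.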

auto guard_pending = fc::make_scoped_exit([this](){
pending.reset();
});

pending = db.start_undo_session(true);
bool skip_db_sessions = !conf.disable_replay_opts && (s == controller::block_status::irreversible);
if (!skip_db_sessions) {
Contributor

If you move the pending->_block_status = s; line up above here, you could use the new skip_db_sessions() instead and consolidate that logic.

Contributor Author

I have to revert this change: the branch is what instantiates the inner pending block for that optional, so this change creates a bad access.

}

bool controller::skip_trx_checks() const {
return !my->conf.disable_replay_opts &&my->pending && !my->in_trx_requiring_checks && (my->pending->_block_status == block_status::irreversible || my->pending->_block_status == block_status::validated);
Contributor

Since I recommended some other changes, I'll point out this space problem with &&.
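
For reference, the two predicates discussed in these comments can be sketched in isolation as below. This is an approximation, not the merged code: `controller_sketch`, `pending_state` and the `other` enumerator are hypothetical, and only `disable_replay_opts`, `in_trx_requiring_checks` and the `irreversible`/`validated` statuses come from the diff above.

```cpp
#include <optional>

// 'other' is a placeholder for any status that is neither irreversible
// nor validated; the real enum is not reproduced here.
enum class block_status { irreversible, validated, other };

struct config {
    bool disable_replay_opts = false;
};

struct pending_state {
    block_status status;
};

struct controller_sketch {
    config                       conf;
    std::optional<pending_state> pending;
    bool                         in_trx_requiring_checks = false;

    // Takes the status as a parameter instead of reading it from pending,
    // so it is safe to call before the pending state has been created.
    bool skip_db_sessions(block_status s) const {
        return !conf.disable_replay_opts && s == block_status::irreversible;
    }

    // Same conditions as the one-line version above, one condition per line.
    bool skip_trx_checks() const {
        return !conf.disable_replay_opts
            && pending.has_value()
            && !in_trx_requiring_checks
            && (pending->status == block_status::irreversible
                || pending->status == block_status::validated);
    }
};
```

Passing the status into skip_db_sessions avoids dereferencing an empty optional, which appears to be the kind of bad access mentioned in the reply above.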

@@ -815,14 +870,19 @@ struct controller_impl {
void start_block( block_timestamp_type when, uint16_t confirm_block_count, controller::block_status s ) {
EOS_ASSERT( !pending, block_validate_exception, "pending block is not available" );
Contributor

Since you are here, can you fix the assert message? It is backwards.

@wanderingbort wanderingbort changed the title Replay Optimizations: Low Hanging Fruit Replay/Resync Optimizations: Low Hanging Fruit Aug 9, 2018
@heifner heifner merged commit b98c879 into EOSIO:develop Aug 10, 2018