Replay/Resync Optimizations: Low Hanging Fruit #5130

wanderingbort · 2018-08-09T13:10:22Z

Identified and implemented some low-hanging fruit for replays, under the assumption that the block.log is trusted.

Replay

The optimizations

don't create database undo sessions when replaying blocks from the block.log unless they are necessary for consensus (as they are when processing failing scheduled transactions)
don't add transactions to the deduplication set if they will have expired before the replay ends
disable transaction-level checks (such as resource billing enforcement) that we can infer will always pass in replay
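
The deduplication item above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `replay_dedup`, `record`, and `replay_end_time` are hypothetical names, times are plain integers rather than chain time points, and the exact boundary (`<` versus `<=`) is an assumption. The idea is that a transaction whose expiration falls before the last block in block.log would already have been pruned from the set by the time replay finishes, so tracking it buys nothing.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the transaction deduplication set.
struct replay_dedup {
   std::int64_t replay_end_time;          // assumed: timestamp of the last block in block.log
   std::unordered_set<std::string> seen;  // transaction ids still worth tracking

   // Returns true if the id was recorded, false if it was skipped.
   bool record(const std::string& trx_id, std::int64_t expiration) {
      if (expiration < replay_end_time) {
         return false; // will already have expired when replay ends
      }
      seen.insert(trx_id);
      return true;
   }
};
```

With `replay_end_time` set to 100, a transaction expiring at 50 is skipped while one expiring at 150 is still recorded.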

The caveats

A new config/CLI option, disable-replay-opts, is provided. It disables all of the above optimizations and restores the current replay processing. There are a few reasons why this option may be necessary:

these optimizations are provided as "best-effort" and may have bugs. For users who need guarantees on replay, disabling the optimizations may be appropriate.
these optimizations leave the fork-database in a state where it thinks it may "rewind" a block, BUT we have not created the proper database states to do so. This should not happen, as we have previously established trust in the block.log as irreversible. The ability to fix this desync depends on a future fork-database audit/refactor.

The Results

A very basic test of v1.1.1, v1.1.3 and the new optimizations on my development machine yields:

(re)sync

Light validation mode

This mode enables skipping of auth checks, transaction signature recovery and other tests that can be inferred as passed if the node implicitly trusts the active set of producers to maintain correctness. This is essentially the same trust semantics that a header-validating light client would have.

The Caveats

This is not safe for use on any node that does not implicitly trust the elected set of producers. That should include the other elected producers (this mode is considered invalid if you are running a production node).

This mode is intended to help nodes (re)syncing from trusted peers and any other node that can outsource trust to a set of independent validators. For example, nodes that maintain MongoDB integration behind a layer of full validation-mode peers, or a new node that is syncing into an established blockchain and has well-known-checkpoints to enforce validity.
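
As a rough illustration of the gating described above, the sketch below decides whether auth checks and signature recovery may be skipped. All names here (`validation_mode`, `node_config`, `is_producing`, `can_skip_auth_checks`) are hypothetical and chosen for this example; they are not taken from the PR or the chain library.

```cpp
// Hypothetical names throughout; a sketch, not the chain library's API.
enum class validation_mode { full, light };

struct node_config {
   validation_mode mode = validation_mode::full;
   bool is_producing = false; // assumed flag: this node produces blocks
};

// Whether auth checks and signature recovery may be skipped for blocks
// received from the active producers.
inline bool can_skip_auth_checks(const node_config& cfg) {
   // A producing node must never rely on other producers for correctness,
   // so light validation only applies to non-producing nodes.
   return cfg.mode == validation_mode::light && !cfg.is_producing;
}
```

The default-constructed config validates fully, which matches the caveat that light mode is something a node opts into only when it already trusts its peers.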

heifner · 2018-08-09T18:16:31Z

libraries/chain/controller.cpp

+      maybe_session( maybe_session&& other)
+      :_session(move(other._session))
+      {
+      }


For consistency with operator=(), add other._session.reset().

The move constructor of _session handles that.

https://github.com/EOSIO/fc/blob/df5a17ef0704d7dd96c444bfd9a70506bcfbc057/include/fc/optional.hpp#L49

Unfortunately, the move assignment operator for fc::optional is broken, so I had to replicate some of its features (just not the broken part).

I wonder what, if anything, fixing the fc::optional move assignment operator would break.

Technically, neither is broken: the moved-from state is unspecified and only has to allow calls to methods with no preconditions.
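
To make the trade-off in this thread concrete, here is a minimal sketch that uses std::optional (C++17) as a stand-in for fc::optional. It is not the PR's maybe_session, and the `session` type is a hypothetical placeholder. With std::optional, a moved-from object still reports has_value(), so a wrapper that wants an empty source after a move has to reset it explicitly. Whether fc::optional behaves the same way is not something this sketch demonstrates.

```cpp
#include <optional>
#include <utility>

// Hypothetical stand-in for a database undo session.
struct session {
   int id = 0;
};

// Sketch of a wrapper that may or may not hold a session.
class maybe_session {
public:
   maybe_session() = default;
   explicit maybe_session(session s) : _session(std::move(s)) {}

   // Move construction: take the value, then explicitly disengage the
   // source so both move operations leave it in the same state.
   maybe_session(maybe_session&& other) noexcept
   : _session(std::move(other._session)) {
      other._session.reset();
   }

   maybe_session& operator=(maybe_session&& other) noexcept {
      if (this != &other) {
         _session = std::move(other._session);
         other._session.reset();
      }
      return *this;
   }

   bool has_session() const { return _session.has_value(); }

private:
   std::optional<session> _session;
};
```

The explicit reset costs one extra call but makes the moved-from wrapper's state predictable, which is the consistency argument raised in the review.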

heifner · 2018-08-09T18:54:33Z

libraries/chain/controller.cpp

      auto guard_pending = fc::make_scoped_exit([this](){
         pending.reset();
      });

-      pending = db.start_undo_session(true);
+      bool skip_db_sessions = !conf.disable_replay_opts && (s == controller::block_status::irreversible);
+      if (!skip_db_sessions) {


If you move the pending->_block_status = s; line up above here, you could use the new skip_db_sessions() instead and consolidate that logic.

I have to revert this change: the branch is what instantiates the inner pending block for that optional, so this change creates a bad access.
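
The ordering problem described in this reply can be sketched as below. This is a simplified model under stated assumptions, not the controller code: `controller_sketch` and `pending_state` are hypothetical, std::optional (C++17) stands in for the optional used in the PR, and a boolean replaces the real undo session. The point is that the skip decision has to be computed from the `s` argument, because `pending` is still empty until one of the branches creates it.

```cpp
#include <optional>

enum class block_status { irreversible, validated, complete, incomplete };

// Hypothetical, simplified pending block state.
struct pending_state {
   bool has_undo_session = false;
   block_status status = block_status::incomplete;
};

struct controller_sketch {
   bool disable_replay_opts = false;
   std::optional<pending_state> pending;

   void start_block(block_status s) {
      // Decide from the argument, not from pending->status: pending is
      // still empty here, so dereferencing it would be a bad access.
      const bool skip_db_sessions =
         !disable_replay_opts && s == block_status::irreversible;

      // Either branch is what actually instantiates the pending state.
      if (!skip_db_sessions) {
         pending.emplace(pending_state{true, s});  // with an undo session
      } else {
         pending.emplace(pending_state{false, s}); // without one
      }
   }
};
```

Moving the status assignment above the branch, as first suggested, would read or write through an empty optional in this model.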

heifner · 2018-08-09T18:56:44Z

libraries/chain/controller.cpp

+}
+
+bool controller::skip_trx_checks() const {
+   return !my->conf.disable_replay_opts  &&my->pending && !my->in_trx_requiring_checks && (my->pending->_block_status == block_status::irreversible || my->pending->_block_status == block_status::validated);


Since I recommended some other changes, I'll point out this space problem with &&.
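
For readability, the predicate in the diff above can also be laid out one condition per line. The sketch below restates it as a free function over a flattened input struct so it can be exercised in isolation. `trx_check_inputs` is a hypothetical type introduced for this example, and the `complete` and `incomplete` enumerators are assumed, since the diff only shows `irreversible` and `validated`.

```cpp
enum class block_status { irreversible, validated, complete, incomplete };

// Hypothetical flattened view of the state the predicate reads.
struct trx_check_inputs {
   bool disable_replay_opts;
   bool has_pending;
   bool in_trx_requiring_checks;
   block_status pending_status; // only meaningful when has_pending is true
};

inline bool skip_trx_checks(const trx_check_inputs& in) {
   const bool trusted_block = in.pending_status == block_status::irreversible
                           || in.pending_status == block_status::validated;
   return !in.disable_replay_opts
       && in.has_pending
       && !in.in_trx_requiring_checks
       && trusted_block;
}
```

Splitting out `trusted_block` keeps each line to a single idea and makes spacing slips around `&&` easier to spot.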

heifner · 2018-08-09T18:57:20Z

libraries/chain/controller.cpp

@@ -815,14 +870,19 @@ struct controller_impl {
   void start_block( block_timestamp_type when, uint16_t confirm_block_count, controller::block_status s ) {
      EOS_ASSERT( !pending, block_validate_exception, "pending block is not available" );


Since you are here, can you fix the assert message? It is backwards.


b1bart added 3 commits August 8, 2018 14:06

- dont open database sessions when replaying irreversible blocks unless necessary
- dont add transactions to dedup if they will expire before replay ends
- dont do various checks that cannot fail inside trx_context

2e71a0c

wrap optional session in a type

240ed8f

create option for disabling replay opts instead of co-opting the force-all-checks flag

83a02dc

wanderingbort requested review from heifner and arhag August 9, 2018 13:10

b1bart added 2 commits August 9, 2018 12:04

prevent bad access on new chains

944e8ca

fix bug in wrapper classes move assignment

d846b3a

heifner suggested changes Aug 9, 2018


b1bart added 3 commits August 9, 2018 15:10

address PR feedback

2b6419b

add light validation mode

8942727

This mode enables skipping of auth checks, transaction signature recovery and other tests that can be inferred as passed if the node implicitly trusts the active set of producers to maintain correctness

style update

6edc505

wanderingbort changed the title ~~Replay Optimizations: Low Hanging Fruit~~ Replay/Resync Optimizations: Low Hanging Fruit Aug 9, 2018

resolve ambiguity between type and member

33a8220

heifner approved these changes Aug 9, 2018


rethink a review comment that was subtly bad

57635bc

heifner merged commit b98c879 into EOSIO:develop Aug 10, 2018
