This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

p2p liveness checking #395

Closed
pmesnier opened this issue Sep 10, 2017 · 10 comments

Comments

@pmesnier
Contributor

Relates to #291. Add a new pair of messages to the net plugin that carry the current time as a payload. These messages could be used for liveness checking of peers that are non-producing full nodes. The two messages, Ping and Pong, may also work for detecting clock skew, but I'm not certain that will be particularly accurate given the varying network latency introduced by the use of asynchronous messaging. Perhaps we could send a second round of messages that carry the result of timing the send of the first message.
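A minimal sketch of the liveness side of this proposal, under stated assumptions: the `ping_message` and `pong_message` field names, the `liveness_tracker` class, and the timeout policy are all hypothetical and are not taken from the net plugin. The idea is only that a peer counts as alive while its most recent pong is newer than some timeout.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical message types; field names are illustrative only.
struct ping_message { std::int64_t sent_ns; };          // sender's clock at send time
struct pong_message { std::int64_t ping_sent_ns;        // echoed from the ping
                      std::int64_t replied_ns; };       // responder's clock at reply time

// Build the reply to a ping, echoing its timestamp and adding our own.
inline pong_message make_pong(const ping_message& p, std::int64_t now_ns) {
   return {p.sent_ns, now_ns};
}

// Tracks whether a peer has answered a ping recently enough to count as alive.
class liveness_tracker {
public:
   explicit liveness_tracker(std::chrono::nanoseconds timeout)
      : timeout_ns_(timeout.count()) { assert(timeout_ns_ > 0); }

   // Record that a pong arrived at local time now_ns.
   void on_pong(const pong_message&, std::int64_t now_ns) {
      last_pong_ns_ = now_ns;
      heard_ = true;
   }

   // A peer that has never answered is treated as not alive.
   bool is_alive(std::int64_t now_ns) const {
      return heard_ && (now_ns - last_pong_ns_) <= timeout_ns_;
   }

private:
   std::int64_t timeout_ns_;
   std::int64_t last_pong_ns_ = 0;
   bool heard_ = false;
};
```

Timestamps are passed in rather than read from a clock so that the logic can be exercised deterministically; a real caller would presumably supply values from a monotonic clock.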

@jgiszczak jgiszczak self-assigned this Sep 11, 2017
@pmesnier
Contributor Author

Let's also add fields to the handshake message to make it effectively a "ping" as well. Note that each side will independently send a handshake message at connection start time. To make latency measurement work, we'll have to make the peer answer a handshake with a "pong" message.
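One way to turn a handshake/pong exchange into latency and clock-offset numbers is the four-timestamp calculation used by NTP. The sketch below is an assumption about how that could look, not the code that was merged; the `time_sample` and `clock_estimate` types and the `estimate` function are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Four timestamps for one exchange, in nanoseconds (hypothetical layout):
//   org: local clock when the handshake (acting as ping) was sent
//   rec: peer clock when the handshake was received
//   xmt: peer clock when the pong was sent
//   dst: local clock when the pong was received
struct time_sample { std::int64_t org, rec, xmt, dst; };

struct clock_estimate {
   double       offset_ns;      // estimated peer clock minus local clock
   std::int64_t round_trip_ns;  // network time, excluding peer processing
};

// NTP-style estimate. The offset is exact only when latency is the same in
// both directions; asymmetric latency shifts it by half the difference.
inline clock_estimate estimate(const time_sample& s) {
   assert(s.dst >= s.org && s.xmt >= s.rec);
   const double offset = (double(s.rec - s.org) + double(s.xmt - s.dst)) / 2.0;
   const std::int64_t rtt = (s.dst - s.org) - (s.xmt - s.rec);
   return {offset, rtt};
}
```

Because each side sends its own handshake, each side can compute its own estimate once the peer's pong arrives. The symmetric-latency assumption is the accuracy concern raised in the opening comment.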

@jgiszczak
Contributor

Debugging is ongoing.

@jgiszczak
Contributor

Packets are transmitted and received and processed and... something is slightly tangled in the processing of each peer's timestamps, so the numbers make no sense. I just have to work through it.

@jgiszczak
Contributor

Processing is working. Currently attempting to explain unusually high measured latency.

@jgiszczak
Contributor

Modifying start_read_message to use async_read_some instead of async_read reduces measured latency from 20 ms to 12 ms.

@pmesnier
Contributor Author

That's interesting. Can you remind me which branch you are using for this work?
I'd like to see how you are caching partial messages. In fact, on TAO we found an amazing speedup using application-level buffer management.

@jgiszczak
Contributor

The branch name is p2p-liveness-checking-395, but I haven't pushed it yet. It still contains C-isms that have to be eradicated. The relevant method is small enough though.

    void read_pending( connection_ptr c ) {
      // Read into the unfilled tail of the buffer. The length is capped at the space
      // remaining, assuming pending_message_buffer holds pending_message_size bytes.
      c->socket->async_read_some(
        boost::asio::buffer( c->pending_message_buffer.data() + c->pending_message_offset,
                             c->pending_message_size - c->pending_message_offset ),
        [this,c]( boost::system::error_code ec, std::size_t bytes_transferred ) {
          if( !ec ) {
            c->message_size += bytes_transferred;
            try {
              auto msg = fc::raw::unpack<net_message>( c->pending_message_buffer );
              c->pending_message_offset = 0; // no unpack exception, so we have a whole msg
              precache pc( c );
              msg.visit( pc );
              start_read_message( c ); // queue the next read before handling this message

              msgHandler m( *this, c );
              msg.visit( m );
              return;
            } catch ( const fc::exception& ) {
              // Incomplete message: keep what has arrived and read the rest.
              c->pending_message_offset += bytes_transferred;
              read_pending( c );
            }
          } else {
            elog( "Error reading message from connection: ${m}", ("m", ec.message() ) );
          }
        });
    }

This is a substitute for start_reading_pending_buffer(), invoked in the same location and manner as that method, in start_read_message(). It depends on a new member, size_t pending_message_offset{0}; added to the connection class.
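As an illustration of application-level buffering for partial reads, here is a self-contained sketch that accumulates whatever each read delivers and hands back a message only once all of it has arrived. It is not the net plugin's code: the `message_accumulator` class is hypothetical, and the framing (a 4-byte length prefix in host byte order) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Collects bytes from partial reads and extracts length-prefixed frames.
// Assumed framing: 4-byte length in host byte order, then the payload.
class message_accumulator {
public:
   // Append the bytes delivered by one (possibly partial) read.
   void append(const char* data, std::size_t n) {
      buf_.insert(buf_.end(), data, data + n);
   }

   // If a complete frame is buffered, copy its payload into out, drop the
   // frame from the buffer, and return true. Otherwise leave state untouched.
   bool next(std::vector<char>& out) {
      constexpr std::size_t header = sizeof(std::uint32_t);
      if (buf_.size() < header) return false;          // length not yet known
      std::uint32_t len = 0;
      std::memcpy(&len, buf_.data(), header);
      if (buf_.size() - header < len) return false;    // payload incomplete
      const auto first = buf_.begin() + static_cast<std::ptrdiff_t>(header);
      const auto last  = first + static_cast<std::ptrdiff_t>(len);
      out.assign(first, last);
      buf_.erase(buf_.begin(), last);
      return true;
   }

   std::size_t buffered() const { return buf_.size(); }

private:
   std::vector<char> buf_;
};
```

Checking the buffered length before decoding avoids a deserialization attempt on every partial read. Whether that matters in practice would need measuring.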

@jgiszczak
Contributor

After correcting my math, here's what clock offsets look like:
[image: plot titled "Clock Offset", y-axis in µs]
Using async_read is indistinguishable from async_read_some on an unloaded system.

For posterity, the plot comes from outputting the clockoffset with dlog() as:

    1011483ms thread-0   net_plugin.cpp:813            handle_message       ] Clock offset is 21413.00000000000000000ns (21.41300000000000026us)

then running:

    grep "Clock offset" stderr.txt | tr -d '()' | awk '{print $10}' | gnuplot -p -e "set title 'Clock Offset'; set ylabel '{/Symbol m}s'; set grid xtics ytics; plot '<cat' title '' with lines"

in the data directories produced by launcher.

@jgiszczak
Contributor

Rebased and reconstructed to mesh well with branch p2p-sync-slow-peers and pushed.

@pmesnier
Contributor Author

closing, feature merged into master

@thomasbcox thomasbcox added this to the EOS Dawn 1.1 milestone Nov 9, 2017
taokayan pushed a commit to taokayan/eos that referenced this issue May 15, 2019