p2p: fix back ping to discover healthy peers to connect to #8640

j-berman · 2022-11-16T01:22:04Z

Overview

When Node A receives an incoming connection from Node B, Node A will attempt to "back ping" Node B to see if Node B would be a healthy peer to connect to in the future. #8426 introduced a bug that causes Node A to fail to back ping Node B: Node A attempts to read Node B's response over an SSL stream, which errors out (looks like because Node A does not establish an SSL connection with Node B ... see more on this here).

The back ping seems to be a critical component of the p2p protocol for getting incoming connections (also see this comment). As such, I think fixing the back ping in this PR should help resolve or alleviate the lack of incoming peers issue reported in #8520 once enough nodes on the network start running it. To be clear, if you run this PR with your node, this PR wouldn't help you get incoming connections directly -- the nodes your node connects to must also run this update in order to help you get incoming connections.

The fix

The fix matches connect_async's logic to this synchronous call to connect(), ensuring that a node attempting to back ping its incoming connection will use the expected SSL setting (as the code is structured today, this specific m_ssl_support appears to always be disabled since SSL isn't supported for p2p connections).

How to manually test

Run 2 nodes, pointing one to the other. Start Node A normally with --log-level 2 and ensure that Node A can receive incoming connections. Start Node B with the flag --add-priority-node <Node A> and also make sure that Node B can receive incoming connections.

Once Node B initiates the connection to Node A, make sure that Node A successfully back pings Node B. In order to make sure this happens, grep Node A's logs for "PING SUCCESS" and you should see Node B's <IP:port number>.

If you try repeating the above test without this PR, there should be no "PING SUCCESS" in the logs. A lack of "PING SUCCESS" indicates that Node A does not add Node B to its white list, which means that neither Node A nor the rest of the network will know that Node B is a healthy peer to connect to right off the bat.

Note: I plan on writing an explicit test for this behavior to ensure we don't end up breaking back ping behavior again in the future.

j-berman · 2022-11-16T01:58:27Z

@selsta and I just ran the following test:

I connected to a seed node @selsta has access to on testnet using the --add-priority-node flag. Neither of us were running this patch on our nodes. I confirmed that my node made outgoing connections to @selsta's node and to other nodes fine, but after 10 minutes, had 0 incoming connections.
@selsta updated the seed node using this patch.
I re-connected to @selsta's seed node. The back ping from the seed node to my node succeeded and I immediately started getting incoming connections.

moneromooo-monero · 2022-11-16T14:00:51Z

This implies that autodetecting SSL does not work, right ? If so, this seems to be the real problem, and disabling SSL just bodges it.

j-berman · 2022-11-16T21:49:25Z

I don't think it would make sense to call connect_async with SSL support set to autodetect here anyway because we don't use SSL for p2p. I figure disabling SSL support in this call to connect_async is a justified contained fix that prevents this particular back ping bug.

While I think a fix for autodetect in connect_async is probably also warranted, I don't think it being broken has any practical ramifications. I don't see where we should expect to make an outgoing connection with SSL support using the TCP server connection, but maybe I'm missing something there. If I run monerod --rpc-ssl=autodetect and monero-wallet-rpc --daemon-ssl=autodetect --rpc-ssl=autodetect, I have no issues even when connecting to the respective SSL-enabled backends, autodetect seems to work as expected.

To fix autodetect in connect_async, it seems we'd need to implement this same handshake sequence the synchronous connect() implements before calling start on the connection:

Synchronous connect():

monero/contrib/epee/include/net/abstract_tcp_server2.inl

Lines 1752 to 1769 in 365fd45

    
           auto try_connect_result = try_connect(new_connection_l, adr, port, sock_, remote_endpoint, bind_ip_to_use, conn_timeout, ssl_support); 
        
           if (try_connect_result == CONNECT_FAILURE) 
        
             return false; 
        
           if (ssl_support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect && try_connect_result == CONNECT_NO_SSL) 
        
           { 
        
             // we connected, but could not connect with SSL, try without 
        
             MERROR("SSL handshake failed on an autodetect connection, reconnecting without SSL"); 
        
             new_connection_l->disable_ssl(); 
        
             try_connect_result = try_connect(new_connection_l, adr, port, sock_, remote_endpoint, bind_ip_to_use, conn_timeout, epee::net_utils::ssl_support_t::e_ssl_support_disabled); 
        
             if (try_connect_result != CONNECT_SUCCESS) 
        
               return false; 
        
           } 
        
           // start adds the connection to the config object's list, so we don't need to have it locally anymore 
        
           connections_mutex.lock(); 
        
           connections_.erase(new_connection_l); 
        
           connections_mutex.unlock(); 
        
           bool r = new_connection_l->start(false, 1 < m_threads_count);

try_connect():

monero/contrib/epee/include/net/abstract_tcp_server2.inl

Lines 1636 to 1657 in 365fd45

    
           const ssl_support_t ssl_support = new_connection_l->get_ssl_support(); 
        
           if (ssl_support == epee::net_utils::ssl_support_t::e_ssl_support_enabled || ssl_support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect) 
        
           { 
        
             // Handshake 
        
             MDEBUG("Handshaking SSL..."); 
        
             if (!new_connection_l->handshake(boost::asio::ssl::stream_base::client)) 
        
             { 
        
               if (ssl_support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect) 
        
               { 
        
                 boost::system::error_code ignored_ec; 
        
                 sock_.shutdown(boost::asio::ip::tcp::socket::shutdown_both, ignored_ec); 
        
                 sock_.close(); 
        
                 return CONNECT_NO_SSL; 
        
               } 
        
               MERROR("SSL handshake failed"); 
        
               if (sock_.is_open()) 
        
                 sock_.close(); 
        
               return CONNECT_FAILURE; 
        
             } 
        
           } 
        
           return CONNECT_SUCCESS;

I believe the fact that connect_async doesn't do the above handshake sequence implies SSL is broken via connect_async. It doesn't look like mishandling autodetect was the primary issue here; it looks like it was just lucky that the back ping worked before the connection patch.

Circling back to my original point: I think this PR is still justified as is toward preventing the back ping because SSL should be disabled for p2p connections. I think it's worth a separate PR to make sure connect_async handles SSL correctly.

j-berman · 2022-11-17T03:43:38Z

Clarifying: when a wallet initiates an outbound connection to a daemon, SSL autodetect is handled correctly in this totally separate area of the code:

monero/contrib/epee/include/net/net_helper.h

Lines 158 to 248 in 365fd45

    
             inline 
        
           	try_connect_result_t try_connect(const std::string& addr, const std::string& port, std::chrono::milliseconds timeout) 
        
           { 
        
           		m_deadline.expires_from_now(timeout); 
        
           		boost::unique_future<boost::asio::ip::tcp::socket> connection = m_connector(addr, port, m_deadline); 
        
           		for (;;) 
        
           		{ 
        
           			m_io_service.reset(); 
        
           			m_io_service.run_one(); 
        
           			if (connection.is_ready()) 
        
           				break; 
        
           		} 
        
           		m_ssl_socket->next_layer() = connection.get(); 
        
           		m_deadline.cancel(); 
        
           		if (m_ssl_socket->next_layer().is_open()) 
        
           		{ 
        
           			m_connected = true; 
        
           			m_deadline.expires_at(std::chrono::steady_clock::time_point::max()); 
        
           			// SSL Options 
        
           			if (m_ssl_options.support == epee::net_utils::ssl_support_t::e_ssl_support_enabled || m_ssl_options.support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect) 
        
           			{ 
        
           				if (!m_ssl_options.handshake(*m_ssl_socket, boost::asio::ssl::stream_base::client, {}, addr, timeout)) 
        
           				{ 
        
           					if (m_ssl_options.support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect) 
        
           					{ 
        
           						boost::system::error_code ignored_ec; 
        
           						m_ssl_socket->next_layer().shutdown(boost::asio::ip::tcp::socket::shutdown_both, ignored_ec); 
        
           						m_ssl_socket->next_layer().close(); 
        
           						m_connected = false; 
        
           						return CONNECT_NO_SSL; 
        
           					} 
        
           					else 
        
           					{ 
        
           						MWARNING("Failed to establish SSL connection"); 
        
           						m_connected = false; 
        
           						return CONNECT_FAILURE; 
        
           					} 
        
           				} 
        
           			} 
        
           			return CONNECT_SUCCESS; 
        
           		}else 
        
           		{ 
        
           			MWARNING("Some problems at connect, expected open socket"); 
        
           			return CONNECT_FAILURE; 
        
           		} 
        
           } 
        
             inline 
        
           	bool connect(const std::string& addr, const std::string& port, std::chrono::milliseconds timeout) 
        
           { 
        
           	m_connected = false; 
        
           	try 
        
           	{ 
        
           		m_ssl_socket->next_layer().close(); 
        
           		// Set SSL options 
        
           		// disable sslv2 
        
           		m_ssl_socket.reset(new boost::asio::ssl::stream<boost::asio::ip::tcp::socket>(m_io_service, m_ctx)); 
        
           		// Get a list of endpoints corresponding to the server name. 
        
           		try_connect_result_t try_connect_result = try_connect(addr, port, timeout); 
        
           		if (try_connect_result == CONNECT_FAILURE) 
        
           			return false; 
        
           		if (m_ssl_options.support == epee::net_utils::ssl_support_t::e_ssl_support_autodetect) 
        
           		{ 
        
           			if (try_connect_result == CONNECT_NO_SSL) 
        
           			{ 
        
           				MERROR("SSL handshake failed on an autodetect connection, reconnecting without SSL"); 
        
           				m_ssl_options.support = epee::net_utils::ssl_support_t::e_ssl_support_disabled; 
        
           				if (try_connect(addr, port, timeout) != CONNECT_SUCCESS) 
        
           					return false; 
        
           			} 
        
           		} 
        
           	} 
        
           	catch(const boost::system::system_error& er) 
        
           	{ 
        
           		MDEBUG("Some problems at connect, message: " << er.what()); 
        
           		return false; 
        
           	} 
        
           	catch(...) 
        
           	{ 
        
           		MDEBUG("Some fatal problems."); 
        
           		return false; 
        
           	} 
        
           	return true; 
        
           }

I don't see how a "server" would ever initiate an outbound SSL connection using the boosted_tcp_server connection anywhere. It seems the reason there is what looks like unused SSL code for a server initiating the outbound connect (and connect_async) is that the initial PR to add SSL support originally supported SSL for p2p, which was then removed: #4054 (comment). It looks like code that was left in there by accident. Which would mean fixing connect_async to correctly handle autodetect (and SSL) would just be fixing unused code.

And circling back again: regardless, I'm not seeing why connect_async should be called here in this p2p handshake with SSL autodetect as is. I still think this PR is a correct solution in its own right

vtnerd · 2022-11-17T16:22:02Z

And FWIW I read the backlog of discussion here, and had my cup of coffee.

The connect_async function still needs to be fixed for autodetect. This small patch is better for now because its simpler, and we want this behavior even with a fixed connect_async. Defaulting to autodetect is "noisy", has wasted RTT, etc.

vtnerd · 2022-11-17T16:24:28Z

@j-berman one thing for maintenance - should the parameter positions be flipped? The string default here, "0.0.0.0", might be problematic during maintenance and changes to the function. Only small nitpick - I think this needs to be merged sooner rather than later.

vtnerd

Ugh. Wrong PR.

j-berman · 2022-11-17T18:07:45Z

@vtnerd agree with the above :) Will submit a new maintenance PR with SSL fixed in connect_async+ parameter positions flipped

EDIT: I might have misunderstood that you were suggesting the param change for this PR. I also think this PR needs to be merged sooner rather than later, so would prefer to keep this PR as small and simple as possible, and will do the maintenance PR separately

moneromooo-monero · 2022-11-18T12:23:21Z

Sounds good.

jeffro256 · 2022-11-18T22:36:14Z

Does a node ever attempt to back-ping on a loopback address?

j-berman · 2022-11-18T22:47:39Z

@jeffro256

monero/src/p2p/net_node.inl

Lines 2352 to 2353 in 365fd45

    
           if(!zone.m_peerlist.is_host_allowed(context.m_remote_address)) 
        
             return false;

monero/src/p2p/net_peerlist.h

Lines 265 to 269 in 365fd45

    
           bool peerlist_manager::is_host_allowed(const epee::net_utils::network_address &address) 
        
           { 
        
             //never allow loopback ip 
        
             if(address.is_loopback()) 
        
               return false;

jeffro256 · 2022-11-18T22:50:09Z

@j-berman That's exactly what I was looking for, thank you! I was looking for the reason why the back-ping test wasn't working locally.

jeffro256

I was able to confirm this patch fixed the node so that back pings work 👍🏻

p2p: fix back ping to discover healthy peers to connect to

1a3ae18

j-berman mentioned this pull request Nov 16, 2022

p2p: fix back ping to discover healthy peers to connect to [release] #8641

Merged

vtnerd approved these changes Nov 17, 2022

View reviewed changes

jeffro256 approved these changes Nov 19, 2022

View reviewed changes

luigi1111 merged commit ac8580c into monero-project:master Jan 11, 2023

selsta mentioned this pull request Jan 13, 2023

monerod: incoming peers unreasonably low #8520

Closed

j-berman mentioned this pull request Apr 19, 2024

Add SSL support to P2P #8996

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

p2p: fix back ping to discover healthy peers to connect to #8640

p2p: fix back ping to discover healthy peers to connect to #8640

j-berman commented Nov 16, 2022

j-berman commented Nov 16, 2022 •

edited

Loading

moneromooo-monero commented Nov 16, 2022

j-berman commented Nov 16, 2022

j-berman commented Nov 17, 2022 •

edited

Loading

vtnerd commented Nov 17, 2022

vtnerd commented Nov 17, 2022

vtnerd left a comment •

edited

Loading

j-berman commented Nov 17, 2022 •

edited

Loading

moneromooo-monero commented Nov 18, 2022

jeffro256 commented Nov 18, 2022

j-berman commented Nov 18, 2022

jeffro256 commented Nov 18, 2022

jeffro256 left a comment

p2p: fix back ping to discover healthy peers to connect to #8640

p2p: fix back ping to discover healthy peers to connect to #8640

Conversation

j-berman commented Nov 16, 2022

Overview

The fix

How to manually test

j-berman commented Nov 16, 2022 • edited Loading

moneromooo-monero commented Nov 16, 2022

j-berman commented Nov 16, 2022

j-berman commented Nov 17, 2022 • edited Loading

vtnerd commented Nov 17, 2022

vtnerd commented Nov 17, 2022

vtnerd left a comment • edited Loading

Choose a reason for hiding this comment

j-berman commented Nov 17, 2022 • edited Loading

moneromooo-monero commented Nov 18, 2022

jeffro256 commented Nov 18, 2022

j-berman commented Nov 18, 2022

jeffro256 commented Nov 18, 2022

jeffro256 left a comment

Choose a reason for hiding this comment

j-berman commented Nov 16, 2022 •

edited

Loading

j-berman commented Nov 17, 2022 •

edited

Loading

vtnerd left a comment •

edited

Loading

j-berman commented Nov 17, 2022 •

edited

Loading