Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer2Peer Networking #3467

Merged
merged 146 commits into from
Nov 15, 2021
Merged

Peer2Peer Networking #3467

merged 146 commits into from
Nov 15, 2021

Conversation

coot
Copy link
Contributor

@coot coot commented Oct 29, 2021

No description provided.

bolt12 and others added 19 commits November 15, 2021 14:50
- Fix numberOfConnections call to be number of incommingConnections and not countPrunableConnections
- Changed everything to countIncomingConnections
- Applied pruning on:
  - OutboundStateDuplex -> InboundStateDuplex;
  - OutboundStateDuplex^tau -> InboundStateDuplex;
  - OutboundIdleStateDuplex^tau -> InboundStateDuplex;
  - OutboundStateDuplex -> DuplexState
  transitions
- Tweaked DemoteToColdLocal PruneConnections constructor
- Changed prune connections code:
  - for OutboundStateDuplex{^tau} -> InboundStateDuplex and
    OutboundIdleStateDuplex^tau -> InboundStateDuplex transitions, prune
    code is pretty much the same except we have to check for
    (numberOfConns + 1) -- See Summary
  - for OutboundStateDuplex -> DuplexState transition, prune code has to
    check for (numberOfConns + 1) and also forcefully be contained
    inside its toPrune Set -- See Summary
  - for DuplexState -> InboundStateDuplex transition, prune code has to
    take into account if the connection is contained in its toPrune Set.
    If so we shouldn't update the connection to the new state as we
    might go above hardlimit.
- Fix race between acceptLoop and unregisterOutboundConnection
- Fixed bug on unregisterInboundConnectionImpl where UnsupportedState
  was being returned incorrectly.

Summary:

In order for us to never go above hard limit connections we need to make sure that connections that evolve via OutboundDupSt Ticking → InboundIdleSt Duplex  and via OutboundState Duplex -> DuplexState  to use 'numberOfConns + 1' because we want to know if we actually let those connection to evolve if we need to make room for them by pruning.

For OutboundState Duplex -> DuplexState we also need to include it on its own pruneSet, since this connection might be a Sim-open connection and we are above the hardlimit, we really need to prune this connection in order to preserve the hard limit.

Due to the inherent race condition between accept and runConnectionRateLimits in the acceptLoop, we need to, after successfully running accept, check if accepting that connection put us above the hard limit. If it does then we need to cancel the connection.
Disable assert which triggers by failing hot demotions.
Using TCP state names, or something similar, e.g. SYN_SENT, ESTABLISHED,
FIN.  We use FIN rather than 'FIN_WAIT_1', etc, since the closing is
done by the attenuated channel.
There is no need to distinguish between two cases of simultaneous open:
whether we opened first or second.
Connect has to blocking actions that can throw async exceptions:
* wait for connection delay
* wait for connection to be accepted

In the first case only async exceptions can be thrown, so we can
simplify the error handling with `onException`.

In the second case:

* if an async exception is caught when we are waiting for the connection
  to be accepted we simplify remove it from the known connection.
* on the other side, we let the `accept` skip over unknown connections.
There is a bug in accept for IPv6 sockets. Instead of returning -1
and setting errno to ECONNABORTED an invalid (>= 0) file descriptor
is returned, with the client address left unchanged. The uninitialized
client address causes the network package to throw a user error.
- In unregisterInboundConnectionImpl: OutboundDupState Ticking -> OutboundDupState Expired
- In withConnectionManager cleanup, where we put the state in TerminatedState is missing a trace. However, since this can be async the connState can come out of order and/or not making much sense.
- In requestOutboundConnectionImpl bracketOnError: TerminatedState -> Unknown
- In includeInboundConnectionImpl: on readPromise Left handleError case, where we put the state in either TerminatingState or TerminatedState
- In unregisterInboundConnectionImpl: Only trace TerminatingState -> TerminatedState, after having written to the connVar
- In a possible race between withConnectionManager finally block and forkConnectionHandler cleanup function, where the latter can run first and break an assumption made by withConnectionManager that it is the first to run so a connection shouldn't be in TerminatingState (which is not the case, since there is a race condition). So an out of order and possibly not making much sense transition is logged.
- When accept call returns first than connect and the connVar gets overwritten. In requestOutboundConnectionImpl cleanup function we wrongly trace * -> TerminatedState transition of a removed connVar, we shouldn't care about that connVar anymore.
- In promotedToWarmRemoteImpl, in OutboundIdleState case we are not updating the state correctly.
In promotedToWarmRemoteImpl (and family), it is possible that the STM action successfully commits, writing a new state to the connVar, but right after an AsyncCancelled is received making us omit the respective transition tracing.
Fixed issue in inboundGovernor function where in case of Async
exceptions we wouldn't log the final transitions for the connections.
- This way an assertion is not a pure exception evaluated when printing
an `io-sim` test case, but a simulation error. Note that this will hide
such exceptions from simulation. To test that such assertion do not
happen we can analyse trace.

- Check that `TrUnexpectedlyMissingConnectionState` is not logged.
This is an initial part of #3465.

Authors: Marcin Szamotulski
Basically, if I connect to someone and someone connects to me, before the connect returns (and before the remote accept returns as well) the local accept can return first masking itself as the remote one because we have no way to distinguish directions.
When we prune connection and it is not part of the connections chosen
for pruning, we should update its state.

Fixes issue #3485.
@coot
Copy link
Contributor Author

coot commented Nov 15, 2021

bors merge

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Nov 15, 2021

@iohk-bors iohk-bors bot merged commit 7a56e88 into master Nov 15, 2021
@iohk-bors iohk-bors bot deleted the p2p-master branch November 15, 2021 19:40
Copy link
Contributor

@nfrisby nfrisby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for the ping.

--
-- TODO: Talk to devops about what they should do when the node does
-- terminate with a storage layer exception (restart with full recovery).
consensusRethrowPolicy ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed with Marcin that this is effectively a clone of the existing ErrorPolicy. We hope to someday not need them both.

iohk-bors bot added a commit that referenced this pull request Jan 24, 2022
3568: Haddock failure workaround r=amesgen a=amesgen

The Haddock documentation CRON job started failing on Nov 16 due to #3467, as the `-- *`s are interpreted as Haddock sections headings which are invalid in this place. Inserting extra spaces (like in many other cases in the codebase) resolves the issue.

Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
coot pushed a commit that referenced this pull request May 16, 2022
3568: Haddock failure workaround r=amesgen a=amesgen

The Haddock documentation CRON job started failing on Nov 16 due to #3467, as the `-- *`s are interpreted as Haddock sections headings which are invalid in this place. Inserting extra spaces (like in many other cases in the codebase) resolves the issue.

Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants