-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Peer2Peer Networking #3467
Peer2Peer Networking #3467
Conversation
ouroboros-network/src/Ouroboros/Network/PeerSelection/Simple.hs
Outdated
Show resolved
Hide resolved
ouroboros-network/src/Ouroboros/Network/PeerSelection/Simple.hs
Outdated
Show resolved
Hide resolved
ouroboros-network-framework/src/Ouroboros/Network/Protocol/Handshake/Unversioned.hs
Show resolved
Hide resolved
ouroboros-network-framework/src/Ouroboros/Network/ConnectionManager/Core.hs
Show resolved
Hide resolved
- Fix numberOfConnections call to be number of incommingConnections and not countPrunableConnections - Changed everything to countIncomingConnections - Applied pruning on: - OutboundStateDuplex -> InboundStateDuplex; - OutboundStateDuplex^tau -> InboundStateDuplex; - OutboundIdleStateDuplex^tau -> InboundStateDuplex; - OutboundStateDuplex -> DuplexState transitions - Tweaked DemoteToColdLocal PruneConnections constructor - Changed prune connections code: - for OutboundStateDuplex{^tau} -> InboundStateDuplex and OutboundIdleStateDuplex^tau -> InboundStateDuplex transitions, prune code is pretty much the same except we have to check for (numberOfConns + 1) -- See Summary - for OutboundStateDuplex -> DuplexState transition, prune code has to check for (numberOfConns + 1) and also forcefully be contained inside its toPrune Set -- See Summary - for DuplexState -> InboundStateDuplex transition, prune code has to take into account if the connection is contained in its toPrune Set. If so we shouldn't update the connection to the new state as we might go above hardlimit. - Fix race between acceptLoop and unregisterOutboundConnection - Fixed bug on unregisterInboundConnectionImpl where UnsupportedState was being returned incorrectly. Summary: In order for us to never go above hard limit connections we need to make sure that connections that evolve via OutboundDupSt Ticking → InboundIdleSt Duplex and via OutboundState Duplex -> DuplexState to use 'numberOfConns + 1' because we want to know if we actually let those connection to evolve if we need to make room for them by pruning. For OutboundState Duplex -> DuplexState we also need to include it on its own pruneSet, since this connection might be a Sim-open connection and we are above the hardlimit, we really need to prune this connection in order to preserve the hard limit. Due to the inherent race condition between accept and runConnectionRateLimits in the acceptLoop, we need to, after successfully running accept, check if accepting that connection put us above the hard limit. If it does then we need to cancel the connection.
Disable assert which triggers by failing hot demotions.
Using TCP state names, or something similar, e.g. SYN_SENT, ESTABLISHED, FIN. We use FIN rather than 'FIN_WAIT_1', etc, since the closing is done by the attenuated channel.
There is no need to distinguish between two cases of simultaneous open: whether we opened first or second.
Connect has to blocking actions that can throw async exceptions: * wait for connection delay * wait for connection to be accepted In the first case only async exceptions can be thrown, so we can simplify the error handling with `onException`. In the second case: * if an async exception is caught when we are waiting for the connection to be accepted we simplify remove it from the known connection. * on the other side, we let the `accept` skip over unknown connections.
There is a bug in accept for IPv6 sockets. Instead of returning -1 and setting errno to ECONNABORTED an invalid (>= 0) file descriptor is returned, with the client address left unchanged. The uninitialized client address causes the network package to throw a user error.
- In unregisterInboundConnectionImpl: OutboundDupState Ticking -> OutboundDupState Expired - In withConnectionManager cleanup, where we put the state in TerminatedState is missing a trace. However, since this can be async the connState can come out of order and/or not making much sense. - In requestOutboundConnectionImpl bracketOnError: TerminatedState -> Unknown - In includeInboundConnectionImpl: on readPromise Left handleError case, where we put the state in either TerminatingState or TerminatedState - In unregisterInboundConnectionImpl: Only trace TerminatingState -> TerminatedState, after having written to the connVar - In a possible race between withConnectionManager finally block and forkConnectionHandler cleanup function, where the latter can run first and break an assumption made by withConnectionManager that it is the first to run so a connection shouldn't be in TerminatingState (which is not the case, since there is a race condition). So an out of order and possibly not making much sense transition is logged. - When accept call returns first than connect and the connVar gets overwritten. In requestOutboundConnectionImpl cleanup function we wrongly trace * -> TerminatedState transition of a removed connVar, we shouldn't care about that connVar anymore. - In promotedToWarmRemoteImpl, in OutboundIdleState case we are not updating the state correctly.
In promotedToWarmRemoteImpl (and family), it is possible that the STM action successfully commits, writing a new state to the connVar, but right after an AsyncCancelled is received making us omit the respective transition tracing.
Fixed issue in inboundGovernor function where in case of Async exceptions we wouldn't log the final transitions for the connections.
- This way an assertion is not a pure exception evaluated when printing an `io-sim` test case, but a simulation error. Note that this will hide such exceptions from simulation. To test that such assertion do not happen we can analyse trace. - Check that `TrUnexpectedlyMissingConnectionState` is not logged. This is an initial part of #3465. Authors: Marcin Szamotulski
Basically, if I connect to someone and someone connects to me, before the connect returns (and before the remote accept returns as well) the local accept can return first masking itself as the remote one because we have no way to distinguish directions.
When we prune connection and it is not part of the connections chosen for pruning, we should update its state. Fixes issue #3485.
bors merge |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for the ping.
-- | ||
-- TODO: Talk to devops about what they should do when the node does | ||
-- terminate with a storage layer exception (restart with full recovery). | ||
consensusRethrowPolicy :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed with Marcin that this is effectively a clone of the existing ErrorPolicy
. We hope to someday not need them both.
3568: Haddock failure workaround r=amesgen a=amesgen The Haddock documentation CRON job started failing on Nov 16 due to #3467, as the `-- *`s are interpreted as Haddock sections headings which are invalid in this place. Inserting extra spaces (like in many other cases in the codebase) resolves the issue. Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
3568: Haddock failure workaround r=amesgen a=amesgen The Haddock documentation CRON job started failing on Nov 16 due to #3467, as the `-- *`s are interpreted as Haddock sections headings which are invalid in this place. Inserting extra spaces (like in many other cases in the codebase) resolves the issue. Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
No description provided.