-
Notifications
You must be signed in to change notification settings - Fork 22
immediately perform the handshake when setting up the secure session #16
Conversation
@whyrusleeping and @Stebalien, what do you think? |
4ea2d45
to
96d5b3f
Compare
I don't know man -- could have something to do with identify? In situations like this I'd like to switch away from secio asap. |
Or that we don't yet know whether the connection is actually properly established? |
Why would you pass an unestablished connection to In fact, in go-libp2p-conn (which is the only place I could find secio is used in libp2p), we pass in established connections, so I don't think we'll break anything by merging this change. |
I think the reason might be that this increases time needed for the dial process. |
So, I've been running a node with this change for a few hours and haven't noticed any problems. However, we should try it on a bootstrap node before declaring it good to go. |
@Stebalien: Any news on this? |
I'll talk to @whyrusleeping and @Kubuxu (I wasn't here for the original decision so I'd like to hash it out with them before merging).
|
So, apparently I've been running this on mars... Unfortunately, it significantly slows down the accept loop and ended up causing us to run out of file descriptors (too many connections stuck in the accept loop). We need to fix that but will have to hold off on this patch until then. |
How do we move forward here? This PR is needed to merge libp2p/go-libp2p-conn#9. |
If we want to not do the lazy handshake, then we need another mechanism to handle the secio handshake that doesnt block accepting additional connections. The reason (that i now remember) for the lazy handshake was so that we didnt have to spawn a goroutine for each incoming connection to run the secio stuff in. |
So, it looks like the handshake being lazy here is actually causing a pretty serious perf issue by clogging up notifications:
|
(this also implies this issue is now fairly critically important) |
I think it might make sense to have a pool of goroutines that run the handshakes for new incoming connections, and then pass them up to the swarm listener. I think its probably important that we don't announce the new connection until we have secured a channel with them. Doing this would also allow us to get rid of the lazy handshake stuff. |
I wonder if this is becoming a problem due to that disconnect/reconnect issue (and the fact that we don't have any form of session resumption). |
In one conversation with @whyrusleeping I suggested not accepting secio from disconnected peer for some time. No idea about the timeframe but we should add some metrics. It is possible that many peers are reconnecting right after we drop the session causing CPU trashing because of RSA operations. |
@whyrusleeping libp2p/go-libp2p-conn#9 does exactly this, modulo the pool of goroutines: The connection is not returned before it is fully set up, including secio. I'm not sure if it's worse to introduce a goroutine pool here, goroutines are cheap and an attacker would have to go through the TCP 3-way handshake to block one of them for the time of the handshake timeout. |
Note: libp2p/go-libp2p-transport#21 needs to be solved before we can make progress on that. |
What do you suggest how we move forward with this? libp2p/go-libp2p-transport#21 seems like it requires a lot of changes, and I'm not sure if it's easy to integrate this with the changes I've been working for the last few months. (edited for clarity) |
@Stebalien I don't think that libp2p/go-libp2p-transport#21 is a hard requirement for moving forward here. What we need to do is to make the secio handshake synchronous (this PR) and then make sure that while we are doing this, it does not block more connections from being accepted. I'm reading through https://github.com/libp2p/go-libp2p-conn/pull/9/files now |
You're right, we don't need to fix that right now. However, I don't want to go too far down a road we know leads to a dead-end. The current interfaces are unworkable from a security standpoint (we need to secure connections under the stream muxer, not over); that's why we put that panic in for the So, in order of preference, I'd like to:
As for connection setup part of libp2p/go-libp2p-conn#9, we can't block other connections on any one connection (when we merged this change, the node on which we were testing became unreachable after a period of time because of this + timeouts). Personally, I'd just use go routines and a timeout rather than a pool but either way works and has trade-offs. |
@Stebalien yeah, @marten-seemann's PR here: libp2p/go-libp2p-conn#9 does that with the goroutines (no pool) and timeout. I suggest we move forward with those changes there, but split the interface changes into a separate PR. This will let us move forward on eradicating the lazy handshake bit, and also make the PR for the interface changes much cleaner. |
@whyrusleeping Ah. I didn't see the outer go routine (I thought it was blocking on the timeout). Fine by me. We should really sit down, draw libp2p, and rethink the abstractions a bit when we fix those interfaces... |
s.handshakeDone = true | ||
} | ||
return s.handshakeErr | ||
handshakeCtx, cancel := context.WithTimeout(ctx, HandshakeTimeout) // remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For record keeping, I've filed an issue about this: #17
Ok. Let's move forward here. This is a prereq for libp2p/go-libp2p-conn#9 and it looks ready to go. |
This PR removes the lazy handshake. The handshake is now performed immediately when creating a new secure session.