This repository has been archived by the owner on Feb 24, 2021. It is now read-only.

immediately perform the handshake when setting up the secure session #16

Merged: 1 commit, Oct 26, 2017


marten-seemann
Contributor

This PR removes the lazy handshake. The handshake is now performed immediately when creating a new secure session.

@marten-seemann
Contributor Author

@whyrusleeping and @Stebalien, what do you think?

@whyrusleeping
Contributor

Also cc @Kubuxu and @lgierth

I'm trying to remember why we do this, and coming up short... There definitely was a reason for this, but I'm not sure whether it's still valid.

@ghost

ghost commented Sep 20, 2017

I don't know, man. Could it have something to do with identify? In situations like this, I'd like to switch away from secio ASAP.

@ghost

ghost commented Sep 20, 2017

Or that we don't yet know whether the connection is actually properly established?

@marten-seemann
Contributor Author

Why would you pass an unestablished connection to NewSession? There would hardly be any benefit, since you'd have to wait for the connection to be established before performing the handshake anyway.

In fact, in go-libp2p-conn (the only place in libp2p where I could find secio being used), we pass in established connections, so I don't think we'll break anything by merging this change.

@Kubuxu
Member

Kubuxu commented Sep 21, 2017

I think the reason might be that this increases the time needed for the dial process.
Currently, a dial takes as long as establishing the TCP connection (3-way handshake, 1 RTT). With this change, the dial will take much longer, about 5 RTT.

@Stebalien
Member

So, I've been running a node with this change for a few hours and haven't noticed any problems. However, we should try it on a bootstrap node before declaring it good to go.

@marten-seemann
Contributor Author

@Stebalien: Any news on this?
What's the plan for merging this PR?

@Stebalien
Member

Stebalien commented Sep 25, 2017 via email

@Stebalien
Member

So, apparently I've been running this on mars...

Unfortunately, it significantly slows down the accept loop, and it ended up causing us to run out of file descriptors (too many connections stuck in the accept loop). We need to fix that, and we'll have to hold off on this patch until then.

@marten-seemann
Contributor Author

How do we move forward here? This PR is needed to merge libp2p/go-libp2p-conn#9.

@whyrusleeping
Contributor

If we want to drop the lazy handshake, we need another mechanism for running the secio handshake that doesn't block accepting additional connections. The reason (that I now remember) for the lazy handshake was to avoid spawning a goroutine for each incoming connection to run the secio handshake in.

@whyrusleeping
Contributor

So, it looks like the handshake being lazy here is actually causing a pretty serious perf issue by clogging up notifications:

goroutine 11018914 [semacquire]:
sync.runtime_SemacquireMutex(0xc4454cf89c)
        /home/whyrusleeping/go/src/runtime/sema.go:62 +0x34
sync.(*Mutex).Lock(0xc4454cf898)
        /home/whyrusleeping/go/src/sync/mutex.go:87 +0x9d
gx/ipfs/QmZfwmhbcgSDGqGaoMMYx8jxBGauZw75zPjnZAyfwPso7M/go-libp2p-secio.(*secureSession).Handshake(0xc4454cf680, 0x0, 0x0)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmZfwmhbcgSDGqGaoMMYx8jxBGauZw75zPjnZAyfwPso7M/go-libp2p-secio/protocol.go:93 +0x4f
gx/ipfs/QmZfwmhbcgSDGqGaoMMYx8jxBGauZw75zPjnZAyfwPso7M/go-libp2p-secio.(*secureSession).RemotePeer(0xc4454cf680, 0xc420025600, 0x7f17bd6410b8)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmZfwmhbcgSDGqGaoMMYx8jxBGauZw75zPjnZAyfwPso7M/go-libp2p-secio/interface.go:71 +0x2b
gx/ipfs/QmTi4629yyHJ8qW9sXFjvxJpYcN499tHhERLZYdUqwRU9i/go-libp2p-conn.(*secureConn).RemotePeer(0xc432b32720, 0x7f17bd6410b8, 0xc432b32720)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmTi4629yyHJ8qW9sXFjvxJpYcN499tHhERLZYdUqwRU9i/go-libp2p-conn/secure_conn.go:100 +0x34
gx/ipfs/QmWpJ4y2vxJ6GZpPfQbpVpQxAYS3UeR6AKNbAHxw7wN3qw/go-libp2p-swarm.(*Conn).RemotePeer(0xc449f719d0, 0x1537468, 0xc4200d05e8)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmWpJ4y2vxJ6GZpPfQbpVpQxAYS3UeR6AKNbAHxw7wN3qw/go-libp2p-swarm/swarm_conn.go:64 +0x80
gx/ipfs/QmYi2NvTAiv2xTNJNcnuz3iXDDT1ViBwLFXmDb2g7NogAD/go-libp2p-kad-dht.(*netNotifiee).Connected(0xc4200d0540, 0x1c188a0, 0xc42000b400, 0x1c16280, 0xc449f719d0)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmYi2NvTAiv2xTNJNcnuz3iXDDT1ViBwLFXmDb2g7NogAD/go-libp2p-kad-dht/notif.go:35 +0xd5
gx/ipfs/QmWpJ4y2vxJ6GZpPfQbpVpQxAYS3UeR6AKNbAHxw7wN3qw/go-libp2p-swarm.(*ps2netNotifee).Connected(0xc420b2f000, 0xc449f719d0)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmWpJ4y2vxJ6GZpPfQbpVpQxAYS3UeR6AKNbAHxw7wN3qw/go-libp2p-swarm/swarm.go:387 +0x5e
gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream.(*Swarm).addConn.func2(0x1c11460, 0xc420b2f000)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream/conn.go:225 +0x3a
gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream.(*Swarm).notifyAll.func1(0xc4334eb3f0, 0xc4334eb3e0, 0x1c11460, 0xc420b2f000)
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream/swarm.go:404 +0x60
created by gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream.(*Swarm).notifyAll
        /home/whyrusleeping/gopkg/src/gx/ipfs/QmTMNkpso2WRMevXC8ZxgyBhJvoEHvk24SNeUr9Mf9UM1a/go-peerstream/swarm.go:405 +0x12e

@whyrusleeping
Contributor

(This also means this issue is now critically important.)

@whyrusleeping
Contributor

I think it might make sense to have a pool of goroutines that run the handshakes for new incoming connections and then pass them up to the swarm listener. I think it's probably important that we don't announce a new connection until we have secured a channel with the peer. Doing this would also allow us to get rid of the lazy handshake stuff.

@Stebalien
Member

I wonder if this is becoming a problem due to that disconnect/reconnect issue (and the fact that we don't have any form of session resumption).

@Kubuxu
Member

Kubuxu commented Oct 17, 2017

In a conversation with @whyrusleeping, I suggested not accepting secio sessions from a recently disconnected peer for some time. I have no idea what the time frame should be, but we should add some metrics. It is possible that many peers reconnect right after we drop the session, causing CPU thrashing because of the RSA operations.

@marten-seemann
Contributor Author

@whyrusleeping libp2p/go-libp2p-conn#9 does exactly this, modulo the pool of goroutines: the connection is not returned before it is fully set up, including secio. I'm not sure it's worth introducing a goroutine pool here. Goroutines are cheap, and an attacker would have to complete the TCP 3-way handshake to tie one of them up for the duration of the handshake timeout.

@Stebalien
Member

Note: libp2p/go-libp2p-transport#21 needs to be solved before we can make progress on that.

@marten-seemann
Contributor Author

marten-seemann commented Oct 18, 2017

How do you suggest we move forward with this? libp2p/go-libp2p-transport#21 seems like it requires a lot of changes, and I'm not sure it's easy to integrate with the changes I've been working on for the last few months.

(edited for clarity)

@whyrusleeping
Contributor

@Stebalien I don't think that libp2p/go-libp2p-transport#21 is a hard requirement for moving forward here. What we need to do is to make the secio handshake synchronous (this PR) and then make sure that while we are doing this, it does not block more connections from being accepted. I'm reading through https://github.com/libp2p/go-libp2p-conn/pull/9/files now

@Stebalien
Member

You're right, we don't need to fix that right now. However, I don't want to go too far down a road we know leads to a dead end. The current interfaces are unworkable from a security standpoint (we need to secure connections under the stream muxer, not over); that's why we put that panic in for the MultiplexConn connection setup. For now, we haven't merged much code that relies on these interfaces, but if we merge that change, everything will suddenly start relying on them.

So, in order of preference, I'd like to:

  1. Solve libp2p/go-libp2p-transport#21 (Require that all transports be "fully featured").
  2. Revert libp2p/go-libp2p-transport#20 (add a MultiStreamConn interface) and go-tcp-transport#8 (Compatibility with the new transport interfaces).
  3. Or go forward knowing we'll need to fix everything later.

As for the connection setup part of libp2p/go-libp2p-conn#9, we can't block other connections on any one connection (when we merged this change, the node on which we were testing became unreachable after a period of time because of this plus timeouts). Personally, I'd just use goroutines and a timeout rather than a pool, but either way works and has trade-offs.

@whyrusleeping
Contributor

@Stebalien yeah, @marten-seemann's PR libp2p/go-libp2p-conn#9 does that, with goroutines (no pool) and a timeout. I suggest we move forward with those changes there, but split the interface changes into a separate PR. This will let us move forward on eradicating the lazy handshake, and also make the PR for the interface changes much cleaner.

@Stebalien
Member

@whyrusleeping Ah. I didn't see the outer goroutine (I thought it was blocking on the timeout). Fine by me.

We should really sit down, draw libp2p, and rethink the abstractions a bit when we fix those interfaces...

s.handshakeDone = true
}
return s.handshakeErr
handshakeCtx, cancel := context.WithTimeout(ctx, HandshakeTimeout) // remove
Member


For record keeping, I've filed an issue about this: #17

@Stebalien
Member

Ok. Let's move forward here. This is a prereq for libp2p/go-libp2p-conn#9 and it looks ready to go.

Stebalien merged commit a92a92e into libp2p:master on Oct 26, 2017.