can't create cluster over localhost:7777 tunneled connection #16
Comments
I tried giving the 2nd peer -p 7480 to start it on a different port. Better, but still no luck:
The first peer seems to want to dial via tcp directly, rather than re-using the existing (tunnelled) connection to the 7480 peer.
interestingly, even removing the 2nd peer does not work, and no leader is elected from the one viable node:
1st node continues to say:
I would prefer that "raftremovepeer" be a bit more aggressive here, so as to restore the cluster to a functioning state.
(I do realize this is all the underlying raft implementation, and little to do with summitdb proper.)
I haven't played too much with ssh tunneling over raft, so I'm trying to catch up. I'll have to investigate further to fully wrap my head around it. Regarding the raft implementation, as I understand it, all the peers must be able to reach each other using the same
I didn't set up symmetric tunnels, so it's my bad. I'm sure it simplifies the raft code to assume full peer-to-peer connectivity, with each node acting as both client and "server". It does end up simulating split-brain pretty well though. I wonder why hashicorp raft has such a difficult time recovering from it. Might be because I never got to 3 nodes, only 1 and then 1.5.
I think the "peer already known" logic needs to take into account the port as well as the host; or perhaps it just needs to treat localhost specially. I set up an ssh tunnel (using ssh -L 7777:localhost:7481 remotehost) between machines in EC2 to run some benchmarks, but I can't seem to make a cluster over the tunnel:
hmm... actually, upon further investigation, this error seems to be coming from the vendored raft here: https://github.com/tidwall/summitdb/blob/master/vendor/github.com/hashicorp/raft/raft.go#L1101
I will continue to investigate. Ideas about how to approach this and workaround thoughts welcome.
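To make the "host and port" idea concrete, here is a minimal Go sketch of a peer lookup that compares both parts of the address. This is not the vendored hashicorp/raft code; `samePeer` and `peerKnown` are hypothetical helpers, and the addresses are just the ones from the tunnel setup above.

```go
package main

import (
	"fmt"
	"net"
)

// samePeer reports whether two "host:port" addresses name the same peer.
// Hypothetical helper for illustration; not part of hashicorp/raft.
func samePeer(a, b string) bool {
	hostA, portA, errA := net.SplitHostPort(a)
	hostB, portB, errB := net.SplitHostPort(b)
	if errA != nil || errB != nil {
		// Fall back to a plain string comparison for malformed addresses.
		return a == b
	}
	// Both the host and the port must match for the peers to be equal.
	return hostA == hostB && portA == portB
}

// peerKnown reports whether addr is already in the peer list.
// Hypothetical helper for illustration.
func peerKnown(peers []string, addr string) bool {
	for _, p := range peers {
		if samePeer(p, addr) {
			return true
		}
	}
	return false
}

func main() {
	peers := []string{"localhost:7481"}
	fmt.Println(peerKnown(peers, "localhost:7777")) // false: same host, different port
	fmt.Println(peerKnown(peers, "localhost:7481")) // true: host and port both match
}
```

With a check like this, a tunnel endpoint such as localhost:7777 would not be mistaken for a local node on localhost:7481. It does nothing about the separate requirement that peers be able to dial each other.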