
add connectivity tests for circuit v2 #116

Open
Tracked by #64
marten-seemann opened this issue Jan 26, 2023 · 39 comments

@marten-seemann
Contributor

We should add tests for circuit v2. This will require booting 3 nodes: one relay, and 2 nodes that connect via that relay.

This test is independent of (but prerequisite for) tests for NAT hole punching. We don't need to implement a NAT to test that a connection via a relay succeeds.

We need to decide how many dimensions we want to test here. The matrix might become too large if we test every combination of client, server and relay.
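
For context on what such a test exercises: the listener first makes a reservation with the relay, and the dialer then dials the listener through a p2p-circuit address. A rough sketch of the address shapes involved (the IP and peer IDs below are placeholders, not real values):

```python
# Shape of the multiaddrs involved in a relayed dial. Illustrative only.
relay = "/ip4/203.0.113.1/tcp/4001/p2p/<relay-peer-id>"

# After the listener has obtained a reservation at the relay, the dialer
# reaches it *through* the relay via a p2p-circuit address:
listener_via_relay = relay + "/p2p-circuit/p2p/<listener-peer-id>"
print(listener_via_relay)
```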

@thomaseizinger
Contributor

We need to decide how many dimensions we want to test here. The matrix might become too large if we test every combination of client, server and relay.

We could take inspiration from property-based testing, in particular: "shrinking".

First, generate a test matrix where each implementation appears in each role at least once. For example:

  • go dialer
  • rust relay
  • js listener

and

  • rust dialer
  • js relay
  • go listener

and

  • js dialer
  • go relay
  • rust listener

Then, if one of these tests fails, try to narrow it down by swapping out components to find the faulty one, i.e. swap the go relay for a rust relay and see if it passes then.
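
A rough sketch of both steps (illustrative Python; `run_test` is a placeholder for whatever actually executes a test case):

```python
from itertools import product

IMPLS = ["go", "rust", "js"]
ROLES = ["dialer", "relay", "listener"]

def covering_matrix(impls):
    """One case per rotation: every impl takes every role exactly once."""
    n = len(impls)
    return [
        {role: impls[(i + j) % n] for j, role in enumerate(ROLES)}
        for i in range(n)
    ]

def shrink(failing_case, impls, run_test):
    """Swap out one component at a time to isolate the faulty one."""
    for role, impl in product(ROLES, impls):
        candidate = {**failing_case, role: impl}
        if candidate != failing_case and run_test(candidate):
            print(f"passes with {impl} as {role}; "
                  f"suspect the original {role}: {failing_case[role]}")

for case in covering_matrix(IMPLS):
    print(case)
```

Three rotated cases already cover all nine implementation/role pairs, instead of the 27 cases of the full product.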

@thomaseizinger
Contributor

cc @dhuseby I'd like to propose that we drop dedicated relay tests from the OKRs and instead focus on interop hole punching tests. I think those provide more value and essentially cover all the relay stuff implicitly.

@marten-seemann
Contributor Author

I don't think we should do this; the two are (almost) orthogonal. In this test, we want to check that different combinations of circuit relay servers and clients interoperate.

In the hole punching test, we don't care about the relay server implementation; we can always use the Go or the Rust one. There, we care that the different hole punching clients and servers interop.

@thomaseizinger
Contributor

thomaseizinger commented Jun 14, 2023

How is that "almost orthogonal"? Hole punching will need a matrix out of all client implementations, whereas relay tests will need a matrix out of all client and server implementations. Given that we always start two clients and one server, it is a 2×N+1 vs a 3×N matrix, right?^1
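
To put concrete numbers on that (my illustration, assuming N = 3 implementations):

  • hole punching with a fixed relay: N × N = 9 dialer/listener pairs
  • relay interop: N × N × N = 27 dialer/relay/listener triples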

To assert that the relay protocol works, we don't care which protocol we run on top; it can be ping or dcutr.

That is a lot of overlap in implementation and resource usage. I don't see how it is worth building both, especially given that I don't think the complexity of the hole punching tests will be any lower just because we always use the same relay. It will certainly not take less time than building all of the relay tests.

Footnotes

  1. Not taking into account that we probably want to be smart about this anyway so the number of combinations doesn't explode.

@mxinden
Member

mxinden commented Jun 22, 2023

If I understand correctly, there are two options for how to move forward:

  1. Add a dedicated circuit relay v2 test that checks compatibility across implementations for all permutations of dialer-relay-listener. In a second step, add hole punching tests across implementations, either with a static relay server implementation (i.e. where it is fine to test with, say, a go-libp2p relay server only), existing next to the circuit relay v2 tests, or merged with the circuit relay v2 test from step one into a single test.
  2. Start with the hole punching tests right away, adding the relay server implementation as an additional dimension, testing with go-libp2p, rust-libp2p and js-libp2p relay servers.

The circuit relay v2 test is significantly less work: it does not require setting up network topologies, for example, nor does it need to worry about the bizarre properties of NATs. Following option (1) thus de-risks the entire endeavor, providing value (the circuit relay v2 tests) early. Doing (2) right away might save us some work, though I would argue that the hole punching tests can be built on top of the circuit relay v2 tests with little time wasted, potentially even replacing or merging with them.

I don't have a strong opinion on (1) vs (2), though with the above arguments I am leaning towards (1): get a quick win through the circuit relay v2 tests and then tackle the hard problem of hole punching tests, either building on top of the circuit relay v2 tests or replacing them.

@thomaseizinger
Contributor

thomaseizinger commented Jun 22, 2023

The above argument builds on the assumptions:

  • that the relay tests are significantly less work
  • that setting up NATs and testing hole punching is difficult
  • that we can reuse something significant from the relay tests in the hole punching tests

Esp. the last point is not obvious to me at all. For example, I am not sure if docker (compose) provides all the right knobs for us to properly test hole punching. I wouldn't want to build the relay tests with docker and later learn that we have to ditch^1 that for something else. Am I the only one with this concern?

My suggestion would be to first explore what hole punching tests could look like before we make a decision. I am okay with building the relay tests first iff we have a clear plan for how they can evolve into hole punching tests.^2

Footnotes

  1. This assumes that we will not have both eventually due to hole-punching being a superset of the functionality tested.

  2. I don't really care if somebody else picks this up, but from recent discussions with @mxinden, I'd work on these tests next.

@marten-seemann
Contributor Author

Thank you @mxinden, I very much agree with your framing of the two options, and that we should start with (1).

Esp. the last point is not obvious to me at all. For example, I am not sure if docker (compose) provides all the right knobs for us to properly test hole punching

I have little doubt about that. Docker gives us access to the Linux networking stack. We can set iptables rules, routes, etc. I actually did that a few years ago in a similar setup for QUIC interop testing with the QUIC network simulator.

Do you have any other solution in mind that you'd like to explore first?

@thomaseizinger
Contributor

Esp. the last point is not obvious to me at all. For example, I am not sure if docker (compose) provides all the right knobs for us to properly test hole punching

I have little doubt about that. Docker gives us access to the Linux networking stack. We can set iptables rules, routes, etc. I actually did that a few years ago in a similar setup for QUIC interop testing with the QUIC network simulator.

Do you have any other solution in mind that you'd like to explore first?

I'd assume that, overall, tooling for creating virtual network stacks on Linux is more advanced than docker, because it is closer to the kernel.

There are also other approaches, like provisioning infrastructure on a cloud provider such as AWS through terraform.

My suggestion would be to build a PoC for testing hole punching on Linux. I am happy to build that as a Rust-only component in our repo because I want some form of automated hole punching tests anyway. That saves a lot of work because we don't need to deal with the cardinality of multiple implementations etc.

Once that is in place, we can build the relay tests with that in mind.

@thomaseizinger
Contributor

I recently wrote a TURN relay for another project and wanted to test it using docker-compose and two networks.

It was a frustrating experience, and I ended up ditching the containers due to docker-compose's bad documentation of its network support with the new buildkit. It messed up my local routing tables to the point where it broke internet connectivity. Plus, iterating on anything that is built in containers is painful because, again, buildkit isn't very mature yet and caches are somehow not properly shared between docker and docker-compose.

@marten-seemann
Contributor Author

I'd really, really like us to stick with Docker and not use any cloud provisioning. It will be:

  1. Faster to run tests locally instead of spinning up a cluster
  2. Cheaper: we're planning to run a lot of these tests, on a regular basis
  3. More flexible regarding NAT types (think: cones, symmetric, etc.), if we can set them up ourselves instead of relying on the (one?) NAT type that a cloud infrastructure provider would give us.

Thinking ahead, we will need some kind of NAT solution for our larger-scale tests as well (at the very least for AutoRelay, AutoNAT, Kademlia). This justifies investing in a clean and simple setup that we can reuse later.

To be honest, I have little doubt that we'll be able to create a Docker setup that emulates a NAT. I don't even think it will be particularly hard. In the end, a NAT doesn't do much more than rewrite IPs and keep track of flows, which is exactly what iptables is made for.
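
To illustrate (an untested sketch; the interface names and subnet are assumptions), the core of such a NAT container boils down to a handful of rules:

```python
# Minimal source-NAT for a container with eth0 facing "the internet"
# and eth1 facing a private 192.168.1.0/24 subnet. Illustrative only.
import subprocess

RULES = [
    # Let the kernel forward packets between the two interfaces.
    "sysctl -w net.ipv4.ip_forward=1",
    # Rewrite the source IP of egress traffic and track the flow.
    "iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -j MASQUERADE",
    # Only allow return traffic for flows the NAT already knows about.
    "iptables -A FORWARD -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT",
    "iptables -A FORWARD -i eth1 -j ACCEPT",
]

for rule in RULES:
    subprocess.run(rule.split(), check=True)
```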

I can see multiple ways of creating a NAT in a docker-ized setup:

  • Using iptables inside of Docker containers. Maybe https://github.com/zzJinux/docker-nat-simulate can serve as a source of inspiration. A quick Google search also revealed a bunch of results. I haven't looked at any of those in detail, but it seems like what we're doing is not that special.
  • Using ns-3 inside a Docker container. This is what the QUIC network simulator does, and it would allow us to programmatically handle IP packets passing through the network. It adds quite a lot of overhead though (performance-wise) and is not trivial to set up, so we should only take this option if we really can't make the iptables approach work.

@thomaseizinger
Contributor

Thanks for the pointers! I'll look into them.

@p-shahi
Member

p-shahi commented Jun 22, 2023

For the H2 planning: do we have consensus that we'll go forward with completing the relay v2 tests, i.e. PR #147, and then focus on hole punching tests? It would be a good win to get the relay v2 tests in, as they have been in flight for some time.

@thomaseizinger
Contributor

My suggestion would be to build a PoC for testing hole punching on Linux. I am happy to build that as a Rust-only component in our repo because I want some form of automated hole punching tests anyway. That saves a lot of work because we don't need to deal with the cardinality of multiple implementations etc.

Once that is in place, we can build the relay tests with that in mind.

I am going to start on this spike next week and report the findings here afterwards.

@mxinden
Member

mxinden commented Jul 3, 2023

I am going to start on this spike next week and report the findings here afterwards.

//CC @sukunrt since I remember you looking into ways to simulate different network topologies as well. @thomaseizinger @sukunrt I assume there is a possibility to collaborate here.

@sukunrt
Member

sukunrt commented Jul 3, 2023

I used network namespaces on Linux to do some debugging for hole punching. I wanted to avoid simulators because I felt they would give some false positives, but I didn't research this deeply. By simulators, I mean anything which doesn't use the kernel's TCP/UDP stack. If you are interested in going the simulation route, @dhuseby pointed me to https://shadow.github.io/, which looks interesting.

For Linux namespaces, mininet is a nice abstraction, but it doesn't work with IPv6. I think docker would provide the same set of options as Linux network namespaces, with the added benefit of running on macOS.

I'm very excited about hole punching tests. Happy to help with anything you need.

@marten-seemann
Contributor Author

I am going to start on this spike next week and report the findings here afterwards.

@thomaseizinger Any updates on this? Really curious to hear what you found out!

@thomaseizinger
Contributor

thomaseizinger commented Jul 9, 2023

I am going to start on this spike next week and report the findings here afterwards.

@thomaseizinger Any updates on this? Really curious to hear what you found out!

Not a super exciting one so far, unfortunately. My first attempt was to use Linux network namespaces within a container and simulate the NAT with nftables.

Setting up the namespaces works, but the NAT doesn't. Debugging proved difficult because nftables rules within a container don't support logging packet drops etc. (It took me a while to figure that one out.)

So the current state is that I have the namespaces set up on my machine and am experimenting with which nftables rules can work. I managed to set up a NAT in the sense that egress traffic has its source IP rewritten, but for some reason the main namespace, which simulates "the internet", doesn't forward the traffic correctly to the other virtual ethernet adapters.

So far I've only tested with ICMP packets, no libp2p software involved yet. I've invested about 10h so far.

I am off next week but will try some different avenues after that. I've not tried any dedicated middleware software like ns-3 yet. I was hoping that we could just rely on the Linux kernel's IP routing and manage the NAT with firewall rules.
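
For reference, the rough shape of the namespace setup I am describing, simplified to a single NAT'd host (all names, addresses and rules here are illustrative, not my exact setup):

```python
#!/usr/bin/env python
# Sketch: one NAT'd namespace, with the root namespace playing "the internet".
import subprocess

def sh(cmd: str):
    subprocess.run(cmd, shell=True, check=True)

sh("ip netns add alice")
sh("ip link add veth-alice type veth peer name veth-wan")
sh("ip link set veth-alice netns alice")
sh("ip -n alice addr add 192.168.1.2/24 dev veth-alice")
sh("ip -n alice link set veth-alice up")
sh("ip -n alice link set lo up")
sh("ip addr add 192.168.1.1/24 dev veth-wan")
sh("ip link set veth-wan up")
sh("ip -n alice route add default via 192.168.1.1")

# Masquerade alice's egress traffic in the root namespace.
sh("sysctl -w net.ipv4.ip_forward=1")
sh("nft add table ip nat")
sh("nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; }'")
sh("nft add rule ip nat postrouting ip saddr 192.168.1.0/24 masquerade")
```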

@BigLep
Contributor

BigLep commented Jul 18, 2023

@thomaseizinger : I'm not close to this, but do we feel like we have a good handle on the potential options we should be considering? If it could help, it would be nice to ask for input from a wider group of engineers in PLN companies or the p2p space. I assume someone has expertise here and can probably provide some useful pointers. Maybe start with #libp2p-implementers?

@thomaseizinger
Contributor

@thomaseizinger : I'm not close to this, but do we feel like we have a good handle on the potential options we should be considering? If it could help, it would be nice to ask for input from a wider group of engineers in PLN companies or the p2p space. I assume someone has expertise here and can probably provide some useful pointers. Maybe start with #libp2p-implementers?

Thanks! I've made some progress as I've learnt more about Linux network namespaces. I'll continue looking into this at the beginning of August. I have very limited time at the moment, so I don't think it makes sense to involve somebody else at this stage. You can expect an update in about 2 weeks!

@SgtPooki
Member

SgtPooki commented Aug 7, 2023

@thomaseizinger any update? I'm trying to push ipfs/helia#182 forward and trying to think through testing for libp2p/js-libp2p#1928

Also, the Helia WG (Working Group) has a related task for getting some tests set up so we can confirm DCUtR will work once WebRTC is in go-libp2p: https://pl-strflt.notion.site/write-DCUtR-tests-so-they-re-ready-for-when-go-libp2p-supports-WebRTC-3d0ea3903eee400487326640e8be56e4?pvs=4

@thomaseizinger
Contributor

@thomaseizinger any update? I'm trying to push ipfs/helia#182 forward and trying to think through testing for libp2p/js-libp2p#1928

Not yet, I've been catching up with things in rust-libp2p after my time away. @mxinden pointed me to https://github.com/mininet/mininet which I'll explore next.

@thomaseizinger
Contributor

Big news! I was able to write a mininet script that sets up a topology where two clients are behind NAT and can only talk to a relay that is reachable from both ends. I then wrote simple binaries using the Rust implementation that first connect to the relay and then attempt to hole-punch and it worked. Here is the output:

❯ sudo python relay-test.py
*** Creating network
*** Adding controller
*** Adding hosts:
halice hbob hrelay natalice natbob
*** Adding switches:
s0 s1 s2
*** Adding links:
(halice, s1) (hbob, s2) (hrelay, s0) (100ms delay) (100ms delay) (natalice, s0) (natalice, s1) (100ms delay) (100ms delay) (natbob, s0) (natbob, s2)
*** Configuring hosts
halice hbob hrelay natalice natbob
*** Starting controller
c0
*** Starting 3 switches
s0 s1 s2 ...(100ms delay) (100ms delay)
*** Waiting for switches to connect
s0 s1 s2
*** Running test
*** hrelay : ('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/relay --port 8080 --secret-key-seed 1 --listen-addr 10.0.0.1 &',)
[1] 67094
*** halice : ('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/client --mode listen --secret-key-seed 2 --relay-address /ip4/10.0.0.1/tcp/8080/p2p/12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X &',)
[1] 67095
*** hbob : ('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/client --mode dial --secret-key-seed 3 --relay-address /ip4/10.0.0.1/tcp/8080/p2p/12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X --remote-peer-id 12D3KooWH3uVF6wv47WnArKHk5p6cvgCJEb74UTmxztmQDc298L3',)
[2023-08-17T15:09:48Z INFO  client] Local peer id: 12D3KooWQYhTNQdmr3ArTeUHRYzFg94BKyTkoWBDWez9kSCVe2Xo
[2023-08-17T15:09:48Z INFO  client] Listening on "/ip4/127.0.0.1/tcp/43093"
[2023-08-17T15:09:48Z INFO  client] Listening on "/ip4/192.168.2.100/tcp/43093"
[2023-08-17T15:09:48Z INFO  client] Listening on "/ip4/127.0.0.1/udp/35874/quic-v1"
[2023-08-17T15:09:48Z INFO  client] Listening on "/ip4/192.168.2.100/udp/35874/quic-v1"
[2023-08-17T15:09:51Z INFO  client] Told relay its public address.
[2023-08-17T15:09:52Z INFO  client] Relay told us our public address: "/ip4/10.0.0.5/tcp/43093"
[2023-08-17T15:09:55Z INFO  client] OutboundCircuitEstablished { relay_peer_id: PeerId("12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X"), limit: None }
[2023-08-17T15:09:59Z INFO  client] Successfully hole-punched to 12D3KooWH3uVF6wv47WnArKHk5p6cvgCJEb74UTmxztmQDc298L3
*** Stopping 1 controllers
c0
*** Stopping 7 links
.......
*** Stopping 3 switches
s0 s1 s2
*** Stopping 5 hosts
halice hbob hrelay natalice natbob
*** Done

Things worth mentioning:

  • This is not using docker; it directly sets up Linux network namespaces and process groups.
  • Currently I am using a sleep to wait for the listening client to successfully complete the reservation. We should probably use redis here, the same as for the interop tests (see the sketch below this list).
  • We'll need to build binaries of the different implementations but run them all on the same host. I think docker's host networking mode should make this possible.
  • Currently, I have only managed to run the script as an example inside the mininet repository. We'll need to figure out a way to install mininet as a package and have the script still work.
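
For the redis idea in the second bullet, something along these lines is what I have in mind (a sketch only; the key name and redis address are made up):

```python
# Replace the sleep with blocking coordination via redis, the same
# pattern the interop tests use. Key name and host are illustrative.
import redis

r = redis.Redis(host="10.0.0.2", port=6379)

# Listener side: publish once the relay reservation is in place.
def announce_reservation(circuit_addr: str) -> None:
    r.rpush("listener-ready", circuit_addr)

# Dialer side: block until the listener is ready instead of sleeping.
def wait_for_reservation(timeout_s: int = 60) -> str:
    result = r.blpop("listener-ready", timeout=timeout_s)
    if result is None:
        raise TimeoutError("listener never completed its reservation")
    _key, addr = result
    return addr.decode()
```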

Regarding the scope: I'd consider the MVP to be complete at this point. We can use mininet to simulate the NAT. Should we use that already for the circuit v2 tests or run those without a NAT setup as hinted at in the PR description?

@mxinden
Member

mxinden commented Aug 17, 2023

This is great news.

Currently I am using a sleep to wait for the listening client to successfully complete the reservation. We should probably use redis here, the same as for the interop tests.

Sounds good to me.

We'll need to build binaries of the different implementations but run them all on the same host. I think docker's host networking mode should make this possible.

I am in favor of exploring this early on. In my eyes, Docker would significantly simplify build and package management, especially when it comes to non-statically built binaries, see e.g. #244 (comment).

Big news! I was able to write a mininet script that sets up a topology where two clients are behind NAT and can only talk to a relay that is reachable from both ends. I then wrote simple binaries using the Rust implementation that first connect to the relay and then attempt to hole-punch and it worked. Here is the output:

Can you share the code of the MVP?

Should we use that already for the circuit v2 tests or run those without a NAT setup as hinted at in the PR description?

I am fine either way.

@thomaseizinger
Contributor

Big news! I was able to write a mininet script that sets up a topology where two clients are behind NAT and can only talk to a relay that is reachable from both ends. I then wrote simple binaries using the Rust implementation that first connect to the relay and then attempt to hole-punch and it worked. Here is the output:

Can you share the code of the MVP?

At the moment it is tied to my machine, unfortunately. I'll next explore how I can bundle up the mininet script so that other people can easily run it.

But for the curious, here is the topology script:

```python
#!/usr/bin/env python
import time

from mininet.link import TCLink
from mininet.log import setLogLevel
from mininet.net import Mininet
from mininet.nodelib import NAT
from mininet.topo import Topo

class InternetTopo(Topo):
    def build(self, **_kwargs):
        # set up inet switch
        inetSwitch = self.addSwitch('s0')

        # add two hosts, both behind NAT
        for index, kind in enumerate(["alice", "bob"]):
            index += 1

            inetIntf = 'nat%s-eth0' % kind
            localIntf = 'nat%s-eth1' % kind
            localIP = '192.168.%d.1' % index
            localSubnet = '192.168.%d.0/24' % index
            natParams = { 'ip' : '%s/24' % localIP }
            # add NAT to topology
            nat = self.addNode('nat%s' % kind, cls=NAT, subnet=localSubnet,
                               inetIntf=inetIntf, localIntf=localIntf)

            switch = self.addSwitch('s%s' % index)
            # connect NAT to inet and local switches
            self.addLink(nat, inetSwitch, intfName1=inetIntf, cls=TCLink, delay = '70ms')
            self.addLink(nat, switch, intfName1=localIntf, params1=natParams)
            # add host and connect to local switch
            host = self.addHost('h%s' % kind,
                                ip='192.168.%d.100/24' % index,
                                defaultRoute='via %s' % localIP)
            self.addLink(host, switch)

        # add relay host
        host = self.addHost('hrelay', ip='10.0.0.1/24')
        self.addLink(host, inetSwitch, cls=TCLink, delay = '30ms')

def relay_test(mininet: Mininet):
    relay = mininet.getNodeByName('hrelay')
    alice = mininet.getNodeByName('halice')
    bob = mininet.getNodeByName('hbob')

    relay.cmdPrint('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/relay --port 8080 --secret-key-seed 1 --listen-addr %s &' % relay.IP())
    alice.cmdPrint('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/client --mode listen --secret-key-seed 2 --relay-address /ip4/%s/tcp/8080/p2p/12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X &' % relay.IP())

    time.sleep(5)
    bob.cmdPrint('/home/thomas/src/github.com/libp2p/rust-libp2p/target/debug/client --mode dial --secret-key-seed 3 --relay-address /ip4/%s/tcp/8080/p2p/12D3KooWPjceQrSwdWXPyLLeABRXmuqt69Rg3sBYbU1Nft9HyQ6X --remote-peer-id 12D3KooWH3uVF6wv47WnArKHk5p6cvgCJEb74UTmxztmQDc298L3' % relay.IP())

if __name__ == '__main__':
    setLogLevel('info')

    net = Mininet(topo=InternetTopo(), waitConnected=True)
    net.run(relay_test, net)
```

And here are the binaries I used: https://github.com/libp2p/rust-libp2p/tree/feat/hole-punching-tests/hole-punching-tests/src/bin

@MarcoPolo
Contributor

@thomaseizinger did you take a look at https://github.com/shadow/shadow? It compares itself to mininet, but claims to be more deterministic.

@thomaseizinger
Contributor

thomaseizinger commented Aug 17, 2023

@thomaseizinger did you take a look at https://github.com/shadow/shadow? It compares itself to mininet, but claims to be more deterministic.

I did not (yet). I've found mininet to be a bit dated. For example, it still uses iptables, even though that has been deprecated on distributions like Ubuntu.

I can take a look at shadow but overall, I like the idea of actually using the kernel's TCP/UDP implementation.

Given that mininet is dated, it might be worth using shadow instead.

@mxinden
Member

mxinden commented Aug 17, 2023

@Menduist mentioned shadow in the last community call. I checked it out but dropped it due to missing NAT support: shadow/shadow#249.

@dhuseby

dhuseby commented Aug 17, 2023

I am in favor of exploring this early on. In my eyes Docker would significantly simplify build and package management.

@thomaseizinger is there some way to run the hosts as docker containers and use mininet to connect the two docker containers to NATs and a controller? Basically, instead of running the binaries directly, you run docker images.

@thomaseizinger
Contributor

I am in favor of exploring this early on. In my eyes Docker would significantly simplify build and package management.

@thomaseizinger is there some way to run the hosts as docker containers and use mininet to connect the two docker containers to NATs and a controller? Basically, instead of running the binaries directly, you run docker images.

I think this should be possible using docker's host networking mode, and it is what I am planning to do.

@marten-seemann
Contributor Author

Great work, happy to see progress here @thomaseizinger!

When I built a network simulator for QUIC testing, I had a similar problem, and maybe the solution is transferable here: https://github.com/marten-seemann/quic-network-simulator. As you can see in the diagram in the README, this is a 3-container Docker setup, with a client and a server communicating through a third container (sim). In our case, this would be a container containing the mininet setup. There's a little bit of setup necessary to force the packets to pass through that container (here and here).
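
The gist of that setup step (the addresses below are examples, not the actual values from the repo): each endpoint routes the other side's subnet via the sim container, so every packet must traverse the simulator:

```python
# Inside the client container: route the server's subnet via the sim
# container instead of directly. Addresses are illustrative.
import subprocess

SIM_IP = "193.167.0.2"              # sim's address on the client-side network
SERVER_SUBNET = "193.167.100.0/24"  # network the server lives on

subprocess.run(["ip", "route", "add", SERVER_SUBNET, "via", SIM_IP], check=True)
```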

@thomaseizinger
Contributor

thomaseizinger commented Aug 24, 2023

I've done some more research and stumbled over https://github.com/containernet/containernet. It says it can run inside a docker container, which is OSX-friendly. Mininet only runs on Linux.

I'll investigate that and see what it can do.

@thomaseizinger
Contributor

As you can see in the diagram in the README, this is a 3-container Docker setup, with a client and a server communicating through a third container (sim). In our case, this would be a container containing the mininet setup. There's a little bit of setup necessary to force the packets to pass through that container (here and here).

I am not sure this is possible. Mininet itself uses network namespaces and cgroups and thus operates on the same level as docker.

@thomaseizinger
Contributor

I've done some more research and stumbled over containernet/containernet. It says it can run inside a docker container, which is OSX-friendly. Mininet only runs on Linux.

I'll investigate that and see what it can do.

Unfortunately, I can't get containernet to work. It propagates some error from the docker daemon, so I am assuming it is not up to date with the more recent docker version that I have installed.

@marten-seemann
Contributor Author

I played around with a dockerized NAT setup today. The good news: it works, without any mininet / containernet / ns-3. Good old (legacy) iptables plus a few routes is all it takes! The NAT is a separate docker container, as are the two endpoints. This is exactly the setup we need for our interop tests. It's surprisingly straightforward; I wouldn't have expected to get away with so little code. Take a close look at the docker compose file and the setup scripts.

The setup is in my GitHub repo docker-nat-simulator. There's basically no setup needed at all; all you need to do is run docker compose up. Please take a look at the README, which describes in quite some detail how the setup works and how you can convince yourself of that.

I also played around with a more complicated hole punching setup, based on my simple setup from above. It's in the hole-punching branch of the same repo. (You might need to run docker network prune when switching between the two branches.) It uses exactly the same iptables commands, but now has two NAT-ed hosts behind their respective NATs, plus a relay server. Again, everything is described in detail in the README.

The good news is: go-libp2p is able to hole punch to the peer, sometimes. Yes, it's super flaky. You might need to re-run the test 5 times or more. And it only works over TCP, not over UDP. See the Future Work section in the README. It might be a bug in the setup, or in go-libp2p. I haven't spent more than 15 minutes debugging this yet, and at this point, I don't care too much. The fact that the hole punch sometimes works shows that we're not on the wrong path here.

@thomaseizinger
Contributor

Thanks @marten-seemann! I'll take a more detailed look next week and see if I can transfer some of the NAT setup from my previous testing. I remember mininet having more iptables rules than this. Plus, if we control it directly, we might as well go for nftables straight away, which, as far as I understand, is the successor.

Tbh, not using mininet is probably a good idea. It is very powerful and I've found its network setup to be very stable. It is also super easy to change the config and build more complex topologies which might be interesting for future tests. But the fact that it is written in Python is an absolute pain.

@thomaseizinger
Contributor

Great news!

Based on your work @marten-seemann, I was able to write a deterministic hole-punching test for rust-libp2p that works for TCP and QUIC! I replaced iptables with nftables, which seems to do the trick. I also added some scripts that have the containers perform some auto-configuration based on their network interfaces.

Lastly, I added a control network that includes a redis server which coordinates the test.

The code is here: https://github.com/libp2p/rust-libp2p/tree/feat/hole-punching-tests/hole-punching-tests

You should be able to run it with:

```bash
# First, clone the branch, then run:

cd ./hole-punching-tests
docker compose rm -f && docker compose up --exit-code-from alice --abort-on-container-exit
```

I've run this several times already and the hole punch always succeeds, no flakes so far.

How do we want to proceed from here?

@thomaseizinger
Contributor

thomaseizinger commented Sep 25, 2023

There is now a working version of automated hole punch tests ready for review in #304. We can either expand that to also cover more combinations of the relay, or build separate relay tests from there.

@mxinden
Member

mxinden commented Oct 16, 2023

Cross-referencing nim-libp2p hole punching tests tracking issue vacp2p/nim-libp2p#966. //CC @diegomrsantos and @thomaseizinger

@thomaseizinger
Contributor

Cross-referencing nim-libp2p hole punching tests tracking issue status-im/nim-libp2p#966. //CC @diegomrsantos and @thomaseizinger

The hole punch test issue is #126. This issue is about interop tests for circuit v2.
