Persisting/seeding a routing table #383
Conversation
@raulk Please take a look. Changes
TODO
One question for you. Note: a side effect of successfully dialling another peer is that the DHT network notification handler adds it to the routing table. We have an issue open to separate our routing table from our connections/disconnections (#283). But, until that is done, we will end up adding the same peer twice to the routing table in the seeder, which will make the routing table falsely assume that it is a 'good'/'active' peer. I think we can live with it for now (?)
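To illustrate the side effect described above, here is a hypothetical sketch (not the DHT's actual handler; the kbucket Update call and the go-libp2p-core interfaces of that era are assumptions) of a network notifiee that inserts every newly connected peer into the routing table, which is why a successful verification dial in the seeder would add the same peer again:

```go
package example

import (
	"github.com/libp2p/go-libp2p-core/network"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
	ma "github.com/multiformats/go-multiaddr"
)

// rtNotifiee is a hypothetical notifiee: every new connection lands the remote
// peer in the routing table, mirroring the side effect described above.
type rtNotifiee struct {
	rt *kbucket.RoutingTable
}

func (n *rtNotifiee) Connected(_ network.Network, c network.Conn) {
	n.rt.Update(c.RemotePeer()) // assumed kbucket API: add/refresh the peer in the table
}

// The remaining Notifiee methods are no-ops in this sketch.
func (n *rtNotifiee) Disconnected(network.Network, network.Conn)        {}
func (n *rtNotifiee) Listen(network.Network, ma.Multiaddr)              {}
func (n *rtNotifiee) ListenClose(network.Network, ma.Multiaddr)         {}
func (n *rtNotifiee) OpenedStream(network.Network, network.Stream)      {}
func (n *rtNotifiee) ClosedStream(network.Network, network.Stream)      {}
```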
Hey @aarshkshah1992, I've spoken to @bigs and he's gonna pick this up, as I'm currently tackling a deadline and I don't want to leave this out to dry. Thanks for your amazing work so far. Expect to hear from @bigs shortly.
@bigs Hey, this PR is ripe for review. Please take a look when you can. Thanks :)
will do in a few 👌🏻
1) on connecting to a new peer -> trigger self & bucket bootstrap if RT size goes below threshold 2) accept formatting & doc suggestions in the review 3) remove RT recovery code for now -> will address in a separate PR once libp2p#383 goes in
changes as per review
Hey @bigs, reminder to review :)
@bigs how can I help move this forward?
looking really good
Force-pushed from 65d1a47 to c0e52c7
@bigs I have fixed the conflicts & addressed your comments. Thanks for the review! :) Let me know if we need more changes before merging this.
Hey @bigs, please take a look when you can. This is very close to getting merged. |
thanks so much for the hard work here. looking great.
Thanks for all the hard work here! Let me take a look before we merge.
Thanks a lot @bigs! Really appreciate it :)
Force-pushed from c0e52c7 to af79fc8
Great, great work pushing this forward. Thank you, @aarshkshah1992! Just a few things I'd consider revisiting, but nothing too major. I understand this PR has been awaiting my review for a long time, and your focus/drive may have drifted a bit. Let me know if you'd rather I make the changes myself and land this at last.
Note: this feature is only useful across restarts with a persisted peerstore! We should probably disclaim that in the godocs.
}

// schedule periodic snapshots
sproc := periodicproc.Tick(cfg.Persistence.SnapshotInterval, func(proc goprocess.Process) {
I'd really love to store the routing table before we shut down. However, because we don't control the order in which libp2p components shut down, we might end up storing the routing table after other components that are also running their shutdown logic have disconnected peers. As a result, we'd end up storing a crippled routing table.
On the other hand, I guess a degree of randomness here is good. Otherwise, if an attacker found a way to both poison the table and force a shutdown, they could permanently bork the routing table if the peer saved the poisoned one every time.
I see what you mean. So, in its current form, this code could persist an empty RT if the snapshotting goroutine fires after all peers have been dropped from the RT but the DHT hasn't been closed yet.
However, this could also prove to be a blessing in disguise, because storing an empty RT & then seeding with bootstrap peers after we restart could save us from ALWAYS storing a poisoned RT if an attacker messed up our RT and found a way to immediately shut us down.
So, let's see how the current implementation works in practice & fix it if required?
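To make the trade-off above concrete, here is a minimal sketch of the periodic-snapshot idea, assuming a hypothetical Snapshotter type with a Store method (the PR's real persistence API may differ); skipping empty tables is one way to avoid overwriting a useful snapshot when the ticker fires after peers have been dropped:

```go
package example

import (
	"log"
	"time"

	"github.com/jbenet/goprocess"
	periodicproc "github.com/jbenet/goprocess/periodic"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// Snapshotter is an assumed interface; the real persistence API may differ.
type Snapshotter interface {
	Store(rt *kbucket.RoutingTable) error
}

// schedulePeriodicSnapshots persists the routing table on a timer, skipping
// empty tables so a tick that fires after all peers were dropped does not
// overwrite a previously useful snapshot.
func schedulePeriodicSnapshots(rt *kbucket.RoutingTable, s Snapshotter, interval time.Duration) goprocess.Process {
	return periodicproc.Tick(interval, func(proc goprocess.Process) {
		if rt.Size() == 0 {
			return // nothing worth persisting; keep the previous snapshot
		}
		if err := s.Store(rt); err != nil {
			log.Printf("error persisting routing table snapshot: %s", err)
		}
	})
}
```

The returned goprocess can then be closed on DHT shutdown to stop the ticker.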
dht_bootstrap.go (outdated)
if err != nil {
	panic(err)
}
DefaultBootstrapPeerIDs = append(DefaultBootstrapPeerIDs, info.ID)
DefaultBootstrapPeerIDs should be private and derived from DefaultBootstrapPeers, once the latter is completely initialized.
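A possible shape for this suggestion, sketched under the assumption that the package's DefaultBootstrapPeers slice of multiaddrs is populated before use (the unexported variable name is illustrative, not the PR's final code):

```go
package dht

import (
	"github.com/libp2p/go-libp2p-core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// DefaultBootstrapPeers stands in for the package's existing list of bootstrap
// multiaddrs (declared here only so the sketch compiles standalone).
var DefaultBootstrapPeers []ma.Multiaddr

// defaultBootstrapPeerIDs stays unexported and is derived from
// DefaultBootstrapPeers once the latter is fully initialized.
var defaultBootstrapPeerIDs []peer.ID

func init() {
	for _, addr := range DefaultBootstrapPeers {
		info, err := peer.AddrInfoFromP2pAddr(addr)
		if err != nil {
			panic(err) // the addresses are hard-coded, so this is a programmer error
		}
		defaultBootstrapPeerIDs = append(defaultBootstrapPeerIDs, info.ID)
	}
}
```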
persist/seeder.go (outdated)
defer cancel()

// start dialing
semaphore := make(chan struct{}, NSimultaneousDial)
This kind of throttling logic already exists in the dialer. Any motivation to put it here too?
@raulk I see. Please can you point me to the relevant code in the dialer? Also, if this is taken care of by the dialer, please can you explain what you meant in your earlier comment by "We should stagger these lookups, instead of launching them all at once"?
- Dialer reference: Sure, have a look at: https://github.com/libp2p/go-libp2p-swarm/blob/master/limiter.go.
- Staggering clarification: I responded in Feature/correct bootstrapping #384 directly.
I had a look at the dial limiter & understand that we queue all TCP dial requests once the file descriptor limit is hit. However, my concern is that we will eat up a huge number of fds for ourselves if we have too many candidates in the snapshot & do not have some form of rate limiting here. This in turn will slow down dial requests from other parts of the application. Let me know what you think.
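For reference, a rough sketch of the kind of bounded-concurrency dialing being discussed (the function and parameter names are illustrative, not the PR's exact code); the buffered channel caps how many verification dials can hold file descriptors at once:

```go
package example

import (
	"context"
	"sync"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
)

// dialCandidates returns the subset of candidates that could be dialled,
// allowing at most maxConcurrentDials dials to be in flight at any time.
func dialCandidates(ctx context.Context, h host.Host, candidates []peer.ID, maxConcurrentDials int) []peer.ID {
	var (
		mu        sync.Mutex
		reachable []peer.ID
		wg        sync.WaitGroup
		semaphore = make(chan struct{}, maxConcurrentDials)
	)
	for _, p := range candidates {
		wg.Add(1)
		go func(p peer.ID) {
			defer wg.Done()
			semaphore <- struct{}{}        // acquire a dial slot
			defer func() { <-semaphore }() // release it when done
			if _, err := h.Network().DialPeer(ctx, p); err == nil {
				mu.Lock()
				reachable = append(reachable, p)
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return reachable
}
```

Whether this belongs in the seeder or should be left entirely to the swarm's dial limiter is exactly the open question in this thread.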
persist/seeder.go (outdated)
}

// attempts to dial to a given peer to verify it's available
dialFn := func(ctx context.Context, p peer.ID, res chan<- result) {
I'm having second thoughts about putting this dialing logic here. It's probable that all seeders would want to verify they can actually add a peer before doing so. The dialing logic needs to be somewhere common.
I wonder if the seeder should be dumb, poll-driven and not perform any dials. We can move the dialing logic to the DHT, and pass in the current state of the routing table for the seeder to use when making decisions about which candidates to return next.
type Seeder interface {
	// Next returns a slice of peers to attempt to add next, based on the current
	// state of the routing table. Returning a nil slice signals that the seeder
	// is done proposing peers.
	Next(rt *kbucket.RoutingTable) []peer.ID
}
I don't see the value in making the seeder poll-driven & moving the burden of actually seeding the RT onto the caller. Please can you explain why that'd be a good thing? Also, why should we remove the candidate/fallback peers from the interface? The set of fallback peers is already configurable, which means the caller can pass in peers obtained from any source.
However, I do agree that the dialing logic is pretty generic & can be pulled out. We can put stuff like that in a helpers package that makes it easy for users to construct their own seeder implementations.
In a nutshell, I'm worried about putting dialling functionality in the Seeder. It simply doesn't belong there and it complicates the seeder, which is just supposed to seed, not dial. I see this being a future footgun, as it's essentially leaking an abstraction and turning into "spooky action at a distance". I did have it like this in my WIP work, so I totally understand that you followed suit, but that was mostly a placeholder -- I was supposed to revisit that aspect at some point.
Solutions I can think of:
1. The Seeder acts like a "proposer", and we call it iteratively until we exhaust the candidates and the Seeder returns no more proposals. Imagine we have peers [A-E] available:
   - The Seeder implements Next(rt *RoutingTable, candidates []peer.ID, fallback []peer.ID) (proposed []peer.ID).
   - On the first iteration, we call Next() with an empty routing table, all candidates ([A-E]), and the fallback peers.
   - The Seeder returns A and B.
   - The DHT bootstrap logic verifies that A works and that B is undiallable. It adds A to the routing table and drops B from the candidate set.
   - We call Next() with the routing table containing A, and candidates [C-E] (having dropped B).
   - This goes on until the candidate set is exhausted and the Seeder returns nil. (A rough sketch of this loop appears below.)
2. Pass in a "validate function" to the seeder, owned by the DHT. This func would perform the dial. The Seeder calls this function and can therefore verify whether a peer is diallable without owning the dialing logic.
I still prefer 1. WDYT?
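To make option 1 concrete, here is a rough, non-authoritative sketch of the DHT-side loop it implies (the Proposer interface shape, the kbucket Update call, and the helper names are assumptions, not the PR's final API): keep calling Next until it returns nothing, verify each proposal with a dial, admit diallable peers, and prune tried candidates.

```go
package example

import (
	"context"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// Proposer is the hypothetical poll-driven seeder shape from option 1.
type Proposer interface {
	Next(rt *kbucket.RoutingTable, candidates, fallback []peer.ID) []peer.ID
}

// seedRoutingTable drives the proposer: dial what it proposes, admit diallable
// peers into the routing table, drop tried candidates, and repeat until done.
func seedRoutingTable(ctx context.Context, h host.Host, rt *kbucket.RoutingTable,
	prop Proposer, candidates, fallback []peer.ID) {
	for {
		proposed := prop.Next(rt, candidates, fallback)
		if len(proposed) == 0 {
			return // nil (or empty) slice: the proposer has nothing left to suggest
		}
		tried := make(map[peer.ID]struct{}, len(proposed))
		for _, p := range proposed {
			if _, err := h.Network().DialPeer(ctx, p); err == nil {
				rt.Update(p) // assumed kbucket API: admit the verified peer
			}
			tried[p] = struct{}{} // drop from the candidate set either way
		}
		var remaining []peer.ID
		for _, p := range candidates {
			if _, ok := tried[p]; !ok {
				remaining = append(remaining, p)
			}
		}
		candidates = remaining
	}
}
```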
@raulk I implemented 1 to get a feel for what you were saying & it is indeed a good idea. Now that we have decoupled 'proposing' seeders from 'dialling' & 'seeding' the RT, users can simply plug in custom 'proposal' strategies (one strategy could be preferring candidates in ascending order of XOR distance from self, for example) & the 'dialling'/'seeding' will just work out of the box. Thanks for this!
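As an example of the flexibility this buys, a toy proposer satisfying the Proposer shape sketched earlier might look like the following (illustrative only; the PR's actual random seed proposer is more involved):

```go
package example

import (
	"math/rand"

	"github.com/libp2p/go-libp2p-core/peer"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// randomProposer proposes up to batchSize random candidates per call and falls
// back to the fallback set only once, when the candidates are exhausted and the
// routing table is still empty. batchSize must be > 0.
type randomProposer struct {
	batchSize     int
	triedFallback bool
}

func (r *randomProposer) Next(rt *kbucket.RoutingTable, candidates, fallback []peer.ID) []peer.ID {
	if len(candidates) == 0 {
		if rt.Size() == 0 && !r.triedFallback {
			r.triedFallback = true
			return fallback // last resort: well-known bootstrap peers
		}
		return nil // done proposing
	}
	shuffled := append([]peer.ID(nil), candidates...) // copy before shuffling
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	if len(shuffled) > r.batchSize {
		shuffled = shuffled[:r.batchSize]
	}
	return shuffled
}
```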
Force-pushed from fe3ea46 to 561f9d0
2. Rebased PR & fixed conflicts
2. addressed raul's comments 3. decoupled seeding from seed proposing
Force-pushed from 2423486 to c8efcec
Force-pushed from c8efcec to bdeb350
Force-pushed from 50678b4 to a6cf077
@aschmahmann Thanks for the really in-depth review. The following changes have been made:
Open questions:
@raulk Please can you take a look at the interface changes & the changes in the random seed proposer & let us know what you think?
Hey @raulk, would you be able to take a look at this?
This PR has accumulated a ton of history and it's hard to process/review. I'm going to close it and open a new one preserving the commit history, as a clean checkpoint.
Based on PR #315 by @raulk
Please refer to issues:
#254
#295