
Persisting/seeding a routing table #383

Closed

Conversation

@aarshkshah1992 (Contributor) commented Aug 12, 2019

Based on PR #315 by @raulk

Please refer to issues:
#254
#295

@aarshkshah1992 (Contributor, Author) commented Aug 12, 2019

@raulk Please take a look.

Changes

  1. Refactored the randomSeeder
  2. Added tests for the randomSeeder & the snapshot implementation
  3. Set meaningful defaults; try to seed the DHT & start periodic snapshotting when the DHT is created
  4. End-to-end DHT test for periodic snapshotting & seeding from a previous snapshot

TODO

  • Resolve the conflicts :)

One question for you:
The DefaultBootstrap peers we use for fallback aren't explicitly added to the peerstore anywhere in the DHT code. Usages of DefaultBootstrap peers in other packages also assume that they are somehow already in the peerstore. Is this a valid assumption? If yes, when do these get added to the peerstore?

Note: A side effect of successfully dialling another peer is that the DHT network notification handler adds it to the routing table. We have an issue open to separate our routing table from our connections/disconnections (#283). But, till that is done, we will end up adding the same peer twice to the routing table in the seeder, which will make the routing table falsely assume that it is a 'good'/'active' peer. I think we can live with it for now (?)

@aarshkshah1992 (Contributor, Author) commented Aug 31, 2019

@raulk

As part of this PR, we will populate the RT using the seeder whenever the RT becomes empty. #384 now depends on this PR.

@raulk (Member) commented Sep 3, 2019

Hey @aarshkshah1992, I've spoken to @bigs and he's gonna pick this up, as I'm currently tackling a deadline and I don't want to leave this out to dry. Thanks for your amazing work so far. Expect to hear from @bigs shortly.

@raulk raulk assigned raulk and bigs and unassigned raulk Sep 3, 2019
@raulk raulk requested a review from bigs September 3, 2019 16:49
@aarshkshah1992 (Contributor, Author) commented:

@bigs Hey, this PR is ripe for review. Please take a look when you can. Thanks :)

@bigs (Contributor) commented Sep 5, 2019 via email

aarshkshah1992 added a commit to aarshkshah1992/go-libp2p-kad-dht that referenced this pull request Sep 5, 2019
1) on connecting to a new peer -> trigger self & bucket bootstrap if RT size goes below threshold
2) accept formatting & doc suggestions in the review
3) remove RT recovery code for now -> will address in a separate PR once libp2p#383 goes in

changes as per review
@aarshkshah1992 (Contributor, Author) commented:

Hey @bigs, reminder to review :)

@raulk (Member) commented Sep 13, 2019

@bigs how can I help move this forward?

@bigs (Contributor) left a comment:

looking really good

Review threads (resolved): dht.go, dht_bootstrap.go, pb/dht.proto, dht_test.go, persist/seeder.go
@aarshkshah1992 (Contributor, Author) commented:

@bigs I have fixed the conflicts & addressed your comments. Thanks for the review! :) Let me know if we need more changes before merging this.

@aarshkshah1992 (Contributor, Author) commented:

Hey @bigs, please take a look when you can. This is very close to getting merged.

@bigs (Contributor) left a comment:

thanks so much for the hard work here. looking great.

@raulk (Member) commented Sep 28, 2019

Thanks for all the hard work here! Let me take a look before we merge.

@raulk raulk self-requested a review September 28, 2019 01:36
@aarshkshah1992 (Contributor, Author) commented:

Thanks a lot @bigs ! Really appreciate it :)

Stebalien pushed a commit to aarshkshah1992/go-libp2p-kad-dht that referenced this pull request Oct 11, 2019
1) on connecting to a new peer -> trigger self & bucket bootstrap if RT size goes below threshold
2) accept formatting & doc suggestions in the review
3) remove RT recovery code for now -> will address in a separate PR once libp2p#383 goes in

changes as per review
@aarshkshah1992 (Contributor, Author) commented:

@raulk

Please take a look when you get time. Since #384 is now merged, getting this in would enable us to finish #387, which would go a long way in helping the DHT recover from an empty RT/sparse k-buckets.

"All you need is love, along with a snapshotted & seeded Dht"
-John Lennon

@raulk (Member) left a comment:

Great, great work pushing this forward. Thank you, @aarshkshah1992! Just a few things I'd consider revisiting, but nothing too major. I understand this PR has been awaiting my review for a long time, and your focus/drive may have drifted a bit. Let me know if you'd rather I make the changes myself and land this at last.

Note: this feature is only useful across restarts with a persisted peerstore! We should probably disclaim that in the godocs.

}

// schedule periodic snapshots
sproc := periodicproc.Tick(cfg.Persistence.SnapshotInterval, func(proc goprocess.Process) {
@raulk (Member) commented:

I'd really love to store the routing table before we shut down. However, because we don't control the order in which libp2p components shut down, we might end up storing the routing table after other components that are also running their shutdown logic have disconnected peers. As a result, we'd end up storing a crippled routing table.

On the other hand, I guess a degree of randomness here is good. Otherwise, if an attacker found a way to both poison the table and force a shutdown, they could permanently bork the routing table if the peer saved the poisoned one every time.

@aarshkshah1992 (Contributor, Author) commented Oct 12, 2019:

I see what you mean. So, in its current form, this code could persist an empty RT if the snapshotting goroutine fires after all peers have been dropped from the RT but the DHT hasn't been closed yet.

However, this can also prove to be a blessing in disguise: storing an empty RT & then seeding with bootstrap peers after we restart could save us from ALWAYS storing a poisoned RT if an attacker messed up our RT and found a way to immediately shut us down.

So, let's see how the current implementation works in practice & fix it if required?
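To make the trade-off above concrete, here is a minimal sketch of the periodic snapshotting wiring from the excerpt, assuming a hypothetical snapshotter with a Store method; this is a sketch of the pattern, not the PR's exact code.

import (
	"github.com/jbenet/goprocess"
	periodicproc "github.com/jbenet/goprocess/periodic"
)

// Fire a snapshot every SnapshotInterval; if a tick lands after all peers
// have been dropped from the RT but before the DHT closes, an empty RT is
// what gets persisted (the failure mode discussed above).
sproc := periodicproc.Tick(cfg.Persistence.SnapshotInterval, func(proc goprocess.Process) {
	if err := snapshotter.Store(dht.routingTable); err != nil {
		logger.Warnf("failed to persist routing table snapshot: %s", err)
	}
})

// Tie the ticker to the DHT's process so it stops when the DHT shuts down.
dht.proc.AddChild(sproc)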

Review thread (resolved): dht.go

dht_bootstrap.go (outdated excerpt):
if err != nil {
panic(err)
}
DefaultBootstrapPeerIDs = append(DefaultBootstrapPeerIDs, info.ID)
@raulk (Member) commented:

DefaultBootstrapPeerIDs should be private and derived from the DefaultBootstrapPeers, once the latter is completely initialized.
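A minimal sketch of that suggestion, assuming DefaultBootstrapPeers is the package's slice of bootstrap multiaddrs; the lowercase name and the init placement are illustrative, not the PR's final code.

import "github.com/libp2p/go-libp2p-core/peer"

// defaultBootstrapPeerIDs is private and derived from DefaultBootstrapPeers
// only after the latter has been fully initialized.
var defaultBootstrapPeerIDs []peer.ID

func init() {
	for _, addr := range DefaultBootstrapPeers {
		info, err := peer.AddrInfoFromP2pAddr(addr)
		if err != nil {
			// the default addresses are hardcoded, so this is a programmer error
			panic(err)
		}
		defaultBootstrapPeerIDs = append(defaultBootstrapPeerIDs, info.ID)
	}
}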

Review threads (resolved): dht_test.go, go.mod, persist/seeder.go

persist/seeder.go (outdated excerpt):
defer cancel()

// start dialing
semaphore := make(chan struct{}, NSimultaneousDial)
@raulk (Member) commented:

This kind of throttling logic already exists in the dialer. Any motivation to put it here too?

@aarshkshah1992 (Contributor, Author) commented Oct 12, 2019:

@raulk I see. Could you please point me to the relevant code in the dialer? Also, if this is taken care of by the dialer, could you please explain your comment here:

https://github.com/libp2p/go-libp2p-kad-dht/pull/384/files/00fffba0aa6948e7549752197f63b85f293a66e9#r334237930

What do you mean by:

We should stagger these lookups, instead of launching them all at once. 


@aarshkshah1992 (Contributor, Author) commented Oct 15, 2019:

I had a look at the dial limiter & understand that we queue all TCP dial requests once the file descriptor limit is hit. However, my concern is that we will eat up a huge number of fds for ourselves if we have too many candidates in the snapshot & do not have some form of rate limiting here. This in turn will slow down dial requests from other parts of the application. Let me know what you think.
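For reference, a self-contained sketch of the semaphore pattern from the excerpt above; nSimultaneousDial, dialCandidates, and the host wiring are illustrative names, not the PR's exact code.

import (
	"context"
	"sync"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
)

const nSimultaneousDial = 16 // assumed cap on concurrent dials

// dialCandidates dials candidates with at most nSimultaneousDial dials in
// flight, bounding fd consumption, and returns the peers that answered.
func dialCandidates(ctx context.Context, h host.Host, candidates []peer.ID) []peer.ID {
	var (
		sem   = make(chan struct{}, nSimultaneousDial)
		mu    sync.Mutex
		alive []peer.ID
		wg    sync.WaitGroup
	)
	for _, p := range candidates {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks once the cap is reached
		go func(p peer.ID) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			// Connect falls back to addresses already in the peerstore.
			if err := h.Connect(ctx, peer.AddrInfo{ID: p}); err == nil {
				mu.Lock()
				alive = append(alive, p)
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return alive
}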

}

// attempts to dial a given peer to verify it's available
dialFn := func(ctx context.Context, p peer.ID, res chan<- result) {
@raulk (Member) commented:

I'm having second thoughts about putting this dialing logic here. It's probable that all seeders would want to verify they can actually add a peer before doing so. The dialing logic needs to be somewhere common.

I wonder if the seeder should be dumb, poll-driven and not perform any dials. We can move the dialing logic to the DHT, and pass in the current state of the routing table for the seeder to use when making decisions about which candidates to return next.

type Seeder interface {
   // Next returns a slice of peers to attempt to add next, based on the current
   // state of the routing table. Returning a nil slice signals that the seeder
   // is done proposing peers.
   Next(rt *kbucket.RoutingTable) []peer.ID
}

@aarshkshah1992 (Contributor, Author) commented Oct 12, 2019:

@raulk

I don't see the value in making the seeder poll-driven & moving the burden of actually seeding the RT onto the caller. Could you please explain why that'd be a good thing? Also, why should we remove the candidate/fallback peers from the interface? The set of fallback peers is already configurable, which means the caller can pass in peers obtained from any source.

However, I do agree that the dialing logic is pretty generic & can be pulled out. We can put stuff like that in a helpers package that makes it easy for users to construct their own seeder implementations.

@raulk (Member) commented:

In a nutshell, I'm worried about putting dialling functionality in the Seeder. It simply doesn't belong there and it complicates the seeder, which is just supposed to seed, not dial. I see this being a future footgun, as it's essentially leaking an abstraction and turning into "spooky action at a distance". I did have it like this in my WIP work, so I totally understand that you followed suit, but that was mostly a placeholder -- I was supposed to revisit that aspect at some point.

Solutions I can think of:

  1. The Seeder acts like a "proposer", and we call it iteratively until we exhaust the candidates, and the Seeder returns no more proposals. Imagine we have peers [A-E] available.
    • The Seeder implements Next(rt *RoutingTable, candidates []peer.ID, fallback []peer.ID) (proposed []peer.ID).
    • On a first iteration, we call Next() with an empty routing table, all candidates ([A-E]), and the fallback peers.
    • The Seeder returns A and B.
    • The DHT bootstrap logic verifies that A works, B is undiallable. It adds A to the routing table, and drops B from the candidate set.
    • We call Next() with the routing table with A, and candidates [C-E] (having dropped B).
    • This goes on until the candidate set is exhausted, and the Seeder returns nil.
  2. Pass in a "validate function" to the seeder, owned by the DHT. This func would perform the dial. The Seeder calls this function and can therefore verify if the peer is diallable without owning the logic.

I still prefer 1. WDYT?
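A sketch of option 1's driver loop, using the Next signature from the first bullet; seedRoutingTable, dialCandidates (sketched earlier in this thread), and without are hypothetical names.

// seedRoutingTable calls the Seeder iteratively until it stops proposing.
func seedRoutingTable(ctx context.Context, h host.Host, rt *kbucket.RoutingTable,
	s Seeder, candidates, fallback []peer.ID) {
	for {
		proposed := s.Next(rt, candidates, fallback)
		if len(proposed) == 0 {
			return // candidates exhausted; the Seeder is done
		}
		// verify the proposals; only diallable peers enter the routing table
		for _, p := range dialCandidates(ctx, h, proposed) {
			rt.Update(p)
		}
		// drop every proposed peer, diallable or not, from the candidate set
		candidates = without(candidates, proposed)
	}
}

// without returns xs minus the peers in drop (hypothetical helper).
func without(xs, drop []peer.ID) []peer.ID {
	dropSet := make(map[peer.ID]struct{}, len(drop))
	for _, p := range drop {
		dropSet[p] = struct{}{}
	}
	var out []peer.ID
	for _, p := range xs {
		if _, ok := dropSet[p]; !ok {
			out = append(out, p)
		}
	}
	return out
}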

@aarshkshah1992 (Contributor, Author) commented:

@raulk I implemented 1 to get a feel for what you were saying & it is a good idea indeed. Now that we have decoupled 'proposing' seeders from 'dialling' & 'seeding' the RT, users can simply plug in custom 'proposal' strategies (for example, preferring candidates in ascending order of XOR distance from self) & the 'dialling'/'seeding' will just work out of the box. Thanks for this!

@raulk raulk changed the title Feature/dht persist seed Persisting/seeding a routing table Oct 12, 2019
@aarshkshah1992 aarshkshah1992 force-pushed the feature/dht-persist-seed branch 2 times, most recently from fe3ea46 to 561f9d0 Compare October 14, 2019 14:29
Review threads (resolved): dht_rt_seeder.go, dht_rt_seeder_test.go
@aarshkshah1992 (Contributor, Author) commented Dec 16, 2019

@aschmahmann Thanks for the really in-depth review.

The following changes have been made:

  1. The peerstore & the "ignore if already in RT" filtering has been moved to the caller
  2. Managing the target is now the responsibility of the caller & it is a functional option
  3. Batching/polling of the proposer has been replaced with the proposer returning a channel that the caller reads from (see the sketch below)
  4. All the code organization changes

Open questions:

  1. Why should the proposer return small peer sets? (here)

@raulk Could you please take a look at the interface changes & the changes in the random seed proposer & let us know what you think?
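A sketch of the channel-based proposer from item 3 above; the interface name and signature are illustrative, not the PR's final code.

// Proposer emits candidate peers for the caller to dial and seed into the RT.
type Proposer interface {
	// Propose returns a channel of candidate peers. The channel is closed when
	// the proposer runs out of candidates or ctx is cancelled; filtering (e.g.
	// peers already in the RT) is now the caller's responsibility.
	Propose(ctx context.Context, candidates []peer.ID) <-chan peer.ID
}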

Review threads (resolved): dht.go
@aarshkshah1992 (Contributor, Author) commented:

Hey @raulk

Would you be able to take a look at this?

@bigs bigs removed their assignment Jan 29, 2020
@raulk (Member) commented Feb 23, 2020

This PR has accumulated a ton of history and it's hard to process/review. I'm going to close it and open a new one preserving the commit history, as a clean checkpoint.
