Persisting/seeding a routing table #383
Conversation
@raulk Please take a look. Changes
TODO
One question for you. Note: a side effect of successfully dialling another peer is that the DHT network notification handler adds it to the routing table. We have an issue open to separate our routing table from our connections/disconnections (#283). But, until that is done, we will end up adding the same peer twice to the routing table in the seeder, which will make the routing table falsely assume that it is a 'good'/'active' peer. I think we can live with it for now (?)
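To illustrate the side effect described above, here is a hypothetical sketch (not the DHT's actual handler; the kbucket Update call and the go-libp2p-core interfaces of that era are assumptions) of a network notifiee that inserts every newly connected peer into the routing table, which is why a successful verification dial in the seeder would add the same peer again:

```go
package example

import (
	"github.com/libp2p/go-libp2p-core/network"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
	ma "github.com/multiformats/go-multiaddr"
)

// rtNotifiee is a hypothetical notifiee: every new connection lands the remote
// peer in the routing table, mirroring the side effect described above.
type rtNotifiee struct {
	rt *kbucket.RoutingTable
}

func (n *rtNotifiee) Connected(_ network.Network, c network.Conn) {
	n.rt.Update(c.RemotePeer()) // assumed kbucket API: add/refresh the peer in the table
}

// The remaining Notifiee methods are no-ops in this sketch.
func (n *rtNotifiee) Disconnected(network.Network, network.Conn)        {}
func (n *rtNotifiee) Listen(network.Network, ma.Multiaddr)              {}
func (n *rtNotifiee) ListenClose(network.Network, ma.Multiaddr)         {}
func (n *rtNotifiee) OpenedStream(network.Network, network.Stream)      {}
func (n *rtNotifiee) ClosedStream(network.Network, network.Stream)      {}
```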
Hey @aarshkshah1992, I've spoken to @bigs and he's gonna pick this up, as I'm currently tackling a deadline and I don't want to leave this out to dry. Thanks for your amazing work so far. Expect to hear from @bigs shortly.
@bigs Hey, this PR is ripe for review. Please take a look when you can. Thanks :)
will do in a few 👌🏻
1) on connecting to a new peer -> trigger self & bucket bootstrap if RT size goes below threshold 2) accept formatting & doc suggestions in the review 3) remove RT recovery code for now -> will address in a separate PR once libp2p#383 goes in
changes as per review
Hey @bigs, reminder to review :)
@bigs how can I help move this forward?
looking really good
Force-pushed from 65d1a47 to c0e52c7
@bigs I have fixed the conflicts & addressed your comments. Thanks for the review! :) Let me know if we need more changes before merging this.
Hey @bigs, please take a look when you can. This is very close to getting merged. |
thanks so much for the hard work here. looking great.
Thanks for all the hard work here! Let me take a look before we merge.
Thanks a lot @bigs! Really appreciate it :)
Force-pushed from c0e52c7 to af79fc8
Great, great work pushing this forward. Thank you, @aarshkshah1992! Just a few things I'd consider revisiting, but nothing too major. I understand this PR has been awaiting my review for a long time, and your focus/drive may have drifted a bit. Let me know if you'd rather I make the changes myself and land this at last.
Note: this feature is only useful across restarts with a persisted peerstore! We should probably disclaim that in the godocs.
}

// schedule periodic snapshots
sproc := periodicproc.Tick(cfg.Persistence.SnapshotInterval, func(proc goprocess.Process) {
I'd really love to store the routing table before we shut down. However, because we don't control the order in which libp2p components shut down, we might end up storing the routing table after other components that are also running their shutdown logic have disconnected peers. As a result, we'd end up storing a crippled routing table.
On the other hand, I guess a degree of randomness here is good. Otherwise, if an attacker found a way to both poison the table and force a shutdown, they could permanently bork the routing table if the peer saved the poisoned one every time.
I see what you mean. So, in its current form, this code could persist an empty RT if the snapshotting goroutine fires after all peers have been dropped from the RT but the DHT hasn't been closed yet.
However, this could also prove to be a blessing in disguise, because storing an empty RT & then seeding with bootstrap peers after we restart could save us from ALWAYS storing a poisoned RT if an attacker messed up our RT and found a way to immediately shut us down.
So, let's see how the current implementation works in practice & fix it if required?
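To make the trade-off above concrete, here is a minimal sketch of the periodic-snapshot idea, assuming a hypothetical Snapshotter type with a Store method (the PR's real persistence API may differ); skipping empty tables is one way to avoid overwriting a useful snapshot when the ticker fires after peers have been dropped:

```go
package example

import (
	"log"
	"time"

	"github.com/jbenet/goprocess"
	periodicproc "github.com/jbenet/goprocess/periodic"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// Snapshotter is an assumed interface; the real persistence API may differ.
type Snapshotter interface {
	Store(rt *kbucket.RoutingTable) error
}

// schedulePeriodicSnapshots persists the routing table on a timer, skipping
// empty tables so a tick that fires after all peers were dropped does not
// overwrite a previously useful snapshot.
func schedulePeriodicSnapshots(rt *kbucket.RoutingTable, s Snapshotter, interval time.Duration) goprocess.Process {
	return periodicproc.Tick(interval, func(proc goprocess.Process) {
		if rt.Size() == 0 {
			return // nothing worth persisting; keep the previous snapshot
		}
		if err := s.Store(rt); err != nil {
			log.Printf("error persisting routing table snapshot: %s", err)
		}
	})
}
```

The returned goprocess can then be closed on DHT shutdown to stop the ticker.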
dht_bootstrap.go (outdated)
if err != nil {
	panic(err)
}
DefaultBootstrapPeerIDs = append(DefaultBootstrapPeerIDs, info.ID)
DefaultBootstrapPeerIDs should be private and derived from DefaultBootstrapPeers, once the latter is completely initialized.
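A possible shape for this suggestion, sketched under the assumption that the package's DefaultBootstrapPeers slice of multiaddrs is populated before use (the unexported variable name is illustrative, not the PR's final code):

```go
package dht

import (
	"github.com/libp2p/go-libp2p-core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// DefaultBootstrapPeers stands in for the package's existing list of bootstrap
// multiaddrs (declared here only so the sketch compiles standalone).
var DefaultBootstrapPeers []ma.Multiaddr

// defaultBootstrapPeerIDs stays unexported and is derived from
// DefaultBootstrapPeers once the latter is fully initialized.
var defaultBootstrapPeerIDs []peer.ID

func init() {
	for _, addr := range DefaultBootstrapPeers {
		info, err := peer.AddrInfoFromP2pAddr(addr)
		if err != nil {
			panic(err) // the addresses are hard-coded, so this is a programmer error
		}
		defaultBootstrapPeerIDs = append(defaultBootstrapPeerIDs, info.ID)
	}
}
```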
persist/seeder.go (outdated)
defer cancel()

// start dialing
semaphore := make(chan struct{}, NSimultaneousDial)
This kind of throttling logic already exists in the dialer. Any motivation to put it here too?
@raulk I see. Please can you point me to the relevant code in the dialer? Also, if this is taken care of by the dialer, please can you explain what you meant in your earlier comment by "We should stagger these lookups, instead of launching them all at once"?
- Dialer reference: Sure, have a look at: https://github.com/libp2p/go-libp2p-swarm/blob/master/limiter.go.
- Staggering clarification: I responded in Feature/correct bootstrapping #384 directly.
I had a look at the dial limiter & understand that we queue all TCP dial requests once the file descriptor limit is hit. However, my concern is that we will eat up a huge number of fds for ourselves if we have too many candidates in the snapshot & do not have some form of rate limiting here. This in turn will slow down dial requests from other parts of the application. Let me know what you think.
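For reference, a rough sketch of the kind of bounded-concurrency dialing being discussed (the function and parameter names are illustrative, not the PR's exact code); the buffered channel caps how many verification dials can hold file descriptors at once:

```go
package example

import (
	"context"
	"sync"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
)

// dialCandidates returns the subset of candidates that could be dialled,
// allowing at most maxConcurrentDials dials to be in flight at any time.
func dialCandidates(ctx context.Context, h host.Host, candidates []peer.ID, maxConcurrentDials int) []peer.ID {
	var (
		mu        sync.Mutex
		reachable []peer.ID
		wg        sync.WaitGroup
		semaphore = make(chan struct{}, maxConcurrentDials)
	)
	for _, p := range candidates {
		wg.Add(1)
		go func(p peer.ID) {
			defer wg.Done()
			semaphore <- struct{}{}        // acquire a dial slot
			defer func() { <-semaphore }() // release it when done
			if _, err := h.Network().DialPeer(ctx, p); err == nil {
				mu.Lock()
				reachable = append(reachable, p)
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return reachable
}
```

Whether this belongs in the seeder or should be left entirely to the swarm's dial limiter is exactly the open question in this thread.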
persist/seeder.go (outdated)
}

// attempts to dial to a given peer to verify it's available
dialFn := func(ctx context.Context, p peer.ID, res chan<- result) {
I'm having second thoughts about putting this dialing logic here. It's probable that all seeders would want to verify they can actually add a peer before doing so. The dialing logic needs to be somewhere common.
I wonder if the seeder should be dumb, poll-driven and not perform any dials. We can move the dialing logic to the DHT, and pass in the current state of the routing table for the seeder to use when making decisions about which candidates to return next.
type Seeder interface {
	// Next returns a slice of peers to attempt to add next, based on the current
	// state of the routing table. Returning a nil slice signals that the seeder
	// is done proposing peers.
	Next(rt *kbucket.RoutingTable) []peer.ID
}
I don't see the value in making the seeder poll-driven & moving the burden of actually seeding the RT onto the caller. Please can you explain why that'd be a good thing? Also, why should we remove the candidate/fallback peers from the interface? The set of fallback peers is already configurable, which means the caller can pass in peers obtained from any source.
However, I do agree that the dialing logic is pretty generic & can be pulled out. We can put stuff like that in a helpers package that makes it easy for users to construct their own seeder implementations.
In a nutshell, I'm worried about putting dialling functionality in the Seeder. It simply doesn't belong there and it complicates the seeder, which is just supposed to seed, not dial. I see this being a future footgun, as it's essentially leaking an abstraction and turning into "spooky action at a distance". I did have it like this in my WIP work, so I totally understand that you followed suit, but that was mostly a placeholder -- I was supposed to revisit that aspect at some point.
Solutions I can think of:
1. The Seeder acts like a "proposer", and we call it iteratively until we exhaust the candidates and the Seeder returns no more proposals. Imagine we have peers [A-E] available:
   - The Seeder implements Next(rt *RoutingTable, candidates []peer.ID, fallback []peer.ID) (proposed []peer.ID).
   - On the first iteration, we call Next() with an empty routing table, all candidates ([A-E]), and the fallback peers.
   - The Seeder returns A and B.
   - The DHT bootstrap logic verifies that A works and that B is undiallable. It adds A to the routing table and drops B from the candidate set.
   - We call Next() with the routing table containing A, and candidates [C-E] (having dropped B).
   - This goes on until the candidate set is exhausted and the Seeder returns nil. (A rough sketch of this loop appears below.)
2. Pass in a "validate function" to the seeder, owned by the DHT. This func would perform the dial. The Seeder calls this function and can therefore verify whether a peer is diallable without owning the dialing logic.
I still prefer 1. WDYT?
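To make option 1 concrete, here is a rough, non-authoritative sketch of the DHT-side loop it implies (the Proposer interface shape, the kbucket Update call, and the helper names are assumptions, not the PR's final API): keep calling Next until it returns nothing, verify each proposal with a dial, admit diallable peers, and prune tried candidates.

```go
package example

import (
	"context"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// Proposer is the hypothetical poll-driven seeder shape from option 1.
type Proposer interface {
	Next(rt *kbucket.RoutingTable, candidates, fallback []peer.ID) []peer.ID
}

// seedRoutingTable drives the proposer: dial what it proposes, admit diallable
// peers into the routing table, drop tried candidates, and repeat until done.
func seedRoutingTable(ctx context.Context, h host.Host, rt *kbucket.RoutingTable,
	prop Proposer, candidates, fallback []peer.ID) {
	for {
		proposed := prop.Next(rt, candidates, fallback)
		if len(proposed) == 0 {
			return // nil (or empty) slice: the proposer has nothing left to suggest
		}
		tried := make(map[peer.ID]struct{}, len(proposed))
		for _, p := range proposed {
			if _, err := h.Network().DialPeer(ctx, p); err == nil {
				rt.Update(p) // assumed kbucket API: admit the verified peer
			}
			tried[p] = struct{}{} // drop from the candidate set either way
		}
		var remaining []peer.ID
		for _, p := range candidates {
			if _, ok := tried[p]; !ok {
				remaining = append(remaining, p)
			}
		}
		candidates = remaining
	}
}
```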
@raulk I implemented 1 to get a feel for what you were saying & it is indeed a good idea. Now that we have decoupled 'proposing' seeders from 'dialling' & 'seeding' the RT, users can simply plug in custom 'proposal' strategies (one strategy could be preferring candidates in ascending order of XOR distance from self, for example) & the 'dialling'/'seeding' will just work out of the box. Thanks for this!
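As an example of the flexibility this buys, a toy proposer satisfying the Proposer shape sketched earlier might look like the following (illustrative only; the PR's actual random seed proposer is more involved):

```go
package example

import (
	"math/rand"

	"github.com/libp2p/go-libp2p-core/peer"
	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// randomProposer proposes up to batchSize random candidates per call and falls
// back to the fallback set only once, when the candidates are exhausted and the
// routing table is still empty. batchSize must be > 0.
type randomProposer struct {
	batchSize     int
	triedFallback bool
}

func (r *randomProposer) Next(rt *kbucket.RoutingTable, candidates, fallback []peer.ID) []peer.ID {
	if len(candidates) == 0 {
		if rt.Size() == 0 && !r.triedFallback {
			r.triedFallback = true
			return fallback // last resort: well-known bootstrap peers
		}
		return nil // done proposing
	}
	shuffled := append([]peer.ID(nil), candidates...) // copy before shuffling
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	if len(shuffled) > r.batchSize {
		shuffled = shuffled[:r.batchSize]
	}
	return shuffled
}
```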
Force-pushed from fe3ea46 to 561f9d0
2. Rebased PR & fixed conflicts
2. addressed raul's comments 3. decoupled seeding from seed proposing
Force-pushed from 2423486 to c8efcec
Force-pushed from c8efcec to bdeb350
Force-pushed from 50678b4 to a6cf077
@aschmahmann Thanks for the really in-depth review. The following changes have been made:
Open questions:
@raulk Please can you take a look at the interface changes & the changes in the random seed proposer & let us know what you think?
Hey @raulk, would you be able to take a look at this?
This PR has accumulated a ton of history and it's hard to process/review. I'm going to close it and open a new one preserving the commit history, as a clean checkpoint.
Based on PR #315 by @raulk
Please refer to issues:
#254
#295