-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use do-not-send-first-blocks extension for restarts #257
Conversation
Codecov Report
@@ Coverage Diff @@
## master #257 +/- ##
==========================================
- Coverage 68.00% 67.83% -0.18%
==========================================
Files 24 24
Lines 3069 3146 +77
==========================================
+ Hits 2087 2134 +47
- Misses 627 652 +25
- Partials 355 360 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the rationale for the blocksReceivedIndex? Why not just add a value to the main data transfer state? It's just a number -- the only reason we store received cids list seperately is they are very large
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment about blocks received index.
Comment about tracking peers outside the go-graphsync extension and tracking peers outside the network impl is NON-BLOCKING.
dataSender peer.ID, | ||
channelID ChannelID, | ||
root ipld.Link, | ||
stor ipld.Node, | ||
doNotSendCids []cid.Cid, | ||
msg Message) error | ||
channel ChannelState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sidebar +++1 for this change generally. I've never liked the specificity of doNodeSendCids. Technically, we can probably drop a bunch of other parameters if we do this (I believe root/selector/sender/channelID can all be extracted from channel state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that wouldn't quite work for the code here just yet (you're counting on channel == nil being evidence of a not-restart), but a great thing for future refactor!
} | ||
|
||
// Get the peer's protocol version | ||
protocol, err := t.peerProtocol.Protocol(ctx, p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I want to point out something that's a potential reason to augment the Graphsync extension as well: in the case of a pull request, you may still not have this cached even for a restart, as the only communication may be over graphsync. I don't know whether this matters a lot, but the idea would be to maintain your protocol list outside the network implementation, so you can update it for an incoming GS request/response as well as well. The only penalty is a new libp2p message I guess, so not the worst thing but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, that may save us an extra round trip.
If I understand correctly, if we add another extension protocol, then whenever we send a message, we also include an encoded payload for that extension. Is that correct? So for example if we have an extension for go-data-transfer-v1.2, the message will look like:
<message data>
extensions:
fil/data-transfer/1.2: <encoded data>
fil/data-transfer/1.1: <encoded data>
fil/data-transfer: <encoded data>
Maybe easier to explain with a screenshot from the debugger:
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's an additional copy of the exact same mesage... so arguably this ones a bit a bummer, especially since there's no difference in the message format.... it's literally just the extension name.
BTW we can probably deprecate 1_0 at this point -- hasn't existed since pre-mainnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also Aarsh pointed out that for a pull request it's the data receiver who sends the extension, so the data receiver is not going to know the protocol of the data sender.
So I think it makes most sense not to use an additional extension for now.
You're right it makes more sense to store it on the state. I've pushed a commit to do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dirkmc First set of comments .
@@ -241,8 +241,27 @@ func (c *Channels) DataQueued(chid datatransfer.ChannelID, k cid.Cid, delta uint | |||
} | |||
|
|||
// Returns true if this is the first time the block has been received | |||
func (c *Channels) DataReceived(chid datatransfer.ChannelID, k cid.Cid, delta uint64) (bool, error) { | |||
return c.fireProgressEvent(chid, datatransfer.DataReceived, datatransfer.DataReceivedProgress, k, delta) | |||
func (c *Channels) DataReceived(chid datatransfer.ChannelID, k cid.Cid, delta uint64, index int64) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, Graphsync will keep track of the index/offset
for each block in the DAG and pass that to the Data Receieved callback ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a restart -> GS will fire these notifications for blocks we have already received, right ? So, we'll re-update the offset for each block all the way to the last received block and therefore the latest offset, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 👍
// version of data-transfer (supports do-not-send-first-blocks extension) | ||
ProtocolDataTransfer1_2 protocol.ID = "/fil/datatransfer/1.2.0" | ||
|
||
// ProtocolDataTransfer1_2 is the protocol identifier for the version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ProtocolDataTransfer1_2 is the protocol identifier for the version | |
// ProtocolDataTransfer1_1 is the protocol identifier for the version |
return nil | ||
} | ||
|
||
func (impl *libp2pDataTransferNetwork) Protocol(ctx context.Context, id peer.ID) (protocol.ID, error) { | ||
// Check the cache for the peer's protocol version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dirkmc You don't need this cache as libp2p maintains it's own cache. You can use impl.host.Peerstore().FirstSupportedProtocol()
for the cache lookup and then go do the open stream thing it it doesen't have anything for you.
Fixes #256