Extract dagservice and friends from go-ipfs #8

Merged: 21 commits merged into ipfs:master from dag-service on Dec 11, 2017

Conversation

@Stebalien Stebalien (Member) commented Jun 23, 2017

This is a WIP/RFC attempt at extracting DAGService from go-ipfs. Please see the TODOs in the source files as there's a lot to discuss here.

However, the first question is: Is this even the right place to put this? I could create a go-ipfs-dag repo.

@Stebalien
Member Author

Context: It turns out I will need this to get PubSub+IPLD working. IPLD without the DAG is kind of useless.

@whyrusleeping it appears I can't assign reviewers on this repo so I'm going to informally throw this on your plate 🏋️‍♂️.

@kevina You also have a fair amount of experience with this subsystem.

merkledag.go Outdated
//
// Why *not* do this? We might decide to store a light-weight DAG of links
// without actually storing the data. I don't really find that to be a
// convincing argument.
@Kubuxu Kubuxu (Member) Jun 23, 2017

The plan with this interface is to have a link cache / graph DB at some point in the future; fetching objects from disk and parsing them just to see the links adds more than minor overhead.

Member Author

Ah. I assume this interface needs to be updated for IPLD? Non-IPFS IPLD nodes don't fill out the Name and Size fields of the Link structure (as a matter of fact, they can't accurately fill out the Size field).

I assume such an interface would want to return an array of:

type Link struct {
	Path []string
	Cid  *cid.Cid
}

(for pathing through "light-weight" objects)

Member

The LinkService interface is useful for optimizing graph traversals. Large IPLD graphs might cache links for a given node to make resolving down a path much much faster.

I agree that there might be a cleaner way, but your proposal above involves getting a node and then calling GetLinks on it. The problem there is that you have to spend time and memory unmarshaling the bytes from disk into the in-memory object, and then call another function on it that returns the link array.

Member Author

Large IPLD graphs might cache links for a given node to make resolving down a path much much faster.

Currently (I hope), this is just used for pinning/GC. The only Node type that fills in the Name field of Link structs is PBNode, so this can't be used for resolving paths (but it's obviously still useful for GC/pinning).

An alternative is to use upcasting/specialization (or whatever Go calls it) to simplify DAGService implementations:

// NodeGetters can optionally implement this interface to make
// finding linked objects faster.
type LinkGetter interface {
	NodeGetter
	GetLinks(ctx context.Context, c *cid.Cid) ([]*Link, error)
}

func GetLinks(ctx context.Context, ng NodeGetter, c *cid.Cid) ([]*Link, error) {
	if c.Type() == cid.Raw {
		// Raw nodes have no links, so skip fetching entirely.
		return nil, nil
	}
	if gl, ok := ng.(LinkGetter); ok {
		return gl.GetLinks(ctx, c)
	}
	node, err := ng.Get(ctx, c)
	if err != nil {
		return nil, err
	}
	return node.Links(), nil
}

However, I don't know how prevalent this pattern is in the go-ipfs codebase and whether or not it's worth introducing.
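
For context, a caller of that helper might look roughly like the following sketch (illustrative only, not code from this PR; it assumes the GetLinks helper above and a Link type whose Cid field holds the child CID):

// walkLinks visits every CID reachable from root by links alone. Because it
// goes through the GetLinks helper, it works with any NodeGetter and gets the
// fast path for free when the getter also implements the link-caching interface.
func walkLinks(ctx context.Context, ng NodeGetter, root *cid.Cid, visit func(*cid.Cid)) error {
	links, err := GetLinks(ctx, ng, root)
	if err != nil {
		return err
	}
	for _, l := range links {
		visit(l.Cid)
		if err := walkLinks(ctx, ng, l.Cid, visit); err != nil {
			return err
		}
	}
	return nil
}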

Member

Yeah, using the upcasting/specialization pattern here sounds good to me. I also believe you're right about this only being used for GC and pinning.

@Stebalien Stebalien mentioned this pull request Jun 23, 2017
promise.go Outdated
// 1. NodeGetter is a naming conflict.
// 2. It's a promise...

// TODO: Should this even be an interface? It seems like a simple struct would
Member

This could probably be a simple struct as opposed to an interface. I'm not sure I can see a scenario where we would want to implement this differently.
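
For reference, a minimal sketch of what a struct-based promise could look like, assuming a single writer that fulfils it exactly once and that Node is this package's node interface (the Send/Fail/Get names are illustrative, not a settled API):

// NodePromise is a placeholder for a Node that is still being fetched.
// One writer calls Send or Fail exactly once; readers block in Get.
type NodePromise struct {
	done chan struct{} // closed once the value or error has been set
	node Node
	err  error
}

func NewNodePromise() *NodePromise {
	return &NodePromise{done: make(chan struct{})}
}

// Send fulfils the promise with a node. Call it at most once.
func (p *NodePromise) Send(nd Node) {
	p.node = nd
	close(p.done)
}

// Fail rejects the promise with an error. Call it at most once.
func (p *NodePromise) Fail(err error) {
	p.err = err
	close(p.done)
}

// Get blocks until the promise is fulfilled or the context is cancelled.
func (p *NodePromise) Get(ctx context.Context) (Node, error) {
	select {
	case <-p.done:
		return p.node, p.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

This shape gives single-writer, any-number-of-readers semantics, which is exactly the guarantee questioned a little further down.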

Member Author

Got it.

Member Author

What are the thread-safety guarantees here? It looks like single-writer/single-reader. Is that the case?

merkledag.go Outdated
Remove(Node) error

// TODO: This is returning them in-order?? Why not just use []NodePromise?
// Maybe add a couple of helpers for getting them in-order and as-available?
Member

This is returned as a channel to make its usage in caller contexts much simpler. As a channel, we can just range over it to do a 'for each node' type thing. With an array of promises, we would have to range over the array and then call Get on each one inside the loop, which is slightly more complicated. Maybe we can change this in the future, but for now I think it's fine.
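
A hedged illustration of that difference, assuming GetMany returns a channel of results (called NodeOption here, each carrying either a node or an error) and the promise type sketched earlier; the helper names are illustrative:

// Channel form: the caller just ranges; nodes arrive as they become available.
func consumeChannel(ctx context.Context, ds DAGService, cids []*cid.Cid, process func(Node)) error {
	for opt := range ds.GetMany(ctx, cids) {
		if opt.Err != nil {
			return opt.Err
		}
		process(opt.Node)
	}
	return nil
}

// Promise-slice form: the caller must call Get on each promise inside the
// loop, and choosing "in order" vs "as available" is left to the caller.
func consumePromises(ctx context.Context, promises []*NodePromise, process func(Node)) error {
	for _, p := range promises {
		nd, err := p.Get(ctx)
		if err != nil {
			return err
		}
		process(nd)
	}
	return nil
}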

Member Author

Got it, it's just a bit inconsistent. I'll leave a shorter todo.

merkledag.go Outdated

// An interface for batch-adding nodes to a DAG.

// TODO: Is this really the *right* level to do this at?
Member

The reason for this abstraction is to avoid forcing callers to think about batching. The batch can determine the right time to flush on its own, trading memory use against batch size; done differently, each caller would have to reimplement similar logic, which increases the chance of getting it wrong.

Member Author

I was thinking of defining something like a `func Batching(dag DAGService) Batch` helper, where Batch is a concrete type that calls AddMany as needed (see the sketch after the pro/con below).

Pro: You don't need to implement a custom batching strategy per DAGService.
Con: You can't implement a custom batching strategy for individual DAGServices.
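
A rough sketch of the proposed helper, assuming the AddMany signature currently in the PR and that Node exposes RawData(); the flush threshold and names are illustrative:

// Batch buffers nodes and flushes them to the wrapped DAGService via AddMany
// once enough data has accumulated. Callers must call Commit at the end.
type Batch struct {
	ds      DAGService
	nodes   []Node
	size    int
	maxSize int
}

// Batching wraps any DAGService in the batching helper; no per-implementation
// batching strategy is required.
func Batching(ds DAGService) *Batch {
	return &Batch{ds: ds, maxSize: 8 << 20} // ~8 MiB buffer (illustrative)
}

// Add buffers a node, flushing when the buffer grows past the threshold.
func (b *Batch) Add(nd Node) error {
	b.nodes = append(b.nodes, nd)
	b.size += len(nd.RawData())
	if b.size >= b.maxSize {
		return b.flush()
	}
	return nil
}

// Commit flushes anything still buffered.
func (b *Batch) Commit() error {
	return b.flush()
}

func (b *Batch) flush() error {
	if len(b.nodes) == 0 {
		return nil
	}
	_, err := b.ds.AddMany(b.nodes) // returned CIDs are ignored here
	b.nodes = nil
	b.size = 0
	return err
}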

Member

I think that sounds good. Removing Batch as a method on the dagservice seems fine to me.

Thoughts @Kubuxu ?

@Stebalien
Member Author

Stebalien commented Jun 27, 2017

I've addressed the comments and done a little cleanup.

To discuss

  • NodePromise thread safety guarantees.
  • LinkGetter versus LinkService.

@kevina
Contributor

kevina commented Jun 27, 2017

Is there a PR in go-ipfs using this new interface?

There are several things that concern me. In particular, GetOfflineLinkService is there for a reason (which I forget), but if things can be made to work without it then I don't object.

I would not approve of this PR until there is a corresponding PR in go-ipfs that passes all the tests.

I don't have a strong opinion either way on LinkGetter vs LinkService, as long as the new interface works with go-ipfs.

@whyrusleeping
Member

@kevina the suggestion for replacement is here: ipfs/kubo#4009

@Stebalien
Member Author

@kevina, @whyrusleeping

the suggestion for replacement is here: ipfs/kubo#4009

Actually, for now, I've added an OfflineNodeGetter method to the NodeGetter interface that should serve the same purpose. Instead of passing a LinkService to such systems, one would pass a NodeGetter and call ng.OfflineNodeGetter() as needed. I'd like to eventually remove this method in favor of the context-based system described in that issue, but that will take some design and quite a bit of refactoring.
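
For readers following along, the shape being described is roughly this (a sketch; the exact method set is a guess, not the final interface):

// NodeGetter fetches nodes by CID.
type NodeGetter interface {
	// Get retrieves a node, fetching it from the network if necessary.
	Get(ctx context.Context, c *cid.Cid) (Node, error)

	// OfflineNodeGetter returns a variant of this getter that never goes
	// to the network and only returns locally available nodes.
	OfflineNodeGetter() NodeGetter
}

A GC or pinning subsystem would then hold a NodeGetter and call ng.OfflineNodeGetter() before traversing, instead of being handed a separate LinkService.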

Is there a PR in go-ipfs using this new interface?

I'd rather not spend the time refactoring IPFS until we've agreed on these interfaces.

I would not approve of this PR until there is a corresponding PR in go-ipfs that passes all the tests.

This PR (intentionally) doesn't actually touch any code that IPFS currently uses. The idea is to extract some IPLD interfaces/code from the go-ipfs repo into this repo and then later update the go-ipfs repo to depend on them (removing the duplicate code from the go-ipfs repo in the process).

@kevina
Contributor

kevina commented Jun 27, 2017

I'd rather not spend the time refactoring IPFS until we've agreed on these interfaces.

I will have a hard time reviewing the interface without seeing it in use. There are minor details that could be missed.

I could probably do the actual refactoring in a few hours. The time-consuming part for me would be the gx-publish step.

@whyrusleeping thoughts?

@Stebalien
Member Author

I'll take that as "this looks fine, but my review can't progress without seeing this in use". I'll go ahead and implement that PR; I just didn't want to spend time on it without getting any feedback here (as it is, that would have been wasted effort, since the code here has changed over the course of this PR).

Stebalien added a commit to ipfs/kubo that referenced this pull request Jul 12, 2017
**DO NOT MERGE!**

Refactor IPFS to use: ipfs/go-ipld-format#8

Context: IPLD isn't currently very usable from libp2p because interfaces like
the DAGService are bundled into go-ipfs. This PR extracts these interfaces and
some of the related helper functions.

License: MIT
Signed-off-by: Steven Allen <steven@stebalien.com>
merkledag.go Outdated
type LinkGetter interface {
NodeGetter
// TODO(ipfs/go-ipld-format#9): This should return []*cid.Cid
GetLinks(ctx context.Context, nd *cid.Cid) ([]*Link, error)
Member

Another important reason this interface exists is that raw nodes can't have links, so we don't have to fetch them to enumerate their links. This optimization was placed in the LinkService.

Member

But I see you placed that short-circuit in the helper below.

Member Author

Yes; the idea is to avoid requiring that all DAGServices implement this method (even if they don't cache links).

@Stebalien
Member Author

Stebalien commented Aug 17, 2017

Plan: #23

@kevina
Contributor

kevina commented Sep 1, 2017

@Stebalien the build is now failing; it looks like you may have forgotten to commit something.

@Stebalien
Member Author

Sorry, I'm concurrently working on the next PR (refactoring the Node API).

@whyrusleeping whyrusleeping (Member) left a comment

LGTM. Is there anything else that needs to be done here?

@Stebalien
Member Author

I have a second PR that refactors the node API but that's blocked on finalizing the IPLD spec.

@whyrusleeping
Member

Link to the PR? And why is it blocked on the spec? What are the open questions (and where can we answer them)?

@Stebalien
Member Author

I haven't pushed it yet. I was waiting on answers to the questions: Does IPLD have primitive types? If so, which? (#25)

(This is that thing I keep on trying to schedule a discussion about).

@whyrusleeping
Member

whyrusleeping commented Oct 4, 2017 via email

@Stebalien Stebalien changed the title from "[WIP] [RFC] extract dagservice and friends from go-ipfs" to "Extract dagservice and friends from go-ipfs" on Oct 16, 2017
(ipfs/kubo#4296)

1. Modern storage devices (i.e., SSDs) tend to be highly parallel.
2. Allows us to read and write at the same time (avoids pausing while flushing).

fixes ipfs/kubo#898 (comment)
@Stebalien
Member Author

@whyrusleeping You can always add that to the copy of DAGService in go-ipfs if you want to get that integration done faster.

However, we should probably just move forward with this anyways (and leave fixing the node API to a future PR).

I have a few more questions open for discussion but we can punt on those if you want.

@@ -0,0 +1,113 @@
package format
Member Author

This is a lot of code for an interface package. Unfortunately, the interface needs it (unless we decide to introduce a Batch interface).

merkledag.go Outdated
NodeGetter

// Add adds a node to this DAG.
Add(Node) (*cid.Cid, error)
Member Author

Does this really need to return the Cid? The caller can also just call node.Cid().

Member

True

Member

I agree

merkledag.go Outdated
//
// Consider using NewBatch instead of calling this directly if you need
// to add an unbounded number of nodes to avoid buffering too much.
AddMany([]Node) ([]*cid.Cid, error)
Member Author

Same here. Worse, this will allocate something that we're likely to ignore (that's why I bothered bringing it up).

Member

Probably fine to not have it return CIDs, as you say, though there might be cases where we need an array of CIDs directly after calling AddMany, and that will kind of suck.

Member Author

IMO, we can just create a Cids([]Node) []*cid.Cid function for that.
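
Something like this minimal sketch would do, assuming Node exposes a Cid() accessor:

// Cids collects the CIDs of a slice of nodes, for the callers that really do
// need the CID list right after an AddMany.
func Cids(nds []Node) []*cid.Cid {
	out := make([]*cid.Cid, 0, len(nds))
	for _, nd := range nds {
		out = append(out, nd.Cid())
	}
	return out
}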

Member

Yeah, that SGTM

Member

I agree also.

merkledag.go Outdated
// Remove removes a node from this DAG.
//
// If the node is not in this DAG, Remove returns ErrNotFound.
Remove(*cid.Cid) error
Member Author

Do we also want RemoveMany(...) (for, e.g., dag-level GC)?

Member

We can probably punt on this

Member

IMO yes.

}

// DAGService is an IPFS Merkle DAG service.
type DAGService interface {
Member Author

Should all of these methods take contexts? Currently, they don't because they never hit the network. However, I assume we'll eventually want remote DAGs.

Member

We might want to add it now.

@Kubuxu Kubuxu self-requested a review October 17, 2017 17:18
@Stebalien
Member Author

Actually, I'm just going to punt on all of those questions and merge this if I don't get any objections within the next few hours. 🚀

@Kubuxu
Member

Kubuxu commented Oct 17, 2017

I wanted to give it a review tomorrow morning.

@Stebalien
Member Author

@Kubuxu got it. I'll hold off. FYI, I've decided to punt on fixing the Node API till later so we can stop blocking this on "one more question".

@Kubuxu Kubuxu (Member) left a comment

One comment, and also I would love to see those interface changes implemented.

// Node is the base interface all IPLD nodes must implement.
//
// Nodes are **Immutable** and all methods defined on the interface are
// **Thread Safe**.
Member

If Nodes are immutable, what is the reason for the Copy method?

Member Author

None. I was planning on fixing that in this PR but have punted on fixing Nodes till a future one.

1. Add is already idempotent. This makes remove match.
2. Generally, all we care about is that the node no longer exists in the DAG. If
two callers remove a node at the same time, they should both succeed. Currently,
we *ignore* the result of Remove in go-ipfs. IMO, it would be better to let
errors be *errors* and only return an error if something goes wrong.
3. This can be significantly faster. It allows us to batch/queue removes (as
long as we guarantee that they'll eventually happen).
4. This matches how most databases/key-value stores operate.

An alternative would be to return `(deleted bool, err error)` but then we don't
get the speedup.
The caller can just call `node.Cid()` and returning CIDs from `AddMany` requires
allocation.
We'll need these for slower/remote datastores.
I considered (well, implemented then threw it away) allowing contexts on all
calls to Batch (Add, Commit, etc). However, really, you should treat a batch
as a single large "operation".

I also went down the road of generalizing batches to sessions. However, it
became immediately obvious that permitting add *and* remove *and* fetch would
require a lot of bookkeeping and that you'd lose a lot of performance. So, we'll
do that separately.
@Stebalien
Member Author

@Kubuxu I've decided to just go ahead and address all your comments/the todos now.

  • Added contexts.
  • No longer returning CIDs from Add.
  • Added RemoveMany.

Also, improved Batch by adding context support (we'll need that to use batch from the API).

However, this is starting to become an ever-growing PR, so I should probably stop now... Final review?
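
Taken together, the changes listed above would leave the interface looking roughly like this (a sketch based on this comment, not necessarily the exact merged form):

type DAGService interface {
	NodeGetter

	// Add adds a node to this DAG. It no longer returns the CID; callers
	// that need it can use node.Cid() (or the Cids helper for AddMany).
	Add(ctx context.Context, nd Node) error
	AddMany(ctx context.Context, nds []Node) error

	// Remove removes a node from this DAG. Removing a node that is not
	// present is not an error (Remove is idempotent, matching Add).
	Remove(ctx context.Context, c *cid.Cid) error
	RemoveMany(ctx context.Context, cids []*cid.Cid) error
}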

@Kubuxu Kubuxu self-requested a review November 16, 2017 21:20
@whyrusleeping
Member

If we're on the breaking-interfaces train already, I'd love to get a Load(ctx, cid, &thing) method on the DAGService... It's better for memory usage and it makes dealing with CBOR things much easier.

Also fine having that be in a follow-up too though
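
Roughly the shape being asked for, sketched here for clarity (this method is not part of this PR; the name and signature are illustrative):

// Loader decodes the block behind c directly into out (for example a Go
// struct backing a CBOR node), skipping the intermediate generic Node.
//
// Usage sketch:
//	var dir Directory
//	err := ds.Load(ctx, c, &dir)
type Loader interface {
	Load(ctx context.Context, c *cid.Cid, out interface{}) error
}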

@Stebalien
Member Author

@whyrusleeping we could also just lazily deserialize nodes. Are you worried about the intermediate buffer?

@Stebalien
Member Author

@Kubuxu IIRC, you wanted to do a final review.

@kevina
Contributor

kevina commented Dec 9, 2017

This PR sort of interferes with my effort to create better error messages, which I believe will require some API changes. I would like to work through ipfs/kubo#4468 first, as it is easier to modify things when they are all in the same package. But if this is blocking something, I don't want to hold anything up, especially considering I just created that PR yesterday.

@Stebalien
Member Author

@kevina I'd like to push this through ASAP. @whyrusleeping wants some of these changes and this has been sitting here for months.

@Stebalien Stebalien merged commit 9d8c846 into ipfs:master Dec 11, 2017
@ghost ghost removed the status/in-progress In progress label Dec 11, 2017
@Stebalien Stebalien deleted the dag-service branch December 11, 2017 18:13