Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[community contribution] Remove UnixfsNode from the trickle builder #5135

Closed
schomatis opened this issue Jun 18, 2018 · 23 comments · Fixed by ipfs/go-unixfs#54
Closed

[community contribution] Remove UnixfsNode from the trickle builder #5135

schomatis opened this issue Jun 18, 2018 · 23 comments · Fixed by ipfs/go-unixfs#54
Labels
exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P3 Low: Not priority right now topic/files Topic files topic/technical debt Topic technical debt

Comments

@schomatis
Copy link
Contributor

schomatis commented Jun 18, 2018

Community contribution opportunity

Area: IPFS files.
Estimated time: 10-20 hours (depends on previous knowledge of the Files API).
Difficulty: Moderate, this is not a good first issue, but the problem has already been solved in another part of the code base, so it's not an open-ended question, it has a clear template to apply.
What you can learn: interaction with files in IPFS, how is a DAG constructed, how is a file formatted inside the DAG, DAG and UnixFS layers.
Mentor: @schomatis (meaning I commit to guide the contributor throughout the entire resolution of this issue answering any question he/she may have).

@schomatis schomatis added help wanted Seeking public contribution on this issue topic/files Topic files exp/expert Having worked on the specific codebase is important topic/technical debt Topic technical debt P3 Low: Not priority right now labels Jun 18, 2018
@schomatis
Copy link
Contributor Author

The UnixfsNode structure is being removed from the balanced builder in #5118 (see #5106 for background). The same task can be done for the trickle DAG builder to be able to completely eliminate the UnixfsNode structure from the code base.

This isn't a trivial task but has already been solved for the balanced case, and most of the necessary functions and structures have already been created, they just need to be applied in the trickle case replacing the analogous UnixfsNode-related functions. The most challenging part is understanding the implementation differences between the balanced and trickle builders (conceptually they are very similar) to check if there is a need to include some more functions to completely replace UnixfsNode, especially in the append family of functions which are not present in the balanced case.


I'm not taking this on myself because it does take time and careful analysis to get this right and there's much more to be done in the Files API milestone; the trickle builder is used in only a minor part of the code, so there's no rush here. Also not adding it to the milestone as its main objective is to clarify how Unix files work in IPFS and that is accomplished through the balanced case, the trickle won't add much more new information to the user. Personally I've learned (and suffered) a lot from studying this part of the code base so I think this is a great opportunity for a (relatively) new IPFS developer who wants to dig deeper into the code (and hopefully the suffering part will be diminished with the work done on the milestone).

@schomatis schomatis changed the title Remove UnixfsNode from the trickle builder [community contribution] Remove UnixfsNode from the trickle builder Jun 18, 2018
@kevina
Copy link
Contributor

kevina commented Jun 18, 2018

I have not been following this discussion 100% but this is something I could probably do if it wanted.

@whyrusleeping what are your thoughts on this (if they are expressed elsewhere I apologize, like I said I have not really been following this discussion very carefully).

@schomatis
Copy link
Contributor Author

Hey @kevina, thanks for volunteering but I created this issue as an opportunity to provide some training to a new IPFS developer (you actually have much more experience than me on this part of the code) so if you don't mind I'd rather have someone more inexperienced take this.

@kevina
Copy link
Contributor

kevina commented Jun 18, 2018

@schomatis okay, I am always looking for straightforward tasks to do that don't require a lot of back and forth with other developers and this looks like it was a good fit which is why I volunteered, but I can leave it for someone else.

@schomatis
Copy link
Contributor Author

I am always looking for straightforward tasks to do that don't require a lot of back and forth with other developers

Good to know, I could use your help with some of the issues of the Files API milestone that actually aren't beginner friendly and would fit someone with your experience.

@kevina
Copy link
Contributor

kevina commented Jun 18, 2018

@schomatis most of those issues are labeled as documentation, I can help with writing what I know, but I am not the ultimate authority on it for that I think only @whyrusleeping and possible Juan or one of the other earlier developers that worked on that code.

@bjzhang
Copy link

bjzhang commented Jul 23, 2018

Hi @schomatis I read the above comments and patches. And I am interested in this work.

@schomatis
Copy link
Contributor Author

Hey @bjzhang, that's great! The first steps would be to:

  1. Take a look at the balanced builder to understand the format of the file DAG.
  2. Take a look at the previous implementation of the builder to understand how was the UnixfsNode used there (which is the structure that should be replaced in the trickle builder).
  3. Study the trickle builder to first understand the topology differences of a trickle file DAG compared to the balanced one.
  4. Draft a plan/issue/PR of how we could use the importer: remove UnixfsNode from the balanced builder #5118 PR to apply a similar mechanism in the trickle builder, especially with the append functions that we're not present in the balanced builder. (Be warned, I am going to bother you about documenting the code as much as possible :))

(As you said you may have already performed some of these steps.) Use this issue to ask any questions about the code (that will be a great feedback on how we could make it more accessible to the new IPFS developer) or feel free to open new issues/PRs if you think the complexity of the issue deserves it.

@bjzhang
Copy link

bjzhang commented Jul 23, 2018

Hi, @schomatis Thanks for above guidelines. I read the balanced builder, #5118 and #5106. And I am reading the code in trickle builder.

IIUC, I should make use of db. NewFSNodeOverDag() and db.NewLeafDataNode() to avoid use UnixfsNode. And fillTrickleRec and others functions need to rewrite to replace UnixfsNode with FSNodeOverDag.

Given that there are several functions need to be rewrite. I will start from Layout and fillTrickleRec.

@schomatis
Copy link
Contributor Author

IIUC, I should make use of db. NewFSNodeOverDag() and db.NewLeafDataNode() to avoid use UnixfsNode. And fillTrickleRec and others functions need to rewrite to replace UnixfsNode with FSNodeOverDag.

Yes, exactly! You understood it perfectly.

Given that there are several functions need to be rewrite. I will start from Layout and fillTrickleRec.

Sounds about right. I would strongly advise you not to write a huge commit (as I did in #5118). The advantage of this scenario is that at this point the trickle builder is the only consumer of the UnixfsNode structure so now you can progressively dismember it as you see fit. (The same criteria applies to the DagBuilderHelper structure.) For example, you could replace the ProtoNode and FSNode attributes with the new FSNodeOverDag structure and adapt the AddChild function (and related) to be able to process it (FSNodeOverDag is ideal to act as an internal node, which is by definition what AddChild is adding a child to).

That is just an illustrative example off the top of my head, this is the part where you should take your time to think this through and find a way to slowly modify the code to be able organize the PR in small commits that can each pass the tests individually (so you know you're on the right track and aren't breaking anything).

@schomatis
Copy link
Contributor Author

In the meanwhile I'm going to take a look at the current state of the DAG diff command (ipfs object diff) which can be very useful here as most of the errors you'll be getting are of the hash-mismatch type, meaning your modified version of the builder will create an DAG with a (probably really small) difference that will propagate up to the root changing its hash and the test will just report a generic "the hash of the DAG generated with your builder doesn't match the one I have hard-coded here" error. The diff command can be very helpful to debug that situation and find exactly where is the actual difference (I should have used it myself in #5118 now that I think about it).

@schomatis
Copy link
Contributor Author

@bjzhang Just a heads-up to let you know that the importer and unixfs packages (which concentrate pretty much all the code related to this issue) will be moving (in a near future) to https://github.com/ipfs/go-unixfs (#5316). This won't affect how the code works, but if at some point in the future you update the master branch and can't find the code go to that repo.

@bjzhang
Copy link

bjzhang commented Jul 31, 2018

I am trying to understand the code in trickle dag builder. And here is my trivial patch of Layout. I could add and get the file successful. But the trickle_test.go report the wrong ProtoNode type:

--- FAIL: TestSizeBasedSplit (0.00s)
    --- FAIL: TestSizeBasedSplit/leaves=ProtoBuf (0.00s)
    	/go/src/github.com/ipfs/go-ipfs/unixfs/importer/trickle/trickle_test.go:93: expected file as branch node, got: Raw
    --- FAIL: TestSizeBasedSplit/leaves=Raw (0.00s)
    	/go/src/github.com/ipfs/go-ipfs/unixfs/importer/trickle/trickle_test.go:93: expected file as branch node, got: Raw

I guess there is some misunderstanding of such types. But I could not find a glue. Meanwhile I saw a pull request #5120 which try to remove the Raw type by encountering backward compatibility issues. Is it relative issues?

diff --git a/unixfs/importer/helpers/dagbuilder.go b/unixfs/importer/helpers/dagbuilder.go
index 85e7aa7bc..123d9a173 100644
--- a/unixfs/importer/helpers/dagbuilder.go
+++ b/unixfs/importer/helpers/dagbuilder.go
@@ -257,6 +257,25 @@ func (db *DagBuilderHelper) FillNodeLayer(node *UnixfsNode) error {
 	return nil
 }
 
+// FillFSNodeLayer do the same thing as FillNodeLayer.
+func (db *DagBuilderHelper) FillFSNodeLayer(node *FSNodeOverDag) error {
+
+	// while we have room AND we're not done
+	for node.NumChildren() < db.maxlinks && !db.Done() {
+		//TODO size
+		child, childFileSize, err := db.NewLeafDataNode()
+		if err != nil {
+			return err
+		}
+
+		if err := node.AddChild(child, childFileSize, db); err != nil {
+			return err
+		}
+	}
+
+	return nil
+}
+
 // GetNextDataNode builds a UnixFsNode with the data obtained from the
 // Splitter, given the constraints (BlockSizeLimit, RawLeaves) specified
 // when creating the DagBuilderHelper.
diff --git a/unixfs/importer/trickle/trickledag.go b/unixfs/importer/trickle/trickledag.go
index 55e9b849d..41d9a3661 100644
--- a/unixfs/importer/trickle/trickledag.go
+++ b/unixfs/importer/trickle/trickledag.go
@@ -40,23 +40,13 @@ const layerRepeat = 4
 // DagBuilderHelper. See the module's description for a more detailed
 // explanation.
 func Layout(db *h.DagBuilderHelper) (ipld.Node, error) {
-	root := db.NewUnixfsNode()
-	log.Infof("Bamvor: root: %p.", &root)
-
-	if err := fillTrickleRec(db, root, -1); err != nil {
-		return nil, err
-	}
-
-	out, err := db.Add(root)
+	newRoot := db.NewFSNodeOverDag(ft.TFile)
+	root, _, err := fillTrickleRecFSNode(db, newRoot, -1)
 	if err != nil {
 		return nil, err
 	}
 
-	if err := db.Close(); err != nil {
-		return nil, err
-	}
-
-	return out, nil
+	return db.AddNodeAndClose(root)
 }
 
 // fillTrickleRec creates a trickle (sub-)tree with an optional maximum specified depth
@@ -91,6 +81,47 @@ func fillTrickleRec(db *h.DagBuilderHelper, node *h.UnixfsNode, maxDepth int) er
 	}
 }
 
+// fillTrickleRecFSNode creates a trickle (sub-)tree with an optional maximum specified depth
+// in the case maxDepth is greater than zero, or with unlimited depth otherwise
+// (where the DAG builder will signal the end of data to end the function).
+func fillTrickleRecFSNode(db *h.DagBuilderHelper, node *h.FSNodeOverDag, maxDepth int) (filledNode ipld.Node, nodeFileSize uint64, err error) {
+	// Always do this, even in the base case
+	if err := db.FillFSNodeLayer(node); err != nil {
+		return nil, 0, err
+	}
+
+	for depth := 1; ; depth++ {
+		// Apply depth limit only if the parameter is set (> 0).
+		if db.Done() || (maxDepth > 0 && depth == maxDepth) {
+			break
+		}
+		for layer := 0; layer < layerRepeat; layer++ {
+			if db.Done() {
+				break
+			}
+
+			nextChild := db.NewFSNodeOverDag(ft.TFile)
+			childNode, childFileSize, err := fillTrickleRecFSNode(db, nextChild, depth)
+			if err != nil {
+				return nil, 0, err
+			}
+
+			if err := node.AddChild(childNode, childFileSize, db); err != nil {
+				return nil, 0, err
+			}
+		}
+	}
+	nodeFileSize = node.FileSize()
+
+	// Get the final `dag.ProtoNode` with the `FSNode` data encoded inside.
+	filledNode, err = node.Commit()
+	if err != nil {
+		return nil, 0, err
+	}
+
+	return filledNode, nodeFileSize, nil
+}
+
 // Append appends the data in `db` to the dag, using the Trickledag format
 func Append(ctx context.Context, basen ipld.Node, db *h.DagBuilderHelper) (out ipld.Node, errOut error) {
 	base, ok := basen.(*dag.ProtoNode)

Thanks.

@schomatis
Copy link
Contributor Author

Hey @bjzhang, yes, #5120 was just a simple test to prove that the balanced builder was using the UnixFS types incorrectly: internal nodes should be of type TFile and leaf nodes (when using ProtoNode) should be of type TRaw, but the balanced builder uses TFile for all nodes, so I kept it like that to avoid changing all of the test hashes, but the trickle should continue to comply with this standard,

https://github.com/ipfs/go-unixfs/blob/master/importer/helpers/dagbuilder.go#L217

So, NewLeafDataNode (and the NewLeafNode function it calls) should be adapted to be able to pass the type of leaf node by argument, right now it's always creating TFile leaf nodes. The balanced builder should call it with TFile argument and the trickle with TRaw.

That being said, the error you're getting doesn't seem to be about that,

https://github.com/ipfs/go-unixfs/blob/master/importer/trickle/trickledag.go#L333

verifyTDagRec is complaining that an internal node ("branch node", as it's called there) should have been of type TFile but instead was of type TRaw (this is incorrect for both the balanced and the trickle builders), although from what I'm seeing in your patch you're always using TFile when creating new internal nodes (NewFSNodeOverDag), so at first sight I can't spot the problem.


The importer directory (the code you're working on) has already been moved to the go-unixfs repo, feel free to keep working here with an old version of go-ipfs, but eventually (when you pull from master) you'll have to move your current work there. I'm cc'ing @whyrusleeping so he can paste here a simple example (or point to an already existing one) for a new developer of how to work across repositories using the gx tool (I'm assuming with gx link although the paths here are already fixed to a gx version, I don't know if that's a problem), that is, so you can work on go-unixfs submitting a PR there and use the tests of go-ipfs here to check if it's working.

@schomatis
Copy link
Contributor Author

Also, it seems that by default an unspecified unixfs.FSNode will return a Raw type (which is not really nice) so keep an eye for a possible bug that would leave a node without a type specified,

https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.pb.go#L89

@schomatis
Copy link
Contributor Author

You can also temporarily hack verifyTDagRec to print more information about the DAG to pinpoint exactly at which level, node, etc., the inconsistency is taking place.

@schomatis
Copy link
Contributor Author

Hey @bjzhang, we're having some technical difficulties with our package manager gx right now, and the go-unixfs won't build against this repo, so for now keep using your old master branch of go-ipfs where you've been working so far to figure out the TFile/TRaw problem, I'll ping you when we have this sorted out, sorry.

@schomatis
Copy link
Contributor Author

Hey @bjzhang, you should now be able to build go-ipfs against your cloned version of go-unixfs doing

GO_UNIXFS_HASH=QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew
# Hash of the current `go-unixfs` version (1.0.2) used in `go-ipfs`

go get github.com/whyrusleeping/gx
go get github.com/whyrusleeping/gx-go

go get -d github.com/ipfs/go-unixfs
gx-go link $GO_UNIXFS_HASH
# This will create a symbolic link from `$GOPATH/src/gx/$GO_UNIXFS_HASH/go-unixfs`
# to the git clone `$GOPATH/src/github.com/ipfs/go-unixfs`.

# Inside the `go-ipfs` repo:
make install

This will allow you to work on github.com/ipfs/go-unixfs and test your code through the go-ipfs tests, bear in mind that gx-go link will rewrite the generic imports in go-unixfs to the gx versions installed, e.g.,

diff --git a/importer/trickle/trickledag.go b/importer/trickle/trickledag.go
index 30c9618..8caff99 100644
--- a/importer/trickle/trickledag.go
+++ b/importer/trickle/trickledag.go
@@ -20,12 +20,12 @@ import (
 	"errors"
 	"fmt"
 
-	ft "github.com/ipfs/go-unixfs"
-	h "github.com/ipfs/go-unixfs/importer/helpers"
+	ft "gx/ipfs/QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew/go-unixfs"
+	h "gx/ipfs/QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew/go-unixfs/importer/helpers"
 
-	cid "github.com/ipfs/go-cid"
-	ipld "github.com/ipfs/go-ipld-format"
-	dag "github.com/ipfs/go-merkledag"
+	cid "gx/ipfs/QmYVNvtQkeZ6AKSwDrjQTs432QtL6umrrK41EBq3cu7iSP/go-cid"
+	ipld "gx/ipfs/QmZtNq8dArGfnpCZfx2pUNY7UcjGhVp5qqwQ4hH6mpTMRQ/go-ipld-format"
+	dag "gx/ipfs/Qma2BR57Wqp8w9vPreK4dEzoXXk8DFFRL3LresMZg4QpzN/go-merkledag"
 )

When submitting a PR in go-unixfs you should restore the generic import version with

go-unixfs$ gx-go rewrite --undo --fix

@bjzhang
Copy link

bjzhang commented Aug 14, 2018

@schomatis Thanks for your above method. After read the code, I found the issue is NewLeafNode default to TFile. I should add pb.Data_DataType to it and then specified the TRaw for the leaf node of trickle. I could add the trickle node correctly and go test for importer/trickle and importer/balanced is passed. I current patches:
dag: add data type in NewLeafDataNode
dag: remote unixfs node in Layout of trickledag

@schomatis
Copy link
Contributor Author

That's great @bjzhang!!

Could you submit those patches as a PR in go-unixfs to start the process of review and get that merged?

@bjzhang
Copy link

bjzhang commented Aug 15, 2018

@schomatis I just create the above pull request in go-unixfs package. Just to be clear, I still work on functions other than Layout in trickledag. I want to know if I am on the right direction. Thanks.

@bjzhang
Copy link

bjzhang commented Sep 23, 2018

When I read the codes in go-unixfs/importer/helpers/helpers.go. I found that the comments of roughLinkSize and DefaultLinksPerBlock is difference from the value of them. Could I send a patch to remove the comments?

@schomatis
Copy link
Contributor Author

Could I send a patch to remove the comments?

Instead of removing the comments I would like them to correctly reflect the current values, but yes, open an issue about it and tag kevina who might now how are those values estimated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/expert Having worked on the specific codebase is important help wanted Seeking public contribution on this issue P3 Low: Not priority right now topic/files Topic files topic/technical debt Topic technical debt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants