implement a faster dag structure for unixfs #687
Conversation
@whyrusleeping please duplicate the data in this case. we've discussed this a ton of times... it must be possible to take only the leaves and regenerate the entire file. Without this property we will not be able to easily share entire subsections of files with others, because the links will taint that data.
You can still share subsections of the file with others; the tree structure allows for that just fine. I'm curious what scenario you're worried about.
@whyrusleeping maybe I don't understand your description. Can you draw it?
Well, any concerns you would have had about storing data in the intermediate links would still apply. In the case that I want just a subset of a file, say from offset 4000 to offset 10000, I would just give out a subtree starting at the node whose data contains offset 4000. I tried doing the data duplication strategy, but it seemed very wasteful of bandwidth and also very difficult to implement properly. To get it right, adding in the duplicate 'cache' data has to be done entirely as a post-process, which is expensive for larger trees and increases the number of disk writes (which is already our bottleneck). I'm not saying we should entirely replace our current dag builder, but adding this as an option should be considered.
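For illustration, here's a minimal Go sketch of that kind of offset lookup, with hypothetical types (this is not the actual merkledag API): if every link records the total file bytes under its subtree, a seek descends straight to the node whose data contains the target offset, and that subtree can be handed out on its own.

```go
// Node is a hypothetical dag node: some file bytes stored directly,
// plus links to child subtrees.
type Node struct {
	Data     []byte  // file bytes held directly by this node
	Size     uint64  // total file bytes under this node, Data included
	Children []*Node
}

// findOffset returns the node whose data contains the given file offset,
// along with the offset relative to that node's Data. The subtree rooted
// there can be shared without the rest of the file.
func findOffset(n *Node, offset uint64) (*Node, uint64) {
	if offset < uint64(len(n.Data)) {
		return n, offset
	}
	offset -= uint64(len(n.Data))
	for _, c := range n.Children {
		if offset < c.Size {
			return findOffset(c, offset)
		}
		offset -= c.Size
	}
	return nil, 0 // offset is past the end of this subtree
}
```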
Also, just noticed this:
We are assuming that a sha256 multihash is 258 bytes, when it's actually 34 (32 bytes for the sha256 digest + a 2-byte tag)
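For reference, here is where the 34 comes from, as a stdlib-only sketch (not using the go-multihash package): a multihash is laid out as &lt;hash function code&gt;&lt;digest length&gt;&lt;digest&gt;, and for sha2-256 the code is 0x12 and the length is 0x20.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	digest := sha256.Sum256([]byte("hello world"))

	// A multihash is <hash code><digest length><digest>. For sha2-256
	// the code is 0x12 and the digest length is 0x20 (32), so the tag
	// adds 2 bytes on top of the 32-byte digest.
	mh := append([]byte{0x12, 0x20}, digest[:]...)

	fmt.Println(len(mh)) // 34, not 258
}
```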
@whyrusleeping oh wow. we (I) should be clearer about bit vs. byte sizes. (though I don't think we should necessarily increase the number of links per indirect block. seeking is pretty fast right now thanks in part to that.)
Agreed. Having ~20 or 30 links per block is honestly plenty. |
I very much like the ext4 idea.
Okay, and the more I think about it, the more I like it over what's implemented here.
@whyrusleeping yeah, it would help with opening files over the network :)
The other structure I thought up is what I'm calling a "List of Lists" (sketched below). It's basically a linked list of arrays of nodes. The advantage is that every request after the first returns data, and the RTTs required to get the next data block do not increase with the size of the file; they remain constant. I also believe it has fewer intermediary nodes than any other option discussed, which is better for the network overall (fewer values need to be provided). The only downside I can really think of is that, as far as trees go, it's kind of an ugly tree (0/10, would not decorate with Christmas ornaments)
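A minimal Go sketch of that layout (types are illustrative, not the importer's actual structures): a linked list whose elements are arrays of data blocks, so every fetched node yields file data immediately and only one extra link has to be followed per node, regardless of file size.

```go
// listNode is one element of the "List of Lists": an array of data
// blocks plus a single link to the next element.
type listNode struct {
	Blocks [][]byte  // file data, usable as soon as this node arrives
	Next   *listNode // nil at end of file
}

// stream walks the list in order; each node fetched costs one RTT and
// always carries real data, no matter how large the file is. Seeking,
// by contrast, has to walk the whole chain: O(n) in the file size.
func stream(n *listNode, emit func([]byte)) {
	for ; n != nil; n = n.Next {
		for _, b := range n.Blocks {
			emit(b)
		}
	}
}
```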
Alright, so I've come up with a new tree structure optimized for both streaming AND seeking through a given file. It improves on both the ext4 structure (which is mainly aimed at on-disk filesystems) and the "List of Lists" idea I previously commented about. The downside of the ext4-style tree layout was that, as you got farther into the file, the number of requests needed to get data increased. I noticed this problem and came up with the "List of Lists" layout, which would work fantastically for a sequential stream. The issue, though, comes when you try to seek through it: the top-level node is very poorly weighted to one side, so it's 'narrow' from the data's perspective, and seeking through it requires O(n) requests to find the desired location in the file, where ext4 was roughly O(log(n)). The Trickle{Tree,Dag} addresses both of these concerns: each request after the first can return actual file data, and the cost of seeking remains near O(log(n)) since it has a recursive tree structure. A visualization of it would look like the ext4 tree, but instead of having iteratively deeper 'balanced' trees, it has an iteratively deeper version of itself. An example layout is here: http://gateway.ipfs.io/ipfs/QmT3mc4wtmyk2Fu1RFMVqvoVgYbDJeoTVnxLM28E4prVvj
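A rough Go sketch of the shape being described (names and parameters are illustrative; the real builder works over dag nodes, not in-memory slices): every node leads with direct data blocks, then hangs subtrees of increasing depth off itself, and the root keeps growing the subtree depth as the file gets longer.

```go
// trickleNode holds some direct file blocks, then subtrees added in
// batches of increasing depth, so the tree is an iteratively deeper
// version of itself rather than a balanced fanout.
type trickleNode struct {
	Data     [][]byte
	Subtrees []*trickleNode
}

// buildTrickle consumes blocks into a subtree of at most the given
// depth: `direct` data blocks first, then `repeat` subtrees of each
// depth 1..depth-1. Every node's first content is data, so any fetch
// returns file bytes, while the recursive shape keeps seeks near O(log n).
func buildTrickle(blocks *[][]byte, depth, direct, repeat int) *trickleNode {
	n := &trickleNode{}
	for i := 0; i < direct && len(*blocks) > 0; i++ {
		n.Data = append(n.Data, (*blocks)[0])
		*blocks = (*blocks)[1:]
	}
	for d := 1; d < depth && len(*blocks) > 0; d++ {
		for r := 0; r < repeat && len(*blocks) > 0; r++ {
			n.Subtrees = append(n.Subtrees, buildTrickle(blocks, d, direct, repeat))
		}
	}
	return n
}

// buildRoot is the top level: shaped like any other node, except the
// depth of its subtrees keeps increasing until the input runs out.
func buildRoot(blocks [][]byte, direct, repeat int) *trickleNode {
	root := &trickleNode{}
	for i := 0; i < direct && len(blocks) > 0; i++ {
		root.Data = append(root.Data, blocks[0])
		blocks = blocks[1:]
	}
	for depth := 1; len(blocks) > 0; depth++ {
		for r := 0; r < repeat && len(blocks) > 0; r++ {
			root.Subtrees = append(root.Subtrees, buildTrickle(&blocks, depth, direct, repeat))
		}
	}
	return root
}
```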
closing in favor of #713 |
Note: this code isn't alpha priority, and while it works and I've tested it pretty well, it isn't on our TODO list. Other code should take priority.
This PR adds an alternate DAG-building method to the importer. This method ensures that every node has data in it, making streaming much faster, since you'll have some data to display while fetching the next blocks. I keep the tree mostly balanced by performing a sort of 'rotate' every time the required depth has filled up (sketched below). This rotate moves all but the first child node underneath the first child node, making room for more nodes to be added at the root level, and it ensures that the pre-order traversal of the tree remains the same throughout the tree's creation.
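A minimal sketch of that rotate, with hypothetical types (not the importer's actual node type): the root's trailing children are pushed down under the first child, which frees the root's link slots while keeping the pre-order (root, child 1, child 2, ...) and therefore the file's byte order intact.

```go
// node is a hypothetical dag node where every node carries file data.
type node struct {
	Data     []byte
	Children []*node
}

// rotate frees up link slots at the root once it has filled its allowed
// width: all children except the first are re-parented, in order, under
// that first child. The pre-order traversal before and after is identical
// (root, c1, c2, ..., cn), so the bytes of the file don't move.
func rotate(root *node) {
	if len(root.Children) < 2 {
		return
	}
	first := root.Children[0]
	first.Children = append(first.Children, root.Children[1:]...)
	root.Children = []*node{first}
}
```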