IPLD refining call 2 #181

nicola · 2016-09-11T23:19:39Z

IPLD stakeholders should get together to agree on some pending refinement in IPLD. These issues are not necessarily new features but restrictions that we may have to specify in the spec to make it less ambiguous.

Partecipants are expected to be informed about the different issues, they should:

Read the different issues beforehand
Raise concerns and give opinion offline (so that everyone is ready to agree/further discuss)

Agenda so far:

Update on CID
Writing of the new spec (skeleton & structure)
Use of IPLD pointers in link object (Spec refining: Support of IPLD pointers as links ipld/specs#3)

Other:

Relative Paths in IPLD (Spec refining: Relative paths in IPLD ipld/specs#1)
IPLD links must be IPLD pointers (Spec refining: IPLD pointers must point to IPLD objects only ipld/specs#8)
Requiring prefixing of paths (Spec refining: requiring namespace in IPLD pointers ipld/specs#7)
Mutable pointers (Spec refining: Mutable links in IPLD ipld/specs#9)

We can do this in the coming week or next

Ping @Stebalien, @mildred, @jbenet, @dignifiedquire, @RichardLitt

RichardLitt · 2016-09-11T23:38:21Z

Sounds good to me. Want to set up a time to talk about it on the sprint calls tomorrow? Maybe after libp2p?

jbenet · 2016-09-12T00:00:28Z

👍 SGTM

nicola · 2016-09-12T00:39:12Z

Can we make it 1:30 hours after libp2p? (that works best for me)

RichardLitt · 2016-09-12T00:57:51Z

Might be a little late for @dignifiedquire: thoughts?

Stebalien · 2016-09-12T01:57:43Z

I can make it whenever (thesis done). @nicola I assume you'll be in CSAIL?

jbenet · 2016-09-12T03:18:41Z

I can only make it right after the other calls.

jbenet · 2016-09-12T03:19:10Z

But if it doesn't work, no worries, meet w/o me

dignifiedquire · 2016-09-12T06:55:15Z

I don't think I can make it that late I'm afraid :/ but I would be available right after the calls.

RichardLitt · 2016-09-12T13:43:35Z

@nicola You ok with having a call right after libp2p, then? Might be good to jump on the All Hands call at noon to schedule in person with the others.

nicola · 2016-09-12T13:58:54Z

I can make it to the All Hands, but unfortunately I can't make it to after libp2p, we can still meet in all hands to decide future timing!

nicola · 2016-09-12T18:23:08Z

IPLD meeting is happening 4pm Boston/NYC time (1h30m after the libp2p meeting)

RichardLitt · 2016-09-12T18:33:39Z

Sweet. Neither @dignifiedquire or I will be around for that; you can record it using Zoom, or just take notes.

flyingzumwalt · 2016-09-12T18:43:32Z

@nicola I would like to listen in, so please post info about how to join the call when you start (or can we use the same zoom link as the calls we just finished?)

nicola · 2016-09-12T21:18:55Z

hey @flyingzumwalt at the end we asked on IRC if someone wanted to join and it was just me and @Stebalien discussing through:

code restructuring (what do we need?)
relative paths in object and across object

We have been taking accurate notes about everything we discussed, I will be posting them soon in ipfs/pm (I am sorry that I did not check this earlier on :( )

Stebalien · 2016-09-13T19:17:04Z

Terminology

Block - a chunk of raw data
Object - an IPLD object
Node - a node in the "graph" encoded in an IPLD object.
Graph - a DAG of IPLD objects.
Canonicalize: Apply some function f to a path such that f(path1) == f(path2) for all path1 and path2 if and only if they point to the same object.

Resolutions

Cycles

Do we let relative links introduce cycles?

Conclusion

No.

Pro

Parent Links ("..").
Bidirectional links.

However, these can also be done at the application level (see below).

Cons

Harder for applications to traverse the graph.
In the case of inter-object links, a single block can be multiple logical "objects" depending on the path used to access it.

Intra-Object relative links

Do we allow relative links within an object?

Conclusion

Yes.

Pros

Without internal non-cyclic relative links, one would need to break an object into a graph in some cases. For example, given:

  a
 / \
v   v
b-->c

With internal non-cyclic relative links, we can express the b -> c link as ../c. If we didn't allow these links, c would need to be a separate object.

Cons

We should probably have a validation step that checks for cycles/invalid links. This is also one more feature to implement.

Alternatives

When writing this up, I realized that there is an alternative (besides do nothing). Specifically, we could allow some form of nameable sub-object. However, I feel that this is even more complicated and probably a bad idea.

Inter-Object Relative Links

Should we allow relative links to traverse up through a parent.

Conclusion

No.

Pros

With cycles, this would allow cross-object bidirectional links/parent links. However, we don't want cycles.
Without cycles, this would allow one to create a link from a child object to a non-ancestor node in a parent object. This is the cross-object variant of the motivation for intra-object links. However, we're already dealing with multiple objects here so I find this argument less convincing.

Cons

The value of an object would depend on the path used to reach it; there would be no one-to-one mapping of blocks to objects.
As a consequence of this, objects would no longer have "canonical" paths. This would make, e.g., comparing objects significantly more difficult.

Unresolved Questions

When writing this up, I came across a few unresolved questions.

Inter-object relative links that don't traverse parents

What about inter-object relative links that never only traverse to child objects? These links would have a verifiable one-to-one mapping to absolute links (as in, they could be trivially replaced with absolute links). We could canonicalize all links when encoding an object but then we might lose some semantic information. This leads into the second question.

Canonicalizing Links

Do we canonicalize all links regardless? Do we just make this an optional feature (leave the choice up to the implementation)? This would throw away some semantic information and may not always be possible (e.g., if you don't have a copy of the intermediate objects). However, this would make fetching objects faster.

Stebalien · 2016-09-13T19:59:13Z

Application Level Cycles

There are (at least) two ways to do application level cycles.

Adjacency List

Distribute a supplementary adjacency list to overlay the object. That is, one can write:

{
    this: { /* ... */ },
    overlay: {
        "a/b/c": "a/b", // A parent link
    }
}

Twin-Node

Use two nodes instead of one — one for out-links, one for in-links — and construct the logical graph at the application level. That is, given a graph such as:

The application would produce:

In the above graphs, G "lists" all nodes in the graph where each node is actually two nodes: one that points to other nodes and one that is pointed to by other nodes.

nicola · 2016-09-14T19:21:29Z

@Stebalien thanks for writing this down! By tonight I should be able to add other notes (if there are any)

nicola · 2016-09-19T03:52:11Z

Created a PR with the notes from the meeting #196

ghost · 2016-11-11T02:25:26Z

IMO not allowing cycles in documents is a big mistake. Real applications that would like to use IPLD don't always have nice DAGs.

Also the two algorithms that you've given won't work for applications that have DAGs with duplicate node labels. Here I'm referring to node labels as hashes of the non-cyclic components of each node. That is, the content of a document with the links to other nodes in its strongly connected component removed. Nodes can have duplicate labels but have different connectivity information within a strongly connected component.

As a simple example of documents that have identical structure, consider the following struct definitions in the context of a fine-grained software repository/version control system (this is what I'm planning on working on):

struct Vector2 { float x; float y; }

struct Point2 { float x; float y; }

Clearly these two struct definitions have the exact same structure, so they should be hashed to the same value and thus have the same labels. In other words, structural equality is not equality for some applications.

I have a rough idea of an algorithm that will work for applications with duplicate node labels. It is based off of De Bruijn indices from lambda calculus. Note that this algorithm would require edge labels (probably based off of the order of which the edges appear in the document). For applications that have multisets, this will not work. I believe that some combination of the algorithm given here:
https://arxiv.org/ftp/arxiv/papers/1512/1512.07263.pdf
and Merkle hashing could be devised.

Let me know if you're interested in more details on the De Bruijn algorithm. Also I'd like to suggest that multisets should be added as a type in the JSON for IPLD. There are plenty of hashing strategies for multisets that can be found in academic papers.

nicola mentioned this issue Sep 11, 2016

Captain.log - IPLD v1 spec ipld/specs#13

Closed

2 tasks

This was referenced Sep 19, 2016

Create 2016-09-12-IPLD-refining-call-2.md #196

Merged

Spec refining: Relative paths in IPLD ipld/specs#1

Closed

nicola closed this as completed Sep 19, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IPLD refining call 2 #181

IPLD refining call 2 #181

nicola commented Sep 11, 2016 •

edited

Loading

RichardLitt commented Sep 11, 2016

jbenet commented Sep 12, 2016

nicola commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

Stebalien commented Sep 12, 2016

jbenet commented Sep 12, 2016

jbenet commented Sep 12, 2016

dignifiedquire commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

nicola commented Sep 12, 2016

nicola commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

flyingzumwalt commented Sep 12, 2016

nicola commented Sep 12, 2016

Stebalien commented Sep 13, 2016

Stebalien commented Sep 13, 2016

nicola commented Sep 14, 2016

nicola commented Sep 19, 2016

ghost commented Nov 11, 2016

IPLD refining call 2 #181

IPLD refining call 2 #181

Comments

nicola commented Sep 11, 2016 • edited Loading

RichardLitt commented Sep 11, 2016

jbenet commented Sep 12, 2016

nicola commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

Stebalien commented Sep 12, 2016

jbenet commented Sep 12, 2016

jbenet commented Sep 12, 2016

dignifiedquire commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

nicola commented Sep 12, 2016

nicola commented Sep 12, 2016

RichardLitt commented Sep 12, 2016

flyingzumwalt commented Sep 12, 2016

nicola commented Sep 12, 2016

Stebalien commented Sep 13, 2016

Terminology

Resolutions

Cycles

Conclusion

Pro

Cons

Intra-Object relative links

Conclusion

Pros

Cons

Alternatives

Inter-Object Relative Links

Conclusion

Pros

Cons

Unresolved Questions

Inter-object relative links that don't traverse parents

Canonicalizing Links

Stebalien commented Sep 13, 2016

Application Level Cycles

Adjacency List

Twin-Node

nicola commented Sep 14, 2016

nicola commented Sep 19, 2016

ghost commented Nov 11, 2016

nicola commented Sep 11, 2016 •

edited

Loading