Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

IPLD refining call 2 #181

Closed
nicola opened this issue Sep 11, 2016 · 19 comments · Fixed by #196
Closed

IPLD refining call 2 #181

nicola opened this issue Sep 11, 2016 · 19 comments · Fixed by #196

Comments

@nicola
Copy link
Member

nicola commented Sep 11, 2016

IPLD stakeholders should get together to agree on some pending refinement in IPLD. These issues are not necessarily new features but restrictions that we may have to specify in the spec to make it less ambiguous.

Partecipants are expected to be informed about the different issues, they should:

  • Read the different issues beforehand
  • Raise concerns and give opinion offline (so that everyone is ready to agree/further discuss)

Agenda so far:

Other:

We can do this in the coming week or next


Ping @Stebalien, @mildred, @jbenet, @dignifiedquire, @RichardLitt

@RichardLitt
Copy link
Member

Sounds good to me. Want to set up a time to talk about it on the sprint calls tomorrow? Maybe after libp2p?

@jbenet
Copy link
Member

jbenet commented Sep 12, 2016

👍 SGTM

@nicola
Copy link
Member Author

nicola commented Sep 12, 2016

Can we make it 1:30 hours after libp2p? (that works best for me)

@RichardLitt
Copy link
Member

Might be a little late for @dignifiedquire: thoughts?

@Stebalien
Copy link
Member

I can make it whenever (thesis done). @nicola I assume you'll be in CSAIL?

@jbenet
Copy link
Member

jbenet commented Sep 12, 2016

I can only make it right after the other calls.

@jbenet
Copy link
Member

jbenet commented Sep 12, 2016

But if it doesn't work, no worries, meet w/o me

@dignifiedquire
Copy link
Member

I don't think I can make it that late I'm afraid :/ but I would be available right after the calls.

@RichardLitt
Copy link
Member

@nicola You ok with having a call right after libp2p, then? Might be good to jump on the All Hands call at noon to schedule in person with the others.

@nicola
Copy link
Member Author

nicola commented Sep 12, 2016

I can make it to the All Hands, but unfortunately I can't make it to after libp2p, we can still meet in all hands to decide future timing!

@nicola
Copy link
Member Author

nicola commented Sep 12, 2016

IPLD meeting is happening 4pm Boston/NYC time (1h30m after the libp2p meeting)

@RichardLitt
Copy link
Member

Sweet. Neither @dignifiedquire or I will be around for that; you can record it using Zoom, or just take notes.

@flyingzumwalt
Copy link
Contributor

@nicola I would like to listen in, so please post info about how to join the call when you start (or can we use the same zoom link as the calls we just finished?)

@nicola
Copy link
Member Author

nicola commented Sep 12, 2016

hey @flyingzumwalt at the end we asked on IRC if someone wanted to join and it was just me and @Stebalien discussing through:

  • code restructuring (what do we need?)
  • relative paths in object and across object

We have been taking accurate notes about everything we discussed, I will be posting them soon in ipfs/pm (I am sorry that I did not check this earlier on :( )

@Stebalien
Copy link
Member

Terminology

  • Block - a chunk of raw data
  • Object - an IPLD object
  • Node - a node in the "graph" encoded in an IPLD object.
  • Graph - a DAG of IPLD objects.
  • Canonicalize: Apply some function f to a path such that f(path1) == f(path2) for all path1 and path2 if and only if they point to the same object.

Resolutions

Cycles

Do we let relative links introduce cycles?

Conclusion

No.

Pro

  1. Parent Links ("..").
  2. Bidirectional links.

However, these can also be done at the application level (see below).

Cons

  1. Harder for applications to traverse the graph.
  2. In the case of inter-object links, a single block can be multiple logical "objects" depending on the path used to access it.

Intra-Object relative links

Do we allow relative links within an object?

Conclusion

Yes.

Pros

Without internal non-cyclic relative links, one would need to break an object into a graph in some cases. For example, given:

  a
 / \
v   v
b-->c

With internal non-cyclic relative links, we can express the b -> c link as ../c. If we didn't allow these links, c would need to be a separate object.

Cons

We should probably have a validation step that checks for cycles/invalid links. This is also one more feature to implement.

Alternatives

When writing this up, I realized that there is an alternative (besides do nothing). Specifically, we could allow some form of nameable sub-object. However, I feel that this is even more complicated and probably a bad idea.

Inter-Object Relative Links

Should we allow relative links to traverse up through a parent.

Conclusion

No.

Pros

  1. With cycles, this would allow cross-object bidirectional links/parent links. However, we don't want cycles.
  2. Without cycles, this would allow one to create a link from a child object to a non-ancestor node in a parent object. This is the cross-object variant of the motivation for intra-object links. However, we're already dealing with multiple objects here so I find this argument less convincing.

Cons

  1. The value of an object would depend on the path used to reach it; there would be no one-to-one mapping of blocks to objects.
  2. As a consequence of this, objects would no longer have "canonical" paths. This would make, e.g., comparing objects significantly more difficult.

Unresolved Questions

When writing this up, I came across a few unresolved questions.

Inter-object relative links that don't traverse parents

What about inter-object relative links that never only traverse to child objects? These links would have a verifiable one-to-one mapping to absolute links (as in, they could be trivially replaced with absolute links). We could canonicalize all links when encoding an object but then we might lose some semantic information. This leads into the second question.

Canonicalizing Links

Do we canonicalize all links regardless? Do we just make this an optional feature (leave the choice up to the implementation)? This would throw away some semantic information and may not always be possible (e.g., if you don't have a copy of the intermediate objects). However, this would make fetching objects faster.

@Stebalien
Copy link
Member

Application Level Cycles

There are (at least) two ways to do application level cycles.

Adjacency List

Distribute a supplementary adjacency list to overlay the object. That is, one can write:

{
    this: { /* ... */ },
    overlay: {
        "a/b/c": "a/b", // A parent link
    }
}

Twin-Node

Use two nodes instead of one — one for out-links, one for in-links — and construct the logical graph at the application level. That is, given a graph such as:

Input

The application would produce:

Result

In the above graphs, G "lists" all nodes in the graph where each node is actually two nodes: one that points to other nodes and one that is pointed to by other nodes.

@nicola
Copy link
Member Author

nicola commented Sep 14, 2016

@Stebalien thanks for writing this down! By tonight I should be able to add other notes (if there are any)

@nicola
Copy link
Member Author

nicola commented Sep 19, 2016

Created a PR with the notes from the meeting #196

@nicola nicola closed this as completed Sep 19, 2016
@ghost
Copy link

ghost commented Nov 11, 2016

IMO not allowing cycles in documents is a big mistake. Real applications that would like to use IPLD don't always have nice DAGs.

Also the two algorithms that you've given won't work for applications that have DAGs with duplicate node labels. Here I'm referring to node labels as hashes of the non-cyclic components of each node. That is, the content of a document with the links to other nodes in its strongly connected component removed. Nodes can have duplicate labels but have different connectivity information within a strongly connected component.

As a simple example of documents that have identical structure, consider the following struct definitions in the context of a fine-grained software repository/version control system (this is what I'm planning on working on):

struct Vector2 { float x; float y; }

struct Point2 { float x; float y; }

Clearly these two struct definitions have the exact same structure, so they should be hashed to the same value and thus have the same labels. In other words, structural equality is not equality for some applications.

I have a rough idea of an algorithm that will work for applications with duplicate node labels. It is based off of De Bruijn indices from lambda calculus. Note that this algorithm would require edge labels (probably based off of the order of which the edges appear in the document). For applications that have multisets, this will not work. I believe that some combination of the algorithm given here:
https://arxiv.org/ftp/arxiv/papers/1512/1512.07263.pdf
and Merkle hashing could be devised.

Let me know if you're interested in more details on the De Bruijn algorithm. Also I'd like to suggest that multisets should be added as a type in the JSON for IPLD. There are plenty of hashing strategies for multisets that can be found in academic papers.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants