Spec refining: specify valid merklepath segments and encoding #67

warpfork · 2018-07-17T07:38:21Z

Mission

It's important to specify precisely what is a valid merklepath segement in IPLD.

The spec currently contains a "TODO: list path resolving restrictions" and this could be improved :)

Why

First, a quick clarification: "merklepath segments" are a distinct concept from "IPLD Selectors". Merklepaths are a specific and limited implementation of IPLD Selectors; they can only specify a traversal to a single object; and importantly, we want them to be serializable in a way that's easy for humans to operate. To quote the current spec for motivations:

IPLD paths MUST layer cleanly over UNIX and The Web (use /, have deterministic transforms for ASCII systems).

(Perhaps "ASCII" is a little over-constrained there. The spec also says "
IPLD paths MUST be universal and avoid oppressing non-english societies (e.g. use UTF-8, not ASCII)" -- we might want to refine those two lines after we tackle the rest of this issue.)

Second of all, just a list of other issues that are fairly closely related to a need for clarity on this subject:

IPFS permits undesirable paths ipfs/kubo#1710 -- "IPFS permits undesirable paths" -- lots of good discussion on encodings, and examples of valid but problematic characters
Document restrictions on keys of maps #59 -- "Document restrictions on keys of maps"
Non-string keys #58 -- "Non-string keys" -- this one has a particularly interesting detail quoted: "The original intention was to actually be quite restrictive: map keys must be unicode strings with no slashes. However, we've loosened that so that they can contain slashes, it's just that those keys can't be pathed through.". (n.b. this issue here is named "merklepath segments", not "IPLD keys", specifically to note this same distinction.)
Spec out DagPB path resolution #55 -- "Spec out DagPB path resolution"
Spec refining: make sure that an attribute cannot be named . #37 -- "Spec refining: make sure that an attribute cannot be named ."
Spec refining: Merkle-paths to an array's index #20 -- "Spec refining: Merkle-paths to an array's index"
Spec refining: Terminology IPLD vs Merkle #15 -- "Spec refining: Terminology IPLD vs Merkle" -- basically, am I titling this issue correctly by saying "merklepath"? Perhaps not ;)
Spec refining: Relative paths in IPLD #1 -- "Spec refining: Relative paths in IPLD" -- may require reserving more character sequences as special
Handing of non-utf-8 posix filenames legacy-unixfs-v2#3 -- "Handing of non-utf-8 posix filenames"
WebUI should (somehow) indicate a problematic directory entry ipfs/kubo#4292 -- "WebUI should (somehow) indicate a problematic directory entry"
perhaps more!

As this list makes evident... we really need to get this nailed down.

Mission, refined

Okay, motivations and intro done. What do we need to do?

(1) Update the spec to be consistently clear about IPLD keys versus valid merklepath segments. This distinction seems to exist already, but it's tricky, so we should hammer it.

(2) Define normal character encoding. (I think it's now well established that this is necessary -- merklepath segments are absolutely for direct human use, so we're certainly speaking of chars rather than bytes; and also unicode is complex and ignoring normalization is not viable.)

(3) Define any blacklisting of any further byte sequences which are valid normalized characters but we nonetheless don't want to see in merklepath segments.

(4) Ensure we're clear about what happens when an IPLD key is valid but as a key but not a merklepath segment (e.g. it's unpathable).

(And one more quick note: a lot of this has been in discussion already as part of sussing out the unixfsv2 spec. In unixfsv2, we've come to the conclusion that some of our path handling rules are quantum-entangled with the IPLD spec for merklepaths. Unixfsv2 may apply more blacklistings of byte sequences which are problematic than IPLD merklepath segements, so we don't have to worry about everything up here; but we do want to spec this first, so we can make sure the Unixfsv2 behavior normalizers are a nice subset of the IPLD merklepath rules.)

Progress

Regarding (1): "just a small matter of writing" once we nail the rest...

Regarding (2): We have an answer and the answer is "NFC". (At least, I think we have an answer with reasonable consensus.) We had a long thread about this in the context of unixfsv2, but entirely applicable here in general. Everyone seems to agree that UTF8 is a sensible place to be and NFC encoding is a sensible, already-well-specified normalization to use. And importantly, in practice, NFC is the encoding seen in practically all documents everywhere, so choosing NFC means we accept the majority of strings unchanged. Whew. dusts off hands

Regarding (3): Lots of example choices in ipfs/kubo#1710 . We need to reify that into a list in the spec.

Regarding (4): Open field?

(I'll update this "progress" section as discussion... progresses.)

The text was updated successfully, but these errors were encountered:

whyrusleeping · 2018-07-17T13:41:59Z

👏 so, it we agree on NFC. The next step is to figure out the list of bad byte sequences for keys? And then to agree on something like "parsing an ipld object with bad keys should error", right?

Stebalien · 2018-07-17T14:14:17Z

Note: While I agree that UTF8-NFC should be the canonical encoding, some formats, languages, and operating systems my force us to use different encodings. We just have to make sure to use UTF8-NFC when comparing.

warpfork · 2018-07-17T14:26:42Z

agree on something like "parsing an ipld object with bad keys should error", right?

In the unixfsv2 discussions, I actually proposed our implementation should probably take a func(rejected []byte, reason error) (replacement []byte, orHalt error) (exact signature TBD) parameter which defines how we respond to that situation. In some cases, when creating an object and finding a non-joyful key string, we'll want to halt and warn the user. In some cases, we'll have been told by the user to go ahead, but do some normalization. And in some cases, perhaps will simply want to log (especially in early drafts of this, to see how widely occurring tricky scenarios actually are in in the wild). How exactly to type that function might need a little sussing out, but I think its the right approach, in that it doesn't require us to pick just one approach.

Stebalien · 2018-07-18T12:40:35Z

And then to agree on something like "parsing an ipld object with bad keys should error", right?

IIRC, the consensus is to just not allow them in paths. Just like /.

This report was originally issue #67. Closes #67.

This report was originally issue ipld#67. Closes ipld#67.

warpfork mentioned this issue Apr 12, 2019

Data Model doc expanded, and in terms of Kinds. #112

Merged

vmx added a commit that referenced this issue Aug 20, 2019

Add exploration report about merkle segments

0c1358e

This report was originally issue #67. Closes #67.

vmx mentioned this issue Aug 20, 2019

Add exploration report about merkle segments #171

Merged

vmx added a commit that referenced this issue Aug 20, 2019

Add exploration report about merkle segments

c9569c8

This report was originally issue #67. Closes #67.

warpfork closed this as completed in #171 Aug 21, 2019

rvagg pushed a commit that referenced this issue Sep 2, 2019

Add exploration report about merkle segments

fdcb9a6

This report was originally issue #67. Closes #67.

rvagg pushed a commit that referenced this issue Sep 9, 2019

Add exploration report about merkle segments

276abd8

This report was originally issue #67. Closes #67.

rvagg pushed a commit that referenced this issue Sep 9, 2019

Add exploration report about merkle segments

ad52827

This report was originally issue #67. Closes #67.

rvagg pushed a commit that referenced this issue Sep 10, 2019

Add exploration report about merkle segments

4a7edfe

This report was originally issue #67. Closes #67.

rvagg pushed a commit that referenced this issue Sep 10, 2019

Add exploration report about merkle segments

b38f0cd

This report was originally issue #67. Closes #67.

warpfork mentioned this issue Jan 22, 2020

Add weekly sync meeting notes 2020-01-20 ipld/team-mgmt#52

Merged

prataprc pushed a commit to iprs-dev/ipld-specs that referenced this issue Oct 13, 2020

Add exploration report about merkle segments

c49769a

This report was originally issue ipld#67. Closes ipld#67.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec refining: specify valid merklepath segments and encoding #67

Spec refining: specify valid merklepath segments and encoding #67

warpfork commented Jul 17, 2018 •

edited

Loading

whyrusleeping commented Jul 17, 2018

Stebalien commented Jul 17, 2018

warpfork commented Jul 17, 2018

Stebalien commented Jul 18, 2018

Spec refining: specify valid merklepath segments and encoding #67

Spec refining: specify valid merklepath segments and encoding #67

Comments

warpfork commented Jul 17, 2018 • edited Loading

Mission

Why

Mission, refined

Progress

whyrusleeping commented Jul 17, 2018

Stebalien commented Jul 17, 2018

warpfork commented Jul 17, 2018

Stebalien commented Jul 18, 2018

warpfork commented Jul 17, 2018 •

edited

Loading