Clarify whether merkledag links can be binary #1172
I'm in favor of requiring UTF-8.
A slightly-related link-name issue is #915, since the encoding used
for a base-58 key could conceivably alter the sort order (although I'd
expect most encodings to byte-sort those characters in the same
order).
For the record, I have no bias either way (both seem valuable; I can always shove my binary things in Data).
On Thu, Apr 30, 2015 at 12:45:15PM -0700, Tv wrote:
Knowing that the keys are UTF-8 makes it easy to convert filenames, ipfs cat QmSomeList/NH will by default reference the base-58 encoded key NH (decimal 1234, Incidentally, how do we handle path-separators in keys? Do we
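In the quoted example, the path segment NH reads as a base-58 numeral for link index 1234. Here is a minimal Go sketch of that mapping. It assumes the Bitcoin base-58 alphabet (the alphabet IPFS keys are written in) and plain positional integer encoding, not the byte-oriented variant used for multihash keys. encodeIndex and decodeIndex are hypothetical helpers, not go-ipfs functions.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet is the Bitcoin-style base-58 alphabet (assumed here); it omits 0, O, I and l.
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// encodeIndex writes a link index as a base-58 numeral, most significant digit first.
func encodeIndex(n uint64) string {
	if n == 0 {
		return alphabet[:1]
	}
	var digits []byte
	for n > 0 {
		digits = append(digits, alphabet[n%58])
		n /= 58
	}
	// Digits were produced least significant first; reverse them.
	for i, j := 0, len(digits)-1; i < j; i, j = i+1, j-1 {
		digits[i], digits[j] = digits[j], digits[i]
	}
	return string(digits)
}

// decodeIndex parses a base-58 numeral back into an index. ok is false for an
// empty string or a character outside the alphabet (overflow is not checked).
func decodeIndex(s string) (n uint64, ok bool) {
	if s == "" {
		return 0, false
	}
	for i := 0; i < len(s); i++ {
		d := strings.IndexByte(alphabet, s[i])
		if d < 0 {
			return 0, false
		}
		n = n*58 + uint64(d)
	}
	return n, true
}

func main() {
	seg := encodeIndex(1234)
	n, ok := decodeIndex(seg)
	fmt.Println(seg, n, ok) // prints "NH 1234 true"
}
```

One consequence of this scheme: any link name that happens to be a valid base-58 numeral would be ambiguous with an index lookup, which is part of why keyed and indexed access are hard to mix.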
I think we should use UTF-8 for link names. Also, I think we would like to be able to extend the link protobuf in the data structures layered on top. Any good way of doing it? Casting the protobuf (decoding with an extended schema) based on the type of the object?
@wking There are more kinds of IPFS objects than directories. E.g., large files have Links with Name == ""; try feeding that to your cat. @jbenet The canonical decentralized way to extend proto3 is Any. It uses URL-like strings as identifiers: https://developers.google.com/protocol-buffers/docs/proto3#any
There's a spec somewhere about the identifiers used by Any. They're supposed to be usable to fetch the protobuf description (binary), or return a 404 (but definitely not serve human-friendly pages). They should match the names of messages. I don't have a link handy.
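For reference, the google.protobuf.Any documentation says the last path segment of the type URL is the fully qualified message name (e.g. type.googleapis.com/google.protobuf.Duration). Below is a small stdlib-only Go sketch of extracting that name. messageName is a hypothetical helper. Fetching a schema from the URL is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// messageName returns the fully qualified message name carried by an Any
// type URL, i.e. the last path segment of the URL.
func messageName(typeURL string) (string, error) {
	i := strings.LastIndexByte(typeURL, '/')
	if i < 0 || i == len(typeURL)-1 {
		return "", fmt.Errorf("type URL %q has no message name after a '/'", typeURL)
	}
	return typeURL[i+1:], nil
}

func main() {
	name, err := messageName("type.googleapis.com/google.protobuf.Duration")
	fmt.Println(name, err)
}
```

Because the name is just the last segment, a decoder can pick a schema by message name even when the host part of the URL is unreachable.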
@wking Too late, and base58 is so slow it's silly. Besides, they already have an index position in Links.
On Thu, Apr 30, 2015 at 02:52:48PM -0700, Tv wrote:
With enough motivation we can always migrate ;).
True. But you'll only care about encoding to base-58 when you add a
I'm fine having two object types for link maps (lookup by key) and
but that doesn't work if you can mix and match keyed and indexed
@wking Please make that a separate issue, if you care strongly enough. In this one, I just want the language around Name clarified one way or the other. Also, even if it's UTF-8, that doesn't stop it from containing sequences that, e.g., reprogram your VT-100. UIs should probably look for non-printables and quote them, or something.
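One way a UI could follow that advice is to print a name unchanged when it is valid UTF-8 and fully printable, and otherwise fall back to an escaped form. This Go sketch uses strconv.Quote for the fallback. displayName is a hypothetical helper, and using Go quoting as the escape syntax is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
	"unicode/utf8"
)

// displayName returns name as-is when it is valid UTF-8 made only of printable
// characters; otherwise it returns a Go-quoted form in which control bytes and
// invalid UTF-8 appear as escapes (e.g. \x1b, \xff) instead of raw bytes.
func displayName(name string) string {
	if utf8.ValidString(name) {
		printable := true
		for _, r := range name {
			if !unicode.IsPrint(r) {
				printable = false
				break
			}
		}
		if printable {
			return name
		}
	}
	return strconv.Quote(name)
}

func main() {
	for _, name := range []string{"README.md", "日本語", "\x1b[2J", "\xff"} {
		fmt.Println(displayName(name))
	}
}
```

The UTF-8 check comes first because ranging over invalid UTF-8 yields U+FFFD, which is itself printable and would otherwise slip through.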
On Thu, Apr 30, 2015 at 03:44:39PM -0700, Tv wrote:
I think the issue is “is it a big restriction to require path-like The safer route seems to be to have arbitrary binary keys in the base
On Thu, Apr 30, 2015 at 03:57:26PM -0700, W. Trevor King wrote:
Just to be clear, the typing information would be set in Data (or a
Control characters are plenty valid in path-like names, if you trust UNIX. So is non-UTF-8. The only illegal path segments, by the UNIX definition of a path, are ones containing the bytes 0x00 or '/'. If you want unixfs etc. to enforce some other rules, please file an issue about that; I see no code enforcing any such thing. (This may be an issue if you naively use Link.Name in JSON. Fun all around.)
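To make the JSON remark concrete: Go's encoding/json coerces strings to valid UTF-8 when encoding, replacing invalid bytes with U+FFFD. A non-UTF-8 Link.Name therefore does not survive a JSON round trip, while control characters and 0x00 are escaped and come back intact. The link type below is a hypothetical stand-in, not the go-ipfs struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link is a hypothetical stand-in for a merkledag link; only Name matters here.
type link struct {
	Name string
}

// jsonRoundTrip encodes a link to JSON and decodes it again, returning the name
// that survives. encoding/json replaces invalid UTF-8 with U+FFFD when encoding.
func jsonRoundTrip(name string) (string, error) {
	data, err := json.Marshal(link{Name: name})
	if err != nil {
		return "", err
	}
	var out link
	if err := json.Unmarshal(data, &out); err != nil {
		return "", err
	}
	return out.Name, nil
}

func main() {
	for _, name := range []string{"a\x00b", "\x1b[2J", "\xff"} {
		got, err := jsonRoundTrip(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %q (preserved: %v)\n", name, got, got == name)
	}
}
```

Only the "\xff" case reports preserved: false. The loss is silent, with no error returned, which is the failure mode worth guarding against.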
The idea of "standard Data entry" falls on its face, too. Either there is more than unixfs, or there isn't. If there is, the core layer cannot dictate what goes in Data. |
Just to clarify, unixfs is just one of many formats that will be built upon the merkledag structure.
On Thu, Apr 30, 2015 at 06:14:03PM -0700, Jeromy Johnson wrote:
Sure, there are also plans for some version-control objects (commits, And I do think we need a field besides Links and Data for a type
@wking Here's a non-UTF-8 Name for you: mount ipns and run touch $(printf '\xff')
On Thu, Apr 30, 2015 at 09:04:02PM -0700, Tv wrote:
I'm fine not supporting that use case. We're not looking for full
Protobuf extensions were made for precisely this: https://developers.google.com/protocol-buffers/docs/proto#extensions
Protobuf extensions were removed (almost completely) in proto3 and replaced by the Any type.
Yeah, the Any type is sort-of linked-data-ish. It's a bit clunky with the wrapping, but AFAIK, it's possible to just decode a protobuf "twice" (not really twice) with different schemas:

```protobuf
message M1 {
  bytes foo = 1;
  // start all extensions above 127
}

message M2 {
  bytes foo = 1;
  bytes bar = 128;
}
```

which we could do with links. It's a bit annoying with standard protobuf parsing, which always decodes everything (instead of wrapping a buffer with an object with accessor methods that decode only what you need).

On the original point of this issue: we should stick to UTF-8 names. Names are meant to be printed out for users in browsers and terminals, and there shouldn't be unescaped slashes in names, otherwise path resolution gets hard to reason about. Perhaps we should open an issue about enforcing this.
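The "decode with a different schema" idea works because a protobuf decoder skips over field numbers its schema does not define (current implementations also keep them as unknown fields). Here is a stdlib-only Go sketch of that skipping for the M1/M2 example. It hand-rolls the wire format with encoding/binary instead of using a protobuf library, and handles only length-delimited fields, which is all bytes fields need. appendBytesField and decodeKnown are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendBytesField appends one length-delimited field (wire type 2):
// a varint key (field number << 3 | 2), a varint length, then the raw bytes.
func appendBytesField(buf []byte, field uint64, v []byte) []byte {
	buf = binary.AppendUvarint(buf, field<<3|2)
	buf = binary.AppendUvarint(buf, uint64(len(v)))
	return append(buf, v...)
}

// decodeKnown walks a message made only of length-delimited fields, keeps the
// field numbers listed in schema and silently skips the rest, roughly what a
// decoder generated from an older schema does with fields it does not know.
func decodeKnown(msg []byte, schema map[uint64]bool) (map[uint64][]byte, error) {
	fields := make(map[uint64][]byte)
	for len(msg) > 0 {
		key, n := binary.Uvarint(msg)
		if n <= 0 {
			return nil, errors.New("malformed field key")
		}
		msg = msg[n:]
		if wt := key & 7; wt != 2 {
			return nil, fmt.Errorf("wire type %d is not handled in this sketch", wt)
		}
		size, n := binary.Uvarint(msg)
		if n <= 0 || size > uint64(len(msg)-n) {
			return nil, errors.New("malformed field length")
		}
		value := msg[n : n+int(size)]
		msg = msg[n+int(size):]
		if field := key >> 3; schema[field] {
			fields[field] = value
		}
	}
	return fields, nil
}

func main() {
	// Encode an M2 message: foo (field 1) plus the extension bar (field 128).
	msg := appendBytesField(nil, 1, []byte("hello"))
	msg = appendBytesField(msg, 128, []byte("extra"))

	m1, _ := decodeKnown(msg, map[uint64]bool{1: true})            // M1 schema
	m2, _ := decodeKnown(msg, map[uint64]bool{1: true, 128: true}) // M2 schema
	fmt.Printf("M1 sees %d field(s), M2 sees %d\n", len(m1), len(m2))
}
```

The M1 reader still has to scan past bar's bytes to find the end of the message. That scanning cost is the "always decodes everything" annoyance mentioned above, in miniature.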
https://github.com/ipfs/go-ipfs/blob/cc5f6bb306430d146d3e5a3f7956073c10d0112b/merkledag/node.go#L45-L46
If they're UTF-8, they can't contain binary keys.
If binary is allowed, the UI has to be careful not to print raw binary to a terminal, web interface, etc., which it currently doesn't seem to be:
https://github.com/ipfs/go-ipfs/blob/cc5f6bb306430d146d3e5a3f7956073c10d0112b/core/commands/refs.go#L330
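Below is a minimal sketch of the rule argued for in this thread (UTF-8, no '/'), as a Go check that could run when a link is created. validateLinkName and its error values are hypothetical, not existing go-ipfs API. The empty name is allowed because large-file chunks already use Links with Name == "". The \xff name from the touch example is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// Errors for the hypothetical rule sketched here.
var (
	errNotUTF8  = errors.New("link name is not valid UTF-8")
	errHasSlash = errors.New("link name contains '/'")
)

// validateLinkName checks the proposed rule: names must be valid UTF-8 and
// must not contain '/'. The empty name is allowed.
func validateLinkName(name string) error {
	if !utf8.ValidString(name) {
		return errNotUTF8
	}
	if strings.Contains(name, "/") {
		return errHasSlash
	}
	return nil
}

func main() {
	for _, name := range []string{"photo.jpg", "日本", "", "a/b", "\xff"} {
		fmt.Printf("%q: %v\n", name, validateLinkName(name))
	}
}
```

Control characters still pass this check, so display code would still need to escape them.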