pinning index (as ipfs objects) and cdb discussion #4
Comments
@tv42 said:
|
@wking said
|
I'll think about this more and reply with something more solid tomorrow, but basic feedback:
|
also tagging (for keeping in the loop + surfacing thoughts if relevant): @whyrusleeping @cryptix @krl |
Without something in the
If I put the pins in the Links.Hash, things change:
|
Thinking out loud: mph/cdb-style hash table in Data, result is |
On Fri, May 01, 2015 at 07:47:05AM -0700, Tv wrote:
Ah, right. So you'd want to trie the crypto hash and shard
I don't understand “scan the slice”, with sorted link lists
Just use an invalid multihash prefix for the depth-appropriate slice
As an aside, I'm not sure how, given a byte string that might be a
Anyhow, I'd be surprised if keeping everything in links is going to |
We can use different substrings of the hash. It is a cryptographic hash, which makes it appear (to a bounded adversary, etc.) uniformly distributed at any bit length, so grabbing different substrings helps. I think @wking suggests the same above.
unixfs dirs extend each of the links in the Data section. We could do this, or extend the link protobufs themselves (as has been discussed elsewhere). Agreed about not abusing the name. |
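A minimal sketch of the "different substrings of the hash" idea, assuming one digest byte is consumed per tree level (so the fanout is fixed at 256). The function name `childIndex` and the one-byte-per-level choice are illustrative assumptions, not something the thread settled on.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// childIndex picks a subtree at the given depth by reading a different
// byte of the digest at each level. One byte per level (fanout 256) is
// an assumption made for this sketch.
func childIndex(digest []byte, depth int) (int, bool) {
	if depth < 0 || depth >= len(digest) {
		return 0, false // no hash bytes left to consume at this depth
	}
	return int(digest[depth]), true
}

func main() {
	sum := sha256.Sum256([]byte("example block"))
	for depth := 0; depth < 3; depth++ {
		idx, ok := childIndex(sum[:], depth)
		fmt.Println(depth, idx, ok)
	}
}
```

This only stays balanced while the keys really are uniformly distributed, which is the assumption the rest of the thread goes on to question.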
The problem with using the height as input to the hash function is that now debugging by staring at a single object is harder; I'd be inclined to put the level in the object, and at that point, I might as well put in something else. And without a randomized hash, the data structure isn't very generic; it can't be safely & robustly used to store non-cryptohash, user-chosen, keys. |
On Wed, May 06, 2015 at 02:11:08PM -0700, Tv wrote:
How frequently will you need to do this? It doesn't seem all that
If we're looking at a generic fanout index structure, then this is a
I dunno. I think it's hard to get efficient fanout and name-based |
Next draft. I did not change the hash function yet, but now that the keys are actually stored in Links.Hash they have to be IPFS Keys anyway, so assuming uniformness is more acceptable.

Set/Multiset of Keys as IPFS object

IPFS objects have schema
We implement a Set and Multiset of IPFS objects, using IPFS
For our purposes, we divide Links into two groups:
The object always has at least
For Set: Keys in items are distinct. As a write optimization, a
For Multiset: items may contain the same key multiple times, and
The object Data consists of two sections:
The protobuf message is described by

message Set {
    // 0 for now, library will refuse to handle entries with version greater than recognized.
    required uint32 version = 1;
    // how many of the links are subtrees
    required uint32 fanout = 2;
    // hash seed for subtree selection, a random number
    required fixed32 seed = 3;
}

Refcounts

To be useful in an array, we need fixed size refcounts. However, there
Version 0 of the format uses
|
On Fri, May 08, 2015 at 09:58:31AM -0700, Tv wrote:
I'm liking this more :). Hopefully it's performant enough that we
Yeah, this is a minor issue for me too. Especially before we have
I'm fine with this, and it's the traditional approach. But for the
On the balance, I think the pros outweigh the cons, especially for
‘% n’ should be ‘% fanout’.
Works for me. I don't mind who pays the subtree-drilling cost. One
This seems like a useful thing to do during the garbage-collection |
FNV-1a of an IPFS key takes about half the time of a single RAM cache miss.
That pretty quickly leads to a world where objects know their own type, and at that point that shouldn't live in Data, but alongside Links and Data. I'm not quite convinced it's worth storing that in every object. In any case, this is easy to convert to that, once the infrastructure exists. For more inspirational reading, see Ethos ETypes.
Yes. Thank you.
It's a merkle tree, all atomicity comes from flipping the root hash, nothing else is possible. |
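To illustrate the root-flip point, here is a toy sketch assuming an in-memory content-addressed store. `store`, `put`, and the string payloads are hypothetical stand-ins, not IPFS APIs. New objects are written under new hashes, and the update only becomes visible when the single root reference is replaced.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a toy content-addressed map: key = hex(sha256(data)).
type store map[string][]byte

// put writes data under its own hash and returns that hash.
// Existing entries are never modified, only added.
func (s store) put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	s[key] = data
	return key
}

func main() {
	s := store{}
	root := s.put([]byte("pins: a"))
	old := root

	// Build the new version on the side; readers of `old` are unaffected.
	next := s.put([]byte("pins: a, b"))

	// The only "commit" step is replacing the root reference.
	root = next

	fmt.Println(root != old, string(s[old]), string(s[root]))
}
```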
On Fri, May 08, 2015 at 11:56:54AM -0700, Tv wrote:
In that case we might as well use it, since the person writing the
I think the “where should this type information live” is what's
Well, modulo some version-dependent migration, since it won't fit into |
@wking I can still change the "current version" to 1, new versions can leave that tag unused in the protobuf, it'll get the value 0. |
On Fri, May 08, 2015 at 05:02:35PM -0700, Tv wrote:
Oh, right :p. I think that covers all my concerns with the v2 spec :). |
I like v2 very much!! 👍 👍 👍 Cool construction. I have some questions, which may be obviously stupid:
1, a few times it is mentioned that the key may be stored multiple times. Why not store the keys once, and use an array of numbers, like:
links: [ l1, l2, l2, l3, l3 ]
data: { ... }
--- # vs
links: [ l1, l2, l3 ]
data: { counts: [ 1, 2, 2 ], ... }
2, it is possible to also use the names to signal
links: [ { hash: k1, name: "fanout", ... }, ... ]
data: { ... }
3, we could bite the bullet and figure out how the hell to make links nicely extensible. (for example, maybe we can designate proto tag numbers above 50 to be for the user, so we could make the link:
message SetLink {
    // standard merkledag part
    bytes Hash = 1;
    string Name = 2;
    uint64 Tsize = 3;
    // ours
    bool fanout = 51;
    uint64 count = 52;
}
It may be a bit tricky to parse out, but we may be able to make a nice set of interfaces (a bit capnp inspired) which expose things like:
// in merkledag
type Link struct {
	rawData []byte // serialized link protobuf, kept for re-parsing
}
// Data returns the raw serialized link.
func (ln *Link) Data() []byte {
	return ln.rawData
}
// in SetLink
type SetLink struct {
	data []byte
}
// CastLink reinterprets a generic merkledag link as a SetLink.
func CastLink(ln *dag.Link) *SetLink {
	return NewSetLink(ln.Data())
}
func NewSetLink(data []byte) *SetLink {
	// validates data, check it's a conforming protobuf
	setLinkCheckProtobuf(data)
	return &SetLink{data}
}
(sorry, I don't mean to co-opt this thread. We can discuss this complicated question elsewhere. Oh and, maybe I'm stupid but so far I find the
I agree with @wking here:
I think we
As described in ipfs/ipfs#36,
or
The importance of doing this in the links is that:
I'll read up on Ethos ETypes, thanks for posting. You may want to check out PADS too (links in ipfs/ipfs#36) -- that's a fantastic piece of work -- even though it's more Type-Theory specific, and takes a long time to grok before the benefits are apparent. And to my knowledge there are no developer-sensible approaches yet (i.e. no "JSON-LD" to the RDF).
I still don't understand why, if we have uniformly distributed data, we need to rehash it. What am I missing? Like -- depending on how big the seed is, and whether it's properly random -- why can't we just:
where
Yeah, sounds good to me. go for it :) -- just, i don't understand why it's needed yet. |
anyway all this is minor, this SGTM |
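For reference, the two multiset layouts from question 1 above (repeated links versus distinct links plus a counts array) carry the same information. A small sketch of converting the first into the second, where `compact` is a hypothetical helper name and links are plain strings for brevity:

```go
package main

import "fmt"

// compact turns a link list with repeats into distinct links plus a
// parallel counts slice, preserving first-seen order.
func compact(links []string) ([]string, []uint64) {
	var distinct []string
	var counts []uint64
	pos := make(map[string]int) // link -> index into distinct/counts
	for _, l := range links {
		if i, ok := pos[l]; ok {
			counts[i]++
			continue
		}
		pos[l] = len(distinct)
		distinct = append(distinct, l)
		counts = append(counts, 1)
	}
	return distinct, counts
}

func main() {
	d, c := compact([]string{"l1", "l2", "l2", "l3", "l3"})
	fmt.Println(d, c) // prints "[l1 l2 l3] [1 2 2]"
}
```

The counts layout keeps Links shorter, at the cost of tying each Data entry to a link index.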
On Sat, May 09, 2015 at 03:52:47AM -0700, Juan Batiz-Benet wrote:
With the current design, the Links indexes map directly to indexes in
Yes, it is. But I didn't want to repeat the word "fanout" about 256
Also, if the fanout number is not easily accessible, we need to walk
Plenty of things there that could be done.
It's worth noting that protobuf as a whole is switching away from
A top-level (sibling of Links and Data) protobuf Any would let us
Venti solves a similar problem by using two streams for one kind of
Now, I'd rather get this thing out there in a simple form, first,
Let's please make a separate ticket for "extensible objects"? If and
That part that's hurting this particular use there is that if Links
That also conflicts with the idea that Links are unixfs filenames. You
I feel like the idea has merit, but slapping everything into Links is
This conversation doesn't really belong in this ticket. Only one thing
Because it's hard to convince me the data is always uniformly
If I publish an object, it contains links that I have carefully
(While doing this work, I'm starting to have opinions about how the
:(){ :|:&};: |
Yeah, that sounds good to me 👍
Ah! that's a good point. yep, we do need to hash 👍 |
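A hedged sketch of why the seed matters: with a per-object random seed mixed into the hash, someone who picks keys in advance cannot know which subtree they will land in. FNV-1a and `% fanout` come from the discussion above. Feeding the seed's little-endian bytes to the hash before the key is an assumption of this sketch, not the format's definition, and `subtree` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// subtree maps a key to one of `fanout` subtrees using seeded FNV-1a.
// The way the seed is mixed in here is an assumption for illustration.
func subtree(seed, fanout uint32, key []byte) uint32 {
	if fanout == 0 {
		return 0 // no subtrees; avoid dividing by zero
	}
	h := fnv.New32a()
	var buf [4]byte
	binary.LittleEndian.PutUint32(buf[:], seed)
	h.Write(buf[:]) // hash.Hash writes never return an error
	h.Write(key)
	return h.Sum32() % fanout
}

func main() {
	key := []byte("QmExampleKey")
	// The same key is evaluated under two different seeds.
	fmt.Println(subtree(1, 256, key), subtree(2, 256, key))
}
```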
Final version, needs to be archived somewhere, probably in this repo. Note that it's not specific to pinning.

Set/Multiset of Keys as IPFS object

IPFS objects have schema
We implement a Set and Multiset of IPFS objects, using IPFS
For our purposes, we divide Links into two groups:
The object always has at least
For Set: Keys in items are distinct. As a write optimization, a
For Multiset: items may contain the same key multiple times, and
The object Data consists of two sections:
The protobuf message is described by

message Set {
    // 1 for now, library will refuse to handle entries with version different from what's recognized.
    required uint32 version = 1;
    // how many of the links are subtrees
    required uint32 fanout = 2;
    // hash seed for subtree selection, a random number
    required fixed32 seed = 3;
}

Refcounts

To be useful in an array, we need fixed size refcounts. However, there
Version 1 of the format uses
|
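A small sketch of the version rule in the `Set` message above: the reader accepts only the version it recognizes, so a header whose version tag was left unset (and therefore decodes as 0) is rejected. The `header` struct is a hand-written stand-in for the decoded protobuf, and `checkHeader` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// recognizedVersion is the only format version this reader accepts.
const recognizedVersion = 1

// header mirrors the fields of the Set protobuf message (stand-in type).
type header struct {
	Version uint32
	Fanout  uint32
	Seed    uint32
}

var errVersion = errors.New("unrecognized set version")

// checkHeader refuses any version other than the recognized one.
func checkHeader(h header) error {
	if h.Version != recognizedVersion {
		return fmt.Errorf("%w: got %d, want %d", errVersion, h.Version, recognizedVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkHeader(header{Version: 1, Fanout: 256, Seed: 42}))
	fmt.Println(checkHeader(header{})) // version tag unset, decodes as 0
}
```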
Is there anything implemented related to this? Is there a way to store a key-value pair and look up the value by key? |
this is a transplant from: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/ipfs-users/BI-P0H41-2E/d4Gd0akbYPoJ