Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create new Builder interface for creating CIDs. #53

Merged
merged 9 commits into from
Aug 10, 2018
Merged

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Jun 29, 2018

This is my proposed alternative to #52

My making Format an interface we can easily extend it to provide additional functionality outside of the go-cid package. In addition the current Prefix implemented the Format interface so to take advantage of the added functional all we need to is is change the signature of functions that take a cid.Prefix to cid.Format.

I specifically did not add support for automatically creating identity hashes as that can be added later, and possible outside of go-cid. The code could look something like:

  struct FormatWithInlining {Base Format, Limit int}

  func WithInlinling(base Format) {
    return FormatWithInlining{Base: base, Limit: DefaultInlineLimit}
  }

  func (f FormatWithInlining) Sum(data []byte) (*Cid, error) {
    if len(data) <= f.Limit {/* use id hash*/}
    else {return f.Base.Sum(data)}
  }

At some point in the future we can remove NewPrefixV0, NewPrefixV1 and Prefix.Sum. We should also consider making the MhLength field an unsigned integer to disallow -1 as a value.

Some additional nodes: I find the field names MhType and MhLength crypt and think HashFun and HashLen are better and easier to understand names so I used those in this structure. If we are dead set on MhType and MhLength I can change it back.

@kevina kevina requested a review from Stebalien June 29, 2018 02:04
@ghost ghost assigned kevina Jun 29, 2018
@ghost ghost added the status/in-progress In progress label Jun 29, 2018
Stebalien
Stebalien previously approved these changes Jun 29, 2018
@Stebalien
Copy link
Member

At some point in the future we can remove NewPrefixV0, NewPrefixV1 and Prefix.Sum

I'd leave Prefix.Sum. Being able to treat a prefix as a format is useful. However, we'll need to be careful to ignore the hash length for ID hashes (may need to fix that?).

Some additional nodes: I find the field names MhType and MhLength crypt and think HashFun and HashLen are better and easier to understand names so I used those in this structure. If we are dead set on MhType and MhLength I can change it back.

I don't care much either way. However, I'd slightly prefer remaining consistent with Prefix.

format.go Outdated
HashLen int // 0 (or -1 for compatibility) means the default length
}

func PrefixToFormat(p Prefix) Format {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we really need this given that all Prefixs are Formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that we might want to deprecate Prefix.Sum. This way we we can format MhLength from being -1, which is a source of confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just check that on Sum (after a deprecation period). That is, I'd like to move away from having MhLength == -1 (except for ID hashes, maybe). However, this is only an issue right now because the best way to create a CID is to construct a Prefix and then call Sum (and lazy programmers just set MhLength = -1).

Basically, Prefix.Sum isn't the problem (and is quite useful). It's Prefix.MhLength == -1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go one step further and not add an exception for ID hashes. That is a round trip from Prefix.Sum(data) and back to Prefix should create identical results. If you want to create an ID hash use the new FormatV1 struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that I really don't see why we can't just make Prefix implement Format (well, Builder now). Internally, it would just do exactly what the function above is doing and then call Sum. It just feels like adding this extra step isn't really necessary.

format.go Outdated
type FormatV1 struct {
Codec uint64
HashFun uint64
HashLen int // 0 (or -1 for compatibility) means the default length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just say HashLen <= 0 means "default".

format.go Outdated
if mhLen == 0 {
mhLen = -1
}
hash, err := mh.Sum(data, p.HashFun, mhLen)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check if we're using the ID hash and verify that the hash length is <= 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already basically have this check in mh.Sum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@Stebalien Stebalien dismissed their stale review June 29, 2018 02:18

(jumping the gun a bit)

@kevina
Copy link
Contributor Author

kevina commented Jun 29, 2018

Some additional nodes: I find the field names MhType and MhLength crypt and think HashFun and HashLen are better and easier to understand names so I used those in this structure. If we are dead set on MhType and MhLength I can change it back.

I don't care much either way. However, I'd slightly prefer remaining consistent with Prefix.

I find FormatV1{Codec: ..., HashFun: ...} clearer then FormatV1{Codec: ..., MhType: ...}, espacally to someone who doesn't know what Mh is an abbreviation for.

@Mr0grog any opinions on this, you where pretty vocal abort short names in ipfs/kubo#5056

@kevina
Copy link
Contributor Author

kevina commented Jun 29, 2018

@Stebalien I updated the depreciated notes so that only setting MhLength to -1 in Prefix is deprecated and not the Sum method. I also think we should consider changing MhLength to an unsigned integer at somepoint to prevent any confusion.

PrefixToFormat might still be useful in the case of identify hashes because I believe fairly strongly that the round trip from Prefix.Sum(data) and back to Prefix should always create the same prefix. This limits the usefulness of Prefix for identity hashes. However, if you think the presence of that function is confusing I can remove it.

@Mr0grog
Copy link

Mr0grog commented Jul 6, 2018

I find FormatV1{Codec: ..., HashFun: ...} clearer then FormatV1{Codec: ..., MhType: ...}@Mr0grog any opinions on this, you where pretty vocal abort short names

Ha! I’m not very familiar with this package at all, so take my commentary with a grain of salt. Knowing nothing else, I’d agree: HashFun and HashLen are clearer to me than MhType and MhLength (though I’m sure it will surprise nobody that I’d prefer HashFunction/HashLength or MultihashType/MultihashLength 😜).

BUT, I am sympathetic to @Stebalien’s point on consistency with Prefix. I’m just not entirely sure on the intended context or purpose of Format vs. Prefix (they both contain all the same info, just differently arranged and with different levels of explict-ness), so I’m not sure how to weigh that concern.

Either way, I think it would be helpful to clarify via comments that the field is intended to be a hash function from the multihash package. Since that package doesn’t export a special type we can mark things with and we just use a uint64 everywhere instead, it’s up to the comments/docs or the argument name to explain it (MhType does this if only someone knows what the Mh means).

(Sorry for the slow reply; just got back from a few days away.)

@Stebalien
Copy link
Member

BUT, I am sympathetic to @Stebalien’s point on consistency with Prefix. I’m just not entirely sure on the intended context or purpose of Format vs. Prefix (they both contain all the same info, just differently arranged and with different levels of explict-ness), so I’m not sure how to weigh that concern.

Prefix includes information on existing CIDs. Format is for creating new CIDs. We currently use Prefix for both but that leads to some odd cases (e.g, the user wants to specify "just use the default hash length" but a valid prefix technically needs to explicitly specify the hash length).

@Mr0grog
Copy link

Mr0grog commented Jul 6, 2018

We currently use Prefix for both but that leads to some odd cases

Ah, ok. Is this mainly so code that deals with it never has to check/modify the hash length the way Format.Sum() does? I guess I was looking at the code like:

metadata := cid.NewPrefixV1(cid.DagCBOR, mh.SHA2_256)
// or if Prefix-using functions could handle default length lookups when 0, or there was an accessor for hash length:
// metadata := cid.Prefix{Version: 1, Codec: cid.DagCBOR, MhType: mh.SHA2_256}
id := metadata.Sum(bytesOfContent)

// vs.

metadata := cid.FormatV1{Codec: cid.DagCBOR, HashFun: mh.SHA2_256}
id := metadta.Sum(bytesOfContent)

…which didn’t feel like it was getting us much, but I also don’t know what "additional functionality" we are heading towards. If there’s a lot on the table, I can see how all the empty/optional fields a struct affords that Go’s lack of optional arguments doesn’t would be useful.

Since these two structs are so similar, this definitely feels like it needs some documentation pointing the way (both text in the package overview saying you should use the Format struct to create a CID for some bytes vs. NewCidVx() if you already have a hash and some docs in format.go), and maybe an example in the test file. I don’t think I’ll be alone in not totally understanding this.

Anyway, on the original point: all that does make me feel like having the names match is pretty valuable in this case (unless we might be comfortable changing Prefix to use different field names). MhType and HashFun would be representing the exact same thing and for pretty much the same purpose in both places. That’s a good reason to keep the names consistent, even if I’m not a big fan of the names themselves :)

@Stebalien
Copy link
Member

…which didn’t feel like it was getting us much, but I also don’t know what "additional functionality" we are heading towards.

So, there were two concrete issues:

  • Decide how to handle -1 in Prefix #23
  • Additional options like when to inline the data into the CID (i.e., when to not use the specified hash function and to instead use the identity hash function).

Basically, Prefix is a valid Format that can produce CIDs that all have the same Prefix. Format allows us to define a single CID formatter that can spit out CIDs with different formats.


Actually, @kevina, that reminds me of something. This version hardcodes the codec and doesn't provide a way to change it. That could be a problem for raw nodes etc. I'm thinking we may want something that takes the Codec in the Sum function. That is, Sum(codec int, data []byte) (*cid.Cid, error). The format would then dictate all the "choices" (CIDv0 versus CIDv1, inline data, hash function, etc.) and could even choose these based on the codec. Thoughts? Have you taken a stab at using this interface with your inline CID patches?

@Mr0grog
Copy link

Mr0grog commented Jul 6, 2018

#23

Ah, thanks for this; it explains a good bit of background. Did I miss an obvious link somewhere here?

Format allows us to define a single CID formatter that can spit out CIDs with different formats.

Oh! I don’t think I would have expected that from the name.

@Stebalien
Copy link
Member

Oh! I don’t think I would have expected that from the name.

Any suggestions? We've been through quite a few names but haven't found one we're completely satisfied with.

@Mr0grog
Copy link

Mr0grog commented Jul 6, 2018

Hmm… Formatter, CidOptions, FormatOptions, Builder, CidBuilder (it strikes me that the point and purpose of this is not too far from the builder pattern)? Best I have at the moment.

@kevina
Copy link
Contributor Author

kevina commented Jul 25, 2018

Actually, @kevina, that reminds me of something. This version hardcodes the codec and doesn't provide a way to change it.

Your right. I added methods to do so and tested this now with go-ipfs.

@kevina
Copy link
Contributor Author

kevina commented Jul 25, 2018

I think Format is not such a nice name for this? @Stebalien is there a reason for this name. I think maybe Opts will be better (CidOpts would be redundant).

For one thing I think renaming Prefix to Format in field names within go-ipfs would be confusing, we could use CidFormat but I think naming this to Opts and the field CidOpts would be better. What do you think? Note, I am somewhat against the longer Options name as I find Opts a fairly easy to recognized abbreviation.

@kevina
Copy link
Contributor Author

kevina commented Aug 6, 2018

@Stebalien @Mr0grog after thinking about it I think Builder will be a better name. What do you think?

@Stebalien
Copy link
Member

Builder sounds like a great name!

@kevina kevina changed the title Create new Format interface for creating CIDs. WIP: Create new Format interface for creating CIDs. Aug 7, 2018
@Mr0grog
Copy link

Mr0grog commented Aug 7, 2018

Sounds good to me :)

@kevina kevina changed the title WIP: Create new Format interface for creating CIDs. Create new Builder interface for creating CIDs. Aug 9, 2018
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like this method. However, I have some concerns about PrefixToBuilder(p).Sum not behaving the same way as p.Sum.

Other than that, I'd like to move forward on this ASAP. Please bug me repeatedly on IRC whenever you're waiting for feedback.

builder.go Outdated
}
}

func (p Prefix) GetCodec() uint64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Prefix is going to implement Builder, PrefixToBuilder(p).Sum(...) should be functionally equivalent to p.Sum(...). My solution would be to just put the logic in PrefixToBuilder into Prefix.Sum but the alternative is to just call PrefixToBuilder in Sum (actually, that may just be the better solution).

return p.Codec
}

func (p Prefix) WithCodec(c uint64) Builder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should comment that this will auto-upgrade to CIDv1 if necessary. Actually, I wonder if this should return a prefix or just return a builder (V1/V0).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is intentional. I don't see any reason why this should return a builder V1/V0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a thought. I don't really see any benefit one way or the other.

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

@Stebalien Prefix implements Builder right now. My thoughts are that at some point we might want to depreciate this behavior and then the PrefixToBuilder function may be useful. However, it if is causing you that much of a problem (or confusion) I can just remove PrefixToBuilder.

I do have one question though do you want to eventually depreciate Prefix.Sum?

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

@Stebalien

> However, I have some concerns about PrefixToBuilder(p).Sum not behaving the same way as p.Sum.

In what way?

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

I just removed PrefixToBuilder for now. It doesn't have a useful purpose as of now and just serves to cause confusion.

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

Okay. I clean up the code to remove all the controversial parts so this should be ready for review.

Before we merge this it needs to be tested with go-ipfs. I will update ipfs/kubo#5281 so that it complies with this p.r.

@Stebalien
Copy link
Member

I do have one question though do you want to eventually depreciate Prefix.Sum?

No. That is, I'd just treat Prefix as a valid builder.

Having the extra "convert to a builder" step just feels like, well, an extra step. It'll either introduce two ways to do the same thing, two things that look like they should do the same thing but don't, or a lot of code churn if we deprecate Prefix.Sum

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

No. That is, I'd just treat Prefix as a valid builder.

I don't necessary agree with this, but that is something to revisit later.

@Stebalien
Copy link
Member

No. That is, I'd just treat Prefix as a valid builder.

I don't necessary agree with this, but that is something to revisit later.

It may not make sense in all cases however, in those where it doesn't, I expect we'll want something fancier than a simple PrefixToBuilder function. That is, we'll want a smarter builder that's better about picking the hash function.

Basically, I agree that Prefix is not a good builder however, I don't see the builder produced by PrefixToBuilder as being any better (at least in it's current incarnation).

@kevina
Copy link
Contributor Author

kevina commented Aug 10, 2018

Okay. I completed the integration testing so this should be good to go. Let's get #61 is also before we gx publish.

@kevina kevina merged commit 23f03cb into master Aug 10, 2018
@ghost ghost removed the status/in-progress In progress label Aug 10, 2018
@Stebalien Stebalien deleted the kevina/format branch August 10, 2018 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants