Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cid implementation variations++ #72

Merged
merged 1 commit into from
Aug 30, 2018
Merged

Conversation

warpfork
Copy link
Member

Enumerating another option -- "Option F" -- for type struct{string}.

I originally got nerdsniped into looking at this because of a comment elsewhere; it's also what the tip of #71 is doing. Definitely deserves to be uniquely identified as an option.

I didn't bother to write a whole new implementation in this package of this thing, because it would be almost identical to the string typedef impl... but the exploratory inspection code I came up with in the playground is easy to keep in the lab notebook, so here it is.

It's also viable.  Options list expanded.  (And regretting my ordering of it now.  Wish I'd thought of this one and realized it's distinct earlier.)
@ghost ghost assigned warpfork Aug 27, 2018
@ghost ghost added the status/in-progress In progress label Aug 27, 2018

Option F (a struct wrapper) means we can prevent casting into our Cid type.
Option B (typedef string) can be declared a `const`.
Are there any other concerns that would separate the two choices?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct wrapper allows us to add additional metadata/offsets later if we want.

Copy link
Member Author

@warpfork warpfork Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's "Option D" or "Option E" though, respectively.

Tbh, I think basically all of these are equal in terms of how easy they are to switch to any of the others, later. Switching from typedef-string to struct later would be a "breaking change" in terms of how a linker things about the raw assembly... but it will not be a breaking change syntactically‡, and that's what matters to us as humans estimating refactor costs.

[‡] Presuming we make all accesses outside the cid package itself via public functions, and define a cid.Zero. Which would be the plan, I think. The only thing that makes this first big refactor a pain is the syntactic change of removing the * and fixing all the == nil occurances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People would still compare against "" and cast to a string/bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former should be rejected in code review based on the existence of cid.Zero and introduced universally when we do the refactor; the latter would be a bug.

If we're seriously worried about these, then we should 100% hands-down prefer Option F over B, because more than precisely zero instances of casting to Cid from outside the cid package would strike me as acceptable code quality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former should be rejected in code review based on the existence of cid.Zero and introduced universally when we do the refactor; the latter would be a bug.

That's not an assumption we can make, generally. Users (not in our projects) will cast. It's really the primary reason to use a type alias over a struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the choice of the constant I am going with Nil. My second choice would be Empty, and Zero the third. Nil to me is neutral and also implies an invalid value, of which an empty Cid{} struct is (it can't be used for anything). Empty to me implies a valid but empty container, but I am okay with it if Nil is causing that much of a problem. Zero to me implies a numeric value, of which a Cid is not. I understand the notion of a zero-value in compiler terminology, but Zero still strikes me as an odd choice, ZeroValue may be a bit better, but if Nil is no good I still rather go with Empty.

I didn't realize that using struct initializers in the middle of equality was so verbose. However, it was never my intent to use them in equality. Instead I provided an IsNil method as c.IsNil() is nicer than c == cid.Nil or c == (cid.Cid{}). However elected to still provide var Nil = Cid{} so it can be used if desired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to make speculations about whether comparison to a var is going to be a performance issue, then shouldn't proposing an equality method have at least as much burden to prove itself a non-issue before we consider that a confirmed design choice?

I disagree completely about naming a variable of non-pointer type "Nil". That is extremely unidiomatic Go because it conflicts with the language's own definitions of the term, and in discussion elsewhere I've already linked the part of the language spec which defines zero versus nil. But this PR is probably not the topical place to relitigate this naming, either, seeing as how the term is not used anywhere in the diff body we're discussing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly in favor of Nil as a constant. While Zero is technically correct, it's confusing.

However, as a method, I'd be fine with IsEmpty (or IsZero but I'd rather not).

As for the method, golang is pretty good about inlining one-line methods so it should be fine. After doing a bit of decompiling, it looks like we should go with:

func (c Cid) IsNil() bool {
    return c.string == ""
}

c == (Cid{}) seems to cost us an extra xor zeroing something out. c == Nil costs us a mov.

See: https://go.godbolt.org/z/CHyhZl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien the IsNilString one looks wrong. The compiler is getting really smart there and looks like its comparing the empty string to itself, and calling that the output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(discussed on slack)

The assembly was correct. The generated instruction was testq AX, AX which tests AX & AX to see if it's non-nil.


Option F (a struct wrapper) means we can prevent casting into our Cid type.
Option B (typedef string) can be declared a `const`.
Are there any other concerns that would separate the two choices?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to speculate about if one would be easier/faster for refmt to munge than the other, because struct-containing-string seems like it going to require at least a few more reflection calls...

But on second thought that probably doesn't matter at all; CIDs are going to be using transform funcs either way because the serial type is either bytes (which is a different reflect.Kind than string, so needs transform) or base-$n$ encoded string (which... needs to be de-base-$n$'d, so that's a transform). So it's the same.

Are there any other things that would make B/F distinct? 🤔

@warpfork
Copy link
Member Author

I think this has generated some good discussion, and we could merge it now.

In review: The diff body improves the README by enumerating a new option; it mentions the casting and const issues; and it provides some evidence of memory layout impact. Thus, it's strictly better documentation and more information than before.

Since we don't seem to be coming up with any more distinctions between B/F, and just thinking out loud about how we balance them, I think we can move on. As with the prior PR, this is all contained in a directory which is research content only and affects no APIs, so there's no splash radius to consider before merging.

@Stebalien Stebalien merged commit 1766ab0 into master Aug 30, 2018
@ghost ghost removed the status/in-progress In progress label Aug 30, 2018
@warpfork warpfork deleted the rsrch-cid-as-struct-wrapped-str branch August 30, 2018 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants