Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardcoded assumptions that the hash will be encoded in hex #31

Closed
wking opened this issue Mar 7, 2017 · 7 comments · Fixed by #32
Closed

Hardcoded assumptions that the hash will be encoded in hex #31

wking opened this issue Mar 7, 2017 · 7 comments · Fixed by #32

Comments

@wking
Copy link
Contributor

wking commented Mar 7, 2017

The docs for Algorithm (now algorithm) made it clear that the algorithm identifier was intended to cover both the hash and encoding algorithms. @stevvooe confirmed this interpretation in recent comments as well. The idea is that a future algorithm may chose a non-hex encoding like base 64.

The current implementation, on the other hand, bakes the hex encoding into key locations (e.g. in NewDigestFromBytes and Digest.Validate). I suggest:

  • Defining an Encoding interface.
  • Adding an Algorithm.Encoding() Encoding method.
  • Adding an Algorithm.HashSize() int method.
  • Updating go-descriptor to use those instead of the currently-hard-coded hex assumptions.

I've floated an implementation in #30 if folks want to see how that works out.

Alternative solutions include giving up on alternatives and just requiring all hashes to be hex-encoded. Or some other API for pushing encoding information into the Algorithm instances.

Thoughts?

@stevvooe
Copy link
Contributor

stevvooe commented Mar 8, 2017

This is a known limitation of the package. We actually considered parameterizing the encoding with the algorithm itself, but found the variation not to be conducive to wider distribution.

@stevvooe confirmed this interpretation in recent comments as well. The idea is that a future algorithm may chose a non-hex encoding like base 64.

Please don't confuse my desire to reserve something for future implementation as an endorsement of running out and implementing something now.

Updating go-descriptor to use those instead of the currently-hard-coded hex assumptions.

What is this?

IMHO, the implementation in #30 goes a little further than it needs to (not to mention the massive compatibility breaks). In fact, there was early code (not sure if I checked it in), that attempted this. The biggest invariant that the PR misses is that the format, encoding and actual hash algorithm are all linked. For example, sha256 means hex-encoded sha256, prefixed with sha256. Any change in this direction needs to serve that invariant. A variant on encoding would be another "algorithm" from the point of view of the digest package. For example, sha256 with base64 encoding might be sha256+b64, however, the utility of this is really uncertain, in practice.

When you look at digest as an API format, variation isn't a feature. If we look at other approaches to this problem, such as multihash, and actual uses, the most important feature is that they are easy to calculate and easy to compare. When you vary the encoding for things that are "equivalent", you break those features. Anything more complicated than "put the bytes in, get the hash out" means people won't verify and leverage the properties of the system to make it more secure; their eyes glaze over and the exploit surface area gets bigger.

Typically, the need for choosing the encoding for a digest is centered around reducing the storage size. However, they all come with trade offs. The most obvious is base64, which can achieve about a 30% savings over hex but you sacrifice case invariance. base32 will give you back case invariance but don't quite get the same savings. I have experimented with base36, as well, which is compact, case insensitive and easy on the eye. However, none of these provide even close to the benefit of using the same hash. This is powerful, simple and overlooking that value is only for the unwise.

Under storage, this necessity isn't the case. If you are storing a large number of hashes (non-zero percentage of total data), I'd recommend encoding directly to binary. Conversion back to the hex encoding is cheap and efficient. I have considered adding binary unmarshaler/marshalers but have balked on the complexity and expense of maintaining format compatibility over different package versions.

@stevvooe
Copy link
Contributor

stevvooe commented Mar 8, 2017

TL; DR This is a feature, not a bug

@stevvooe stevvooe changed the title Hardcoded assumtions that the hash will be encoded in hex Hardcoded assumptions that the hash will be encoded in hex Mar 8, 2017
@wking
Copy link
Contributor Author

wking commented Mar 8, 2017 via email

@jonboulle
Copy link
Contributor

If this is a feature, can we at least make it an explicit feature

@stevvooe
Copy link
Contributor

stevvooe commented Mar 8, 2017

If this is a feature, can we at least make it an explicit feature

What do you propose?

It is already explicit and part of the format and design of this package. It has been validated in use shipping millions of container images.

jonboulle added a commit to jonboulle/go-digest that referenced this issue Mar 9, 2017
Also fixes a typo and adds one clarifying link in the README.

Fixes opencontainers#31
@jonboulle
Copy link
Contributor

Uh, I have no idea what the relevance of that is.

Posted my proposal here: #32

jonboulle added a commit to jonboulle/go-digest that referenced this issue Mar 9, 2017
Also fixes a typo and adds one clarifying link in the README.

Fixes opencontainers#31

Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
@wking
Copy link
Contributor Author

wking commented May 10, 2017

Do we want to revisit this (and #30 / #32) now that #33 has landed? Or are we sticking with “maintainers will reject attempts to remove this TODO and not support folks who would like to use this package with non-hex algorithms” for the time being?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants