-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: incremental-hasher #261
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I left some suggestions.
src/hashes/interface.ts
Outdated
digest(): Digest | ||
|
||
/** | ||
* Computes the digest of the given input and writes it into the provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of the given input
Do you mean "of the bytes written so far"?
src/hashes/interface.ts
Outdated
* can be use to control whether multihash prefix is written, if `false` | ||
* only the raw digest writtend omitting the prefix. | ||
*/ | ||
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return output
? Not sure how useful it is to chain this method.
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): Uint8Array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there's a better convention for method name when receiving a parameter to mutate. digestInto
is a bit awkwardly named (IMO!).
digestBYOB
? 🙄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asMultihash
- do we need? The point of this library is multiformats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went into this in the linked issue. In practice I have encountered many instances where I do need to leave out the prefix, I could go and use another library in those instances, but seems like we could just expose this. That sayid I agree that this method is kind of meh and we could do better.
@vasco-santos suggested having a whole another method, personally I'm wondering if perhaps we should have methods just to get digest without a prefix as this is low level API anyway ?
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
// multihash prefix for it | |
header: Uint8Array | |
// only writes the digest without a prefix | |
digestInto(output: Uint8Array, offset?: number): this |
Or alternatively something like this
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | |
encodeMultihashInto(target: Uint8Array, offset?: number): this | |
encodeMultihashHeaderInto(target: Uint8Array, offset?: number): this | |
encodeDigestInto(target: Uint8Array, offset?: number): thsi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
digestInto
is a bit awkwardly named (IMO!).
I'm happy to call this something, else I was trying to align with varint.encodeInto
, which as it turns out was called varint.encodeTo
😅
digestBYOB? 🙄
Works for me although I had to google to figure out what BYOB
stand for.
Alternatively we could just have read(bytes: Uint8Array, offfset?: number): this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return output? Not sure how useful it is to chain this method.
I personally find mutations that return values misleading, that said I'm amendable to the idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to google to figure out what BYOB stand for.
BYOB was aiming for familiarity with https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader etc. but I guess that was a miss 😆
/** | ||
* Number of bytes that were consumed. | ||
*/ | ||
count(): bigint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if someone is hashing >9PiB of data in JS then 👏👏👏.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ... is this overkill?
src/hashes/interface.ts
Outdated
digest(): Digest | ||
|
||
/** | ||
* Computes the digest of the given input and writes it into the provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Computes the digest of the given input and writes it into the provided | |
* Computes the digest of the bytes written so far and writes it into the provided |
I'm realizing now that interface here is intentionally non-destructive, as in I could compute digest over and over. Unfortunately node crypto APIs are destructive though
Instead node provides Given this the case with node, proposed API seems impractical, perhaps instead we could also introduce same constraint and On the other hand copy on digest maybe negligible overhead, in which case API without copy would be nicer. |
/** | ||
* Writes bytes to be digested. | ||
*/ | ||
write(bytes: Uint8Array): this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically in streaming hashers this is called update
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with calling it update although I do find that name confusing personally as I think of update
as overwrite
as opposed to append
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but streaming hashers aren't appending to a buffer they are updating their internal state with the new data you pass.
What about |
I'd really like this to land! I find myself wanting to do this more and more... |
* | ||
* @param [offset=0] - Byte offset in the `target`. | ||
*/ | ||
readDigest(target: Uint8Array, offset?: number): this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you describe the use-case for this? it seems like this makes it an onerous API to have to implement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my mistake, this is the output function!
I think maybe the naming could be better here. We have ample precedent of digest()
in JS-land, so we could have digest()
and multihash()
(or multihashDigest()
if you want to be more explicit). In Go-land Sum()
is the standard for this action, which has grown on me to make sense (though it's taken time!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I also see I'm discussing history here - read*
being the new versions? I'm not a fan. I also wonder whether we could have nicer APIs that don't require you to pass in a target
? I understand that's an important part of this, for efficiency, but casual use typically just wants that done for you. So could the APIs take a target?
instead and always return Uint8Array
? So you can either choose to supply the bytes to write in to (with optional offset) or not supply one, but either way you get back some bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read* being the new versions? I'm not a fan.
I mean if you think of it as a transform stream, it makes sense to have write and read ops. I don't mind renaming it to something else, but please don't make me come up with a name that everyone will like.
I also wonder whether we could have nicer APIs that don't require you to pass in a target? I understand that's an important part of this, for efficiency, but casual use typically just wants that done for you. So could the APIs take a target? instead and always return Uint8Array? So you can either choose to supply the bytes to write in to (with optional offset) or not supply one, but either way you get back some bytes.
I'm not completely opposed to returning back the target, however I would caution against it as it mixes two very different modes into one and can also lead to mistakes (e.g. you may have passed undefined reference which will no through but happily give you back Uint8Array)
Idea was that if you want to compute digest you just call digest
method and use this only in those rare cases when you need to work with slabs of memory.
For what it's worth Rust Multihash hasher has a similar interface to one proposed here /// Trait implemented by a hash function implementation.
pub trait Hasher {
/// Consume input and update internal state.
fn update(&mut self, input: &[u8]);
/// Returns the final digest.
fn finalize(&mut self) -> &[u8];
/// Reset the internal hasher state.
fn reset(&mut self);
} |
/** | ||
* Number of bytes that were consumed. | ||
*/ | ||
count(): bigint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | |
* Number of bytes that were consumed. | |
*/ | |
count(): bigint |
Let's just drop this method, we can revisit if we find it really necessary.
Proposal for the #260