-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Identity hash from table, add helper function and examples of alternatives #201
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this nice set of examples.
Would you then suggest that the upgrade path for users of the identity
feature is, that they should use their own codec table?
Yes, having a default can only cover one or maybe two use cases when it would be better for lib users to decide how they want to handle it. Adding a helper function might be helpful, but again it depends on how they want to handle excess data and should be implemented by the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the modules top-level documentation. It still mentions the identity hash. There should be some information about the identity hash on how to use it if you need it.
@mxinden I'd like to get your approval before this one gets merged. As libp2p is one of the users of the identity hash. |
Thanks for the ping @vmx. I am not quite sure I understand the consequences of this change for rust-libp2p. Do I understand correctly that we would need to special-case the identity hash parsing from a |
Yes that's correct. You'd also need to special case here: Alternatively you could implement your own |
That sounds reasonable to me. Thanks @vmx. |
@mriise : are you able to incorporate comments here so this can be merged? |
@mriise : are you able to incorporate comments here, or should we close? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing footguns.
// The following hashes are not cryptographically secure hashes and are not enabled by default | ||
/// Identity hash (max. 64 bytes) | ||
#[cfg(feature = "identity")] | ||
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)] | ||
Identity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of directly removing it, may I suggest the use of #[deprecated]
to make this a non-breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not possible, unfortunately. The footgun is that one might accidentally construct such a hasher, attempt to hash some data, then panic. This won't hit any "deprecated" warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. If the user uses this code, adding #[deprecated]
to it will flag it right?
We can point the user to anything in the deprecation note, including explaining the footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I am suggesting to add #[deprecated]
to the enum variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The foot-gun is Code::try_from(number).digest(...)
.
- If the identity feature gets enabled (e.g., by some other crate in the dependency tree).
- If
number
happens to be 0.
This will panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(assuming that the input buffer ...
is too long)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, now I get you. Thanks for explaining!
We can still add code that will automatically trigger a deprecation warning once the identity
feature gets enabled. I did similar stuff in rust-libp2p
.
Can push a PoC in a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #260.
/// Code reserved for Identity "hash" | ||
pub const IDENTITY_CODE: u64 = 0x0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't necessarily need to be public, can we make it crate-private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its still useful imo, e.g. Multihash::wrap(multihash::IDENTITY_CODE, &bytes)
or doing a match for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that. I am just wary about extending the public API with new items that are not essential. If we want to discourage the use of the identity hasher, it seems reasonable to have users declare that constant by themselves if they actually want to use it.
pub fn identity_hash<const S: usize>(data: impl AsRef<[u8]>) -> Result<MultihashGeneric<S>> { | ||
MultihashGeneric::wrap(IDENTITY_CODE, data.as_ref()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to offer this as Multihash::identity
instead?
pub const IDENTITY_CODE: u64 = 0x0; | ||
// blind copy of the input array | ||
pub fn identity_hash_arr<const S: usize>(input: &[u8; S]) -> MultihashGeneric<S> { | ||
MultihashGeneric::wrap(IDENTITY_CODE, input).unwrap() | ||
} | ||
|
||
// input is truncated to S size | ||
pub fn identity_hash<const S: usize>(input: &[u8]) -> MultihashGeneric<S> { | ||
let mut hasher = IdentityTrunk::<S>::default(); | ||
hasher.update(input); | ||
MultihashGeneric::wrap(IDENTITY_CODE, hasher.finalize()).unwrap() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So which one is the recommended way of doing it? I am not sure it make sense to add an example where the user is again left with two choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem we have here is that these are both "valid" technically, the "what to do with too small of a buffer" is entirely based around the requirements of the system you are writing for. Most of the time I would go with identity_hash_arr since that forces users to fit data into a certain size before the library gets it.
These are examples anyway so just using the function defined in in lib.rs will be fine.
@@ -27,6 +27,7 @@ serde-codec = ["serde", "serde-big-array"] | |||
|
|||
blake2b = ["blake2b_simd"] | |||
blake2s = ["blake2s_simd"] | |||
# identity feature is depricated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# identity feature is depricated | |
# identity feature is deprecated |
//! - `sha1`: Enable SHA-1 hasher | ||
//! - `sha2`: (default) Enable SHA-2 hashers | ||
//! - `sha3`: (default) Enable SHA-3 hashers | ||
//! - `strobe`: Enable Strobe hashers | ||
//! - `identity`: A depricated feature for identity hashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! - `identity`: A depricated feature for identity hashes. | |
//! - `identity`: A deprecated feature for identity hashes. |
// The following hashes are not cryptographically secure hashes and are not enabled by default | ||
/// Identity hash (max. 64 bytes) | ||
#[cfg(feature = "identity")] | ||
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)] | ||
Identity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. If the user uses this code, adding #[deprecated]
to it will flag it right?
We can point the user to anything in the deprecation note, including explaining the footgun.
/// Will error if `data.len() > S` | ||
/// | ||
/// See examples for a few other approaches if this doesn't fit your application | ||
pub fn identity_hash<const S: usize>(data: impl AsRef<[u8]>) -> Result<MultihashGeneric<S>> { | ||
MultihashGeneric::wrap(IDENTITY_CODE, data.as_ref()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems inconsistent to me to provide only this here when we are showing two ways to do it in the example.
Why have this at all? Seems trivial to implement so I am not sure what we are gaining here. At the same time, we are adding an item to our public API which in the future might cause a breaking change.
// The following hashes are not cryptographically secure hashes and are not enabled by default | ||
/// Identity hash (max. 64 bytes) | ||
#[cfg(feature = "identity")] | ||
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)] | ||
Identity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I am suggesting to add #[deprecated]
to the enum variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of making this a non-breaking change and working towards minimizing them in the future, can I ask that we:
- Not remove the
Identity
variant, but add#[deprecated]
to it explicitly. - Not add any new public API items. What we add here seems trivial to implement outside of the crate. It is my understanding that the
IdentityHasher
per se is not bad, users just need to be careful when using it. Having them write a bit more code seems like a code way of getting them to think what they are doing there. - Apply Issue deprecation warning on
Code
ifidentity
feature is enabled #260 to warn users about the footgun in combination with theidentity
feature.
Thanks for considering :)
I think that API comes from my request, when I said there needs to be a proper replacement for people that currently use it. I wanted a clean upgrade path. Though it can also be a well defined function in the (now existing ;) changelog, it doesn't have to be a public API (especially if we move the code table to a separate crate as it currently is discussed at #259). |
I think a recommendation on how to deal with the identity hash is a good idea. I have no strong opinion whether it is an example, rustdoc comment, test, changelog entry or something else :) |
There is #289 now which is up-to-date with latest master. |
As #289 was merged, I'll close this one. |
related to #196
closes #194
In order to allow for easier transition for downstream users, this adds a few ideas how to implement the identity hasher.
Undecided yet is about what to do with non-const sized identity hashers. Some users may be ok with allocating in order to have larger identity hashes. This is a fairly far edge-case which can be covered, but is debatable whether it should actually be actively supported by us.
(update): We won't be supporting alloc-dependent hashes in this crate for now unless significant desire exists to add them.
Another thing introduced is thebytes
crate which can do fun zero-copy things. This part was mostly for me to become more familiar with the crate.