docs: std::hash::Hash should ensure prefix-free data #89438

Merged
merged 3 commits into from
Oct 10, 2021

Conversation

pierwill
Member

@pierwill pierwill commented Oct 1, 2021

Attempt to synthesize the discussion in #89429 into a suggestion regarding Hash implementations (not a hard requirement).

Closes #89429.

@rust-highfive
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 1, 2021
@pierwill
Member Author

pierwill commented Oct 1, 2021

@cuviper

/// ## Prefix collisions
///
/// Implementations of `hash` should ensure that the data they
/// pass to the `Hasher` are prefix-free. That is, different concatenations

@tczajka tczajka Oct 1, 2021

This explanation of what "prefix-free" means is incomplete. It should say that unequal values should cause two different byte sequences to be written, and neither of the two sequences should be a prefix of the other.

Note that it's not sufficient to say that concatenations of outputs of multiple values of the same type should result in different outputs. It has to be true when concatenated with outputs for other types as well (think about hashing (A, B)). That's where the prefix-free property comes in: the outputs will be different if all the types involved satisfy the prefix-free property.
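A minimal sketch of the property described above, assuming a hypothetical helper `is_prefix_free` (not part of `std`) that checks a set of byte sequences, each standing for what one distinct value would write to the `Hasher`:

```rust
/// Hypothetical helper: returns true if no sequence in `encodings`
/// is a prefix of a sequence at a different position.
/// Two identical sequences also fail the check, since each one
/// is trivially a prefix of the other.
fn is_prefix_free(encodings: &[&[u8]]) -> bool {
    encodings.iter().enumerate().all(|(i, a)| {
        encodings
            .iter()
            .enumerate()
            .all(|(j, b)| i == j || !b.starts_with(a))
    })
}
```

With raw string bytes, the set `{"a", "ab"}` fails this check because `"ab"` starts with `"a"`. With a trailing `0xff` on each sequence, the same set passes.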

Member Author

Thanks @tczajka! I'm not sure I understand the idea of one sequence being a prefix of another. Does it simply mean "starts with", or is it another kind of relation? Is there a way we can rephrase this?

Member Author

Another way to ask the question: in the example of ("ab", "c") and ("a", "bc") where and how would the "prefix" occur, and how does the extra byte prevent it?

A "prefix" is the beginning of a string, so it's the same as "starts with". https://en.wikipedia.org/wiki/Prefix

If strings were hashed without the extra 0xff at the end, hashing ("ab", "c") and ("a", "bc") would write the same byte sequence "abc" to the Hasher. The problem is that "a" is a prefix of "ab", whereas "a\xff" is not a prefix of "ab\xff". If Hash writes these terminated sequences instead, the problem is solved: "ab\xffc\xff" != "a\xffbc\xff".
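The example above can be sketched in Rust. The types `Recorder`, `Naive`, and `Terminated` and the function `recorded` are hypothetical names introduced only for illustration. `Recorder` is a `Hasher` that stores the bytes it receives so they can be compared directly. This demonstrates the terminator idea only and makes no claim about the exact bytes `std` writes for `str`.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte written to it.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
    fn finish(&self) -> u64 {
        0 // the hash value is irrelevant here; only the byte stream matters
    }
}

/// Hypothetical wrapper that writes only the raw bytes (not prefix-free).
struct Naive<'a>(&'a str);

impl Hash for Naive<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
    }
}

/// Hypothetical wrapper that appends a 0xff terminator (prefix-free).
struct Terminated<'a>(&'a str);

impl Hash for Terminated<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
        state.write_u8(0xff);
    }
}

/// Returns the byte stream that `value` feeds to a hasher.
fn recorded<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = Recorder::default();
    value.hash(&mut recorder);
    recorder.0
}
```

Under these assumptions, `recorded(&(Naive("ab"), Naive("c")))` and `recorded(&(Naive("a"), Naive("bc")))` are both the bytes of "abc", while the `Terminated` versions produce different byte streams.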

Member

@cuviper cuviper Oct 1, 2021

Note: \xff is not actually allowed in string literals, since it would be invalid UTF-8 -- which is also what makes it a useful separator here. You could really write those as byte strings though, b"ab\xffc\xff" != b"a\xffbc\xff".
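A small sketch of why 0xff works as a separator, assuming a hypothetical helper `is_valid_utf8` built on `std::str::from_utf8`: the byte 0xff never occurs in valid UTF-8, so it cannot appear inside the bytes of any `str`.

```rust
/// Hypothetical helper: returns true if `bytes` is valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

Here `is_valid_utf8(b"\xff")` is false, while `is_valid_utf8(b"ab")` is true, and the byte strings `b"ab\xffc\xff"` and `b"a\xffbc\xff"` compare as unequal.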

@m-ou-se m-ou-se added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 3, 2021
@@ -153,9 +153,21 @@ mod sip;
/// Thankfully, you won't need to worry about upholding this property when
/// deriving both [`Eq`] and `Hash` with `#[derive(PartialEq, Eq, Hash)]`.
///
/// ## Prefix collisions
Member Author

Thinking more about this... "Collision" isn't the right term here, is it?

Member

It's not, but I can't think of a better word to use.

@m-ou-se m-ou-se assigned Amanieu and unassigned m-ou-se Oct 6, 2021
Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>
@Amanieu
Member

Amanieu commented Oct 10, 2021

@bors r+ rollup

@bors
Contributor

bors commented Oct 10, 2021

📌 Commit 749194d has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Oct 10, 2021
@pierwill
Member Author

@Amanieu Does this need to be rebased?

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 10, 2021
…askrgr

Rollup of 11 pull requests

Successful merges:

 - rust-lang#88374 (Fix documentation in Cell)
 - rust-lang#88713 (Improve docs for int_log)
 - rust-lang#89428 (Feature gate the non_exhaustive_omitted_patterns lint)
 - rust-lang#89438 (docs: `std::hash::Hash` should ensure prefix-free data)
 - rust-lang#89520 (Don't rebuild GUI test crates every time you run test src/test/rustdoc-gui)
 - rust-lang#89705 (Cfg hide no_global_oom_handling and no_fp_fmt_parse)
 - rust-lang#89713 (Fix ABNF of inline asm options)
 - rust-lang#89718 (Add #[must_use] to is_condition tests)
 - rust-lang#89719 (Add #[must_use] to char escape methods)
 - rust-lang#89720 (Add #[must_use] to math and bit manipulation methods)
 - rust-lang#89735 (Stabilize proc_macro::is_available)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@Amanieu
Member

Amanieu commented Oct 10, 2021

No, it should be fine as it is.

@bors bors merged commit 06cfd0a into rust-lang:master Oct 10, 2021
@rustbot rustbot added this to the 1.57.0 milestone Oct 10, 2021
@pierwill pierwill deleted the prefix-free-hash branch October 19, 2021 14:32
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

std::hash::Hash documentation should suggest that the hash data should be prefix-free
8 participants