Hasher and Hash updates
- str now emits a delimiter for its own length
- str and [u8] hash the same
- Hasher::delimit customizes how a delimiter is handled

Add method `fn delimit(&mut self, len: usize)` to Hasher.

This method makes the hasher emit a delimiter for a chunk of length
`len`. For example, `str` and slices both emit a delimiter for their
length during hashing.

The Hasher impl decides how to implement the delimiter. By default it
emits the whole `usize` as data to the hashing stream.
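
A minimal sketch of that default, assuming a stand-in trait `DelimitHasher` and a `Recorder` that stores the byte stream instead of hashing it (both names are hypothetical, not part of libcore):

```rust
/// Hypothetical stand-in for `Hasher`, reduced to the methods needed here.
trait DelimitHasher {
    fn write(&mut self, bytes: &[u8]);

    /// Default policy: emit the whole `usize` as data.
    fn delimit(&mut self, len: usize) {
        self.write(&len.to_ne_bytes());
    }
}

/// Hypothetical recorder: keeps the bytes a real hasher would consume.
#[derive(Default)]
struct Recorder {
    stream: Vec<u8>,
}

impl DelimitHasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.stream.extend_from_slice(bytes);
    }
    // `delimit` is not overridden, so the default above is used.
}

/// Hash one chunk the way `str` and `[u8]` do after this change:
/// a delimiter for the length, then the bytes.
fn hash_chunk<H: DelimitHasher>(state: &mut H, bytes: &[u8]) {
    state.delimit(bytes.len());
    state.write(bytes);
}

/// Byte stream produced by hashing each chunk in turn.
fn stream_of(chunks: &[&str]) -> Vec<u8> {
    let mut rec = Recorder::default();
    for chunk in chunks {
        hash_chunk(&mut rec, chunk.as_bytes());
    }
    rec.stream
}
```

With this default, hashing `"ab"` feeds the native-endian bytes of `2usize` followed by `ab`, so `("ab", "c")` and `("a", "bc")` produce different streams.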

SipHash will ignore the first delimiter and hash the others as data.
Since it already hashes in the total length, hashing all but one of the
delimiters is equivalent to hashing all lengths.
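
A rough sketch of that policy, assuming a hypothetical `SkipFirst` recorder that stores the byte stream rather than running SipHash. Appending the total length in `finish` is a simplification; the real SipHasher folds the length into its final block.

```rust
/// Hypothetical recorder that mimics the SipHash policy described above.
/// It stores the byte stream instead of hashing it.
#[derive(Default)]
struct SkipFirst {
    stream: Vec<u8>,
    length: usize, // total bytes written so far
}

impl SkipFirst {
    fn write(&mut self, bytes: &[u8]) {
        self.stream.extend_from_slice(bytes);
        self.length += bytes.len();
    }

    fn delimit(&mut self, len: usize) {
        // Nothing written yet: this is the first delimiter, so skip it.
        if self.length > 0 {
            self.write(&len.to_ne_bytes());
        }
    }

    /// Simplification: append the total length to the recorded stream.
    fn finish(mut self) -> Vec<u8> {
        let total = self.length;
        self.stream.extend_from_slice(&total.to_ne_bytes());
        self.stream
    }
}

/// Byte stream produced by delimiting and writing each chunk in turn.
fn stream_of(chunks: &[&str]) -> Vec<u8> {
    let mut h = SkipFirst::default();
    for chunk in chunks {
        h.delimit(chunk.len());
        h.write(chunk.as_bytes());
    }
    h.finish()
}
```

For a single chunk the stream is just the bytes plus the total length. For two chunks the second length sits between them, so `("ab", "c")` and `("a", "bc")` still differ even though the first delimiter was dropped.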

For the next example, take something like farmhash that is not designed
for streaming hashing. It could be implemented like this:

- Every call to Hasher::write runs the whole hashing algorithm.
  The previous hash is XORed with the new result.
- Delimiters are ignored, since the length of each chunk to write is
  already hashed in.
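
The two points above can be sketched as follows. This is only an illustration: `one_shot` uses FNV-1a as a stand-in because farmhash is not in the standard library, and `OneShotXor` and `hash_chunks` are hypothetical names.

```rust
/// Stand-in one-shot hash (FNV-1a, NOT farmhash): hashes a whole chunk
/// followed by its length in a single call.
fn one_shot(bytes: &[u8]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let len = (bytes.len() as u64).to_le_bytes();
    let mut h = OFFSET;
    for &b in bytes.iter().chain(len.iter()) {
        h ^= u64::from(b);
        h = h.wrapping_mul(PRIME);
    }
    h
}

/// Hypothetical non-streaming hasher built on `one_shot`.
#[derive(Default)]
struct OneShotXor {
    hash: u64,
}

impl OneShotXor {
    /// Run the whole algorithm on this chunk and XOR it into the state.
    fn write(&mut self, bytes: &[u8]) {
        self.hash ^= one_shot(bytes);
    }

    /// Delimiters are ignored: `one_shot` already hashes the chunk length.
    fn delimit(&mut self, _len: usize) {}

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Hash each chunk with a delimiter followed by the bytes.
fn hash_chunks(chunks: &[&str]) -> u64 {
    let mut h = OneShotXor::default();
    for chunk in chunks {
        h.delimit(chunk.len());
        h.write(chunk.as_bytes());
    }
    h.finish()
}
```

A single chunk hashes to exactly `one_shot(chunk)`. Note that XOR is commutative, so in this sketch `("a", "b")` and `("b", "a")` hash to the same value.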

Here is a sketch of how siphash and farmhash could work with this
change:

When hashing a: &[u8]:

- SipHash: `write(a); finish();`
- Farmhash: `hash = write(a); hash`

Both SipHash and Farmhash will hash just the bytes of a string in
a single Hasher::write and a single Hasher::finish.

When hashing (a: &[u8], b: &[u8]):

- SipHash: `write(a); write(b.len()); write(b); finish();`
- Farmhash: `hash = write(a); hash ^= write(b); hash`
bluss committed Aug 27, 2015
1 parent 2375743 commit e8695f9
Showing 3 changed files with 20 additions and 19 deletions.
12 changes: 10 additions & 2 deletions src/libcore/hash/mod.rs
@@ -118,6 +118,13 @@ pub trait Hasher {
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write(&mut self, bytes: &[u8]);
 
+    /// Emit a delimiter for data of length `len`
+    #[inline]
+    #[unstable(feature = "hash_delimit", since = "1.4.0", issue="0")]
+    fn delimit(&mut self, len: usize) {
+        self.write_usize(len);
+    }
+
     /// Write a single `u8` into this hasher
     #[inline]
     #[stable(feature = "hasher_write", since = "1.3.0")]
@@ -230,8 +237,9 @@ mod impls {
     #[stable(feature = "rust1", since = "1.0.0")]
     impl Hash for str {
         fn hash<H: Hasher>(&self, state: &mut H) {
+            // See `[T]` impl for why we write the u8
+            state.delimit(self.len());
             state.write(self.as_bytes());
-            state.write_u8(0xff)
         }
     }

@@ -272,7 +280,7 @@ mod impls {
     #[stable(feature = "rust1", since = "1.0.0")]
     impl<T: Hash> Hash for [T] {
         fn hash<H: Hasher>(&self, state: &mut H) {
-            self.len().hash(state);
+            state.delimit(self.len());
             Hash::hash_slice(self, state)
         }
     }
8 changes: 8 additions & 0 deletions src/libcore/hash/sip.rs
@@ -192,6 +192,14 @@ impl Hasher for SipHasher {
         self.write(msg)
     }
 
+    #[inline]
+    fn delimit(&mut self, len: usize) {
+        // skip the first delimiter
+        if self.length > 0 {
+            self.write_usize(len);
+        }
+    }
+
     #[inline]
     fn finish(&self) -> u64 {
         let mut v0 = self.v0;
19 changes: 2 additions & 17 deletions src/libstd/sys/common/wtf8.rs
@@ -124,7 +124,7 @@ impl CodePoint {
 ///
 /// Similar to `String`, but can additionally contain surrogate code points
 /// if they’re not in a surrogate pair.
-#[derive(Eq, PartialEq, Ord, PartialOrd, Clone)]
+#[derive(Eq, PartialEq, Ord, PartialOrd, Clone, Hash)]
 pub struct Wtf8Buf {
     bytes: Vec<u8>
 }
@@ -382,6 +382,7 @@ impl Extend<CodePoint> for Wtf8Buf {
 ///
 /// Similar to `&str`, but can additionally contain surrogate code points
 /// if they’re not in a surrogate pair.
+#[derive(Hash)]
 pub struct Wtf8 {
     bytes: [u8]
 }
@@ -796,22 +797,6 @@ impl Hash for CodePoint {
     }
 }
 
-impl Hash for Wtf8Buf {
-    #[inline]
-    fn hash<H: Hasher>(&self, state: &mut H) {
-        state.write(&self.bytes);
-        0xfeu8.hash(state)
-    }
-}
-
-impl Hash for Wtf8 {
-    #[inline]
-    fn hash<H: Hasher>(&self, state: &mut H) {
-        state.write(&self.bytes);
-        0xfeu8.hash(state)
-    }
-}
-
 impl AsciiExt for Wtf8 {
     type Owned = Wtf8Buf;
 