HashingExplained

Hashing

Overview

Java's baked-in concept of hash codes is constrained to 32 bits, and provides no separation between hash algorithms and the data they act on, so alternate hash algorithms can't be easily substituted. Also, implementations of hashCode tend to be poor-quality, in part because they end up depending on other existing poor-quality hashCode implementations, including those in many JDK classes.

Object.hashCode implementations tend to be very fast, but have weak collision prevention and no expectation of bit dispersion. This leaves them perfectly suitable for use in hash tables, because extra collisions cause only a slight performance hit, while poor bit dispersion is easily corrected using a secondary hash function (which all reasonable hash table implementations in Java use). For the many uses of hash functions beyond simple hash tables, however, Object.hashCode almost always falls short -- hence com.google.common.hash.

Organization

Looking at the package Javadoc, we see a lot of different types, but it's not obvious how they fit together.

Let's look at a sample piece of code using this library.

HashFunction hf = Hashing.md5();
HashCode hc = hf.newHasher()
       .putLong(id)
       .putString(name, UTF_8)
       .putObject(person, personFunnel)
       .hash();

HashFunction

HashFunction is a pure, stateless function that maps an arbitrary block of data to a fixed number of bits, with the property that equal inputs always yield equal outputs, and unequal inputs yield unequal outputs as often as possible.

Hasher

A HashFunction can be asked for a stateful Hasher, which provides fluent syntax to add data to the hash and then retrieve the hash value. A Hasher can accept any primitive input, byte arrays, slices of byte arrays, character sequences, character sequences in some charset, and so on, or any other Object, provided with an appropriate Funnel.

Hasher implements the PrimitiveSink interface, which specifies a fluent API for an object that accepts a stream of primitive values.

Funnel

A Funnel describes how to decompose a particular object type into primitive field values. For example, if we had

class Person {
  final int id;
  final String firstName;
  final String lastName;
  final int birthYear;
}

our Funnel might look like

Funnel<Person> personFunnel = new Funnel<Person>() {
  @Override
  public void funnel(Person person, PrimitiveSink into) {
    into
        .putInt(person.id)
        .putString(person.firstName, UTF_8)
        .putString(person.lastName, UTF_8)
        .putInt(birthYear);
  }
};

Note: putString("abc", UTF_8).putString("def", UTF_8) is fully equivalent to putString("ab", UTF_8).putString("cdef", UTF_8), because they produce the same byte sequence. This can cause unintended hash collisions. Adding separators of some kind can help eliminate unintended hash collisions.

HashCode

Once a Hasher has been given all its input, its hash() method can be used to retrieve a HashCode. HashCode supports equality testing and such, as well as asInt(), asLong(), asBytes() methods, and additionally, writeBytesTo(array, offset, maxLength), which writes the first maxLength bytes of the hash into the array.

BloomFilter

Bloom filters are a lovely application of hashing that cannot be done simply using Object.hashCode(). Briefly, Bloom filters are a probabilistic data structure, allowing you to test if an object is definitely not in the filter, or was probably added to the Bloom filter. The Wikipedia page is fairly comprehensive, and we recommend this tutorial.

Our hashing library has a built-in Bloom filter implementation, which requires only that you implement a Funnel to decompose your type into primitive types. You can obtain a fresh BloomFilter<T> with create(Funnel funnel, int expectedInsertions, double falsePositiveProbability), or just accept the default false probability of 3%. BloomFilter<T> offers the methods boolean mightContain(T) and void put(T), which are self-explanatory enough.

BloomFilter<Person> friends = BloomFilter.create(personFunnel, 500, 0.01);
for (Person friend : friendsList) {
  friends.put(friend);
}
// much later
if (friends.mightContain(someone)) {
  // the probability that someone reached this place if they aren't a friend is
  // 1% we might, for example, start asynchronously loading things for someone
  // while we do a more expensive exact check
}

Hashing

The Hashing utility class provides a number of stock hash functions and utilities to operate on HashCode objects.

Provided Hash Functions

For a complete list, see the Hashing javadocs.

HashCode Operations

Method	Description
`HashCode combineOrdered(Iterable<HashCode>)`	Combines hash codes in an ordered fashion, so that if two hashes obtained from this method are the same, then it is likely that each was computed from the same hashes in the same order.
`HashCode combineUnordered(Iterable<HashCode>)`	Combines hash codes in an unordered fashion, so that if two hashes obtained from this method are the same, then it is likely that each was computed from the same hashes in some order.
`int consistentHash(HashCode, int buckets)`	Assigns the hash code a consistent "bucket" which minimizes the need for remapping as the number of buckets grows. See Wikipedia for details.

Introduction
Basic Utilities
- Using/avoiding null
  - Optional
- Preconditions
- Conditional Failures
- Ordering
- Object methods
- Throwables
Collections
- Immutable collections
- New collection types
  - Multiset
  - Multimap
  - BiMap
  - Table
  - ClassToInstanceMap
  - RangeSet
  - RangeMap
- Utility Classes
  - Iterables
  - Lists
  - Sets
  - Maps
  - Multisets
  - Multimaps
  - Tables
- Extension Utilities
Graphs
- Definitions
- Capabilities
- Graph types
  - Graph
  - ValueGraph
  - Network
- Building graph instances
  - Builder constraints vs. optimization hints
- Mutable and Immutable graphs
  - Mutable* types
  - Immutable* implementations
- Graph elements (nodes and edges)
- Library contracts and behaviors
- Notes for implementors
- Code examples
- FAQ
Caches
- Applicability
- Population
- Eviction
- Features
  - Statistics
- Interruption
Functional Idioms
- Obtaining
- Using Predicates
- Using Functions
Concurrency
- ListenableFuture
- Service
  - Using
  - Implementations
Strings
- Joiner
- Splitter
- CharMatcher
- Charsets
- CaseFormat
Networking
- InternetDomainName
Primitives
- Primitive arrays
- General utilities
- Byte conversion
- Unsigned support
Ranges
- Building
- Operations
- Discrete Domains
I/O
- Closing Resources
Hashing
- BloomFilter
EventBus
Math
- Integral
  - Overflow Checking
- Floating Point
Reflection
- TypeToken
- Invokable
- Dynamic Proxies
- ClassPath
Releases
- Release Policy
- Release 23
- Release 22
- Release 21
- Release 20
- Release 19
- Release 18
- Release 17
- Release 16
- Release 15
- Release 14
- Release 13
- Release 12
- Release 11
- Release 10
Tips
- Philosophy
- Building with Guava
- Shrinking JARs with ProGuard
- Translating from Apache Commons
- Guava and Compatibility
- Idea Graveyard
- Friends of Guava
- How to Contribute
Glossary
Mailing List
Stack Overflow
Android Overview
Footprint of JDK/Guava data structures

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HashingExplained

Hashing

Overview

Organization

HashFunction

Hasher

Funnel

HashCode

BloomFilter

Hashing

Provided Hash Functions

HashCode Operations

Clone this wiki locally