Skip to content

Latest commit

 

History

History
223 lines (160 loc) · 10.3 KB

Equivalence-Equality.md

File metadata and controls

223 lines (160 loc) · 10.3 KB

Equivalence versus Equality

This page describes what we mean when we say that the data structures in this library are equivalence-aware in a type-safe fashion.

Set is a data structure that doesn't contain duplicate elements. An implementation of Set must therefore have a way to compare elements for "sameness". A useful notion of sameness is equivalence, i.e. a binary relation that is reflexive, symmetric and transitive. Any sane implementation of Set is equiped with some equivalence relation on its element type.

Here's the catch: For any type with more than 1 inhabitant there are multiple valid equivalence relations. We cannot (in general) pick one that is suitable in all contexts. For example, are these two binary trees same?

  +            +
 / \          / \
1   +        +   3
   / \      / \
  2   3    1   2

It depends on the context. They clearly have different structure, but they are both binary search trees containing the same elements. For a balancing algorithm, they are different trees, but as an implementation of Set, they represent the same set of integers.

Despite the non-uniqueness, there is one equivalence relation that stands out: equality. Two objects are considered equal when they are indistinguishable to an observer. Formally, equality is required to have the substitution property:

  • ∀ a,b ∈ A, ∀ f ∈ (A -> B):    a=Ab ⟹ f(a)=Bf(b)

(Here, =A denotes equality on A, =B denotes equality on B.) Equality is the finest equivalence: whenever two elements are equal, they are necessarily equivalent with respect to every equivalence.

Popular Scala libraries take one of these two approaches when dealing with comparing elements for "sameness": they require either

  1. Equality. This is the approach currently taken by cats. Instances of the cats.Eq[A] typeclass are required to have all the properties of equality, including the substitution property above. The problem with this approach is that for some types, such as Set[Int], equality is too strict to be useful:
  • Are values Set(1, 2) and Set(2, 1) equal? For that to be true, they have to be indistinguishable by any function. Let's try (_.toList):

    scala> Set(1, 2).toList == Set(2, 1).toList
    res0: Boolean = false

    So, Set(1, 2) and Set(2, 1) are clearly not equal. As a result, we cannot use Set[Int] in a context where equality is required (without cheating).

  1. Further unspecified equivalence. This is basically the current approach of scalaz. Although the name scalaz.Equal[A] might suggest equality, instances of this typeclass are only tested for properties of equivalence. As mentioned above, there are multiple valid equivalence relations for virtually any type. When there are also multiple useful equivalences for a type, we are at risk of mixing them up (and the fact that they are usually resolved as implicit arguments only makes things worse).

Equivalence-aware sets (a.k.a. setoids)

Let's look at how we deal with this issue. We define typeclass Equiv with an extra type parameter that serves as a "tag" identifying the meaning of the equivalence.

trait Equiv[A, Eq] {
  def equiv(a: A, b: A): Boolean
}

For the compiler, the "tag" is an opaque type. It only has specific meaning for humans. The only meaning it has for the compiler is that different tags represent (intensionally) different equivalence relations.

An equivalence-aware data structure then carries in its type the tag of the equivalence it uses.

import hasheq._
import hasheq.immutable._
import hasheq.std.int._
scala> HashSet(1, 2, 3, 4, 5)
res0: hasheq.immutable.HashSet[Int] = HashSetoid(5, 1, 2, 3, 4)

What on earth is HashSetoid? Setoid is an equivalence-aware set. HashSetoid is then just a setoid implementated using hash-table. Let's look at the definition of HashSet:

type HashSet[A] = HashSetoid[A, Equality.type]

So HashSet is just a HashSetoid whose equivalence is equality. To create an instance of HashSet[Int] above, we needed to have an implicit instance of Equiv[Int, Equality.type] in scope.

scala> implicitly[Equiv[Int, Equality.type]]
res1: hasheq.Equiv[Int,hasheq.Equality.type] = hasheq.std.int$$anon$1@2f5b9135

For the compiler, Equality is just a rather arbitrary singleton object. It only has the meaning of mathematical equality for us, humans.

There is a convenient type alias provided for equality relation:

type Equal[A] = Equiv[A, Equality.type]
scala> implicitly[Equal[Int]]
res2: hasheq.Equal[Int] = hasheq.std.int$$anon$1@7874c103

So how do we deal with the problem of set equality mentioned above, i.e. that HashSet(1, 2) and HashSet(2, 1) are not truly equal? We just don't provide a definition of equality for HashSet[Int].

scala> implicitly[Equal[HashSet[Int]]]
<console>:22: error: could not find implicit value for parameter e: hasheq.Equal[hasheq.immutable.HashSet[Int]]
       implicitly[Equal[HashSet[Int]]]
                 ^

But that means we cannot have a HashSet[HashSet[Int]]! (Remember, for a HashSet[A], we need an instance of Equal[A], and we just showed we don't have an instance of Equal[HashSet[Int]].)

scala> HashSet(HashSet(1, 2, 3, 4, 5))
<console>:22: error: could not find implicit value for parameter A: hasheq.Hash[hasheq.immutable.HashSet[Int]]
       HashSet(HashSet(1, 2, 3, 4, 5))
              ^

But we can have a HashSetoid[HashSet[Int], E], where E is some equivalence on HashSet[Int].

scala> HashSet.of(HashSet(1, 2, 3, 4, 5))
res5: hasheq.immutable.HashSetoid[hasheq.immutable.HashSet[Int],hasheq.immutable.Setoid.ContentEquiv[Int,hasheq.Equality.type]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))

HashSet.of(elems) is like HashSet(elems), except it tries to infer the equivalence on the element type, instead of requiring it to be equality.

Notice the equivalence tag: Setoid.ContentEquiv[Int, Equality.type]. Its meaning is (again, for humans only) that two setoids are equivalent when they contain the same elements (here, of type Int), as compared by the given equivalence of elements (here, Equality).

The remaining question is: How does this work in the presence of multiple useful equivalences?

Let's define another equivalence on Int (in addition to the provided equality).

// Our "tag" for equivalence modulo 10.
// This trait will never be instantiated.
sealed trait Mod10

// Provide equivalence tagged by Mod10.
implicit object EqMod10 extends Equiv[Int, Mod10] {
  def mod10(i: Int): Int = {
    val r = i % 10
    if (r < 0) r + 10
    else r
  }
  def equiv(a: Int, b: Int): Boolean = mod10(a) == mod10(b)
}

// Provide hash function compatible with equivalence modulo 10.
// Note that the HashEq typeclass is also tagged by Mod10.
implicit object HashMod10 extends HashEq[Int, Mod10] {
  def hash(a: Int): Int = EqMod10.mod10(a)
}

Now let's create a "setoid of sets of integers", as before.

scala> HashSet.of(HashSet(1, 2, 3, 4, 5))
res13: hasheq.immutable.HashSetoid[hasheq.immutable.HashSet[Int],hasheq.immutable.Setoid.ContentEquiv[Int,hasheq.Equality.type]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))

This still works, because HashSet requires an equality on Int, and there is only one in the implicit scope (the newly defined equivalence EqMod10 is not equality). Let's try to create a "setoid of setoids of integers":

scala> HashSet.of(HashSet.of(1, 2, 3, 4, 5))
<console>:25: error: ambiguous implicit values:
 both method hashInstance in object int of type => hasheq.Hash[Int]
 and object HashMod10 of type HashMod10.type
 match expected type hasheq.HashEq[Int,Eq]
       HashSet.of(HashSet.of(1, 2, 3, 4, 5))
                            ^

This fails, because there are now more equivalences on Int in scope. (There are now also multiple hash functions, which is what the error message actually says.) We need to be more specific:

scala> HashSet.of(HashSet.of[Int, Mod10](1, 2, 3, 4, 5))
res15: hasheq.immutable.HashSetoid[hasheq.immutable.HashSetoid[Int,Mod10],hasheq.immutable.Setoid.ContentEquiv[Int,Mod10]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))

Finally, does it prevent mixing up equivalences? Let's see:

scala> val s1 = HashSet(1,  2,  3,         11, 12, 13    )
s1: hasheq.immutable.HashSet[Int] = HashSetoid(1, 13, 2, 12, 3, 11)

scala> val s2 = HashSet(    2,  3,  4,  5,         13, 14)
s2: hasheq.immutable.HashSet[Int] = HashSetoid(5, 14, 13, 2, 3, 4)

scala> val t1 = HashSet.of[Int, Mod10](1,  2,  3,         11, 12, 13    )
t1: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(1, 2, 3)

scala> val t2 = HashSet.of[Int, Mod10](    2,  3,  4,  5,         13, 14)
t2: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(5, 2, 3, 4)

Combining compatible setoids:

scala> s1 union s2
res16: hasheq.immutable.HashSetoid[Int,hasheq.Equality.type] = HashSetoid(5, 14, 1, 13, 2, 12, 3, 11, 4)

scala> t1 union t2
res17: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(5, 1, 2, 3, 4)

Combining incompatible setoids:

scala> s1 union t2
<console>:27: error: type mismatch;
 found   : hasheq.immutable.HashSetoid[Int,Mod10]
 required: hasheq.immutable.HashSetoid[Int,hasheq.Equality.type]
       s1 union t2
                ^

scala> t1 union s2
<console>:27: error: type mismatch;
 found   : hasheq.immutable.HashSet[Int]
    (which expands to)  hasheq.immutable.HashSetoid[Int,hasheq.Equality.type]
 required: hasheq.immutable.HashSetoid[Int,Mod10]
       t1 union s2
                ^

Conclusion

We went one step further in the direction of type-safe equivalence in Scala compared to what is typically seen out in the wild today. There is nothing very sophisticated about this encoding. I think the major win is that we can design APIs so that the extra type parameter (the "equivalence tag") stays unnoticed by the user of the API as long as they only deal with equalities. As soon as the equivalence tag starts requesting our attention (via an ambiguous implicit or a type error), it is likely that the attention is justified.