-
-
Notifications
You must be signed in to change notification settings - Fork 424
What are Discriminated Unions?
Consider that all types in programming languages represent sets of possible values. For example, bool
is a set that has two possible values: true
and false
; uint
represents the set of natural numbers from 0
to UInt32.MaxValue
.
And so, if types represent sets, then what happens if we create a union of sets?
Let's say we have the following enum:
public enum Colour
{
Red,
Green,
Blue
}
And then create a union of Colour
and bool
. Then the set of possible values are:
true
false
Red
Green
Blue
But what happens if we union create a union between bool
and bool
? Well, we just get:
true
false
Because duplicate items become one in a union operation. However, if we tag each set (each type) with an additional unique value then we get what's called a Discriminated Union (also known as: tagged union, variant, variant record, choice-type, disjoint union, sum-type or co-product). The easiest way to show that would be to use a tuple, with the first element being the tag and the second element being the value of the type. So, if we try to create a union of Colour
and bool
then we get:
(1, true)
(1, false)
(2, Red)
(2, Green)
(2, Blue)
The integers being the tag in this case.
If we try to union bool
and bool
, we get:
(1, true)
(1, false)
(2, true)
(2, false)
They are discriminated by their 'tag'.
In most functional-first programming languages discriminated-unions are a first-class concept. This is an example from Haskell:
data Maybe a = Just a
| Nothing
Here Just
and Nothing
are type constructing functions that construct a Maybe a
(like Maybe<A>
in C#), but we can think of them as the tag for each set of values the various cases represent.
data Example = First Bool
| Second Bool
This simple example above would look like this in the tuple form shown earlier in this article:
(First, True)
(First, False)
(Second, True)
(Second, False)
The total number of values that are represented by a discriminated union is the sum of all the possible values in the individual cases. Which is why they are often referred to as sum types.
In C# we don't have sum-types yet. And so, the easiest way to represent them is to create a base-class or interface, and then a finite set of derived types:
public interface Maybe<A>
{
public static Maybe<A> Just(A value) => new Just<A>(value);
public static Maybe<A> Nothing() => new Nothing<A>();
}
public class Just<A> : Maybe<A>
{
public readonly A Value;
public Just(A value) =>
Value = value;
}
public class Nothing<A> : Maybe<A>
{
}
This is obviously less convenient that the terse declarations in Haskell, and so we can achieve something similar by using the LanguageExt.CodeGen
:
[Union]
public interface Maybe<A>
{
Maybe<A> Just(A value);
Maybe<A> Nothing();
}
NOTE: The big caveat here is that C# can't do completeness checking when using union-types with pattern-matching
So, what are they useful for? Well, the key idea is that they represent a finite set of states, and the mapping of that finite set to another (via pattern matching) is more powerful and easier to reason about than the abstract/extend everything approach of the OO-world. Most programming is actually working on more concrete concepts than we're lead to believe by the OOP purists, and so creating a data model that more closely represents the number of states your application can be in is a good thing.
The above is true most of the time, but the reader should familiarise her or himself with The Expression Problem