The way your data is modeled is crucial to the complexity of your software.
We say that a data model A
is better than a data model B
<=> A
eliminates more impossible cases than B
. Let me show you what that means in practice.
Screen example
Imagine that you have to implement a screen with three states - loading, success, and error. A naive way to model its UI state could be:
data class ScreenUiState(
val content: String?,
val loading: Boolean,
val error: String?
)
The problem with this approach is that our code will have to deal with many impossible (illegal) states. For example:
- What to show if
loading = false
,content = null
,error = null
? - What to do if we have both
loading = true
anderror != null
?
There are so many ways things can go wrong - for example, a common one is forgetting to reset loading
back to false
.
A better way to model this would be to use Algebraic Data types (ADTs) or, simply said in Kotlin: data classes
, sealed interfaces
, and combinations of both.
sealed interface ScreenUiState {
data object Loading : ScreenUiState
data class Content(val text: String) : ScreenUiState
data class Error(val msg: String) : ScreenUiState
}
With the ADTs representation, we eliminate all impossible cases. We also eliminate that at compile-time, meaning that whatever shit we do - the compiler will never allow the code to run.
Takeaway: Model your data using data classes
, and sealed interfaces
(and combinations of them) in a way that:
- Mirrors your domain one-to-one, exactly and explicitly.
- Impossible cases are eliminated by construction and at compile-time.
You can have
sealed interfaces
(sum types) insidedata classes
(product types) and vice-versa. Combine them as you see fit.
Unfortunately, not all impossible cases can be eliminated at compile time. For example, imagine that we're building a food ordering app and want to model its domain. A naive data model could be:
data class Order(
val id: UUID,
val userId: UUID,
val itemId: UUID,
val count: Int,
val time: LocalDateTime,
val trackingId: String,
)
I'm making this up, but the goal is to demonstrate common mistakes and how to fix them. Do you spot them?
Let's think and analyze:
- What if someone orders a
count = 0
or even worse acount = -1
? - Imagine a function
placeOrder(orderId: UUID, userId: UUID, itemId: UUID, ...)
. How likely is someone to pass a wrongUUID
and mess UUIDs up? - The
trackingId
seems to be required and important but what if someone passestrackingId = ""
ortrackingId = "XYZ "
?
I can go on, but you see the point. So let's discuss how we can fix it.
data class Order(
val id: OrderId,
val user: UserId,
val item: ItemId,
val count: PositiveInt,
val time: Instant, // <-- always in UTC
val trackingId: NotBlankTrimmedString
)
@JvmInline
value class OrderId(val value: UUID)
@JvmInline
value class UserId(val value: UUID)
@JvmInline
value class ItemId(val value: UUID)
@JvmInline
value class PositiveInt private constructor(val value: Int) {
companion object : Exact<Int, PositiveInt> {
override val exactName = "PositiveInt"
override fun Raise<String>.spec(raw: Int): PositiveInt {
ensure(raw > 0) { "$raw is not > 0" }
ensure(raw.isFinite()) { "Is not a finite number" }
return PositiveInt(raw)
}
}
}
PositiveInt.from(42)
// Either.Right(PositiveInt(42))
PositiveInt.from(-5)
// Either.Left("-5 is not > 0")
The revised data model takes more code, but it gives you one important property:
If any of your functions accepts an instance of
order: Order
, you immediately know that it's a valid order and no validation logic is required.
This is validation by construction and it eliminates undesirable cases asap which greatly simplifies your domain logic. By making our data model explicit we fixed:
- Order
count
of zero, negative, or infinity by explicitly requiring aPositiveInt
(unfortunately, that happens at runtime because the compiler can't know if a given integer is positive or not by just looking at the code). - The
UUID
s now can't be messed up because the compiler will give you an error, if for example you try to passUserId
to a function acceptingOrderId
. - The
time
is now always in UTC by usingInstant
. - The
trackingId
is trimmed and can't be blank.
To learn more about Exact types you can check the Arrow Exact GitHub repo. The benefit of explicit data models is correctness and reduced complexity of your core logic.
Not all types should be exact. For example, we make an exception for DTOs and entities where working with primitives is easier. However, we still use ADTs and everything in the domain layer where the business logic must be exact and explicit.