Modeling Joins in Morphir #77
Replies: 6 comments 4 replies
-
Hi @AttilaMihaly. @stephengoldbaum and @DamianReeves mentioned that perhaps I should comment. calc items products =
items
|> List.flatMap
(\item ->
products
|> List.filter
(\product -> product.id == item.product_id)
|> List.map
(\product ->
{ item_name = item.item_name
, item_count = item.item_count
, total_value = item.item_count * product.price
}
)
) This is basically the way Quill does it: query[Person].flatMap(p =>
query[Address].filter(a =>
p.id == a.person_fk
).map(a =>
PersonAddress(p.name, a.zip)
)
) |
Beta Was this translation helpful? Give feedback.
-
This might be too much for what you are trying to do but, if so inclined, you can technically go even further and change the It would look something like this: calc items products =
items
|> List.flatMap
(\item ->
products
|> Joins.right
(\product -> product.id == item.product_id)
|> List.map
(\product ->
{ item_name = item.item_name
, item_count = item.item_count
, total_value = item.item_count * product.price
}
)
) Then for left-join the only difference is that calc items products =
items
|> List.flatMap
(\item ->
products
|> Joins.left
(\product -> product.id == item.product_id)
|> List.map
|> Maybe.map
(\product ->
{ item_name = item.item_name
, item_count = item.item_count
, total_value = item.item_count * product.price
}
)
) This is exactly how Quill does it. // Inner Join
val pa: Query[PersonAddress] =
query[Person].flatMap(p =>
query[Address].join(a =>
p.id == a.person_fk
).map(a => // a: Address
PersonAddress(p.name, a.zip)
)
)
// Outer Join
val pa: Query[Option[PersonAddress]] =
query[Person].flatMap(p =>
query[Address].leftJoin(a =>
p.id == a.person_fk
).map(a => // a: Option[Address]
a.map(av => // av: Address
PersonAddress(p.name, av.zip)
)
)
) Hope this helps. |
Beta Was this translation helpful? Give feedback.
-
Great insights. There are good business cases for both styles. It seems like they could be nicely combined. |
Beta Was this translation helpful? Give feedback.
-
I realized that there is an even more direct way to express the fact that we expect the lookup to return exactly one value. Instead of a calc : List Item -> Dict String Product -> List Result
calc items products = To this: calc : List Item -> (String -> Product) -> List Result
calc items products = If we do this the query becomes even cleaner: calc items products =
items
|> List.map
(\item ->
{ item_name = item.item_name
, item_count = item.item_count
, total_value = item.item_count * (products item.product_id).price
}
) |
Beta Was this translation helpful? Give feedback.
-
Recently, while working on translating real-world relational data processing pipelines into Morphir, I made a few interesting observations about mapping relational concepts to FP. The fundamental issue with joins is that they can do various things ranging from decreasing the number of rows ( I also found that in most cases this is not something data modelers consider to be a benefit of the relational model. It causes a lot of unexpected behavior such as unintentional duplication of output rows or missing rows which leads to downstream issues. This happens when the intent of the modeler is to look up some extra information for the main entity without changing the number of rows but there are no join operators that would specifically support that. In essence, they want to Doing an outer join can protect you from missing rows but it doesn't protect against duplication. It also makes it look like the intent of the modeler was to assume that a piece of data is optional for a certain use case while the real expectation might be a mandatory lookup and a better behavior would be to report errors for missing data. But again, joins don't support that since a join cannot fail, it just doesn't return any rows. So it looks like the FP model can actually provide benefit to data modelers by making it possible to express their intent more clearly and letting the environment deal with aligning that with reality. The most direct functional way of expressing that you want to look up some required or optional values here is simply using a total or partial function: lookupRequiredValue : a -> b
lookupOptionalValue : a -> Maybe b But, since most of our data is stored in relational databases all we get is a items : List Item
products : List Product
-- This represents a join with an on clause. Unlike a join though it is reusable across multiple queries.
product_of : Item -> Product
product_of item =
Lookup.exactlyOne products
(\product ->
product.id == item.product_id
)
calc : List Result
calc =
items
|> List.map
(\item ->
{ item_name = item.item_name
, item_count = item.item_count
, total_value = item.item_count * (product_of item).price
}
) The |
Beta Was this translation helpful? Give feedback.
-
@AttilaMihaly I think I'm looking at the problem from the other end. Say that instead of starting from a greenfield ability to model business data you have a pre-existing ETL codebase of large queries with deeply nested left/right joins etc... in a large repository of multi-page-long queries. Trying to re-model that kind of codebase has huge risks so it remains "locked up" in SQL for years or even decades. On the other hand, if such a codebase could be hand-transcribed into an SQL-like DSL that contains the same kinds of constructs e.g. left/right joins etc... then suddenly it would become fully portable across technologies with a minimal amount of risk. I think there is tremendous value in that. |
Beta Was this translation helpful? Give feedback.
-
Problem Statement
The business logic we are modeling with Morphir often involves doing lookups to get more data to do further calculations. Traditionally this is done using joins in an ETL pipeline. Joins don't map directly to functional programming so we usually use dictionary lookups to get more data instead.
For example we might want to calculate the total value of each item in an order by joining price information:
You can do the same in Elm with a dictionary lookup:
It looks ok except for that bit with the
Maybe
. Even with all that extra code though it doesn't do exactly what the SQL does. It's essentially doing an outer join with some defaulting. If we want to get the same semantics we have to do something even more complicated:Now we have the same behavior but is this really what we want? Do we want to make it look like a product was never ordered if it's missing from our pricing database for some reason? Probably not. We want things to not fail when this happens but we also want to get notified. How do we achieve that?
Proposed Solution
We recently added a module to the Morphir SDK which has a
required
function that serves as marker for the tooling that a value is required but might not be available at runtime. See the docs for an example: https://package.elm-lang.org/packages/finos/morphir-elm/latest/Morphir-SDK-ValidateIt turns out that we can use that same function to simplify our first example:
If you just ran this Elm code it would fail at runtime when the product is missing. Morphir backends on the other hand can use the extra information that the
required
marker provides. Depending on how you decide to handle it it could generate any of the following:All the above is possible because we retain the modeler's original intent, which is not to filter out or default missing data but to avoid it.
Beta Was this translation helpful? Give feedback.
All reactions