adaptation join spec (#36)
## What's changed
<!-- Give a concise description of the change -->
Adapted the specification of joins

## Change checklist
- [x] updated ontology, where necessary
- [x] updated shapes, where necessary
- [x] added or updated test cases, where necessary
- [x] any TODOs have been turned into trackable issues and referenced
where necessary

## Issue reference
<!-- For example: -->
<!-- Fixes #{ISSUE}. -->
<!-- Resolves #{ISSUE}. -->

---------

Co-authored-by: Pano Maria <pano.maria@gmail.com>
elsdvlee and pmaria authored Jul 31, 2024
1 parent 1b9a592 commit 8fc6d70
Showing 17 changed files with 394 additions and 147 deletions.
1 change: 0 additions & 1 deletion ontology/documentation/ontology.ttl
@@ -116,7 +116,6 @@ rml:LogicalView rdf:type owl:Class ;

### http://w3id.org/rml/LogicalViewJoin
rml:LogicalViewJoin rdf:type owl:Class ;
rdfs:subClassOf rml:Join ;
rdfs:isDefinedBy <http://w3id.org/rml/lv/> .


282 changes: 172 additions & 110 deletions spec/section/joins.md
@@ -1,128 +1,43 @@
## Logical view joins {#viewjoins}

A <dfn>logical view join</dfn> (`rml:LogicalViewJoin`) is an operation that extends the logical iteration of one logical view (the child logical view) with fields from another logical view (the parent logical view).
A <dfn>logical view join</dfn> (`rml:LogicalViewJoin`) is an operation that extends the logical iteration of one logical view (the [=child logical view=]) with fields derived from another logical view (the [=parent logical view=]).

A [=logical view join=] (`rml:LogicalViewJoin`) MUST contain:
- exactly one parent logical view property (`rml:parentLogicalView`), whose value is a [=logical view=] (`rml:LogicalView`) that supplies the additional fields.
- at least one join condition property (`rml:joinCondition`), whose value is a [=join condition=] that describe which values are compared to join the two logical views.
- at least one field property (`rml:field`), whose value is a [=field=] (`rml:Field`) . This field MAY only contain references to fields that exists in the parent logical view.
- exactly one parent logical view property (`rml:parentLogicalView`), whose value is a [=logical view=] (`rml:LogicalView`) that supplies the additional fields, fulfills the role of the <!-- TODO reference to core parent logical source when available-->[parent logical source]() in the <a data-cite="RML-Core#dfn-join-condition">join condition(s)</a> of the [=logical view join=], and is referred to as <dfn>parent logical view</dfn>.
- at least one join condition property (`rml:joinCondition`), whose value is a <a data-cite="RML-Core#dfn-join-condition">join condition</a>.
- at least one field property (`rml:field`), whose value is a [=field=] (`rml:Field`). This field SHOULD only contain references to fields that exist in the parent logical view.

The [=logical view=] in the subject position of the [=join property=], fulfills the role of <!-- TODO reference to core child logical source when available-->[child logical source]() in the <a data-cite="RML-Core#dfn-join-condition">join condition(s)</a> of the [=logical view join=], and is referred to as <dfn>child logical view</dfn>.

| Property | Domain | Range |
|-------------------------|-----------------------|---------------------|
| `rml:parentLogicalView` | `rml:LogicalViewJoin` | `rml:LogicalView` |
| `rml:joinCondition` | `rml:LogicalViewJoin` | `rml:JoinCondition` |
| `rml:field` | `rml:LogicalViewJoin` | `rml:Field` |
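
The cardinality rules above can be illustrated with a small check. This is only a sketch: the dictionary layout (property name mapped to a list of values) and the function name `check_logical_view_join` are assumptions made for illustration and are not part of the specification, which expresses these constraints through its own shapes.

```python
# Hypothetical in-memory form of a logical view join:
# each property name maps to the list of values given for it.
def check_logical_view_join(join):
    """Return a list of violated cardinality rules (empty if none)."""
    problems = []
    if len(join.get("rml:parentLogicalView", [])) != 1:
        problems.append("exactly one rml:parentLogicalView is required")
    if len(join.get("rml:joinCondition", [])) < 1:
        problems.append("at least one rml:joinCondition is required")
    if len(join.get("rml:field", [])) < 1:
        problems.append("at least one rml:field is required")
    return problems


valid_join = {
    "rml:parentLogicalView": [":jsonView"],
    "rml:joinCondition": [{"rml:parent": "name", "rml:child": "name"}],
    "rml:field": [{"rml:fieldName": "item_type", "rml:reference": "item.type"}],
}
# Same join, but without any rml:field.
join_without_field = {
    "rml:parentLogicalView": [":jsonView"],
    "rml:joinCondition": [{"rml:parent": "name", "rml:child": "name"}],
}

print(check_logical_view_join(valid_join))
print(check_logical_view_join(join_without_field))
```

The check covers only the MUST-level cardinalities; the SHOULD-level rule that a field references only fields of the parent logical view is not checked here.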


### Join conditions

<aside class="issue">
Pano: We are redefining joins here. This is already defined in core. I think it would be enough to just point to that definition. This section can be removed or some parts added to core if necessary.
</aside>

<aside class="issue">
Davide L: Adding to the comment by Pano, join conditions for logical views cannot insist over expression maps. See example:

~~~
rml:leftJoin [
rml:parentLogicalView :jsonView ;
rml:joinCondition [
rml:parent "name" ;
rml:child "name" ;
] ;
~~~

`joinCondition` above is referring to fields in two logical views, whereas `rml:joinCondition` in `core` refers to `rml:ExpressionMap`.
</aside>


A <dfn>join condition</dfn> is represented by a resource that MUST contain exactly one value for each of the following two properties:

- a <dfn>child map</dfn> (`rml:childMap`), whose value is an <a data-cite="RML-Core#dfn-expression-map">expression map</a> (`rml:ExpressionMap`),
which MUST include references that exist in the child logical view, or it should have a constant value.

- a <dfn>parent map</dfn> (`rml:parentMap`), whose value is an <a data-cite="RML-Core#dfn-expression-map">expression map</a> (`rml:ExpressionMap`),
which, as the join condition's parent map, MUST include references that exist in the logical view specified by the parent logical view property, or it should have a constant value.

The [=join condition=] returns true when values produced by the child map and the parent map during the iteration are equal.
<aside class="note">
If no data type is specified in the field, string values are compared.
Data types are not taken into account.
`1.0` will not match with `1.00`.
To secure this match a transformation with <a href="https://kg-construct.github.io/rml-fnml/ontology/documentation/index-en.html">RML-FNML:Functions</a> needs to be configured.
TODO describe what happens if the data types are deduced from the source (refer to the description that will be added to rml:core?
</aside>

<aside class="note">
This definition is in line with the definition in RML CORE, with one small difference: it refers directly to a parent logicial source, and not to the logical source of the parent triples map.
</aside>

| Property | Domain | Range |
| --------------------------- | -------------------- | ------------------------- |
| `rml:childMap` | `rml:JoinCondition` | `rml:ExpressionMap` |
| `rml:parentMap` | `rml:JoinCondition` | `rml:ExpressionMap` |

#### Shortcuts

If the value of the [=child map=] property (`rml:childMap`) is a [reference-valued Expression Map](https://kg-construct.github.io/rml-core/spec/docs/#reference-rml-reference),
then the `rml:child` shortcut could be used.

Similarly, if value of the [=parent map=] (`rml:parentMap`) is a [reference-valued Expression Map](https://kg-construct.github.io/rml-core/spec/docs/#reference-rml-reference),
then the `rml:parent` shortcut could be used.

| Property | Domain | Range |
| --------------------------- | -------------------- | ------------------------- |
| `rml:child` | `rml:JoinCondition` | `Literal` |
| `rml:parent` | `rml:JoinCondition` | `Literal` |

<aside class="issue">
Els: or can we just refer to rml core and not specify any definitions here?
</aside>
<aside class="issue">
Els: can we also optionally declare a join function here, to allow not only equijoins (default) but also other joins
</aside>

### Join types {#dfn-join-type}

A [=logical view=] (`rml:LogicalView`) MAY have one or more with join properties, specifying the join type, i.e. a [=left join=] and a [=inner join=].
The <dfn>join property</dfn> specifies the join type of the [=logical view join=], i.e. a [=left join=] or an [=inner join=].

A <dfn>left join</dfn> (`rml:leftJoin`) is the equivalent of a left (outer) join in SQL, where the child logical view is the left part of the join, and the parent logical view is the right part of the join.
After the join operation all logical iterations of the child logical view are kept.
These logical iterations are extended with fields from the parent logical view when a match is found that meets the join conditions.
When more than one logical iteration in the parent logical view matches with a logical iteration in the child logical view, each match leads to an additional extended logical iteration.
If no match is found, the added field in that extended logical iteration contains a null value.
A <dfn>left join</dfn> (`rml:leftJoin`) is the equivalent of a left (outer) join in SQL, where the [=child logical view=] is the left part of the join, and the [=parent logical view=] is the right part of the join. If any of the <a data-cite="RML-Core#dfn-join-condition">join conditions</a> evaluates to `false`, the fields from the [=logical view join=] in the extended logical iteration contain a null value.

A <dfn>inner join</dfn> (`rml:innerJoin`) is the equivalent of an inner join in SQL.
The logical iterations from the child logical view are extended with values from the parent logical view when a match is found that meets the join conditions.
When more than one logical iteration in the parent logical view matches with a logical iteration in the child logical view, each match leads to an additional extended logical iteration.
If no match is found for a logical iteration, the logical iteration is removed from the child logical view.

<aside class="issue">
Pano: If there are more than one join, what is the order of execution?
</aside>
An <dfn>inner join</dfn> (`rml:innerJoin`) is the equivalent of an inner join in SQL. If any of the <a data-cite="RML-Core#dfn-join-condition">join conditions</a> evaluates to `false`, the logical iteration is removed from the [=child logical view=].
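
The difference between the two join types can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: logical iterations are modelled as plain dictionaries, index fields (`#`, `name.#`, and so on) are left out, the helper name `join_views` is hypothetical, and the shape of the parent iterations is assumed from the values shown in the examples on this page.

```python
# Illustrative sketch only: plain dicts stand in for logical iterations.
def join_views(child, parent, conditions, fields, kind="left"):
    """Extend child iterations with fields from matching parent iterations.

    conditions: list of (child_field, parent_field) pairs; all must be equal.
    fields: mapping of new field name -> field name in the parent view.
    kind: "left" keeps unmatched child iterations (new fields become None),
          "inner" drops them.
    """
    result = []
    for c in child:
        matches = [
            p for p in parent
            if all(c[cf] == p[pf] for cf, pf in conditions)
        ]
        if matches:
            # Each matching parent iteration yields one extended iteration.
            for p in matches:
                result.append({**c, **{new: p[old] for new, old in fields.items()}})
        elif kind == "left":
            # No match: keep the child iteration, joined fields are null.
            result.append({**c, **{new: None for new in fields}})
    return result


# Child view: values as shown in the example tables.
csv_view = [
    {"name": "alice", "birthyear": "1995"},
    {"name": "bob", "birthyear": "1999"},
    {"name": "tobias", "birthyear": "2005"},
]
# Parent view: assumed layout, one iteration per item.
json_view = [
    {"name": "alice", "item.type": "sword", "item.weight": "1500"},
    {"name": "alice", "item.type": "shield", "item.weight": "2500"},
    {"name": "bob", "item.type": "flower", "item.weight": "15"},
]
conditions = [("name", "name")]
fields = {"item_type": "item.type", "item_weight": "item.weight"}

left = join_views(csv_view, json_view, conditions, fields, kind="left")
inner = join_views(csv_view, json_view, conditions, fields, kind="inner")
print(len(left), len(inner))  # prints: 4 3
```

With this data the left join keeps the unmatched `tobias` iteration with null joined fields, while the inner join drops it, which matches the 4 versus 3 logical iterations reported in the examples.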

### Logical view join examples

<aside class="issue">
Pano please verify what I did with the iterator and # key from the parent logical view. Is this ok? This should still be described somewhere?
</aside>
<aside class="issue">
Els: TODO add example with 2 joins???
</aside>

### Left join
#### Left join

<aside class=example id=ex-leftjoin>

In this example a [=logical view=] with fields built with data from the logical source form [[[#csviterator]]] is joined with the logical view from [[[#ex-field-record-sequence]]].
In this example a [=logical view=] with fields built with data from the logical source from [[[#csviterator]]] is joined with the logical view from [[[#ex-field-record-sequence]]].
In case of a left join (as in the example), this results in 4 logical iterations in the logical view.
If an inner join had been used, the logical view would have only 3 logical iterations.

<aside class=ex-mapping>

```turtle
:csvView a rml:LogicalView ;
rml:logicalSource :csvSource ;
rml:viewOn :csvSource ;
rml:field [
rml:fieldName "name" ;
rml:reference "name" ;
@@ -161,7 +76,7 @@ If an inner joins would have been used, the logical view would have only 3 logic
<td>birthyear</td>
<td>item_type.#</td>
<td>item_type</td>
<td>item_weight#</td>
<td>item_weight.#</td>
<td>item_weight </td>
</tr>
<tr>
@@ -191,21 +106,21 @@ If an inner joins would have been used, the logical view would have only 3 logic
<tr>
<td>1</td>
<td>(row)</td>
<td>1</td>
<td>0</td>
<td>bob</td>
<td>1</td>
<td>0</td>
<td>1999</td>
<td>2</td>
<td>0</td>
<td>flower</td>
<td>2</td>
<td>0</td>
<td>15 </td>
</tr>
<tr>
<td>2</td>
<td>(row)</td>
<td>2</td>
<td>0</td>
<td>tobias</td>
<td>2</td>
<td>0</td>
<td>2005</td>
<td>null</td>
<td>null</td>
@@ -217,7 +132,7 @@ If an inner joins would have been used, the logical view would have only 3 logic
</aside>
</aside>

### Inner join
#### Inner join
<aside class=example id=ex-innerjoin>

When an inner join is used, the resulting logical view has only 3 logical iterations.
@@ -226,7 +141,7 @@ When an inner join is used, the resulting logical view has only 3 logical iterat

```turtle
:csvView a rml:LogicalView ;
rml:logicalSource :csvSource ;
rml:viewOn :csvSource ;
rml:field [
rml:fieldName "name" ;
rml:reference "name" ;
@@ -265,7 +180,7 @@ When an inner join is used, the resulting logical view has only 3 logical iterat
<td>birthyear</td>
<td>item_type.#</td>
<td>item_type</td>
<td>item_weight#</td>
<td>item_weight.#</td>
<td>item_weight </td>
</tr>
<tr>
@@ -295,14 +210,161 @@ When an inner join is used, the resulting logical view has only 3 logical iterat
<tr>
<td>1</td>
<td>(row)</td>
<td>1</td>
<td>0</td>
<td>bob</td>
<td>0</td>
<td>1999</td>
<td>0</td>
<td>flower</td>
<td>0</td>
<td>15 </td>
</tr>
</table>

</aside>
</aside>


#### Two left joins

<aside class=example id=ex-twoleftjoins>

In this example a second [=logical view join=] is added to the [=logical view=] from [[[#ex-leftjoin]]]. The [=parent logical view=] of this second join is derived from the logical source `:additionalCsvSource`, with the input data below.
<aside class=ex-input>

```csv
name,id
alice,123
bob,456
tobias,789
```
</aside>

<aside class=ex-mapping>

```turtle
:additionalCsvView a rml:LogicalView ;
rml:viewOn :additionalCsvSource ;
rml:field [
rml:fieldName "name" ;
rml:reference "name" ;
] ;
rml:field [
rml:fieldName "id" ;
rml:reference "id" ;
] .
:csvView a rml:LogicalView ;
rml:viewOn :csvSource ;
rml:field [
rml:fieldName "name" ;
rml:reference "name" ;
] ;
rml:field [
rml:fieldName "birthyear" ;
rml:reference "birthyear" ;
] ;
rml:leftJoin [
rml:parentLogicalView :jsonView ;
rml:joinCondition [
rml:parent "name" ;
rml:child "name" ;
] ;
rml:field [
rml:fieldName "item_type" ;
rml:reference "item.type" ;
] ;
rml:field [
rml:fieldName "item_weight" ;
rml:reference "item.weight" ;
] ;
] ;
rml:leftJoin [
rml:parentLogicalView :additionalCsvView ;
rml:joinCondition [
rml:parent "name" ;
rml:child "name" ;
] ;
rml:field [
rml:fieldName "id" ;
rml:reference "id" ;
] ;
] .
```

</aside>

<aside class="ex-intermediate">
<table>
<tr>
<td>#</td>
<td>&lt;it&gt;</td>
<td>name.#</td>
<td>name</td>
<td>birthyear.#</td>
<td>birthyear</td>
<td>item_type.#</td>
<td>item_type</td>
<td>item_weight.#</td>
<td>item_weight</td>
<td>id.#</td>
<td>id</td>
</tr>
<tr>
<td>0</td>
<td>(row)</td>
<td>0</td>
<td>alice</td>
<td>0</td>
<td>1995</td>
<td>0</td>
<td>sword</td>
<td>0</td>
<td>1500 </td>
<td>0</td>
<td>123</td>
</tr>
<tr>
<td>0</td>
<td>(row)</td>
<td>0</td>
<td>alice</td>
<td>0</td>
<td>1995</td>
<td>1</td>
<td>shield</td>
<td>1</td>
<td>2500 </td>
<td>0</td>
<td>123</td>
</tr>
<tr>
<td>1</td>
<td>(row)</td>
<td>0</td>
<td>bob</td>
<td>0</td>
<td>1999</td>
<td>0</td>
<td>flower</td>
<td>0</td>
<td>15 </td>
<td>0</td>
<td>456</td>
</tr>
<tr>
<td>2</td>
<td>(row)</td>
<td>0</td>
<td>tobias</td>
<td>0</td>
<td>2005</td>
<td>null</td>
<td>null</td>
<td>null</td>
<td>null </td>
<td>0</td>
<td>789</td>
</tr>
</table>
