proposal to solve issue #74 #78

Open
wants to merge 7 commits into
base: main

Conversation

elsdvlee
Collaborator

This proposal supposes:

  1. the definition of same logical source: descriptions that lead to the same logical source.
  2. the behaviour in case of a referencing object map / referencing term map without join conditions: natural join in case of same logical sources, error in case of different logical sources (this is in line with the R2RML spec and the old RML spec)


A [Logical Source]() is considered as identical to another [Logical Source]()
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
In below examples `<LS1>` and `<LS2>` are identical, but `<LS1>` and `<LS3>` are not identical.
Collaborator

If I'm reading this right, engines must check whether the values of `rml:source` and `rml:iterator` are string/IRI identical.
If they are, the logical sources are considered the same, right?


**NOTE**
If the [Referencing Object Map]() has no join condition and the [Logical Source]() of the [Triples Map]() that contains the [Referencing Object Map]()
and the [Logical Source]() of the [Referencing Object Map]()'s [Parent Triples Map]() are not identical, the mapping engine MUST report an error.
Collaborator

Doesn't this break R2RML support? AFAIK you can join without a condition, which results in joining everything from LS1 with everything from LS2 (a Cartesian product).

Collaborator

No, R2RML states:

If the child query and parent query of a referencing object map are not identical, then the referencing object map must have at least one join condition.
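
As an illustration only, the quoted rule could be checked by an engine roughly as in the Python sketch below. The class and function names (`TriplesMap`, `ReferencingObjectMap`, `check_referencing_object_map`, `MappingError`) are hypothetical, and a logical source is reduced to an opaque comparable key, since how to decide identity is exactly what this PR is still discussing.

```python
from dataclasses import dataclass


class MappingError(Exception):
    """Raised when a mapping violates the rule sketched here."""


@dataclass(frozen=True)
class TriplesMap:
    # Hypothetical stand-in: the logical source is an opaque, comparable key.
    logical_source: str


@dataclass(frozen=True)
class ReferencingObjectMap:
    parent: TriplesMap
    # Pairs of (child reference, parent reference); empty means "no join condition".
    join_conditions: tuple = ()


def check_referencing_object_map(child: TriplesMap, rom: ReferencingObjectMap) -> None:
    """Reject a referencing object map without join conditions across different sources."""
    if rom.join_conditions:
        return  # at least one join condition: always allowed
    if child.logical_source != rom.parent.logical_source:
        raise MappingError(
            "referencing object map without join condition "
            "requires identical logical sources"
        )
```

Under this reading, a mapping without join conditions is accepted only when child and parent share the same logical source; any other mapping without join conditions is reported as an error.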

Collaborator

AFAIK most engines allow this, thus violating the spec?

Collaborator

Yes, I would say that is a violation of the spec. We could of course argue about the usefulness of such behavior.

Collaborator

Hmmm this is of course not enforced in the shapes, it is kinda hard to do that I think.

Collaborator Author

@elsdvlee elsdvlee Feb 1, 2024

@pmaria @DylanVanAssche
I am totally fine with the other solution: no join condition means Cartesian product. I just want a decision to be taken on this matter, that decision to be documented in the spec, and all of us to be aware that it is not in line with the R2RML spec (what is the consequence?).

If we decide to move away from the R2RML spec, I wonder why we still need the exception for 'same logical source'. It would be much clearer if no join condition means cartesian product for any join.

Since no such decision has been taken so far, I tried to write a PR in line with the R2RML spec.

Member

I would vote for the Cartesian product: given that most engines implement it as such, it feels like the more intuitive interpretation of 'no join condition', and I'm all for increasing intuitiveness! :). And we can see it as an extension of R2RML: a Cartesian product allows you to do "more" than throwing an error does.
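
To make the Cartesian-product reading concrete, here is a minimal Python sketch. The function name `join_rows` and the representation of records as dictionaries are assumptions for illustration, not anything an existing engine is known to use. The point it shows is that "no join condition" falls out as the degenerate case of an inner join: a filter over zero conditions keeps every pair.

```python
from itertools import product


def join_rows(child_rows, parent_rows, conditions=()):
    """Pair child and parent records.

    `conditions` is a sequence of (child_key, parent_key) pairs. With no
    conditions, all() over an empty sequence is True, so every pair is kept
    and the result is the full Cartesian product.
    """
    return [
        (child, parent)
        for child, parent in product(child_rows, parent_rows)
        if all(child[ck] == parent[pk] for ck, pk in conditions)
    ]


children = [{"id": 1}, {"id": 2}]
parents = [{"pid": 1}, {"pid": 3}]

print(len(join_rows(children, parents)))            # 4 pairs: Cartesian product
print(join_rows(children, parents, [("id", "pid")]))  # only the matching pair
```

This nested-loop form is quadratic and only meant to show the semantics; a real engine would more likely use a hash join when conditions are present.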

Contributor

FYI. R2RML --> if the queries of two different triples maps use different attributes for the generation of subject maps, then you must have a join condition (in other words, you must do a theta-join). When no join conditions are provided, then the rows of the child queries are used to populate both child and parent subject maps. In that sense, the no-join-conditions case simulates a natural join.

If the referencing object map has no join condition:
`SELECT * FROM ({child-query}) AS tmp`

This is sufficient, and the quote Pano mentioned is confusing. Testing the equivalence of two queries is "ignored" by the community. It was even the subject of a thread a while ago. Are `SELECT * FROM foo X`, `(SELECT * FROM foo Y)` and `(SELECT * FROM (SELECT * FROM foo))` identical or not?
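
For reference, a small Python sketch of how the R2RML joint SQL query is assembled in both cases. The helper name `joint_sql_query` is hypothetical, and column names are inserted without the identifier quoting a real implementation would need. The last lines show why query identity is awkward to test: a purely textual comparison treats the three example queries as different.

```python
def joint_sql_query(child_query, parent_query=None, join_conditions=()):
    """Build the joint SQL query of a referencing object map (simplified).

    `join_conditions` is a sequence of (child_column, parent_column) pairs.
    """
    if not join_conditions:
        # No join condition: only the child query is evaluated.
        return f"SELECT * FROM ({child_query}) AS tmp"
    where = " AND ".join(f"child.{c}=parent.{p}" for c, p in join_conditions)
    return (
        f"SELECT * FROM ({child_query}) AS child, "
        f"({parent_query}) AS parent WHERE {where}"
    )


print(joint_sql_query("SELECT * FROM emp"))
print(joint_sql_query("SELECT * FROM emp", "SELECT * FROM dept", [("deptno", "id")]))

# Textual comparison sees three distinct queries, although they return the same rows.
candidates = [
    "SELECT * FROM foo X",
    "(SELECT * FROM foo Y)",
    "(SELECT * FROM (SELECT * FROM foo))",
]
print(len(set(candidates)))  # prints 3
```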

Collaborator

@pmaria pmaria left a comment

I find it weird that we define equality of logical sources in core when logical source is defined in the IO spec. Also, the terminology we use to define that equality, such as `rml:source` and `rml:iterator`, is introduced only in IO.

Maybe we can just talk about equality here in core without specifying how to determine equality here, but delegate that to the specific spec where the concepts are defined.

@@ -43,7 +43,7 @@ has exactly one value for each of the following two properties:
* a [child map]() (`rml:childMap`),
whose value is an [Expression Map]() (`rml:ExpressionMap`), which
MUST include references that exists in the [Logical Source]()
- of the [Parent Triples Map]() that contains the [Referencing Object Map]()
+ of the [Triples Map]() that contains the [Referencing Object Map]()
Collaborator

This correction makes sense, but the whole sentence doesn't IMO.

I don't think we can say that references exist in a logical source. And I also don't think it's a MUST.

I think the sentence should be something along the lines of:

  • a child map property (`rml:childMap`) whose value is a child map (`rml:ChildMap`).

And then there should be an explanation of what an `rml:ChildMap` is.
Whether the child map expression resolves is not really a concern for the spec IMO.

spec/docs/joinconditions.md Outdated Show resolved Hide resolved
then the referencing object map must have at least one join condition.

A [Logical Source]() is considered as identical to another [Logical Source]()
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
Collaborator

Why property paths?

Suggested change
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
when
* the value of the source property (`rml:source`) of both logical sources is equal, and
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal, and
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property.
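
Read literally, the three criteria in this suggestion amount to a field-by-field comparison. The Python sketch below is one possible reading; the `LogicalSource` class, the function name, and the example values are illustrative assumptions, and property values are simplified to plain strings (so a nested source description would not be handled).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalSource:
    source: str                     # value of rml:source, simplified to a string
    reference_formulation: str      # value of rml:referenceFormulation
    iterator: Optional[str] = None  # value of rml:iterator, None when not specified


def same_logical_source(a: LogicalSource, b: LogicalSource) -> bool:
    """Apply the three criteria: source, reference formulation, iterator."""
    return (
        a.source == b.source
        and a.reference_formulation == b.reference_formulation
        and a.iterator == b.iterator  # also holds when both are None
    )


ls1 = LogicalSource("data.json", "rml:JSONPath", "$.people[*]")
ls2 = LogicalSource("data.json", "rml:JSONPath", "$.people[*]")
ls3 = LogicalSource("data.json", "rml:JSONPath", "$.orders[*]")

print(same_logical_source(ls1, ls2))  # True
print(same_logical_source(ls1, ls3))  # False: iterators differ
```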

Collaborator Author

@pmaria I couldn't think of another way to specify that the descriptions of nested sources should also be equal (so nested source descriptions can have different identifiers but still have the same values for all nested properties, e.g. and are equal in the example, even if they have different identifiers).

Member

Maybe infinite nesting is not something we want to support in a first go. Couldn't we just extend Pano's suggestion with something like:

when
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal,
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property, and
* the value of the source property (`rml:source`) of both logical sources is either equal when the source properties both point to a literal object, or when the source properties point to resource objects, all triples of said resources of both logical sources are equal.

Member

as discussed today:

when
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal,
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property, and
* the RDF subgraphs of the source property (`rml:source`) of both logical sources that only contain RML actionable properties of the source access descriptions are isomorphic.

this last point means that each source access description needs to list its actionable properties
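
One way to picture the "actionable properties" idea is the Python sketch below. It is an approximation rather than a true graph-isomorphism test: it assumes tree-shaped (acyclic) descriptions, ignores the identifiers of nested nodes, and collapses duplicate nested descriptions. The function names and the `ex:` property names are hypothetical; the graph is a plain list of (subject, predicate, object) tuples.

```python
def canonical_description(graph, node, actionable):
    """Return an identifier-free view of `node`, restricted to actionable properties.

    `graph` is a re-iterable collection of (subject, predicate, object) tuples.
    Nested descriptions are followed recursively; cycles are not supported.
    """
    props = frozenset(
        (p, canonical_description(graph, o, actionable))
        for s, p, o in graph
        if s == node and p in actionable
    )
    # A node without actionable properties is a leaf and is compared by value.
    return props if props else node


def same_source_description(graph, a, b, actionable):
    """Compare two source descriptions on their actionable properties only."""
    return canonical_description(graph, a, actionable) == canonical_description(
        graph, b, actionable
    )


graph = [
    ("_:s1", "ex:access", "_:a1"),
    ("_:a1", "ex:path", "data.csv"),
    ("_:s1", "rdfs:label", "first source"),  # not actionable, so ignored
    ("_:s2", "ex:access", "_:a2"),
    ("_:a2", "ex:path", "data.csv"),
    ("_:s3", "ex:access", "_:a3"),
    ("_:a3", "ex:path", "other.csv"),
]
actionable = {"ex:access", "ex:path"}

print(same_source_description(graph, "_:s1", "_:s2", actionable))  # True
print(same_source_description(graph, "_:s1", "_:s3", actionable))  # False
```

Here `_:s1` and `_:s2` compare as equal despite different identifiers and an extra label, which matches the nested-source concern raised earlier in this thread.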

Collaborator

My last understanding of our discussion was that we were not going to go for fully isomorphic sources, but for explicitly defining which parts of each source type should be isomorphic. This would be my preference.

spec/docs/joinconditions.md Outdated Show resolved Hide resolved
In reality this means that the [Logical Source]() is used in its original form when generating the related RDF triples.

If the [Referencing Object Map]() has one or more join conditions, an inner join is executed.
The related RDF triples are generated using the [=n-ary Cartesian product=]
Collaborator

If we can, we should refer to a general description of generating triples, so that we don't have to repeat here that we use the n-ary Cartesian product.

Collaborator Author

@pmaria Can you please make a suggestion?

spec/docs/joinconditions.md Outdated Show resolved Hide resolved
elsdvlee and others added 4 commits February 1, 2024 22:04
Co-authored-by: Pano Maria <pano.maria@gmail.com>
Co-authored-by: Pano Maria <pano.maria@gmail.com>
Co-authored-by: Pano Maria <pano.maria@gmail.com>
Member

@bjdmeest bjdmeest left a comment

added some inline comments

5 participants