proposal to solve issue #74 #78
A [Logical Source]() is considered as identical to another [Logical Source]()
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
In below examples `<LS1>` and `<LS2>` are identical, but `<LS1>` and `<LS3>` are not identical.
If I'm reading this right, engines must check whether `rml:source` and `rml:iterator` are string/IRI identical.
If they are, the LS is considered the same, right?
**NOTE**
If the [Referencing Object Map]() has no join condition and the [Logical Source]() of the [Triples Map]() that contains the [Referencing Object Map]()
and the [Logical Source]() of the [Referencing Object Map]()'s [Parent Triples Map]() are not identical, the mapping engine MUST report an error.
Doesn't this break R2RML support? AFAIK you can join without a condition which results in joining everything from LS1 with everything of LS2 (Cartesian product)
No, R2RML states:
> If the child query and parent query of a referencing object map are not identical, then the referencing object map must have at least one join condition.
AFAIK most engines allow this, thus violating the spec?
Yes, I would say that is a violation of the spec. We could of course argue about the usefulness of such behavior.
Hmmm this is of course not enforced in the shapes, it is kinda hard to do that I think.
@pmaria @DylanVanAssche
I am totally fine with the other solution: no join condition means Cartesian product. I just want a decision to be taken on this matter, that decision to be documented in the spec, and all of us to be aware that it is not in line with the R2RML spec (what is the consequence?).
If we decide to move away from the R2RML spec, I wonder why we still need the exception for 'same logical source'. It would be much clearer if no join condition meant Cartesian product for any join.
Since no such decision has been taken so far, I tried to write a PR in line with the R2RML spec.
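
To make the two options under discussion concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not spec text: the function name `join_without_condition`, the `policy` values `"error"` and `"cartesian"`, and the representation of rows as plain Python values are all hypothetical.

```python
from itertools import product


def join_without_condition(child_rows, parent_rows, same_source, policy):
    """Pair child and parent rows for a referencing object map
    that has NO join condition.

    same_source: True when both logical sources are considered identical.
    policy: hypothetical switch between the two options in this thread:
      "error"     -> R2RML-aligned: differing sources without a join
                     condition are reported as an error
      "cartesian" -> every child row is paired with every parent row
    """
    if same_source:
        # The child rows populate both the child and the parent side.
        return [(row, row) for row in child_rows]
    if policy == "error":
        raise ValueError(
            "no join condition, and the logical sources are not identical"
        )
    if policy == "cartesian":
        return list(product(child_rows, parent_rows))
    raise ValueError(f"unknown policy: {policy!r}")
```

With the `"cartesian"` policy the 'same logical source' case remains a special case in this sketch; dropping that branch would give the "Cartesian product for any join" behaviour mentioned above.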
I would vote for the Cartesian product: given that most engines implement it as such, it feels like the more intuitive interpretation of 'no join condition', and I'm all for increasing intuitiveness! :) And we can see it as an extension of R2RML: the Cartesian product allows you to do "more" than throwing an error does.
FYI. R2RML --> if the queries of two different triples maps use different attributes for the generation of subject maps, then you must have a join condition (in other words, you must do a theta-join). When no join conditions are provided, then the rows of the child queries are used to populate both child and parent subject maps. In that sense, the no-join-conditions case simulates a natural join.
If the referencing object map has no join condition:
SELECT * FROM ({child-query}) AS tmp
This is sufficient, and the quote Pano mentioned is confusing. Testing the equivalence of two queries is "ignored" by the community. It was even the subject of a thread a while ago. Are `SELECT * FROM foo X` and `(SELECT * FROM foo Y)` and `(SELECT * FROM (SELECT * FROM foo))` identical or not?
I find it weird that we define equality of logical sources in core when logical source is defined in the IO spec. Also, the terminology we use to define equality, like `rml:source` and `rml:iterator`, is introduced only in IO.
Maybe we can just talk about equality here in core without specifying how to determine it, and delegate that to the specific spec where the concepts are defined.
@@ -43,7 +43,7 @@ has exactly one value for each of the following two properties:
 * a [child map]() (`rml:childMap`),
 whose value is an [Expression Map]() (`rml:ExpressionMap`), which
 MUST include references that exists in the [Logical Source]()
-of the [Parent Triples Map]() that contains the [Referencing Object Map]()
+of the [Triples Map]() that contains the [Referencing Object Map]()
This correction makes sense, but the whole sentence doesn't IMO.
I don't think we can say that references exist in a logical source. And I also don't think it's a MUST.
I think the sentence should be something along the lines of:
- a child map property (`rr:childMap`) whose value is a child map (`rml:ChildMap`).
And then there should be an explanation of what an `rml:ChildMap` is.
Whether or not the child map expression resolves is not really a concern for the spec IMO.
then the referencing object map must have at least one join condition.

A [Logical Source]() is considered as identical to another [Logical Source]()
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
Why property paths?
when the set of objects at the end of the property paths starting with `rml:source` and starting with `rml:iterator` are identical.
when
* the value of the source property (`rml:source`) of both logical sources is equal, and
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal, and
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property.
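
The three conditions in this suggestion could be checked roughly as follows. This is a hedged Python sketch, not a reference implementation: the `LogicalSource` dataclass, its field names, and the example values are hypothetical, and property values are simplified to strings, so nested source descriptions are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalSource:
    # Hypothetical, simplified model: each property value is a plain string.
    source: str                     # value of rml:source
    reference_formulation: str      # value of rml:referenceFormulation
    iterator: Optional[str] = None  # value of rml:iterator; None if absent


def same_logical_source(a: LogicalSource, b: LogicalSource) -> bool:
    """True when source, reference formulation and iterator are all equal.

    Two absent iterators (None == None) also count as equal, matching the
    "both logical sources do not specify the iterator property" clause.
    """
    return (
        a.source == b.source
        and a.reference_formulation == b.reference_formulation
        and a.iterator == b.iterator
    )


# Illustrative values only; they are not taken from the PR.
ls1 = LogicalSource("people.json", "JSONPath", "$.people[*]")
ls2 = LogicalSource("people.json", "JSONPath", "$.people[*]")
ls3 = LogicalSource("people.json", "JSONPath", "$.pets[*]")
```

In this sketch `same_logical_source(ls1, ls2)` holds and `same_logical_source(ls1, ls3)` does not, because only the iterator differs.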
@pmaria I couldn't think of another way to specify that the descriptions of nested sources should also be equal (so nested source descriptions can have different identifiers, but still have the same values for all nested properties, e.g. and are equal in the example, even if they have different identifiers).
Maybe infinite nesting is not something we want to support in a first go. Couldn't we just extend Pano's suggestion with something like:
when
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal,
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property, and
* the values of the source property (`rml:source`) of both logical sources are equal: when both point to a literal object, the literals are equal; when both point to resource objects, all triples of those resources are equal.
As discussed today:
when
* the value of the reference formulation property (`rml:referenceFormulation`) of both logical sources is equal,
* the value of the iterator property (`rml:iterator`) of both logical sources is equal, or both logical sources do not specify the iterator property, and
* the sub RDF graphs of the source property (`rml:source`) of both logical sources that only contain RML actionable properties of the source access descriptions are isomorphic.
This last point means that each source access description needs to list its actionable properties.
My last understanding of our discussion was that we were not going to go for fully isomorphic sources, but for explicitly defining which parts of each source type should be isomorphic. This would be my preference.
spec/docs/joinconditions.md
Outdated
In reality this means that the [Logical Source]() is used in its original form when generating the related RDF triples.

If the [Referencing Object Map]() has one or more join conditions, an inner join is executed.
The related RDF triples are generated using the [=n-ary Cartesian product=]
If we can, we should refer to a general description of generating triples, so that we don't have to repeat here that we use the n-ary Cartesian product.
@pmaria Can you please make a suggestion?
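
For reference while drafting that wording, the inner join the reviewed text describes for one or more join conditions could be sketched in Python as below. This is an assumption-laden illustration: `inner_join`, the dict-based rows, and the `(child_key, parent_key)` pairs are hypothetical, each reference is treated as single-valued, and the n-ary Cartesian product over multi-valued expressions is not modelled.

```python
def inner_join(child_rows, parent_rows, conditions):
    """Pair each child row with every parent row that satisfies ALL
    join conditions.

    child_rows, parent_rows: lists of dicts (one dict per iteration).
    conditions: list of (child_key, parent_key) pairs; a pair matches
    when the child value equals the parent value.
    """
    return [
        (child, parent)
        for child in child_rows
        for parent in parent_rows
        if all(child[ck] == parent[pk] for ck, pk in conditions)
    ]


# Illustrative data only.
children = [{"dept": "d1", "name": "Ann"}, {"dept": "d2", "name": "Bob"}]
parents = [{"id": "d1"}, {"id": "d3"}]
pairs = inner_join(children, parents, [("dept", "id")])
```

Here `pairs` contains only the Ann/`d1` combination, since no parent row has `id` equal to `d2`.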
Co-authored-by: Pano Maria <pano.maria@gmail.com>
added some inline comments
This proposal supposes: