-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat/rdfstar support #311
feat/rdfstar support #311
Conversation
…den in the rdf-star spec
@RubenVerborgh this is ready for review but I don't have permissions to mark you as a reviewer. |
Thanks for the catch @TallTed - this PR essentially is changing over from implementing Also to summarise the contents of this PR:
What is not done:
|
Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration. |
My use of the quad terminology arises from the fact that the primitive in RDF/JS is a quad and not a triple. So in this PR the convention is that any nested triple (as referred to by the spec), is emmitted as a nested RDF/JS quad where the graph term MUST be the default graph. This is also enforced in the parsing changes in this PR where nested graph terms cause errors. However given that the store currently already supports things outside of the spec (e.g. literals as subjects/predicates); I have made the design decision to not enforce this requirement on store operations; that is, nested quads in the store may contain graph terms that are not the default graph. As I have already discussed with @rubensworks the main thing that needs to be done is to align in the RDFJS standard on a way of representing nested triples. In my view there are 3 ways of going about this:
To give a concrete example then if we have the following nquads statement
would, in RDFJS, become under option 1 quad(
quad(
namedNode('http://example.org/a'),
namedNode('http://example.org/b'),
namedNode('http://example.org/c'),
null,
),
namedNode('http://example.org/p'),
namedNode('http://example.org/o'),
namedNode('http://example.org/g'),
) under option 2 (and as is currently implemented in this PR) quad(
quad(
namedNode('http://example.org/a'),
namedNode('http://example.org/b'),
namedNode('http://example.org/c'),
DEFAULT_GRAPH,
),
namedNode('http://example.org/p'),
namedNode('http://example.org/o'),
namedNode('http://example.org/g'),
) under option 3 quad(
quad(
namedNode('http://example.org/a'),
namedNode('http://example.org/b'),
namedNode('http://example.org/c'),
namedNode('http://example.org/g'),
),
namedNode('http://example.org/p'),
namedNode('http://example.org/o'),
namedNode('http://example.org/g'),
) |
Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop. Further, to my mind, I can't see how not building to the RDF-star spec (even as it's still a draft) cannot leave something to be desired in what you're calling "full support for RDF-star". But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store". I suppose time will tell. |
I think rdfjs/types#34 (comment) summarizes it nicely. That is, the data is already validated when we do the data exchange (in particular when we are parsing it before adding it to the store, or serializing to send it somewhere else); so there isn't a need to apply additional validation at this processing/storage stage. A good example of where enforcing such interfaces during data processing is problematic is in #296 where the reasoner is working directly with the internal index and has zero knowledge of what types of terms the id represents.
This store correctly implements the |
@jeswr, are there specific reasons why this should be enforced, i.e. why we cannot simply ignore the graph term of quoted triples/quads without erroring? |
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things in the lexer which I'll sort out myself; then this is good to merge.
Question: did the addition of getSplits
surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.
termFromId(id[1], factory, true), | ||
termFromId(id[2], factory, true), | ||
id[3] && termFromId(id[3], factory, true) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any use of the Store will still require termFromId this since terms can't be recovered from a hash.
But that could be an internal function then indeed!
// ### Constructs a term from the given internal string ID | ||
export function termFromId(id, factory) { | ||
export function termFromId(id, factory, nested) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So only by itself; gotcha.
That's exactly what the follow up PR #318 does :) |
It blew up with dozens of errors on the There were not bugs beyond this - but having it pass on everything else definitely gives me confidence that the lexer is unlikely to be missing edge cases related to chunking. I did also discover that the parser blocks on the following case (and there is a commented out test for this in it('should parse no chunks (i.e. onEnd called immediately)',
shouldParseChunks([])); |
Great! Let me know if there is any further work required for this PR on my end :) |
@RubenVerborgh - when you get around to revisiting this, it is probably worth setting Line 31 in 520054a
rdf-parse which do not recognise -star or -* content types.
|
I've used it like this and it looks perfect in the Turtle output: const quad1 = quad(
namedNode('http://example.org/a'),
namedNode('http://example.org/b'),
namedNode('http://example.org/c'),
)
const quad2 = quad(
quad1,
namedNode('http://example.org/d'),
namedNode('http://example.org/e'),
) Maybe it'd be worth adding a RDF-star example to the Readme? It might not be immediately clear that the way to do it is to use one quad within another one 🤔 |
@benjaminaaron - I've added an example to the writing section :). Feel free to PR any other examples you want. |
@RubenVerborgh I've refactored the lexer and enabled Given that this has been open for a few months - I'd be inclined to merge as is and any other comments can be applied as patch releases? Same goes for RubenVerborgh/SPARQL.js#160. |
This PR adds full support for RDF-star with all spec tests passing.
Closes #272
Closes #304
Related to #256 (What I consider lacking for that to be closed is specialised indexing for nested triples, and a match method that allows you to match based on patterns in nested triples).