feat/rdfstar support #311

jeswr · 2022-11-23T07:40:01Z

This PR adds full support for RDF-star with all spec tests passing.

Closes #272
Closes #304
Related to #256 (What I consider lacking for that to be closed is specialised indexing for nested triples, and a match method that allows you to match based on patterns in nested triples).

jeswr · 2022-11-23T07:46:10Z

@RubenVerborgh this is ready for review but I don't have permissions to mark you as a reviewer.

jeswr · 2022-11-24T06:21:33Z

Thanks for the catch @TallTed - this PR is essentially changing over from implementing RDF* to RDF-star. I'd done all but a Ctrl+F to rename at the end :).

Also to summarise the contents of this PR:

Update the parser to be spec compliant with the latest CG test suite.
Update the store to handle deeply nested quads.

What is not done:

Changes to the writer (I'm not actually sure any are necessary).
Better support for matching against patterns in nested triples.
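
As an illustration of what matching against patterns in nested triples could look like (this is not implemented in this PR, and matchesPattern is a hypothetical helper, not an N3.js API), a recursive matcher might treat null as a wildcard and descend into quoted quads. Plain objects stand in for RDF/JS terms here:

```javascript
// Hypothetical sketch; plain objects stand in for RDF/JS terms.
const namedNode = value => ({ termType: 'NamedNode', value });
const quad = (subject, predicate, object) =>
  ({ termType: 'Quad', value: '', subject, predicate, object });

// A pattern is null (wildcard), a term, or a quad-shaped pattern
// whose components are themselves patterns.
function matchesPattern(term, pattern) {
  if (pattern === null) return true;
  if (term.termType !== pattern.termType) return false;
  if (term.termType !== 'Quad') return term.value === pattern.value;
  return ['subject', 'predicate', 'object'].every(
    key => matchesPattern(term[key], pattern[key]));
}

const a = namedNode('http://example.org/a');
const b = namedNode('http://example.org/b');
const c = namedNode('http://example.org/c');
const p = namedNode('http://example.org/p');
const o = namedNode('http://example.org/o');

const data = quad(quad(a, b, c), p, o);
// "Any statement whose subject is a quoted triple with predicate :b"
const pattern = quad(quad(null, b, null), null, null);
const matched = matchesPattern(data, pattern);
```

A store-level version would presumably also need the specialised indexing mentioned in the PR description to avoid scanning every quad.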

TallTed · 2022-11-28T16:47:59Z

Update the store to handle deeply nested quads.

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

jeswr · 2022-11-29T02:56:50Z

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

My use of the quad terminology arises from the fact that the primitive in RDF/JS is a quad and not a triple. So in this PR the convention is that any nested triple (as referred to by the spec) is emitted as a nested RDF/JS quad where the graph term MUST be the default graph.

This is also enforced in the parsing changes in this PR, where nested graph terms cause errors. However, given that the store already supports things outside of the spec (e.g. literals as subjects/predicates), I have made the design decision not to enforce this requirement on store operations; that is, nested quads in the store may contain graph terms that are not the default graph.

As I have already discussed with @rubensworks the main thing that needs to be done is to align in the RDFJS standard on a way of representing nested triples. In my view there are 3 ways of going about this:

1. Make the graph term optional in RDFJS quads, and do not include it in quoted triples (this would be breaking), or allow the graph term to be set to null (also breaking).
2. Agree that all quoted triples should be represented as RDFJS quads with the DefaultGraph set for the graph term.
3. Agree that all quoted triples should be represented as RDFJS quads with the graph term the same as that of the top-level quad.

To give a concrete example, the following N-Quads statement

<<:a :b :c>> :p :o :g

would, in RDFJS, become

under option 1

quad(
  quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
    null,
  ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 2 (and as is currently implemented in this PR)

quad(
  quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
    DEFAULT_GRAPH,
  ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 3

quad(
  quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
    namedNode('http://example.org/g'),
  ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)
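
The three options differ only in the graph term given to the inner quad. A minimal sketch of that difference, using plain objects as stand-ins for RDF/JS terms (the helpers term, makeQuad and quoteTriple are hypothetical, not part of N3.js or RDF/JS):

```javascript
// Plain-object stand-ins for RDF/JS terms; these helpers are hypothetical.
const term = (termType, value) => ({ termType, value });
const namedNode = iri => term('NamedNode', iri);
const DEFAULT_GRAPH = term('DefaultGraph', '');

function makeQuad(subject, predicate, object, graph) {
  return { termType: 'Quad', value: '', subject, predicate, object, graph };
}

// Build the inner (quoted) quad under one of the three options above.
function quoteTriple(s, p, o, outerGraph, option) {
  switch (option) {
    case 1: return makeQuad(s, p, o, null);          // no graph term at all
    case 2: return makeQuad(s, p, o, DEFAULT_GRAPH); // always the default graph
    case 3: return makeQuad(s, p, o, outerGraph);    // inherit the outer graph
    default: throw new Error(`Unknown option: ${option}`);
  }
}

const ex = local => namedNode(`http://example.org/${local}`);
const g = ex('g');

// One outer quad per option; only subject.graph differs between them.
const outerQuads = [1, 2, 3].map(option =>
  makeQuad(quoteTriple(ex('a'), ex('b'), ex('c'), g, option), ex('p'), ex('o'), g));
```

Under options 1 and 2 the inner quad is the same no matter which graph the outer statement sits in, whereas under option 3 the same quoted triple would compare as different terms when asserted in different graphs.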

TallTed · 2022-11-29T04:54:29Z

given that the store currently already supports things outside of the spec

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop. Further, to my mind, not building to the RDF-star spec (even while it's still a draft) is bound to leave something to be desired in what you're calling "full support for RDF-star".

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store". I suppose time will tell.

jeswr · 2022-11-30T00:14:09Z

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store".

I think rdfjs/types#34 (comment) summarizes it nicely. That is, the data is already validated when we do the data exchange (in particular when we are parsing it before adding it to the store, or serializing to send it somewhere else); so there isn't a need to apply additional validation at this processing/storage stage.

A good example of where enforcing such interfaces during data processing is problematic is in #296 where the reasoner is working directly with the internal index and has zero knowledge of what types of terms the id represents.

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop.

This store correctly implements the DatasetCore interface with the default generic parameters according to the RDFJS spec (https://github.com/rdfjs/types/blob/183bda795f57a9464ddf95deac45a0c4a48879cf/dataset.d.ts#L7).

woutermont · 2022-12-01T07:50:32Z

[I]n this PR the convention is that any nested triple (as referred to by the spec) is emitted as a nested RDF/JS quad where the graph term MUST be the default graph. This is also enforced in the parsing changes in this PR where nested graph terms cause errors.

@jeswr, are there specific reasons why this should be enforced, i.e. why we cannot simply ignore the graph term of quoted triples/quads without erroring?

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

RubenVerborgh

Some minor things in the lexer which I'll sort out myself; then this is good to merge.

Question: did the addition of getSplits surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.

RubenVerborgh · 2023-01-10T13:20:08Z

src/N3DataFactory.js

+    termFromId(id[1], factory, true),
+    termFromId(id[2], factory, true),
+    id[3] && termFromId(id[3], factory, true)
+  );
 }


Any use of the Store will still require termFromId, since terms can't be recovered from a hash.

But that could be an internal function then indeed!
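
The point above, that a store working with internal string IDs needs a way to turn an ID back into a term, can be sketched as follows. This is a simplified illustration and not N3.js's actual implementation: toId and fromId are hypothetical names, only named nodes and quads are handled, and the JSON-array encoding of nested quads is an assumption loosely modelled on the diff above.

```javascript
// Simplified, hypothetical sketch; not the N3.js implementation.
const namedNode = value => ({ termType: 'NamedNode', value });
const quad = (subject, predicate, object) =>
  ({ termType: 'Quad', value: '', subject, predicate, object });

// Encode a term as a string ID: IRIs as-is, quads as a JSON array of IDs.
function toId(term) {
  if (term.termType !== 'Quad') return term.value;
  return JSON.stringify(
    [toId(term.subject), toId(term.predicate), toId(term.object)]);
}

// Decode an ID back into a term. Because the ID is a reversible encoding
// rather than a hash, the original term can be rebuilt from it.
function fromId(id) {
  if (id[0] !== '[') return namedNode(id);
  const [s, p, o] = JSON.parse(id);
  return quad(fromId(s), fromId(p), fromId(o));
}

const inner = quad(
  namedNode('http://example.org/a'),
  namedNode('http://example.org/b'),
  namedNode('http://example.org/c'));
const outer = quad(
  inner,
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'));

// Encoding and then decoding yields a structurally identical term.
const roundTripped = fromId(toId(outer));
```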

RubenVerborgh · 2023-01-10T13:21:26Z

src/N3DataFactory.js

 // ### Constructs a term from the given internal string ID
-export function termFromId(id, factory) {
+export function termFromId(id, factory, nested) {


So only by itself; gotcha.

jeswr · 2023-01-11T04:51:47Z

But that could be an internal function then indeed! (c.f. #311 (comment))

That's exactly what the follow-up PR #318 does :)

jeswr · 2023-01-11T04:57:39Z

Question: did the addition of getSplits surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.

It blew up with dozens of errors on the {| case you had already pointed out; and iirc it also complained at the first attempt at fixing it due to the behavior of a fall-through case.

There were no bugs beyond this, but having it pass on everything else definitely gives me confidence that the lexer is unlikely to be missing edge cases related to chunking.

I did also discover that the parser blocks on the following case (and there is a commented out test for this in N3Parser-test). I am not sure whether this is intended behavior or not.

it('should parse no chunks (i.e. onEnd called immediately)',
    shouldParseChunks([]));
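
For context, the idea behind split-based testing is to feed the same input to a streaming consumer in every possible chunking and check that the result never changes. A minimal sketch, in which getSplits only produces two-chunk splits and tokenizeChunks is a toy whitespace tokenizer (both are assumptions for illustration, not the helpers used in this PR's tests):

```javascript
// Hypothetical sketch: all ways to cut a string into two chunks.
function getSplits(input) {
  const splits = [];
  for (let i = 0; i <= input.length; i++)
    splits.push([input.slice(0, i), input.slice(i)]);
  return splits;
}

// Toy streaming tokenizer: buffers a partial token across chunk boundaries.
function tokenizeChunks(chunks) {
  const tokens = [];
  let pending = '';
  for (const chunk of chunks) {
    const parts = (pending + chunk).split(/\s+/);
    pending = parts.pop(); // the last part may continue in the next chunk
    tokens.push(...parts.filter(part => part !== ''));
  }
  if (pending !== '') tokens.push(pending);
  return tokens;
}

const input = '<<:a :b :c>> :p :o {| :q :r |} .';
const expected = tokenizeChunks([input]);

// Every chunking must produce the same tokens as the unsplit input.
const allSplitsAgree = getSplits(input).every(
  chunks => JSON.stringify(tokenizeChunks(chunks)) === JSON.stringify(expected));
```

A tokenizer that emitted a token at every chunk boundary instead of buffering would fail this check as soon as a split landed inside a multi-character token such as {|.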

jeswr · 2023-01-11T05:03:18Z

Some minor things in the lexer which I'll sort out myself; then this is good to merge.

Great! Let me know if there is any further work required for this PR on my end :)

jeswr · 2023-02-07T03:22:33Z

@RubenVerborgh - when you get around to revisiting this, it is probably worth setting

N3.js/src/N3Parser.js

Line 31 in 520054a

this._supportsRDFStar = format === '' || /star|\*$/.test(format);

to true unless there is a parameter explicitly opting out. Otherwise the parser doesn't play nice with tools like rdf-parse which do not recognise -star or -* content types.
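
To make the suggestion concrete, the check on that line can be isolated and compared with an opt-out variant. In the sketch below, supportsRDFStarCurrent mirrors the quoted expression, while supportsRDFStarOptOut and its rdfStar option are hypothetical names for illustration, not an existing N3.js API:

```javascript
// Mirrors the quoted expression: RDF-star is enabled only when the format
// is empty, contains "star", or ends in "*".
function supportsRDFStarCurrent(format) {
  return format === '' || /star|\*$/.test(format);
}

// Hypothetical opt-out variant: enabled unless explicitly disabled.
function supportsRDFStarOptOut(options = {}) {
  return options.rdfStar !== false;
}

const formats = ['', 'turtle', 'turtle-star', 'turtle*', 'n-quads'];
const current = formats.map(supportsRDFStarCurrent);
const optOut = formats.map(format => supportsRDFStarOptOut({ format }));
const disabled = supportsRDFStarOptOut({ format: 'turtle', rdfStar: false });
```

With the current check, a plain format name such as turtle leaves RDF-star disabled, which is the case that trips up callers that only pass standard content types.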

benjaminaaron · 2023-02-24T21:21:51Z

I've used it like this and it looks perfect in the Turtle output:

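import { DataFactory } from 'n3';
const { namedNode, quad } = DataFactory;

// An ordinary triple, later used as the subject of another quad
const quad1 = quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
);

// Nesting quad1 as the subject makes it a quoted (RDF-star) triple
const quad2 = quad(
    quad1,
    namedNode('http://example.org/d'),
    namedNode('http://example.org/e'),
);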

Maybe it'd be worth adding an RDF-star example to the Readme? It might not be immediately clear that the way to do it is to use one quad within another one 🤔

jeswr · 2023-02-25T04:31:54Z

@benjaminaaron - I've added an example to the writing section :). Feel free to PR any other examples you want.

jeswr · 2023-02-26T01:09:22Z

@RubenVerborgh I've refactored the lexer and enabled rdfStar parsing by default.

Given that this has been open for a few months - I'd be inclined to merge as is and any other comments can be applied as patch releases? Same goes for RubenVerborgh/SPARQL.js#160.

jeswr added 8 commits November 23, 2022 13:57

chore: add nested list test

2f7d6e8

fix: support quoted triples in list

37ab09c

breaking: drop support for quads in quoted triples as they are forbidden in the rdf-star spec

f837b8d

feat: support annotated triples

da945e9

chore: error on quoted compound bnodes

624e3e1

feat: turtle-star spec tests are passing

d27b920

chore: fix lint and coverage errors

2af5e05

chore: remove commented code

0ac4c46