Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/rdfstar support #311

Closed
wants to merge 29 commits into from
Closed

feat/rdfstar support #311

wants to merge 29 commits into from

Conversation

jeswr
Copy link
Collaborator

@jeswr jeswr commented Nov 23, 2022

This PR adds full support for RDF-star with all spec tests passing.

Closes #272
Closes #304
Related to #256 (What I consider lacking for that to be closed is specialised indexing for nested triples, and a match method that allows you to match based on patterns in nested triples).

@jeswr
Copy link
Collaborator Author

jeswr commented Nov 23, 2022

@RubenVerborgh this is ready for review but I don't have permissions to mark you as a reviewer.

src/N3Parser.js Outdated Show resolved Hide resolved
src/N3Parser.js Outdated Show resolved Hide resolved
src/N3Parser.js Outdated Show resolved Hide resolved
src/N3Parser.js Outdated Show resolved Hide resolved
test/N3Lexer-test.js Outdated Show resolved Hide resolved
test/N3Parser-test.js Outdated Show resolved Hide resolved
@jeswr
Copy link
Collaborator Author

jeswr commented Nov 24, 2022

Thanks for the catch @TallTed - this PR essentially is changing over from implementing RDF* to RDF-star. I'd done all but a ctr+f to rename at the end :).

Also to summarise the contents of this PR:

  • Update the parser to be spec compliant with the latest CG test suite.
  • Update the store to handle deeply nested quads.

What is not done:

  • Changes to the writer (I'm not actually sure any are necessary).
  • Better support for matching against patterns in nested triples.

@TallTed
Copy link
Contributor

TallTed commented Nov 28, 2022

Update the store to handle deeply nested quads.

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

@jeswr
Copy link
Collaborator Author

jeswr commented Nov 29, 2022

Note that, as of this writing, the RDF-star extension to RDF (and its serializations) is focused on triples, not quads, so this update may need reconsideration.

My use of the quad terminology arises from the fact that the primitive in RDF/JS is a quad and not a triple. So in this PR the convention is that any nested triple (as referred to by the spec), is emmitted as a nested RDF/JS quad where the graph term MUST be the default graph.

This is also enforced in the parsing changes in this PR where nested graph terms cause errors. However given that the store currently already supports things outside of the spec (e.g. literals as subjects/predicates); I have made the design decision to not enforce this requirement on store operations; that is, nested quads in the store may contain graph terms that are not the default graph.

As I have already discussed with @rubensworks the main thing that needs to be done is to align in the RDFJS standard on a way of representing nested triples. In my view there are 3 ways of going about this:

  1. Make the graph term optional in RDFJS quads, and do not include it in quoted triples (this would be breaking) or allow the graph term to be set to null (also breaking).
  2. Agree that all quoted triples should be represented as RDFJS quads with the DefaultGraph set for the graph term
  3. Agree that all quoted triples should be represented as RDFJS quads with the graph term the same as that of the top level quad.

To give a concrete example then if we have the following nquads statement

<<:a :b :c>> :p :o :g

would, in RDFJS, become

under option 1

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      null,
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 2 (and as is currently implemented in this PR)

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      DEFAULT_GRAPH,
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

under option 3

quad(
   quad(
      namedNode('http://example.org/a'),
      namedNode('http://example.org/b'),
      namedNode('http://example.org/c'),
      namedNode('http://example.org/g'),
   ),
  namedNode('http://example.org/p'),
  namedNode('http://example.org/o'),
  namedNode('http://example.org/g'),
)

@TallTed
Copy link
Contributor

TallTed commented Nov 29, 2022

given that the store currently already supports things outside of the spec

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop. Further, to my mind, I can't see how not building to the RDF-star spec (even as it's still a draft) cannot leave something to be desired in what you're calling "full support for RDF-star".

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store". I suppose time will tell.

@jeswr
Copy link
Collaborator Author

jeswr commented Nov 30, 2022

But perhaps I'm missing something and/or these trade-offs are OK with the community around "[this] store".

I think rdfjs/types#34 (comment) summarizes it nicely. That is, the data is already validated when we do the data exchange (in particular when we are parsing it before adding it to the store, or serializing to send it somewhere else); so there isn't a need to apply additional validation at this processing/storage stage.

A good example of where enforcing such interfaces during data processing is problematic is in #296 where the reasoner is working directly with the internal index and has zero knowledge of what types of terms the id represents.

Building to what "[a] store ... supports" rather than to what a spec requires tends to do horrible things for interop.

This store correctly implements the DatasetCore interface with the default generic parameters according to the RDFJS spec (https://github.com/rdfjs/types/blob/183bda795f57a9464ddf95deac45a0c4a48879cf/dataset.d.ts#L7).

@RubenVerborgh RubenVerborgh added this to the v2.0 milestone Nov 30, 2022
@woutermont
Copy link

[I]n this PR the convention is that any nested triple (as referred to by the spec), is emmitted as a nested RDF/JS quad where the graph term MUST be the default graph. This is also enforced in the parsing changes in this PR where nested graph terms cause errors.

@jeswr, are there specific reasons why this should be enforced, i.e. why we cannot simply ignore the graph term of quoted triples/quads without erroring?

@jeswr jeswr removed their assignment Jan 4, 2023
@jeswr jeswr mentioned this pull request Jan 4, 2023
src/N3Parser.js Outdated Show resolved Hide resolved
src/N3Parser.js Outdated Show resolved Hide resolved
test/N3Parser-test.js Outdated Show resolved Hide resolved
jeswr and others added 3 commits January 5, 2023 12:08
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Copy link
Member

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things in the lexer which I'll sort out myself; then this is good to merge.

Question: did the addition of getSplits surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.

src/N3Lexer.js Outdated Show resolved Hide resolved
src/N3Lexer.js Outdated Show resolved Hide resolved
termFromId(id[1], factory, true),
termFromId(id[2], factory, true),
id[3] && termFromId(id[3], factory, true)
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any use of the Store will still require termFromId this since terms can't be recovered from a hash.

But that could be an internal function then indeed!

// ### Constructs a term from the given internal string ID
export function termFromId(id, factory) {
export function termFromId(id, factory, nested) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So only by itself; gotcha.

src/N3Lexer.js Show resolved Hide resolved
@jeswr
Copy link
Collaborator Author

jeswr commented Jan 11, 2023

But that could be an internal function then indeed! (c.f. #311 (comment))

That's exactly what the follow up PR #318 does :)

@jeswr
Copy link
Collaborator Author

jeswr commented Jan 11, 2023

Question: did the addition of getSplits surface any new bugs? I've been debating with myself for ages whether or not to get this; so far, I've relied on manual additions of cases. I do like the idea and I wonder how impactful it is.

It blew up with dozens of errors on the {| case you had already pointed out; and iirc it also complained at the first attempt at fixing it due to the behavior of a fall-through case.

There were not bugs beyond this - but having it pass on everything else definitely gives me confidence that the lexer is unlikely to be missing edge cases related to chunking.


I did also discover that the parser blocks on the following case (and there is a commented out test for this in N3Parser-test). I am not sure whether this is intended behavior or not.

it('should parse no chunks (i.e. onEnd called immediately)',
    shouldParseChunks([]));

@jeswr
Copy link
Collaborator Author

jeswr commented Jan 11, 2023

Some minor things in the lexer which I'll sort out myself; then this is good to merge.

Great! Let me know if there is any further work required for this PR on my end :)

@jeswr
Copy link
Collaborator Author

jeswr commented Feb 7, 2023

@RubenVerborgh - when you get around to revisiting this, it is probably worth setting

this._supportsRDFStar = format === '' || /star|\*$/.test(format);
to true unless there is a parameter explicitly opting out. Otherwise the parser doesn't play nice with tools like rdf-parse which do not recognise -star or -* content types.

@benjaminaaron
Copy link

I've used it like this and it looks perfect in the Turtle output:

const quad1 = quad(
    namedNode('http://example.org/a'),
    namedNode('http://example.org/b'),
    namedNode('http://example.org/c'),
)

const quad2 = quad(
    quad1,
    namedNode('http://example.org/d'),
    namedNode('http://example.org/e'),
)

Maybe it'd be worth adding a RDF-star example to the Readme? It might not be immediately clear that the way to do it is to use one quad within another one 🤔

@jeswr
Copy link
Collaborator Author

jeswr commented Feb 25, 2023

@benjaminaaron - I've added an example to the writing section :). Feel free to PR any other examples you want.

@jeswr
Copy link
Collaborator Author

jeswr commented Feb 26, 2023

@RubenVerborgh I've refactored the lexer and enabled rdfStar parsing by default.

Given that this has been open for a few months - I'd be inclined to merge as is and any other comments can be applied as patch releases? Same goes for RubenVerborgh/SPARQL.js#160.

This was referenced Mar 24, 2023
@jeswr jeswr closed this Mar 24, 2023
@jeswr jeswr mentioned this pull request Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does not parse RDF lists with RDF-star triples Bug: termFromId errors on deeply nested quads
6 participants