Composition of named graphs #57

dbooth-boston · 2019-03-12T20:28:53Z

Named graphs provide a convenient way to group data. But there is no easy standard way to combine them! For example, I would like to be able to say that one graph is composed of several other graphs. Or I would like to apply a reasoner to one graph, to produce results in another graph.

The RDF Pipeline Framework is one attempt to address this (though it still needs a lot more development).

It would be good to have standard ways to express graph composition and manipulation.

"most of all I’d love to see a generic grouping mechanism that is more powerful than RDFs specification of Named Graphs, supporting nesting and composition of named graphs and identification/reification of statements in named graphs (vulgo: quints). Quints are my favoured hammer and they fit many nails".

https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0170.html

draggett · 2019-03-12T21:24:28Z

This should include unnamed graphs as well as named graphs. In principle, unnamed graphs will have an internal identifier which could be implicit (e.g. as in RDF*) or exposed via an API or query syntax. I am exploring some ideas for how to make this simple to use.

dbooth-boston · 2019-03-12T21:40:47Z

Agreed. I should have said that explicitly: both named and unnamed graphs.

amirouche · 2019-03-23T13:59:49Z

> I would like to be able to say that one graph is composed of several other graphs. Or I would like to apply a reasoner to one graph, to produce results in another graph.

This should not be in any standard. I think the best approach to named graphs is the quad store. Basically, it is a triple store with an extra column, which one might call Collection instead. Composing collections is then an advanced use case that boils down to symlinking one collection inside another and relying on the reasoner to traverse the different graphs depending on the query. I have already thought about this; it is very advanced and difficult to query anyway.
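As a rough illustration of the quad-store view described here, the following Python sketch treats a store as a set of (collection, subject, predicate, object) tuples and models the "symlink" as a mapping from a collection to the collections it includes. The names `QuadStore`, `link`, and `triples` are hypothetical and do not come from any RDF library, and the traversal is done in plain code rather than by a reasoner.

```python
class QuadStore:
    """Toy quad store: a triple store plus a 'collection' column (hypothetical)."""

    def __init__(self):
        self.quads = set()   # (collection, subject, predicate, object)
        self.links = {}      # collection -> set of included collections

    def add(self, collection, s, p, o):
        self.quads.add((collection, s, p, o))

    def link(self, parent, child):
        """'Symlink' child into parent, so that parent is composed of child."""
        self.links.setdefault(parent, set()).add(child)

    def _resolve(self, collection, seen=None):
        """Return the collection plus everything reachable through links."""
        seen = set() if seen is None else seen
        if collection in seen:          # guard against cyclic links
            return seen
        seen.add(collection)
        for child in self.links.get(collection, ()):
            self._resolve(child, seen)
        return seen

    def triples(self, collection):
        """All triples visible in a collection, following links."""
        members = self._resolve(collection)
        return {(s, p, o) for (c, s, p, o) in self.quads if c in members}


store = QuadStore()
store.add("g:people", "ex:alice", "ex:knows", "ex:bob")
store.add("g:places", "ex:alice", "ex:livesIn", "ex:paris")
store.link("g:all", "g:people")
store.link("g:all", "g:places")
```

In this sketch `store.triples("g:all")` sees the triples of both linked collections without copying them, which is roughly the "collection in collection" behaviour under discussion.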

draggett · 2019-03-23T17:34:58Z

@amirouche why do you think that we don't need to standardise the ability to express graph compositions? Is it that you think the full range of requirements can be handled in some other way?

I see the potential for annotating arbitrary collections of triples. A given triple could occur in multiple collections, or collections of collections. These could be temporary or persistent. It would be useful to know that a given identifier is for a collection of triples without first needing to dereference it.

amirouche · 2019-03-23T18:55:20Z

It seems to me that existing standards already allow expressing a collection-in-collection kind of relation.

dbooth-boston · 2019-03-24T18:23:32Z

@amirouche , can you please clarify a couple of points?

> I think the best approach to named graphs is the quad store

Agreed, but that is just the implementation. I have not seen standard ways to manipulate those graphs, such as specifying that one graph should be composed of two others, or that one graph should hold the result of applying a set of rules to another graph.
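To make the two operations mentioned here concrete, the following is a minimal Python sketch, not a proposal for a standard. It assumes a dataset modelled as a dictionary from graph name to a set of triples; `compose`, `apply_rules`, and `subclass_rule` are hypothetical helpers, and the single rule is a simplified form of RDFS subclass entailment.

```python
def compose(dataset, target, sources):
    """Set the target graph to the union of the named source graphs."""
    dataset[target] = set().union(*(dataset.get(name, set()) for name in sources))


def apply_rules(dataset, source, target, rules):
    """Store in target everything the rules derive from source (to a fixpoint)."""
    known = set(dataset.get(source, set()))
    derived = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(known) - known
            if new:
                known |= new
                derived |= new
                changed = True
    dataset[target] = derived


def subclass_rule(triples):
    """If x rdf:type C and C rdfs:subClassOf D, then derive x rdf:type D."""
    sub = {(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"}
    return {
        (x, "rdf:type", d)
        for (x, p, c) in triples if p == "rdf:type"
        for (c2, d) in sub if c2 == c
    }


dataset = {
    "g:schema": {("ex:Dog", "rdfs:subClassOf", "ex:Animal")},
    "g:data": {("ex:rex", "rdf:type", "ex:Dog")},
}
compose(dataset, "g:combined", ["g:schema", "g:data"])
apply_rules(dataset, "g:combined", "g:inferred", [subclass_rule])
```

Here `g:combined` is composed of two other graphs and `g:inferred` holds only the triples derived by the rule, so the source graphs stay untouched.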

> existing standards already allow expressing a collection-in-collection kind of relation

Which standards? Can you give an example of how this is expressed between named graphs? I'm not following what you mean.

amirouche · 2019-06-05T19:17:37Z

Sorry for the late reply.

> I have not seen standard ways to manipulate those graphs, such as specifying that one graph should be composed of two others, or that one graph should hold the result of applying a set of rules to another graph.

> Can you give an example of how this is expressed between named graphs?

It seems to me it can be expressed in terms of reasoners / rule engines. I am no longer sure whether rule engines are part of RDF.

madnificent · 2019-06-27T07:27:18Z

We use graphs to enforce access rights using query rewriting. Being able to perform set operations on graphs whilst executing the SPARQL query would have greatly reduced the effort. Even now, it would increase the practical expressiveness of the solution.

dbooth-boston · 2019-06-27T12:03:48Z

@madnificent, I am curious about the query rewriting that you mention. Can you explain a little more about it?

madnificent · 2019-07-05T10:24:48Z

Sure thing.

Some context: We have a microservices architecture in which microservices write/read data from a SPARQL endpoint in a shared semantic model. Splitting off access rights helps microservice reuse (see On microservice reuse and authorization). The general concept was first presented at ESWC2015/USEWOD (direct link).

All information regarding the application is stored in the triplestore in a well-known manner. We have an authorization layer around the SPARQL endpoint used by our services (see: mu-authorization). When a request comes in, the microservice receives the session URI and forwards it with each request to the triplestore. Based on the session URI and information in the triplestore, the triplestore itself can identify the access rights of a user. These access rights are shared through the stack (see On sharing authorization). Based on these access rights we can rewrite the SPARQL query so only the right content can be seen or manipulated.

The current authorization system consumes these access rights. It parses the received SPARQL query and converts it into a series of objects as per the SPARQL 1.1 EBNF. Based on the type of query, it is manipulated in order to read content from the right graphs and to write it to the right graphs. Reading is currently done by wrapping statements into GRAPH/UNION statements or by adding FROM statements. Writing is a bit more complex. We first materialize all triples to INSERT/DELETE by executing the WHERE blocks, interpreting them and materializing the INSERT/DELETE template that belongs with them. The quads which need to be inserted or deleted are then compared to the access rights and are scoped so they are inserted into or removed from the right graphs. Pushing the changes through other consumers in the stack allows caches to be cleared.
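The write path described above (materialize the triples, then scope them to graphs according to the access rights) can be sketched roughly as follows. This is not the mu-authorization implementation: the representation of access rights as (graph, predicate function) pairs and the name `scope_quads` are assumptions made purely for illustration.

```python
def scope_quads(triples, access_rights):
    """Map each materialized triple to every graph the session may write it to.

    access_rights is a hypothetical list of (graph, accepts) pairs, where
    accepts(triple) says whether that graph may hold the triple.
    """
    quads = set()
    rejected = set()
    for triple in triples:
        targets = [graph for graph, accepts in access_rights if accepts(triple)]
        if not targets:
            rejected.add(triple)       # no graph is allowed to hold this triple
        for graph in targets:
            quads.add((graph,) + triple)
    return quads, rejected


# Hypothetical rights for one session: names are public, and anything
# about ex:alice may go to her private graph.
access_rights = [
    ("g:public", lambda t: t[1] == "foaf:name"),
    ("g:private/alice", lambda t: t[0] == "ex:alice"),
]
materialized = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:salary", "1000"),
    ("ex:bob", "ex:salary", "2000"),
}
quads, rejected = scope_quads(materialized, access_rights)
```

In this sketch the name triple lands in both graphs, the salary of `ex:alice` only in her private graph, and the triple about `ex:bob` is rejected because no graph available to the session accepts it.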

@langens-jonathan and I set out strategies for sharing information between actors at Semantics 2016. However, we ran into logical problems with respect to the options of SPARQL queries. Off the top of my head (correct me if I'm wrong @langens-jonathan), we figured out there's (active or passive) pushing of information, (active or passive) pulling of information, and a hive-mind in which people cooperate on the same dataset. In order to materialize the first four cases there's the need to overwrite data. Hence the triplestore would need the option to state "I want to query on all information from this first graph MINUS this second graph." We don't seem to be able to express this right now without materializing the data in the triplestore.
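The "first graph MINUS second graph" view asked for here amounts to a set difference that is evaluated at query time instead of being stored. A minimal Python sketch under that assumption follows; the dataset layout, graph names, and the helpers `graph_minus` and `match` are all hypothetical.

```python
def graph_minus(dataset, first, second):
    """Triples of `first` that do not occur in `second`, computed on the fly."""
    return dataset.get(first, set()) - dataset.get(second, set())


def match(triples, s=None, p=None, o=None):
    """Very small triple-pattern match; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


dataset = {
    "g:pushed": {
        ("ex:doc1", "ex:status", "draft"),
        ("ex:doc2", "ex:status", "final"),
    },
    "g:overridden": {("ex:doc1", "ex:status", "draft")},
}

# Query the difference without writing a third graph into the dataset.
visible = graph_minus(dataset, "g:pushed", "g:overridden")
statuses = match(visible, p="ex:status")
```

The dataset still contains only the two original graphs afterwards; the difference exists only for the duration of the query, which is the behaviour the comment says cannot currently be expressed without materializing data.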

dbooth-boston added the labels "Category: language features" (for language features of RDF itself: model and syntax) and "higher-level" (higher-level RDF should address this) on Mar 12, 2019

amirouche mentioned this issue on Jun 5, 2019 in Property Graphs #45 (open)
