-
@idomingu, and you can express the situation where data is pulled from some source periodically and transformed into RDF using a particular transformation... something like the example here.
-
Hi! For integrating data changes, you might have a look at this paper under review: https://www.semantic-web-journal.net/content/incrml-incremental-knowledge-graph-construction-heterogeneous-data-sources. As for executing it periodically, I don't think RML itself needs extensions; that is a job for the implementation, since RML is a schema transformation.
-
There are scenarios that require periodic construction of the knowledge graph from logical sources such as REST APIs or RDBMSs.
So far, RML engines that pull data from sources (i.e., batch data sources) perform this process as a one-time job. To support periodic batch jobs, we have implemented scheduling mechanisms that leverage features provided by the infrastructure. For example, a containerized RML engine like Morph-KGC is deployed on Kubernetes, and the CronJob feature of k8s is used to orchestrate the scheduling of the container. This approach decouples RML from the technology used for knowledge graph construction; however, it has a negative effect on data lineage.
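For concreteness, here is a minimal sketch of what that scheduling can look like as a Kubernetes CronJob wrapping a containerized Morph-KGC run. The image name, schedule, config file, and volume names are placeholders, not an official setup:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rml-kg-construction
spec:
  # Placeholder schedule: run the construction every day at 02:00.
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: morph-kgc
              image: morph-kgc:latest        # placeholder image tag
              args: ["config.ini"]           # engine config pointing at the mappings and sources
              volumeMounts:
                - name: mappings
                  mountPath: /data
          volumes:
            - name: mappings
              configMap:
                name: rml-mappings           # placeholder ConfigMap holding config.ini and the RML mappings
```

Note that the periodicity lives entirely in the `schedule` field of the CronJob, outside anything the RML mapping itself describes.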
Storing the RML mappings together with the data in the knowledge graph helps us keep track of data lineage and, therefore, improve the data quality of the knowledge graph. In a scenario of periodic data integration, however, the knowledge graph would be missing the periodicity, since it was defined outside RML (in Kubernetes in this case).
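Purely as an illustration of what keeping that information next to the mapping could look like, here is a Turtle sketch; `ex:refreshSchedule` is a made-up property, not part of RML or any proposed extension:

```turtle
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix ex:  <http://example.org/ns#> .        # hypothetical vocabulary

<#SensorMapping>
  rml:logicalSource [
    rml:source "sensors.json" ;                # simplified source description
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.items[*]"
  ] ;
  rr:subjectMap [ rr:template "http://example.org/sensor/{id}" ] ;
  # Hypothetical annotation keeping the periodicity with the mapping,
  # so it is not lost when the mapping is stored in the knowledge graph:
  ex:refreshSchedule "0 2 * * *" .
```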
I see two options here:
Many thanks!