
creates a docker image with Virtuoso preloaded with the latest DBpedia dataset



Wimmics/dbpedia-virtuoso-sparql-endpoint-quickstart


French DBpedia Chapter Infrastructure

Welcome to the source page of the French DBpedia infrastructure. This project extends the Virtuoso SPARQL endpoint quickstart built by the DBpedia Association.

This page presents the main features of the pipeline designed to refine the French DBpedia knowledge graph. For more details on how to build a Virtuoso endpoint, please refer to this Wimmics repository.

You will find three Docker Compose files here:

  • docker-compose.dbpedia-load.yml: used to build our knowledge base
  • docker-compose.dbpedia.yml: used to host the final endpoint
  • docker-compose.live.yml: used to build our instance of DBpedia Live, available for academic purposes only
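The following minimal sketch shows how each of the three files maps to its role. The hypothetical Python helper builds the command that starts one of them, and it assumes the Docker Compose v2 CLI (`docker compose -f <file> up -d`). The role names `load`, `serve` and `live` and the function `compose_up_command` are illustrative and are not part of this repository.

```python
# Hypothetical mapping from an illustrative role name to the Compose file
# that this README describes for that role.
COMPOSE_FILES = {
    "load": "docker-compose.dbpedia-load.yml",  # builds the knowledge base
    "serve": "docker-compose.dbpedia.yml",      # hosts the final endpoint
    "live": "docker-compose.live.yml",          # DBpedia Live (academic use only)
}


def compose_up_command(role: str, detached: bool = True) -> list:
    """Build (but do not run) the argv for `docker compose -f <file> up`."""
    if role not in COMPOSE_FILES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(COMPOSE_FILES)}")
    cmd = ["docker", "compose", "-f", COMPOSE_FILES[role], "up"]
    if detached:
        cmd.append("-d")  # run the containers in the background
    return cmd


if __name__ == "__main__":
    print(" ".join(compose_up_command("load")))
```

You can pass the returned list to `subprocess.run(cmd, check=True)`. Building the argv separately from running it means the mapping can be tested without Docker installed.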

About the extension of the Virtuoso instance

The common container used in all three configurations extends the official Virtuoso Docker image. Because our data is organised in named graphs, we had to adapt the VAD interface so that, for a given entity, it displays all the properties contained in every named graph.

We simply install this corrected VAD interface in the container.
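This sketch does not reproduce the adapted VAD package. It only illustrates what "all the properties of an entity across every named graph" means. The first function builds a SPARQL query that matches the entity inside `GRAPH ?g { ... }`, so every named graph is searched. The second function is an in-memory Python equivalent that groups (property, value) pairs by graph. The function names and the example graph IRIs are hypothetical.

```python
from collections import defaultdict


def describe_across_graphs_query(entity_iri):
    """Build a SPARQL query listing every (graph, property, value) for one entity."""
    return (
        "SELECT ?g ?p ?o WHERE {\n"
        f"  GRAPH ?g {{ <{entity_iri}> ?p ?o }}\n"  # ?g ranges over all named graphs
        "} ORDER BY ?g ?p"
    )


def properties_by_graph(quads, entity_iri):
    """In-memory equivalent: group an entity's (property, value) pairs by named graph.

    `quads` is an iterable of (graph, subject, property, value) string tuples.
    """
    grouped = defaultdict(list)
    for graph, subject, prop, value in quads:
        if subject == entity_iri:
            grouped[graph].append((prop, value))
    return dict(grouped)


if __name__ == "__main__":
    paris = "http://fr.dbpedia.org/resource/Paris"
    print(describe_across_graphs_query(paris))
```

Without the `GRAPH ?g` pattern, a query would normally see only the default graph. That is why data spread over several named graphs needs this kind of adaptation in order to be displayed together.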

The French DBpedia pipeline

The process works on the latest release of our Databus collection, downloaded via the collection downloader.

This pipeline is run by the second container of docker-compose.dbpedia-load.yml, called "load". This container runs a master bash script that runs or skips each step of the SPARQL refinement process, depending on the configuration given in the docker-compose file.
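The repository's master script is written in bash and is not shown here. The following Python sketch only illustrates the switching idea: the steps run in a fixed order, and a step runs only if its flag is set. The sketch makes several hypothetical assumptions. Each flag is assumed to be an environment variable whose name matches one of the steps listed in this README. Its value is assumed to be something like `true` or `1`. The accepted values, `enabled_steps`, `run_pipeline` and the handler mapping are illustrative.

```python
import os

# Step names as listed in this README, in pipeline order.
# "PROCESS_DUMPS" is an assumed spelling of the "PROCESS DUMPS" step.
STEPS = [
    "FILTER_WIKIDATA_LABELS", "PROCESS_INIT", "PROCESS_GEOLOC", "PROCESS_WIKIDATA",
    "PROCESS_MULTILANG", "COMPUTE_STATS_MULTILANG", "CLEAN_MULTILANG",
    "CLEAN_WIKIDATA", "PROCESS_STATS", "PROCESS_DUMPS",
]
TRUTHY = {"1", "true", "yes", "on"}  # assumed flag values, case-insensitive


def enabled_steps(env=None):
    """Return the steps whose flag has a truthy value, keeping pipeline order.

    Assumption: flags are environment variables set in the Compose file.
    """
    env = os.environ if env is None else env
    return [step for step in STEPS if env.get(step, "").strip().lower() in TRUTHY]


def run_pipeline(handlers, env=None):
    """Call handlers[step]() for each enabled step and return the steps that ran."""
    ran = []
    for step in enabled_steps(env):
        handlers[step]()
        ran.append(step)
    return ran


if __name__ == "__main__":
    print(enabled_steps({"PROCESS_INIT": "true", "PROCESS_STATS": "1"}))
```

The order comes from `STEPS` and not from the environment. Because of this, disabling a step never reorders the remaining ones.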

The pipeline consists of the following steps. Each step can be enabled or disabled through the value given in the docker-compose.dbpedia-load.yml file:

  • FILTER_WIKIDATA_LABELS: filter the wikidata_labels dataset to keep only the Wikidata entities that have a French label
  • PROCESS_INIT: load the data into separate named graphs
  • PROCESS_GEOLOC: update the shape of the geodata triples, because they may refer to geocoordinates found in the article that are not necessarily related to the resource the Wikipedia article describes
  • PROCESS_WIKIDATA: invert the sameAs links of the Wikidata sameAs dataset and propagate them into all the other Wikidata named graphs
  • PROCESS_MULTILANG: link labels to the French resources that have a Wikidata sameAs relation, and tag these triples according to their language
  • COMPUTE_STATS_MULTILANG: compute statistics used to compare coverage across chapters
  • CLEAN_MULTILANG: delete all labels that are not related to a French resource
  • CLEAN_WIKIDATA: delete Wikidata triples that have no French label and no link to a French resource
  • PROCESS_STATS: compute general statistics for every named graph, and specific statistics for infobox data from DBpedia and Wikidata
  • PROCESS DUMPS: save and export dumps of the produced and enriched graphs
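The following sketch applies two of these steps to in-memory triples to make them more concrete. For FILTER_WIKIDATA_LABELS, it keeps only the entities that have a French label. For PROCESS_WIKIDATA, it inverts owl:sameAs links and then uses them to rewrite Wikidata subjects to French resources. This is one possible reading of the steps and not the SPARQL the pipeline actually runs. The sketch assumes that the sameAs dataset has the Wikidata-side IRI as subject and the chapter resource as object. Every function name is hypothetical.

```python
# Toy triples: (subject, predicate, object) tuples of plain strings.
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FR_PREFIX = "http://fr.dbpedia.org/resource/"


def filter_french_labels(label_triples):
    """FILTER_WIKIDATA_LABELS sketch: keep all triples of entities that have an @fr label."""
    has_fr = {s for s, p, o in label_triples if p == LABEL and o.endswith("@fr")}
    return [(s, p, o) for s, p, o in label_triples if s in has_fr]


def invert_same_as(same_as_triples):
    """PROCESS_WIKIDATA sketch, part 1: turn `wikidata sameAs fr` into `fr sameAs wikidata`.

    Assumption: subjects are Wikidata-side IRIs; only links to French resources are kept.
    """
    return [(o, p, s) for s, p, o in same_as_triples
            if p == SAME_AS and o.startswith(FR_PREFIX)]


def propagate(wikidata_triples, inverted_same_as):
    """PROCESS_WIKIDATA sketch, part 2: rewrite linked Wikidata subjects to French resources.

    Subjects without a French sameAs link are left unchanged.
    """
    to_fr = {wd: fr for fr, _, wd in inverted_same_as}  # Wikidata IRI -> French IRI
    return [(to_fr.get(s, s), p, o) for s, p, o in wikidata_triples]


if __name__ == "__main__":
    wd = "http://wikidata.dbpedia.org/resource/Q90"
    same_as = [(wd, SAME_AS, FR_PREFIX + "Paris"),
               (wd, SAME_AS, "http://de.dbpedia.org/resource/Paris")]
    inverted = invert_same_as(same_as)
    print(inverted)
    print(propagate([(wd, "http://example.org/p", "v")], inverted))
```

On a real Virtuoso instance, these operations would run as SPARQL updates over the named graphs and not as Python over lists. The sketch only shows the data transformation.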

Languages

  • Shell 99.5%
  • Dockerfile 0.5%