GraphPusher is a tool that automatically builds an RDF store from the information in a VoID file.
The GraphPusher tool takes a VoID URL as input from the command line, retrieves the VoID file, looks for the void:dataDump
property values in the VoID description, HTTP GETs them, and finally imports them into an RDF store using one of the graph name methods. The graph name method is defined as part of GraphPusher's configuration.
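For reference, a minimal VoID description of the kind GraphPusher consumes might look like the following (all URIs here are placeholders; sd:name is optional and, when present, is used as the graph name):

    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .

    <http://example.org/dataset>
        a void:Dataset ;
        sd:name <http://example.org/graph/dataset> ;
        void:dataDump <http://example.org/dumps/data.nt.gz> .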
This script has been tested and is functional on Debian/Ubuntu.
See also: http://csarven.ca/statistical-linked-dataspaces#graphpusher
Requirements (a sample install command follows the list):
- Ruby (required libraries: rubygems, net/http, net/https, uri, fileutils, filemagic)
- tar, gzip, unzip, 7za, rar (used to extract compressed data dumps)
- Raptor RDF Syntax Library and its rapper RDF parser utility
- Fuseki's SOH script (included in this package)
- TDB RDF store (optional: used when the tdbAssembler setting is set)
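On Debian/Ubuntu, the non-Ruby dependencies can usually be pulled in from the package manager; the package names below are illustrative and vary by release (e.g., raptor-utils vs. raptor2-utils), so treat this as a sketch rather than an exact recipe:

    sudo apt-get install tar gzip unzip p7zip-full unrar raptor-utils libmagic-dev
    sudo gem install ruby-filemagic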
Configuration settings (a sketch of example values follows the list):
- basedir : Location to store the retrieved dumps
- dataset : Dataset name for the store
- tdbAssembler : Path to the TDB assembler file
- graphNameMethod : Method used to derive SPARQL graph names (see below)
- graphNameBase : Base URL for graph names
- os : Operating system name, used to determine newline characters and directory separators
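As a rough illustration only (the exact syntax inside the script may differ), the settings map onto values like these; all paths and URLs are placeholders taken from or modeled on the usage examples below:

    # Hypothetical sketch; setting names follow the list above, but the
    # real configuration syntax in GraphPusher.rb may differ.
    basedir         = '/var/data/dumps/'                    # where retrieved dumps are stored
    dataset         = 'http://localhost:3030/dataset/data'  # Graph Store endpoint of the dataset
    tdbAssembler    = '/usr/lib/fuseki/tdb2_slave.ttl'      # optional TDB assembler file
    graphNameMethod = 'dataset'                             # one of: dataset | dataDump | filename
    graphNameBase   = 'http://example.org/graph/'           # base URL when graphNameMethod is filename
    os              = 'unix'                                # controls newlines and path separators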
Importing data dumps into an RDF store via a VoID file:
Usage: ruby GraphPusher.rb VOIDURL [OPTIONS]
Examples: ruby GraphPusher.rb http://example.org/void.ttl --assembler=/usr/lib/fuseki/tdb2_slave.ttl
          ruby GraphPusher.rb http://example.org/void.ttl --dataset=http://localhost:3030/dataset/data
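For context, the import step goes through Fuseki's SOH script; conceptually it amounts to one s-put call per graph, along these lines (the graph name and file are placeholders):

    s-put http://localhost:3030/dataset/data http://example.org/graph/dataset data.nt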
The graph name used at the SPARQL endpoint is chosen by the graphNameMethod setting, which takes one of the following values (listed from highest to lowest priority):
- dataset (default)
- dataDump
- filename
By default, if sd:name is present in the VoID description, it is used as the SPARQL graph name; otherwise the dataset URI is used. If dataDump or filename is set, it is used instead of dataset. When filename is set, the graph name at the SPARQL endpoint is built from the base URL value (graphNameBase).
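A hypothetical sketch of that selection logic (the method and variable names here are illustrative, not taken from the script):

    # Illustrative only: one way the graph name could be derived per dump.
    def graph_name(method, sd_name, dataset_uri, dump_url, base_url)
      case method
      when 'dataDump' then dump_url                             # the dump's own URL
      when 'filename' then base_url + File.basename(dump_url)   # e.g. graphNameBase + 'data.nt.gz'
      else                 sd_name || dataset_uri                # 'dataset': prefer sd:name if present
      end
    end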
TODO:
- Ability to use dataDump files from a local network drive
- Retrieval of the VoID graph from a SPARQL endpoint
- Create one N-Triples file per graph and load it into the RDF store