GraphPusher is a tool to automatically build an RDF store based on the information in a VoID file.

csarven/graphpusher

GraphPusher

Overview

GraphPusher is a tool to automatically build an RDF store based on the information in a VoID file.

GraphPusher takes a VoID URL as command-line input, retrieves the VoID file, looks up the void:dataDump property values in the VoID description, fetches each dump with an HTTP GET, and finally imports the dumps into an RDF store using one of the graph name methods. The graph name method is set in GraphPusher's configuration.
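The "look for void:dataDump values" step can be illustrated with a minimal sketch. It assumes the VoID file has already been converted to N-Triples (for example with rapper); the helper name data_dump_urls and the sample data are hypothetical and not taken from GraphPusher itself.

```ruby
# Hypothetical helper: collect void:dataDump object URLs from N-Triples text.
VOID_DATA_DUMP = "http://rdfs.org/ns/void#dataDump"

def data_dump_urls(ntriples)
  ntriples.each_line.map do |line|
    # One N-Triples statement: <subject> <predicate> <object> .
    # Only statements whose object is an IRI are considered.
    match = line.match(/\A\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*\z/)
    match[2] if match && match[1] == VOID_DATA_DUMP
  end.compact.uniq
end

# Made-up sample VoID description in N-Triples form.
sample = <<~NT
  <http://example.org/dataset/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
  <http://example.org/dataset/a> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/a.nt.gz> .
  <http://example.org/dataset/b> <http://rdfs.org/ns/void#dataDump> <http://example.org/dumps/b.rdf> .
NT

puts data_dump_urls(sample)
```

Each returned URL would then be downloaded and loaded into the store; those steps are omitted here.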

This script has been tested and works on Debian/Ubuntu.

See also: http://csarven.ca/statistical-linked-dataspaces#graphpusher

Requirements

  • Ruby (required gems: rubygems, net/http, net/https, uri, fileutils, filemagic)
  • tar, gzip, unzip, 7za, rar
  • Raptor RDF Syntax Library, and rapper RDF parser utility program
  • Fuseki's SOH script (included in this package)
  • TDB RDF store (optional: where tdbAssembler setting is used)
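Before running the script, it may help to confirm that the external tools are installed. The following is a rough sketch, not part of GraphPusher; the helper name executable_on_path? is hypothetical, and the command names are assumed to match the tool names listed above.

```ruby
# Hypothetical check: is a command available in one of the PATH directories?
def executable_on_path?(command, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Command names assumed from the requirements list.
REQUIRED_COMMANDS = %w[tar gzip unzip 7za rar rapper].freeze

missing = REQUIRED_COMMANDS.reject { |command| executable_on_path?(command) }
puts missing.empty? ? "All external tools found." : "Missing: #{missing.join(', ')}"
```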

Configuration

  • basedir : Location to store the dumps
  • dataset : Dataset name for the store
  • tdbAssembler : TDB assembler file
  • graphNameMethod : Graph name method for SPARQL
  • graphNameBase : Base URL for graph names
  • os : Operating system name, used to determine newline style and directory separators

Usage

Importing data dumps into an RDF store via a VoID file:

Usage: ruby graphpusher.rb VOIDURL [OPTIONS]
Examples: ruby GraphPusher.rb http://example.org/void.ttl --assembler=/usr/lib/fuseki/tdb2_slave.ttl
          ruby GraphPusher.rb http://example.org/void.ttl --dataset=http://localhost:3030/dataset/data
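How the --name=value options in the examples might override configured settings can be sketched as follows. This is an illustration only: the default values are made up, and the mapping from option names to setting names (for instance --assembler to tdbAssembler) is an assumption, not GraphPusher's documented behaviour.

```ruby
# Illustrative defaults; only the setting names come from the Configuration list.
DEFAULT_CONFIG = {
  "basedir"         => "/var/tmp/graphpusher/",
  "dataset"         => "http://localhost:3030/dataset/data",
  "tdbAssembler"    => nil,
  "graphNameMethod" => "dataset",
  "graphNameBase"   => "http://example.org/graph/",
  "os"              => "linux"
}.freeze

# Assumed mapping from command-line option names to setting names.
OPTION_TO_SETTING = { "assembler" => "tdbAssembler", "dataset" => "dataset" }.freeze

# Split the argument list into the VoID URL and a settings hash.
def parse_arguments(argv)
  void_url, *options = argv
  config = DEFAULT_CONFIG.dup
  options.each do |option|
    name, value = option.sub(/\A--/, "").split("=", 2)
    setting = OPTION_TO_SETTING[name]
    raise ArgumentError, "unknown option: #{option}" unless setting
    config[setting] = value
  end
  [void_url, config]
end

void_url, config = parse_arguments(
  ["http://example.org/void.ttl", "--assembler=/usr/lib/fuseki/tdb2_slave.ttl"]
)
puts void_url
puts config["tdbAssembler"]
```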

SPARQL Graph names

The graph name used in the SPARQL endpoint is determined by the graphNameMethod setting, which takes one of the following values (from highest to lowest priority):

  • dataset (default)
  • dataDump
  • filename

By default, if sd:name is present in the VoID description, it is used as the SPARQL graph name; otherwise, the dataset URI is used. If graphNameMethod is set to dataDump or filename, that value is used instead of dataset. When filename is used, the base URL value (graphNameBase) is used for the graph name in the SPARQL endpoint.
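The selection rules can be summarised in a small sketch. The function name graph_name and its parameters are hypothetical, and joining graphNameBase with the dump's file name by plain concatenation is an assumption about how the filename method builds the name.

```ruby
require "uri"

# Hypothetical helper: pick a graph name according to graphNameMethod.
def graph_name(method:, dataset_uri:, dump_url:, sd_name: nil, graph_name_base: nil)
  case method
  when "dataDump"
    # Use the dump URL itself as the graph name.
    dump_url
  when "filename"
    # Assumption: base URL and the dump's file name are simply concatenated.
    graph_name_base.to_s + File.basename(URI.parse(dump_url).path)
  else
    # "dataset" (default): prefer sd:name, fall back to the dataset URI.
    sd_name || dataset_uri
  end
end

puts graph_name(
  method: "filename",
  dataset_uri: "http://example.org/dataset/a",
  dump_url: "http://example.org/dumps/a.nt.gz",
  graph_name_base: "http://example.org/graph/"
)
```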

ToDo

  • Ability to use data dump files from a local or network drive
  • Retrieval of the VoID graph from a SPARQL endpoint
  • Create an N-Triples file per graph and load that into RDF store
