The bin/rdfunit script (basic usage) automatically builds the project when run for the first time, so there is no need to build anything manually. Nevertheless, here is what you can do:
$ git clone https://github.com/AKSW/RDFUnit.git
$ cd RDFUnit/
# The following skips the compilation of the webdemo module which takes quite a while to compile
$ mvn -pl rdfunit-validate -am clean install
# or just run "mvn clean install" if you don't mind waiting
# Use the -h argument to see all available options
$ bin/rdfunit -h
# Simple call (Dereferencing or local file)
$ bin/rdfunit -d <dataset-uri> [-s <schema1,schema2,schema3,...>]
# Simple call (Dereferencing when you want to keep the manual tests for a dataset)
$ bin/rdfunit -d <dataset-uri> -u <source-URI> [-s <schema1,schema2,schema3,...>]
# Simple call (SPARQL)
$ bin/rdfunit -d <dataset-uri> -e <endpoint> [-g <graph1,graph2,...>] [-s <schema1,schema2,schema3,...>]
-d <dataset-uri>
e.g. http://dbpedia.org, http://example.com/data.ttl, /home/datasets/data.ttl
-d is required in all cases and states a URI that identifies the tested dataset. It could be http://dbpedia.org for the DBpedia SPARQL endpoint, or again the same for the DBpedia dumps. RDFUnit uses the dataset URI to associate manual test cases specific to a dataset. If no endpoint or -u option is given, RDFUnit assumes that the dataset URI itself is to be tested and tries to test it directly. Note that this can also be a local file, e.g. /home/rdf/data.ttl.
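For example, here is a minimal sketch (the file path and schema choice are illustrative) that validates a local dump directly, without an endpoint or -u:
# Test a local Turtle file against the foaf vocabulary
$ bin/rdfunit -d /home/rdf/data.ttl -s foaf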
-s <schema1,schema2,schema3,...>
e.g. foaf,skos,prov,http://my.ontology.com/ns/core
Schemas are also required for running an evaluation. You can use known prefixes, e.g. foaf or skos, and RDFUnit automatically resolves the namespaces through the LOV endpoint or the schemaDecl.csv file. Note that any entries in the file will override the entries retrieved from LOV. When -s is missing, RDFUnit will try to identify all schemata automatically. At the moment, automatic identification is limited to schemata described in LOV or schemaDecl.csv.
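As a sketch, the first call below mixes a known prefix with an explicit namespace URI (reusing the placeholder URI from the example above), and the second omits -s to trigger automatic identification:
# Mix a known prefix with a full namespace URI
$ bin/rdfunit -d http://example.com/data.ttl -s skos,http://my.ontology.com/ns/core
# Omit -s and let RDFUnit discover schemata via LOV or schemaDecl.csv
$ bin/rdfunit -d http://example.com/data.ttl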
-u <source-URI>
e.g. http://example.com/data.ttl, /home/datasets/data.ttl
When we want to test a dataset directly we can use the bin/rdfunit -d <dataset-uri> -s <schemas> option. However, when we have associated manual test cases for a dataset URI and the actual URL is different, or the dump is downloaded locally, -u defines the actual location of the data and -d is used for loading any associated manual test cases.
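As a sketch (the local dump path is hypothetical), the manual test cases registered for http://dbpedia.org are applied while the data itself is read from a local file:
# -d selects the associated manual test cases, -u points at the actual data
$ bin/rdfunit -d http://dbpedia.org -u /home/datasets/dbpedia-dump.ttl -s foaf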
-e <endpoint> -g <graph1,graph2,...>
e.g. -e http://dbpedia.org/sparql -g http://dbpedia.org
You can run RDFUnit directly on a SPARQL endpoint by defining the -e and (optionally) the -g parameters.
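Putting the example values above together, a full endpoint run could look like this:
# Validate the http://dbpedia.org graph of the DBpedia endpoint against foaf and skos
$ bin/rdfunit -d http://dbpedia.org -e http://dbpedia.org/sparql -g http://dbpedia.org -s foaf,skos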
Note that by default RDFUnit does the following (all of these values can be overridden from the command-line options):
- We keep a local H2 cache that stores the results, located in rdfunit-validate/cache/sparql/*. Cache entries have a default TTL of 1 week.
- We do automatic pagination for retrieving big results. The default page size is 800.
- We add a 5-second delay between queries to keep the load on public endpoints low.
- We apply a limit of 800 results per query.
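If you need fresh results from an endpoint, one simple option (assuming the default cache location above; check -h for the dedicated cache options) is to clear the cached entries before re-running:
# Drop the local H2 cache so the next run queries the endpoint again
$ rm -rf rdfunit-validate/cache/sparql/*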
RDFUnit supports the following types of schemas:
- OWL (using CWA): We pick the most common OWL axioms as well as schema.org (see [1], [2] for details).
- SHACL: SHACL is still in progress, but we support the most stable parts of the language. Whatever constructs we support can also run on SPARQL endpoints (SHACL does not support SPARQL endpoints by design).
- IBM Resource Shapes: Progress is tracked here, but as soon as SHACL becomes stable we will drop support for RS.
- DSP (Dublin Core Description Set Profiles): Progress is tracked here, but as soon as SHACL becomes stable we will drop support for DSP.
Note that you can mix all of these constraints together and RDFUnit will validate the dataset against all of them.
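As a sketch of such mixing (the shapes URI is hypothetical; we assume your SHACL shapes are published at a dereferenceable URI), you can pass OWL and SHACL sources together in -s:
# OWL axioms via the skos prefix plus SHACL shapes from a hypothetical document
$ bin/rdfunit -d http://example.com/data.ttl -s skos,http://example.com/shapes.ttl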
The enrichment of the schema is not necessary per se, but may lead to better results in cases where the schema/ontology of the considered dataset is in some sense light-weight, meaning that there are only a few constraints that can be used for automatic pattern instantiation.
Since the enrichment is performed by an external tool (the DL-Learner), we refer to the project site for further details on how to run it. An example for the enrichment of the DBpedia dataset could be:
user@host interfaces $ mvn exec:java -e -Dexec.mainClass="org.dllearner.cli.Enrichment" -Dexec.args="-e http://dbpedia.org/sparql -g http://dbpedia.org -f rdf/xml -o enrichment_dbpediaorg.xml -s enrichment_dbpediaorg.owl -l -1 -t 0.9"
RDFUnit can load enriched schema test cases using the -p parameter:
# with use of an enriched ontology
$ bin/rdfunit -d <dataset-uri> -e <endpoint> -g <graph1,graph2,...> -s <schema1,schema2,schema3,...> -p <enriched-schema-prefix>
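For example, continuing the DL-Learner run above (both the schema and the enrichment prefix below are hypothetical placeholders for whatever you registered for the enriched ontology):
# Validate DBpedia with the regular schema plus the enriched test cases
$ bin/rdfunit -d http://dbpedia.org -e http://dbpedia.org/sparql -g http://dbpedia.org -s dbo -p dbpedia-enrichment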
### Docker
RDFUnit CLI is dockerizable! Here is a basic Dockerfile:
# Start from a plain Ubuntu base image
FROM ubuntu
# Install git, Maven, and a JDK for building and running RDFUnit
RUN apt-get update && apt-get upgrade -yy && apt-get install git maven openjdk-8-jdk -yy
# Clone and build RDFUnit, skipping the webdemo module as above
RUN git clone https://github.com/AKSW/RDFUnit.git && cd RDFUnit/ && mvn -pl rdfunit-validate -am clean install
WORKDIR /RDFUnit
ENTRYPOINT ["/RDFUnit/bin/rdfunit"]
Build it: docker build -t yourname/rdfunit .
Run it with rdfunit parameters: docker run -it --rm yourname/rdfunit <params>
E.g.: docker run -it --rm yourname/rdfunit -d https://raw.githubusercontent.com/AKSW/amsl-on-ci/master/amsl.ttl -r aggregate
We will provide a Docker image on Docker Hub: AKSW on DockerHub
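If the data to validate lives on the host, a standard Docker volume mount (paths illustrative) makes a local file visible to the containerized CLI:
# Mount the current directory into the container and validate a local dump
$ docker run -it --rm -v "$(pwd)":/data yourname/rdfunit -d /data/data.ttl -s foaf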