This project provides SHACL shapes to validate metadata against the eCH-0200 standard.
ech-0200.shacl.ttl
: This file models the constraints defined in eCH-0200 with the SHACL vocabulary.

examples/
: This directory contains RDF Turtle files that can be used to test ech-0200.shacl.ttl. The convention is that files ending with .valid.ttl will validate, while files ending with .fail.ttl will not validate.
To validate data against the shapes, you need a SHACL validator such as the TopBraid SHACL API. With this validator, you can validate RDF Turtle files as follows:
$ shaclvalidate -shapesfile ech-0200.shacl.ttl -datafile data.ttl
For example (from this directory):
$ shaclvalidate -shapesfile ech-0200.shacl.ttl -datafile ./examples/minimal.valid.ttl
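The naming convention of the examples directory lends itself to a small regression check. The following is a minimal sketch and not part of this project: `conforms` and `check_examples` are hypothetical helpers, and the script assumes that the validator prints a report containing `sh:conforms true` (using the `sh:` prefix) when the data conforms. Please verify that assumption against the output of your validator version before relying on it.

```shell
#!/bin/sh
# Sketch: check that *.valid.ttl files conform and *.fail.ttl files do not.
# ASSUMPTION: the validator prints a report containing "sh:conforms true"
# for conforming data. VALIDATOR can be overridden to use another command.
VALIDATOR="${VALIDATOR:-shaclvalidate}"

# conforms FILE: succeeds if the validation report says the data conforms.
conforms() {
    "$VALIDATOR" -shapesfile ech-0200.shacl.ttl -datafile "$1" 2>/dev/null |
        grep -Eq 'sh:conforms[[:space:]]+true'
}

# check_examples DIR: prints one line per mismatch and returns the
# number of mismatches (0 means every file behaved as its name says).
check_examples() {
    failures=0
    for f in "$1"/*.valid.ttl; do
        [ -e "$f" ] || continue
        if ! conforms "$f"; then
            echo "expected to validate: $f"
            failures=$((failures + 1))
        fi
    done
    for f in "$1"/*.fail.ttl; do
        [ -e "$f" ] || continue
        if conforms "$f"; then
            echo "expected to fail: $f"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}
```

After sourcing these definitions, `check_examples examples && echo "all examples behave as named"` would run the check from this directory.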
As the TopBraid SHACL validator only supports the Turtle RDF format, files in other formats, such as RDF/XML, need to be converted to Turtle first.
First you need Apache Jena. You can download and extract it with these commands:
$ wget https://www-eu.apache.org/dist/jena/binaries/apache-jena-3.9.0.tar.gz
$ tar xvzf apache-jena-3.9.0.tar.gz
This creates a folder named apache-jena-3.9.0. To convert your RDF/XML file (e.g. file.rdf), you can use this command:
$ ./apache-jena-3.9.0/bin/riot --output=turtle --syntax=rdfxml file.rdf > file.ttl
You will find the converted result in file.ttl.
Jena supports these RDF formats: turtle, ntriples, nquads, trig and rdfxml.
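If several RDF/XML files need converting, the conversion command can be wrapped in a loop. This is a sketch under the assumption that Jena was extracted into the current directory as described; `convert_all` and the `RIOT` variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: convert every NAME.rdf in a directory to NAME.ttl.
# ASSUMPTION: riot lives at the path below; override RIOT if it does not.
RIOT="${RIOT:-./apache-jena-3.9.0/bin/riot}"

# convert_all DIR: writes NAME.ttl next to every NAME.rdf found in DIR.
convert_all() {
    for src in "$1"/*.rdf; do
        [ -e "$src" ] || continue
        dst="${src%.rdf}.ttl"
        # riot guesses the input syntax (RDF/XML) from the .rdf extension.
        if "$RIOT" --output=turtle "$src" > "$dst"; then
            echo "converted: $src -> $dst"
        else
            echo "failed: $src" >&2
            rm -f "$dst"   # do not leave a partial Turtle file behind
        fi
    done
}
```

Calling `convert_all data` would then convert all `.rdf` files in a directory named `data`.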
This project is similar to, and partially based on, the EU DCAT-AP SHACL constraint definitions.
While the eCH-0200 specification is available in German and French, the SHACL shapes are documented in English to better align with other shape files and tools that are likely to be used alongside them.
- The specification mandates the use of schema:url as class. This seems to be a mistake, so we assume that schema:URL is what was meant.
- The SHACL file also supports xsd:dateTime where the spec mandates xsd:date.
- Inference: The specification isn't explicit about whether inference should be allowed and, if so, which kind. We assume that where vcard:Kind is allowed, its subclasses (Individual, Organization, Group, Location) should be allowed too. SHACL only allows specifying ontological statements in the data and not in the shape graph, so currently a subclass is only accepted if the respective rdfs:subClassOf statement is also present in the data. We could of course explicitly allow some named subclasses in the shape file, but this doesn't seem to be wanted by the spec.
- The type (foaf:Document) does not need to be explicitly specified for a document to validate; the type can be inferred from the rdfs:range of foaf:Document.
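To illustrate the inference point, the snippet below writes a small Turtle fragment in which a contact is typed with a subclass of vcard:Kind and the matching rdfs:subClassOf statement is included in the data. It is only a sketch: the example.org IRIs and the file name contact-example.ttl are made up, the use of dcat:contactPoint as the property expecting a vcard:Kind is an assumption, and the fragment is not a complete dataset that would pass all shapes.

```shell
#!/bin/sh
# Sketch: write a hypothetical data fragment to contact-example.ttl.
cat > contact-example.ttl <<'EOF'
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

# Hypothetical dataset whose contact is typed with a subclass of vcard:Kind.
# ASSUMPTION: dcat:contactPoint is the property that expects a vcard:Kind.
<http://example.org/dataset/1> dcat:contactPoint <http://example.org/contact/1> .
<http://example.org/contact/1> a vcard:Organization .

# Without this statement in the data graph, a SHACL class check
# cannot tell that a vcard:Organization is also a vcard:Kind.
vcard:Organization rdfs:subClassOf vcard:Kind .
EOF
```

Validating such a file with and without the last statement should show the difference, provided the corresponding shape uses a class constraint on vcard:Kind.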
- Shouldn't we require a dataset to be named (using a standard IRI) rather than requiring a proprietary dct:identifier?
- Also, shouldn't the dct:publisher be named, rather than being an instance of foaf:Agent? Analogous questions can be asked for dcat:themeTaxonomy and foaf:homepage.
- It seems inconsistent to forbid adms:status on distributions while generally allowing arbitrary properties.
As a prospective part of an eCH standard, the code and documentation in this repository can be used, distributed and further developed without any restriction by patents or licenses.