MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs

Notes

Download the required files

The easiest way to download the repository is to clone it using git:

git clone https://github.com/MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs.git
cd UM_KEN4256_KnowledgeGraphs/

You can also download it as a .zip file.

Execute RML mapping files

You can test if you have Java installed by opening the terminal (or PowerShell on Windows) and typing:

java -version

If Java is not installed, you can install version 8 from the Java website.

Download the RML processor rmlmapper.jar and put it in the UM_KEN4256_KnowledgeGraphs folder, then execute the example mapping file:

java -jar rmlmapper.jar -m "mapping.ttl" -o "output.nt" --duplicates 
  • This command should be executed in the directory where rmlmapper.jar, the mapping file and the data files are located (the root of this repository).
  • --duplicates removes duplicate triples from the output file.
  • The example mapping.ttl file is available to help you start converting the first columns.

Running the rmlmapper on the full DrugBank dataset can take about 40 minutes. Let us know if your computer cannot handle it.
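
Once the run finishes, you can sanity-check the output with a few lines of Python. This is a minimal sketch using the rdflib library (the file name output.nt matches the command above):

from rdflib import Graph

# Parse the N-Triples produced by the rmlmapper and count the triples
g = Graph()
g.parse("output.nt", format="nt")
print(f"{len(g)} triples in output.nt")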

Install GraphDB

Download and install

Create a repository (triplestore)

  • Setup > Repositories > Create new repository
    • Enter the repository ID you want (the only mandatory field)
    • Create
    • Try out the other parameters (the Context index is recommended if you use multiple graphs)
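
Once the repository is created, you can load your converted RDF into it. Below is a minimal sketch using Python's requests module against GraphDB's RDF4J-compatible REST API; the endpoint assumes the default http://localhost:7200, and the repository ID drugbank is a hypothetical placeholder (use your own ID):

import requests

# Upload the RML output to the repository's statements endpoint.
# Assumes GraphDB runs on the default http://localhost:7200 and that
# you created a repository with the (hypothetical) ID "drugbank".
graphdb = "http://localhost:7200"
repo = "drugbank"

with open("output.nt", "rb") as f:
    response = requests.post(
        f"{graphdb}/repositories/{repo}/statements",
        data=f,
        headers={"Content-Type": "application/n-triples"},
    )
response.raise_for_status()  # a 204 No Content response means success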

Users

Enabling security and user management is not necessary when using GraphDB locally. Contact us if you have issues with it.

Explore

GraphDB offers various modules that can be useful to visualize or process data, such as the class hierarchy visualization or OntoRefine.

Interlink datasets with LIMES

Download the jar file for LIMES release 1.7.1.

An example LIMES config file is provided in the repository; see limes_config.xml.

java -jar limes-core-1.7.1.jar limes_config.xml

See the official LIMES documentation for more details on its options, such as the available metrics and thresholds.

Or try out the LIMES Web UI: http://limes.aksw.org/

Using other tools (optional)

Conversion can be done using various other tools and methods, and you are encouraged to use tools other than the RML mapper and LIMES if they fit the task. Here are some examples of other tools to convert structured data to RDF. They usually require a bit more proficiency with programming and deploying services on your machine than RML, but are more scalable and can process gigabytes of data.

Data2Services

Students using Linux or macOS who have already used Docker can use the d2s client, a scalable tool to convert input datasets to a target RDF knowledge graph. It uses SPARQL queries instead of RML mappings to map the input data to the target ontology. See the documentation.

pip install d2s cwlref-runner
d2s init

The client is written in Python 3; it uses docker-compose to run services and CWL to run workflows.

RMLStreamer

A newer tool for RML processing that aims to be a scalable implementation of RML: RMLStreamer processes streams of data into RDF.

It requires you to start Apache Flink (using Docker) to stream the data.

R2RML

You could also use R2RML. The RDB (Relational Database) to RDF Mapping Language is a precursor of RML: it lets you define mappings for SQL databases, and RML extends it to other formats such as XML or JSON. R2RML has much faster and more scalable implementations, but it doesn't handle XML (you would need to convert the XML to CSV or a relational database). R2RML doesn't support CSV natively either, but CSV files can be exposed as a relational database (each file being a table) using Apache Drill.

See this repository for an easy deployment of Apache Drill using Docker. Start it with your /data/r2rml directory mounted:

docker run -dit --rm --name drill -v /data/r2rml:/data:ro -p 8047:8047 -p 31010:31010 umids/apache-drill:latest
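
To check that Drill exposes your CSV files as tables, you can send a SQL query to its REST API. Here is a minimal sketch in Python, assuming Drill's default port 8047 (exposed by the docker command above) and a hypothetical drugbank.csv in the mounted /data directory:

import requests

# Query a CSV file as a table through Drill's REST API
# (drugbank.csv is a hypothetical file in the mounted /data directory)
query = {
    "queryType": "SQL",
    "query": "SELECT * FROM dfs.`/data/drugbank.csv` LIMIT 5",
}
response = requests.post("http://localhost:8047/query.json", json=query)
print(response.json()["rows"])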

OntoRefine

Built on OpenRefine, OntoRefine specializes in converting and processing data to RDF. It is included in your GraphDB installation and allows you to load data from CSV or XML and apply some processing before converting it to RDF. See this tutorial for more information.

Python scripts

A common way to process data is still to pick your favorite scripting language and use it for the conversion. It usually offers more flexibility, and libraries can be helpful, but the mappings are not expressed in a declarative mapping language, making them harder to read, share and reuse.
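
For instance, here is a minimal sketch of this approach with Python and rdflib, assuming a hypothetical drugbank.csv with id and name columns; the mapping that RML would express declaratively is hardcoded in the loop:

import csv

from rdflib import RDF, RDFS, Graph, Literal, Namespace

# Hypothetical vocabulary; in practice reuse the target ontology
EX = Namespace("https://example.org/drug/")

g = Graph()
with open("drugbank.csv", newline="") as f:
    for row in csv.DictReader(f):
        drug = EX[row["id"]]                             # mint the subject URI
        g.add((drug, RDF.type, EX.Drug))                 # type each row
        g.add((drug, RDFS.label, Literal(row["name"])))  # map the name column

g.serialize(destination="drugbank.nt", format="nt")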

Explore a graph using SPARQL

Be aware that count operations can be really time-consuming (depending on the dataset size), so you might want to remove them if the query is timing out.

Count all classes in the graph

select ?Concept (count(?Concept) as ?Count) # Count the occurrences of each class
where {?s a ?Concept} # Match every class used as the type of a resource
group by ?Concept # One row per distinct class
order by desc(?Count) # From the most used class to the least used

Get all properties for a Class

select ?Predicate (count(?Predicate) as ?Count) 
where {
	?s a <http://geonames.org/Country> .
	?s ?Predicate ?o .
} 
group by ?Predicate
order by desc(?Count)

Get all instances of a Class

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?instance ?label
where {
    ?instance a <http://geonames.org/Country> .
    OPTIONAL { ?instance rdfs:label ?label . } # Display the label if there is one
}
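
You can also run these queries programmatically. Here is a minimal sketch with Python and SPARQLWrapper, assuming a local GraphDB repository with the (hypothetical) ID drugbank on the default port 7200:

from SPARQLWrapper import JSON, SPARQLWrapper

# Run the class-counting query from above against the repository endpoint
sparql = SPARQLWrapper("http://localhost:7200/repositories/drugbank")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?Concept (COUNT(?Concept) AS ?Count)
    WHERE { ?s a ?Concept }
    GROUP BY ?Concept
    ORDER BY DESC(?Count)
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["Concept"]["value"], binding["Count"]["value"])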