- The GeoNames dataset contains information such as area in sq km, country ISO codes, phone codes, currencies...
- The WorldBank dataset contains the yearly GDP in US$ of all countries (from 1980 to 2018). Two files are available for WorldBank:
- dataset-worldbank-gdp.xml contains the yearly GDP of a few dozen countries: RML should process it fairly quickly
- dataset-worldbank-gdp-full.xml contains the full dataset (all countries): RML can take more than 30 min to process it
- We advise you to use dataset-worldbank-gdp.xml when testing. Once you have found the right configuration, you can run it on the full XML file with all countries.
- Useful links
- Ontologies
- SPARQL specifications
- Find the URL for a prefix: http://prefix.cc
The easiest way to download the repository is to clone it using git:
git clone https://github.com/MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs.git
cd UM_KEN4256_KnowledgeGraphs/
You can also download it as a .zip file.
You can test if you have Java installed by opening the terminal (or PowerShell on Windows) and typing:
java -version
If Java is not installed, you can install version 8 from the Java website.
Download the RML processor rmlmapper.jar and put it in the UM_KEN4256_KnowledgeGraphs folder to execute the example mapping file:
java -jar rmlmapper.jar -m "mapping.ttl" -o "output.nt" --duplicates
- This command should be executed in the directory where the rmlmapper.jar file and RDF files are located (this repository). The --duplicates option removes duplicate triples from the output file.
- The example mapping.ttl file is available to help you start converting the first columns.
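To see what removing duplicates means for an N-Triples file such as output.nt, here is a minimal Python sketch that counts total and distinct triples. It is only an illustration and not how rmlmapper implements --duplicates; the function name `summarize_ntriples` is hypothetical, and it assumes one triple per line, so two triples count as duplicates only if their lines are identical after trimming whitespace.

```python
def summarize_ntriples(lines):
    """Return (total, distinct) triple counts for an iterable of N-Triples lines."""
    total = 0
    distinct = set()
    for line in lines:
        triple = line.strip()
        # Skip blank lines and comment lines, which carry no triple.
        if not triple or triple.startswith("#"):
            continue
        total += 1
        distinct.add(triple)
    return total, len(distinct)


# Example usage on a file produced by the command above:
# with open("output.nt", encoding="utf-8") as handle:
#     total, distinct = summarize_ntriples(handle)
#     print(f"{total} triples, {total - distinct} duplicates")
```

If the two counts differ after a run without --duplicates, the mapping generates the same triple more than once.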
Running the rmlmapper on the full DrugBank dataset can take about 40 min. Let us know if your computer cannot handle it.
- Download GraphDB (register to receive an email with the download links)
- Install from exe, dmg, deb or rpm depending on your operating system.
- Access it on http://localhost:7200/
Setup > Repositories > Create new repository
- Enter the repository ID you want (the only mandatory field here)
- Click Create
- Try out the other parameters (the Context index is recommended if you use multiple graphs)
Enabling security and user management is not necessary when running GraphDB locally. Contact us if you have issues with it.
GraphDB offers various modules that can be useful to visualize or process data, such as the class hierarchy visualization or OntoRefine.
Download the jar file for LIMES release 1.7.1.
An example LIMES configuration file is provided in the repository (see limes_config.xml):
java -jar limes-core-1.7.1.jar limes_config.xml
See the official LIMES documentation for more details on its options, such as the available metrics and thresholds.
Or try out the LIMES Web UI: http://limes.aksw.org/
Conversion can be done using various other tools and methods. You are encouraged to use tools other than RML mapper and LIMES if they fit the task. Here are some examples of other tools to convert structured data to RDF. They usually require a bit more proficiency with programming and deploying services on your machine than RML, but they are more scalable and can process gigabytes of data.
Students on Linux or macOS who have already used Docker can use the d2s client, a scalable tool to convert input datasets to a target RDF knowledge graph. It uses SPARQL queries to map the input data to the target ontology instead of RML mappings. See the documentation.
pip install d2s cwlref-runner
d2s init
The client is written in Python 3; it uses docker-compose to run services and CWL to run workflows.
RMLStreamer is a new tool for RML processing that aims to be a scalable implementation of RML. It processes streams of data to RDF.
It requires you to start Apache Flink to stream the data (using Docker).
You could also use R2RML. The RDB (Relational Database) to RDF Mapping Language is a precursor of RML. It allows you to define mappings for SQL databases (RML extends it to other formats, such as XML or JSON). R2RML has much faster and more scalable implementations, but it doesn't handle XML (you would need to convert the XML to CSV or to an RDB). R2RML doesn't support CSV natively, but CSV files can be exposed as a relational database (each file being a table) using Apache Drill.
See this repository for easy deployment of Apache Drill using Docker. Start it on your /data/r2rml directory:
docker run -dit --rm --name drill -v /data/r2rml:/data:ro -p 8047:8047 -p 31010:31010 umids/apache-drill:latest
Developed from OpenRefine, OntoRefine is specialized in converting and processing data to RDF. It is included in your GraphDB installation. It allows you to load data from CSV or XML, and apply some processing before converting it to RDF. See this tutorial for more information.
A common way to process data is still to pick your favorite scripting language and use it to process the data. It usually offers more flexibility, and libraries can help, but the mappings are not expressed in a dedicated mapping language, which makes them harder to read, share and reuse.
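As an illustration of the scripting approach, here is a minimal Python sketch that turns CSV rows into N-Triples using only the standard library. The base URI `http://example.org/`, the `country/` and `property/` URI patterns, the function names and the column names in the usage comment are all hypothetical; adapt them to your own data and target ontology. Every value is emitted as a plain string literal, which is a simplification.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI, replace with your own


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text, id_column, base=BASE):
    """Build one triple per non-empty cell, using id_column to mint the subject URI."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}country/{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the identifier is already in the subject; skip empty cells
            predicate = f"<{base}property/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Example usage with made-up columns:
# print("\n".join(rows_to_ntriples("iso,name\nNL,Netherlands\n", id_column="iso")))
```

Note how the mapping decisions (which column becomes the subject, how predicates are named) are buried in the code, which is the readability trade-off mentioned above.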
Be aware that count operations can be really time consuming (depending on the dataset size), so you might want to remove them if the query is timing out.
select ?Concept (count(?Concept) as ?Count) # Count the typed resources for each ?Concept of the "group by"
where {?s a ?Concept} # Match every resource ?s that has ?Concept as its type
group by ?Concept # One row per distinct class
order by desc(?Count) # Order from the most used class to the least used
select ?Predicate (count(?Predicate) as ?Count)
where {
?s a <http://geonames.org/Country> .
?s ?Predicate ?o .
}
group by ?Predicate
order by desc(?Count)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?instance ?label
where {
?instance a <http://geonames.org/Country> .
OPTIONAL { ?instance rdfs:label ?label . } # Display the label if there is one
}
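These queries can also be sent from a script instead of the GraphDB web interface. The sketch below assumes that your local GraphDB exposes the repository as a standard SPARQL endpoint at `http://localhost:7200/repositories/<repository ID>`; the repository ID `my-repo` in the usage comment and the function names are hypothetical. It sends the query with an HTTP GET and asks for SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query, timeout=60):
    """Run a SELECT query and return the list of result bindings."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]


# Example usage (requires a running GraphDB with a repository named "my-repo"):
# endpoint = "http://localhost:7200/repositories/my-repo"
# query = "select ?Concept (count(?Concept) as ?Count) where {?s a ?Concept} group by ?Concept"
# for row in run_select(endpoint, query):
#     print(row["Concept"]["value"], row["Count"]["value"])
```

The timeout argument is a client-side guard for slow count queries; it does not change any limit configured on the server.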