The Under-represented Writer project is a research project that explores the under-representation of non-Western writers in the digital landscape, and more specifically on Wikidata and Wikipedia. The project is composed of three parts:
- A Semantic Model for the encoding of writers and their works. The folder contains the Under-Represented Writers Ontology (urw) and the Under-Represented Books Ontology (urb). In addition, a semantic mapping between urw and the DOLCE ontology is provided.
- A Knowledge Graph of authors and works gathered from Wikidata, Wikipedia, OpenLibrary, Goodreads, and Google Books. The folder contains all the R2RML mapping files for populating the semantic models, together with links for downloading the SQL database used for the mapping and the materialized triples in .ttl format.
- A set of NLP experiments for biographical events extraction. In this folder you can find a Lexico-Semantic Pattern approach for extracting migration events from Wikipedia biographies, and a corpus of sentences annotated with semantic roles.
Here you can try some SPARQL queries:
- Find a sample of 100 writers, their year and country of birth
- Count writers in the KG grouped by gender and condition
- Find a sample of 100 works, their publisher, country, and publication year
- Count all writers grouped by condition and continent of birth
- Find a sample of 100 writers born in Nigeria and their works
- Find all African writers born in 1985
- Find all works published in South Africa
- Count all works by their provenance and author condition
- Select the 100 most frequent subjects in the KG
- Count the most used languages in works written by Asian writers
- Count all the subjects of authors born in India during the 1940s
- Find all the works which have the subject 'partition'
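
As an illustration of the first query in the list, here is a minimal Python sketch that builds a SPARQL query for a sample of writers with their year and country of birth. The namespace IRI and the names `urw:Writer`, `urw:birthYear`, and `urw:countryOfBirth` are placeholders assumed for this example, not the actual vocabulary; check the urw ontology file for the real terms. The endpoint URL is also left as a parameter, since it is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: placeholder namespace, replace with the real urw namespace IRI.
URW = "http://example.org/urw#"


def sample_writers_query(limit=100):
    """Build a SELECT query for a sample of writers.

    ASSUMPTION: urw:Writer, urw:birthYear and urw:countryOfBirth are
    hypothetical names used only to show the shape of the query.
    """
    return f"""PREFIX urw: <{URW}>
SELECT ?writer ?birthYear ?country
WHERE {{
  ?writer a urw:Writer ;
          urw:birthYear ?birthYear ;
          urw:countryOfBirth ?country .
}}
LIMIT {int(limit)}"""


def run_select(endpoint_url, query):
    """Send a SELECT query with the SPARQL 1.1 Protocol (HTTP GET)
    and return the list of result bindings."""
    url = endpoint_url + "?" + urlencode({"query": query})
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_select(endpoint_url, sample_writers_query())` with the address of a SPARQL endpoint that serves the Knowledge Graph would return one binding per writer. The other queries in the list follow the same pattern, with `GROUP BY` and `COUNT` for the aggregate ones.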
Additional information about the resource is available at https://underrepresented.di.unito.it
For any questions, you can write to marcoantonio.stranisci@unito.it