
Cross-Linguistic Data Format (CLDF) dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853.



complexico/mentawai-word-list-1853




How to cite

If you use these data, please cite:

  • the original source

    Rosenberg, Carl Benjamin Hermann von. 1853. De Mentawei-Eilanden en Hunne Bewoners. Tijdschrift voor Indische Taal-, Land- en Volkenkunde 1. 403–440.

  • the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en

The digitised original is available online at https://www.digitale-sammlungen.de/en/view/bsb10433845?page=450,451

Notes

According to the Rights Statement (shown further down that page), this digitised journal carries a "No Copyright - Non-Commercial Use Only" condition.

Before the CLDF conversion in Python, the materials in this repository (in the data directory) were processed in R as an RStudio project; the R scripts are in the codes directory. The English glosses of the Dutch entries were generated with the DeepL translator via the deeplr R package.

As a long-time R user, I created this repository as practice for getting started with the cldfbench workflow in Python, which implements the Cross-Linguistic Data Format (CLDF). I would like to apply and extend this workflow to the Enggano lexical resources project I have been part of. My other motivations are to (i) document this legacy data in a computer-readable format, (ii) enrich its content following the CLDF standard, and (iii) contribute to ongoing research on the languages of the Barrier Islands off Sumatra, Indonesia, extending the Enggano language project.

Statistics

Glottolog: 100%, Concepticon: 98%, Source: 100%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 1 (linked to 1 different Glottocodes)
  • Concepts: 267 (linked to 255 different Concepticon concept sets)
  • Lexemes: 271
  • Sources: 1
  • Synonymy: 1.01
  • Invalid lexemes: 0
  • Tokens: 1,575
  • Segments: 31 (0 BIPA errors, 0 CLTS sound class errors, 31 CLTS modified)
  • Inventory size (avg): 31.00
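These figures are summary statistics over the CLDF FormTable. Here is a minimal sketch of how they could be computed. It assumes each form is available as a (variety, concept, segments) record, with the segments space-separated as in a CLDF Segments column. It also assumes that Synonymy is the mean number of forms per variety-concept pair and that Inventory size (avg) is the mean number of distinct segments per variety. These definitions are my assumptions rather than a copy of cldfbench's own code, and the function name summarise and the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarise(forms):
    """Compute README-style summary figures from form records.

    `forms` is an iterable of (variety_id, concept_id, segments) tuples,
    where `segments` is a space-separated string such as "a m a".
    The metric definitions are assumptions, not cldfbench's exact code.
    """
    forms = list(forms)
    # Count forms per (variety, concept) pair; Synonymy is the mean count.
    per_pair = Counter((variety, concept) for variety, concept, _ in forms)
    # Collect distinct segments per variety and count all segment tokens.
    inventories = defaultdict(set)
    tokens = 0
    for variety, _, segments in forms:
        segs = segments.split()
        tokens += len(segs)
        inventories[variety].update(segs)
    all_segments = set().union(*inventories.values())
    return {
        "Varieties": len(inventories),
        "Lexemes": len(forms),
        "Synonymy": round(mean(per_pair.values()), 2),
        "Tokens": tokens,
        "Segments": len(all_segments),
        "Inventory size (avg)": round(mean(len(s) for s in inventories.values()), 2),
    }


# Hypothetical toy records, not taken from this dataset.
toy = [
    ("v1", "c1", "a m a"),
    ("v1", "c2", "i n a"),
    ("v1", "c2", "i n a k"),
]
print(summarise(toy))
# {'Varieties': 1, 'Lexemes': 3, 'Synonymy': 1.5, 'Tokens': 10, 'Segments': 5, 'Inventory size (avg)': 5}
```

With only one variety, Synonymy under this assumed definition reduces to lexemes divided by concepts. Here that gives 271 / 267 ≈ 1.015, which is consistent with the reported 1.01. Likewise, the average inventory size equals the total segment count (31) when there is only one variety.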

Contributors

  • Name: Gede Primahadi W. Rajeg
  • GitHub user: @gederajeg
  • Description: Digitisation, Code, CLDF conversion, Concepticon mapping, Orthography profiling
  • Role: Maintainer

CLDF Datasets

The following CLDF datasets are available in cldf: