IMTVault

CLDF dataset containing Interlinear Glossed Text extracted from linguistic literature.

How to cite

If you use data from this dataset, please cite the released version of the data you are using as well as the paper introducing IMTVault:

Krämer, Thomas, and Sebastian Nordhoff. 2022. "IMTVault: Extracting and Enriching Low-resource Language Interlinear Glossed Text from Grammatical Descriptions and Typological Survey Articles." In Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC 2022), Marseille, 24.06.2022.

Coverage

Distribution of examples in IMTVault across the languages of the world:

How to use

The dataset provided in the cldf directory is a valid CLDF dataset. Thus, after looking up the file and column names for standard CLDF tables and properties in the metadata or the README.md, you can use any tool capable of reading CSV to poke around the data.
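
Looking up the file and column names can also be done programmatically. The following is a minimal Python sketch, assuming the metadata file follows the usual CSVW layout used by CLDF (a top-level `tables` list whose entries carry a `url` and a `tableSchema` with named `columns`). The function name `list_tables` is made up for this illustration and is not part of any CLDF tool.

```python
import json
from pathlib import Path


def list_tables(metadata_path):
    """Map each table's CSV file name to the list of its column names.

    Assumes a CSVW-style metadata file: {"tables": [{"url": ..., "tableSchema":
    {"columns": [{"name": ...}, ...]}}, ...]}.
    """
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    tables = {}
    for table in metadata.get("tables", []):
        columns = table.get("tableSchema", {}).get("columns", [])
        # Skip column descriptions without an explicit name.
        tables[table["url"]] = [col["name"] for col in columns if "name" in col]
    return tables


# Example call (path as used elsewhere in this README):
# for url, columns in list_tables("cldf/Generic-metadata.json").items():
#     print(url, columns)
```

The returned dictionary tells you which CSV file to open and which column names to pass to whatever CSV tool you prefer.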

For example, you could use the command-line tools from the csvkit package

  • to check whether a particular language is represented in the dataset
    $ csvgrep -c Name -m Amele cldf/languages.csv | csvcut -c ID,Name,Examples_Count
    ID,Name,Examples_Count
    amel1241,Amele,4
  • to filter examples based on values for specific properties
    $ csvgrep -c Language_ID -m amel1241 cldf/examples.csv | csvgrep -c Gloss -r "food" | csvcut -c ID,Primary_Text,Translated_Text
    ID,Primary_Text,Translated_Text
    langsci220-e5ca0880e8,[Ija sab fajec nu] huga.,I came to buy food.
    glossa5188-47,[ Ege humeb ] sab josi,We came and they two ate the food.
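
If you would rather stay in Python, the same kind of filter can be sketched with the standard library alone. This is only an illustration: `find_examples` is a hypothetical helper, the column names are the ones used in the csvkit commands above, and unlike `csvgrep -m` (which matches substrings) it compares the language ID for exact equality.

```python
import csv
import re


def find_examples(examples_csv, language_id, gloss_pattern):
    """Yield ID, primary text and translation of matching examples.

    A row matches if its Language_ID equals `language_id` and the regular
    expression `gloss_pattern` is found somewhere in its Gloss column.
    """
    pattern = re.compile(gloss_pattern)
    with open(examples_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["Language_ID"] == language_id and pattern.search(row["Gloss"]):
                yield {
                    key: row[key]
                    for key in ("ID", "Primary_Text", "Translated_Text")
                }


# Example call (path and values as in the csvkit commands above):
# for example in find_examples("cldf/examples.csv", "amel1241", "food"):
#     print(example)
```

Because the function is a generator, it streams through the file row by row instead of loading all examples into memory.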

Once you have found suitable examples, you can render them in a human-readable format using cldfviz.text. For example, the CLDF Markdown snippet

[](ExampleTable?with_internal_ref_link#cldf:langsci220-e5ca0880e8) 
[](ExampleTable?with_internal_ref_link#cldf:glossa5188-47)

[References](Source?cited_only#cldf:__all__)

in a file amele_examples.md would render via

cldfbench cldfviz.text --text-file amele_examples.md cldf/Generic-metadata.json

as


(langsci220-e5ca0880e8) Amele (Roberts 1987, Schmidtke-Bode et al. 2018: via)

[Ija  sab   faj-ec        nu]  h-ug-a.  
1SG   food  buy-INF/NMLZ  for  come-1SG-PST  
‘I came to buy food.’

(glossa5188-47) Amele (Stirling 1993: 213, Bárány and Nikolaeva 2019: via:49)

[ Ege  h-u-me-b          ] sab   jo-si-a.  
  1PL  come-PRED-SS-1PL    food  eat-3DU.TODPST  
‘We came and they two ate the food.’
  • Bárány, András and Nikolaeva, Irina. 2019. Possessors in switch-reference. Glossa: a journal of general linguistics 4(1). Open Library of Humanities.
  • Roberts, John. 1987. Amele. London: Croom Helm.
  • Schmidtke-Bode, Karsten and Levshina, Natalia and Michaelis, Susanne Maria and Seržant, Ilja (eds.) 2018. Explanation in typology: Diachronic sources, functional motivations and the nature of the evidence. (Conceptual Foundations of Language Science, 3.) Berlin: Language Science Press.
  • Stirling, Lesley. 1993. Switch-reference and discourse representation. Cambridge: Cambridge University Press.