CLDF dataset containing Interlinear Glossed Text extracted from linguistic literature.
If you use data from this dataset, please cite the released version of the data you are using as well as the paper introducing IMTVault:
Krämer, Thomas, and Sebastian Nordhoff. 2022. "IMTVault: Extracting and Enriching Low-resource Language Interlinear Glossed Text from Grammatical Descriptions and Typological Survey Articles." In Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC 2022), Marseille, 24.06.2022.
Distribution of examples in IMTVault across the languages of the world:
The dataset provided in the cldf directory is a valid CLDF dataset. Thus, after looking up the file and column names for standard CLDF tables and properties in the metadata or the README.md, you can use any tool capable of reading CSV to poke around the data.
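Looking up file and column names can itself be scripted. The following is a minimal sketch using only the Python standard library. It assumes the metadata file follows the usual CSVW layout used by CLDF (a top-level `tables` list whose entries carry `url`, `dc:conformsTo` and `tableSchema.columns`). The function name `describe_tables` is hypothetical and not part of any CLDF tooling.

```python
import json


def describe_tables(metadata):
    """Map each table's CSV file name to its CLDF component and column names.

    `metadata` is the parsed JSON metadata of a CLDF dataset (assumed CSVW layout).
    """
    summary = {}
    for table in metadata.get("tables", []):
        # "dc:conformsTo" holds a term URI ending in e.g. "#ExampleTable";
        # tables that are not standard CLDF components may lack it.
        component = (table.get("dc:conformsTo") or "").rpartition("#")[2] or None
        columns = [
            col.get("name")
            for col in table.get("tableSchema", {}).get("columns", [])
        ]
        summary[table["url"]] = {"component": component, "columns": columns}
    return summary


# Usage (metadata path as given in this README):
# with open("cldf/Generic-metadata.json", encoding="utf-8") as f:
#     for url, info in describe_tables(json.load(f)).items():
#         print(url, info["component"], info["columns"])
```

The output tells you which CSV file holds which standard table and which column names to pass to a CSV tool.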
For example, you could use the command-line tools from the csvkit package:
- to check whether a particular language is represented in the dataset
$ csvgrep -c Name -m Amele cldf/languages.csv | csvcut -c ID,Name,Examples_Count
ID,Name,Examples_Count
amel1241,Amele,4
- to filter examples based on values for specific properties
$ csvgrep -c Language_ID -m amel1241 cldf/examples.csv | csvgrep -c Gloss -r "food" | csvcut -c ID,Primary_Text,Translated_Text
ID,Primary_Text,Translated_Text
langsci220-e5ca0880e8,[Ija sab fajec nu] huga.,I came to buy food.
glossa5188-47,[ Ege humeb ] sab josi,We came and they two ate the food.
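The same two lookups can be done without csvkit. Below is a minimal sketch using Python's standard `csv` module. It uses the column names from the commands above. The helper names `read_rows`, `find_languages` and `find_examples` are hypothetical. Substring matching stands in for `csvgrep -m` and a regular-expression search for `csvgrep -r`.

```python
import csv
import re


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def find_languages(languages, name):
    """Return (ID, Name, Examples_Count) for rows whose Name contains `name`."""
    return [
        (row["ID"], row["Name"], row["Examples_Count"])
        for row in languages
        if name in (row.get("Name") or "")
    ]


def find_examples(examples, language_id, gloss_pattern):
    """Return (ID, Primary_Text, Translated_Text) for examples of one language
    whose Gloss matches the regular expression `gloss_pattern`."""
    pattern = re.compile(gloss_pattern)
    return [
        (row["ID"], row["Primary_Text"], row["Translated_Text"])
        for row in examples
        if language_id in (row.get("Language_ID") or "")
        and pattern.search(row.get("Gloss") or "")
    ]


# Usage (file paths as in the csvkit commands above):
# print(find_languages(read_rows("cldf/languages.csv"), "Amele"))
# print(find_examples(read_rows("cldf/examples.csv"), "amel1241", "food"))
```

Loading whole files into lists keeps the sketch short. For very large tables, iterating over `csv.DictReader` directly would use less memory.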
Once you have found suitable examples, you can render them in a human-readable format using cldfviz.text. For example, the CLDF markdown snippet
[](ExampleTable?with_internal_ref_link#cldf:langsci220-e5ca0880e8)
[](ExampleTable?with_internal_ref_link#cldf:glossa5188-47)
[References](Source?cited_only#cldf:__all__)
in a file amele_examples.md
would render via
cldfbench cldfviz.text --text-file amele_examples.md cldf/Generic-metadata.json
as
(langsci220-e5ca0880e8) Amele (Roberts 1987, Schmidtke-Bode et al. 2018: via)
[Ija sab faj-ec nu] h-ug-a.
1SG food buy-INF/NMLZ for come-1SG-PST
‘I came to buy food.’
(glossa5188-47) Amele (Stirling 1993: 213, Bárány and Nikolaeva 2019: via:49)
[ Ege h-u-me-b ] sab jo-si-a.
1PL come-PRED-SS-1PL food eat-3DU.TODPST
‘We came and they two ate the food.’
- Bárány, András and Nikolaeva, Irina. 2019. Possessors in switch-reference. Glossa: a journal of general linguistics 4(1). Open Library of Humanities.
- Roberts, John. 1987. Amele. London: Croom Helm.
- Schmidtke-Bode, Karsten and Levshina, Natalia and Michaelis, Susanne Maria and Seržant, Ilja (eds.) 2018. Explanation in typology: Diachronic sources, functional motivations and the nature of the evidence. (Conceptual Foundations of Language Science, 3.) Berlin: Language Science Press.
- Stirling, Lesley. 1993. Switch-reference and discourse representation. Cambridge: Cambridge University Press.
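For longer lists of example IDs, the CLDF markdown snippet shown earlier can be generated rather than typed by hand. The following is a minimal sketch. The link syntax is copied from the snippet in this README. The function name `cldf_markdown` is hypothetical, and separating the links with blank lines is an assumption about how the text file should be laid out.

```python
def cldf_markdown(example_ids, with_references=True):
    """Build a CLDF markdown snippet with one ExampleTable link per example ID.

    If `with_references` is true, a final link listing the cited sources
    is appended, as in the snippet shown in this README.
    """
    lines = [
        f"[](ExampleTable?with_internal_ref_link#cldf:{example_id})"
        for example_id in example_ids
    ]
    if with_references:
        lines.append("[References](Source?cited_only#cldf:__all__)")
    # Assumption: a blank line between links keeps each rendered example
    # in its own paragraph.
    return "\n\n".join(lines) + "\n"


# Usage (file name and command as given in this README):
# with open("amele_examples.md", "w", encoding="utf-8") as f:
#     f.write(cldf_markdown(["langsci220-e5ca0880e8", "glossa5188-47"]))
# then run:
#   cldfbench cldfviz.text --text-file amele_examples.md cldf/Generic-metadata.json
```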