Generic IMTVault

CLDF Metadata: Generic-metadata.json

Sources: sources.bib

A collection of Interlinear Glossed Text extracted from linguistic literature

| property | value |
| --- | --- |
| dc:bibliographicCitation | Krämer, Thomas, and Sebastian Nordhoff. 2022. "IMTVault: Extracting and Enriching Low-resource Language Interlinear Glossed Text from Grammatical Descriptions and Typological Survey Articles: Proceedings of The 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference." 13th Language Resources and Evaluation Conference, LREC 2022, Marseille, 24.06.2022. |
| dc:conformsTo | CLDF Generic |
| dc:identifier | https://imtvault.org |
| dc:license | https://creativecommons.org/licenses/by/4.0/ |
| dcat:accessURL | https://github.com/cldf-datasets/imtvault |
| prov:wasDerivedFrom | <ol><li>cldf-datasets/imtvault v1.1</li><li>Glottolog v5.1</li><li>langsci/raw_texfiles 2dcdd57</li><li>xrotwang/glossa_xml e66218c</li></ol> |
| prov:wasGeneratedBy | <ol><li>python: 3.12.3</li><li>python-packages: requirements.txt</li></ol> |
| rdf:ID | imtvault |
| rdf:type | http://www.w3.org/ns/dcat#Distribution |

Source publications from which IGT examples are extracted are listed as Contributions.

| property | value |
| --- | --- |
| dc:conformsTo | CLDF ContributionTable |
| dc:extent | 1128 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key |
| Name | string | |
| Description | string | |
| Contributor | string | |
| Citation | string | |
| Examples_Count | integer | |

| property | value |
| --- | --- |
| dc:conformsTo | CLDF LanguageTable |
| dc:extent | 1611 |
| rdfs:comment | We add a pseudo-language with ID `undefined` to be able to add examples with unknown object language. |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key |
| Name | string | |
| Macroarea | string | |
| Latitude | decimal<br>≥ -90<br>≤ 90 | |
| Longitude | decimal<br>≥ -180<br>≤ 180 | |
| Glottocode | string<br>Regex: `[a-z0-9]{4}[1-9][0-9]{3}` | |
| ISO639P3code | string<br>Regex: `[a-z]{3}` | |
| Examples_Count | integer | |
| Examples_Count_Log | number | |
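
The column constraints listed above can be checked with a few lines of standard-library Python. This is only a minimal sketch, not the dataset's own validation: the function name `check_language` and the sample values are hypothetical. It assumes that each regex must match the whole value and that empty optional cells are allowed (for example for the pseudo-language `undefined`).

```python
import re
from decimal import Decimal, InvalidOperation

# Patterns copied from the column descriptions of the LanguageTable.
ID_RE = re.compile(r"[a-zA-Z0-9_\-]+")
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[1-9][0-9]{3}")
ISO_RE = re.compile(r"[a-z]{3}")


def check_language(row):
    """Return the names of columns in one row (a dict of strings) that violate a constraint."""
    problems = []
    # ID is the primary key, so it must be present and well-formed.
    if not ID_RE.fullmatch(row.get("ID", "")):
        problems.append("ID")
    # Assumption: empty Glottocode / ISO639P3code cells are acceptable.
    for col, pattern in (("Glottocode", GLOTTOCODE_RE), ("ISO639P3code", ISO_RE)):
        value = row.get(col, "")
        if value and not pattern.fullmatch(value):
            problems.append(col)
    # Latitude must lie in [-90, 90], Longitude in [-180, 180].
    for col, bound in (("Latitude", 90), ("Longitude", 180)):
        value = row.get(col, "")
        if not value:
            continue
        try:
            in_range = -bound <= Decimal(value) <= bound
        except InvalidOperation:
            in_range = False  # not a decimal number at all
        if not in_range:
            problems.append(col)
    return problems


# Made-up placeholder values, not taken from the dataset.
print(check_language({"ID": "abcd1234", "Glottocode": "abcd1234",
                      "ISO639P3code": "abc", "Latitude": "10.5", "Longitude": "-20"}))
```

Rows read with `csv.DictReader` have the same dict-of-strings shape, so the function could be applied to each row of the languages file directly.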

| property | value |
| --- | --- |
| dc:conformsTo | CLDF ExampleTable |
| dc:extent | 121596 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key |
| Language_ID | string | References languages.csv::ID |
| Primary_Text | string | The example text in the source language. |
| Analyzed_Word | list of string (separated by `\t`) | The sequence of words of the primary text to be aligned with glosses |
| Gloss | list of string (separated by `\t`) | The sequence of glosses aligned with the words of the primary text |
| Translated_Text | string | The translation of the example text in a meta language |
| Meta_Language_ID | string | References the language of the translated text<br>References languages.csv::ID |
| LGR_Conformance | string<br>Valid choices:<br>`WORD_ALIGNED` `MORPHEME_ALIGNED` | The level of conformance of the example with the Leipzig Glossing Rules |
| Comment | string | |
| LGR_Conformance_Level | string<br>Valid choices:<br>`2` `1` `0` | |
| Language_Name | string | Name of the object language as used in the source publication. |
| Abbreviations | json | Mapping of gloss abbreviations used in the examples to descriptions of their meaning. |
| Corpus_Reference | string | Identifies the location of the example in the underlying corpus |
| Source | list of string (separated by `;`) | References sources.bib::BibTeX-key |
| Contribution_ID | string | References contributions.csv::ID |
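
To illustrate how the list-valued and JSON columns fit together, here is a small sketch that splits `Analyzed_Word` and `Gloss`, pairs them up, and decodes `Abbreviations` and `Source`. It assumes the tab separator that CLDF uses by default for the two aligned columns. The helper names `parse_example` and `format_igt` and the sample row are hypothetical and not taken from the dataset.

```python
import json


def parse_example(row):
    """Decode the structured columns of one example row (a dict of strings)."""
    # Assumption: Analyzed_Word and Gloss are tab-separated lists.
    words = row["Analyzed_Word"].split("\t") if row.get("Analyzed_Word") else []
    glosses = row["Gloss"].split("\t") if row.get("Gloss") else []
    if len(words) != len(glosses):
        raise ValueError(f"{row['ID']}: {len(words)} words but {len(glosses)} glosses")
    return {
        "aligned": list(zip(words, glosses)),
        # Source is a ';'-separated list of BibTeX keys from sources.bib.
        "sources": [s for s in row.get("Source", "").split(";") if s],
        "abbreviations": json.loads(row["Abbreviations"]) if row.get("Abbreviations") else {},
    }


def format_igt(row):
    """Render words, glosses and translation as three column-aligned lines."""
    pairs = parse_example(row)["aligned"]
    widths = [max(len(w), len(g)) for w, g in pairs]
    word_line = "  ".join(w.ljust(n) for (w, _), n in zip(pairs, widths)).rstrip()
    gloss_line = "  ".join(g.ljust(n) for (_, g), n in zip(pairs, widths)).rstrip()
    return "\n".join([word_line, gloss_line, f"'{row['Translated_Text']}'"])


# Invented placeholder row for illustration only.
sample = {
    "ID": "ex1",
    "Analyzed_Word": "aaa-bb\tcc",
    "Gloss": "1SG-see\tDEM",
    "Translated_Text": "I see that.",
    "Abbreviations": '{"1SG": "first person singular", "DEM": "demonstrative"}',
    "Source": "key1;key2",
}
print(format_igt(sample))
```

Raising on a word/gloss length mismatch is a design choice of this sketch; a consumer might instead skip such rows or filter on `LGR_Conformance` first. CLDF source references may also carry a bracketed context such as page numbers after the key, which this sketch leaves untouched.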