Contemplations and resolutions for rule-based morphological descriptions of languages with rich and regular morphology with examples from Uralic, Salish and Maipurean languages
- Rueter_Hoimupaevad-2019-10-11_Tallinn.pdf
These are the slides from a presentation given in Tallinn on 2019-10-11.
Morphological analyzers and other digital tools for Uralic languages
- (1) Extract paradigms from grammars, readers and research to build an analyzer.
- (2) Extract words, part-of-speech information and definitions from existing dictionaries and research. Build on what has already been done (Dutch, French, German, Russian,…)
- (3) Test analysis coverage on written texts. Are the unrecognized forms proper words?
- (4) Disambiguate morphological analyses based on grammars and research. Point out gaps in descriptions.
- (5) Test syntactic disambiguation on example sentences cited in grammatical descriptions of the language. And then retest on text corpora.
- (6) Make disambiguated sentences public, so others can test. Treebanks are one by-product of these gold standards.
- (7) Use all phases to benefit the speaker and research community.
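Step (3) above can be sketched as a small coverage check. This is only an illustration under stated assumptions: `toy_analyze` and its tiny Finnish lexicon are hypothetical stand-ins for a real transducer-based analyzer, and the tag strings are made up for the example. Any callable that returns a list of analyses for a token could be passed in its place.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical toy lexicon standing in for a real morphological analyzer.
# The analysis strings are illustrative, not the output of any actual tool.
TOY_ANALYSES = {
    "talo": ["talo+N+Sg+Nom"],
    "talossa": ["talo+N+Sg+Ine"],
    "talot": ["talo+N+Pl+Nom"],
}


def toy_analyze(token: str) -> list[str]:
    """Return all analyses for a token, or an empty list if unrecognized."""
    return TOY_ANALYSES.get(token.lower(), [])


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text)


def coverage_report(text: str, analyze: Callable[[str], list[str]]):
    """Return (coverage ratio, unrecognized forms sorted by frequency)."""
    tokens = tokenize(text)
    missed = Counter(t.lower() for t in tokens if not analyze(t))
    recognized = len(tokens) - sum(missed.values())
    coverage = recognized / len(tokens) if tokens else 0.0
    return coverage, missed.most_common()


if __name__ == "__main__":
    sample = "Talo on talossa. Talot ovat Tallinnassa, talot!"
    ratio, unknown = coverage_report(sample, toy_analyze)
    print(f"coverage: {ratio:.2%}")
    for form, count in unknown:
        print(f"{count}\t{form}")
```

Sorting the unrecognized forms by frequency makes it easier to answer the question in step (3): the most frequent misses can be checked first to see whether they are proper words missing from the lexicon, or typos and foreign material.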
- First transducers of minority Uralic languages after Finnish 1983 (Kimmo Koskenniemi)
- Meadow Mari ~1986 (Jorma Luutonen)
- Giellatekno ~2000 begins work with Sami descriptions (Trond Trosterud et al.): Barents Sea languages, Circumpolar languages; from ~2004 onward, other Uralic languages
- Balto-Finnic: fit = Meänkieli, fkv = Kven, izh = Ingrian, krl = Karelian, liv = Livonian, olo = Olonets-Karelian aka Livvi, vep = Veps, vot = Votic, vro = Võro
- Sami: sjd = Kildin Sami, sje = Pite Sami, sma = South Sami, sme = Northern Sami, smj = Lule Sami, smn = Inari Sami, sms = Skolt Sami
- Mordvin: mdf = Moksha, myv = Erzya
- Mari: mhr = Meadow & Eastern Mari, mrj = Hill Mari aka Western Mari
- Permic: koi = Komi-Permyak, kpv = Komi-Zyrian, udm = Udmurt
- Ob-Ugric: kca = Khanty, mns = Mansi
- Samoyedic: nio = Nganasan, sel = Selkup, yrk = Nenets
- Uralic languages in majority: est = Estonian, fin = Finnish, hun = Hungarian
- Auxiliary languages: deu = German, lav = Latvian, nob = Norwegian Bokmål, rus = Russian, tat = Tatar
- Find a source and use the known morphological information
- Find or build a lexicon to propagate this word type
...
- Keyboards Giellalt/
- Spell checkers: Hunspell, Voikko
- Click-in-text dictionaries
- Language learning
- Text-to-speech
- Translation
- North Sami
- Southern Sami
- Inari Sami
- Skolt Sami
- Northern Balto-Finnic languages
- Southern Balto-Finnic languages
- Erzya and Moksha
- Hill and Meadow Mari
- Komi-Zyrian and Udmurt
- Tundra Nenets and Mansi
Material Collaborations: FU-Lab, University of Turku, University of Tartu, University, EKI, University of Vienna, Livones, Võro Instituut
- ICALL at Giella (Giellatekno & Divvun)
- Northern Sami (flagship)
- Skolt Sami
- Erzya
- Võro