This document outlines the morpheme inventory and tagsets used in defining the morphotactics model (the affixation models for each part-of-speech). The morphotactics model is defined in the text files under //src/analyzer/morphotactics/model.
The morphotactics model consists of 19 text files, each of which defines the inflectional and derivational morphemes and the agglutination patterns of one part-of-speech. The files are structured according to the following conventions:
- Morphotactic model files should use the '.txt' file extension.
- Morphotactic model files should be named `[coarse_pos_tag].txt`, where `[coarse_pos_tag]` stands for the part-of-speech for which the rewrite rules are defined.
- Morphotactic model files should end with an empty line (`\n`).
- Morphotactic model files can only contain comment lines or lines that define FST rewrite rules.
- Lines that start with `#` are comment lines and are disregarded in morphotactics FST compilation.
- All non-comment lines define an FST rewrite rule. The format of a rewrite rule is similar to the AT&T FSM format: each rule should contain 4 whitespace-separated strings. The first two are the names of the source and destination states. The last two are the input and output labels. Note that while compiling the morphological analyzer FST, we invert the morphotactics FST, so the input and output labels switch sides (see //src/analyzer/build.sh).
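
The file conventions above can be illustrated with a minimal Python sketch that reads rewrite rules from the text of a model file. This is not the project's own loader: the `RewriteRule` type, its field names, the helper functions, and the state names and labels in `sample` are all hypothetical, and skipping blank lines is an assumption made for leniency.

```python
from typing import List, NamedTuple


class RewriteRule(NamedTuple):
    """One FST rewrite rule; the field names are hypothetical."""
    source_state: str
    destination_state: str
    input_label: str
    output_label: str


def parse_rules(text: str) -> List[RewriteRule]:
    """Parses the content of a model file into a list of rewrite rules."""
    rules = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Comment lines start with '#'. Skipping blank lines is an assumption
        # made here so that the trailing empty line never raises.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 fields, got {len(fields)}")
        rules.append(RewriteRule(*fields))
    return rules


def invert(rule: RewriteRule) -> RewriteRule:
    """Swaps input and output labels, mirroring what FST inversion does to an arc."""
    return rule._replace(input_label=rule.output_label,
                         output_label=rule.input_label)


# Hypothetical file content: state names and labels are made up for illustration.
sample = (
    "# A comment line.\n"
    "START STATE-A in-1 out-1\n"
    "STATE-A STATE-B in-2 out-2\n"
)
rules = parse_rules(sample)
print(rules[0])
print(invert(rules[1]))
```

The `invert` helper only shows why the labels switch sides after inversion; the actual inversion happens on the compiled FST during the build.
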
The sections below present the morpheme inventory (the set of morphemes that define Turkish morphology) used in the overall morphotactics model. The morpheme inventory is presented in three separate sections:
- Inflectional morphemes: morphemes that define the inflectional paradigm of a part-of-speech. Inflectional morphemes are represented with a preceding `+` markup. Some inflectional features might not be realized in the surface form; we specify their meta-morphemes as `<eps>`.
- Derivational morphemes: morphemes that alter the part-of-speech of the word when affixed. Derivational morphemes are represented with a preceding `-` markup. They are always realized in the surface form (no zero-derivations!), therefore there is always a corresponding meta-morpheme for them.
- Others: these are not really suffixes, but additional tags that mark certain syntactic agreement, semantic and segmentation features, which are helpful in implementing models for morphological disambiguation, part-of-speech tagging and syntactic parsing.
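
As a small illustration of the markup conventions above, the following Python sketch tells the kinds of meta-morphemes apart by their leading markup. The function name and the returned labels are hypothetical and are not part of the analyzer.

```python
def morpheme_kind(meta_morpheme: str) -> str:
    """Returns the kind of a meta-morpheme based on its markup."""
    if meta_morpheme == "<eps>":
        # Feature that is not realized in the surface form.
        return "unrealized"
    if meta_morpheme.startswith("+"):
        return "inflectional"
    if meta_morpheme.startswith("-"):
        return "derivational"
    raise ValueError(f"unexpected markup: {meta_morpheme!r}")


for morpheme in ("+DAn", "-lAş", "<eps>"):
    print(morpheme, morpheme_kind(morpheme))
```
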
Meta-morphemes are composed of fully realized phonemes (represented in lowercase; e.g. {`c`, `s`, `n`} are the fully realized phonemes in `cAsHnA`) and meta-phonemes (represented in uppercase; {`H`, `A`} are the meta-phonemes in `cAsHnA`). Fully realized phonemes are the ones that occur in the surface form. Meta-phonemes are used to represent allophones, and the morphophonemics model realizes them as phonemes given the context. The set of meta-phonemes and the morphophonemic processes that resolve them are implemented in self-explanatory Thrax grammars that are under //src/analyzer/morphophonemics.
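
The lowercase/uppercase convention can be sketched in a few lines of Python. The function below is a hypothetical helper, not part of the Thrax grammars; it only separates the symbols of a meta-morpheme by case and does not resolve meta-phonemes to surface phonemes. Symbols that are not letters (such as the apostrophe) are ignored.

```python
from typing import List, Tuple


def split_phonemes(meta_morpheme: str) -> Tuple[List[str], List[str]]:
    """Splits a meta-morpheme into fully realized phonemes and meta-phonemes."""
    # Drop the leading '+' (inflectional) or '-' (derivational) markup.
    body = meta_morpheme.lstrip("+-")
    realized = [symbol for symbol in body if symbol.islower()]
    meta = [symbol for symbol in body if symbol.isupper()]
    return realized, meta


realized, meta = split_phonemes("-cAsHnA")
print(realized)  # ['c', 's', 'n']
print(meta)      # ['A', 'H', 'A']
```
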
Feature Category | Feature Value | Meta-Morphemes | Description | Applies To Categories |
---|---|---|---|---|
Case | Abl | +DAn, +NDAn | Ablative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Acc | +YH, +NH | Accusative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Bare | `<eps>` | Caseless | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Dat | +YA, +NA | Dative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Gen | +NHn | Genitive case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Ins | +YlA | Instrumental / Comitative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Loc | +DA, +NDA | Locative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Case | Nom | `<eps>` | Nominative case | ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP |
Contrast | True | +YsA | Contrastive | ADD, IN, NN, NNP, PRD, PRF, PRI, PRP, PRP$, PRR, RB, VN, WRB, WP |
Copula | CndCop | +YA, +YsA | Conditional copula | NOMP, VB |
Copula | EvCop | +YmHş | Evidential copula | NOMP, VB |
Copula | GenCop | +DHr | Generalizing copula | NOMP, VB |
Copula | PastCop | +YDH | Past copula | NOMP, VB |
Copula | PresCop | `<eps>` | Present copula | NOMP, VB |
NumberType | Dist | +SAr | Distributive | CD |
NumberType | Ord | +., +HncH | Ordinal | CD, NN (only number roots) |
PersonNumber | A3pl | +lAr | 3rd person plural (marked on nominals) | ADD, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRR, VN |
PersonNumber | A3sg | `<eps>` | 3rd person singular (marked on nominals) | ADD, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRR, VN |
PersonNumber | V1pl | +YHz, +k, +lHm | 1st person plural (marked on verbals) | NOMP, VB |
PersonNumber | V2pl | +sHnHz, +nHz, +YHn, +sAnHzA | 2nd person plural (marked on verbals) | NOMP, VB |
PersonNumber | V3pl | `<eps>`, +lAr, +sHn, +sHnlAr | 3rd person plural (marked on verbals) | NOMP, VB |
PersonNumber | V1sg | +YHm, +m | 1st person singular (marked on verbals) | NOMP, VB |
PersonNumber | V2sg | `<eps>`, +sHn, +n, +sAnA | 2nd person singular (marked on verbals) | NOMP, VB |
PersonNumber | V3sg | `<eps>`, +sHn | 3rd person singular (marked on verbals) | NOMP, VB |
Polarity | Neg | +mA | Negative polarity | VB |
Polarity | Pos | `<eps>` | Positive polarity | VB |
Possessive | P2pl | +HnHz | 2nd person plural possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | P1pl | +HmHz | 1st person plural possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | P3pl | +lArH | 3rd person plural possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | P1sg | +Hm | 1st person singular possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | P2sg | +Hn, +HnHz | 2nd person singular possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | P3sg | +SH | 3rd person singular possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP |
Possessive | Pnon | `<eps>` | Non-possessive | ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRP, PRR, VJ, VN, WP |
TenseAspectMood | Aor | +Ar, +Hr, +r, +z | Aorist tense | VB |
TenseAspectMood | Desr | +sA | Desire / Past Auxiliary | VB |
TenseAspectMood | Fut | +YAcAk | Future tense | VB |
TenseAspectMood | Imp | `<eps>` | Imperative | VB |
TenseAspectMood | Nar | +mHş | Narrative past tense / Perfective-Evidential | VB |
TenseAspectMood | Nec | +mAlH | Necessitative / Obligative | VB |
TenseAspectMood | Opt | +YA | Optative | VB |
TenseAspectMood | Past | +DH | Past tense / Perfective | VB |
TenseAspectMood | Prog1 | +Hyor | Progressive tense 1 / Imperfective 1 | VB |
TenseAspectMood | Prog2 | +mAktA | Progressive tense 2 / Imperfective 2 | VB |
Feature Category | Feature Value | Meta-Morphemes | Description | Example | Derives From Category - To Category |
---|---|---|---|---|---|
Derivation | Able | -YAbil, -YA | Ability | gel-ebil -ir | VB-to-VB |
Derivation | Acq | -lAn | Acquire | yeşil-len | {ADD \| NN \| NNP \| VN}-to-VB |
Derivation | Act | -NCA | According to | ben-ce | {PRP \| PRD}-to-RB |
Derivation | Aff | -CHl | Affinity | et-çil | {ADD \| NN \| NNP}-to-{JJ \| NN \| NOMP} |
Derivation | After | -YHp | After doing so | gel-ip | VB-to-CRB |
Derivation | Agt | -CH | Agentive | koşu-cu | {ADD \| NN \| NNP \| VN}-to-{JJ \| NN \| NOMP} |
Derivation | Alm | -YAyaz | Almost | düş-eyaz +dı | VB-to-VB |
Derivation | AorNom | -Hr, -Ar, -r, -z | Aorist Nominalizer | gel-ir (e.g. elde edilen gelirler) | VB-to-{VN \| NOMP} |
Derivation | AorPart | -Hr, -Ar, -r, -z | Aorist Participle | tükenme-z (e.g. tükenmez kalem) | VB-to-VJ |
Derivation | Apostrophe | -' | Apostrophe | Yüzüklerin Efendisi-' +nden | {ADD \| CC \| CD \| CRB \| DT \| DUP \| EP \| EX \| FW \| GW \| IN \| JJ \| LS \| NFP \| NN \| NNP \| NOMP \| OP \| PDT \| PFX \| PRD \| PRF \| PRI \| PRP \| PRP$ \| PRR \| RB \| RPC \| RPNEG \| RPQ \| SYM \| UH \| VB \| VJ \| VN \| WRD \| WRB \| WP}-to-NN |
Derivation | As | -DHkçA | As | git-tikçe | VB-to-CRB |
Derivation | AsIf | -cAsHnA | As if | koşar-casına | VB-to-{CRB \| NOMP} |
Derivation | Bcm | -lAş | Become | iyi-leş | {ADD \| NN \| NNP}-to-VB |
Derivation | By | -NCA | By | aklı-nca | {ADD \| NN \| NNP \| VN}-to-RB |
Derivation | Cau | -DHr, -Hr, -Ht, -t | Causative | yap-tır | VB-to-VB |
Derivation | Coll | -CA, -CAk, -CAnAk | Collective | toplu-ca, toplu-cak, toplu-canak | {ADD \| NN \| VN}-to-RB |
Derivation | Dim | -CHk, -cAğHz | Diminutive | kitap-çık | {ADD \| NN \| VN}-to-NN, {ADD \| NN \| NNP \| VN}-to-NOMP |
Derivation | Doct | -izm | Doctrine | fütur-izm | {ADD \| NN \| NNP}-to-{NN \| NOMP} |
Derivation | Ever | -YAgel | Ever | sür-egel +en | VB-to-VB |
Derivation | Fam | -gil, -lAr | Family | annem-gil | {ADD \| NN \| NNP}-to-{NN \| NOMP} |
Derivation | Foll | -ist, -st | Follower | fütur-ist | {ADD \| NN \| NNP}-to-{JJ \| NN \| NOMP} |
Derivation | For | -lHk | For | kitap-lık, saat-lik | {ADD \| NN \| VN}-to-{JJ \| NN \| NOMP} |
Derivation | From | -lH | From | Ankara'-lı | {ADD \| NN \| NNP}-to-{JJ \| NN \| NOMP} |
Derivation | FutNom | -YAcAk | Future Nominalizer | yak-acak (e.g. yakacağımız bitti) | VB-to-{VN \| NOMP} |
Derivation | FutPart | -YAcAk | Future Participle | yak-acak (e.g. yakacak malzeme) | VB-to-VJ |
Derivation | Ger | -YArAk, -DAn | Gerund | koş-arak (e.g. koşarak geldim), koş+ma-dan (e.g. koşmadan geldim) | VB-to-CRB |
Derivation | Haste | -YHver | Haste | koş-uver | VB-to-VB |
Derivation | Inf | -mAk | Infinitive | koş-mak | VB-to-{NOMP \| VN} |
Derivation | Inh | -YHcH | Inherent | del-ici | VB-to-{NN \| NOMP \| VJ} |
Derivation | Inter | -ara | Inter | kıtalar-ara +sı | {ADD \| NN}-to-{JJ \| NOMP} |
Derivation | Lang | -CA | Language | Alman-ca | {ADD \| NN \| NNP}-to-{NN \| NOMP} |
Derivation | Like | -CA | Like | insan-ca | {ADD \| NN \| VN}-to-{NN}, {ADD \| NN \| NNP \| VN}-to-{JJ \| NOMP} |
Derivation | Ly | -CA, -CAsHnA | Adverbial | aptal-casına (e.g. aptalcasına davranmak) | JJ-to-{JJ \| NOMP \| RB}, {ADD \| NN \| NNP}-to-NN |
Derivation | Make | -lA | Make | işaret-le | {ADD \| NN \| NNP \| VN}-to-VB |
Derivation | Ness | -lHk | Ness | insan-lık | {ADD \| NN \| NNP \| VN}-to-{NN \| NOMP} |
Derivation | Nonf | -mA, -YHş | Nonfinite | konuş-ma, bak-ış | VB-to-{NOMP \| VN} |
Derivation | Of | -lArcA | Of | ton-larca | {ADD \| NN}-to-{JJ \| NN \| NOMP} |
Derivation | Pass | -Hl, -Hn | Passive | yap-ıl +dı | VB-to-VB |
Derivation | PastNom | -DHk | Past Nominalizer | yap-tık +larım | VB-to-{NOMP \| VN} |
Derivation | PastPart | -DHk | Past Participle | yap-tığ -ım (e.g. yaptığım şeyler) | VB-to-VJ |
Derivation | PerNom | -mHş | Perfective Nominalizer | gör-müş (e.g. görmüş geçirmiş) | VB-to-{NOMP \| VN} |
Derivation | PerPart | -mHş | Perfective Participle | büyü-müş (e.g. büyümüş çocuk) | VB-to-VJ |
Derivation | PresNom | -YAn | Present Nominalizer | gel-en +ler | VB-to-{NOMP \| VN} |
Derivation | PresPart | -YAn | Present Participle | kazan-an (e.g. kazanan yarışmacılar) | VB-to-VJ |
Derivation | Pron | -ki | Pronominalizer | evde-ki (e.g. evdekilerin yeri) | {ADD \| IN \| NN \| NNP \| PRD \| PRF \| PRI \| PRP \| PRP$ \| PRR \| RB \| VN \| WP}-to-PRF |
Derivation | ProNom | -YAsH | Progressive Nominalizer | acı-yası | VB-to-{NOMP \| VN} |
Derivation | ProPart | -YAsH | Progressive Participle | gülün-esi (e.g. gülünesi şakalar) | VB-to-VJ |
Derivation | Rcp | -Hş | Reciprocal | gül-üş | VB-to-VB |
Derivation | Rel | -ki, -kH | Relativizer | okulda-ki (e.g. okuldaki öğrenciler) | {ADD \| IN \| NN \| NNP \| PRD \| PRI \| PRP \| PRR \| VN \| WP}-to-JJ, {ADD \| IN \| JJ \| NN \| PRD \| PRI \| PRP \| PRP$ \| PRR \| RB \| VN \| WP}-to-NOMP |
Derivation | Rfx | -Hn | Reflexive | yıka-n | VB-to-VB |
Derivation | Rpt | -YAdur | Repetitive | yürü-yedur | VB-to-VB |
Derivation | Rtd | -sAl | Related | bilim-sel | {ADD \| NN \| NNP \| VN}-to-{JJ \| NN \| NNP \| NOMP \| VN} |
Derivation | Sim | -HmsHm, -sH, -sHl, -vari, -Hmtrak | Similar | sarı-msı, sarı-mtrak | {ADD \| NN \| NNP \| VN}-to-{JJ \| NN \| NNP \| NOMP \| VN} |
Derivation | Sincb | -YAlH | Since before | yap-alı | VB-to-CRB |
Derivation | Since | -DHr | Since | zaman-dır | NN-to-RB |
Derivation | Snd | -lA, -dA | Sound | fokur-da | DUP-to-VB |
Derivation | Start | -YAkoy | Start | pişir-ekoy | VB-to-VB |
Derivation | Stay | -YAkal | Stay | uyu-yakal | VB-to-VB |
Derivation | When | -YHncA | When | uyu-yunca | VB-to-{CRB \| NOMP} |
Derivation | While | -Yken, -ken | While | uyur-ken | VB-to-{CRB \| NOMP}, {ADD \| NN \| NNP}-to-RB |
Derivation | With | -lH, -HlH | With | uyku-lu | {ADD \| CD \| NN \| NNP \| VB \| VN}-to-{NN \| NOMP}, {ADD \| CD \| NN \| NNP \| VN}-to-JJ, {ADD \| NN \| NNP \| VN}-to-RB, VB-to-VJ |
Derivation | Wout | -sHz | Without | uyku-suz | {ADD \| NN \| NNP \| VN}-to-{JJ \| NN \| NOMP \| RB} |
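
The last column of the derivational table uses a compact notation such as `{ADD | NN}-to-{JJ | NOMP}`. One way to read it, sketched below in Python, is as the cross product of source and target categories. The function names and the regular expression are hypothetical and this reading is an assumption, not something taken from the analyzer code; any backslash used to escape a pipe inside a table cell is dropped.

```python
import re
from typing import List, Tuple

# One side of a derivation is either a braced set or a single category tag.
_SIDE = r"(\{[^}]*\}|[A-Z$]+)"
_DERIVATION = re.compile(_SIDE + "-to-" + _SIDE)


def _tags(side: str) -> List[str]:
    """Returns the category tags listed on one side of a derivation."""
    inner = side.strip("{}").replace("\\", "")
    return [tag.strip() for tag in re.split(r"[|,]", inner) if tag.strip()]


def expand_derivation(spec: str) -> List[Tuple[str, str]]:
    """Expands a table cell into (source category, target category) pairs."""
    pairs = []
    for source, target in _DERIVATION.findall(spec):
        pairs.extend((s, t) for s in _tags(source) for t in _tags(target))
    return pairs


print(expand_derivation("VB-to-VB"))
print(expand_derivation("{ADD | NN}-to-{JJ | NOMP}"))
```

A cell that lists several derivations separated by commas, such as `JJ-to-{JJ | NOMP | RB}, {ADD | NN | NNP}-to-NN`, expands to the pairs of each derivation in turn.
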
Feature Category | Feature Value | Meta-Morphemes | Description | Applies To Categories |
---|---|---|---|---|
Apostrophe | True | +' | Apostrophe separating root and inflections | ADD, CD, NN (only abbreviated and number roots), NNP, NOMP |
ComplementType | CAbl | `<eps>` | (Postposition has) ablative case marked complement | IN |
ComplementType | CAcc | `<eps>` | (Postposition has) accusative case marked complement | IN |
ComplementType | CBare | `<eps>` | (Postposition has) caseless complement | IN |
ComplementType | CDat | `<eps>` | (Postposition has) dative case marked complement | IN |
ComplementType | CFin | `<eps>` | (Postposition has) finite complement | IN |
ComplementType | CGen | `<eps>` | (Postposition has) genitive case marked complement | IN |
ComplementType | CIns | `<eps>` | (Postposition has) instrumental case marked complement | IN |
ComplementType | CNum | `<eps>` | (Postposition has) numeric complement | IN |
ConjunctionType | Adv | `<eps>` | Adverbial conjunction | CC |
ConjunctionType | Coor | `<eps>` | Coordinating conjunction | CC |
ConjunctionType | Par | `<eps>` | Parallel conjunction | CC |
ConjunctionType | Sub | `<eps>` | Subordinating conjunction | CC |
DeterminerType | Def | `<eps>` | Definitive (determiner) | DT |
DeterminerType | Dem | `<eps>` | Demonstrative (determiner) | DT |
DeterminerType | Dir | `<eps>` | Directional (determiner) | DT |
DeterminerType | Ind | `<eps>` | Indefinite (determiner) | DT |
Proper | False | `<eps>` | Inflectional group is not a part of proper noun | RB, NN |
Proper | True | `<eps>` | Inflectional group is a part of proper noun | RB, NN |
Temporal | True | `<eps>` | Temporal | ADD, CC, CD, CRB, DT, DUP, EP, EX, FW, GW, IN, JJ, LS, NFP, NN, NNP, NOMP, OP, PDT, PFX, PRD, PRF, PRI, PRP, PRP$, PRR, RB, RPC, RPNEG, RPQ, SYM, UH, VB, VJ, VN, WRB, WDT, WP |