This library provides easy access to Interlinear Glossed Text (IGT) according to the Leipzig Glossing Rules, stored as CLDF examples.
Installing pyigt via pip
pip install pyigt
will install the Python package along with a command line interface igt.
Note: The methods Corpus.get_wordlist and Corpus.get_profile, to extract a wordlist and an orthography profile from a corpus, require the lingpy package. To make sure it is installed, install pyigt as
pip install pyigt[lingpy]
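To find out whether the optional dependency is present in the current environment before calling Corpus.get_wordlist or Corpus.get_profile, a check along the following lines may help. This is a minimal sketch using only the standard library; the helper name has_module is made up for illustration and is not part of pyigt.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    # find_spec returns None when no such top-level module is installed.
    return find_spec(name) is not None


# Hypothetical helper, not part of pyigt: report whether lingpy is available.
lingpy_available = has_module("lingpy")
print("lingpy available:", lingpy_available)
```

If this prints False, reinstalling with the lingpy extra shown above should pull in the missing package.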
$ igt -h
usage: igt [-h] [--log-level LOG_LEVEL] COMMAND ...
optional arguments:
-h, --help show this help message and exit
--log-level LOG_LEVEL
log level [ERROR|WARN|INFO|DEBUG] (default: 20)
available commands:
Run "COMAMND -h" to get help for a specific command.
COMMAND
ls List IGTs in a CLDF dataset
stats Describe the IGTs in a CLDF dataset
The igt ls command allows inspecting IGTs from the command line, formatted using the four standard lines described in the Leipzig Glossing Rules, where analyzed text and glosses are aligned, e.g.
$ igt ls tests/fixtures/examples.csv
Example 1:
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
...
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV
IGT corpus at tests/fixtures/examples.csv
igt ls can be chained with other command line tools, such as commands from the csvkit package, for filtering:
$ csvgrep -c Primary_Text -m"ȵi" tests/fixtures/examples.csv | csvgrep -c Gloss -m"ADV" | igt ls -
Example 5:
zuɑməɸu oʐgutɑ ipiχuɑȵi,
zuɑmə-ɸu o-ʐgu-tɑ i-pi-χuɑ-ȵi,
cypress-tree one-CL-LOC DIR-hide-because-ADV
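The same kind of filtering can be approximated in plain Python when csvkit is not at hand. The sketch below assumes a CSV with Primary_Text and Gloss columns, as named in the csvgrep call above; the two sample rows and the ID column are illustrative stand-ins, not the actual contents of tests/fixtures/examples.csv, and filter_rows is a made-up helper.

```python
import csv
import io


def filter_rows(rows, text_substring, gloss_substring):
    """Keep rows whose Primary_Text and Gloss contain the given substrings."""
    return [
        row for row in rows
        if text_substring in row["Primary_Text"] and gloss_substring in row["Gloss"]
    ]


# Illustrative sample data (assumption): only the column names Primary_Text
# and Gloss are taken from the csvgrep command above.
sample = io.StringIO(
    'ID,Primary_Text,Gloss\n'
    '1,"zəple: ȵike: peji qeʴlotʂuʁɑ,",earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC\n'
    '5,"zuɑməɸu oʐgutɑ ipiχuɑȵi,",cypress-tree one-CL-LOC DIR-hide-because-ADV\n'
)
rows = list(csv.DictReader(sample))

# Mirror the two chained csvgrep calls: "ȵi" in the text, "ADV" in the gloss.
matches = filter_rows(rows, "ȵi", "ADV")
print([row["ID"] for row in matches])
```

Both sample rows contain "ȵi" in the text, but only the second has "ADV" in its gloss, so only that row survives the filter.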
The Python API is documented in detail at readthedocs. Below is a quick overview.
You can read all IGT examples provided with a CLDF dataset:
>>> from pyigt import Corpus
>>> corpus = Corpus.from_path('tests/fixtures/cldf-metadata.json')
>>> len(corpus)
5
>>> for igt in corpus:
... print(igt)
... break
...
zəple: ȵike: peji qeʴlotʂuʁɑ,
zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,
earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC
You can also instantiate individual IGT examples, e.g. to check for validity:
>>> from pyigt import IGT
>>> ex = IGT(phrase="palasi=lu", gloss="priest-and")
>>> ex.check(strict=True, verbose=True)
palasi=lu
priest-and
...
ValueError: Rule 2 violated: Number of morphemes does not match number of morpheme glosses!
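Rule 2 of the Leipzig Glossing Rules asks for the same number of hyphens in the example and in the gloss. The following is a simplified sketch of such a check, to illustrate why the example above fails; it is not pyigt's implementation, and check_hyphen_alignment is a hypothetical name.

```python
def check_hyphen_alignment(phrase: str, gloss: str) -> None:
    """Raise ValueError if words or hyphen counts do not line up (simplified)."""
    words = phrase.split()
    word_glosses = gloss.split()
    # Word-by-word alignment: each object-language word needs one gloss.
    if len(words) != len(word_glosses):
        raise ValueError("Number of words does not match number of word glosses")
    # Compare the number of hyphens in each word with that of its gloss.
    for word, word_gloss in zip(words, word_glosses):
        if word.count("-") != word_gloss.count("-"):
            raise ValueError(f"Hyphen count differs: {word!r} vs {word_gloss!r}")


try:
    check_hyphen_alignment("palasi=lu", "priest-and")
except ValueError as err:
    print(err)
```

Here the phrase marks a clitic boundary with "=", while the gloss uses a hyphen, so the hyphen counts (0 and 1) differ.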
An IGT instance can also be used to expand known gloss abbreviations:
>>> ex = IGT(phrase="Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.",
... gloss="now they-OBL-GEN farm forever behind stay-FUT-NEG",
... translation="Now their farm will not stay behind forever.")
>>> ex.pprint()
Gila aburun ferma hamišaluǧ güǧüna amuq’dač.
Gila abur-u-n ferma hamišaluǧ güǧüna amuq’-da-č.
now they-OBL-GEN farm forever behind stay-FUT-NEG
‘Now their farm will not stay behind forever.’
OBL = oblique
GEN = genitive
FUT = future
NEG = negation, negative
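The abbreviation listing can be pictured as a lookup of the gloss elements in a table of standard abbreviations. The sketch below is an assumption about the general idea, not pyigt's code: the table holds only the four entries shown in the output above, and used_abbreviations is a made-up helper.

```python
import re

# Tiny excerpt of standard abbreviations, copied from the output above.
LGR_ABBREVIATIONS = {
    "OBL": "oblique",
    "GEN": "genitive",
    "FUT": "future",
    "NEG": "negation, negative",
}


def used_abbreviations(gloss: str):
    """Return (abbreviation, meaning) pairs in order of first appearance."""
    found = []
    # Split the gloss line at whitespace and common gloss separators.
    for token in re.split(r"[\s\-=~.:;]+", gloss):
        if token in LGR_ABBREVIATIONS and token not in found:
            found.append(token)
    return [(token, LGR_ABBREVIATIONS[token]) for token in found]


for abbr, meaning in used_abbreviations(
        "now they-OBL-GEN farm forever behind stay-FUT-NEG"):
    print(f"{abbr} = {meaning}")
```

Lowercase glosses such as "now" or "farm" are not in the table and are therefore skipped.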
And you can go deeper, parsing morphemes and glosses according to the LGR (see module pyigt.lgrmorphemes):
>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC")
>>> igt.conformance
<LGRConformance.MORPHEME_ALIGNED: 2>
>>> igt[1, 1].gloss
<Morpheme "INDEF:CL">
>>> igt[1, 1].gloss.elements
[<GlossElement "INDEF">, <GlossElementAfterColon "CL">]
>>> igt[1, 1].morpheme
<Morpheme "ke:">
>>> print(igt[1, 1].morpheme)
ke:
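The index pair in igt[1, 1] appears to address the second morpheme of the second word (both zero-based). The plain-Python sketch below mimics that addressing with nested lists to make the indexing concrete; it is not how pyigt.lgrmorphemes is implemented, and segment is a hypothetical helper.

```python
import re


def segment(line: str):
    """Split a line into words, and each word into morphemes at '-' or '='."""
    return [re.split(r"[-=]", word) for word in line.split()]


phrase = "zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,"
gloss = "earth-DEF:CL WH-INDEF:CL become-CSM in.the.past-LOC"

morphemes = segment(phrase)
glosses = segment(gloss)

# Second word, second morpheme (zero-based indices 1, 1).
print(morphemes[1][1])
print(glosses[1][1])
# The colon joins gloss elements whose formal segmentation is not shown.
print(glosses[1][1].split(":"))
```

Unlike these nested lists of strings, the objects returned by pyigt keep morpheme and gloss together and expose typed gloss elements, as the session above shows.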
- interlineaR - an R package with similar functionality, but with support for more input formats.