
Py-Elotl

Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.

This is a project of Comunidad Elotl.

Developed by:

Requires python>=3.6

Installation

Using pip

pip install elotl

From source

git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .
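
Whichever method you used, you can verify the installation by listing the bundled corpora from a Python shell (this uses the corpus API described in the next section):

import elotl.corpus

# If the package is installed correctly, this prints the available corpora.
print(elotl.corpus.list_of_corpus())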

Use

Working with corpora

import elotl.corpus

Listing available corpora

print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)

Output:

Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']

Loading a corpus

If a non-existent corpus is requested, a value of 0 is returned.

axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")

If an existing corpus name is entered, a list of entries is returned.

axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)

Output (a single row is shown):

[
    'Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.',
    'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.',
    '',
    'Adivinanzas nahuas'
]

Each row of a corpus is a list with four elements:

  • non_original_language
  • original_language
  • variant
  • document_name
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0])  # non-original language (e.g. Spanish)
    print(row[1])  # original language
    print(row[2])  # variant
    print(row[3])  # document name
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
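
Because every row follows this four-field layout, a corpus can be reshaped for other tasks. As a small sketch (the variable names here are only illustrative), collecting the Spanish-Otomí sentence pairs from tsunkua looks like this:

import elotl.corpus

# Each row is [non_original_language, original_language, variant, document_name].
tsunkua = elotl.corpus.load('tsunkua')

# Keep only rows where both sides of the parallel text are non-empty.
pairs = [(row[0], row[1]) for row in tsunkua if row[0] and row[1]]

print(len(pairs), "aligned sentence pairs")
print(pairs[0])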

Package structure

The following structure is for reference; as the package grows, it will be documented in more detail. An example of using the orthography modules follows the tree.

├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── dist
├── docs
├── elotl                           Top-level package
│   ├── corpora                     Corpus data files
│   ├── corpus                      Subpackage to load corpora
│   ├── huave                       Huave language subpackage
│   │   └── orthography.py          Module to normalize Huave orthography and phonemes
│   ├── __init__.py                 Initializes the package
│   ├── nahuatl                     Nahuatl language subpackage
│   │   └── orthography.py          Module to normalize Nahuatl orthography and phonemes
│   ├── otomi                       Otomi language subpackage
│   │   └── orthography.py          Module to normalize Otomi orthography and phonemes
│   ├── __pycache__
│   └── utils                       Subpackage with common functions and files
│       └── fst                     Finite State Transducer functions
│           └── att                 Module with static .att files
├── LICENSE
├── Makefile
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── tests
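
As an illustration of the orthography modules listed above, the sketch below assumes that elotl.nahuatl.orthography exposes a Normalizer class constructed with the name of a normalization scheme (such as "sep"); check the module itself for the exact class name and the schemes supported by your installed version.

# Assumption: elotl.nahuatl.orthography provides a Normalizer class
# taking a normalization scheme name; verify before relying on it.
from elotl.nahuatl.orthography import Normalizer

normalizer = Normalizer("sep")

# Normalize a Nahuatl sentence taken from the axolotl corpus example above.
print(normalizer.normalize("See tosaasaanil, see tosaasaanil."))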

Development

Requirements

  • python3
  • HFST
  • GNU make
  • poetry
    • Used as the Python packaging backend and for virtual environments

Quick build

poetry env use 3.x
poetry shell
make all

Here 3.x is your local Python version; see the Poetry documentation on managing environments.

Step by step

Build FSTs

Build the FSTs with make.

make fst

Create a virtual environment and activate it.

poetry env use 3.x
poetry shell

Update pip and generate distribution files.

python -m pip install --upgrade pip
poetry build

Testing the package locally

python -m pip install -e .

Send to PyPI

poetry publish

Remember to configure your PyPI credentials first.

License

Mozilla Public License 2.0 (MPL 2.0)

References