Llamacha/Churana

Churana: Revitalizing Languages

Languages are disappearing at an alarming rate: most of the world's roughly 6,500 languages are in danger of extinction, and with them the linguistic rights of their speakers. Information and Communication Technologies (ICT) play a key role in preserving endangered languages. Among ICT applications, natural language processing (NLP) deserves particular attention, because in this century a language without such support is held back in literacy and shut out of the Internet and other electronic media. The first step is to build resources such as speech corpora, monolingual corpora, bilingual corpora, and dictionaries. These resources make it possible to build NLP tools. Tools such as automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) help break the language barrier and revitalize minority languages. However, it is also important to understand why these languages are in danger of extinction.

In Peru, 48 native languages are still alive, but all of them are threatened with extinction. Experts point out that the replacement process is irreversible unless disruptive policies and tools emerge (Adelaar, 2014). Many computational tools for language processing exist within ICT under the Human Language Technologies (HLT) label. Computational linguistics therefore stands out as a potential tool for revitalizing national languages, since the lack of such support prevents these languages from growing and from being used productively on the Internet and in other electronic systems.

Churana is an open-source repository that aims to bring together academics, independent scholars, organizations, communities, and individuals to revitalize and democratize the native languages of Peru.

Contribute

Type | Platform
💬 General Discussion | Slack Group
How to contribute | GitHub Fork
🙋 Feature Requests & Ideas | GitHub Issue Tracker

If you know of an available resource that is not on this list, please add it, either through the platforms above or by submitting a pull request.

Table of Contents

Organizations

  • [Org] Hinantin
    Research and NLP Software Development.

  • [Org] Siminchikkunarayku
    Preservation and revitalization of native languages in America using computational linguistics.

Computing Systems

Language Specific Projects

Ashaninka

Aymara

Quechua Sureño

  • [Corpus] A Quechua-Spanish parallel treebank
    A Quechua-Spanish parallel treebank. (Rios et al., 2008)

  • [Corpus] On the Building of the Large Scale Corpus of Southern Qichwa
    A non-annotated corpus of Southern Qichwa (156 hours). (Camacho et al., 2017)

  • [Corpus] Dictionnaire électronique français-quechua des verbes pour le TAL
    A French-Quechua dictionary of verbs. (Duran, 2017)

  • [Corpus] Siminchik: A Speech Corpus for Preservation of Southern Quechua
    Contains 99 hours of transcribed audio in the Chanca and Collao dialectal varieties. (Cárdenas et al., 2018)

  • [Machine Translation] Building NLP Systems for Two Resource-Scarce Indigenous Languages: Mapudungun and Quechua
    Quechua-Spanish machine translation systems developed with rule-based techniques. (Monson et al., 2006)

  • [Machine Translation] A basic language technology toolkit for quechua
    A hybrid machine translation system that can translate Spanish text into Cuzco Quechua. (Rios, 2015)

  • [Machine Translation] Neural machine translation with a polysynthetic low resource language
    An NMT system for Southern Quechua developed using several morphological segmentation techniques, plus a new one, to decompose the language’s suffix-based morphemes. (Ortega et al., 2020)

  • [Machine Translation] Traducción automática neuronal para lengua nativa peruana
    An NMT system for Chanca Quechua developed with transformers and deep learning methods, achieving a BLEU score of 39.5. (Huarcaya, 2020)

  • [Speech Recognition] Isolated Automatic Speech Recognition of Quechua Numbers using MFCC, DTW and KNN
    An ASR system for isolated Quechua numbers, built with Mel-Frequency Cepstral Coefficients (MFCC), Dynamic Time Warping (DTW), and K-Nearest Neighbors (KNN). (Chacca et al., 2018)

  • [Speech Recognition] Conversor de voz a texto para el idioma quechua usando la herramienta de reconocimiento de voz KALDI y una red neuronal profunda
    An ASR system built with DNN-HMM, achieving an accuracy of 59.20%. (Aimituma et al., 2019)

  • [Speech Recognition] Automatic Speech Recognition of Quechua Language Using HMM Toolkit
    An ASR system built with the Hidden Markov Model Toolkit, achieving a test WER of 12.70. (Zevallos et al., 2019)

  • [Spell Checking] Spell checking an agglutinative language: Quechua
    A spell checker using finite state methods for the agglutinative language Quechua. (Rios, 2011)

  • [Syntactic Analyzer] Syntactic Analyzer for Quechua Language
    A syntactic analyzer for Quechua that uses a dynamic programming technique with a context-free grammar. (Lozano et al., 2013)

  • [Morphological Analyzer] Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
    A pipeline to normalize Quechua texts through morphological analysis and disambiguation. (Rios et al., 2014)

  • [Alignment Techniques] Using Morphemes from Agglutinative Languages like Quechua and Finnish to Aid in Low-Resource Translation
    A novel alignment technique for agglutinative languages like Quechua and Finnish. (Ortega et al., 2018)

  • [Word Sense Disambiguation] Towards Cross-Language Word Sense Disambiguation for Quechua
    A cross-language WSD for Quechua. (Rudnick, 2011)

  • [Tools] Allin Qillqay! A Free Online Web Spell Checking Service for Quechua
    The first online spell-checking web service for Quechua. (Castro et al., 2014)
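
Several of the speech recognition entries above combine frame-level features with template matching. As a rough illustration of the DTW plus KNN idea named in the isolated-number entry, the sketch below classifies a short feature sequence by its Dynamic Time Warping distance to labeled templates. It is not the method of any listed paper: the function names `dtw_distance` and `knn_classify`, the two-coefficient toy vectors standing in for MFCC frames, and the example labels are all assumptions made for illustration.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, templates, k=1):
    """Label `query` by majority vote among its k DTW-nearest templates."""
    ranked = sorted(templates, key=lambda t: dtw_distance(query, t[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each template is (label, sequence of 2-dim "frames").
# Real systems would use MFCC vectors extracted from recorded audio.
templates = [
    ("huk", [[0.0, 0.1], [0.2, 0.1], [0.9, 0.8]]),
    ("huk", [[0.0, 0.0], [0.1, 0.2], [0.5, 0.4], [1.0, 0.9]]),
    ("iskay", [[1.0, 1.0], [0.8, 0.9], [0.1, 0.0]]),
    ("iskay", [[0.9, 1.1], [0.9, 0.8], [0.4, 0.5], [0.0, 0.1]]),
]

# A query whose length differs from the templates; DTW absorbs the timing difference.
query = [[0.0, 0.1], [0.1, 0.1], [0.3, 0.2], [0.95, 0.85]]
predicted = knn_classify(query, templates, k=3)
print(predicted)
```

DTW is used here because spoken words vary in duration, so sequences of different lengths still need a meaningful distance. The quadratic cost table is adequate for short isolated words but would need pruning or windowing for longer utterances.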

Shipibo

  • [Corpus] Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo
    The first Spanish-Shipibo parallel corpus. (Galarreta et al., 2017)

  • [Corpus] No data to crawl? Monolingual corpus creation from PDF files of truly low-resource languages in Peru
    New monolingual corpora for four indigenous and endangered languages from Peru. (Bustamante et al., 2020)

  • [Spell Checking] Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language
    A spell checker for Shipibo-Konibo based on syllabification and character-level graphs. (Alva et al., 2017)

  • [Morphological Analyzer] A morphological analyzer for shipibo-konibo
    A fairly complete morphological analyzer for Shipibo-Konibo. (Cárdenas et al., 2018)

  • [WordNet] WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language
    An initial WordNet database for a low-resourced and indigenous language in Peru. (Maguino-Valencia et al., 2018)

  • [Word-Embeddings] Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora
    A new approach for learning contextualised cross-lingual word embeddings based only on a small parallel corpus. (Wada et al., 2020)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.