Detectors

Tools used for this purpose:

*: Supports the Guarani language.

Installation

Pre-requisites:

Install polyglot dependencies.

Install requirements: pip install -r requirements.txt

Download the fastText library.

~~Download the crubadan corpus.~~

# Commented out due to the low precision of textcat; use gcld3 instead.
"""
import nltk
nltk.download('crubadan')
nltk.download('punkt')
"""

Command Line Interface

All commands must be run from the src directory.

Detect language of tweets

python run.py [data_dir] [file_name_of_tweets] [language_lexicon] --detect_language --guarani

data_dir: Path to the data directory, relative to the src directory. Required.
file_name_of_tweets: Name of the CSV file containing the tweets. Required.
language_lexicon: Name of the file containing the word lexicon of the language to identify. Optional. The lexicon may belong to any low-resource language, not only Guarani.
guarani: Flag indicating that the language to identify is Guarani (or another low-resource language). Optional, but needed when language_lexicon is provided.
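
As an illustration of how the arguments above fit together, the following is a minimal sketch of an argparse parser that accepts the same positional arguments and flags. It is not the actual contents of run.py: the helper names build_parser and resolve_inputs are hypothetical, and the assumption that the lexicon file lives inside data_dir is ours, not something the README states.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line of run.py."""
    parser = argparse.ArgumentParser(prog="run.py")
    # Required positional arguments, in the documented order.
    parser.add_argument("data_dir", help="data directory, relative to src")
    parser.add_argument("file_name_of_tweets", help="CSV file with the tweets")
    # Optional positional argument: omitted when no lexicon is used.
    parser.add_argument("language_lexicon", nargs="?", default=None,
                        help="word lexicon of the language to identify")
    parser.add_argument("--detect_language", action="store_true")
    parser.add_argument("--guarani", action="store_true")
    return parser


def resolve_inputs(args: argparse.Namespace):
    """Return (tweets_path, lexicon_path); lexicon_path is None when unused.

    Assumption: the lexicon file is stored in data_dir next to the tweets.
    The lexicon is only taken into account when --guarani is set.
    """
    data_dir = Path(args.data_dir)
    tweets_path = data_dir / args.file_name_of_tweets
    lexicon_path = None
    if args.guarani and args.language_lexicon:
        lexicon_path = data_dir / args.language_lexicon
    return tweets_path, lexicon_path


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(resolve_inputs(parsed))
```

Under these assumptions, omitting both language_lexicon and --guarani leaves lexicon_path as None, which corresponds to running the detectors without a lexicon.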

mmaguero/lang-detection
