
kirillfx/nltk-language-detection

 
 

nltk-language-detection

Automatic detection of text language with Python and NLTK. This script uses a very simple approach based on stopword comparison: the text is checked against each language's stopword list, and the language whose list has the most words in common with the text wins.

Dependencies

You have to install the NLTK package for Python to run this script.

How it works

Just give the script a bunch of text to analyse and it will:

  • Parse and tokenize your text
  • Compare the tokens with the stopword lists in the NLTK corpus, for every available language
  • Select the most relevant language
  • Calculate the relevancy level of the selected language
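
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (`tokenize`, `score_languages`, `detect_language`, `load_nltk_stopwords`) are hypothetical, the regex tokenizer is a stand-in for whatever tokenizer the script uses, and the "relevancy" formula (the winner's share of all stopword matches) is an assumption, since the README does not say how that level is computed.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer; the real script may use an NLTK tokenizer.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def score_languages(text, stopwords_by_language):
    """For each language, count the distinct tokens that are also stopwords."""
    words = set(tokenize(text))
    return {
        language: len(words & set(stopwords))
        for language, stopwords in stopwords_by_language.items()
    }


def detect_language(text, stopwords_by_language):
    """Return (language, relevancy) for the best-scoring language.

    Assumption: relevancy is the winner's share of all stopword matches.
    Returns (None, 0.0) when no stopword from any language is found.
    """
    scores = score_languages(text, stopwords_by_language)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def load_nltk_stopwords():
    """Build {language: stopword set} from the NLTK stopwords corpus."""
    from nltk.corpus import stopwords  # imported lazily; requires NLTK

    return {
        language: set(stopwords.words(language))
        for language in stopwords.fileids()
    }


if __name__ == "__main__":
    # Tiny hand-made stopword lists so the demo runs without NLTK data.
    toy_stopwords = {
        "english": {"the", "is", "on", "and"},
        "french": {"le", "est", "sur", "et"},
    }
    print(detect_language("Le chat est sur la table et dort", toy_stopwords))
```

To run against the real lists, pass `load_nltk_stopwords()` as the second argument; this needs the NLTK stopwords corpus, which can be fetched once with `nltk.download("stopwords")`. Keeping the stopword lists as a parameter makes the scoring easy to try with small hand-made lists first.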

Documentation

If you want to know how this script works, have a look at the blog post "Detection de langue en NLP" that I wrote (in French) on my personal blog, le-geek.com.
