-
Notifications
You must be signed in to change notification settings - Fork 0
/
PKG-INFO
47 lines (39 loc) · 2.18 KB
/
PKG-INFO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
Metadata-Version: 1.0
Name: guess-language
Version: 0.3
Summary: Guess the natural language of a text
Home-page: http://code.google.com/p/guess-language
Author: Kent Johnson
Author-email: kent3737@gmail.com
License: LGPL
Description: Attempts to determine the natural language of a selection of Unicode (utf-8) text.
Based on `guesslanguage.cpp
<http://websvn.kde.org/branches/work/sonnet-refactoring/common/nlp/guesslanguage.cpp?view=markup>`_
by Jacob R Rideout for KDE which itself is based on
`Language::Guess <http://languid.cantbedone.org/>`_ by Maciej Ceglowski.
Detects over 60 languages - all languages listed in the `trigrams
<http://code.google.com/p/guess-language/source/browse/trunk/guess_language/trigrams/>`_
directory plus Japanese, Chinese, Korean and Greek.
guess_language uses heuristics based on the character set and trigrams in a sample text
to detect the language. It works better with longer samples and will be confused if
the sample text includes markup such as HTML tags.
Usage
=====
The main entry points all take a single string as input and return a language identifier.
The string must be Unicode or UTF-8 text. The language identifer can be the language name
in English, the two- or three-letter IANA language code, a language ID or a tuple containing
all three codes.
The primary entry points, and the return values, are as follows::
guessLanguage(txt) - IANA language code
guessLanguageTag(txt) - IANA language code (same as guessLanguage)
guessLanguageName(txt) - Language name in English
guessLanguageId(txt) - language ID
guessLanguageInfo(txt) - tuple of (IANA code, id, name)
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Text Processing :: Linguistic