Gaël de Chalendar edited this page Jun 24, 2024 · 51 revisions

TL;DR

LIMA Python bindings are currently available under Linux only (x86_64).

Under Linux, with Python >= 3.7 and < 4 and an upgraded pip:

# Upgrading pip is essential to obtain the correct LIMA version
$ pip install --upgrade pip
$ pip install aymara==0.5.0b6
# Install the English models
$ lima_models.py -l eng
# Either use the lima command to analyze a file and print the result in CoNLL-U format:
$ lima <path to the file to analyse>
# Or use the python API:
$ python
>>> import aymara.lima
>>> nlp = aymara.lima.Lima("ud-eng")
>>> doc = nlp('Hello, World!')
>>> print(doc[0].lemma)
hello
>>> print(repr(doc))
1       Hello   hello   INTJ    _       _               0       root    _       Pos=0|Len=5
2       ,       ,       PUNCT   _       _               1       punct   _       Pos=5|Len=1
3       World   World   PROPN   _       Number:Sing     1       vocative        _       Pos=7|Len=5
4       !       !       PUNCT   _       _               1       punct   _       Pos=12|Len=1
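The `repr(doc)` output above is in CoNLL-U format: one token per line, with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), and a blank line between sentences. If you want to post-process the text written by the `lima` command without going through the Python API, a minimal sketch like the one below can read it. It uses only the standard library; `parse_conllu` and `dependency_triples` are hypothetical helpers, not part of the `aymara` package, and it assumes the columns are separated by real tabs as CoNLL-U requires (the spacing shown above may have been flattened).

```python
from typing import Dict, List, Tuple

# Standard CoNLL-U column names, in order.
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry sentence metadata.
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        token = dict(zip(COLUMNS, fields))
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def dependency_triples(sentence: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) for each token.

    A HEAD value that is not a token id of the sentence (0 for the root)
    is reported as "ROOT".
    """
    forms = {tok["id"]: tok["form"] for tok in sentence}
    return [(forms.get(tok["head"], "ROOT"), tok["deprel"], tok["form"])
            for tok in sentence]


# The analysis of 'Hello, World!' shown above, rebuilt with real tab separators.
rows = [
    ["1", "Hello", "hello", "INTJ", "_", "_", "0", "root", "_", "Pos=0|Len=5"],
    ["2", ",", ",", "PUNCT", "_", "_", "1", "punct", "_", "Pos=5|Len=1"],
    ["3", "World", "World", "PROPN", "_", "Number:Sing", "1", "vocative", "_", "Pos=7|Len=5"],
    ["4", "!", "!", "PUNCT", "_", "_", "1", "punct", "_", "Pos=12|Len=1"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

sentence = parse_conllu(sample)[0]
print([tok["lemma"] for tok in sentence])  # ['hello', ',', 'World', '!']
print(dependency_triples(sentence))
```

Multiword-token and empty-node rows are skipped here for simplicity; a complete reader would keep them.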

Introducing LIMA, The Libre Multilingual Analyzer, a Natural Language Processing (NLP) toolkit

LIMA is a multilingual linguistic analyzer developed by the CEA LIST, LASTI laboratory (French acronym for Text and Image Semantic Analysis Laboratory). LIMA is Free Software, available under the MIT license.

LIMA has state-of-the-art performance for more than 60 languages thanks to its recent deep learning (neural network) based modules. It also includes a very powerful rule-based mechanism called ModEx, which allows information (entities, relations, events…) to be extracted quickly in new domains where no annotated data exists.

A commercial version is available, extended with modules useful to some CEA LIST industrial partners. It can be obtained directly from CEA LIST through R&D partnerships, or through other partners whose offers include support and adaptation to your needs.

We welcome external contributions in the form of comments, suggestions, bug reports, bug fixes, resources, etc. However, please note that before merging your contributions, we will ask you to sign a Copyright Assignment Agreement so that the dual licensing model can work properly.

FEATURES

  • high-performance and powerful C++ backend;
  • easy-to-use native Python binding (see TL;DR above);
  • simple, easy-to-use GUI;
  • tokenization;
  • morphological analysis, including:
    • full-form dictionaries;
    • splitting of hyphenated words;
    • splitting of concatenated words (we're, ...);
    • idiomatic expression recognition;
    • part-of-speech tagging (deep-learning based, with state-of-the-art performance). Two other taggers are available for some languages: the legacy LIMA tagger, which is slightly less accurate but very useful for resource development, and an SVMTool++-based one;
  • Named Entity Recognition (standard rule-based and neural network-based);
  • coreference resolution;
  • parsing (neural network-based with state-of-the-art performance, plus the older rule-based surface dependency parser);
  • semantic analysis (disambiguation and semantic role labeling);
  • regression testing;
  • evaluation tools.

DOWNLOAD and INSTALLATION

The easiest way to use LIMA is through its native Python binding (see TL;DR above). We also provide a Docker container and packages for several GNU/Linux distributions (as of 05/04/2024, Debian 12 and Ubuntu 22.04; check what is available at the time of your download). Finally, there are instructions for building from source under GNU/Linux.

LIMA is known to work under macOS, but no working binary package is currently available. A CircleCI build runs and produces a package, but that package does not work.

The Microsoft Windows build is currently broken.
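Given the platform notes on this page (Python bindings for Linux x86_64 only, Python >= 3.7 and < 4), a quick pre-flight check before running `pip install aymara` can explain an otherwise confusing installation failure. This is a minimal sketch using only the standard library; the function name `lima_binding_problems` is hypothetical, and the conditions it checks are only those stated here, so they may be outdated by the time you read this.

```python
import platform
import sys
from typing import List, Optional, Tuple


def lima_binding_problems(system: Optional[str] = None,
                          machine: Optional[str] = None,
                          version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return reasons why the prebuilt LIMA Python bindings may not install here.

    Defaults describe the running interpreter; the arguments allow checking
    another setup. Conditions mirror this page: Linux, x86_64, 3.7 <= Python < 4.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    version = version if version is not None else sys.version_info[:2]

    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS {system!r}: bindings are Linux-only")
    # platform.machine() reports "x86_64" on Linux; "AMD64" is the Windows spelling.
    if machine.lower() not in ("x86_64", "amd64"):
        problems.append(f"unsupported architecture {machine!r}: x86_64 required")
    if not ((3, 7) <= tuple(version) < (4, 0)):
        problems.append(f"Python {version[0]}.{version[1]} is outside >= 3.7 and < 4")
    return problems


if __name__ == "__main__":
    issues = lima_binding_problems()
    if issues:
        print("LIMA Python bindings are likely unavailable:", *issues, sep="\n  - ")
    else:
        print("Platform matches the documented requirements for pip install aymara")
```

Passing explicit arguments, e.g. `lima_binding_problems("Darwin", "arm64", (3, 11))`, lets you check a target machine from elsewhere.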

DOCUMENTATION

LICENCE

LIMA is available under the MIT license. A commercial version exists too.

CREDITS

LIMA uses several open source libraries and linguistic resources. See the COPYING file for details.

CONTACT

For any discussion, please open a GitHub issue.

You can also contact [the LIMA maintainer](mailto:gael DOT de-chalendar AT cea DOT fr) directly.
