Skip to content

German lemmatization with IWNLP as extension for spaCy

License

Notifications You must be signed in to change notification settings

textshuttle/spacy-iwnlp

 
 

Repository files navigation

spacy-iwnlp

license Build Status

This package uses the spaCy 2.0 extensions to add IWNLP-py as German lemmatizer directly into your spaCy pipeline.

Please report bugs with spacy-iwnlp as issue in IWNLP-py.

Usage

import spacy
from spacy_iwnlp import spaCyIWNLP
nlp = spacy.load('de')
iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
nlp.add_pipe(iwnlp)
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))

Installation

  1. Use pip to install spacy-iwnlp
pip install spacy-iwnlp
  1. Download the latest processed IWNLP dump from http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip and unzip it.

Citation

Please include the following BibTeX if you use IWNLP in your work:

@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
  author    = {Liebeck, Matthias  and  Conrad, Stefan},
  title     = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  year      = {2015},
  publisher = {Association for Computational Linguistics},
  pages     = {414--418},
  url       = {http://www.aclweb.org/anthology/P15-2068}
}

About

German lemmatization with IWNLP as extension for spaCy

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 95.4%
  • Shell 4.6%