DERBI: DEutscher RegelBasierter Inflektor

DERBI (DEutscher RegelBasierter Inflektor) is a simple rule-based automatic inflection model for German based on spaCy.
Applicable regardless of POS!

How It Works

DERBI gets an input text;
The text is processed with the given spaCy model;
For each word to be inflected in the text:
- The features predicted by spaCy are overridden with the input features (where specified);
- The word, with the resulting features, is passed through the rules and inflected;
The result is assembled into the output.

For the arguments, see below.
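
The override step can be pictured as a plain dictionary merge. The following is a minimal sketch, not DERBI's actual code: the function name merge_features and the predicted features are hypothetical and serve only as illustration.

```python
def merge_features(predicted: dict, target: dict) -> dict:
    """Return a copy of the predicted features where every
    feature given in `target` replaces the predicted one."""
    merged = dict(predicted)   # keep what spaCy predicted
    merged.update(target)      # input features win where specified
    return merged


# Hypothetical spaCy prediction for the infinitive "geben".
predicted = {'Verbform': 'Inf'}
# Target features as they would be passed in target_tags.
target = {'Number': 'Sing', 'Person': '3', 'Verbform': 'Fin'}

result = merge_features(predicted, target)
print(result)
```

Categories that appear only in the prediction are kept unchanged, which is why only the differing features need to be passed.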

Installation

Via pip

pip install DERBI

Via git clone

Clone DERBI:

git clone https://github.com/maxschmaltz/DERBI

or, from Python (requires GitPython):

from git import Repo
Repo.clone_from('https://github.com/maxschmaltz/DERBI', 'DERBI')

Then install all necessary packages from inside the cloned directory:

pip install -r requirements.txt

Simple Usage

Note that DERBI works with spaCy. Make sure you have installed one of the spaCy pipelines for German.

Example

import spacy

# python -m spacy download de_core_news_md
nlp = spacy.load('de_core_news_md')

from DERBI.derbi import DERBI
derbi = DERBI(nlp)

derbi(
    'DERBI sein machen, damit es all Entwickler ein Möglichkeit geben, jedes deutsche Wort automatisch zu beugen',
    [{'Number': 'Sing', 'Person': '3', 'Verbform': 'Fin'},     # sein -> ist
     {'Verbform': 'Part'},                                     # machen -> gemacht
     {'Case': 'Dat', 'Number': 'Plur'},                        # all -> allen
     {'Case': 'Dat', 'Number': 'Plur'},                        # Entwickler -> entwicklern
     {'Gender': 'Fem'},                                        # ein -> eine
     {'Number': 'Sing', 'Person': '3', 'Verbform': 'Fin'},     # geben -> gibt
     {'Case': 'Acc', 'Number': 'Plur'},                        # jedes -> jede
     {'Case': 'Acc', 'Declination': 'Weak', 'Number': 'Plur'}, # deutsche -> deutschen
     {'Case': 'Acc', 'Number': 'Plur'}],                       # wort -> wörter
    [1, 2, 6, 7, 8, 10, 12, 13, 14]
)

# Output:
'derbi ist gemacht , damit es allen entwicklern eine möglichkeit gibt , jede deutschen wörter automatisch zu beugen'

Arguments

init() Arguments

model: spacy.lang.de.German

Any of the spaCy pipelines for German. If model is not of the type spacy.lang.de.German, throws an exception.

call() Arguments

text: str

Input text, containing the words to be inflected. It is strongly recommended to call DERBI with a text, not a single word, as spaCy predictions vary depending on the context.

target_tags: dict or list[dict]

Dicts of category-feature values for each word to be inflected. If None, no inflection is performed. Default is None.

NB! As these features override the ones predicted by spaCy, only the features that differ from the prediction need to be specified in target_tags. Note, though, that spaCy predictions are not always correct, so for more precise DERBI output we recommend specifying the desired features in full. Notice also that if no tag for an obligatory category was provided (neither by spaCy nor in target_tags), DERBI falls back to the default; default feature values are available at ValidFeatures (the first element for every category).
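
The fallback to defaults can be sketched as follows. This is only an illustration under stated assumptions: VALID_FEATURES is a hypothetical excerpt (the real lists and their order live in DERBI's ValidFeatures), and fill_defaults is not part of DERBI's API.

```python
# Hypothetical excerpt; the order here is illustrative only.
# In DERBI the first element of each list in ValidFeatures is the default.
VALID_FEATURES = {
    'Case': ['Nom', 'Acc', 'Dat', 'Gen'],
    'Number': ['Sing', 'Plur'],
}


def fill_defaults(features: dict, obligatory: list, valid: dict = VALID_FEATURES) -> dict:
    """Add the first valid value for every obligatory category
    that is missing from `features`."""
    filled = dict(features)
    for category in obligatory:
        # setdefault leaves already specified categories untouched
        filled.setdefault(category, valid[category][0])
    return filled


print(fill_defaults({'Number': 'Plur'}, ['Case', 'Number']))
```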

indices: int or list[int]

Indices of the words to be inflected. Default is 0.

NB! The order of the indices must correspond to the order of the target tags. Note also that the input text is tokenized with the tokenizer of the given spaCy model, so the indices refer to tokens of a spacy.tokens.Doc instance (punctuation marks count as tokens).
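
To see why indices skip over punctuation positions, it can help to list the tokens with their indices. The sketch below uses a simple regular expression as a rough stand-in for the spaCy tokenizer; this is an assumption made for illustration, and the real indices always come from the Doc produced by the spaCy model, which may split some texts differently.

```python
import re


def rough_token_indices(text: str) -> list:
    """Split text into word and punctuation tokens and pair each
    token with its position (a rough approximation of a Doc)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return list(enumerate(tokens))


text = 'DERBI sein machen, damit es all Entwickler'
for index, token in rough_token_indices(text):
    print(index, token)
```

Here the comma occupies index 3, so "all" is at index 6 and "Entwickler" at index 7, matching the indices used in the example above.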

Output

Returns str: the input text, where the specified words are replaced with the inflection results. The output is normalized (as in the example above: lowercased, with tokens separated by spaces).

Tags

DERBI uses Universal POS tags and Universal Features (as does spaCy) with some extensions of the features (not of the POS tags). See LabelScheme and ValidFeatures for more details.

The following category-feature values can be used in target_tags:

Category (explanation)	Valid Features (explanation)	In Universal Features
Case	Acc (Accusative) Dat (Dative) Gen (Genitive) Nom (Nominative)	Yes
Declination (Applicable to words with adjective declension. In German such words are declined differently depending on the left context)	Mixed Strong Weak	No
Definite (Definiteness)	Def (Definite) Ind (Indefinite)	Yes
Degree (Degree of comparison)	Cmp (Comparative) Pos (Positive) Sup (Superlative)	Yes
Foreign (Whether the word is foreign. Applies to POS X)	Yes	Yes
Gender	Fem (Feminine) Masc (Masculine) Neut (Neuter)	Yes
Mood	Imp (Imperative) Ind (Indicative) Sub (Subjunctive; NB! Sub is for Konjunktiv I when Tense=Pres and for Konjunktiv II when Tense=Past)	Yes
Number	Plur (Plural) Sing (Singular)	Yes
Person	1 2 3	Yes
Poss (Whether the word is possessive. Applies to pronouns and determiners.)	Yes	Yes
Prontype (Type of a pronoun, a determiner, a quantifier or a pronominal adverb.)	Art (Article) Dem (Demonstrative) Ind (Indefinite) Int (Interrogative) Prs (Personal) Rel (Relative)	Yes
Reflex (Whether the word is reflexive. Applies to pronouns and determiners.)	Yes	Yes
Tense	Past Pres (Present)	Yes
Verbform (Form of a verb)	Fin (Finite) Inf (Infinitive) Part (Participle; NB! Part is for Partizip I when Tense=Pres and for Partizip II when Tense=Past)	Yes

Note, though, that the categories Definite, Foreign, Poss, Prontype and Reflex cannot be altered by DERBI, and thus there is no need to specify them.

NB! DERBI accepts tags with only the first letter capitalized. For example, use Prontype, not PronType.
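
If your features come in Universal Features spelling (PronType, VerbForm), the category names can be converted before calling DERBI. The helper below is a hypothetical convenience, not part of DERBI; it relies only on Python's str.capitalize(), which uppercases the first character and lowercases the rest.

```python
def to_derbi_tags(ud_features: dict) -> dict:
    """Convert category names such as 'VerbForm' to 'Verbform'.
    Feature values are left unchanged."""
    return {category.capitalize(): value
            for category, value in ud_features.items()}


converted = to_derbi_tags({'VerbForm': 'Fin', 'PronType': 'Prs', 'Number': 'Sing'})
print(converted)
# {'Verbform': 'Fin', 'Prontype': 'Prs', 'Number': 'Sing'}
```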

Performance

Disclaimer

For evaluation we used the Universal Dependencies German Treebanks. Unfortunately, their GitHub repositories contain only .conllu files, so we had to download some of the .txt datasets and add them to our repository. We do not distribute these datasets, though; it is your responsibility to determine whether you have permission to use them.

Evaluation

Evaluation was conducted with the dataset de_lit-ud-test.txt from the Universal Dependencies German LIT treebank (≈31k tokens). Accuracy:

	de_core_news_md	de_core_news_sm	de_core_news_lg
Overall	0.947	0.949	0.95
ADJ	0.81	0.847	0.841
ADP	0.998	0.998	0.998
ADV	0.969	0.972	0.968
AUX	0.915	0.921	0.912
CCONJ	1.0	1.0	1.0
DET	0.988	0.992	0.988
INTJ	1.0	1.0	1.0
NOUN	0.958	0.959	0.962
NUM	0.935	0.987	0.914
PART	1.0	1.0	1.0
PRON	0.921	0.929	0.928
PROPN	0.941	0.926	0.916
SCONJ	0.999	0.999	0.996
VERB	0.813	0.792	0.824
X	1.0	1.0	1.0

If you are interested in the way we obtained the results, please refer to test0.py.

Or you could check it with the following code:

from DERBI.test import test0
test0.main()

Notice that performance might vary depending on the dataset. Also remember that spaCy might make mistakes in its predictions (that is, in some cases the DERBI inflection is correct but does not correspond to spaCy's tags), which also affects the evaluation.
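
For readers who want to compute a comparable table on their own data, per-POS accuracy can be sketched as below. This is not the procedure from test0.py; the record format (POS tag, predicted form, gold form) and the function name accuracy_by_pos are assumptions made for illustration, and the toy records are not evaluation data.

```python
from collections import defaultdict


def accuracy_by_pos(records) -> dict:
    """Compute the share of exact matches per POS tag and overall.
    `records` is an iterable of (pos, predicted_form, gold_form)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pos, predicted, gold in records:
        for key in (pos, 'Overall'):
            total[key] += 1
            if predicted == gold:
                correct[key] += 1
    return {key: round(correct[key] / total[key], 3) for key in total}


# Toy records for illustration only.
toy = [
    ('VERB', 'gibt', 'gibt'),
    ('VERB', 'gebt', 'gibt'),
    ('NOUN', 'wörter', 'wörter'),
]
scores = accuracy_by_pos(toy)
print(scores)
```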

License

Copyright 2022 Max Schmaltz: @maxschmaltz

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
