Tagging errors in `PatternParser` output may lead to incorrect lemmatization of frequent German adjectives. As a consequence, all tools relying on the parser's output (POS tagging, sentiment analysis, noun phrase extraction, etc.) can return unexpected results:
Example (using IPython):
    In [1]: from textblob_de import TextBlobDE

    In [2]: TextBlobDE(u"Peter hat einen schönen Hund.").sentiment
    Out[2]: Sentiment(polarity=0.0, subjectivity=0.0)
    Out[EXPECTED]: Sentiment(polarity=1.0, subjectivity=0.0)

    In [3]: TextBlobDE(u"Peter hat einen schönen Hund.").noun_phrases
    Out[3]: WordList([])
    Out[EXPECTED]: WordList([u'schönen Hund'])

    In [4]: TextBlobDE(u"Peter hat einen schönen Hund.").tags
    Out[4]: [('Peter', 'NNP'), ('hat', 'VB'), ('einen', 'DT'), (u'schönen', 'PRP$'), ('Hund', 'NN')]
    Out[EXPECTED]: [..., (u'schönen', 'JJ'), ...]
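The mis-tag is easy to spot mechanically in the `.tags` output. The sketch below flags tokens tagged `PRP$` that are not German possessive determiners. The helper name `suspicious_possessives` and the stem/ending word list are my own assumptions for illustration; they are not part of textblob-de or pattern.

```python
# Hypothetical helper for illustration; not part of textblob-de or pattern.
POSSESSIVE_STEMS = ("mein", "dein", "sein", "ihr", "unser", "euer", "eur")
ENDINGS = ("", "e", "en", "em", "er", "es")

# Inflected forms of the possessive determiners, e.g. "meinen", "unseres", "eure".
POSSESSIVES = {stem + ending for stem in POSSESSIVE_STEMS for ending in ENDINGS}


def suspicious_possessives(tags):
    """Return (word, tag) pairs tagged PRP$ whose word is not a possessive determiner."""
    return [(word, tag) for word, tag in tags
            if tag == "PRP$" and word.lower() not in POSSESSIVES]


# The .tags output from the session above, copied by hand.
tags = [("Peter", "NNP"), ("hat", "VB"), ("einen", "DT"),
        ("schönen", "PRP$"), ("Hund", "NN")]
print(suspicious_possessives(tags))  # [('schönen', 'PRP$')]
```

A check like this only detects the symptom; the word list is a rough approximation and will not catch every mis-tagged adjective.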
Root cause:
    In [5]: from pattern.de import parse, pprint

    In [6]: pprint(parse(u"Peter hat einen schönen Hund.", lemmata=True))

          WORD   TAG    CHUNK   ROLE   ID   PNP    LEMMA

         Peter   NNP    NP      -      -    -      peter
           hat   VB     VP      -      -    -      haben
         einen   DT     NP      -      -    -      ein
       schönen > PRP$ < NP ^    -      -    -    > schön[en] <
          Hund   NN     NP ^    -      -    -      hund
             .   .      -       -      -    -      .
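To see how a single wrong tag can zero out the sentiment and empty the noun-phrase list, here is a toy pipeline. It is an illustration under stated assumptions, not pattern's actual implementation: the one-entry lexicon, the suffix-stripping lemmatizer that only handles `JJ` tokens, and the "JJ followed by NN" chunk rule are all hypothetical simplifications.

```python
# Toy pipeline; the lexicon and rules below are hypothetical, not pattern's code.
LEXICON = {"schön": 1.0}  # hypothetical adjective polarity lexicon, keyed on lemma


def lemma(word, tag):
    """Naive lemmatizer: strip an inflectional ending only from adjectives (JJ)."""
    if tag == "JJ":
        for ending in ("en", "em", "er", "es", "e"):
            if word.endswith(ending):
                return word[: -len(ending)].lower()
    return word.lower()


def polarity(tags):
    """Average lexicon score over lemmas found in the lexicon; 0.0 if none match."""
    scores = [LEXICON[lem] for lem in (lemma(w, t) for w, t in tags) if lem in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def noun_phrases(tags):
    """Naive chunk rule: an adjective (JJ) directly followed by a noun (NN)."""
    return [f"{w1} {w2}" for (w1, t1), (w2, t2) in zip(tags, tags[1:])
            if t1 == "JJ" and t2 == "NN"]


# Tags as reported above, and the same sentence with the adjective tagged JJ.
actual = [("Peter", "NNP"), ("hat", "VB"), ("einen", "DT"),
          ("schönen", "PRP$"), ("Hund", "NN")]
fixed = [(w, "JJ" if w == "schönen" else t) for w, t in actual]

print(polarity(actual), noun_phrases(actual))  # 0.0 []
print(polarity(fixed), noun_phrases(fixed))    # 1.0 ['schönen Hund']
```

Because the `PRP$` token is never lemmatized to "schön", the lexicon lookup misses and no adjective-noun chunk is formed, which mirrors the symptoms in the session above.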
Please send suggestions for improvement directly to the `pattern` project (see e.g. clips/pattern#63). The version of `pattern.text.de` included in `textblob-de` will be updated regularly.
I am also working on integrating additional lemmatizers into `textblob_de`, but `PatternParserLemmatizer` will remain the default choice because it is implemented in Python.