PatternParserLemmatizer: tagging errors negatively affecting sentiment analysis #6

markuskiller · 2014-08-15T12:00:53Z

Tagging errors in the PatternParser output can cause frequent German adjectives to be lemmatized incorrectly. As a result, every tool that relies on the parser's output (POS tagging, sentiment analysis, noun phrase extraction, etc.) may return unexpected results:

Example (using ipython):

In [1]: from textblob_de import TextBlobDE
In [2]: TextBlobDE(u"Peter hat einen schönen Hund.").sentiment
Out[2]: Sentiment(polarity=0.0, subjectivity=0.0)
Out[EXPECTED]: Sentiment(polarity=1.0, subjectivity=0.0)

In [3]: TextBlobDE(u"Peter hat einen schönen Hund.").noun_phrases
Out[3]: WordList([])
Out[EXPECTED]: WordList([u'schönen Hund'])

In [4]: TextBlobDE(u"Peter hat einen schönen Hund.").tags
Out[4]: [('Peter', 'NNP'), ('hat', 'VB'), ('einen', 'DT'),  (u'schönen', 'PRP$'),  ('Hund', 'NN')]
Out[EXPECTED]: [...,  (u'schönen', 'JJ'), ...]

Root cause:

In [5]: from pattern.de import parse, pprint

In [6]: pprint(parse(u"Peter hat einen schönen Hund.", lemmata=True))

       WORD   TAG      CHUNK   ROLE   ID     PNP    LEMMA

      Peter   NNP      NP      -      -      -      peter
        hat   VB       VP      -      -      -      haben
      einen   DT       NP      -      -      -      ein
    schönen > PRP$ <   NP ^    -      -      -    > schön[en] <
       Hund   NN       NP ^    -      -      -      hund
          .   .        -       -      -      -      .
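
To illustrate how a single wrong tag can propagate to the sentiment score, here is a minimal toy sketch. It assumes that lemmatization of adjectives is only triggered by a JJ tag and that polarity is looked up by lemma. `TOY_LEXICON`, `toy_lemma` and `toy_polarity` are hypothetical names made up for this illustration; they are not the textblob-de or pattern implementation.

```python
# Toy illustration only: the lexicon and both functions are hypothetical
# and do not reproduce the textblob-de or pattern code.
TOY_LEXICON = {"schön": 1.0}  # polarity keyed by lemma (assumption)


def toy_lemma(word, tag):
    """Strip the ending '-en' only if the token is tagged as an adjective."""
    word = word.lower()
    if tag.startswith("JJ") and word.endswith("en"):
        return word[:-2]
    return word


def toy_polarity(tagged_tokens):
    """Average the polarity of all lemmas found in the toy lexicon."""
    lemmas = [toy_lemma(word, tag) for word, tag in tagged_tokens]
    scores = [TOY_LEXICON[lemma] for lemma in lemmas if lemma in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# Tags as reported in Out[4] above.
observed = [("Peter", "NNP"), ("hat", "VB"), ("einen", "DT"),
            ("schönen", "PRP$"), ("Hund", "NN")]
# Same tokens with the expected JJ tag for the adjective.
expected = [(w, "JJ" if w == "schönen" else t) for w, t in observed]

print(toy_polarity(observed))  # 0.0: 'schönen' is not reduced to 'schön'
print(toy_polarity(expected))  # 1.0: lemma 'schön' is found in the lexicon
```

Under these assumptions, the PRP$ tag keeps the inflected form out of the lexicon lookup, which matches the polarity of 0.0 reported in Out[2].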

Please send suggestions for improvement directly to the pattern project (see e.g. clips/pattern#63). The version of pattern.text.de bundled with textblob-de will be updated regularly.

I am also working on the integration of additional lemmatizers into textblob_de, but PatternParserLemmatizer will remain the default choice, as it is implemented in Python.


markuskiller added the wontfix label Aug 15, 2014

markuskiller changed the title ~~PatternParserLemmatizer: tagging errors~~ PatternParserLemmatizer: tagging errors negatively affecting sentiment analysis Aug 15, 2014

markuskiller self-assigned this Aug 15, 2014

markuskiller removed the wontfix label Aug 15, 2014

markuskiller added the ready label Aug 24, 2014

markuskiller added the pattern issue label May 30, 2015
