PatternParser problem #17
Hi Andy,
My apologies for the late reply. It seems to work if you use the standard options that are passed on to the pattern parser (for a list of the default values, see http://textblob-de.readthedocs.io/en/stable/api_reference.html#module-textblob_de.parsers). The main problem in your example is that the text is not tokenised properly (punctuation sticks to the previous token), which leads to a number of additional mistakes in the tagging process. In addition, the chunks are not computed properly if you use the
from textblob_de import TextBlobDE as TextBlob
from textblob_de import PatternParser
blob = TextBlob("Das ist ein schönes Auto, das du dir da gekauft hast. Das finde ich richtig klasse!", parser=PatternParser(pprint=True))
blob.parse()
I get:
For counting purposes you need to exclude chunks that are followed by a
This gives you the option of just counting the chunks preceded by a
Hope this helps.
Best wishes, Markus
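A minimal sketch of the counting idea, not taken from the reply itself. It assumes the parser output is a slash-delimited string of the form word/POS/CHUNK/... and that chunk tags carry IOB prefixes (`B-` opens a chunk, `I-` continues it, `O` marks tokens outside any chunk). The helper name `count_chunks` and the sample string are hypothetical; the sample is hand-written and is not real textblob-de output.

```python
from collections import Counter


def count_chunks(tagged):
    """Count chunks by type in a slash-delimited tagged string.

    Assumption: every token looks like word/POS/CHUNK/... and the CHUNK
    field uses IOB prefixes, so only the token that opens a chunk ("B-")
    is counted and continuation tokens ("I-") are ignored.
    """
    counts = Counter()
    for token in tagged.split():
        fields = token.split("/")
        if len(fields) < 3:
            continue  # not a fully tagged token, skip it
        chunk = fields[2]
        if chunk.startswith("B-"):
            counts[chunk[2:]] += 1  # strip the "B-" prefix, keep the type
    return counts


# Hand-written illustration only, not real parser output.
sample = "Das/ART/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O schön/JJ/B-ADJP/O ././O/O"
print(count_chunks(sample))
```

Counting only the opening token sidesteps the problem that every token of a multi-word chunk repeats the same chunk type.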
This helps a lot! Thank you :)
Hi there,
thanks a lot for textblob-de.
I ran into an issue when trying to get just the chunk information. My goal is to count the number of VP, NP, PP, ... chunks.
For that I am trying to extract only the chunks. I'm trying the following code:
But then I get the POS tags in place of the chunk tags when using the pprint option.
I could not find a way to get the chunks by type in order to count them. Is there a trick to do so?
WORD     TAG  CHUNK    ROLE  ID  PNP  LEMMA
Das      -    PDS      -     -   -    -
ist      -    VVFIN    -     -   -    -
ein      -    ARTIND   -     -   -    -
schönes  -    NN       -     -   -    -
Auto,    -    NN ^     -     -   -    -
das      -    ARTDEF   -     -   -    -
du       -    PPOSAT   -     -   -    -
dir      -    PPER     -     -   -    -
da       -    KOUS     -     -   -    -
gekauft  -    VVFIN    PNP   -   -    -
hast.    -    VVFIN ^  PNP   -   -    -
Das      -    ARTDEF   -     -   -    -
finde    -    NN       -     -   -    -
ich      -    PPER     -     -   -    -
richtig  -    ADJA     -     -   -    -
klasse!  -    NN       -     -   -    -
Obviously, simply counting the chunk tags gives wrong results, since every token in a chunk carries the same chunk tag. How can I get the chunk boundaries so that I can count properly? Any suggestions?
Many thanks and best regards, Andy