Strange tagging for some verb forms of "sein" and for "du" #9
Comments
Thanks for the report. Unfortunately, this seems to be an issue of the You could use the following test or provide a link to this issue for them to be able to verify the strange behaviour:
# Tested on Python2.7.8, 32bit, on Windows 8.1 (64bit)
# pattern.__version__
# '2.6'
In [1]: from pattern.de import parse, pprint
In [2]: pprint(parse("Ich bin. Du bist. Er ist. Wir sind. Ihr seid. Sie sind.", lemmata=True))
WORD TAG CHUNK ROLE ID PNP LEMMA
Ich PRP NP - - - ich
bin VB VP - - - sein
. . - - - - .
WORD TAG CHUNK ROLE ID PNP LEMMA
Du PRP NP - - - du
bist NN NP ^ - - - bist
. . - - - - .
WORD TAG CHUNK ROLE ID PNP LEMMA
Er PRP NP - - - er
ist VB VP - - - sein
. . - - - - .
WORD TAG CHUNK ROLE ID PNP LEMMA
Wir PRP NP - - - wir
sind VB VP - - - sein
. . - - - - .
WORD TAG CHUNK ROLE ID PNP LEMMA
Ihr PRP$ NP - - - ihr
seid NN NP ^ - - - seid
. . - - - - .
WORD TAG CHUNK ROLE ID PNP LEMMA
Sie PRP NP - - - sie
sind VB VP - - - sein
. . - - - - .
In [3]: pprint(parse("Ihr seid alle herzlich eingeladen zu meinem Geburtstagsfest.", lemmata=True))
WORD TAG CHUNK ROLE ID PNP LEMMA
Ihr PRP$ NP - - - ihr
seid NN NP ^ - - - seid
alle RB ADJP - - - alle
herzlich JJ ADJP ^ - - - herzlich
eingeladen VBN VP - - - einladen
zu IN PP - - PNP zu
meinem PRP$ NP - - PNP meinem
Geburtstagsfest NN NP ^ - - PNP geburtstagsfest
. . - - - - .
In [4]: pprint(parse("Du bist herzlich eingeladen zu meinem Geburtstagsfest.", lemmata=True))
WORD TAG CHUNK ROLE ID PNP LEMMA
Du PRP NP - - - du
bist NN NP ^ - - - bist
herzlich JJ ADJP - - - herzlich
eingeladen VBN VP - - - einladen
zu IN PP - - PNP zu
meinem PRP$ NP - - PNP meinem
Geburtstagsfest NN NP ^ - - PNP geburtstagsfest
. . - - - - .
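For anyone who wants to check the output above mechanically, here is a minimal sketch. It does not call `pattern` at all: the rows are copied by hand from the first `pprint` table, and the helper name `mistagged_sein_forms` and the `SEIN_FORMS` set are hypothetical, introduced only for illustration. It flags every present-tense form of "sein" that is not tagged as a verb with lemma "sein".

```python
# Rows copied by hand from the first pprint table above: (word, tag, lemma).
OBSERVED = [
    ("Ich", "PRP", "ich"), ("bin", "VB", "sein"),
    ("Du", "PRP", "du"), ("bist", "NN", "bist"),
    ("Er", "PRP", "er"), ("ist", "VB", "sein"),
    ("Wir", "PRP", "wir"), ("sind", "VB", "sein"),
    ("Ihr", "PRP$", "ihr"), ("seid", "NN", "seid"),
    ("Sie", "PRP", "sie"), ("sind", "VB", "sein"),
]

# Present-tense forms of "sein" (assumption: only these five are checked here).
SEIN_FORMS = {"bin", "bist", "ist", "sind", "seid"}


def mistagged_sein_forms(rows):
    """Return rows whose word is a form of "sein" but whose tag is not
    a verb tag (VB*) or whose lemma is not "sein"."""
    bad = []
    for word, tag, lemma in rows:
        is_sein_form = word.lower() in SEIN_FORMS
        tagged_correctly = tag.startswith("VB") and lemma == "sein"
        if is_sein_form and not tagged_correctly:
            bad.append((word, tag, lemma))
    return bad


if __name__ == "__main__":
    for word, tag, lemma in mistagged_sein_forms(OBSERVED):
        print("{}: tagged {}, lemma {!r}".format(word, tag, lemma))
```

Fed with the rows above, the helper reports exactly "bist" and "seid", both tagged NN with the surface form as the lemma. The same function could be pointed at rows collected from a live `parse` call, but that wiring is left out here because it depends on the installed `pattern` version.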
Thanks for your fast reply. I'm using Python 3.4 64bit on Windows 8.1 and need to investigate further.
Thanks for further investigating the issue and contributing your results to the
By the way, do you know whether rftagger is open source?
@mk270 rftagger is open source and its source code is available under the following links:
project page: http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/
source code: http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/data/RFTagger.tar.gz
However, it is "freely available for education, research and other non-commercial purposes" only. If this is not a problem for your project, feel free to contact me via email so that I can invite you to the bitbucket repository of
So, "available for education, research and other non-commercial purposes" is fairly canonically NOT open source, see for instance http://opensource.org/osd-annotated section 6. It's a shame. I presume, since they've chosen to exclude commercial use, that they're not going to be biddable.
Thanks for the link. I interpreted 'open source' as 'is the source code available/accessible' (i.e. can it be modified/tweaked, etc.), which it is. For other projects that they released under a similarly restrictive license, they added: "In order to use the TreeTagger commercially, you need to obtain a commercial license (see contact address below)!" (Source: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). So, I assume that they might be willing to make a decision/offer/quote on a per-project basis, and that they want to know what the project is about if you intend to use their software commercially.
Yes, indeed, it's not remotely open source in that term's conventional acceptation - it's proprietary. I am asking as it's a dependency of another project I'm interested in. Ah well.
I stumbled over some strange tagging and was wondering why it won't correctly recognize "bist" and "seid" as verb forms of "sein", even though they are listed in the "de-verbs.txt" file. Also, tagging the personal pronoun "du" as an adjective doesn't make much sense either.