How to get dependency tags/tree in a CoNLL-format? #533
import spacy

nlp = spacy.load('en', vectors=False)
doc = nlp(u'Bob bought the pizza to Alice')
for sent in doc.sents:
    for i, word in enumerate(sent):
        if word.head is word:
            head_idx = 0
        else:
            # Known bug: this uses the word's own position, not its head's.
            # See the correction further down the thread.
            head_idx = word.i-sent[0].i+1
        print(
            i+1,        # There's a word.i attr that's position in *doc*
            word.pos_,  # Coarse-grained tag
            word.tag_,  # Fine-grained tag
            head_idx,
            word.dep_,  # Relation
            '_', '_')

Should have had this snippet up from the start --- thanks. |
For common formats, I feel like this should be a method to the
And others could implement methods like
|
@redstar12 messaged me to say that @honnibal's code is not working for her on Python 2.7. Reproducing it on my machine, the problem seems to be this part:
This is easily solved by using the old approach:
I don't know whether this has been fixed in the meantime (I'm using an older version of SpaCy). If not, then there should be a new GitHub issue to address this. |
I did upgrade and tested all these snippets and I'm getting the error: |
Updated the code (apparently we forgot to write |
Thank you! It works. But now I am getting: |
I got this (in Python 2.7), using the example code as given above.
So the code is definitely working. Did you change anything? (At some point you should start figuring this out yourself, though. It's your problem..) |
OK! Thank you very much! |
But now I'm wondering: did you download and install the SpaCy data? If not, do this on the command line:
If that doesn't work, then please try to investigate where the code breaks down. Just coming here and saying "it doesn't work" is not good enough. Try to see if other things do work, e.g. tagging:
for token in doc:
    print('\t'.join([token.orth_, token.pos_, token.tag_])) |
Thank you very much for your reply. I don't know why, but we had a problem with parsing after the upgrade. We reinstalled Spacy and now the code works. BUT! As you can see, the HEAD value is wrong. It is the same as the ID value:
(1, u'PROPN', u'NNP', 1, u'nsubj', '_', '_')
And it has to be like this:
(1, u'PROPN', u'NNP', 2, u'nsubj', '_', '_') |
Ok, then there was a small mistake in the code @honnibal wrote. I don't have time to fix it. All I wanted to say in this thread was that it'd be nice to have a method for the parsed document to save it in CONLL format. Hints I can give you to fix the code:
|
Thank you very much for your hints. I fixed the code. It works!!! Thank you! |
Here's the code that works for me:
And the output:
You can test the output here (online) or use this (offline, .Net-based). |
is it possible to get the same result using MaltParser and Python? |
The computation of the head id is not entirely correct in either one of the code snippets.
if word.head is word:
    head_idx = 0
else:
    # this is the corrected line:
    head_idx = word.head.i - sent[0].i + 1
Aside from that the comments here were really helpful, thanks! |
Does it work with French sentences? |
@flackbash Could you please explain why you add "- sent[0].i + 1"? Why isn't word.head.i enough? |
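On the offset asked about here: word.head.i is a position in the whole document, while the HEAD column counts from 1 within each sentence. The two only line up by accident, and they drift apart from the second sentence onward. Below is a minimal sketch using plain integers instead of spaCy tokens. The function name `sentence_relative_head` and the two-sentence example are made up for illustration and are not part of spaCy.

```python
def sentence_relative_head(token_i, head_i, sent_start):
    """Return the 1-based, sentence-relative head ID (0 for the root).

    token_i, head_i: document-level positions (like word.i and word.head.i).
    sent_start: document-level position of the sentence's first token
                (like sent[0].i).
    """
    if head_i == token_i:  # the root is treated as its own head
        return 0
    return head_i - sent_start + 1


# Hypothetical two-sentence document: "Bob slept . Alice ate pizza ."
# Document positions:                   0   1     2 3     4   5     6
# The second sentence starts at position 3; "Alice" (3) attaches to "ate" (4).
second_sentence_start = 3
alice_head = sentence_relative_head(token_i=3, head_i=4,
                                    sent_start=second_sentence_start)
ate_head = sentence_relative_head(token_i=4, head_i=4,
                                  sent_start=second_sentence_start)
print(alice_head, ate_head)  # 2 0
```

Using the raw document position would give 4 for "Alice", which points past the verb. Subtracting the sentence start and adding 1 gives 2, the ID of "ate" inside its own sentence.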
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
How to get dependency tags/tree in a CoNLL-format like this:
1 Bob _ NOUN NNP _ 2 nsubj _ _
2 brought _ VERB VBD _ 0 ROOT _ _
3 the _ DET DT _ 4 det _ _
4 pizza _ NOUN NN _ 2 dobj _ _
5 to _ ADP IN _ 2 prep _ _
6 Alice _ NOUN NNP _ 5 pobj _ _
?
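The rows above have ten columns per token, with "_" in the unused slots. A rough sketch of building such rows from already-extracted values follows. The helper name `conll_row` is hypothetical, and the column order is taken from the example rows in the question rather than from any spaCy API.

```python
def conll_row(idx, form, cpos, pos, head, deprel):
    """Build one tab-separated, 10-column row; unused columns are '_'."""
    columns = [
        str(idx),   # token ID within the sentence, starting at 1
        form,       # word form
        '_',        # lemma (not filled in)
        cpos,       # coarse-grained tag
        pos,        # fine-grained tag
        '_',        # features (not filled in)
        str(head),  # head ID within the sentence, 0 for the root
        deprel,     # dependency relation
        '_', '_',   # last two columns left empty, as in the question
    ]
    return '\t'.join(columns)


# First two rows from the question, as plain tuples.
rows = [
    (1, 'Bob', 'NOUN', 'NNP', 2, 'nsubj'),
    (2, 'brought', 'VERB', 'VBD', 0, 'ROOT'),
]
lines = [conll_row(*row) for row in rows]
print('\n'.join(lines))
```

The values would come from the token attributes used elsewhere in this thread (word.orth_, word.pos_, word.tag_, word.dep_), with the head ID computed relative to the sentence.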