How to extract noun phrase and verb? #3

Closed
LeeYN-43 opened this issue Apr 14, 2022 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@LeeYN-43

Thanks for your great work!
I want to extract noun phrases and verbs from my own dataset. Could you please tell me which tool you used to extract them?

@geyuying
Collaborator

geyuying commented Apr 16, 2022

To extract noun phrases, you can refer to https://www.pythonprogramming.in/how-to-extract-noun-phrases-using-textblob.html. We extract the noun phrases using the script below.

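from textblob import TextBlob

# TextBlob needs its corpora once: python -m textblob.download_corpora
blob = TextBlob("An old woman is dancing on the green grass")
# print the extracted noun phrases (a bare expression only shows output in a REPL)
print(blob.noun_phrases)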

When "TextBlob" fails to extract any noun phrases from a text, you can refer to https://stackoverflow.com/questions/33587667/extracting-all-nouns-from-a-text-file-using-nltk. We use the example below, which extracts the nouns with NLTK.

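import nltk

# One-time setup: the tokenizer and tagger models must be downloaded first, e.g.
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# (resource names can differ between NLTK versions)

lines = 'lines is some string of words'

# Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
def is_noun(pos):
    return pos[:2] == 'NN'

# tokenize, POS-tag, and keep only the nouns
tokenized = nltk.word_tokenize(lines)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]

print(nouns)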
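
To make the noun test above concrete without installing NLTK, here is a minimal sketch that applies the same prefix check to a hand-written list of (word, tag) pairs. The `tagged` list is an assumption made up for illustration, not real tagger output; only the 'NN' prefix rule comes from the script above.

```python
def is_noun(pos):
    """Return True for Penn Treebank noun tags: NN, NNS, NNP, NNPS."""
    return pos.startswith("NN")

# Hypothetical tagger output, written by hand for illustration only.
tagged = [
    ("woman", "NN"),     # singular common noun
    ("is", "VBZ"),       # verb, not kept
    ("dancing", "VBG"),  # verb, not kept
    ("grass", "NN"),
    ("Paris", "NNP"),    # proper noun
    ("dogs", "NNS"),     # plural noun
]

# Same list comprehension shape as the NLTK script, minus the tagging step.
nouns = [word for word, pos in tagged if is_noun(pos)]
print(nouns)  # ['woman', 'grass', 'Paris', 'dogs']
```

Note that this check returns individual nouns, not multi-word noun phrases, so it yields coarser output than "TextBlob".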

To extract verb phrases, you can refer to https://microeducate.tech/extract-verb-phrases-using-spacy/. We use the "Edit 2" version from that page, shown below.

import spacy   
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.load('en_core_web_sm') 

sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
# spaCy v3 takes a list of patterns; spaCy v2 used matcher.add("Verb phrase", None, pattern)
matcher.add("Verb phrase", [pattern])

doc = nlp(sentence) 
# call the matcher to find matches 
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]

print(filter_spans(spans))
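
The matcher usually returns overlapping matches (for example both "quickly ran" and "ran"), which is why the script passes them through `filter_spans`. The sketch below mimics that step in plain Python on (start, end) token offsets: the longest span wins, and any span overlapping an already kept one is dropped. The function name `filter_overlapping` and the offsets are hypothetical and only illustrate the idea; this is not spaCy's own code.

```python
def filter_overlapping(spans):
    """Keep the longest spans and drop any span that overlaps a kept one.

    `spans` is a list of (start, end) token offsets, with `end` exclusive.
    """
    # Longest first; on equal length, the span that starts earlier comes first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, seen = [], set()
    for start, end in ordered:
        # Keep the span only if neither of its boundary tokens is already covered.
        if start not in seen and (end - 1) not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)

# Hypothetical offsets for "He quickly ran to the market":
# (1, 3) covers "quickly ran", (2, 3) covers "ran".
matches = [(2, 3), (1, 3)]
print(filter_overlapping(matches))  # [(1, 3)]
```

Without this filtering, every shorter sub-match would show up as a separate verb phrase.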

@geyuying geyuying added the documentation Improvements or additions to documentation label Apr 16, 2022
@vateye

vateye commented Apr 26, 2022

For training, did you extract the nouns directly or the noun phrases? Thanks. @geyuying

3 participants