Performance of Stanford CoreNLP vs. spaCy #3

Open
Shiyun-W opened this issue Jan 13, 2024 · 5 comments
Comments

@Shiyun-W

Shiyun-W commented Jan 13, 2024

Hi,

Thank you very much for your extraction tool.
I need to extract clauses from sentences. I see that you have improved the sentence-to-clause splitting, so I tried the modul_stanfordSent module in your package, but it failed on this example sentence: "Because Mary and Samantha arrived at the bus station before noon, I did not see them at the station.".
The error is:
`IndexError Traceback (most recent call last)
Cell In[8], line 3
1 from extractreq.modul_stanfordSent import stanford_clause
2 sent = "Because Mary and Samantha arrived at the bus station before noon, I did not see them at the station."
----> 3 stanford_clause().get_clause_list(sent)

File c:\Program Files\softwares\Anaconda3\envs\pytorch\lib\site-packages\extractreq\modul_stanfordSent.py:129, in stanford_clause.get_clause_list(self, sent)
127 del t[i]
128 for i in sub_conj_pos:
--> 129 del t[i]
130 subject_phrase = ' '.join(t.leaves())
131 for i in verb_phrases: # update the clause_list

File c:\Program Files\softwares\Anaconda3\envs\pytorch\lib\site-packages\nltk\tree\parented.py:135, in AbstractParentedTree.__delitem__(self, index)
133 # del ptree[(i,)]
134 elif len(index) == 1:
--> 135 del self[index[0]]
136 # del ptree[i1, i2, i3]
137 else:
138 del self[index[0]][index[1:]]

File c:\Program Files\softwares\Anaconda3\envs\pytorch\lib\site-packages\nltk\tree\parented.py:124, in AbstractParentedTree.__delitem__(self, index)
122 raise IndexError("index out of range")
123 # Clear the child's parent pointer.
...
--> 155 return list.__getitem__(self, index)
156 elif isinstance(index, (list, tuple)):
157 if len(index) == 0:

IndexError: list index out of range`
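
One possible reading of the traceback, offered as an assumption because the extractreq source is not shown here: the positions in `sub_conj_pos` may have been computed before the earlier `del t[i]` loop ran, so they can point past the end of the tree once it has shrunk. The sketch below reproduces that failure mode with a plain Python list rather than an NLTK tree. The token list, the `stale` positions, and the helper `delete_positions` are all hypothetical and only illustrate the general pattern.

```python
def delete_positions(items, positions):
    """Return a copy of items with the given indices removed."""
    result = list(items)
    # Delete the highest index first, so that earlier deletions
    # never shift the indices that are still waiting to be removed.
    for i in sorted(set(positions), reverse=True):
        del result[i]
    return result


tokens = ["Because", "Mary", "arrived", ",", "I", "left"]
stale = [0, 3, 5]  # hypothetical positions, computed before any deletion

# Naive ascending deletion: the list shrinks, so later indices go stale.
naive = list(tokens)
naive_error = None
try:
    for i in stale:
        del naive[i]
except IndexError as exc:
    naive_error = exc

print("naive deletion raised:", repr(naive_error))
print("descending deletion:", delete_positions(tokens, stale))
```

If the same pattern applies to the tree in `get_clause_list`, collecting all positions first and deleting them from last to first would be one way to avoid the error, though that would need to be checked against the actual module.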

My question now is: have you ever compared the clause-extraction performance of Stanford CoreNLP and spaCy? If I use the spaCy module, will the results be worse than with the Stanford module?
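
Absent published numbers, one way to answer this for a given dataset is to score both modules against a small hand-annotated gold set. The following is a minimal sketch under stated assumptions: the function `clause_prf`, the exact-match criterion, and the normalization are illustrative choices, and the `gold` and `partial` lists are hand-written placeholders, not real output from either module.

```python
def clause_prf(predicted, gold):
    """Exact-match precision, recall and F1 over normalized clause strings."""

    def norm(clause):
        # Lowercase, drop a trailing period, collapse whitespace.
        return " ".join(clause.lower().rstrip(".").split())

    pred = {norm(c) for c in predicted}
    ref = {norm(c) for c in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hand-written gold annotation for the example sentence (an assumption).
gold = [
    "Because Mary and Samantha arrived at the bus station before noon",
    "I did not see them at the station",
]
# Placeholder prediction with one truncated clause, for illustration only.
partial = [
    "Because Mary and Samantha arrived at the bus station before noon",
    "I did not see them",
]

print(clause_prf(gold, gold))
print(clause_prf(partial, gold))
```

Running each module over the same sentences and passing its clause list as `predicted` would give comparable scores. Exact match is strict, so a token-overlap measure may be a fairer choice when clause boundaries are debatable.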

I would greatly appreciate a reply to my question. This is very important for my thesis.
Thank you in advance!

@Shiyun-W
Author

Hi, I also tried another example with the spaCy module: "we conclude that the regulated membrane localization of tiam1 through its nh2-terminal ph domain determines the activation of distinct rac-mediated signaling pathways.", but the result was: ['we conclude that the regulated membrane localization of tiam1 through its nh2-terminal ph domain determines the activation of distinct rac-mediated signaling pathways',
'that the regulated membrane localization of tiam1 through its nh2-terminal ph domain determines the']
Clearly, this is not the expected result: the second clause is cut off after "determines the". How can I solve this problem?

@asyrofist
Owner

Hi,
For several of the issues we ran into with spaCy and other language models, we now use LangChain. You can check out my newest code repository, which covers this, here:
https://github.com/asyrofist/LangChainProposed

In that repository, LangChainProposed, we solve the issue with a large language model (LLM), which fixes that problem. For me, that is the best way to learn how NLP works with generative AI.

@Shiyun-W
Author

Thank you very much! I will try it.

But I wonder how you define the task for the LLM: do you use it to extract the frames directly from the literature, or do you first split the literature into sentences and then use the LLM for a classification task?

@asyrofist
Owner

Actually, since these are generative AI models, we just use a prompt that splits sentences into clauses, or into other atomic words or subwords.
You can try it yourself by experimenting with different prompts and models. Have fun!
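
A prompt of the kind described above might look like the following sketch. It is not taken from the LangChainProposed repository: the prompt wording, the JSON reply format, and the helper names `build_clause_prompt` and `parse_clause_reply` are all assumptions, and the model call itself is deliberately left out because it depends on the provider.

```python
import json


def build_clause_prompt(sentence):
    """Build a hypothetical clause-splitting prompt for an LLM."""
    return (
        "Split the sentence below into its clauses. "
        "Reply with a JSON array of strings and nothing else.\n\n"
        f"Sentence: {sentence}"
    )


def parse_clause_reply(reply):
    """Parse the model reply; return [] if it is not a JSON list."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [c.strip() for c in data if isinstance(c, str) and c.strip()]


sentence = (
    "Because Mary and Samantha arrived at the bus station before noon, "
    "I did not see them at the station."
)
prompt = build_clause_prompt(sentence)
print(prompt)

# A made-up reply, used only to exercise the parser.
example_reply = '["Because Mary arrived", " I did not see them "]'
print(parse_clause_reply(example_reply))
```

Asking for a strict JSON array keeps the reply machine-readable, and the parser falls back to an empty list when the model answers in free text, which makes it easier to compare prompts on the same sentences.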

@Shiyun-W
Author

I understand. Thank you very much for your kindness and patience!
