Labels and problem with Classification model #91
Replies: 4 comments 2 replies
-
Hi! Regarding the losses: for a binary classification task you should use binary cross entropy loss, while for a multi-class classification task you should use the generalisation of this function to many labels, known as categorical cross entropy. Have a look at lambeq's classification tutorial for more information. Regarding your second question: from your description it's really hard to say what could be wrong with your model. Your dataset seems very simple, and correctly classifying a sentence actually depends only on the last token (is it a question mark or not?). My understanding is that there's probably something wrong with your code. Here are a few suggestions:
Let us know how it goes.
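To make the loss discussion above concrete, here is a minimal NumPy sketch (not lambeq's API) of categorical cross entropy over the two-dimensional one-hot labels used in the tutorial; with exactly two classes it reduces to binary cross entropy:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross entropy over one-hot labels;
    for two classes this is binary cross entropy."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two-dimensional binary labels, as in the lambeq tutorial:
# class 0 -> [1, 0], class 1 -> [0, 1]
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
print(round(cross_entropy(y_true, y_pred), 4))  # 0.1643
```

The clipping constant `eps` is just a numerical-stability convention; the loss is averaged over samples, as most frameworks do by default.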
-
Hi @dimkart, sorry for not answering you before. First, I wanted to thank you: that tip about using a sequence model got the accuracy up to 80%. Again, thanks a lot for the help.
-
Yes, it's the default behaviour of Bobcat to ignore punctuation rules and tokens, since they are not standard CCG. I've written a short method to show you how to fix this, by replacing the punctuation rule with backward application.

```python
from lambeq import CCGTree, CCGRule, BobcatParser, diagram2str
from discopy import Ty


def to_tree_with_punct(tree: CCGTree) -> CCGTree:
    s = Ty('s')
    if (len(tree.children) == 2 and tree.children[0].biclosed_type == s
            and tree.children[1].biclosed_type == Ty('punc')):
        # Treat the punctuation token as a function of type s >> s,
        # applied to the sentence by backward application.
        tree.children[1].biclosed_type = s >> s
        tree.rule = CCGRule.BACKWARD_APPLICATION
    return tree


parser = BobcatParser()

# We start by getting the CCG tree of the sentence,
# not directly the diagram.
t = parser.sentence2tree("What is the meaning of life ?")
print(t.deriv())
```

Output (without using the method) is:

Note the non-standard rule. Converting the tree to a diagram:

```python
print(diagram2str(t.to_diagram()))
```

Output:

Now if you use the function:

```python
new_tree = to_tree_with_punct(t)
print(diagram2str(new_tree.to_diagram()))
```

Output:
Hope this helps.
-
Hi @dimkart, thanks for the method and the explanation, and for all the help.
-
Hi Lambeq community,
I started a small investigation project and wanted to include some quantum-based models to compare results and study the current state of the art. While building my own model from the tutorials provided, I ran into two main questions.
What is the idea behind defining the data labels in this two-dimensional binary way? And if I were to build a model for multiple labels, what would be the correct way to define them for use with lambeq?
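For context, the two-dimensional encoding referred to here maps class 0 to `[1, 0]` and class 1 to `[0, 1]`; a minimal NumPy sketch (not lambeq-specific) of extending this one-hot scheme to more classes:

```python
import numpy as np

def one_hot(label: int, num_classes: int) -> np.ndarray:
    # Extends the two-dimensional binary encoding
    # ([1, 0] / [0, 1]) to any number of classes.
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(2, 4))  # [0. 0. 1. 0.]
```

With this encoding, each model output is a distribution over classes and the label picks out the correct component.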
I replicated the steps of the classical case but with my own dataset, which contains over 900 sentences classified into two categories: 1 - the user is addressing the bot directly, 0 - the user is not addressing the bot. The idea is to build a model using lambeq that is able to make this kind of classification. The problem is that I get really poor results (rarely above 55% accuracy), no matter how I set the hyperparameters. Compared with the example notebook, the only change is the addition of the atomic type: PREPOSITIONAL_PHRASE = Ty('p').
*Edit: Here is an example of the sentences I am working with:
0 i really do not like horror games .
0 I believe that exercise is crucial for staying healthy .
0 i love comic books they keep me entertain .
0 i like a lot of different music .
1 Do you like coffee ?
1 what is your name ?
1 can we talk about the batman film ?
1 What are your thoughts on animal testing ?
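A reply above notes that for sentences like these the label is fully determined by the final token; a trivial rule-based baseline (a hypothetical helper, not part of lambeq) makes that sanity check concrete:

```python
def is_addressing_bot(sentence: str) -> int:
    # Hypothetical baseline: a sentence addresses the bot
    # iff its last token is a question mark.
    tokens = sentence.split()
    return 1 if tokens and tokens[-1] == "?" else 0

samples = [
    ("i really do not like horror games .", 0),
    ("i like a lot of different music .", 0),
    ("Do you like coffee ?", 1),
    ("what is your name ?", 1),
]
print(all(is_addressing_bot(s) == y for s, y in samples))  # True
```

If a one-line rule like this scores near 100% on the dataset while the trained model sits at 55%, the problem is almost certainly in the training code rather than the data.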
Any ideas/tips to improve my model?
Thanks a lot for the help.