Evaluate ATE and ASC separately #35
In this repo, we propose to handle the E2E-ABSA problem using a sequence tagging model. Since ATE can be formulated as a sequence tagging task, you can evaluate the ATE performance by simply degrading the predicted tags and the gold-standard tags of the E2E-ABSA task. However, as ASC is a typical classification task, evaluating its performance in our sequence tagging model is not that straightforward.
Thank you for your reply. Yes, you are right, but I was wondering how you managed to report two results, one for ATE and the other for ASC, in this repo: https://github.com/lixin4ever/E2E-TBSA. As I understood it, you separated the tags of each task and evaluated their results separately? Is that true? If so, is it possible to do the same here? Sorry, I am confused, so if you could explain this to me, I would be grateful. Thank you
First of all, I want to clarify that ASC, a typical classification task, is different from E2E-ABSA (or "targeted sentiment analysis", "E2E-TBSA"), which is formulated as a sequence tagging task in our paper. In this issue, I mistakenly told you that "targeted sentiment analysis" is equivalent to "aspect sentiment classification", and I think that is the point leading to part of your confusion (sorry for the misinformation). Returning to your question: in another work, namely https://github.com/lixin4ever/E2E-TBSA, the reason we can report the results of both ATE and E2E-ABSA is that it is a multi-task learning framework, and the ATE predictions are explicitly provided. In order to report the ATE performance in this repo, you may need to degrade the predicted/gold tags of E2E-ABSA, i.e., only preserve the boundary tag and ignore the sentiment tag, and then do the evaluation.
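The degradation step described above can be sketched as follows (a minimal illustration, not code from this repo; the unified tag scheme with labels like `B-POS` is assumed):

```python
def degrade_tags(tags):
    """Collapse unified E2E-ABSA tags to boundary-only tags for ATE evaluation,
    e.g. 'B-POS' -> 'B', 'I-NEG' -> 'I'; 'O' has no sentiment part and is kept."""
    return [t.split('-')[0] for t in tags]

pred = ['O', 'B-POS', 'I-POS', 'O', 'B-NEG']
print(degrade_tags(pred))  # ['O', 'B', 'I', 'O', 'B']
```

Applying the same function to both the predicted and the gold sequences lets you run a standard ATE evaluation on the result.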
Thank you very much for the detailed answer; it is clearer to me now. I really appreciate your effort and the time you spend answering our questions. Another question, please: for sequence labeling models, people generally use seqeval for evaluation, but in your code you used sklearn metrics. What is the difference between these two frameworks, and which is the more suitable one for this task (E2E-ABSA)? Thank you in advance
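For context on this question: the practical difference is that seqeval scores predictions at the entity (span) level, counting a span as correct only when its boundaries match exactly, while sklearn metrics score each token independently. A pure-Python sketch of the two scoring views (hypothetical boundary-only tags; neither library is required here):

```python
def spans(tags):
    """Extract (start, end) aspect spans from boundary-only BIO tags."""
    out, start = [], None
    for i, t in enumerate(tags + ['O']):  # 'O' sentinel closes a trailing span
        if t == 'B':
            if start is not None:
                out.append((start, i))
            start = i
        elif t != 'I' and start is not None:
            out.append((start, i))
            start = None
    return out

gold = ['B', 'I', 'O', 'B', 'O']
pred = ['B', 'O', 'O', 'B', 'O']

# Entity-level view (what seqeval does): only exact span matches count.
span_hits = len(set(spans(pred)) & set(spans(gold)))  # 1 of 2 gold spans
# Token-level view (what sklearn metrics do): each token scored on its own.
token_hits = sum(p == g for p, g in zip(pred, gold))  # 4 of 5 tokens
```

Because E2E-ABSA is usually reported with exact-match span F1, the entity-level view is the one that matches the numbers in the papers; token-level metrics tend to look more forgiving on partially correct spans.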
Hi,
Thank you for sharing your code with us.
I have a question about predictions. As I read in your paper, you reported a single result for these two tasks. However, is it possible to return evaluation scores for each task separately (as you did in this code: https://github.com/lixin4ever/E2E-TBSA), so we can compare this model against single-task models?
Thank you