Evaluate ATE and ASC separately #35
In this repo, we propose to handle the E2E-ABSA problem using a sequence tagging model. Since ATE can be formulated as a sequence tagging task, you can evaluate the ATE performance by simply degrading the predicted tags and the gold-standard tags of the E2E-ABSA task. However, as ASC is a typical classification task, evaluating its performance in our sequence tagging model is not that straightforward.
Thank you for your reply. Yes, you are right, but I was wondering how you managed to report two results, one for ATE and the other for ASC, in this repo: https://github.com/lixin4ever/E2E-TBSA. As I understood it, you separated the tags of each task and evaluated their results separately? Is that true? If so, is it possible to do the same here? Sorry, I am confused, so if you could explain this to me, I would be grateful. Thank you
First of all, I want to clarify that ASC, a typical classification task, is different from E2E-ABSA (or "targeted sentiment analysis", "E2E-TBSA"), which is formulated as a sequence tagging task in our paper. In this issue, I mistakenly told you that "targeted sentiment analysis" is equivalent to "aspect sentiment classification", and I think that is the point leading to part of your confusion (sorry for the misinformation). Returning to your question: in another work, namely https://github.com/lixin4ever/E2E-TBSA, the reason we can report the results of both ATE and E2E-ABSA is that it is a multi-task learning framework, and the ATE predictions are explicitly provided. In order to report the ATE performance in this repo, you may need to degrade the predicted/gold tags of E2E-ABSA, i.e., only preserve the boundary tag and ignore the sentiment tag, and then do the evaluation.
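The degradation step described above can be sketched as follows (a minimal illustration, not code from this repo; the unified tag scheme with labels like `B-POS` is assumed):

```python
def degrade_tags(tags):
    """Collapse unified E2E-ABSA tags to boundary-only tags for ATE evaluation,
    e.g. 'B-POS' -> 'B', 'I-NEG' -> 'I'; 'O' has no sentiment part and is kept."""
    return [t.split('-')[0] for t in tags]

pred = ['O', 'B-POS', 'I-POS', 'O', 'B-NEG']
print(degrade_tags(pred))  # ['O', 'B', 'I', 'O', 'B']
```

Applying the same function to both the predicted and the gold sequences lets you run a standard ATE evaluation on the result.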
Thank you very much for the detailed answer; it is clearer to me now. I really appreciate your effort and the time you spend answering our questions. Another question, please: for sequence labeling models, people generally use seqeval for evaluation, but in your code you used sklearn metrics. What is the difference between these two frameworks, and which is the more suitable one for this task (E2E-ABSA)? Thank you in advance
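For context on this question: the practical difference is that seqeval scores predictions at the entity (span) level, counting a span as correct only when its boundaries match exactly, while sklearn metrics score each token independently. A pure-Python sketch of the two scoring views (hypothetical boundary-only tags; neither library is required here):

```python
def spans(tags):
    """Extract (start, end) aspect spans from boundary-only BIO tags."""
    out, start = [], None
    for i, t in enumerate(tags + ['O']):  # 'O' sentinel closes a trailing span
        if t == 'B':
            if start is not None:
                out.append((start, i))
            start = i
        elif t != 'I' and start is not None:
            out.append((start, i))
            start = None
    return out

gold = ['B', 'I', 'O', 'B', 'O']
pred = ['B', 'O', 'O', 'B', 'O']

# Entity-level view (what seqeval does): only exact span matches count.
span_hits = len(set(spans(pred)) & set(spans(gold)))  # 1 of 2 gold spans
# Token-level view (what sklearn metrics do): each token scored on its own.
token_hits = sum(p == g for p, g in zip(pred, gold))  # 4 of 5 tokens
```

Because E2E-ABSA is usually reported with exact-match span F1, the entity-level view is the one that matches the numbers in the papers; token-level metrics tend to look more forgiving on partially correct spans.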
Hi,
Thank you for sharing your code with us.
I have a question about predictions. As I read in your paper, you reported a single result for these two tasks. However, is it possible to return evaluation scores for each task separately (as you did in this code: https://github.com/lixin4ever/E2E-TBSA), so we can compare this model against single-task models?
Thank you