Issue with fine classes in trec dataset #4790

Closed
albertvillanova opened this issue Aug 4, 2022 · 0 comments · Fixed by #4801
Labels: bug (Something isn't working)

albertvillanova commented Aug 4, 2022

Describe the bug

According to their paper, the TREC dataset contains 2 kinds of classes:

  • 6 coarse classes: TREC-6
  • 50 fine classes: TREC-50

However, our implementation has only 47 fine classes instead of 50. The reason is that we consider only the last segment of each label (the part after the colon), and that segment is repeated across several coarse classes:

  • We have one desc fine label instead of 2:
    • DESC:desc
    • HUM:desc
  • We have one other fine label instead of 3:
    • ENTY:other
    • LOC:other
    • NUM:other
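The collapse described above can be reproduced with a minimal sketch. It uses only the five labels named in this issue, not the full label set, and the helper name `last_segment` is hypothetical; it is not taken from the dataset loading script.

```python
# Only the five full labels named in this issue; the real set has 50 entries.
full_labels = ["DESC:desc", "HUM:desc", "ENTY:other", "LOC:other", "NUM:other"]


def last_segment(label: str) -> str:
    """Hypothetical helper: keep only the part after the coarse-class prefix."""
    return label.split(":")[-1]


# Keeping only the last segment merges labels from different coarse classes.
collapsed = sorted({last_segment(label) for label in full_labels})

# Keeping the full "COARSE:fine" string preserves every distinct fine class.
distinct = sorted(set(full_labels))

# Number of fine classes lost by dropping the coarse prefix.
lost = len(distinct) - len(collapsed)

print(collapsed)
print(lost)
```

Here five distinct labels collapse into two, so three classes are lost, which matches the gap between 50 and 47. Keeping the coarse prefix as part of the fine label name is one way to avoid the collision.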

From their paper:

We define a two-layered taxonomy, which represents a natural semantic classification for typical answers in the TREC task. The hierarchy contains 6 coarse classes and 50 fine classes,

Each coarse class contains a non-overlapping set of fine classes.
