conll2003 incorrect label explanation #3189

Closed
BramVanroy opened this issue Nov 1, 2021 · 1 comment · Fixed by #3230
Labels
bug Something isn't working

Comments

Contributor

BramVanroy commented Nov 1, 2021

In the conll2003 README, the labels are described as follows

  • id: a string feature.
  • tokens: a list of string features.
  • pos_tags: a list of classification labels, with possible values including " (0), '' (1), # (2), $ (3), ( (4).
  • chunk_tags: a list of classification labels, with possible values including O (0), B-ADJP (1), I-ADJP (2), B-ADVP (3), I-ADVP (4).
  • ner_tags: a list of classification labels, with possible values including O (0), B-PER (1), I-PER (2), B-ORG (3), I-ORG (4), B-LOC (5), I-LOC (6), B-MISC (7), I-MISC (8).

First of all, it would be great if we could get a list of ALL possible pos_tags.

Second, the chunk_tags description cannot be correct. It lists values from 0 to 4, whereas the data contains 0 and values from at least 11 to 21.

EDIT: not really a bug, sorry for mistagging.

BramVanroy added the bug label on Nov 1, 2021
Collaborator

mariosasko commented Nov 5, 2021

Hi @BramVanroy,

since these fields are of type ClassLabel (you can check this with dset.features), you can inspect the possible values with:

dset.features[field_name].feature.names  # .feature because it's a sequence of labels

and to find the mapping between names and integers, use:

dset.features[field_name].feature.int2str(value_or_values_list)  # map integer value to string value
# or
dset.features[field_name].feature.str2int(value_or_values_list)  # map string value to integer value
