In the conll2003 README, the labels are described as follows:
id: a string feature.
tokens: a list of string features.
pos_tags: a list of classification labels, with possible values including " (0), '' (1), # (2), $ (3), ( (4).
chunk_tags: a list of classification labels, with possible values including O (0), B-ADJP (1), I-ADJP (2), B-ADVP (3), I-ADVP (4).
ner_tags: a list of classification labels, with possible values including O (0), B-PER (1), I-PER (2), B-ORG (3), I-ORG (4), B-LOC (5), I-LOC (6), B-MISC (7), I-MISC (8).
First of all, it would be great if we could get a list of ALL possible pos_tags.
Second, the chunk_tags labels cannot be correct: the description says the values go from 0 to 4, whereas the data shows 0 and values from at least 11 to 21.
EDIT: not really a bug, sorry for mistagging.
Since these fields are of type ClassLabel (you can check this with dset.features), you can inspect the possible values with:
dset.features[field_name].feature.names  # .feature because it's a sequence of labels
and to find the mapping between names and integers, use:
dset.features[field_name].feature.int2str(value_or_values_list)  # map integer value to string value
# or
dset.features[field_name].feature.str2int(value_or_values_list)  # map string value to integer value