Add full tagset to conll2003 README #3230

Merged: 3 commits merged on Nov 9, 2021.

BramVanroy (Contributor)

Even though it is possible to manually get the tagset list with

dset.features[field_name].feature.names

I think it is useful to have an overview of the tagset used on the dataset card. This is particularly useful in light of the dataset viewer: the tags are encoded, so it is not immediately obvious what they are for a given sample. Adding a label-to-int mapping should make it easier for visitors to grasp what they mean.
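As a rough illustration of what such a label-to-int mapping looks like, here is a minimal sketch. It assumes the field is a sequence of class labels whose integer ids are simply the positions in the names list (which is how `datasets` class labels are numbered). The hard-coded `NER_NAMES` list is for illustration only; in practice it would come from `dset.features[field_name].feature.names` as above. The helper names are hypothetical.

```python
# Minimal sketch: build the label <-> integer mapping a dataset card could display.
# In practice the names would come from the dataset itself, e.g.
#   NER_NAMES = dset.features["ner_tags"].feature.names
# The list below is hard-coded for illustration only; check it against the dataset.
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def label_to_int(names):
    """Map each tag string to the integer id stored in the dataset (its list index)."""
    return {name: idx for idx, name in enumerate(names)}


def int_to_label(names):
    """Inverse mapping: integer id -> tag string."""
    return dict(enumerate(names))


if __name__ == "__main__":
    # Print the mapping in a form that could be pasted into a README.
    for name, idx in label_to_int(NER_NAMES).items():
        print(f"{name!r}: {idx}")
```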

From a user-experience perspective, I would urge that the full tagsets always be available in the READMEs, but I understand that would probably take a lot of work. Perhaps it can be automated?
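One possible way to automate this is to render the tagset section of a card directly from the label names. The sketch below is hypothetical: `tagset_table` is not part of any library, and the Markdown table layout is only one option. It assumes the names list has already been read from the dataset's features.

```python
# Hypothetical helper: render a Markdown tagset table for a dataset card
# from a list of class-label names (id = position in the list).
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def tagset_table(field_name, names):
    """Return a Markdown snippet listing every integer id and its tag string."""
    lines = [
        f"Tagset for `{field_name}` (values are stored as integers):",
        "",
        "| id | tag |",
        "|---:|-----|",
    ]
    # One table row per label, in id order.
    lines += [f"| {idx} | {name} |" for idx, name in enumerate(names)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(tagset_table("ner_tags", NER_NAMES))
```

Generating the section from the features, rather than typing it by hand, keeps the card in sync if the label list ever changes.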

closes #3189

lhoestq (Member) left a comment:

Hi! Thanks, I agree we should show the full list.
In my opinion, this should even be part of the viewer in case the dataset card is missing; let me check internally whether we can have something like that.

Also, I just added some suggestions to make it clear that the values stored in the dataset are integers, not strings.
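To make the integers-not-strings point concrete, the sketch below decodes a sample's stored tag ids back into tag strings. The sample is illustrative, not quoted from the dataset, and `decode_tags` is a hypothetical helper; the `datasets` library's `ClassLabel.int2str` performs the same lookup on a real feature.

```python
# Minimal sketch: the dataset stores integer ids, so decoding means indexing
# into the label-name list. NER_NAMES is hard-coded here for illustration;
# in practice use dset.features["ner_tags"].feature.names.
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def decode_tags(tag_ids, names):
    """Convert stored integer tag ids to their string labels."""
    labels = []
    for i in tag_ids:
        # Reject ids outside the tagset instead of silently wrapping negatives.
        if not 0 <= i < len(names):
            raise ValueError(f"tag id {i} is outside the tagset (0..{len(names) - 1})")
        labels.append(names[i])
    return labels


# Illustrative sample in the conll2003 layout (tokens plus integer NER tags).
sample = {"tokens": ["EU", "rejects", "German", "call"], "ner_tags": [3, 0, 7, 0]}

if __name__ == "__main__":
    for token, label in zip(sample["tokens"], decode_tags(sample["ner_tags"], NER_NAMES)):
        print(f"{token}\t{label}")
```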

Review suggestions on datasets/conll2003/README.md (3 comments, since outdated and resolved)
lhoestq (Member) commented on Nov 8, 2021:

I also added the missing pretty_name tag to the dataset card to fix the CI.

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
lhoestq (Member) left a comment:

It's all good now, thanks!

@lhoestq lhoestq merged commit 53f69b8 into huggingface:master Nov 9, 2021
@BramVanroy BramVanroy deleted the patch-1 branch November 9, 2021 10:48
Linked issue: conll2003 incorrect label explanation
Participants: 2