-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RONEC v2 #3184
RONEC v2 #3184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks ! I just added a comment about keeping a way to load V1, let me know what you think
BUILDER_CONFIGS = [ | ||
RONECConfig(name="ronec", version=VERSION, description="RONEC dataset"), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to still let users access the V1 for reproducibility of all the work before RONEC v2
Do you think you could add a configuration that allows users to load RONEC V1 ? You can use the new links to the V1 data files
@lhoestq Thanks for the review. I totally understand what you are saying. Normally, I would definitely agree with you, but in this particular case, the quality of v1 is poor, and the dataset itself is small (at the time we created v1 it was the only RO NER dataset, and its size was limited by the available resources). This is why we worked to build a larger one, with much better inter-annotator agreement. Fact is, models trained on v1 will be of very low quality and I would not recommend to anybody to use/do that. That's why I'd strongly suggest we replace v1 with v2, and kindof make v1 vanish :) What do you think? If you insist on having v1 accessible, I'll add the required code. Thanks! |
Ok I see ! I think it's fine then, no need to re-add V1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for updating the dataset !
I think we can merge now :)
Hi, as we've recently finished with the new RONEC (Romanian Named Entity Corpus), we'd like to update the dataset here as well. It's actually essential as links to V1 are no longer valid.
In reality we'd like to replace completely v1, as v2 is a full re-annotation of v1 with additional data (up to 2x size vs v1).
I've run the make style and all the dummy and real data test, and they passed.
I hope it's okay to merge the new RONEC v2 in the datasets.
Thanks!