RONEC v2 #3184

dumitrescustefan · 2021-10-30T10:50:03Z

Hi, as we've recently finished the new RONEC (Romanian Named Entity Corpus), we'd like to update the dataset here as well. This is essential, as the links to V1 are no longer valid.

In fact, we'd like to replace v1 completely, as v2 is a full re-annotation of v1 with additional data (up to 2x the size of v1).

I've run make style and all the dummy and real data tests, and they passed.

I hope it's okay to merge the new RONEC v2 in the datasets.

Thanks!

lhoestq

Awesome, thanks! I just added a comment about keeping a way to load V1. Let me know what you think.

lhoestq · 2021-11-02T10:40:41Z

datasets/ronec/ronec.py

    BUILDER_CONFIGS = [
        RONECConfig(name="ronec", version=VERSION, description="RONEC dataset"),
    ]


It would be nice to still let users access V1, so that all the work done before RONEC v2 stays reproducible.

Do you think you could add a configuration that allows users to load RONEC V1? You can use the new links to the V1 data files.
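For illustration only, the request above could look roughly like the sketch below: a second entry in BUILDER_CONFIGS that points at the V1 files. This is not code from the PR. The `RONECConfig` here is a plain dataclass standing in for the real builder config class so the snippet runs without the `datasets` library, and the config name `ronec_v1`, the `data_url` field, the version strings, and the example.com URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass


# Stand-in for the real RONECConfig (a datasets.BuilderConfig subclass),
# so this sketch runs without the datasets library installed.
@dataclass(frozen=True)
class RONECConfig:
    name: str
    version: str
    description: str
    data_url: str  # hypothetical field, not present in the PR


# Placeholder URLs and versions: the real file locations and version
# numbers are not given in this thread.
BUILDER_CONFIGS = [
    RONECConfig(
        name="ronec",
        version="2.0.0",
        description="RONEC dataset",
        data_url="https://example.com/ronec-v2/",
    ),
    RONECConfig(
        name="ronec_v1",  # hypothetical config name
        version="1.0.0",
        description="RONEC v1 (legacy, kept for reproducibility)",
        data_url="https://example.com/ronec-v1/",
    ),
]


def get_config(name: str = "ronec") -> RONECConfig:
    """Return the configuration with the given name; v2 is the default."""
    by_name = {cfg.name: cfg for cfg in BUILDER_CONFIGS}
    if name not in by_name:
        raise ValueError(f"Unknown config {name!r}; choose from {sorted(by_name)}")
    return by_name[name]
```

With a setup along these lines, v2 would stay the default, and a user would opt into V1 explicitly by passing the config name as the second argument to `load_dataset`.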

dumitrescustefan · 2021-11-02T15:45:49Z

@lhoestq Thanks for the review. I totally understand what you are saying. Normally, I would definitely agree with you, but in this particular case, the quality of v1 is poor, and the dataset itself is small (at the time we created v1 it was the only RO NER dataset, and its size was limited by the available resources).

This is why we worked to build a larger one, with much better inter-annotator agreement. The fact is, models trained on v1 will be of very low quality, and I would not recommend that anybody train on it. That's why I'd strongly suggest we replace v1 with v2, and kind of make v1 vanish :)

What do you think? If you insist on having v1 accessible, I'll add the required code. Thanks!

lhoestq · 2021-11-02T16:00:59Z

OK, I see! I think it's fine then, no need to re-add V1.

lhoestq

Thanks again for updating the dataset!
I think we can merge now :)

dumitrescustefan and others added 4 commits October 30, 2021 13:36

Updated RONEC to version 2

73415ce

Update RONEC v2

4aad14e

Update README.md

e324b43

Update README.md

7ba8b69

lhoestq reviewed Nov 2, 2021

lhoestq approved these changes Nov 2, 2021

lhoestq merged commit 425f45b into huggingface:master Nov 2, 2021
