Add OSCAR dataset card #1833
Awesome thank you :)
I added a few suggestions.
I'm also generating the tags that go at the top of the dataset card
@lhoestq Thanks for the suggestions! I agree with all of them. Should I accept them one by one, or can I accept them all at once? When I try to load the whole diff, GitHub complains and does not render them well (probably my browser?) 😅
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
I just merged the tables as suggested 😄. However, I noticed something weird: the train sizes are identical for both the original and deduplicated files. This is not normal; in general the original files are almost twice as big as the deduplicated ones 🤔
Good catch @pjox! I just checked, and this is because the script doesn't handle having several blank lines in a row.
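For illustration, here is a minimal sketch of the kind of blank-line handling described above. It is not the actual loading script: it assumes documents are separated by one or more blank lines, and `iter_documents` is a hypothetical helper name introduced only for this example.

```python
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield documents separated by one or more blank lines (assumed format)."""
    buffer = []
    for line in lines:
        if line.strip():
            buffer.append(line.rstrip("\n"))
        elif buffer:
            # A blank line closes the current document. Further blank lines
            # in a row are skipped because the buffer is already empty.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Flush the last document when the input has no trailing blank line.
        yield "\n".join(buffer)


# Hypothetical sample with several blank lines in a row between documents.
sample = [
    "first doc\n",
    "\n",
    "\n",
    "second doc, line 1\n",
    "second doc, line 2\n",
    "\n",
    "\n",
    "\n",
    "third doc\n",
]
docs = list(iter_documents(sample))
print(len(docs))
```

Checking `buffer` before yielding is what keeps runs of blank lines from producing empty documents or closing a document at the wrong place.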
I got the new sizes today; I will update the dataset_infos.json and the dataset card tomorrow.
Great, I just wanted to report that I got the error message "NonMatchingSplitsSizesError" when I tried to load one of the oscar datasets.
Hi @cahya-wirawan, which configuration of oscar do you have this issue with?
OK, I see you're having this issue because I haven't updated the sizes yet! I'm opening a PR. I just checked and indeed there's an issue with the
Thanks @lhoestq for fixing the issue, it works now.
Alright this is all good then ! Thanks a lot @pjox
Thank you so much @lhoestq!
I added more information and completed the dataset card for OSCAR which was started by @lhoestq in his previous PR.