Adding HendrycksTest dataset #2370

andyzoujm · 2021-05-17T18:53:05Z

Adding Hendrycks test from https://arxiv.org/abs/2009.03300.
I'm having a bit of trouble creating the dummy data because some lines in the CSV files aren't being loaded properly (only the first entry of a 6-field row gets loaded). The dataset itself loads just fine. I hope you can help!
Thank you!

lhoestq

Thank you for adding this dataset!

The dataset Python script looks very good, thanks :)
I added a few comments in the dataset card.

I also noticed that the dummy data zip files were quite big (160KB each). Could you try to reduce their sizes, please? For example, feel free to take a look inside the abstract_algebra dummy data zip file and remove all the CSV files that are unrelated to abstract_algebra:

formal_logic_val.csv
prehistory_val.csv
etc.

If you could do that for all subjects, that would be perfect :) Feel free to use a script to automate that, of course. Thank you!
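The pruning suggested above can be scripted. This is a minimal sketch, not the script actually used in the PR. It assumes each subject's files are named `<subject>_<split>.csv` with splits `dev`, `val`, and `test` (only `*_val.csv` names appear in this thread) and that every subject has its own dummy data zip; `prune_dummy_zip` and `DEFAULT_SPLITS` are hypothetical names. Because Python's `zipfile` cannot delete members in place, the sketch copies the entries to keep into a new archive and swaps it in.

```python
import os
import tempfile
import zipfile

# Assumption: each subject ships <subject>_<split>.csv files for these splits.
DEFAULT_SPLITS = ("dev", "val", "test")


def prune_dummy_zip(zip_path, subject, splits=DEFAULT_SPLITS):
    """Keep only the CSV files belonging to `subject` inside `zip_path`.

    Non-CSV entries are copied unchanged.
    Returns the sorted names of the CSV entries that were dropped.
    """
    wanted = {f"{subject}_{split}.csv" for split in splits}
    removed = []
    # zipfile cannot delete members in place, so build a new archive next to
    # the original and atomically replace the original at the end.
    fd, tmp_path = tempfile.mkstemp(
        suffix=".zip", dir=os.path.dirname(os.path.abspath(zip_path))
    )
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                # Zip member names always use "/" as the separator.
                base = info.filename.rsplit("/", 1)[-1]
                if base.endswith(".csv") and base not in wanted:
                    removed.append(info.filename)
                    continue
                # Reusing the original ZipInfo keeps name, timestamp and compression.
                dst.writestr(info, src.read(info.filename))
        os.replace(tmp_path, zip_path)
    except Exception:
        os.remove(tmp_path)
        raise
    return sorted(removed)
```

To cover all subjects, call `prune_dummy_zip(path, subject)` once per subject inside a loop. The exact location of each dummy data zip in the repository is not stated in this thread, so the path has to be adapted.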

datasets/hendrycks_test/README.md

datasets/hendrycks_test/hendrycks_test.py

andyzoujm · 2021-05-31T09:57:31Z

@lhoestq Thank you for the review. I've made the suggested changes. There might still be some problems with the dummy data, though, due to some CSV loading issues whose cause I haven't found yet.

lhoestq · 2021-05-31T14:25:09Z

I took a look at the dummy data and some csv lines were cropped. I fixed them :)
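Cropped rows like these can be caught before zipping the dummy data. The following is a minimal sketch using only the standard `csv` module. It assumes every well-formed record has exactly 6 fields (question, four choices, answer), matching the "row of length 6" from the opening comment; `find_malformed_rows` is a hypothetical helper name. It parses records rather than physical lines, so a quoted question spanning several lines still counts as a single record.

```python
import csv
import io

# Assumed layout of a well-formed record: question, choices A-D, answer letter.
EXPECTED_FIELDS = 6


def find_malformed_rows(text, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every CSV record with the wrong width.

    line_number is the physical line on which the record ends, as reported by
    csv.reader's line_num, so multi-line quoted fields are handled correctly.
    """
    reader = csv.reader(io.StringIO(text))
    bad = []
    for row in reader:
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


if __name__ == "__main__":
    sample = "What is 2+2?,3,4,5,6,B\nA cropped question,1,2\n"
    print(find_malformed_rows(sample))  # -> [(2, 3)]
```

When checking a file on disk, read it with `open(path, newline="", encoding="utf-8")` before passing its contents in, so the `csv` module sees embedded newlines unchanged. An empty result for every dummy CSV is a cheap sanity check.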

lhoestq

Looks all good now! Thank you so much for adding it :)

albertvillanova · 2023-04-26T12:44:03Z

@andyzoujm Any reason why this dataset script was called "hendrycks_test" instead of "mmlu"?

We are thinking of renaming it...

andyzoujm · 2023-04-27T19:18:39Z

That's because we didn't call it MMLU in the paper (the shorthand didn't emerge until over a year later), and people at OpenAI were calling it that (the Hendrycks test).

albertvillanova · 2023-05-11T05:42:56Z

Thanks for your reply. Just for the record: we have renamed it to "cais/mmlu": https://huggingface.co/datasets/cais/mmlu

andyzoujm changed the title ~~Hendrycks test~~ Adding HendrycksTest dataset May 18, 2021

lhoestq reviewed May 28, 2021

Andy Zou added 6 commits May 31, 2021 02:46

adding hendrycks_test (87fa6a6)
minor adj (8a3e1f7)
update README (9feed80)
update README (ed459bb)
update README (09e2515)
minor modifications (594cd7f)
fix cropped csv lines (73e0580)

lhoestq added 3 commits May 31, 2021 16:29

remove json import (9566c47)
Merge remote-tracking branch 'upstream/master' into hendrycks_test (04ea29e)
fix tags (4186899)

lhoestq approved these changes May 31, 2021

lhoestq merged commit 7bac83b into huggingface:master May 31, 2021
