Release raw lambada dataset #131

Is it possible to release the Lambada dataset used to generate the accuracy numbers in Table 3 of the paper? This would make it easier to do comparisons with other models :)
@Newmu

Comments
We just use the plain text files, which can be downloaded here: https://zenodo.org/record/2630551#.XNxg89NKjUI
That's a post-processed version, i.e., "don't" is split into "do n't", etc. GPT-2-small gets around 31% on that set. My understanding from @Newmu was that the 45.99 figure in Table 3 of the paper was on the raw/non-processed version.
We apply "de-tokenizers" to remove some of the artifacts. Alec can verify, but I think in this case it's simple.
In fact, the detokenizer should be invertible, although I don't think that's important for the accuracy numbers.
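The exact snippet isn't preserved in this copy of the thread; a minimal sketch of what such a detokenizer could look like, assuming standard PTB-style artifacts (every rule here is a guess, not OpenAI's actual code):

```python
# A guessed sketch of a PTB-style detokenizer -- the rules are assumptions,
# not the snippet originally posted in this thread.
def detokenize(text: str) -> str:
    # Undo PTB contraction splitting: "do n't" -> "don't", "it 's" -> "it's", ...
    for tok in (" n't", " 's", " 're", " 've", " 'll", " 'd", " 'm"):
        text = text.replace(tok, tok[1:])
    # Map PTB quote tokens back to curly quotes: `` -> “ and '' -> ”
    text = text.replace("`` ", "“").replace(" ''", "”")
    return text
```

Note that simple string replaces like these aren't strictly invertible, so the invertibility point above would need a more careful, tokenizer-aware mapping.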
As recommended in openai/gpt-2#131. The original suggestion makes no difference because the official release doesn't have smart quotes. Adding `` → “ and '' → ” rules improves the result by 0.3%.
This detokenizer doesn't do anything on the official Lambada dataset, since there are no smart quotes in it. My understanding is that OpenAI used its own version of the Lambada dataset, generated from BookCorpus/LAMBADA. This dataset is interesting because of the accuracy gap in GPT-2-small numbers: 34% on the official Lambada vs. 46% on OpenAI's version.
My bad, you're right, whoops! Try this: gs://gpt-2/data/lambada_test.jsonl
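For reference, one way to grab and inspect that file, assuming the bucket is public (a gs://bucket/path object is served over HTTPS at storage.googleapis.com/bucket/path); the per-line schema printed below is not documented in the thread:

```python
# Sketch: fetch the JSONL over the public HTTPS endpoint for the gs:// path.
import json
import urllib.request

URL = "https://storage.googleapis.com/gpt-2/data/lambada_test.jsonl"
with urllib.request.urlopen(URL) as resp:
    lines = resp.read().decode("utf-8").splitlines()
examples = [json.loads(line) for line in lines]  # one JSON object per line

print(len(examples))  # number of test passages
print(examples[0])    # inspect the first record's fields
```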
Thanks, that dataset makes a difference. I'm now getting 41.98% using GPT-2-small on this version of the dataset, with length-5 beam-search decoding of the last word for stop-word removal. Simplifying the test procedure to compare the last BPE token for equality instead of the last word, accuracy goes up to 46.89%. I'm wondering if this should be called "lambada-openai" or something in tables to avoid confusion. I looked at the errors between the two datasets, and this version seems easier because the formatting provides extra information. (Side-by-side examples labeled "Official Lambada" and "This version" were attached here.)
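A minimal sketch of the last-BPE-token comparison described above, using the Hugging Face transformers GPT-2 rather than the original evaluation code (the local file path and the "text" field name are assumptions):

```python
# Sketch: score LAMBADA by checking whether greedy argmax at the final
# position reproduces the last BPE token (not the original eval code).
import json
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def last_bpe_token_correct(text: str) -> bool:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids[:, :-1]).logits
    pred = logits[0, -1].argmax().item()  # greedy prediction for final position
    return pred == ids[0, -1].item()      # compare against the actual last token

with open("lambada_test.jsonl") as f:                 # path is an assumption
    texts = [json.loads(line)["text"] for line in f]  # "text" field is an assumption
acc = sum(last_bpe_token_correct(t) for t in texts) / len(texts)
print(f"last-BPE-token accuracy: {acc:.2%}")
```

Comparing only the last BPE token is more lenient than whole-word comparison (a word can span several BPE tokens), which is consistent with the accuracy jump from 41.98% to 46.89% reported above.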
Yeah, I agree keeping the extra information is potentially useful (even for non-zero-shot), and it's probably good to distinguish it from the original dataset.
Hi, I'm also looking to run the same test. Can you fix the link?
Should now be at: