
Implement the LAMBADA evaluation #6

Closed
StellaAthena opened this issue Sep 16, 2020 · 4 comments · Fixed by #101
Labels: feature request (A feature that isn't implemented yet.)

Comments

@StellaAthena (Member) commented Sep 16, 2020

The LAMBADA dataset [PKL+16] tests the modeling of long-range dependencies in text – the model is asked to predict the last word of sentences which require reading a paragraph of context. It has recently been suggested that the continued scaling of language models is yielding diminishing returns on this difficult benchmark. [BHT+20] reflect on the small 1.5% improvement achieved by a doubling of model size between two recent state-of-the-art results ([SPP+19] and [Tur20]) and argue that “continuing to expand hardware and data sizes by orders of magnitude is not the path forward”. We find that path is still promising: in a zero-shot setting GPT-3 achieves 76% on LAMBADA, a gain of 8% over the previous state of the art.
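
For reference, a minimal sketch of the protocol described above, assuming the Hugging Face `transformers` package and GPT-2 as a stand-in model (neither is prescribed by this issue): the model reads the passage minus its final word and must reproduce that word exactly under greedy decoding.

```python
# Minimal LAMBADA-style check: greedy-decode the final word of a passage.
# Assumptions (not from this issue): Hugging Face `transformers`, GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lambada_correct(passage: str) -> bool:
    """True iff greedy decoding recovers the final word of `passage`."""
    context, _, target = passage.rpartition(" ")
    enc = tokenizer(context, return_tensors="pt")
    # Budget exactly as many tokens as the target word occupies.
    n_target = len(tokenizer(" " + target).input_ids)
    with torch.no_grad():
        out = model.generate(
            **enc,
            max_new_tokens=n_target,
            do_sample=False,  # greedy, per the zero-shot protocol
            pad_token_id=tokenizer.eos_token_id,
        )
    pred = tokenizer.decode(out[0, enc["input_ids"].shape[1]:]).strip()
    return pred == target

# accuracy = sum(map(lambada_correct, passages)) / len(passages)
```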

@StellaAthena added the feature request label Sep 16, 2020
@StellaAthena changed the title from LAMBDA to Implement the LAMBDA evaluation Sep 16, 2020
@cfoster0 (Contributor) commented Oct 5, 2020

The original dataset is available at the link below:

https://zenodo.org/record/2630551
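
A hedged sketch of pulling that record down; the archive and member file names below are assumptions about the Zenodo release, not details stated in this thread.

```python
# Fetch and unpack the original LAMBADA release from Zenodo.
# Assumed file names: lambada-dataset.tar.gz / lambada_test_plain_text.txt.
import tarfile
import urllib.request

URL = "https://zenodo.org/record/2630551/files/lambada-dataset.tar.gz"
urllib.request.urlretrieve(URL, "lambada-dataset.tar.gz")

with tarfile.open("lambada-dataset.tar.gz") as tar:
    tar.extractall("lambada")

# One passage per line; the final word of each is the prediction target.
with open("lambada/lambada_test_plain_text.txt", encoding="utf-8") as f:
    passages = [line.strip() for line in f if line.strip()]
print(len(passages))  # the test split should hold roughly 5k passages
```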

@sdtblck (Contributor) commented Oct 5, 2020

This is the dataset OpenAI used for evaluating GPT-2; see #39.
(reference: openai/gpt-2#131)
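
The OpenAI-prepared variant circulates as a JSONL file; the mirror URL and the "text" key below are the ones evaluation code commonly uses and are assumptions here, not details quoted in this thread. Note that its detokenization differs from the Zenodo release, so scores on the two files aren't directly comparable.

```python
# Load the LAMBADA test file OpenAI used for GPT-2 evaluation.
# Assumed: mirror URL below, one JSON object per line with a "text" field.
import json
import urllib.request

URL = "https://openaipublic.blob.core.windows.net/gpt-2/data/lambada_test.jsonl"
with urllib.request.urlopen(URL) as resp:
    lines = resp.read().decode("utf-8").splitlines()
passages = [json.loads(line)["text"] for line in lines if line.strip()]
print(len(passages))
```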

@leogao2 changed the title from Implement the LAMBDA evaluation to Implement the LAMBADA evaluation Oct 6, 2020
@StellaAthena added the Eval Set label and removed the feature request label Oct 23, 2020
@StellaAthena StellaAthena pinned this issue Oct 23, 2020
@StellaAthena StellaAthena unpinned this issue Oct 23, 2020
@anishthite anishthite self-assigned this Oct 24, 2020
@leogao2 (Contributor) commented Nov 29, 2020

@anishthite is this done/are you still working on this?

@leogao2 (Contributor) commented Nov 29, 2020

I think I'll be taking over this one.

@leogao2 leogao2 assigned leogao2 and unassigned anishthite Nov 29, 2020
@leogao2 leogao2 reopened this Jan 28, 2021
@leogao2 leogao2 closed this as completed Jan 29, 2021
StellaAthena added a commit that referenced this issue Jan 29, 2021
@StellaAthena StellaAthena linked a pull request Jan 30, 2021 that will close this issue
@StellaAthena added the feature request label Jan 30, 2021
StellaAthena added a commit to dirkgr/lm-evaluation-harness that referenced this issue Apr 27, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023
lintangsutawika pushed a commit that referenced this issue Jul 8, 2024 (update metrics for afrixnli)
penfever pushed a commit to penfever/lm-evaluation-harness that referenced this issue Aug 14, 2024
5 participants