This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

LAMBADA metric_fn inaccurate for multi-token responses #75

Closed
sdtblck opened this issue Nov 18, 2020 · 5 comments
Labels
bug Something isn't working.

Comments

@sdtblck
Collaborator

sdtblck commented Nov 18, 2020

We need to check whether the answer is split across multiple tokens, rather than computing accuracy from the very last token alone.

We should just be able to check whether a token begins with a space; if it doesn't, it continues the previous word, so we take the previous token as well.

https://github.com/EleutherAI/GPTNeo/blob/master/model_fns.py#L255
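The heuristic described above could be sketched roughly like this (a minimal sketch, not the code in model_fns.py; the helper names and token strings are hypothetical, and it assumes GPT-2-style byte-level BPE where a leading space marks a word boundary):

```python
def last_word_start(tokens):
    """Index of the first token belonging to the final word.

    Walks backwards from the last token, absorbing tokens until one
    begins with a space (i.e. starts a new word).
    """
    i = len(tokens) - 1
    while i > 0 and not tokens[i].startswith(" "):
        i -= 1
    return i

def last_word_correct(target_tokens, predicted_tokens):
    """Count the example correct only if every token of the final word matches."""
    start = last_word_start(target_tokens)
    return predicted_tokens[start:] == target_tokens[start:]

# Example: an answer word might tokenize as [" Wuther", "ing"]; comparing
# only the final token "ing" would over-credit the model.
target = [" She", " read", " Wuther", "ing"]
good   = [" She", " read", " Wuther", "ing"]
bad    = [" She", " read", " dream", "ing"]
```

Here `last_word_correct(target, good)` is True while `last_word_correct(target, bad)` is False, even though both predictions end in the same final token.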

@kevinwatkins
Collaborator

My understanding was that this is how OpenAI did it (the last token rather than all the tokens of the last word), based on this remark at openai/gpt-2#131 (comment). But of course that could be my misinterpretation, or you might not want to follow them in that; it does seem rather odd.

Simplifying the procedure to test accuracy by comparing for equality of last BPE token instead of last word the accuracy is up to 46.89

@sdtblck
Collaborator Author

sdtblck commented Nov 21, 2020

Hm, we had WuTheFwasThat from OpenAI in our Discord server the other day, and he seemed to concur with my statement above. I think the other person in that thread is not OpenAI-affiliated.

@kevinwatkins
Collaborator

Cool... I should catch up with the Discord, sorry about that

@leogao2
Collaborator

leogao2 commented Nov 27, 2020

We probably want this fixed ASAP so we can run LAMBADA on the Pile ablations.

@StellaAthena StellaAthena added the bug Something isn't working. label Jan 6, 2021
@StellaAthena
Member

@leogao2 @sdtblck Is this still a concern?
