Describe the bug
After running the conversion script on a model trained with local attention and loading the result into the HuggingFace transformers library, generation quality degrades sharply once the sequence passes roughly 500 tokens.
To Reproduce
Steps to reproduce the behavior:
1. Run the conversion script.
2. Load the result into the HuggingFace transformers library.
3. Feed it a context of 450 tokens and then have it generate another 200.
4. Observe that around the 500th token the coherency falls off a cliff (a reproduction sketch follows below).
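A minimal reproduction sketch, assuming the conversion script has already written a `transformers`-compatible checkpoint to a local directory; the directory name, prompt file, and sampling settings below are placeholders rather than details from the original report:

```python
# Hedged reproduction sketch: "./converted-gpt-neo" and "long_prompt.txt" are
# hypothetical placeholders for the conversion output and a long prompt.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("./converted-gpt-neo")
model = AutoModelForCausalLM.from_pretrained("./converted-gpt-neo")
model.eval()

# Any prompt that tokenizes to roughly 450 tokens.
prompt = open("long_prompt.txt").read()
input_ids = tokenizer(prompt, return_tensors="pt").input_ids[:, :450]

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 200,  # generate ~200 further tokens
        do_sample=True,
        top_p=0.9,
    )

print(tokenizer.decode(output[0]))
# Reported symptom: coherence degrades sharply once the combined sequence
# passes roughly 500 tokens.
```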
Expected behavior
Generation quality should not fall off a cliff around the 500th token.
Proposed solution
It appears that the problem is a lack of compatibility between the local attention function used in GPT-Neo and the `transformers` library. While the `transformers` library does include models with local attention (Longformer, for example), their local attention is not implemented in a way that is consistent with how the GPT-2 model is defined in the `transformers` library.
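To make the mismatch concrete, here is a small sketch of the difference between a full causal mask (what the GPT-2 module applies) and a windowed local causal mask (what GPT-Neo's local attention layers were trained with). The sequence length and window size are illustrative placeholders, not values from the actual configs:

```python
# Illustrative comparison of attention masks; lengths are placeholders.
import torch

def causal_mask(seq_len):
    # Full causal mask, as used by GPT-2: position i may attend to every j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def local_causal_mask(seq_len, window):
    # Windowed causal mask, as used by GPT-Neo's local attention layers:
    # position i may only attend to positions j in (i - window, i].
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

full = causal_mask(8)
local = local_causal_mask(8, window=3)

# Rows where the two masks differ are exactly the positions where loading a
# local-attention checkpoint into a full-attention GPT-2 module changes behaviour.
print((full != local).any(dim=1))
```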
Screenshots
n/a
Environment (please complete the following information):
Hardware: TPU v3-8s, 1080 Tis, A100s
Configs: any config that has local attention
Additional context
n/a