Hi, thank you very much for this work!
Regarding Figure 2 in the arXiv paper, the text says the models are trained on sequences of 256 tokens and evaluated on sequences of 1024 tokens. In the code, however, the training data appear to include sequences of length 256 as well as shorter sequences (64 and 128 tokens), and the evaluation data likewise appear to contain sequences of various lengths up to 1024. Could you please confirm whether this data mixture is what was used to produce Figure 2?
Thanks in advance.