Hi, thank you very much for this work!
Regarding Figure 2 in the arXiv paper, the text says the models are trained on sequences of 256 tokens and evaluated on sequences of 1024 tokens. In the code, however, the training data appear to include sequences of length 256 as well as shorter sequences (64 and 128 tokens), and the evaluation data likewise appear to contain sequences of various lengths up to 1024. Could you please confirm whether this data mixture is what was used to produce Figure 2?
Thanks in advance.