2022-06-28-bansal22b.md

---
title: 'Data Scaling Laws in NMT: The Effect of Noise and Architecture'
booktitle: Proceedings of the 39th International Conference on Machine Learning
abstract: 'In this work, we study the effect of varying the architecture and training
  data quality on the data scaling properties of Neural Machine Translation (NMT).
  First, we establish that the test loss of encoder-decoder transformer models scales
  as a power law in the number of training samples, with a dependence on the model
  size. Then, we systematically vary aspects of the training setup to understand
  how they impact the data scaling laws. In particular, we change the following:
  (1) the architecture and task setup, comparing against a transformer-LSTM hybrid
  and a decoder-only transformer with a language-modeling loss; (2) the noise level
  in the training distribution, experimenting with filtering and with adding i.i.d.
  synthetic noise. In all of the above cases, we find that the data scaling exponents
  are minimally impacted, suggesting that marginally worse architectures or training
  data can be compensated for by adding more data. Lastly, we find that using
  back-translated data instead of parallel data can significantly degrade the
  scaling exponent.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bansal22b
month: 0
tex_title: 'Data Scaling Laws in {NMT}: The Effect of Noise and Architecture'
firstpage: 1466
lastpage: 1482
page: 1466-1482
order: 1466
cycles: false
bibtex_author: Bansal, Yamini and Ghorbani, Behrooz and Garg, Ankush and Zhang, Biao
  and Cherry, Colin and Neyshabur, Behnam and Firat, Orhan
author:
- given: Yamini
  family: Bansal
- given: Behrooz
  family: Ghorbani
- given: Ankush
  family: Garg
- given: Biao
  family: Zhang
- given: Colin
  family: Cherry
- given: Behnam
  family: Neyshabur
- given: Orhan
  family: Firat
date: 2022-06-28
address:
container-title: Proceedings of the 39th International Conference on Machine Learning
volume: '162'
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf:
extras:
---
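
The abstract's headline claim, that test loss scales as a power law in the number of training samples, is commonly written as a saturating power-law fit. The parameterization below is a minimal sketch using the standard form from the scaling-laws literature; the symbol choices are an assumption, not quoted from the paper:

```latex
% Hedged sketch of the data scaling law referenced in the abstract.
% L(D): test loss at dataset size D; alpha: data scaling exponent;
% beta: scale coefficient; L_inf: irreducible loss floor.
% This saturating power-law form is a common convention, assumed here
% rather than taken verbatim from the paper.
\[
  L(D) \;=\; \beta \, D^{-\alpha} \;+\; L_{\infty}
\]
```

Read through this form, the abstract's findings say that architecture changes and i.i.d. training noise mostly leave the exponent \(\alpha\) intact (shifting the curve rather than its slope), whereas replacing parallel data with back-translated data degrades \(\alpha\) itself.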