This repo contains the dataset built in the following paper:
Ancient-Modern Chinese Translation with a New Large Training Dataset. Dayiheng Liu, Kexin Yang, Qian Qu, Jiancheng Lv, TALLIP 2019 [arXiv]
We create a new large-scale Ancient-Modern Chinese parallel corpus containing 1.24M bilingual sentence pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset.
We plan to gradually release the dataset.
The dataset can be downloaded at the [link].
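
Once downloaded, a parallel corpus like this is typically consumed as aligned sentence pairs. The sketch below is only an illustration and makes assumptions this README does not confirm: that the data comes as two UTF-8 text files with one sentence per line, where line i of the ancient file matches line i of the modern file. The function name `read_parallel` and the file names in the usage note are hypothetical, so adjust them to the layout of the released files.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def read_parallel(ancient_path, modern_path):
    """Return a list of (ancient, modern) sentence pairs.

    Assumes (not confirmed by the README) that the two files are
    line-aligned: line i of one file translates line i of the other.
    """
    ancient = read_lines(ancient_path)
    modern = read_lines(modern_path)
    if len(ancient) != len(modern):
        # A length mismatch means the files are not line-aligned.
        raise ValueError(
            f"line count mismatch: {len(ancient)} ancient vs {len(modern)} modern"
        )
    return [(a.strip(), m.strip()) for a, m in zip(ancient, modern)]
```

With hypothetical file names, a call would look like `pairs = read_parallel("ancient.txt", "modern.txt")`, after which `pairs[0]` is the first (ancient, modern) pair. Checking the line counts up front is a cheap way to catch a truncated or misaligned download before training.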