normalization-NMT

This repository contains the code for the COLING 2018 paper "An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization".

We use the Marian framework (https://github.com/marian-nmt/marian-dev) to train our NMT models.

The dataset can be found here: http://stp.lingfil.uu.se/histcorp/tools.html

Before feeding the token pairs into Marian, you need to segment them into character sequences or subword units. We use the subword-nmt tool (https://github.com/rsennrich/subword-nmt) to learn subword units.
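The segmentation step can be sketched in Python as follows. This is a minimal illustration, not the code used for the paper. It assumes the dataset has already been read into (historical, modern) token pairs. `learn_bpe_merges` and `apply_bpe` are hypothetical helpers: a simplified re-implementation of byte-pair encoding, not the subword-nmt tool itself. They use no end-of-word marker and break ties by first occurrence. Each output line holds one token as space-separated symbols, matching the line-aligned, whitespace-separated text that Marian trains on.

```python
from collections import Counter


def to_char_sequence(token):
    """Segment a token into space-separated characters (one symbol per character)."""
    return " ".join(token)


def make_parallel_lines(pairs):
    """Turn (historical, modern) token pairs into source/target lines of characters."""
    src = [to_char_sequence(hist) for hist, _ in pairs]
    tgt = [to_char_sequence(modern) for _, modern in pairs]
    return src, tgt


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(tokens, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of tokens.

    Simplified sketch: no end-of-word marker; ties between equally frequent
    pairs go to the pair counted first.
    """
    vocab = Counter(tuple(tok) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by token frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every token is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge to every token in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(token, merges):
    """Segment a token by applying the learned merges in the order they were learned."""
    symbols = tuple(token)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return " ".join(symbols)


if __name__ == "__main__":
    # Hypothetical example pairs, for illustration only.
    pairs = [("seyn", "sein"), ("vnnd", "und"), ("seyd", "seid")]
    src, tgt = make_parallel_lines(pairs)
    print(src, tgt)
    merges = learn_bpe_merges([h for h, _ in pairs] + [m for _, m in pairs], 3)
    print([apply_bpe(h, merges) for h, _ in pairs])
```

Character-level segmentation needs no training. The BPE variant learns its merges from the training tokens, and the same merge list must then be applied to the development and test data.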
