normalization-NMT

This repository contains the code for the COLING 2018 paper "An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization".

We use the Marian framework (https://github.com/marian-nmt/marian-dev) to train our NMT models.

The dataset can be found here: http://stp.lingfil.uu.se/histcorp/tools.html

You need to segment the token pairs into character sequences or subword units before feeding them into Marian. We use the subword-nmt tool (https://github.com/rsennrich/subword-nmt) to learn subword units.
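A minimal Python sketch of the character-level preprocessing and of undoing it after decoding, assuming one token pair per line and tokens without internal spaces. The function names and example pairs are hypothetical and are not taken from token2char.py or bpe_postprocess.sh; the `@@ ` removal follows subword-nmt's default continuation marker.

```python
import re


def to_chars(token: str) -> str:
    # "vnder" -> "v n d e r": each character becomes one Marian input symbol.
    # Assumes the token contains no internal spaces.
    return " ".join(token)


def from_chars(seq: str) -> str:
    # Inverse of to_chars: drop the separating spaces after decoding.
    return seq.replace(" ", "")


def remove_bpe(seq: str) -> str:
    # subword-nmt appends "@@" to every non-final subword unit,
    # e.g. "vn@@ der" -> "vnder".
    return re.sub(r"(@@ )|(@@ ?$)", "", seq)


def to_parallel_lines(pairs):
    # Turn (historical, normalized) pairs into two line-aligned lists,
    # one for the source file and one for the target file.
    src = [to_chars(hist) for hist, _ in pairs]
    tgt = [to_chars(norm) for _, norm in pairs]
    return src, tgt


if __name__ == "__main__":
    # Illustrative pairs only, not drawn from the dataset.
    src, tgt = to_parallel_lines([("vnder", "under"), ("haue", "have")])
    print(src)  # ['v n d e r', 'h a u e']
    print(tgt)  # ['u n d e r', 'h a v e']
```

Writing `src` and `tgt` one entry per line gives the parallel source and target files Marian expects; the same line alignment holds if subword units are used instead of characters.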

Files:
- README.md
- bpe_postprocess.sh
- decode.sh
- editbpe.py
- eval_acc.py
- eval_cer.py
- marian_s2s_no_attention.h
- run_eval_acc.sh
- run_eval_cer.sh
- token2char.py
- train_seq2seq.sh
- train_transformer.sh
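The script names eval_acc.py and eval_cer.py point to evaluation by word accuracy and character error rate (CER). A minimal sketch of both metrics, assuming one predicted and one reference normalization per token. CER here uses one common definition (total character edit distance divided by total reference length); the paper's exact definition and the repository scripts may differ, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance with unit cost for
    # insertion, deletion, and substitution.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute or match
        prev = cur
    return prev[-1]


def word_accuracy(hyps, refs):
    # Fraction of tokens whose predicted normalization matches the reference exactly.
    assert len(hyps) == len(refs) and refs
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def cer(hyps, refs):
    # Corpus-level CER: summed edit distances over summed reference lengths.
    assert len(hyps) == len(refs) and refs
    edits = sum(levenshtein(h, r) for h, r in zip(hyps, refs))
    return edits / sum(len(r) for r in refs)
```

For example, `word_accuracy(["under", "haue"], ["under", "have"])` is 0.5, and `cer` on the same lists is 1/9 (one substitution over nine reference characters).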
GitHub repository: tanggongbo/normalization-NMT