Character-Level Neural Machine Translation

This is an implementation of the models described in the paper "A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation" (http://arxiv.org/abs/1603.06147).

Dependencies:

Most of the script files are written in pure Theano.
The preprocessing pipeline additionally depends on the following:
Python Libraries: NLTK
MOSES: https://github.com/moses-smt/mosesdecoder
Subword-NMT (http://arxiv.org/abs/1508.07909): https://github.com/rsennrich/subword-nmt

This code is based on the dl4mt library: https://github.com/nyu-dl/dl4mt-tutorial

Be sure to include the path to this library in your PYTHONPATH.
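
If editing the PYTHONPATH environment variable is inconvenient, the same effect can be had per process by extending sys.path before importing anything from the library. The following is a minimal sketch, not part of this repository: the helper name add_library_path and the directory ~/src/dl4mt are hypothetical and should be replaced with the location of your own checkout.

```python
import os
import sys


def add_library_path(lib_dir):
    """Put lib_dir at the front of sys.path unless it is already there.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    lib_dir = os.path.abspath(os.path.expanduser(lib_dir))
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


# Hypothetical location of the library checkout; adjust to your setup.
add_library_path("~/src/dl4mt")
```

Setting PYTHONPATH in the shell remains the simpler choice when every script in a session needs the library.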

We recommend using the latest version of Theano.
For exact reproduction, however, please use the following version of Theano.
commit hash: fdfbab37146ee475b3fd17d8d104fb09bf3a8d5c

Preparing Text Corpora:

The original text corpora can be downloaded from http://www.statmt.org/wmt15/translation-task.html
Once the download is finished, use 'preprocess.sh' in the 'preprocess' directory to preprocess the text files. Preprocessing is not necessary for the character-level decoders; however, in order to compare the results with subword-level decoders and other word-level approaches, we apply the same process to all of the target corpora. Finally, build the vocabulary with 'build_dictionary_char.py' for the character-level case or 'build_dictionary_word.py' for the subword-level case.
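
To illustrate what building a character-level vocabulary involves, here is a minimal sketch. It is not the code in 'build_dictionary_char.py'. The function name build_char_vocab, the reserved tokens 'eos' and 'UNK' at indices 0 and 1, and the frequency-based ordering are all assumptions made for this example; check the actual script for the conventions the models expect.

```python
from collections import Counter


def build_char_vocab(lines, reserved=("eos", "UNK")):
    """Map each character to an integer id, most frequent first.

    Illustrative only: the reserved tokens and the ordering are assumptions,
    not necessarily what build_dictionary_char.py does.
    """
    counts = Counter()
    for line in lines:
        # Every character counts as a symbol, including the space.
        counts.update(line.rstrip("\n"))

    # Reserved tokens take the lowest ids.
    vocab = {token: idx for idx, token in enumerate(reserved)}

    # Sort by descending frequency; break ties by character for determinism.
    for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[ch] = len(vocab)
    return vocab


vocab = build_char_vocab(["abba", "ab c"])
```

In this toy corpus 'a' and 'b' each occur three times, so they receive the first two ids after the reserved tokens.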
Updating...
