Arabic-to-English machine translation using the classic and transliterated approaches.

lukasz-staniszewski/ar-en-transliteration-mt

Impact of transliteration on Arabic to English Machine Translation

Bartosz Cywiński & Łukasz Staniszewski (Warsaw University of Technology)

Task

The task was to investigate the impact of transliteration on machine translation from Arabic to English. Full documentation, written in Polish, is here.

Installation:

  1. To download and preprocess all the data, open notebooks/data_preprocessing.ipynb and run it; the processed data is written to data/processed/

  2. To tokenize the processed data with SentencePiece, go to model_scripts and, in a Python environment with fairseq installed, run the bash script:

$ bash train_decode_bpe_sentencepiece.sh
  3. Binarize all the data so that it works with fairseq by running the bash script:
$ bash preprocess_fairseq.sh
  4. Train the model:
$ bash train_fairseq.sh
  5. Generate model predictions:
$ bash generate_fairseq.sh
  6. To compute the evaluation metrics, go to notebooks/ and run metrics.ipynb
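
The four bash scripts above can be chained into one wrapper so that a failure in any step stops the run. The following is only a sketch, not part of the repository: the `SCRIPTS_DIR` and `DRY_RUN` variables and the `run_pipeline` function are hypothetical names introduced here, and it assumes the wrapper is started from the repository root with the fairseq environment already active. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper for the model_scripts steps (not part of the repository).
set -euo pipefail

# Script names and their order are taken from the installation steps.
STEPS=(
  train_decode_bpe_sentencepiece.sh
  preprocess_fairseq.sh
  train_fairseq.sh
  generate_fairseq.sh
)

# Assumption: the scripts live in model_scripts/ under the current directory.
SCRIPTS_DIR="${SCRIPTS_DIR:-model_scripts}"

# Hypothetical switch: DRY_RUN=1 (the default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run_pipeline() {
  local step
  for step in "${STEPS[@]}"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "bash $step"
    else
      echo ">>> $step"
      # Run inside the scripts directory, as the steps describe; the subshell
      # keeps the caller's working directory unchanged.
      (cd "$SCRIPTS_DIR" && bash "$step")
    fi
  done
}

run_pipeline
```

Saved under a name of your choice, the wrapper lists the commands when run as is; running it with `DRY_RUN=0` set in the environment executes the scripts for real. The two notebooks (data preprocessing before, metrics after) are still run by hand.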
