eng_guj_parallel_corpus

This repository contains a parallel corpus of about 65,000 English sentences and their Gujarati translations.

Sentences are separated by '\n', so each line holds one sentence. Users can change the separator to suit the needs of their own solution.
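As a minimal sketch of reading the corpus and changing the separator, the Python below pairs the two training files line by line and writes them out with a different delimiter. It assumes that line i of train.en corresponds to line i of train.gu and that both files are UTF-8 encoded; neither is stated explicitly here. The function names `read_parallel` and `write_joined` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def read_parallel(en_path, gu_path):
    """Return a list of (english, gujarati) pairs, one pair per line.

    Assumption: both files are UTF-8 and aligned line by line.
    """
    sides = []
    for path in (en_path, gu_path):
        # Split on '\n' only; str.splitlines() would also split on other
        # Unicode line boundaries and could break the alignment.
        lines = Path(path).read_text(encoding="utf-8").split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty entry left by a final newline
        sides.append(lines)
    en_lines, gu_lines = sides
    if len(en_lines) != len(gu_lines):
        raise ValueError(
            f"line count mismatch: {len(en_lines)} vs {len(gu_lines)}"
        )
    return list(zip(en_lines, gu_lines))


def write_joined(pairs, out_path, sep="\t"):
    """Write one 'english<sep>gujarati' record per line."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        for en, gu in pairs:
            f.write(f"{en}{sep}{gu}\n")


# Example usage (paths are assumptions about where the files live):
# pairs = read_parallel("train.en", "train.gu")
# write_joined(pairs, "train.tsv", sep="\t")
```

A tab is used as the default joined separator because it is unlikely to occur inside a caption; any other string can be passed as `sep`.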

About Dataset

The dataset was developed at the Language Processing Laboratory, Uka Tarsadia University, Gujarat, India, as part of ongoing research on Natural Language Processing and Machine Translation. It contains around 65,000 English sentences from the MSCOCO captioning dataset that were translated to Gujarati and converted to parallel format.

Citation

P. Shah and V. Bakrola, "Neural Machine Translation System of Indic Languages - An Attention based Approach," 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 2019, pp. 1-5, doi: 10.1109/ICACCP.2019.8882969. IEEE Xplore, arXiv

README.md
train.en
train.gu
vocab.en
vocab.gu
