This README describes an approach to machine translation (MT) corpus generation developed and implemented by aiXplain Inc. The method leverages Large Language Models (LLMs) to improve the quality and efficiency of MT corpora, addressing challenges inherent in traditional corpus-generation workflows.

The project combines human-in-the-loop post-editing with LLMs. It builds on our previous work on real-time training of custom MT quality-estimation metrics informed by linguist updates, and extends that work with LLM capabilities: LLM-Enhanced Translation Synthesis, LLM-Assisted Annotation Analysis, LLM-Driven Pseudo Labeling, and an LLM Translation Recommendation System. The source code and a demo video of this project are available for community use:
- Source Code: GitHub Repository
- Demo Video: YouTube Link
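To make the workflow concrete, below is a minimal sketch of how the LLM-Enhanced Translation Synthesis and LLM-Driven Pseudo Labeling steps could look. Everything in it (the function names, the prompt wording, the `call_llm` client, and the three-line response format) is an illustrative assumption, not the project's actual API.

```python
# Illustrative sketch only: the real prompts, models, and pipeline live in the repository.
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    better_translation: str  # LLM-refined candidate (translation synthesis)
    quality_score: float     # 0-1 pseudo label, e.g. to seed the QE metric
    rationale: str           # short explanation that could be surfaced to a rater


def build_prompt(source: str, mt_output: str) -> str:
    """Compose a single prompt asking the LLM to post-edit and score an MT segment."""
    return (
        "You are an expert translator and reviewer.\n"
        f"Source sentence: {source}\n"
        f"Machine translation: {mt_output}\n"
        "1) Provide an improved translation.\n"
        "2) Rate the original MT quality from 0 to 1.\n"
        "3) Briefly explain the main issues."
    )


def pseudo_label(source: str, mt_output: str, call_llm) -> PseudoLabel:
    """`call_llm` is a placeholder for whichever LLM client a deployment uses."""
    raw = call_llm(build_prompt(source, mt_output))
    # Parsing is deployment-specific; here we assume the LLM answers in three lines.
    translation, score, rationale = [line.strip() for line in raw.splitlines()[:3]]
    return PseudoLabel(translation, float(score), rationale)
```

In this reading, the pseudo labels and refined candidates would feed the quality-estimation metric and the recommendation system, while linguist post-edits continue to update both, as in our earlier work.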
- Services (Docker Compose)
docker-compose up -d
- Backend API
uvicorn main:app --port 8088
- Admin app (run from the repository root)
python -m streamlit run web_app/admin/admin_app.py
- Rater app
python -m streamlit run web_app/annotator/rater_app.py
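Once the services are up, a quick sanity check is to open the auto-generated API documentation at http://localhost:8088/docs (this assumes `main:app` is a FastAPI application, which this README does not state explicitly); the Streamlit apps print their own local URLs, on port 8501 by default, when they start.

curl -I http://localhost:8088/docs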