Massively Multilingual Conversational AI

A Ray Serve-based microservice providing:

  1. Language Identification for 4,017 languages (via a quantized version of MMS-LID).
  2. Speech-to-Text for 1,162 languages (via MMS-1B-ALL).
  3. Translation for 200 languages (via NLLB-200-distilled-600M).
  4. Conversational Agents (coming soon).

This repository is inspired by Meta’s Massively Multilingual Speech (MMS) and No Language Left Behind (NLLB) initiatives. By combining these open-source models, it aims to make language technologies available for diverse and low-resource languages.
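To give a concrete sense of what the translation component does, here is a minimal sketch that calls NLLB-200-distilled-600M directly through the Hugging Face transformers pipeline, independently of the microservice (it assumes transformers and torch are installed):

    from transformers import pipeline

    # Load the distilled 600M-parameter NLLB checkpoint from Hugging Face.
    translator = pipeline(
        "translation",
        model="facebook/nllb-200-distilled-600M",
        src_lang="eng_Latn",  # ISO 639-3 code plus script, as NLLB expects
        tgt_lang="fra_Latn",
    )

    # Prints the French translation of the input text.
    print(translator("Hello, world!")[0]["translation_text"])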


Table of Contents

  • Features
  • Architecture Overview
  • Requirements
  • Installation
  • Usage
  • Endpoints
  • Example Requests
  • Smoke Testing
  • License
  • References and Acknowledgments


Features

  1. Language Identification

    • Identifies the language of an audio clip.
    • Uses a quantized ONNX version of MMS-LID, supporting over 4,000 language IDs.
    • TODO: ensure CUDA execution on ONNX
  2. Speech-to-Text

    • Transcribes audio into text, using Facebook MMS-1B-ALL.
    • Over 1,000 languages supported with the appropriate language adapter.
    • TODO: ensure CUDA execution on ONNX
  3. Text Translation

    • Translates text between 200 languages using NLLB-200-distilled-600M.
    • Supports ISO 639-3 language codes with script codes (e.g., fra_Latn, eng_Latn).
    • TODO: Add ONNX implementation
    • TODO: ensure CUDA execution
  4. Ray Serve Microservice

    • Provides a FastAPI-based interface served by Ray.
    • Automatic scaling of replicas (GPU or CPU usage can be specified).

Architecture Overview

  • The LangIdDeployment handles language identification (mms-lid-4017).
  • The TranscriptionDeployment handles audio transcription (mms-1b-all).
  • The NLLBDeployment handles text translation (nllb-200-distilled-600M).
  • A single App deployment includes the FastAPI routes and ties them all together (sketched below).
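The wiring looks roughly like the sketch below. The class names match the deployments above, but the method names, resource settings, and autoscaling values are illustrative assumptions rather than the repository’s exact code (LangIdDeployment is omitted for brevity):

    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment(ray_actor_options={"num_gpus": 1})
    class TranscriptionDeployment:
        def transcribe(self, audio_bytes: bytes, language: str) -> str:
            ...  # run MMS-1B-ALL inference here

    @serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 4})
    class NLLBDeployment:
        def translate(self, text: str, src_lang: str, tgt_lang: str) -> str:
            ...  # run NLLB-200-distilled-600M inference here

    @serve.deployment
    @serve.ingress(app)  # attaches the FastAPI routes to this deployment
    class App:
        def __init__(self, transcriber, translator):
            # DeploymentHandles let the ingress call the other deployments.
            self.transcriber = transcriber
            self.translator = translator

        @app.post("/text/translation")
        async def translation(self, body: dict):
            return await self.translator.translate.remote(
                body["text"], body["src_lang"], body["tgt_lang"]
            )

    serve.run(App.bind(TranscriptionDeployment.bind(), NLLBDeployment.bind()))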

Requirements

  • Python 3.10 (recommended)
  • Conda for installing dependencies
  • FFmpeg (for audio processing)
  • GPU is optional but recommended for faster inference (CUDA >= 12.4 required for GPU execution)
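A quick way to check these prerequisites from Python (a convenience sketch, not part of the repository):

    import shutil
    import sys

    if sys.version_info[:2] != (3, 10):
        print(f"Warning: Python 3.10 is recommended, found {sys.version.split()[0]}")
    if shutil.which("ffmpeg") is None:
        print("Warning: FFmpeg was not found on PATH")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed yet; skipping the GPU check")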

Installation

  1. Clone the repository:

    git clone https://github.com/klebster2/mms-conversational-ai
    cd mms-conversational-ai
  2. Set up a Python environment:

    conda env create -f environment.yml
    conda activate <env-name>  # use the environment name defined in environment.yml

Usage

Running the Service

  1. Start the Ray cluster (optionally in a separate terminal):

    ray start --head

    or

    ray start --head --port=6379 --include-dashboard=true --dashboard-host=0.0.0.0 --dashboard-port=8265 --num-gpus=1

    Or simply let Ray automatically start in local mode when you run the script.

  2. Run the main script:

    python api.py

    This will:

    • Initialize Ray
    • Deploy the LangIdDeployment, TranscriptionDeployment, and NLLBDeployment
    • Start a FastAPI server with the endpoints defined in app = FastAPI()
    • Print logs as it runs any built-in smoke tests (if configured)

By default, the service will be available at http://127.0.0.1:8000. Visit http://127.0.0.1:8000/docs for an auto-generated Swagger UI.

Endpoints

  1. Root: GET /

    • Redirects to /docs for the Swagger UI.
  2. Language Identification: POST /audio/languageidentification

    • Accepts an audio file (Form data).
    • Returns a JSON object with the detected language code (ISO 639-3) and the autonym.
  3. Speech-to-Text: POST /audio/transcription

    • Accepts an audio file (Form data) and an optional language query parameter.
    • If language is not provided, it will first call the language identification endpoint to guess the language (sketched after this list).
    • Returns the transcription and the language code used.
  4. Translation: POST /text/translation

    • Accepts a JSON body with { "text": "...", "src_lang": "...", "tgt_lang": "..." }.
    • Returns the translated text using the NLLB-200-distilled model.
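The language fallback in endpoint 3 works roughly as follows, continuing the illustrative App class from the Architecture Overview sketch; the handle and method names (self.langid, identify, transcribe) are assumptions, not the repository’s exact identifiers:

    from fastapi import UploadFile

    # Inside the illustrative App class:
    @app.post("/audio/transcription")
    async def transcription(self, audio: UploadFile, language: str | None = None):
        audio_bytes = await audio.read()
        if language is None:
            # No language supplied: ask the language-identification
            # deployment to guess it before transcribing.
            language = await self.langid.identify.remote(audio_bytes)
        text = await self.transcriber.transcribe.remote(audio_bytes, language)
        return {"language": language, "transcription": text}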

Example Requests

Using curl from the command line, here are some basic examples:

  1. Language Identification:

    curl -X POST "http://127.0.0.1:8000/audio/languageidentification" \
         -H "accept: application/json" \
         -F "audio=@/path/to/your_audio.mp3"
  2. Transcription (with automatic language detection):

    curl -X POST "http://127.0.0.1:8000/audio/transcription" \
         -H "accept: application/json" \
         -F "audio=@/path/to/your_audio.wav"
  3. Transcription (specifying a language):

    curl -X POST "http://127.0.0.1:8000/audio/transcription?language=fra" \
         -H "accept: application/json" \
         -F "audio=@/path/to/french_audio.wav"
  4. Translation:

    curl -X POST "http://127.0.0.1:8000/text/translation" \
         -H "Content-Type: application/json" \
         -d '{"text":"Hello, world!", "src_lang":"eng", "tgt_lang":"fra"}'

Smoke Testing

The script includes run_smoke_test_audio and run_smoke_test_text helper functions that download test files and call the local endpoints. When you run python api.py, it performs a few sample calls:

  • French “merci” (should detect fra)
  • Buriat sample (expected bxm)
  • Gettysburg address (English) for transcription
  • Yoruba audio sample for transcription
  • Simple English-to-French translation

You can customize or remove these tests in the __main__ section.


License

  • Code: CC0 1.0 – You’re free to use and adapt without restriction.
  • Models: The pretrained models from Meta (MMS-LID, MMS-1B-ALL, and NLLB) are released under a CC-BY-NC-4.0 license.
    • Please consult each model’s license (see their Hugging Face model pages) for usage terms and attribution requirements.
    • Important: Commercial usage may be restricted under the CC-BY-NC-4.0 license.

References and Acknowledgments

Also see:

  • MMS-LID: https://huggingface.co/facebook/mms-lid-4017
  • MMS-1B-ALL: https://huggingface.co/facebook/mms-1b-all
  • NLLB-200-distilled-600M: https://huggingface.co/facebook/nllb-200-distilled-600M

If you use or build upon this repository, please consider citing or mentioning the original models and referencing Meta’s relevant research.


Enjoy building massively multilingual conversational AI!