
Chaos96/Parrot


🦜 Parrot: Seamless Spoken Dialogue Interaction with Double-Channel Large Language Models

Abstract

We present Parrot, a spoken dialogue language model trained with a two-stage pipeline of pre-training and supervised fine-tuning (SFT). Pre-training uses single-channel audio data, and SFT uses double-channel spoken dialogue data; together they train a textless speech language model.
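A textless speech language model operates on discrete audio units rather than text. The snippet below is a minimal sketch of one common way to obtain such units: nearest-centroid quantization of frame-level feature vectors. The `quantize_frames` helper, the toy features and the codebook are all hypothetical, and Parrot's own tokenizer (`utils/tokenization.py`) may work differently.

```python
from typing import List, Sequence


def quantize_frames(frames: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame-level feature vector to the index of its nearest centroid.

    Hypothetical stand-in for an audio tokenizer: distance is squared
    Euclidean, and ties go to the lower centroid index.
    """
    if not codebook:
        raise ValueError("codebook must contain at least one centroid")
    tokens: List[int] = []
    for frame in frames:
        best_idx, best_dist = 0, float("inf")
        for idx, centroid in enumerate(codebook):
            dist = sum((f - c) ** 2 for f, c in zip(frame, centroid))
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        tokens.append(best_idx)
    return tokens


if __name__ == "__main__":
    # Toy 2-D "features" and a 3-entry codebook, for illustration only.
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9]]
    print(quantize_frames(frames, codebook))  # [0, 0, 1, 2, 2]
```

The resulting integer sequence is what a next-token-prediction objective would consume during pre-training.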


Key features:

  • Pre-training: Transform single-channel audio into discrete tokens for next-token prediction
  • SFT: Novel "next-token-pair prediction" objective on double-channel dialogue data for natural conversation comprehension
  • Result: More natural and fluid spoken interactions compared to baseline approaches
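The two training objectives can be illustrated with a small data-construction sketch. `next_token_examples`, `next_token_pair_examples` and `pair_nll` are hypothetical names, not Parrot's code. The sketch assumes both channels are already tokenized, time-aligned and of equal length, and it assumes the pair loss is the sum of two per-channel negative log-likelihoods.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def next_token_examples(tokens: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Pre-training: each prefix of a single-channel stream predicts the next token."""
    return [(list(tokens[:t]), tokens[t]) for t in range(1, len(tokens))]


def next_token_pair_examples(
    channel_a: Sequence[int], channel_b: Sequence[int]
) -> List[Tuple[List[Pair], Pair]]:
    """SFT (sketch): zip the two channels and predict the next (a, b) pair."""
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must be time-aligned and of equal length")
    pairs = list(zip(channel_a, channel_b))
    return [(pairs[:t], pairs[t]) for t in range(1, len(pairs))]


def pair_nll(probs_a: Sequence[float], probs_b: Sequence[float], target: Pair) -> float:
    """Assumed pair loss: sum of the per-channel negative log-likelihoods."""
    a, b = target
    return -math.log(probs_a[a]) - math.log(probs_b[b])


if __name__ == "__main__":
    print(next_token_examples([1, 2, 3]))
    print(next_token_pair_examples([7, 7, 3], [0, 4, 4]))
    print(pair_nll([0.5, 0.5], [0.25, 0.75], (0, 1)))
```

For example, `next_token_pair_examples([7, 7, 3], [0, 4, 4])` yields `([(7, 0)], (7, 4))` and `([(7, 0), (7, 4)], (3, 4))`. Because both channels advance together, each step carries what both speakers produce at that moment, including overlapping speech.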


Installation

git clone https://github.com/anonymous/parrot.git
cd parrot
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt

Usage

  1. Prepare audio data for pre-training and fine-tuning
  2. Pre-train: python scripts/pretrain.py --input_data path/to/single_channel_data
  3. Fine-tune: python scripts/finetune.py --input_data path/to/double_channel_data
  4. Inference: python scripts/inference.py --input_audio path/to/input.wav

For detailed instructions, see docs/usage_guide.md.
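Step 1 does not specify the audio format. If the double-channel dialogue data is stored as 16-bit two-channel WAV files with one speaker per channel (an assumption here), the standard-library sketch below checks that layout and splits the channels. `write_wav` and `split_stereo_wav` are hypothetical helpers, not the API of Parrot's `utils/audio_processing.py`.

```python
import os
import sys
import tempfile
import wave
from array import array
from typing import Sequence, Tuple


def write_wav(path: str, samples: Sequence[int], rate: int, channels: int) -> None:
    """Write interleaved 16-bit PCM samples to a WAV file (demo/test helper)."""
    data = array("h", samples)
    if sys.byteorder == "big":  # WAV stores samples little-endian
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


def split_stereo_wav(path: str) -> Tuple[array, array, int]:
    """Return (channel_0, channel_1, sample_rate) for a 16-bit two-channel WAV.

    Assumes one speaker per channel; raises ValueError for any other layout.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 2:
            raise ValueError(f"expected 2 channels, got {wf.getnchannels()}")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()
    # Frames are interleaved: even indices are channel 0, odd are channel 1.
    return samples[0::2], samples[1::2], rate


if __name__ == "__main__":
    fd, tmp = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        write_wav(tmp, [1, -1, 2, -2, 3, -3], rate=16000, channels=2)
        left, right, rate = split_stereo_wav(tmp)
        print(list(left), list(right), rate)  # [1, 2, 3] [-1, -2, -3] 16000
    finally:
        os.remove(tmp)
```

Single-channel pre-training files can be checked the same way by expecting `getnchannels() == 1`.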

Project Structure

parrot/
├── data/
│   ├── single_channel/
│   └── double_channel/
├── models/
│   ├── dual_llama.py
│   └── dual_phi.py
├── utils/
│   ├── audio_processing.py
│   └── tokenization.py
├── scripts/
│   ├── pretrain.py
│   ├── finetune.py
│   └── inference.py
├── docs/
│   └── usage_guide.md
├── tests/
│   └── test_models.py
├── requirements.txt
├── setup.py
└── README.md

License

This project is licensed under the MIT License. See the LICENSE file for details.
