ai-audio-tools

Community list of open-source AI tools for audio, music, and speech applications

To contribute to the list

Edit the README and make a PR
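
Before opening a PR, it can help to check that a new entry is not already on the list. The following is a minimal sketch, not part of this repository: `find_duplicate_entries` is a hypothetical helper, and it assumes entries are plain `Name: description` lines as shown in this list (a Markdown README with links would need the link syntax stripped first).

```python
from collections import Counter


def find_duplicate_entries(readme_text: str) -> list[str]:
    """Return entry names (lowercased) that occur more than once.

    Hypothetical helper: assumes one 'Name: description' entry per line.
    """
    names = []
    for line in readme_text.splitlines():
        name, sep, description = line.partition(": ")
        # Section headings such as "Audio" or "Synthesis" have no description,
        # so they are skipped here.
        if sep and name.strip() and description.strip():
            names.append(name.strip().lower())
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative sample text, not the real README.
sample = """Synthesis

piper: A fast, local neural text to speech system
OpenVoice: Instant voice cloning by MyShell
Piper: A fast, local neural text to speech system
"""

duplicates = find_duplicate_entries(sample)
print(duplicates)
```

Names are compared case-insensitively so that `piper` and `Piper` count as the same entry.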

Audio

DAW

OpenVINO: OpenVINO AI effects for Audacity
TuneFlow: TuneFlow is a next-gen DAW that aims to boost music making productivity through the power of AI

Music

Analysis

Essentia: open-source C++ library for audio analysis and audio-based music information retrieval
Librosa: Python library for audio and music analysis
DDSP: DDSP is a library of differentiable versions of common DSP functions (such as synthesizers, waveshapers, and filters). This allows these interpretable elements to be used as part of a deep learning model, especially as the output layers for audio generation
MIDI-DDSP: MIDI-DDSP is a hierarchical audio generation model for synthesizing MIDI expanded from DDSP
TorchAudio: Data manipulation and transformation for audio signal processing, powered by PyTorch
nnAudio: Audio processing using PyTorch 1D convolutional networks
pyAudioAnalysis: Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
mutagen: a Python module to handle audio metadata
dejavu: Audio fingerprinting and recognition in Python
audiomentations: A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning
soundata: Python library for downloading, loading, and working with sound datasets
EfficientAT: This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings
AugLy: A data augmentations library for audio, image, text, and video
Pedalboard: A Python library for working with audio
TinyTag: a Python library for reading audio file metadata
OpenSmile: The Munich Open-Source Large-Scale Multimedia Feature Extractor
Madmom: Python audio and music signal processing library
Beets: a music library manager and MusicBrainz tagger
Mirdata: Python library for working with Music Information Retrieval datasets
Partitura: A Python package for handling modern staff notation of music
msaf: a Python package for the analysis of music structural segmentation algorithms
basic-pitch: A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
jams: A JSON Annotated Music Specification for Reproducible MIR Research

Production

Spleeter: Deezer source separation library including pretrained models
DeepAFx: Third-party audio effects plugins as differentiable layers within deep neural networks
matchering: open source audio matching and mastering
AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec
USS: This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data
FAST-RIR: This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given rectangular acoustic environment

Generation

StableAudio: Generative models for conditional audio generation
AudioCraft: a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.
Jukebox: A generative model for music
Magenta: symbolic music generation with diffusion models
TorchSynth: A GPU-optional modular synthesizer in PyTorch, 16200x faster than realtime, for audio ML researchers
audiobox: Audiobox is Meta’s new foundation research model for audio generation. It can generate voices and sound effects using a combination of voice inputs and natural language text prompts
Amphion: Amphion is a toolkit for Audio, Music, and Speech Generation
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
WaveGAN: Learn to synthesize raw audio with generative adversarial networks
RAVE: Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
AudioLDM: This toolbox aims to unify audio generation model evaluation for easier comparison
Make-An-Audio: a conditional diffusion probabilistic model capable of generating high fidelity audio efficiently from X modality
Diffusers: Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules
stable-audio-tools: Generative models for conditional audio generation
MidiTok: MIDI / symbolic music tokenizers for Deep Learning models
muspy: an open source Python library for symbolic music generation
MusicLM (https://google-research.github.io/seanet/musiclm/examples/): a model generating high-fidelity music from text descriptions
riffusion: Stable diffusion for real-time music generation
muzic: Music Understanding and Generation with Artificial Intelligence
midi-lm: Generative modeling of MIDI files
UniAudio: The Open Source Code of UniAudio
MuseGAN: An AI for Music Generation

Speech

Recognition

Whisper: a multitasking model that can perform multilingual speech recognition, speech translation, and language identification
Deep Speech: Mozilla's open-source speech-to-text engine
Kaldi ASR: open-source speech recognition toolkit written in C++
PaddleSpeech: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting
NeMo: a framework for generative AI
julius: Open-Source Large Vocabulary Continuous Speech Recognition Engine
speechbrain: an open-source and all-in-one conversational AI toolkit based on PyTorch
pocketsphinx: A small speech recognizer
FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models
NeuralSpeech: a research project at Microsoft Research Asia, which focuses on neural network based speech processing, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), spatial audio synthesis, video dubbing, etc
espnet: End-to-End Speech Processing Toolkit

Production

Descript audio codec: State-of-the-art audio codec with 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio
Descript audio tools: Object-oriented handling of audio data, with GPU-powered augmentations, and more
Meta encodec: State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio
audino: Open source audio annotation tool for humans
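
To put the 90x compression factor quoted for Descript audio codec in context, the short calculation below converts it to an approximate bitrate. This is only a back-of-the-envelope sketch: the uncompressed baseline of 16-bit mono PCM at 44.1 kHz is an assumption made here for illustration, not something stated in this list, and `pcm_bitrate_bps` is a hypothetical helper.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int = 16, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


COMPRESSION_FACTOR = 90  # figure quoted in the list entry

# Assumed baseline: 44.1 kHz, 16-bit, mono PCM.
pcm_bps = pcm_bitrate_bps(44_100)
compressed_kbps = pcm_bps / COMPRESSION_FACTOR / 1000

print(f"Uncompressed: {pcm_bps / 1000:.1f} kbps")
print(f"At {COMPRESSION_FACTOR}x compression: {compressed_kbps:.2f} kbps")
```

Under these assumptions, 705.6 kbps of PCM audio comes down to roughly 7.84 kbps after compression.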

Synthesis

Coqui TTS: a deep learning toolkit for Text-to-Speech, battle-tested in research and production
DiffSinger: singing voice synthesis via shallow diffusion mechanism
Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
wavenet: A TensorFlow implementation of DeepMind's WaveNet paper
FastSpeech2: An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MelGAN: Unofficial PyTorch implementation of MelGAN vocoder
hifi-gan: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
elevenlabs-python: The official Python API for ElevenLabs Text to Speech
tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
lyrebird: Simple and powerful voice changer for Linux, written with Python & GTK
piper: A fast, local neural text to speech system
tts-generation-webui: TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet)
GPT-SoVITS: few-shot voice cloning; as little as 1 minute of voice data can be used to train a good TTS model
metavoice-src: Foundational model for human-like, expressive TTS
Retrieval-based-Voice-Conversion-WebUI: a voice conversion (VC) framework; 10 minutes of voice data or less can be used to train a good VC model
midi2voice: Singing synthesis from MIDI file
OpenVoice: Instant voice cloning by MyShell
