Please support the original RVC. Without it, this inference would not be possible.
- Support V1 & V2 Model ✅
- Youtube Audio Downloader ✅
- Demucs (Voice Splitter) [Internet required for downloading model] ✅
- TTS Support ✅
- Microphone Support ✅
- HuggingFace Spaces Inference [for CPU Tier only] ✅
  - Remove Youtube & Input Path ✅
  - Remove Crepe support due to its GPU requirement ✅
Install ffmpeg before running these commands.
- Windows
  - Run `start.bat` to download the model and dependencies.
  - Run `run.bat` to run the inference.
- MacOS & Linux
  - On MacOS, install wget before running the script.
  - Run `start.sh` to download the model and dependencies.
  - Run `run.sh` to run the inference.
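
Since the scripts above depend on ffmpeg (and on wget for MacOS), a quick check that both are on `PATH` can save a failed run. The following is only a minimal sketch; the function names `required_tools` and `missing_tools` are made up for illustration and are not part of this repository.

```python
import shutil
import sys


def required_tools(platform=sys.platform):
    """Return the external tools the README asks for on this platform."""
    # ffmpeg is needed on every OS; wget is only requested for MacOS.
    tools = ["ffmpeg"]
    if platform == "darwin":
        tools.append("wget")
    return tools


def missing_tools(required):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(required_tools())
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If anything is reported missing, install it with your system's package manager before running the start script.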
- Install PyTorch
- CPU only (any OS)
pip install torch torchvision torchaudio
- Nvidia (CUDA used)
# For Windows (flash-attention v2 is not supported on Windows, see issue: https://github.com/Dao-AILab/flash-attention/issues/345#issuecomment-1747473481)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
# Other (Linux, etc.)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
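
After installing, it may be worth confirming which PyTorch build was picked up and whether CUDA is visible to it. This is a small sketch rather than part of the project; `torch_status` is a hypothetical helper name.

```python
def torch_status():
    """Describe the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.cuda.is_available() is False on CPU-only builds or without a usable GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__}, device: {device}"


if __name__ == "__main__":
    print(torch_status())
```

A result ending in `device: cpu` after the CUDA install usually means the CPU wheel was installed or the GPU driver is not visible.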
- Install ffmpeg
- Install Dependencies
pip install -r requirements.txt
- Download the pre-trained models
# Hubert Model
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/hubert_base.pt
# Save it to /assets/hubert/hubert_base.pt
# RMVPE (rmvpe pitch extraction, Optional)
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.pt
# Save it to /assets/rvmpe/rmvpe.pt
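
The two files listed above can also be fetched with a short script instead of through the browser. The sketch below rests on two assumptions: that the save paths are relative to the repository root, and that replacing `/blob/` with `/resolve/` in a Hugging Face page URL yields the direct file link. The names `MODELS`, `direct_url`, and `download_models` are illustrative only.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Page URLs and save paths copied from the list above
# (paths assumed to be relative to the repository root).
MODELS = {
    "https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/hubert_base.pt": "assets/hubert/hubert_base.pt",
    "https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.pt": "assets/rvmpe/rmvpe.pt",
}


def direct_url(page_url):
    """Turn a Hugging Face file page URL into a direct download URL."""
    # Assumption: "/resolve/" serves the raw file where "/blob/" serves the web page.
    return page_url.replace("/blob/", "/resolve/", 1)


def download_models(root="."):
    """Download each model into `root`, skipping files that already exist."""
    for page_url, rel_path in MODELS.items():
        target = Path(root) / rel_path
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(direct_url(page_url), str(target))
```

Calling `download_models()` from the repository root fetches both files. Drop the rmvpe entry from `MODELS` if you do not need the optional pitch extractor.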
- Run WebUI
python app.py