Add TTS with VITS #360

Merged
merged 6 commits into from
Oct 13, 2023

csukuangfj
Collaborator

No description provided.

@csukuangfj
Collaborator Author

Usage of this PR

Build sherpa-onnx with this PR

mkdir build
cd build
cmake ..
make -j 
(py38) fangjuns-MacBook-Pro:build fangjun$ ls -lh bin/sherpa-onnx-offline-tts
-rwxr-xr-x  1 fangjun  staff    36K Oct 13 19:17 bin/sherpa-onnx-offline-tts

Download the model files

wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt
wget https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt

Run the model

./bin/sherpa-onnx-offline-tts \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  'Success is not final, failure is not fatal, it is the courage to continue that counts!'

It will generate a file named t.pcm:

$ ls -lh t.pcm
-rw-r--r--  1 fangjun  staff   489K Oct 13 19:16 t.pcm

The file contains raw 32-bit floating-point mono samples at 22050 Hz. Please use the following command to convert it to a wave file:

 sox -t raw -r 22050 -b 32 -e floating-point -c 1 ./t.pcm ./1.wav

And you will see

$ ls -lh 1.wav
-rw-r--r--  1 fangjun  staff   489K Oct 13 19:16 1.wav
$ soxi 1.wav

Input File     : '1.wav'
Channels       : 1
Sample Rate    : 22050
Precision      : 25-bit
Duration       : 00:00:05.68 = 125184 samples ~ 425.796 CDDA sectors
File Size      : 501k
Bit Rate       : 706k
Sample Encoding: 32-bit Floating Point PCM
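
The numbers reported by soxi can be cross-checked with a little arithmetic. The sketch below only uses values that appear in the output above (sample rate, sample count, 32-bit samples); the 75 sectors per second figure is the standard CDDA rate, and the variable names are just for illustration.

```python
# Cross-check the soxi output using only the numbers printed above.
SAMPLE_RATE = 22050     # Hz, from the sox command and soxi output
NUM_SAMPLES = 125184    # from the soxi "Duration" line
BYTES_PER_SAMPLE = 4    # 32-bit floating point, mono

duration_s = NUM_SAMPLES / SAMPLE_RATE
payload_bytes = NUM_SAMPLES * BYTES_PER_SAMPLE
bit_rate = SAMPLE_RATE * BYTES_PER_SAMPLE * 8
cdda_sectors = duration_s * 75  # CDDA uses 75 sectors per second

print(f"duration : {duration_s:.2f} s")
print(f"payload  : {payload_bytes} bytes = {payload_bytes / 1024:.0f} KiB")
print(f"bit rate : {bit_rate} bit/s")
print(f"sectors  : {cdda_sectors:.3f}")
```

The payload works out to exactly 489 KiB, which matches the size of t.pcm, and 705600 bit/s is what soxi rounds to 706k.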

I have converted 1.wav to 1.mov and posted it here so you can listen to it.

1.mov
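
If sox is not installed, the conversion step above can be approximated in Python. This is a minimal sketch and not part of this PR: the function name pcm_f32_to_wav is made up for illustration, and it assumes t.pcm holds mono float32 samples in the machine's native byte order with values roughly in [-1, 1]. Unlike the sox command, it writes 16-bit integer PCM, because the standard wave module does not write floating-point WAV files.

```python
import array
import wave


def pcm_f32_to_wav(pcm_path, wav_path, sample_rate=22050):
    """Convert raw mono float32 PCM to a 16-bit PCM WAV file (hypothetical helper)."""
    with open(pcm_path, "rb") as f:
        raw = f.read()

    # Interpret the bytes as 4-byte floats in native byte order (assumption).
    samples = array.array("f")
    samples.frombytes(raw[: len(raw) - len(raw) % 4])  # ignore a trailing partial sample

    # Clip to [-1, 1] and scale to the int16 range.
    ints = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)           # mono, as in the sox command
        w.setsampwidth(2)           # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(ints.tobytes())
    return len(ints)


if __name__ == "__main__":
    import os

    if os.path.exists("t.pcm"):
        n = pcm_f32_to_wav("t.pcm", "1.wav")
        print(f"wrote {n} samples to 1.wav")
```

The resulting file is about half the size of the float WAV produced by sox, at the cost of 16-bit quantization.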

@csukuangfj
Collaborator Author

TODOs

  • Support writing wav to file
  • Provide Python API
  • Provide C API
  • Provide Go API
  • Provide C# API
  • Provide Android demos
  • Provide iOS demos
  • Add documentation

@csukuangfj csukuangfj changed the title WIP: Begin to add TTS with VITS Add TTS with VITS Oct 13, 2023
@csukuangfj csukuangfj merged commit 536d580 into k2-fsa:master Oct 13, 2023
133 of 144 checks passed
@csukuangfj csukuangfj deleted the vits branch October 13, 2023 11:30