Add studio speakers to open source XTTS! #3405

Merged
merged 18 commits into from
Dec 12, 2023
Changes from 15 commits

53 changes: 0 additions & 53 deletions .github/workflows/api_tests.yml

This file was deleted.

52 changes: 0 additions & 52 deletions .github/workflows/zoo_tests_tortoise.yml

This file was deleted.

3 changes: 0 additions & 3 deletions Makefile
@@ -35,9 +35,6 @@ test_zoo: ## run zoo tests.
inference_tests: ## run inference tests.
nose2 -F -v -B --with-coverage --coverage TTS tests.inference_tests

api_tests: ## run api tests.
nose2 -F -v -B --with-coverage --coverage TTS tests.api_tests

data_tests: ## run data tests.
nose2 -F -v -B --with-coverage --coverage TTS tests.data_tests

31 changes: 0 additions & 31 deletions README.md
@@ -7,8 +7,6 @@
- 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://tts.readthedocs.io/en/dev/models/bark.html)
- 📣 You can use [~1100 Fairseq models](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS.
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://tts.readthedocs.io/en/dev/models/tortoise.html)
- 📣 **Coqui Studio API** is landed on 🐸TTS. - [Example](https://github.com/coqui-ai/TTS/blob/dev/README.md#-python-api)
- 📣 [**Coqui Studio API**](https://docs.coqui.ai/docs) is live.
- 📣 Voice generation with prompts - **Prompt to Voice** - is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin)!! - [Blog Post](https://coqui.ai/blog/tts/prompt-to-voice)
- 📣 Voice generation with fusion - **Voice fusion** - is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin).
- 📣 Voice cloning is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin).
@@ -253,29 +251,6 @@ tts.tts_with_vc_to_file(
)
```

#### Example using [🐸Coqui Studio](https://coqui.ai) voices.
You access all of your cloned voices and built-in speakers in [🐸Coqui Studio](https://coqui.ai).
To do this, you'll need an API token, which you can obtain from the [account page](https://coqui.ai/account).
After obtaining the API token, you'll need to configure the COQUI_STUDIO_TOKEN environment variable.

Once you have a valid API token in place, the studio speakers will be displayed as distinct models within the list.
These models will follow the naming convention `coqui_studio/en/<studio_speaker_name>/coqui_studio`

```python
# XTTS model
models = TTS(cs_api_model="XTTS").list_models()
# Init TTS with the target studio speaker
tts = TTS(model_name="coqui_studio/en/Torcull Diarmuid/coqui_studio", progress_bar=False)
# Run TTS
tts.tts_to_file(text="This is a test.", language="en", file_path=OUTPUT_PATH)

# V1 model
models = TTS(cs_api_model="V1").list_models()
# Run TTS with emotion and speed control
# Emotion control only works with V1 model
tts.tts_to_file(text="This is a test.", file_path=OUTPUT_PATH, emotion="Happy", speed=1.5)
```

#### Example text to speech using **Fairseq models in ~1100 languages** 🤯.
For Fairseq models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`.
You can find the language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
@@ -351,12 +326,6 @@ If you don't specify any models, then it uses LJSpeech based English model.
$ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
```

- Run TTS and define speed factor to use for 🐸Coqui Studio models, between 0.0 and 2.0:

```
$ tts --text "Text for TTS" --model_name "coqui_studio/<language>/<dataset>/<model_name>" --speed 1.2 --out_path output/path/speech.wav
```

- Run a TTS model with its default vocoder model:

```
75 changes: 38 additions & 37 deletions TTS/.models.json
@@ -3,12 +3,13 @@
"multilingual": {
"multi-dataset": {
"xtts_v2": {
- "description": "XTTS-v2.0.2 by Coqui with 16 languages.",
+ "description": "XTTS-v2.0.3 by Coqui with 17 languages.",
"hf_url": [
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/model.pth",
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/config.json",
"https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/vocab.json",
- "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/hash.md5"
+ "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/hash.md5",
+ "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/speakers_xtts.pth"
],
"model_hash": "10f92b55c512af7a8d39d650547a15a7",
"default_vocoder": null,
@@ -39,22 +40,22 @@
"commit": "e9a1953e",
"license": "CC BY-NC-ND 4.0",
"contact": "egolge@coqui.ai"
- },
- "bark": {
- "description": "🐶 Bark TTS model released by suno-ai. You can find the original implementation in https://github.com/suno-ai/bark.",
- "hf_url": [
- "https://coqui.gateway.scarf.sh/hf/bark/coarse_2.pt",
- "https://coqui.gateway.scarf.sh/hf/bark/fine_2.pt",
- "https://app.coqui.ai/tts_model/text_2.pt",
- "https://coqui.gateway.scarf.sh/hf/bark/config.json",
- "https://coqui.gateway.scarf.sh/hf/bark/hubert.pt",
- "https://coqui.gateway.scarf.sh/hf/bark/tokenizer.pth"
- ],
- "default_vocoder": null,
- "commit": "e9a1953e",
- "license": "MIT",
- "contact": "https://www.suno.ai/"
}
+ // "bark": {
+ // "description": "🐶 Bark TTS model released by suno-ai. You can find the original implementation in https://github.com/suno-ai/bark.",
+ // "hf_url": [
+ // "https://coqui.gateway.scarf.sh/hf/bark/coarse_2.pt",
+ // "https://coqui.gateway.scarf.sh/hf/bark/fine_2.pt",
+ // "https://app.coqui.ai/tts_model/text_2.pt",
+ // "https://coqui.gateway.scarf.sh/hf/bark/config.json",
+ // "https://coqui.gateway.scarf.sh/hf/bark/hubert.pt",
+ // "https://coqui.gateway.scarf.sh/hf/bark/tokenizer.pth"
+ // ],
+ // "default_vocoder": null,
+ // "commit": "e9a1953e",
+ // "license": "MIT",
+ // "contact": "https://www.suno.ai/"
+ // }
}
},
"bg": {
@@ -266,26 +267,26 @@
"contact": "adamfroghyar@gmail.com"
}
},
- "multi-dataset": {
- "tortoise-v2": {
- "description": "Tortoise tts model https://github.com/neonbjb/tortoise-tts",
- "github_rls_url": [
- "https://app.coqui.ai/tts_model/autoregressive.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/clvp2.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/cvvp.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/diffusion_decoder.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/rlg_auto.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/rlg_diffuser.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/vocoder.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/mel_norms.pth",
- "https://coqui.gateway.scarf.sh/v0.14.1_models/config.json"
- ],
- "commit": "c1875f6",
- "default_vocoder": null,
- "author": "@neonbjb - James Betker, @manmay-nakhashi Manmay Nakhashi",
- "license": "apache 2.0"
- }
- },
+ // "multi-dataset": {
+ // "tortoise-v2": {
+ // "description": "Tortoise tts model https://github.com/neonbjb/tortoise-tts",
+ // "github_rls_url": [
+ // "https://app.coqui.ai/tts_model/autoregressive.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/clvp2.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/cvvp.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/diffusion_decoder.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/rlg_auto.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/rlg_diffuser.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/vocoder.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/mel_norms.pth",
+ // "https://coqui.gateway.scarf.sh/v0.14.1_models/config.json"
+ // ],
+ // "commit": "c1875f6",
+ // "default_vocoder": null,
+ // "author": "@neonbjb - James Betker, @manmay-nakhashi Manmay Nakhashi",
+ // "license": "apache 2.0"
+ // }
+ // },
"jenny": {
"jenny": {
"description": "VITS model trained with Jenny(Dioco) dataset. Named as Jenny as demanded by the license. Original URL for the model https://www.kaggle.com/datasets/noml4u/tts-models--en--jenny-dioco--vits",
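
Review note on the `TTS/.models.json` change: the `bark` and `tortoise-v2` entries are disabled with `//` line comments, which strict JSON does not allow, so whatever reads this file has to drop those lines before parsing. The diff does not show how the project does that. The following is only a minimal sketch under that assumption: the helper name `load_json_with_line_comments` is hypothetical, the sample data is abbreviated from the hunk above, and only whole-line comments are handled (a `//` inside a URL string is left untouched because the line does not start with it).

```python
import json
import re


def load_json_with_line_comments(text: str) -> dict:
    """Parse JSON text after dropping lines that consist only of a `//` comment.

    Hypothetical helper for illustration; not taken from the TTS code base.
    """
    kept = [line for line in text.splitlines() if not re.match(r"\s*//", line)]
    return json.loads("\n".join(kept))


# Abbreviated stand-in for the edited .models.json (assumed shape).
sample = """
{
  "multilingual": {
    "multi-dataset": {
      "xtts_v2": {
        "description": "XTTS-v2.0.3 by Coqui with 17 languages.",
        "hf_url": ["https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/speakers_xtts.pth"]
      }
      // "bark": {
      //   "license": "MIT"
      // }
    }
  }
}
"""

models = load_json_with_line_comments(sample)
print(sorted(models["multilingual"]["multi-dataset"]))
```

Note that commenting out the last entry of an object also requires removing the comma after the preceding entry, which is why the hunk changes `},` to `}` before the commented `bark` block.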