Replies: 42 comments 36 replies
-
Any ELI5 tutorial/doc for creating a dataset for your own language/dialect?
-
Not sure if it is ELI5, but there is this link: https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset Also, @thorstenMueller has created a TTS dataset from the get-go, so he might have valuable comments if you have specific questions.
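To make the wiki's advice a bit more concrete, here is a minimal sketch (my own, not from the wiki) for sanity-checking clip lengths and total hours of a candidate dataset before training; the wavs/ folder name is just an assumption about your layout.

```python
# Minimal sketch: check clip lengths and total hours of a candidate TTS dataset.
# Assumes the recordings are WAV clips in a "wavs/" folder (adjust to your layout).
from pathlib import Path
import statistics
import soundfile as sf  # pip install soundfile

durations = [sf.info(str(p)).duration for p in Path("wavs").glob("*.wav")]

print(f"clips: {len(durations)}")
print(f"total hours: {sum(durations) / 3600:.2f}")
print(f"mean clip length: {statistics.mean(durations):.2f}s")
print(f"min/max clip length: {min(durations):.2f}s / {max(durations):.2f}s")
```

Very short or very long outliers are usually worth trimming or re-recording before training.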
-
Feel free to ask specific questions. I'd be happy to share my experiences recording a new dataset here.
-
Hi @erogol, thank you for the amazing work, from Mozilla TTS to coqui-ai. Mozilla TTS seemed perfect to me given its wider community reach, but I hope this grows even wider and faster. I am planning to share my models for Spanish and Italian (Taco2 at 600k steps + WaveRNN). Audio quality seems good, but I need to train a bit more and also ask the dataset providers whether it would be okay to make the models public. Let me know if I can contribute in any way; I have Google Colab Pro resources lying around free.
-
@Sadam1195 thx for the amazing work 🚀🚀. I really hope we can include your models, of course with the right attribution going to you. Just waiting for your signal. For general contribution, this is a nice place to start: https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md If you'd just like to train models, let me know and we can also find new datasets to attack.
-
Nonetheless, I would love to train models on new datasets (if you have any), especially in languages for which TTS models haven't been made public yet.
-
Hello, I've just started training a public-domain Japanese dataset, https://github.com/kaiidams/Kokoro-Speech-Dataset, with Tacotron 2 on the latest master of https://github.com/mozilla/TTS on Google Colab Free. After 19K steps, I can hear what the voice says, although it is metallic. To proceed, I'd like to know which branch and repo you recommend I use; https://github.com/erogol/TTS_recipes seems a bit old.
-
Please use https://github.com/coqui-ai/TTS instead of https://github.com/mozilla/TTS, and use the latest main branch. @kaiidams
-
I trained Tacotron 2 for 130K steps with this code, https://github.com/kaiidams/TTS/tree/kaiidams/kokoro, which was forked from the latest main. The input to the model is romanized Japanese text, so it needs some dependencies, like MeCab, to convert ordinary text into that form.
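For anyone curious what that romanization step looks like, here is an illustrative sketch using pykakasi rather than MeCab; the actual preprocessing in the kaiidams/kokoro fork may differ.

```python
# Illustrative only: the kaiidams/kokoro recipe uses MeCab-based tooling, but
# pykakasi gives a rough idea of the Japanese-to-romaji preprocessing step.
import pykakasi  # pip install pykakasi

kks = pykakasi.kakasi()

def romanize(text: str) -> str:
    # convert() returns one dict per token, with 'orig', 'hira', 'kana', 'hepburn' keys
    return " ".join(token["hepburn"] for token in kks.convert(text))

print(romanize("音声合成は楽しいです"))  # roughly "onsei gousei ha tanoshii desu"
```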
-
@kaiidams if you can send a PR for the text conversion (something similar to the Chinese API we have), together with the model, that would be a great contribution.
-
Any reason why this and this aren't in the readme?
-
Hi @zubairahmed-ai.
-
@thorstenMueller Perfect timing, thank you
-
Oh, I just realized this talk happened during the recent Google I/O and I somehow didn't catch it while watching the other videos :)
-
@thorstenMueller Thanks so much for the great video explaining your process in detail, with some tips. I'll make sure I follow that. Do you plan to try other models besides Tacotron 2, like AlignTTS?
-
Hi all, I want to build voice conversion using a cross-language technique. For this purpose, I used the Voice Conversion Challenge 2020 architecture, but after 6 weeks of working on it I have no acceptable results. Now I have decided to use the YourTTS voice-conversion architecture and want to train it on an English-Urdu dataset, but I don't know where to start. Can anyone guide me in this regard? Any help is appreciated.
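Not an official answer, but one possible starting point: recent Coqui TTS releases ship the pretrained multilingual YourTTS checkpoint and a voice-conversion helper in the Python API. A sketch, with the file paths as placeholders for your own recordings:

```python
# Rough starting point: zero-shot voice conversion with the pretrained YourTTS
# checkpoint shipped with Coqui TTS. The wav paths are placeholders.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", gpu=False)

# Convert the voice in source.wav so it sounds like the speaker in target.wav
tts.voice_conversion_to_file(
    source_wav="source_urdu_utterance.wav",
    target_wav="target_english_speaker.wav",
    file_path="converted.wav",
)
```

Fine-tuning on your own English-Urdu data would then follow the usual recipe and config workflow in the repo.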
-
Hey, I would like to share my Nepali model trained on the OpenSLR dataset. Demo: vitsexamplex2_1015k.webm. Here is the drive link: https://drive.google.com/drive/folders/1Jwr7ITDA4hFKLMSVXUj3A8nQlpdxT6ql?usp=sharing Thanks :)
-
Check out the voice models for Mimic 3 from Mycroft AI.
-
@Tarek-Hasan Looking at the released models, they pretty much used our code, but I don't see the released model binaries.
-
Sorry, I somehow included the wrong link. Here's the voice models link.
-
I would like to present a new VITS multispeaker model trained by @GerrySant for Catalan within the framework of @projecte-aina. It is trained from scratch on 101,460 utterances from 257 speakers, approximately 138 hours of speech. We used three datasets: Festcat and Google Catalan TTS (both TTS datasets) and part of Common Voice 8. It is trained with TTS v0.8.0. Here are two examples of a male and a female voice: f_occ_de.mp4 and m_occ_88.mp4. The model is uploaded to Hugging Face with its own space to generate voices. We would also like the models to be accessible as part of the Coqui models.
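For readers who want to try a multispeaker checkpoint like this locally, a sketch of loading it with the Python API; the file names and speaker index are assumptions, so check the Hugging Face repo for the real ones.

```python
# Sketch: load a multispeaker VITS checkpoint from local files and synthesize.
# "model.pth" / "config.json" and the speaker choice are assumptions; use the
# files and speaker ids published in the Hugging Face repo.
from TTS.api import TTS

tts = TTS(model_path="model.pth", config_path="config.json")

print(tts.speakers)  # list the speaker ids baked into the checkpoint

tts.tts_to_file(
    text="Bon dia, com va tot?",
    speaker=tts.speakers[0],
    file_path="catalan_sample.wav",
)
```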
-
@gullabi thanks for sharing. I'll add the model asap.
-
I trained a glow_tts model for the Persian language.
-
@erogol To the best of our knowledge (based on extensive Google searching, research, and human validation), the Bangla VITS TTS (text-to-speech) system that we trained and use for reading various Bangla tafsir/hadith is the highest-performing, state-of-the-art Bangla neural voice-cloning system released publicly for free to date (Thursday, December 29, 2022), and it beats past TTS systems like gTTS, silero-tts, and indic-tts by a large margin in terms of quality.
-
Started a project for creating collections of voices: https://huggingface.co/voices/ The first addition is https://huggingface.co/voices/VCTK_European_English_Females (trained at 85,000 steps; it might go a bit further later). A nice feature is the ability to quickly preview each of the voices from the wav samples / README markdown.

I plan to go through and include all of the VCTK voices. It would be kind of cool to do one big VCTK dataset with the 110 English voices, but it would be so many dataset files that it'd feel like a slog to get through. I also notice that the voices kind of pull each other in different directions (in the VCTK European English Females, for example, the original YourTTS male-en-2 has taken on properties of the female voices), so I figured that segmenting them by similarity might produce better outputs.

A couple of other community members on the Discord have been helping: one is setting up a space where people can test the voices, and another tested the model and helped me iron out some issues in the config.json (which I think may still need a little work).

These can be added to the model zoo or list if wanted. The VCTK set will be 100% from VCTK and YourTTS training data, so it should all be 'for sure' CC-BY-4.0. I may add some custom curated voices that sit more in the gray area to the voices space on Hugging Face, but it will be clearly labeled which ones were trained off CC-BY-4.0 data and which have manually sourced training data.
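If anyone wants to try one of these voices before they land in the model zoo, a hedged sketch of pulling the files straight from the Hugging Face repo; the file names below are my guess, so check the actual repo listing.

```python
# Hedged sketch: fetch a checkpoint from the Hugging Face "voices" org and load it.
# The file names below are guesses; check the repo's file listing for the real ones.
from huggingface_hub import hf_hub_download
from TTS.api import TTS

repo = "voices/VCTK_European_English_Females"
model_path = hf_hub_download(repo_id=repo, filename="model.pth")
config_path = hf_hub_download(repo_id=repo, filename="config.json")

tts = TTS(model_path=model_path, config_path=config_path)
tts.tts_to_file(
    text="Testing a shared VCTK voice.",
    speaker=tts.speakers[0],
    file_path="vctk_test.wav",
)
```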
-
Does anyone happen to have a Norwegian voice model?
-
I prepared a small dataset in a format similar to LJSpeech for Bulgarian and English. I can also add more audio files for an additional speaker if that would be helpful.
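In case it helps others preparing data the same way, a sketch of how an LJSpeech-style folder is typically consumed by the Coqui trainer; depending on the TTS version the config field is called formatter or name, and the folder layout below is an assumption.

```python
# Sketch: consume an LJSpeech-style dataset with Coqui TTS.
# Assumed layout:
#   my_dataset/
#     metadata.csv          # lines like: audio_id|raw text|normalized text
#     wavs/audio_id.wav
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples

dataset_config = BaseDatasetConfig(
    formatter="ljspeech",          # on older TTS versions this field is "name"
    meta_file_train="metadata.csv",
    path="my_dataset/",
)

train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
print(len(train_samples), len(eval_samples))
```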
-
Hi. I would like to share a model for the Belarusian language, trained on the Mozilla Common Voice dataset. License: CC-BY-SA 4.0
-
The repo below contains my models, demos, and training code for Persian TTS.
-
Can you share a good British English voice model? US English voices like 17: tts_models/en/ljspeech/vits--neon are great, but the British voices are not as good. As a point of reference, I propose voices from a dictionary.
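Not a definitive answer, but the released multi-speaker VCTK model includes many British-accented speakers, so it may be worth browsing. A sketch, with the speaker choice left to whatever ids the checkpoint exposes:

```python
# Sketch: browse the model catalogue, then try the multi-speaker VCTK VITS model,
# which contains many British-accented speakers (ids like "p225", "p226", ...).
from TTS.api import TTS

print(TTS().list_models())      # everything currently in the catalogue

tts = TTS(model_name="tts_models/en/vctk/vits")
print(tts.speakers)             # pick a speaker whose accent you like

tts.tts_to_file(
    text="A quick British English test sentence.",
    speaker=tts.speakers[0],
    file_path="vctk_british_test.wav",
)
```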
-
Please consider sharing your pre-trained models in any language (if the licences allow it).
We can include them in our model catalogue for public use, attributing them to you (website, company, etc.).
That would enable more people to experiment together and coordinate, instead of duplicating individual efforts toward similar goals.
It is also a chance to make your work more visible.
You can share in two ways;
Models are served under the .models.json file, and any model is available under the tts CLI or server end points. More details... (previously mozilla/TTS#395)
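To make the payoff of contributing concrete: once a model is merged into .models.json, anyone can pull it by name from the Python API, the tts CLI, or the demo server. A sketch, with the model name below as a placeholder rather than a real catalogue entry:

```python
# Once a contributed model is listed in .models.json it can be pulled by name.
# "tts_models/<lang>/<dataset>/<model>" is a placeholder, not a real entry.
from TTS.api import TTS

tts = TTS(model_name="tts_models/<lang>/<dataset>/<model>")
tts.tts_to_file(text="Hello from a community-contributed voice!", file_path="sample.wav")
```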