Fix typos #3368

Merged (6 commits) on Dec 6, 2023
2 changes: 1 addition & 1 deletion docs/source/configuration.md
@@ -56,4 +56,4 @@ ModelConfig()

In the example above, ```ModelConfig()``` is the final configuration that the model receives and it has all the fields necessary for the model.

-We host pre-defined model configurations under ```TTS/<model_class>/configs/```.Although we recommend a unified config class, you can decompose it as you like as for your custom models as long as all the fields for the trainer, model, and inference APIs are provided.
+We host pre-defined model configurations under ```TTS/<model_class>/configs/```. Although we recommend a unified config class, you can decompose it as you like as for your custom models as long as all the fields for the trainer, model, and inference APIs are provided.
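
As a concrete example of such a unified configuration, one of the pre-defined config classes can be imported, adjusted and serialized directly; a minimal sketch, using `GlowTTSConfig` as an assumed example:

```python
# A minimal sketch, assuming GlowTTSConfig as one of the pre-defined configs.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

config = GlowTTSConfig(
    batch_size=32,             # trainer field
    num_loader_workers=4,      # trainer field
    run_eval=True,             # trainer field
    phoneme_language="en-us",  # model / inference field
)

# Configs are serializable, so they can be stored next to checkpoints.
print(config.to_json())
```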
4 changes: 2 additions & 2 deletions docs/source/finetuning.md
@@ -21,7 +21,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
Fine-tuning comes to the rescue in this case. You can take one of our pre-trained models and fine-tune it on your own
speech dataset and achieve reasonable results with only a couple of hours of data.

-However, note that, fine-tuning does not ensure great results. The model performance is still depends on the
+However, note that, fine-tuning does not ensure great results. The model performance still depends on the
{ref}`dataset quality <what_makes_a_good_dataset>` and the hyper-parameters you choose for fine-tuning. Therefore,
it still takes a bit of tinkering.

@@ -41,7 +41,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
tts --list_models
```

-The command above lists the the models in a naming format as ```<model_type>/<language>/<dataset>/<model_name>```.
+The command above lists the models in a naming format as ```<model_type>/<language>/<dataset>/<model_name>```.

Or you can manually check the `.model.json` file in the project directory.
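
A name from that listing can then be used directly with the Python API to download and run the corresponding model; a minimal sketch, assuming `tts_models/en/ljspeech/tacotron2-DDC` is one of the listed entries:

```python
# A minimal sketch; the model name below is an example entry following the
# <model_type>/<language>/<dataset>/<model_name> scheme.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="This voice will be fine-tuned on my dataset.", file_path="sample.wav")
```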

6 changes: 3 additions & 3 deletions docs/source/formatting_your_dataset.md
@@ -7,7 +7,7 @@ If you have a single audio file and you need to split it into clips, there are d

It is also important to use a lossless audio file format to prevent compression artifacts. We recommend using `wav` file format.

-Let's assume you created the audio clips and their transcription. You can collect all your clips under a folder. Let's call this folder `wavs`.
+Let's assume you created the audio clips and their transcription. You can collect all your clips in a folder. Let's call this folder `wavs`.

```
/wavs
@@ -17,7 +17,7 @@ Let's assume you created the audio clips and their transcription. You can collec
...
```

-You can either create separate transcription files for each clip or create a text file that maps each audio clip to its transcription. In this file, each column must be delimitered by a special character separating the audio file name, the transcription and the normalized transcription. And make sure that the delimiter is not used in the transcription text.
+You can either create separate transcription files for each clip or create a text file that maps each audio clip to its transcription. In this file, each column must be delimited by a special character separating the audio file name, the transcription and the normalized transcription. And make sure that the delimiter is not used in the transcription text.

We recommend the following format delimited by `|`. In the following example, `audio1`, `audio2` refer to files `audio1.wav`, `audio2.wav` etc.

@@ -55,7 +55,7 @@ For more info about dataset qualities and properties check our [post](https://gi

After you collect and format your dataset, you need to check two things. Whether you need a `formatter` and a `text_cleaner`. The `formatter` loads the text file (created above) as a list and the `text_cleaner` performs a sequence of text normalization operations that converts the raw text into the spoken representation (e.g. converting numbers to text, acronyms, and symbols to the spoken format).

-If you use a different dataset format then the LJSpeech or the other public datasets that 🐸TTS supports, then you need to write your own `formatter`.
+If you use a different dataset format than the LJSpeech or the other public datasets that 🐸TTS supports, then you need to write your own `formatter`.

If your dataset is in a new language or it needs special normalization steps, then you need a new `text_cleaner`.
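
To make the `formatter` idea concrete, here is a rough sketch of a custom formatter for the pipe-delimited layout described above; the returned field names (`text`, `audio_file`, `speaker_name`) are assumptions to be checked against the built-in formatters:

```python
import os

def my_formatter(root_path, meta_file, **kwargs):
    """Rough sketch: parse a pipe-delimited metadata file into sample dicts."""
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            clip_id, _text, normalized_text = line.strip().split("|")
            items.append(
                {
                    "text": normalized_text,  # what the model is trained on
                    "audio_file": os.path.join(root_path, "wavs", clip_id + ".wav"),
                    "speaker_name": "my_speaker",  # single-speaker placeholder
                }
            )
    return items
```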

4 changes: 2 additions & 2 deletions docs/source/implementing_a_new_language_frontend.md
@@ -2,11 +2,11 @@

- Language frontends are located under `TTS.tts.utils.text`
- Each special language has a separate folder.
-- Each folder containst all the utilities for processing the text input.
+- Each folder contains all the utilities for processing the text input.
- `TTS.tts.utils.text.phonemizers` contains the main phonemizer for a language. This is the class that uses the utilities
from the previous step and used to convert the text to phonemes or graphemes for the model.
- After you implement your phonemizer, you need to add it to the `TTS/tts/utils/text/phonemizers/__init__.py` to be able to
map the language code in the model config - `config.phoneme_language` - to the phonemizer class and initiate the phonemizer automatically.
- You should also add tests to `tests/text_tests` if you want to make a PR.

-We suggest you to check the available implementations as reference. Good luck!
+We suggest you to check the available implementations as reference. Good luck!
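
As a rough illustration of what a new frontend boils down to, the toy grapheme-to-phoneme class below maps characters to phoneme symbols; the real implementations derive from the shared base class under `TTS.tts.utils.text.phonemizers`, so the interface shown here is only an assumed outline:

```python
# Toy grapheme-to-phoneme frontend; the vowel mapping is purely illustrative.
_G2P = {"a": "ɑ", "e": "ɛ", "i": "i", "o": "o", "u": "u"}

class MyLanguagePhonemizer:
    language = "xx"  # assumed code to be mapped from config.phoneme_language

    def phonemize(self, text: str, separator: str = "|") -> str:
        phonemes = [_G2P.get(char, char) for char in text.lower()]
        return separator.join(phonemes)

if __name__ == "__main__":
    print(MyLanguagePhonemizer().phonemize("aeiou"))  # ɑ|ɛ|i|o|u
```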
4 changes: 2 additions & 2 deletions docs/source/implementing_a_new_model.md
@@ -145,7 +145,7 @@ class MyModel(BaseTTS):
Args:
ap (AudioProcessor): audio processor used at training.
batch (Dict): Model inputs used at the previous training step.
-outputs (Dict): Model outputs generated at the previoud training step.
+outputs (Dict): Model outputs generated at the previous training step.

Returns:
Tuple[Dict, np.ndarray]: training plots and output waveform.
@@ -183,7 +183,7 @@ class MyModel(BaseTTS):
...

def get_optimizer(self) -> Union["Optimizer", List["Optimizer"]]:
"""Setup an return optimizer or optimizers."""
"""Setup a return optimizer or optimizers."""
pass

def get_lr(self) -> Union[float, List[float]]:
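
To illustrate the `get_optimizer` hook above, here is a self-contained sketch of the pattern (build and return the optimizer from the model's own parameters), using a plain `torch.nn.Module` stand-in rather than a real 🐸TTS model:

```python
import torch

class TinyModel(torch.nn.Module):
    """Stand-in model showing the get_optimizer() pattern."""

    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.lr = lr  # a real model would read this from its config
        self.layer = torch.nn.Linear(80, 80)

    def get_optimizer(self) -> torch.optim.Optimizer:
        # Return a single optimizer; GAN-style models may return a list instead.
        return torch.optim.Adam(self.parameters(), lr=self.lr)

optimizer = TinyModel().get_optimizer()
```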
6 changes: 3 additions & 3 deletions docs/source/marytts.md
@@ -2,13 +2,13 @@

## What is Mary-TTS?

-[Mary (Modular Architecture for Research in sYynthesis) Text-to-Speech](http://mary.dfki.de/) is an open-source (GNU LGPL license), multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of [DFKI’s](http://www.dfki.de/web) Language Technology Lab and the [Institute of Phonetics](http://www.coli.uni-saarland.de/groups/WB/Phonetics/) at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [Cluster of Excellence MMCI](https://www.mmci.uni-saarland.de/) and DFKI.
+[Mary (Modular Architecture for Research in sYnthesis) Text-to-Speech](http://mary.dfki.de/) is an open-source (GNU LGPL license), multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of [DFKI’s](http://www.dfki.de/web) Language Technology Lab and the [Institute of Phonetics](http://www.coli.uni-saarland.de/groups/WB/Phonetics/) at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [Cluster of Excellence MMCI](https://www.mmci.uni-saarland.de/) and DFKI.
MaryTTS has been around for a very! long time. Version 3.0 even dates back to 2006, long before Deep Learning was a broadly known term and the last official release was version 5.2 in 2016.
You can check out this OpenVoice-Tech page to learn more: https://openvoice-tech.net/index.php/MaryTTS

## Why Mary-TTS compatibility is relevant

-Due to it's open-source nature, relatively high quality voices and fast synthetization speed Mary-TTS was a popular choice in the past and many tools implemented API support over the years like screen-readers (NVDA + SpeechHub), smart-home HUBs (openHAB, Home Assistant) or voice assistants (Rhasspy, Mycroft, SEPIA). A compatibility layer for Coqui-TTS will ensure that these tools can use Coqui as a drop-in replacement and get even better voices right away.
+Due to its open-source nature, relatively high quality voices and fast synthetization speed Mary-TTS was a popular choice in the past and many tools implemented API support over the years like screen-readers (NVDA + SpeechHub), smart-home HUBs (openHAB, Home Assistant) or voice assistants (Rhasspy, Mycroft, SEPIA). A compatibility layer for Coqui-TTS will ensure that these tools can use Coqui as a drop-in replacement and get even better voices right away.

## API and code examples

@@ -40,4 +40,4 @@ You can enter the same URLs in your browser and check-out the results there as w
### How it works and limitations

A classic Mary-TTS server would usually show all installed locales and voices via the corresponding endpoints and accept the parameters `LOCALE` and `VOICE` for processing. For Coqui-TTS we usually start the server with one specific locale and model and thus cannot return all available options. Instead we return the active locale and use the model name as "voice". Since we only have one active model and always want to return a WAV-file, we currently ignore all other processing parameters except `INPUT_TEXT`. Since the gender is not defined for models in Coqui-TTS we always return `u` (undefined).
-We think that this is an acceptable compromise, since users are often only interested in one specific voice anyways, but the API might get extended in the future to support multiple languages and voices at the same time.
+We think that this is an acceptable compromise, since users are often only interested in one specific voice anyways, but the API might get extended in the future to support multiple languages and voices at the same time.
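
For a concrete picture of the compatibility layer, a client call might look like the sketch below; the host, port (5002) and `/process` path are assumed defaults and should be checked against the running server:

```python
# Sketch of a Mary-TTS-style request; host, port and path are assumptions.
import requests

response = requests.get(
    "http://localhost:5002/process",
    params={"INPUT_TEXT": "Hello from the Mary-TTS compatibility layer."},
)
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)  # the server always answers with a WAV file
```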