v0.7.0 #1537

erogol · 2022-04-26T09:48:40Z

No description provided.

* Add upsample VITS support
* Fix the bug in inference
* Fix lint checks
* Add RMS based norm in save_wav method
* Style fix
* Add the period for VITS multi-period discriminator in model_args
* Bug fix in speaker encoder load in inference time
* Add unit tests
* Remove useless detach_z_vocoder parameter
* Add docs for VITS upsampling
* Fix the docs
* Rename TTS_part_sample_rate to encoder_sample_rate
* Add upsampling_init and upsampling_z methods
* Add asserts for encoder_sample_rate part
* Move upsampling tests to test_vits.py

commit 212d330
Author: Edresson Casanova <edresson1@gmail.com>
Date: Fri Apr 29 16:29:44 2022 -0300

    Fix unit test

commit 44456b0
Author: Edresson Casanova <edresson1@gmail.com>
Date: Fri Apr 29 07:28:39 2022 -0300

    Fix style

commit d545bea
Author: Edresson Casanova <edresson1@gmail.com>
Date: Thu Apr 28 17:08:04 2022 -0300

    Change order of HIFI-GAN optimizers to be equal than the original repository

commit 657c544
Author: Edresson Casanova <edresson1@gmail.com>
Date: Thu Apr 28 15:40:16 2022 -0300

    Remove audio padding before mel spec extraction

commit 76b274e
Merge: 379ccd7 6233f4f
Author: Edresson Casanova <edresson1@gmail.com>
Date: Wed Apr 27 07:28:48 2022 -0300

    Merge pull request #1541 from coqui-ai/comp_emb_fix

    Bug fix in compute embedding without eval partition

commit 379ccd7
Author: WeberJulian <julian.weber@hotmail.fr>
Date: Wed Apr 27 10:42:26 2022 +0200

    returns y_mask in VITS inference (#1540)

    * returns y_mask
    * make style

* Update requirements
* Update CI for p3.10
* Update numpy requirement
* Drop 🐍p3.6 support
  Numpy also dropped support for p3.6
* Bind cython v0.29.28
* Bind pyworld to v0.2.10
  > 0.2.10 is not p3.10.x compatible
* Update Dockerfile

* new CI config
* initial Capacitron implementation
* delete old unused file
* fix empty formatting changes
* update losses and training script
* fix previous commit
* fix commit
* Add Capacitron test and first round of test fixes
* revert formatter change
* add changes to the synthesizer
* add stepwise gradual lr scheduler and changes to the recipe
* add inference script for dev use
* feat: add posterior inference arguments to synth methods
  - added reference wav and text args for posterior inference
  - some formatting
* fix: add espeak flag to base_tts and dataset APIs
  - use_espeak_phonemes flag was not implemented in those APIs
  - espeak is now able to be utilised for phoneme generation
  - necessary phonemizer for the Capacitron model
* chore: update training script and style
  - training script includes the espeak flag and other hyperparams
  - made style
* chore: fix linting
* feat: add Tacotron 2 support
* leftover from dev
* chore:rename parser args
* feat: extract optimizers
  - created a separate optimizer class to merge the two optimizers
* chore: revert arbitrary trainer changes
* fmt: revert formatting bug
* formatting again
* formatting fixed
* fix: log func
* fix: update optimizer
  - Implemented load_state_dict for continuing training
* fix: clean optimizer init for standard models
* improvement: purge espeak flags and add training scripts
* Delete capacitronT2.py
  delete old training script, new one is pushed
* feat: capacitron trainer methods
  - extracted capacitron specific training operations from the trainer into custom methods in taco1 and taco2 models
* chore: renaming and merging capacitron and gst style args
* fix: bug fixes from the previous commit
* fix: implement state_dict method on CapacitronOptimizer
* fix: call method
* fix: inference naming
* Delete train_capacitron.py
* fix: synthesize
* feat: update tests
* chore: fix style
* Delete capacitron_inference.py
* fix: fix train tts t2 capacitron tests
* fix: double forward in T2 train step
* fix: double forward in T1 train step
* fix: run make style
* fix: remove unused import
* fix: test for T1 capacitron
* fix: make lint
* feat: add blizzard2013 recipes
* make style
* fix: update recipes
* chore: make style
* Plot test sentences in Tacotron
* chore: make style and fix import
* fix: call forward first before problematic floordiv op
* fix: update recipes
* feat: add min_audio_len to recipes
* aux_input["style_mel"]
* chore: make style
* Make capacitron T2 recipe more stable
* Remove T1 capacitron Ljspeech
* feat: implement new grad clipping routine and update configs
* make style
* Add pretrained checkpoints
* Add default vocoder
* Change trainer package
* Fix grad clip issue for tacotron
* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>

* Use fsspec and torch for embedding file
* Fixup
* Fix load and save files
* Fix compute embedding script
* Set use_cuda to true if available
* Add dummy speakers.pth file
* Make style
* Change default speakers file extension

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>

* model_info
* model_info
* model_info_by_idx and name
* model_info_by_idx and name
* model_info
* Update manage.py
* fixed linter
* fixed linter
* fixed linter
* fixed linter
* fixed return style checks
* fixed linter
* fixed linter
* fixed idx always positive
* added comments
* fix parser.args check
* fix parser.args check
* Make style

Co-authored-by: Eren Gölge <egolge@coqui.ai>

erogol added the 🚀 new version label Apr 26, 2022

Edresson and others added 6 commits April 26, 2022 17:39

Update Coqpit requirement (#1539)

a41e860

Bug fix in compute embedding without eval partition

6233f4f

returns y_mask in VITS inference (#1540)

fbdf76b

* returns y_mask
* make style

Remove audio padding before mel spec extraction

6003467

Update documentation for multi-gpu training

a34076a

erogol force-pushed the dev branch from 0bd7a4a to a34076a on May 7, 2022 11:30

code-review-doctor and others added 20 commits May 7, 2022 13:33

Fix issue probably-meant-fstring found at https://codereview.doctor (#1532)

fa887ef

Fix batch_group_size in VITS

3f03e30

Use torch.no_grad for VITS inference

5021a03

Return durations at VITS inference

c18bd21

Pass use_cuda to init_encoder

121e9ed

Return default SpeakerManager if no d_vector_file

c3f8c4d

Update SpeakerManager init in Synthesizer

2fc38f6

Improve data_path resolvement (#1567)

f9d91a5

Fix the VITS upsampling asserts

1827110

Fix style

Add reinit text encoder and duration predictor parameter (#1562)

175ca06

* Add reinit encoder and duration predictor option
* Add .data to prevent any overlooked autograd hook

Merge pull request #1550 from coqui-ai/fix-upsampling-asserts

e45ae57

Fix VITS upsampling asserts

Fix the bug in eSpeak wrapper for eSpeak version 1.48.15 (#1560)

a97eed6

Update CI tests (#1572)

27cf388

* Use direct model URLs in CI
* Fixup
* Fixup

Add CPU only Docker image (#1573)

6048959

Co-authored-by: Reuben Morais <reuben.morais@gmail.com>

Add an assert for the upsampling trick (#1538)

6e460b7

Add audio length sampler balancer (#1561)

c6008e5

* Add audio length sampler balancer
* Add unit tests
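
For readers unfamiliar with the idea, a length-based sampler balancer can be sketched as follows. This is only an illustration under stated assumptions, not the code added in #1561: the function name `length_balancer_weights`, the `num_buckets` parameter, and the equal-width bucketing are all hypothetical. The sketch assigns each clip an inverse-frequency weight so that under-represented length ranges are drawn as often as common ones.

```python
from collections import Counter


def length_balancer_weights(durations_sec, num_buckets=10):
    """Return one sampling weight per clip (hypothetical helper, not the TTS API).

    Clips are grouped into equal-width duration buckets, and each clip gets a
    weight inversely proportional to the size of its bucket.
    """
    if not durations_sec:
        return []
    lo, hi = min(durations_sec), max(durations_sec)
    # Fall back to a width of 1.0 when every clip has the same duration.
    width = (hi - lo) / num_buckets or 1.0
    # Bucket index in [0, num_buckets - 1]; the longest clip lands in the last bucket.
    buckets = [min(int((d - lo) / width), num_buckets - 1) for d in durations_sec]
    counts = Counter(buckets)
    raw = [1.0 / counts[b] for b in buckets]
    total = sum(raw)
    # Normalise so the weights form a probability distribution.
    return [w / total for w in raw]


# Three short clips share one bucket; the single long clip has its own.
weights = length_balancer_weights([1.0, 1.2, 1.1, 9.5], num_buckets=2)
```

Weights of this kind could be handed to a weighted random sampler in a data loader; how the repository wires its balancer into training is not shown on this page.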

Change the VITS upsampling interpolation trick to linear (#1564)

e5d8ec2
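
As a rough illustration of what linear interpolation means when upsampling a sequence, here is a minimal pure-Python sketch. It is an assumption-laden example, not the change made in #1564: the function `upsample_linear` and its integer `factor` argument are hypothetical, and the actual model operates on tensors rather than Python lists.

```python
def upsample_linear(values, factor):
    """Upsample a 1-D sequence by an integer factor with linear interpolation.

    Hypothetical helper for illustration only. Positions past the last source
    sample are clamped to the final value.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n = len(values)
    out = []
    for i in range(n * factor):
        pos = i / factor              # fractional position in the source sequence
        left = int(pos)               # nearest source index at or before pos
        right = min(left + 1, n - 1)  # next source index, clamped at the end
        frac = pos - left             # distance from left toward right
        out.append(values[left] * (1.0 - frac) + values[right] * frac)
    return out


upsampled = upsample_linear([0.0, 2.0], 2)
```

Compared with nearest-neighbour repetition, linear interpolation avoids step-like jumps between adjacent frames, which is one plausible motivation for such a change.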

Update CI badges

e282da5

Fix voice conversion inference (#1583)

ee99a6c

* Add voice conversion zoo test
* Fix style
* Fix unit test

WeberJulian force-pushed the dev branch from 74f5c3f to ee99a6c on May 20, 2022 13:53

a-froghyar and others added 5 commits May 20, 2022 16:17

Fixed use_cuda issue in compute_embeddings.py

3b84ef9

Added use_cuda argument in self.init_encoder method

Merge pull request #1587 from ribeiromiranda/patch-1

71111d1

Fixed use_cuda issue in compute_embeddings.py

Training recipes for thorsten dataset (#1020)

a790df4

* Fix style
* Fix isort
* Remove tensorboardX from requirements

Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com>
Co-authored-by: Eren Gölge <egolge@coqui.ai>

fix invalid json (#1599)

b6bd74a

erogol changed the title ~~v0.0.7~~ v0.7.0 Jun 1, 2022

erogol and others added 7 commits June 1, 2022 13:49

Adding TTS Tutorials (#1584)

68cef28

* Adding inferencing notebook
* added multispeaker explanation and usecase and renamed the file
* Adding training tutorial
* fixed dummy paths
* fixed review comments
* fixed metadata extension

Co-authored-by: Eren Gölge <erogol@hotmail.com>

Internal formatter (#1629)

f09ea11

* Add coqui formatter
* Make style

Update training_a_model.md (#1620)

c44e39d

edited `...servers our needs.` to `...serves your needs.`

Add synpaflex formatter (#1616)

6126c23

* Add synpaflex formatter
* Fix formatter
* Make style

Bump up to v0.7.0

8b75e8b

erogol merged commit c7cca41 into main Jun 20, 2022
