Is it missing some activation functions between some layers? #9
Hi @bigpon, I confirmed that changing the autoencoder model can improve results; I changed it as described in my post below. Here are my training config.yaml and demos for the new config, trained for 200K steps on the LibriSpeech and AISHELL datasets but tested on an unseen dataset (my earlier demos used 160K steps).
Hi @BridgetteSong,
1. According to issue #9, we implemented the codec version (activate_audiodec) with more activations, as in HiFiGAN, and released the pre-trained model “symAAD_vctk_48000_hop300”.
2. We fixed the MSTFT 2D conv padding issues mentioned in issue #9 and released the updated “symADuniv_vctk_48000_hop300” and “AudioDec_v3_symADuniv_vctk_48000_hop300_clean”.
3. We implemented more flexible CausalConvTranspose1d padding for arbitrary kernel_size and stride, according to issue #11 (a hedged sketch of one such scheme follows this list).
4. We released a 24kbps model, “symAD_c16_vctk_48000_hop320”, which achieves better speech quality and robustness to unseen data.
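For reference, here is a minimal sketch of one common way to handle causal transposed-conv padding for arbitrary kernel_size and stride; it illustrates the idea and is not necessarily the repository's exact implementation:

```python
import torch
import torch.nn as nn

class CausalConvTranspose1d(nn.Module):
    """Illustrative sketch of a causal transposed 1D conv.

    With no padding, ConvTranspose1d maps T input frames to
    (T - 1) * stride + kernel_size output frames. Trimming the trailing
    kernel_size - stride frames (when positive) yields exactly T * stride
    frames and keeps the layer causal: no output depends on future input.
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        self.deconv = nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride)
        self.trim = kernel_size - stride  # trailing frames to drop

    def forward(self, x):  # x: (batch, channels, T)
        y = self.deconv(x)
        return y[..., :-self.trim] if self.trim > 0 else y
```

For example, kernel_size=10 with stride=5 trims 5 trailing frames, so an input of T frames always produces exactly 5T output frames regardless of the kernel/stride combination.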
Hi, according to our internal experiments, we have reached some conclusions. The 2D conv padding issue of the MSTFT discriminator has been fixed, and the corresponding models have been updated (a hedged sketch of the kind of padding involved follows). Thanks again for your contributions.
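A minimal sketch of "same"-style symmetric padding for the 2D convs of an STFT-based discriminator, so feature maps are not cropped lopsidedly along the time/frequency axes; the kernel and stride values here are illustrative assumptions, not the repository's actual settings:

```python
import torch.nn as nn

def spec_conv2d(in_ch, out_ch, kernel_size=(3, 9), stride=(1, 2)):
    """2D conv over (freq, time) spectrogram features with symmetric padding.

    Padding half the kernel on each axis keeps the receptive field centered,
    avoiding the asymmetric-cropping behavior a missing pad can cause.
    """
    padding = (kernel_size[0] // 2, kernel_size[1] // 2)
    return nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride, padding=padding)
```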
Regarding activations: the DAC authors claim that snake is much better. Since we only gave it a quick try without carefully tuning the hyperparameters, further investigation is required.
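For context, the snake activation is x + sin²(αx)/α. Below is a short sketch assuming a learnable per-channel α initialized to 1, in the style DAC uses; it is not taken from this repository:

```python
import torch
import torch.nn as nn

class Snake(nn.Module):
    """Snake activation (Ziyin et al., 2020): x + sin^2(alpha * x) / alpha."""

    def __init__(self, channels):
        super().__init__()
        # One learnable frequency parameter per channel (assumption).
        self.alpha = nn.Parameter(torch.ones(1, channels, 1))

    def forward(self, x):  # x: (batch, channels, T)
        # Small epsilon guards against division by zero if alpha shrinks.
        return x + torch.sin(self.alpha * x) ** 2 / (self.alpha + 1e-9)
```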
Thanks for your work. I trained the model on my own dataset and ran into the same problem as issue #7. When I checked the model, I found a difference in the AutoEncoder: activation functions are missing between some layers. Other popular implementations all include them, so I added them at the following points (a hedged sketch of this kind of change follows the list):
- AudioDec/models/autoencoder/modules/projector.py, line 50 (commit 9b49838)
- AudioDec/models/autoencoder/modules/decoder.py, line 62 (commit 9b49838)
- AudioDec/models/autoencoder/modules/decoder.py, line 120 (commit 9b49838)
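A hypothetical before/after sketch of the change being described, inserting a nonlinearity where consecutive conv layers meet; the channel sizes, kernel sizes, and choice of ELU are illustrative assumptions, not the repository's actual code:

```python
import torch.nn as nn

# Before: the upsampling block feeds straight into the output conv.
tail_before = nn.Sequential(
    nn.ConvTranspose1d(64, 32, kernel_size=10, stride=5),
    nn.Conv1d(32, 1, kernel_size=7, padding=3),  # no activation in between
)

# After: an activation is inserted between the two layers, as in other
# popular codec implementations.
tail_after = nn.Sequential(
    nn.ConvTranspose1d(64, 32, kernel_size=10, stride=5),
    nn.ELU(),  # added activation
    nn.Conv1d(32, 1, kernel_size=7, padding=3),
)
```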
After adding them and retraining, I got some improvement on unseen datasets over your baseline, even though I only trained the AutoEncoder with discriminators and did not fine-tune it with AudioDec.
By the way, I trained the model only on LibriSpeech and AISHELL at a 16 kHz sampling rate for 160K steps and tested it on another clean TTS dataset. When training finishes (800K steps in total), I will compare the final results, upload some demos, and share my training config.