Releases: fishaudio/fish-diffusion
V2.2.1
V2.1.0
In this version, we added HiFiSinger architecture support (see configs/svc_hifisinger.py) with the following advantages:
- Inference is much faster than DiffSVC.
- It performs better on noisy samples (although its best-case quality is not as good as DiffSVC's).
We also added a loudness (power) embedding to this architecture, improving the model's expressiveness.
2023-03-29 Update:
We added a timbre (speaker) mixing feature. All existing models can use it; just add the following to your inference command:
--speaker "speaker_a:0.5,speaker_b:0.5"
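Conceptually, the weights in the flag above describe a weighted average over speaker identities. The sketch below is illustrative only, not fish-diffusion's actual implementation: the parsing format comes from the flag shown above, while the embedding blend is an assumption about how such mixing typically works.

```python
import numpy as np

def parse_speaker_mix(spec: str) -> dict:
    """Parse a "name:weight,name:weight" string like the --speaker flag."""
    mix = {}
    for part in spec.split(","):
        name, weight = part.split(":")
        mix[name.strip()] = float(weight)
    return mix

def blend_embeddings(mix: dict, table: dict) -> np.ndarray:
    """Weighted average of per-speaker embedding vectors (illustrative)."""
    return sum(weight * table[name] for name, weight in mix.items())

mix = parse_speaker_mix("speaker_a:0.5,speaker_b:0.5")
# mix == {"speaker_a": 0.5, "speaker_b": 0.5}
```

With equal weights, the blended embedding sits halfway between the two speakers' vectors, which is why the result sounds like a mix of both timbres.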
We released a HiFiSinger + ContentVec pre-trained model.
We strongly recommend that you refer to the attached config for finetuning.
Model Info
- Dataset Size: ~50 hours (M4Singer, OpenCpop, and In-House Data), 2.25x data aug
- Feature Extractor: ContentVec
- MD5: 45a84d1b626cbdb23f72042c7eac680f
- Steps: 540k on a 2x3090 server
This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.
V2.0.0
This version integrates so-vits-svc, so there are many breaking changes.
We released a ContentVec pre-trained model.
We strongly recommend that you refer to the attached config for finetuning; it uses many of the new features.
Model Info
- Dataset Size: ~100 hours, ~100 singers (M4Singer, OpenSinger, OpenCpop, and In-House Data), 1.5x data aug
- Vocoder: NSF-HiFiGAN 44.1 kHz (OpenVPI)
- Feature Extractor: ContentVec
- MD5: 64034133bdf05910210f2f08cbda65c6
- Steps: 300k on a 2x3090 server
2023-03-17 Update:
We finished training and testing the FishAudio stable vocoder (based on NSF-HiFiGAN). It performs well between 60 and 1200 Hz and, as verified, is a full upgrade over the original OpenVPI NSF-HiFiGAN vocoder.
How to use: download nsf_hifigan-stable-v1.zip and extract it to checkpoints.
For convenience, we also attached an OpenUTAU vocoder: nsf_hifigan-stable-v1.dsvocoder.
This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.
V1.12
This version should be stable, as many users have already tested it.
In the next version, we will restructure the dataset layout and add data augmentation (and possibly polish DiffSinger), which will lead to many breaking changes.
Besides that, we released a beta vocoder whose config is identical to OpenVPI's NSF-HiFiGAN. It performs better on both high and low notes (at the very least, it no longer cracks).
How to use: download nsf_hifigan-beta-v2-epoch-434.zip and extract it to checkpoints.
Note: this beta vocoder was trained on M4Singer and OpenCpop for about three days (on 2x 3090s). We will train it on a higher-quality dataset and release a stable version later.
03-01 Update:
- We trained the model for one extra day and provided an ONNX export.
- For convenience, we also attached an OpenUTAU vocoder: nsf_hifigan-beta-v2-epoch-434.dsvocoder.
This vocoder is released under the Attribution-NonCommercial-ShareAlike 4.0 International license.
03-03 Update:
- For your convenience, we attached ContentVec's content-vec-best-legacy-500.pt as a release asset.
V1.11
We are happy to announce that a pre-trained model is now available: you only need 30 minutes of audio data and about 15 minutes of finetuning (on a 3090) to reproduce the timbre you want.
We recommend that you refer to the attached config for finetuning; it changes the learning-rate scheduler and the step interval between checkpoint saves.
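Changes of this kind are typically expressed as overrides in a Python config file of the style the project uses. The fragment below is a hypothetical sketch only: every key name and value is an illustrative assumption, not the attached config's actual contents.

```python
# Hypothetical finetuning overrides -- all names and values are
# illustrative, not taken from the attached config.
_base_ = ["./base_config.py"]  # inherit the pretrained model's settings

optimizer = dict(lr=1e-5)  # a smaller learning rate for finetuning
lr_scheduler = dict(type="StepLR", step_size=10_000, gamma=0.5)
trainer = dict(save_interval=1_000)  # save checkpoints more often
```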
Model Info
- Dataset Size: ~300 hours, ~600 singers (M4Singer, OpenSinger, OpenCpop, and In-House Data)
- Vocoder: NSF-HiFiGAN 44.1 kHz (OpenVPI)
- Feature Extractor: Chinese Hubert Soft with gate size 25
- MD5: 9d88f1bbca34053919ee1ea8bd780a9b
- Steps: 260k on a 4x RTX A6000 server
This model is released under the CC-BY-NC-SA 4.0 license; please read it carefully before downloading.
V1.1
V1.2 Beta 0
We released a Chinese Aligned Whisper model for better SVC articulation and robustness to mumbled lyrics.
Note: this model does not perform well on languages other than Chinese.
Model Info:
aligned-whisper-cn-25k-v1.ckpt
- Base Model: Whisper Medium (~300M)
- Aligned Embedding Dim: 256
- Dataset: (Chinese) OpenCpop, OpenSinger, M4Singer
- Trained on 2x3090 for 50 hours
- MD5 checksum: 840dad46fadd2b1f8a324ef7209f9ee1
aligned-whisper-cn-40k-v1.1.ckpt
- Trained for an extra 30 hours
- This model achieves better alignment between voice and phones
- MD5 checksum: 90a6852d67b7dc01f9e8e0c86378ceef
The multilingual model has not been released yet.
Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.