TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Yu Zhang, Ziyue Jiang, Ruiqi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao | Zhejiang University

PyTorch Implementation of TCSinger (EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control.

We provide our implementation and pre-trained models in this repository.

Visit our demo page for audio samples.

News

2024.09: We released the full dataset of GTSinger!
2024.09: TCSinger is accepted by EMNLP 2024!

Key Features

We present TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control. TCSinger excels in personalized and controllable SVS tasks.
We introduce the clustering style encoder to extract styles, and the Style and Duration Language Model (S&D-LM) to predict both style information and phoneme duration, addressing style modeling, transfer, and control.
We propose the style adaptive decoder to generate intricately detailed songs using a novel mel-style adaptive normalization method.
Experimental results show that TCSinger surpasses baseline models in synthesis quality, singer similarity, and style controllability across various tasks: zero-shot style transfer, multi-level style control, cross-lingual style transfer, and speech-to-singing style transfer.

Quick Start

We provide an example of how you can generate high-fidelity samples using TCSinger.

To try on your own dataset or GTSinger, simply clone this repo in your local machine provided with NVIDIA GPU + CUDA cuDNN and follow the below instructions.

The code will come soon...

Acknowledgements

This implementation uses parts of the code from the following Github repos: NATSpeech, StyleSinger as described in our code.

Citations

If you find this code useful in your research, please cite our work:

@article{zhang2024tcsinger,
  title={TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control},
  author={Zhang, Yu and Jiang, Ziyue and Li, Ruiqi and Pan, Changhao and He, Jinzheng and Huang, Rongjie and Wang, Chuxin and Zhao, Zhou},
  journal={arXiv preprint arXiv:2409.15977},
  year={2024}
}

Disclaimer

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's singing without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Yu Zhang, Ziyue Jiang, Ruiqi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao | Zhejiang University

News

Key Features

Quick Start

Acknowledgements

Citations

Disclaimer

Files

README.md

Latest commit

History

README.md

File metadata and controls

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Yu Zhang, Ziyue Jiang, Ruiqi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao | Zhejiang University

News

Key Features

Quick Start

Acknowledgements

Citations

Disclaimer