Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

ACL, 2022

Recommended citation: Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, and Jun Wang. "Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech." arXiv preprint arXiv:2205.04120 (2022). 1(3). https://arxiv.org/abs/2205.04120

(ChatGPT-Generated) The paper proposes a cross-utterance conditioned VAE (CUC-VAE) for non-autoregressive text-to-speech, which lets the prosody features generated by the TTS system depend on the surrounding context. Experiments on the LJ-Speech and LibriTTS datasets show improved naturalness, intelligibility, and prosody diversity. For each phoneme, CUC-VAE estimates a posterior probability distribution over latent prosody features, conditioned on acoustic features, speaker information, and text features obtained from both past and future sentences.
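To make the per-phoneme posterior idea concrete, here is a minimal NumPy sketch of a generic conditional-VAE step, not the paper's implementation. All dimensions, the single linear projections (`w_mu`, `w_logvar`), and the helper names are hypothetical. The text features are assumed to already encode past and future sentences, and the KL term uses a standard-normal prior as in a plain VAE, which is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_PHONEMES = 5   # phonemes in the current utterance
D_ACOUSTIC = 8   # per-phoneme acoustic feature size (assumed)
D_SPEAKER = 4    # speaker embedding size (assumed)
D_TEXT = 6       # per-phoneme text features, assumed to include cross-utterance context
D_LATENT = 3     # latent prosody dimension (assumed)


def posterior_params(acoustic, speaker, text, w_mu, w_logvar):
    """Map concatenated conditions to a diagonal Gaussian per phoneme.

    acoustic: (N, D_ACOUSTIC); speaker: (D_SPEAKER,) shared by every phoneme;
    text: (N, D_TEXT). Returns mean and log-variance, each (N, D_LATENT).
    """
    speaker_tiled = np.broadcast_to(speaker, (acoustic.shape[0], speaker.shape[0]))
    h = np.concatenate([acoustic, speaker_tiled, text], axis=-1)
    return h @ w_mu, h @ w_logvar


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps so the sample stays differentiable in mu, sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims: one value per phoneme."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


d_in = D_ACOUSTIC + D_SPEAKER + D_TEXT
w_mu = 0.1 * rng.standard_normal((d_in, D_LATENT))      # hypothetical weights
w_logvar = 0.1 * rng.standard_normal((d_in, D_LATENT))  # hypothetical weights

acoustic = rng.standard_normal((N_PHONEMES, D_ACOUSTIC))
speaker = rng.standard_normal(D_SPEAKER)
text = rng.standard_normal((N_PHONEMES, D_TEXT))

mu, logvar = posterior_params(acoustic, speaker, text, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)   # one latent prosody vector per phoneme
kl = kl_to_standard_normal(mu, logvar)
print(z.shape, kl.shape)  # (5, 3) (5,)
```

The point of the sketch is the shape of the computation: every phoneme gets its own latent prosody vector, and the conditioning information (acoustic, speaker, and context-bearing text features) enters only through the posterior's mean and variance.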

