Artificial Intelligence Speech Synthesis Based on the Deep Learning

Sihang Li

2025

Abstract

Speech synthesis is one of the most popular topics in machine learning; it aims to generate expressive speech that meets the demands of different fields. This survey introduces deep-learning-based speech generation and reviews the development of autoregressive frameworks (represented by Transformer TTS) and non-autoregressive frameworks (represented by FastSpeech). Autoregressive frameworks excel at generating expressive speech, while non-autoregressive frameworks tend to generate speech more efficiently. Refined models built on these two frameworks are also covered, including the Autoregressive Acoustic Model with Mixed Self-attention and Lightweight Convolution (AAMSLC), the Autoregressive Diffusion Transformer (ARDiT), RoubuTrans, FastSpeech 3, FastPitch, ProbSparseFS, LinearizedFS, and LightTTS, among others. These techniques represent current cutting-edge advances in speech synthesis. The purpose of this survey is to provide a systematic review for beginners in the field, helping them better understand the latest developments in speech synthesis technology while offering new ideas for future research and applications.

Paper Citation


in Harvard Style

Li S. (2025). Artificial Intelligence Speech Synthesis Based on the Deep Learning. In Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE; ISBN 978-989-758-765-8, SciTePress, pages 591-595. DOI: 10.5220/0013702500004670


in Bibtex Style

@conference{icdse25,
author={Sihang Li},
title={Artificial Intelligence Speech Synthesis Based on the Deep Learning},
booktitle={Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE},
year={2025},
pages={591-595},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013702500004670},
isbn={978-989-758-765-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 2nd International Conference on Data Science and Engineering - Volume 1: ICDSE
TI - Artificial Intelligence Speech Synthesis Based on the Deep Learning
SN - 978-989-758-765-8
AU - Li S.
PY - 2025
SP - 591
EP - 595
DO - 10.5220/0013702500004670
PB - SciTePress