Temporal Convolutional Networks for Speech Emotion Recognition: A Benchmark Study against Deep Learning Models

Nitasha Rathore, Pratibha Barua, Arhaan Sood, Bandaru Yogesh Kumar, Ashutosh Singh

2025

Abstract

As technology becomes increasingly human-centric, the ability to recognize and interpret emotions from speech is becoming more than an innovation; it is a necessity. Speech Emotion Recognition (SER) sits at the nexus of artificial intelligence and human communication, offering insight into both our spoken words and our emotions. Beyond enhancing digital assistants, SER is changing mental health monitoring and how people interact with robots. This study examines emotion-laden speech to learn how cutting-edge deep learning models, including Temporal Convolutional Networks (TCN), Artificial Neural Networks (ANN), Recurrent Convolutional Neural Networks (RCNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks, decode the subtlest emotional cues. Some models exploit spatial patterns in speech, while others are adept at capturing temporal dependencies; each has its own merits. We analyze their performance on emotionally rich speech and present a proof of concept of how they could affect human-computer interaction. The study envisions a future in which SER supports sympathetic AI, which is essential to the relationship between AI and humans, and looks beyond current numerical measures and accuracy scores. The question remains: as we push those outer limits, how close are we really to teaching machines the vocabulary of feelings?


Paper Citation


in Harvard Style

Rathore N., Barua P., Sood A., Kumar B. and Singh A. (2025). Temporal Convolutional Networks for Speech Emotion Recognition: A Benchmark Study against Deep Learning Models. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 753-759. DOI: 10.5220/0013889500004919


in Bibtex Style

@conference{icrdicct`2525,
author={Nitasha Rathore and Pratibha Barua and Arhaan Sood and Bandaru Yogesh Kumar and Ashutosh Singh},
title={Temporal Convolutional Networks for Speech Emotion Recognition: A Benchmark Study against Deep Learning Models},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={753-759},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013889500004919},
isbn={978-989-758-777-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI - Temporal Convolutional Networks for Speech Emotion Recognition: A Benchmark Study against Deep Learning Models
SN - 978-989-758-777-1
AU - Rathore N.
AU - Barua P.
AU - Sood A.
AU - Kumar B.
AU - Singh A.
PY - 2025
SP - 753
EP - 759
DO - 10.5220/0013889500004919
PB - SciTePress