Emotion Recognition from Human Speech Using AI
D. R. Shreya, Vasa Vidhyadhari, B. Naga Divya, Vadisela Lakshmi Chandrika, Patnam Shetty Bhoomika
2025
Abstract
This research aims to develop a web-based model for spoken emotion recognition using a multimodal dataset, MELD (Multimodal Emotion Lines Dataset), which contains text, audio, and images drawn from conversational data. The incoming data is first pre-processed with techniques such as image scaling, noise removal, grayscale conversion, and normalization. Audio features are extracted using Mel Frequency Cepstral Coefficients (MFCC), and natural language data is tokenized using basic text tokenization. After pre-processing, the data is split into training and testing sets, and emotions are classified with deep learning models including CNN and DenseNet121.
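The audio branch of the pipeline described above can be illustrated with a short, hedged sketch: MFCC feature extraction followed by a small CNN classifier. This is not the authors' code; it assumes librosa for MFCC computation, TensorFlow/Keras for the model, a 16 kHz sampling rate, 40 MFCC coefficients, utterances padded or truncated to 200 frames, and the seven MELD emotion classes. DenseNet121 could stand in for the small CNN via tf.keras.applications.DenseNet121.

# Hedged sketch of the audio branch (MFCC features + CNN classifier); not the authors' code.
import numpy as np
import librosa
import tensorflow as tf

NUM_CLASSES = 7      # MELD emotions: anger, disgust, fear, joy, neutral, sadness, surprise
N_MFCC = 40          # number of MFCC coefficients per frame (assumed)
MAX_FRAMES = 200     # every utterance padded/truncated to this many frames (assumed)

def extract_mfcc(wav_path):
    """Load one utterance and return a fixed-size (N_MFCC, MAX_FRAMES) MFCC matrix."""
    signal, sr = librosa.load(wav_path, sr=16000)           # resample to 16 kHz (assumed)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    if mfcc.shape[1] < MAX_FRAMES:                           # pad short clips with zeros
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    else:                                                    # truncate long clips
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc

def build_cnn():
    """A small 2-D CNN over the MFCC 'image', standing in for the paper's CNN model."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: features = extract_mfcc("utterance.wav")[..., np.newaxis]  # add channel axis
#        model = build_cnn(); model.fit(x_train, y_train, validation_split=0.2)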
Paper Citation
in Harvard Style
Shreya D., Vidhyadhari V., Divya B., Chandrika V. and Bhoomika P. (2025). Emotion Recognition from Human Speech Using AI. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 568-578. DOI: 10.5220/0013902100004919
in Bibtex Style
@conference{icrdicct25,
author={D. Shreya and Vasa Vidhyadhari and B. Divya and Vadisela Chandrika and Patnam Bhoomika},
title={Emotion Recognition from Human Speech Using AI},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={568-578},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013902100004919},
isbn={978-989-758-777-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI - Emotion Recognition from Human Speech Using AI
SN - 978-989-758-777-1
AU - Shreya D.
AU - Vidhyadhari V.
AU - Divya B.
AU - Chandrika V.
AU - Bhoomika P.
PY - 2025
SP - 568
EP - 578
DO - 10.5220/0013902100004919
PB - SciTePress
ER -