
Paper

Authors: Nina Hosseini-Kivanani 1 ; Homa Asadi 2 and Christoph Schommer 1

Affiliations: 1 Department of Computer Science, University of Luxembourg, Esch-sur-Alzette, Luxembourg ; 2 Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran

Keyword(s): Speaker Verification, Mel-Frequency Cepstral Coefficients (MFCCs), Vowel Formants, Deep Learning, Persian Language.

Abstract: This paper investigates the impact of speaking rate variation on speaker verification using a hybrid feature approach that combines Mel-Frequency Cepstral Coefficients (MFCCs), their dynamic derivatives (delta and delta-delta), and vowel formants. To enhance system robustness, we also applied data augmentation techniques such as time-stretching, pitch-shifting, and noise addition. The dataset comprises recordings of Persian speakers at three distinct speaking rates: slow, normal, and fast. Our results show that the combined model integrating MFCCs, delta-delta features, and formant frequencies significantly outperforms individual feature sets, achieving an accuracy of 75% with augmentation, compared to 70% without augmentation. This highlights the benefit of leveraging both spectral and temporal features for speaker verification under varying speaking conditions. Furthermore, data augmentation improved the generalization of all models, particularly for the combined feature set, where precision, recall, and F1-score metrics showed substantial gains. These findings underscore the importance of feature fusion and augmentation in developing robust speaker verification systems. Our study contributes to advancing speaker identification methodologies, particularly in real-world applications where variability in speaking rate and environmental conditions presents a challenge.
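
The abstract names MFCCs, their delta and delta-delta derivatives, and vowel formants as the fused feature set, but does not say how the derivatives are computed or how the streams are combined. The sketch below is one plausible reading, not the authors' implementation: it assumes the standard regression formula for deltas (window half-width 2, edges padded by repeating the first and last frame) and plain frame-level concatenation for fusion. The function names `delta` and `fuse` and all numeric values are hypothetical.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based time derivative of a feature sequence.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond either end are replaced by the first/last frame (assumption).
    """
    if not frames:
        return []
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: List[Frame] = []
    for t in range(n_frames):
        row: Frame = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def fuse(mfcc: List[Frame], formants: List[Frame]) -> List[Frame]:
    """Concatenate MFCC, delta, delta-delta and formant values per frame.

    Simple concatenation is an assumption; the abstract does not specify
    how the feature streams are combined.
    """
    if len(mfcc) != len(formants):
        raise ValueError("mfcc and formants must have the same number of frames")
    d1 = delta(mfcc)   # first-order dynamics
    d2 = delta(d1)     # second-order dynamics (delta-delta)
    return [m + a + b + f for m, a, b, f in zip(mfcc, d1, d2, formants)]


if __name__ == "__main__":
    # Illustrative values only: 5 frames, 2 MFCCs and 3 formants (Hz) each.
    mfcc = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
    formants = [[700.0, 1200.0, 2500.0]] * 5
    fused = fuse(mfcc, formants)
    print(len(fused), len(fused[0]))
```

With 2 MFCCs and 3 formant values per frame, each fused vector has 2 × 3 + 3 = 9 entries. In practice the MFCCs and formants would come from an audio toolkit and a formant tracker rather than hand-written lists.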

CC BY-NC-ND 4.0

Paper citation in several formats:
Hosseini-Kivanani, N., Asadi, H. and Schommer, C. (2025). Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-730-6; ISSN 2184-4313, SciTePress, pages 665-672. DOI: 10.5220/0013189100003905

@conference{icpram25,
author={Nina Hosseini{-}Kivanani and Homa Asadi and Christoph Schommer},
title={Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2025},
pages={665-672},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013189100003905},
isbn={978-989-758-730-6},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Speaker Verification Enhancement via Speaking Rate Dynamics in Persian Speechprints
SN - 978-989-758-730-6
IS - 2184-4313
AU - Hosseini-Kivanani, N.
AU - Asadi, H.
AU - Schommer, C.
PY - 2025
SP - 665
EP - 672
DO - 10.5220/0013189100003905
PB - SciTePress