Pitch-synchronous Discrete Cosine Transform Features for Speaker Identification and Verification

Amit Meghanani, A. Ramakrishnan

2020

Abstract

We propose a feature called the pitch-synchronous discrete cosine transform (PS-DCT), derived from the voiced part of speech, for speaker identification (SID) and speaker verification (SV) tasks. PS-DCT features are derived from the ‘time-domain, quasi-stationary waveform shape’ of voiced sounds. We test the PS-DCT feature on the TIMIT, Mandarin and YOHO datasets. On TIMIT with 168 speakers and Mandarin with 855 speakers, we obtain SID accuracies of 99.4% and 96.1%, respectively, using a Gaussian mixture model-based classifier. In the i-vector-based SV framework, fusing the ‘PS-DCT-based system’ with the ‘MFCC-based system’ at the score level reduces the equal error rate (EER) for both the YOHO and Mandarin datasets. With limited test data and session variability, we obtain a significant reduction in EER, up to 5.8% (for test data of duration < 3 sec).
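
For readers unfamiliar with the evaluation terms used above, the following is a minimal sketch of how an equal error rate and a score-level fusion could be computed from verification trial scores. It is not taken from the paper: the function names `equal_error_rate` and `fuse_scores`, the simple threshold sweep, and the weighted-sum fusion rule with its `weight` parameter are all assumptions made for illustration, and the paper's actual fusion and scoring procedure may differ.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate the EER from two sets of trial scores.

    Assumes higher scores mean "more likely the same speaker".
    Hypothetical helper, not the paper's evaluation code.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # The EER is where the two error rates are (closest to) equal.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum fusion of two systems' scores (an assumed fusion rule)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b
```

In such a setup, the scores of the two systems for the same trials would be passed to `fuse_scores`, and the fused genuine and impostor scores would then go to `equal_error_rate`. In practice, the two score sets would usually need to be normalised to a comparable range before a weighted sum is meaningful.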


Paper Citation


in Harvard Style

Meghanani A. and Ramakrishnan A. (2020). Pitch-synchronous Discrete Cosine Transform Features for Speaker Identification and Verification. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-397-1, pages 395-401. DOI: 10.5220/0008911503950401


in Bibtex Style

@conference{icpram20,
author={Amit Meghanani and A. Ramakrishnan},
title={Pitch-synchronous Discrete Cosine Transform Features for Speaker Identification and Verification},
booktitle={Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2020},
pages={395-401},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008911503950401},
isbn={978-989-758-397-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Pitch-synchronous Discrete Cosine Transform Features for Speaker Identification and Verification
SN - 978-989-758-397-1
AU - Meghanani A.
AU - Ramakrishnan A.
PY - 2020
SP - 395
EP - 401
DO - 10.5220/0008911503950401