Authors:
Amit Meghanani
and
A. G. Ramakrishnan
Affiliation:
Indian Institute of Science, Bangalore, India
Keyword(s):
Pitch-synchronous, DCT, MFCC, Speaker Identification, Speaker Verification.
Abstract:
We propose a feature called pitch-synchronous discrete cosine transform (PS-DCT), derived from the voiced
segments of speech, for speaker identification (SID) and speaker verification (SV) tasks. PS-DCT features
capture the ‘time-domain, quasi-stationary waveform shape’ of voiced sounds. We evaluate the PS-DCT feature
on the TIMIT, Mandarin, and YOHO datasets. On TIMIT (168 speakers) and Mandarin (855 speakers), we obtain
SID accuracies of 99.4% and 96.1%, respectively, using a Gaussian mixture model-based classifier. In the
i-vector-based SV framework, fusing the PS-DCT-based system with the MFCC-based system at the score
level reduces the equal error rate (EER) on both the YOHO and Mandarin datasets. Under limited test data
and session variability, we obtain a significant reduction in EER, up to 5.8% (for test utterances shorter than
3 sec).
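To make the idea concrete, the following is a minimal illustrative sketch of a PS-DCT-style feature, not the authors' exact pipeline: one pitch period of a voiced waveform is energy-normalized and the first few DCT-II coefficients are kept, so the feature summarizes the coarse waveform shape. The function names, segment length, pitch value, and coefficient count here are all assumptions for illustration.

```python
# Hedged sketch of a PS-DCT-style feature (illustrative only; the paper's
# actual extraction details may differ).
import math

def dct2(x):
    """Naive DCT-II, O(N^2); adequate for a single short pitch period."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

def ps_dct_feature(period, num_coeffs=8):
    """Feature from one pitch-synchronous segment: energy-normalize the
    samples, then truncate the DCT to keep the coarse waveform shape."""
    energy = math.sqrt(sum(s * s for s in period)) or 1.0
    normed = [s / energy for s in period]
    return dct2(normed)[:num_coeffs]

# Toy input: one "pitch period" of a synthetic voiced-like waveform
# (fundamental plus one harmonic); fs and f0 are assumed values.
fs, f0 = 8000, 100
period = [math.sin(2 * math.pi * f0 * n / fs) +
          0.3 * math.sin(2 * math.pi * 2 * f0 * n / fs)
          for n in range(fs // f0)]     # 80 samples = one period at 100 Hz

feat = ps_dct_feature(period)
print(len(feat))                        # 8 coefficients per pitch period
```

In a full system, such per-period vectors would be pooled over all voiced regions of an utterance before being modeled by a GMM or i-vector back end.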