An Efficient Method for Making Un-supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-of-Domain Data

Thomas Plötz, Gernot A. Fink

2007

Abstract

Major aspects of cognitive science rely on natural language processing that uses automatic speech recognition (ASR) systems in scenarios of human-computer interaction. To improve the accuracy of such HMM-based ASR systems, efficient approaches for un-supervised adaptation are the methodology of choice. The recognition accuracy of speaker-specific recognition systems derived by on-line acoustic adaptation depends directly on the quality of the adaptation data actually used. It drops significantly if sample data outside the scope (lexicon, acoustic conditions) of the original recognizer, which generates the necessary annotation, is exploited without further analysis. In this paper we present an approach for fast and robust MLLR adaptation based on a rejection model that rapidly evaluates an alternative to existing confidence measures, so-called log-odd scores. These measures are computed as the ratio of the scores obtained from acoustic model evaluation to those produced by a reasonable background model. Using log-odd scores, threshold-based detection and rejection of improper adaptation samples, i.e. out-of-domain data, is realized. Experimental evaluations on two challenging tasks demonstrate the effectiveness of the proposed approach.
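
The rejection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that both models deliver per-frame log-likelihoods, so that the score ratio becomes a difference in the log domain, and that the score is averaged over the frames of a sample. The names `log_odds_score` and `split_adaptation_samples`, the length normalisation, and the threshold value are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# One adaptation sample: (identifier, per-frame log-likelihoods under the
# acoustic model, per-frame log-likelihoods under the background model).
# This layout is an assumption made for the sketch.
Sample = Tuple[str, Sequence[float], Sequence[float]]


def log_odds_score(model_loglik: Sequence[float],
                   background_loglik: Sequence[float]) -> float:
    """Mean per-frame difference of log-likelihoods.

    In the log domain the ratio p(x | model) / p(x | background) becomes
    a difference; averaging over frames (an assumption) makes samples of
    different length comparable.
    """
    if len(model_loglik) != len(background_loglik) or not model_loglik:
        raise ValueError("need two non-empty score sequences of equal length")
    diffs = [m - b for m, b in zip(model_loglik, background_loglik)]
    return sum(diffs) / len(diffs)


def split_adaptation_samples(samples: Sequence[Sample],
                             threshold: float) -> Tuple[List[str], List[str]]:
    """Return (accepted, rejected) sample identifiers.

    A sample is kept for adaptation only if its log-odds score reaches
    the threshold; everything else is treated as out-of-domain.
    """
    accepted: List[str] = []
    rejected: List[str] = []
    for sample_id, model_ll, background_ll in samples:
        if log_odds_score(model_ll, background_ll) >= threshold:
            accepted.append(sample_id)
        else:
            rejected.append(sample_id)
    return accepted, rejected


if __name__ == "__main__":
    # Toy numbers, invented purely to exercise the functions.
    toy_samples: List[Sample] = [
        ("in_domain", [-1.0, -2.0], [-2.0, -4.0]),
        ("out_of_domain", [-5.0, -6.0], [-4.0, -5.0]),
    ]
    print(split_adaptation_samples(toy_samples, threshold=0.0))
```

Only the accepted samples would then be passed on to the MLLR update. In practice the threshold would have to be tuned on held-out data; the value 0.0 above is a placeholder.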



Paper Citation


in Harvard Style

Plötz, T. and Fink, G. A. (2007). An Efficient Method for Making Un-supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-of-Domain Data. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007), ISBN 978-972-8865-97-9, pages 109-118. DOI: 10.5220/0002416701090118


in Bibtex Style

@conference{nlpcs07,
author={Thomas Plötz and Gernot A. Fink},
title={An Efficient Method for Making Un-supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-of-Domain Data},
booktitle={Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)},
year={2007},
pages={109-118},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002416701090118},
isbn={978-972-8865-97-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2007)
TI - An Efficient Method for Making Un-supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-of-Domain Data
SN - 978-972-8865-97-9
AU - Plötz, T.
AU - Fink, G. A.
PY - 2007
SP - 109
EP - 118
DO - 10.5220/0002416701090118