ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS

Nora Barroso; Karmele López de Ipiña; Aitzol Ezeiza

doi:10.5220/0003894105070516

2012

Abstract

Automatic Speech Recognition (ASR) is a classical application of multivariate statistical modelling that involves tasks such as Acoustic Modelling (AM) and Language Modelling (LM). These tasks are generally very language-dependent and require very large resources. This work focuses on selecting appropriate acoustic models for speech processing in a complex environment (a multilingual context under under-resourced and noisy conditions), oriented to general ASR tasks. The work was carried out with a small trilingual speech database of very low audio quality. To reduce the negative impact of the lack of resources on this task, two techniques were selected. On the one hand, Hidden Markov Models were enhanced using hybrid topologies and parameters as acoustic models of the sublexical units. On the other hand, an optimum configuration was developed for the Acoustic Phonetic Decoding system, based on the number of multivariate Gaussians and the insertion penalty.

Paper Citation

in Harvard Style

Barroso N., López de Ipiña K. and Ezeiza A. (2012). ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 507-516. DOI: 10.5220/0003894105070516

in Bibtex Style

@conference{mpbs12,
author={Nora Barroso and Karmele López de Ipiña and Aitzol Ezeiza},
title={ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012)},
year={2012},
pages={507-516},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003894105070516},
isbn={978-989-8425-89-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012)
TI - ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS
SN - 978-989-8425-89-8
AU - Barroso N.
AU - López de Ipiña K.
AU - Ezeiza A.
PY - 2012
SP - 507
EP - 516
DO - 10.5220/0003894105070516
