SESSION-INDEPENDENT EMG-BASED SPEECH RECOGNITION

Michael Wand, Tanja Schultz

2011

Abstract

This paper reports on our recent research in speech recognition by surface electromyography (EMG), which is the technology of recording the electric activation potentials of the human articulatory muscles by surface electrodes in order to recognize speech. This method can be used to create Silent Speech Interfaces, since the EMG signal is available even when no audible signal is transmitted or captured. Several past studies have shown that EMG signals may vary greatly between different recording sessions, even of one and the same speaker. This paper shows that session-independent training methods may be used to obtain robust EMG-based speech recognizers which cope well with unseen recording sessions as well as with speaking mode variations. Our best session-independent recognition system, trained on 280 utterances of 7 different sessions, achieves an average 21.93% Word Error Rate (WER) on a testing vocabulary of 108 words. The overall best session-adaptive recognition system, based on a session-independent system and adapted towards the test session with 40 adaptation sentences, achieves an average WER of 15.66%, which is a relative improvement of 21% compared to the baseline average WER of 19.96% of a session-dependent recognition system trained only on a single session of 40 sentences.
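The relative improvement quoted above can be checked from the two WER figures. The following is a minimal sketch, assuming the usual definition of relative improvement as the WER reduction divided by the baseline WER; the function name `relative_improvement` is hypothetical and does not come from the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER.

    Assumed definition: (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Average WER figures quoted in the abstract, in percent.
baseline = 19.96  # session-dependent system, one session of 40 sentences
adapted = 15.66   # session-adaptive system, 40 adaptation sentences

rel = relative_improvement(baseline, adapted)
print(f"absolute reduction: {baseline - adapted:.2f} percentage points")
print(f"relative improvement: {rel:.1%}")
```

Under this definition the reduction is 4.30 percentage points, or roughly 21.5% relative, which agrees with the 21% stated in the abstract.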



Paper Citation


in Harvard Style

Wand M. and Schultz T. (2011). SESSION-INDEPENDENT EMG-BASED SPEECH RECOGNITION. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2011) ISBN 978-989-8425-35-5, pages 295-300. DOI: 10.5220/0003169702950300


in Bibtex Style

@conference{biosignals11,
author={Michael Wand and Tanja Schultz},
title={SESSION-INDEPENDENT EMG-BASED SPEECH RECOGNITION},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2011)},
year={2011},
pages={295--300},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003169702950300},
isbn={978-989-8425-35-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2011)
TI - SESSION-INDEPENDENT EMG-BASED SPEECH RECOGNITION
SN - 978-989-8425-35-5
AU - Wand M.
AU - Schultz T.
PY - 2011
SP - 295
EP - 300
DO - 10.5220/0003169702950300