Direct Speech Generation for a Silent Speech Interface based on Permanent Magnet Articulography

Jose A. Gonzalez; Lam A. Cheah; James M. Gilbert; Jie Bai; Stephen R. Ell; Phil D. Green; Roger K. Moore

doi:10.5220/0005754100960105

2016

Abstract

Patients with larynx cancer often lose their voice following total laryngectomy. Current methods for post-laryngectomy voice restoration are all unsatisfactory, each for different reasons: the tracheo-oesophageal valve requires frequent replacement due to biofilm growth; oesophageal speech sounds gruff and masculine; the electro-larynx sounds robotic; and both oesophageal speech and the electro-larynx are difficult to master. In this work we investigate an alternative approach to voice restoration in which speech articulator movement is converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of articulatory and audio signals. To capture articulator movement, small magnets are attached to the speech articulators, and the magnetic field generated while the user 'mouths' words is captured by a set of sensors. Parallel data comprising articulatory and acoustic signals recorded before laryngectomy are used to learn the mapping between the articulatory and acoustic domains, which is represented in this work as a mixture of factor analysers. After laryngectomy, the learned transformation is used to restore the patient's voice by transforming the captured articulator movement into an audible speech signal. Results reported for normal speakers show that the proposed system is very promising.
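
To make the articulatory-to-acoustic mapping concrete, the following is a minimal sketch of how a mixture of factor analysers over joint (articulatory, acoustic) features could be used for conversion. It is a toy illustration, not the authors' implementation: it assumes scalar features, a single latent factor per component, hand-set parameters (training by EM is omitted), and conversion by the conditional expectation E[y | x], none of which is specified in the abstract. The names `FactorAnalyser` and `convert` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FactorAnalyser:
    """One mixture component over the joint vector (x, y).

    x is a scalar articulatory feature, y a scalar acoustic feature.
    With one latent factor, the joint covariance is
    lam * lam^T + diag(psi), so var(x) = lam_x^2 + psi_x and
    cov(x, y) = lam_x * lam_y. The acoustic noise variance psi_y
    is not needed for the conditional mean and is left out.
    """
    weight: float  # mixture prior of this component
    mu_x: float    # mean of the articulatory feature
    mu_y: float    # mean of the acoustic feature
    lam_x: float   # factor loading for x
    lam_y: float   # factor loading for y
    psi_x: float   # sensor noise variance for x (must be > 0)

    def var_x(self) -> float:
        return self.lam_x ** 2 + self.psi_x

    def cov_xy(self) -> float:
        return self.lam_x * self.lam_y


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def convert(x: float, components: list) -> float:
    """Estimate the acoustic feature as E[y | x] under the mixture.

    Each component contributes its linear conditional mean, weighted
    by the posterior probability of that component given x.
    """
    likes = [c.weight * gaussian_pdf(x, c.mu_x, c.var_x()) for c in components]
    total = sum(likes)
    y_hat = 0.0
    for c, like in zip(components, likes):
        cond_mean = c.mu_y + c.cov_xy() / c.var_x() * (x - c.mu_x)
        y_hat += (like / total) * cond_mean
    return y_hat
```

In a realistic system x and y would be feature vectors (sensor channels and speech parameters), the loadings would be matrices, and the parameters would be estimated from the parallel pre-laryngectomy recordings; the weighted sum of per-component linear predictions keeps the same form.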

Paper Citation

in Harvard Style

Gonzalez J., Cheah L., Gilbert J., Bai J., Ell S., Green P. and Moore R. (2016). Direct Speech Generation for a Silent Speech Interface based on Permanent Magnet Articulography. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 96-105. DOI: 10.5220/0005754100960105

in Bibtex Style

@conference{biosignals16,
author={Jose A. Gonzalez and Lam A. Cheah and James M. Gilbert and Jie Bai and Stephen R. Ell and Phil D. Green and Roger K. Moore},
title={Direct Speech Generation for a Silent Speech Interface based on Permanent Magnet Articulography},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2016)},
year={2016},
pages={96-105},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005754100960105},
isbn={978-989-758-170-0},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2016)
TI - Direct Speech Generation for a Silent Speech Interface based on Permanent Magnet Articulography
SN - 978-989-758-170-0
AU - Gonzalez J.
AU - Cheah L.
AU - Gilbert J.
AU - Bai J.
AU - Ell S.
AU - Green P.
AU - Moore R.
PY - 2016
SP - 96
EP - 105
DO - 10.5220/0005754100960105