
Komeiji, S., Shigemi, K., Mitsuhashi, T., Iimura, Y., Suzuki, H., Sugano, H., Shinoda, K., and Tanaka, T. (2022). Transformer-based estimation of spoken sentences using electrocorticography. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1311–1315. IEEE.
Komeiji, S. and Tanaka, T. (2019). A language model-based design of reduced phoneme set for acoustic model. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pages 192–197.
Lu, Y. and Morgan, J. L. (2020). Homophone auditory processing in cross-linguistic perspective. Proceedings of the Linguistic Society of America, 5(1):529–542.
Luo, S., Rabbani, Q., and Crone, N. E. (2023). Brain-computer interface: applications to speech decoding and synthesis to augment communication. Neurotherapeutics, 19(1):263–273.
Mak, B. and Barnard, E. (1996). Phone clustering using the Bhattacharyya distance. In Fourth International Conference on Spoken Language Processing, volume 4, pages 2005–2008.
Makin, J. G., Moses, D. A., and Chang, E. F. (2020). Machine translation of cortical activity to text with an encoder–decoder framework. Nature Neuroscience, 23(4):575–582.
Martin, S., Brunner, P., Iturrate, I., Millán, J. d. R., Schalk, G., Knight, R. T., and Pasley, B. N. (2016). Word pair classification during imagined speech using direct brain recordings. Scientific Reports, 6:25803.
Mohri, M., Pereira, F., and Riley, M. (2002). Weighted finite-state transducers in speech recognition. Computer Speech & Language, 16(1):69–88.
Moriya, T., Tanaka, T., Shinozaki, T., Watanabe, S., and Duh, K. (2015). Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 610–616.
Moses, D. A., Leonard, M. K., and Chang, E. F. (2018). Real-time classification of auditory sentences using evoked cortical activity in humans. Journal of Neural Engineering, 15(3):036005.
Moses, D. A., Mesgarani, N., Leonard, M. K., and Chang, E. F. (2016). Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity. Journal of Neural Engineering, 13(5):056004.
Oh, D., Park, J.-S., Kim, J.-H., and Jang, G.-J. (2021). Hierarchical phoneme classification for improved speech recognition. Applied Sciences, 11(1):428.
Peddinti, V., Povey, D., and Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In Sixteenth Annual Conference of the International Speech Communication Association.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., et al. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, number EPFL-CONF-192584.
Povey, D., Peddinti, V., Galvez, D., Ghahremani, P., Manohar, V., Na, X., Wang, Y., and Khudanpur, S. (2016). Purely sequence-trained neural networks for ASR based on lattice-free MMI. In Interspeech, pages 2751–2755.
Proix, T., Delgado Saa, J., Christen, A., Martin, S., Pasley, B. N., Knight, R. T., Tian, X., Poeppel, D., Doyle, W. K., Devinsky, O., et al. (2022). Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nature Communications, 13(1):48.
Sivasankaran, S., Srivastava, B. M. L., Sitaram, S., Bali, K., and Choudhury, M. (2018). Phone merging for code-switched speech recognition. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, pages 11–19.
Sun, P., Anumanchipalli, G. K., and Chang, E. F. (2020). Brain2Char: a deep architecture for decoding text from brain recordings. Journal of Neural Engineering, 17(6):066015.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Vazhenina, D. and Markov, K. (2011). Phoneme set selection for Russian speech recognition. In 2011 7th International Conference on Natural Language Processing and Knowledge Engineering, pages 475–478. IEEE.
Wang, X., Zhang, J.-S., Nishida, M., and Yamamoto, S. (2014). Phoneme set design using English speech database by Japanese for dialogue-based English CALL systems. In LREC, pages 3948–3951.
Willett, F. R., Kunz, E. M., Fan, C., Avansino, D. T., Wilson, G. H., Choi, E. Y., Kamdar, F., Glasser, M. F., Hochberg, L. R., Druckmann, S., et al. (2023). A high-performance speech neuroprosthesis. Nature, 620(7976):1031–1036.
Toward Designing a Reduced Phone Set Using Text Decoding Accuracy Estimates in Speech BCI