Preliminary Evaluation of a Silent Speech Interface based on

Intra-Oral Magnetic Sensing

Lam A. Cheah, Jie Bai, Jose A. Gonzalez, James M. Gilbert, Stephen R. Ell, Phil D. Green and Roger K. Moore

School of Engineering, The University of Hull, Kingston upon Hull, U.K.

Department of Computer Science, The University of Sheffield, Sheffield, U.K.

Hull and East Yorkshire Hospitals Trust, Castle Hill Hospital, Cottingham, U.K.

Keywords: Assistive Technology, Silent Speech Interface, Permanent Magnet Articulography, Intraoral Magnetic

Sensing.

Abstract: This paper addresses the hardware challenges faced in developing a practical silent speech interface (SSI)

for post-laryngectomy speech rehabilitation. Although a number of SSIs have been developed, many are

still deemed impractical due to a high degree of intrusiveness and discomfort, which limits their

transition out of the laboratory environment. The aim of this paper is to build upon our previous

work by developing a user-centric prototype and enhancing its desirable features. A new Permanent Magnet

Articulography (PMA) system is presented which fits within the palatal cavity of the user’s mouth, giving an

unobtrusive appearance and high portability. The prototype comprises a miniaturised circuit

constructed using commercial off-the-shelf (COTS) components and is implemented in the form of a dental

retainer, which is mounted under the roof of the user’s mouth and firmly clasps onto the upper teeth.

Preliminary evaluation via speech recognition experiments demonstrates that the intraoral prototype

achieves word recognition accuracy of 75.7%, slightly lower than its predecessor. Nonetheless, the intraoral

design is expected to improve the stability and robustness of the PMA system with a much improved

appearance since it can be completely hidden inside the user’s mouth.

1 INTRODUCTION

Speech is a key capability and the most natural form

of communication of human beings. However, there

are a variety of situations in which people wish to

communicate orally but where normal speech can be

either impossible or undesirable. For instance,

people with speech impairments who have

undergone laryngectomy: the surgical removal of the

larynx as part of treatment for cancer or other

diseases affecting the vocal cords. These post-

laryngectomy patients, who have lost their voice,

often find themselves struggling with their daily

communication and may experience a severe impact

on their quality of life (Braz et al., 2005; Fagan et

al., 2008). However, there are currently only a

limited number of post-laryngectomy voice

restoration methods available for these individuals:

oesophageal speech, the electrolarynx and speech

valves. Unfortunately, these methods are often

limited by their usability and abnormal voice quality

(Fagan et al., 2008; Gilbert et al., 2010), whereas

typing-based augmentative and alternative

communication (AAC) devices are limited by slow

manual text input (Wang et al., 2012). Although

some improvements have been achieved in terms of

voicing quality of the electrolarynx and oesophageal

speech (Doi et al., 2010; Toda et al., 2012),

emerging assistive technologies (ATs) such as silent

speech interfaces (SSIs) have shown promising

potential in recent years as an alternative solution.

In principle, SSIs are devices that enable speech

communication to take place in the absence of

audible acoustic signals (Denby et al., 2010). Hence,

aside from use as a communication aid for speech

impaired individuals, SSIs can also be deployed in

acoustically challenging environments or where

privacy/confidentiality is desirable. To date, a

number of SSIs have been proposed in an attempt to

extract non-acoustic information generated during

speech production and reproduce audible speech

using different sensing modalities. A comprehensive


Cheah, L., Bai, J., Gonzalez, J., Gilbert, J., Ell, S., Green, P. and Moore, R.

Preliminary Evaluation of a Silent Speech Interface based on Intra-Oral Magnetic Sensing.

DOI: 10.5220/0005824501080116

In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016) - Volume 1: BIODEVICES, pages 108-116

ISBN: 978-989-758-170-0

summary of different SSI technologies was

presented in (Denby et al., 2010). Permanent magnet

articulography (PMA) is a type of SSI that is based on

sensing the changes in the magnetic field generated

by a set of permanent magnet markers attached onto

the vocal apparatus (i.e. lips and tongue) during

speech articulation by using an array of magnetic

sensors located around the mouth (Fagan et al.,

2008; Gilbert et al., 2010), which shares some

similarities with the electromagnetic articulography

(EMA) (Toutios and Margaritis, 2005; Toda et al.,

2008; Wang et al., 2012). In contrast to EMA, PMA

does not explicitly provide the position of the

markers, but rather a summation of the magnetic

fields from magnets that are associated with a

particular articulatory gesture. Previous work

(Gilbert et al., 2010; Hofe et al., 2013a, 2013b;

Cheah et al., 2015) demonstrated the possibility of

performing automatic speech recognition (ASR) on

PMA data.

Despite the attractive attributes of SSIs, two

major challenges of building an effective SSI exist

in the form of hardware and processing software.

Preliminary discussions on the influential factors

(e.g. invasiveness, market readiness, potential

cost, etc.) affecting the implementation of SSIs

were presented in (Denby et al., 2010). Earlier

PMA-based prototypes (Gilbert et al., 2010; Hofe et

al., 2013b) showed acceptable speech recognition

performance, but were not particularly satisfactory

in terms of their appearance, comfort and

ergonomic factors for the users. To address these

hardware challenges, a PMA prototype in the form

of a wearable headset (design based on a customised

pair of spectacles or a headband) comprising

miniaturised sensing modules and wireless

capability was developed (Cheah et al., 2015). The

second generation prototype was re-designed based

on a user-centred approach through utilising

feedback from user questionnaires and through

discussion with stakeholders including clinicians,

potential users and their families. The appearance

and comfort of the prototype were much improved

and it demonstrated comparable performance to its

predecessors.

As illustrated in figure 1, the second generation

PMA system consists of a set of six cylindrical

Neodymium Iron Boron (NdFeB) permanent

magnets, four on the lips (ø1 mm × 5 mm), one at

the tongue tip (ø2 mm × 4 mm) and one on the

tongue blade (ø5 mm × 1 mm). These magnets are

currently attached using Histoacryl surgical tissue

adhesive (Braun, Melsungen, Germany) during

experimental trials, but will be surgically implanted

for long term usage. The remainder of the PMA

system is composed of a set of four tri-axial

Anisotropic Magnetoresistive (AMR) magnetic

sensors mounted on the wearable headset, a set of

microcontrollers, a rechargeable battery and a

processing unit (e.g. computer/tablet PC). Although

the prototype has many desirable hardware features,

it is not without limitations.

Figure 1: (a) A wearable PMA prototype designed in the

form of spectacles. (b) & (c) Placement of six magnets on the

lips (pellets 1-4), tongue tip (pellet 5) and tongue blade

(pellet 6).

The present work builds upon the work of

(Cheah et al., 2015) to further improve the design and alleviate

its shortcomings from a hardware perspective. The

proposed prototype has several distinctive features,

such as being miniature in size, highly portable,

discreet and unobtrusive since it is hidden from sight

within the user’s mouth. The remainder of this paper

is structured as follows. Section 2 outlines the design

challenges of the intraoral version of the PMA

device. Section 3 describes the architecture of the

intraoral PMA prototype, followed by the

performance evaluation in Section 4. The final

section concludes this paper and provides an outlook

for future work.

2 DESIGN CHALLENGES

Despite the improvements made in the second

generation PMA prototype, it also has drawbacks.

Firstly, the performance of the external headset

cannot be maintained in certain real-life conditions

(e.g. exaggerated movement or sports activity) due to

issues with instability. If there is a considerable

movement of the headset on the user’s head, the

PMA system may need re-calibration/re-training to

avoid degradation in performance.

Secondly, wearing the headset over long periods

may not be comfortable, despite the fact that the


Figure 2: Simplified operation block diagram.

device was designed to be lightweight and

ergonomically friendly. Thirdly, the external version

of the PMA device may still be cosmetically

unacceptable to some users. Previous studies

indicated that appearance is one of the most

important factors affecting the acceptability of any

AT to its potential end users (Hirsch et al., 2000;

Martin et al., 2006; Bright and Conventry, 2013).

To overcome these limitations, an intraoral

version of the PMA prototype, which fits under the

palate inside the user’s mouth in the form of a dental

retainer, was proposed. Being tightly clamped onto

the upper teeth means that the device would be more

stable than the previous wearable headset. Due to the

fact that the device is completely hidden from sight

during normal use, it is cosmetically inconspicuous.

In addition, since the sensors are much closer to the

articulators than the external headset, the size of the

implants can be significantly reduced. Similar

intraoral-based designs have been previously

implemented for other non-speech related ATs with

varying degrees of success (Tang and Beebe, 2006;

Lontis et al., 2010; Park et al., 2012).

3 SYSTEM DESCRIPTION

3.1 Space Budget

The intraoral circuitry necessary to implement a

PMA system is made up of: three tri-axial magnetic

sensors, a wireless communication module, a

microprocessor to synchronise data capture and

communications and a suitable power source

capable of providing an appropriate operating

lifetime. This must be accommodated within the oral

cavity, without interfering with the natural tongue

articulation during speech. A recent study (Bai et al.,

2015) suggested that the palatal cavity is suitable to

house the intraoral circuitry because of its relatively

flat surfaces and proximity to the articulators. The

estimated space available in the palatal cavity on our

test subject is 59.7mm

3.2 Description of the Intraoral

Circuitry

In order to fit all necessary circuitry inside the

mouth, the size of the electronics and rechargeable

battery of the external version of the PMA prototype

had to be reduced. The major components of

the PMA prototype are shown in figure 2. These are

implemented using a low-power ATmega328P

microcontroller, three tri-axial HMC5883L magnetic

sensors (AMR), a rechargeable Li-Ion coin battery

(capacity of 40 mAh, 3.7 V and 20 mm diameter ×

3.2 mm thickness), and a wireless transceiver

(Bluetooth 2.0 module). The remainder of the

system shown in figure 3 consists of a processing

unit (e.g. computer/tablet PC) and a set of six

permanent magnets (NdFeB) attached to the lips and

tongue in the same locations as illustrated in figure

1. The elements of the intraoral sensing system

(which have a total volume of 36.7 mm

) are

arranged as shown in figure 3(a). These may be

encapsulated and placed in the oral cavity as shown

in figure 3(d).

The positions of the magnets remained

unchanged from the earlier prototype but because of

the proximity of the sensors, significantly smaller

magnets (see Figure 3c) can be used: four on the lips (ø1

mm × 4 mm), one on the tongue tip (ø1 mm × 1

mm) and one on the tongue blade (ø1 mm × 1 mm).

Note that the magnetic field strength decreases with

the cube of the distance away from the magnets.
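The inverse-cube relationship noted above can be illustrated with a short calculation. This is only a sketch under the far-field dipole approximation (field magnitude proportional to magnetic moment divided by distance cubed, with moment proportional to magnet volume); the function names and the example distances are hypothetical and are not measurements from the prototype.

```python
def relative_field(distance_mm: float, reference_mm: float) -> float:
    """Field magnitude at distance_mm relative to the field at reference_mm.

    Assumes the far-field dipole law B ~ 1 / r**3 (an approximation that
    degrades when the sensor is very close to the magnet).
    """
    if distance_mm <= 0 or reference_mm <= 0:
        raise ValueError("distances must be positive")
    return (reference_mm / distance_mm) ** 3


def volume_fraction_for_same_field(near_mm: float, far_mm: float) -> float:
    """Fraction of magnet volume needed at near_mm to match the field that
    the full-size magnet produces at far_mm (moment assumed ~ volume)."""
    if near_mm <= 0 or far_mm <= 0:
        raise ValueError("distances must be positive")
    return (near_mm / far_mm) ** 3


if __name__ == "__main__":
    # Hypothetical distances: halving the magnet-to-sensor distance.
    print(relative_field(20.0, 40.0))                   # 8x stronger field
    print(volume_fraction_for_same_field(20.0, 40.0))   # 1/8 of the volume
```

Under these assumptions, halving the magnet-to-sensor distance allows roughly an eightfold reduction in magnet volume for the same field at the sensor, which is consistent in spirit with the much smaller magnets used here.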

BIODEVICES 2016 - 9th International Conference on Biomedical Electronics and Devices


Figure 3: (a) & (b) Circuitry of the intraoral version of the

PMA system. (c) Placement of magnets on lips (pellets 1-

4), tongue tip (pellet 5) and tongue blade (pellet 6). (d)

View of the device when worn by user.

3.3 Circuit Operation

Figure 2 shows an operational block diagram of the

intraoral version of the PMA system. A command is

sent wirelessly from the processing unit to the

intraoral sensing module via Bluetooth to trigger

data acquisition. All three tri-axial magnetic sensors

then measure the magnetic field and digitize it with

12-bit resolution. The microcontroller acquires these

measurements (9 PMA channels sampled at 80 Hz)

by driving a multiplexer with three control

signals (S0, S1 and SCL). The multiplexer acts as a

switching device to route the serial clock (SCL) to

the desired magnetic sensor through the I²C

interface. The acquired samples are then transmitted

back to the processing unit wirelessly via the Bluetooth

transceiver and a custom-designed Bluetooth dongle

(Figure 3(b)) for further processing. Unlike the

external version of the PMA prototype, the intraoral

device is restricted to only operate wirelessly from

inside the mouth. Wired connectivity is impossible,

as the sensing modules are to be sealed and

packaged inside a dental retainer. A description of

the operational and timing diagrams of the sub-

modules is presented in (Bai et al., 2015).
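The acquisition scheme described above can be sketched as follows. The mapping from sensor index to the select lines S1 and S0 is an assumption made for illustration (the actual wiring is given in Bai et al., 2015), as is the choice of two bytes per axis reading; the 57.6 kbps link rate is the figure listed in table 2. Framing and protocol overhead are ignored.

```python
from typing import Tuple

N_SENSORS = 3            # tri-axial sensors in the intraoral device
AXES_PER_SENSOR = 3
SAMPLE_RATE_HZ = 80      # per-channel sampling rate reported in the paper
BYTES_PER_AXIS = 2       # assumption: each 12-bit reading sent as 16 bits
LINK_RATE_BPS = 57_600   # Bluetooth serial data rate from table 2


def mux_select(sensor_index: int) -> Tuple[int, int]:
    """Return hypothetical (S1, S0) levels that route SCL to one sensor."""
    if not 0 <= sensor_index < N_SENSORS:
        raise ValueError("sensor_index must be 0, 1 or 2")
    return (sensor_index >> 1) & 1, sensor_index & 1


def payload_bps() -> int:
    """Raw sensor payload in bits per second, excluding any framing."""
    channels = N_SENSORS * AXES_PER_SENSOR   # 9 PMA channels
    return channels * BYTES_PER_AXIS * 8 * SAMPLE_RATE_HZ


def link_utilisation() -> float:
    """Fraction of the serial link rate taken up by the raw payload."""
    return payload_bps() / LINK_RATE_BPS


if __name__ == "__main__":
    for idx in range(N_SENSORS):
        print(idx, mux_select(idx))
    print(payload_bps(), link_utilisation())
```

Under these assumptions the raw payload occupies about a fifth of the link rate, which suggests the reduced data rate leaves headroom for framing overhead.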

In terms of software, an ad-hoc MATLAB-based

graphical user interface (GUI) developed in (Cheah

et al., 2015) was adapted, where all speech

processing and recognition algorithms were

embedded. During silent speech recognition, if the

acquired PMA signal is correctly matched to an

articulated gesture from the training database, the

corresponding utterance is identified. A text-to-

speech (TTS) synthesiser is then used to generate an

acoustic signal (e.g. using the individual’s own pre-recorded

voice) for the recognised utterance through built-in

speakers.

3.4 Power Budget

Since the circuitry is to be sealed into a dental

retainer, the intraoral device can only acquire power

from a battery. Given the limited space available, a

small, low capacity battery must be used (in the

current design, the battery takes 27% of the total

volume of the circuitry). In addition, any measures

to extend the battery life will be of interest. Power

hungry components such as the microcontroller, the

magnetic sensors and the Bluetooth may be set to

standby mode or sleep mode to reduce the current

consumption when they are inactive. As shown in

table 1, sleep mode gives a saving of 93% over

standby mode or a saving of 97% over active mode.

Table 1: Current consumption in different operational

modes.

Current consumption    Active mode (mA)    Standby mode (mA)    Sleep mode (mA)
Sensors                5.1                 0.006                0.006
Microcontroller        5.4                 4.4                  0.7
Bluetooth              19.0                7.22                 0.07
Total                  29.5                11.626               0.776
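The totals in table 1 and the quoted savings of 93% and 97% can be checked with a few lines of arithmetic. The sketch below simply re-computes them from the per-component figures in the table; the dictionary layout and function names are illustrative choices.

```python
# Per-component current draw in mA, copied from table 1.
CURRENT_MA = {
    "sensors":         {"active": 5.1,  "standby": 0.006, "sleep": 0.006},
    "microcontroller": {"active": 5.4,  "standby": 4.4,   "sleep": 0.7},
    "bluetooth":       {"active": 19.0, "standby": 7.22,  "sleep": 0.07},
}


def total_current(mode: str) -> float:
    """Sum the current drawn by all components in the given mode (mA)."""
    return sum(component[mode] for component in CURRENT_MA.values())


def saving_percent(from_mode: str, to_mode: str) -> float:
    """Percentage reduction in total current when switching modes."""
    return 100.0 * (1.0 - total_current(to_mode) / total_current(from_mode))


if __name__ == "__main__":
    for mode in ("active", "standby", "sleep"):
        print(mode, round(total_current(mode), 3))
    print(round(saving_percent("standby", "sleep")))  # sleep vs standby
    print(round(saving_percent("active", "sleep")))   # sleep vs active
```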

Figure 4: Battery discharging over time under active mode

and sleep mode.

If the system is to operate continuously (in active

mode), the battery will last approximately one hour

before being depleted below the minimum operating

voltage (cut-off voltage) required by the Bluetooth

module of 2.1V. The battery can then be recharged

through a charging point located on the underside

of the dental retainer. In contrast, if the system were

inactive at all times (in sleep mode), the battery

would last about 32 hours. Figure 4 shows a


summary of the discharging cycle of the battery in

different modes. Neither of these operating regimes

is fully representative of the expected use since they

correspond to continuous speech and no speech

respectively. Based on the measurements in table 1

and figure 4, a more realistic regime would be to

allow 30 minutes of speech with a further 16 hours

in sleep mode. Hence, the estimated usage time is

considered to be sufficient for a typical day before

charging is required.
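One way to read this estimate is as a simple linear depletion model: each hour spent in a mode consumes a fixed fraction of the battery, set by the measured lifetime in that mode (about one hour active, about 32 hours asleep). The sketch below works under that assumption only; the model and the function names are not taken from the paper.

```python
ACTIVE_LIFETIME_H = 1.0    # measured lifetime in continuous active mode
SLEEP_LIFETIME_H = 32.0    # measured lifetime in continuous sleep mode


def battery_fraction_used(active_h: float, sleep_h: float) -> float:
    """Fraction of a full charge consumed by a mixed usage pattern,
    assuming depletion is linear in the time spent in each mode."""
    return active_h / ACTIVE_LIFETIME_H + sleep_h / SLEEP_LIFETIME_H


def max_sleep_hours(active_h: float) -> float:
    """Sleep time still available after active_h hours of speech."""
    remaining = 1.0 - active_h / ACTIVE_LIFETIME_H
    if remaining < 0:
        raise ValueError("active time exceeds the active-mode lifetime")
    return remaining * SLEEP_LIFETIME_H


if __name__ == "__main__":
    print(max_sleep_hours(0.5))              # sleep hours left after 30 min
    print(battery_fraction_used(0.5, 16.0))  # fraction of charge consumed
```

With 30 minutes of speech, this model leaves 16 hours of sleep time, which matches the regime quoted above.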

3.5 Construction of the Dental Retainer

The circuit described in the previous section must be

encapsulated to protect it from damage and short

circuits due to saliva and to ensure it is held in place

within the palate. The retainer must be customised

according to the individual’s oral anatomy. This may

be achieved by forming it on a dental impression of

the user’s oral cavity (seen in the background of

figure 3a). The intraoral PMA prototype was

implemented in the form of dental retainers utilising

both hard and soft materials, as illustrated in figure 5.

Figure 5: PMA circuitry embedded inside (a) a Hawley

dental retainer and (b) a soft, bite-raiser-like dental retainer.

The hard retainer is similar to a Hawley retainer

and is made of dental acrylic resin, which is

commonly used in fabrication of orthodontic

appliances. On the other hand, the soft retainer is

similar to a soft bite raising appliance and is made of

polypropylene or polyvinylchloride (PVC) material.

To allow stable fitting in the palate, the Hawley

retainer utilises a set of ball clasps to tightly secure it

onto the upper teeth. In contrast, the soft bite raiser

is fitted over the entire arch of the upper teeth. Note

that only the soft retainer was used in the preliminary

experiments, but similar performance is expected

from the hard retainer.

4 PERFORMANCE EVALUATION

4.1 Experimental Design and Setup

The PMA-based SSIs (both intraoral and external

versions) are speaker-dependent systems because

their designs need to be individually tailored based

on the speaker’s head or oral anatomy for optimal

performance. The data used for evaluating the new

intraoral prototype were collected from a male

native English speaker who is proficient in the usage

of the external PMA device. Magnets were

temporarily attached to the subject using Histoacryl

surgical tissue adhesive (Braun, Melsungen,

Germany).

Recordings of PMA and audio data for training

and evaluation were performed using a bespoke

MATLAB-based GUI. The software provides a visual

prompt of randomised utterances to the subject at

intervals of 5 seconds during the training session. The

subject’s head was not restrained during the

recording sessions, but the subject was requested to

avoid any large head movements. This was

necessary to ensure that interference induced by

movement relative to the Earth’s magnetic field was kept to

a minimum, so that it did not corrupt or distort the

desired signal. This is because the current prototype

is not yet equipped with a non-articulatory

cancellation/removal mechanism.

For optimal sound quality, the recordings were

conducted in an acoustically isolated room. The

audio data were recorded using a shock-mounted

AKG C1000S condenser microphone via a dedicated

stereo USB-sound card (Lexicon Lambda) to a PC,

with a 16 kHz sampling rate. Meanwhile, the PMA

data were captured at a sampling frequency of 80 Hz

via the intraoral PMA device and transmitted to the

same PC wirelessly via Bluetooth, as illustrated in

figure 2. Since the two data streams (PMA and audio) are

acquired from separate modalities, synchronisation

between the two data streams is necessary. Hence,

an automatic timing re-alignment mechanism was

implemented utilising start-stop markers generated

in addition to both data streams.

4.2 Data Recording

Our long term goal is to explore the feasibility of

using the intraoral device for continuous speech

reconstruction. For preliminary testing, the TIDigits

database (Leonard, 1984) was selected because the

limited size of the vocabulary enables whole-word

model training from relatively sparse data and

because of the simplicity of the language involved.


The corpus consists of sequences of connected

English digits with up to seven digits per utterance.

The vocabulary is made up of eleven individual

digits, i.e. from ‘one’ to ‘nine’, plus ‘zero’ and ‘oh’

(both representing digit 0).

The experimental data were collected from two

independent sessions, with each session consisting of

four datasets containing 77 sentences each. A total

of 308 utterances containing 1012 individual digits

were recorded during each session. To prevent

subject fatigue, short breaks in between each

recording session were allowed.

4.3 HMM Training and Recognition

Prior to the training and recognition processes, the

acquired PMA data were segmented and checked

using the audio data. Inappropriate endpoints were

manually corrected if necessary. In addition, any

mis-labelled utterances were corrected using the

acquired audio data.

The PMA data were then subjected to offset

removal via median subtraction over 2 s windows

with 50% overlap, followed by data

normalization. Next, the delta parameters were

computed for all PMA channels and appended to the

original time series data, resulting in a feature vector

of size 18. The delta-delta parameters were not

included as part of the feature vector as they did not

produce a significant improvement in performance

(Hofe et al., 2013a, 2013b). The recognition

performance based on the audio data was also

evaluated for comparison purposes. In this case, 13

Mel-frequency cepstral coefficients (MFCCs) were

extracted from the audio signals using 25 ms analysis

windows with 10 ms overlap. Next, the delta and

delta-delta parameters were computed and appended

to the static parameters, resulting in a feature vector

of dimension 39.
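The PMA feature pipeline described above (offset removal, normalization, delta computation) can be sketched in plain Python. Several details are assumptions, since the text does not specify them: the medians of overlapping windows are averaged for each sample, normalization is taken to be per-channel zero mean and unit variance, and the deltas use an HTK-style regression formula with a window of two frames and edge replication. All function names are illustrative.

```python
from statistics import mean, median, pstdev

RATE_HZ = 80  # PMA sampling rate reported in the paper


def remove_offset(channel, window_s=2.0):
    """Subtract a median offset estimated on 50%-overlapping windows.
    Assumption: a sample covered by two windows gets the mean of both medians."""
    n = len(channel)
    if n == 0:
        return []
    win = max(2, int(RATE_HZ * window_s))
    hop = win // 2
    sums, counts = [0.0] * n, [0] * n
    start = 0
    while True:
        stop = min(start + win, n)
        m = median(channel[start:stop])
        for i in range(start, stop):
            sums[i] += m
            counts[i] += 1
        if stop == n:
            break
        start += hop
    return [x - s / c for x, s, c in zip(channel, sums, counts)]


def normalise(channel):
    """Zero-mean, unit-variance scaling (assumed form of normalization)."""
    mu, sd = mean(channel), pstdev(channel)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in channel]


def delta(channel, theta=2):
    """HTK-style regression deltas; the window size theta is an assumption."""
    n = len(channel)
    denom = 2 * sum(k * k for k in range(1, theta + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, theta + 1):
            ahead = channel[min(t + k, n - 1)]   # replicate last frame
            behind = channel[max(t - k, 0)]      # replicate first frame
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


def pma_features(frames):
    """Map a list of 9-value PMA samples to 18-value feature vectors."""
    channels = [list(c) for c in zip(*frames)]
    static = [normalise(remove_offset(c)) for c in channels]
    dynamic = [delta(c) for c in static]
    return [list(row) for row in zip(*(static + dynamic))]
```

Calling `pma_features` on a recording with 9 values per frame returns one 18-dimensional vector per frame, matching the SensorD feature size used in the experiments.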

The extracted PMA and audio features were used

for training two independent speech recognisers

using the HTK toolkit (Young et al., 2009). In both

cases, the acoustic model in the recogniser uses

whole-word Hidden Markov Models (HMMs)

(Rabiner, 1989) for each of the eleven digits. Each

HMM has 21 states and 5 Gaussians per state. The

selected parameters were not optimised, but were

known to perform well in previous

work (Hofe et al., 2013a, 2013b). The HMM

training and recognition were carried out in four

validation cycles. In each cycle, three out of four

sets within a session were used for training and the

remaining one for testing. The recognition results

were averaged over four cycles and across two

independent sessions.
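The validation scheme above amounts to leave-one-set-out cross-validation within each session, followed by averaging. A minimal sketch is given below; the function names are illustrative, and the recogniser itself (HTK training and decoding) is deliberately left out.

```python
def leave_one_set_out(set_ids):
    """Yield (training_sets, test_set) pairs, holding out each set once."""
    for held_out in set_ids:
        training = [s for s in set_ids if s != held_out]
        yield training, held_out


def mean_accuracy(per_session_scores):
    """Average over the cycles of each session, then across sessions.

    per_session_scores is a list with one entry per session, each entry
    being the list of accuracies obtained in that session's cycles.
    """
    session_means = [sum(scores) / len(scores) for scores in per_session_scores]
    return sum(session_means) / len(session_means)


if __name__ == "__main__":
    # Four datasets per session, as in the experiments.
    for training, test in leave_one_set_out([1, 2, 3, 4]):
        print("train on", training, "test on", test)
```

Because every session here has the same number of cycles, averaging per session and then across sessions gives the same value as averaging all eight cycle results together.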

4.4 Recognition Performance

Both word and sequence accuracy results for the

intraoral and external versions of the PMA device

are presented in figure 6 and figure 7. In addition,

the performances of the PMA devices were

compared with audio-based recognition. The blue-

coloured bars indicate the performance achieved

using only static PMA data (vector size of 9),

whereas the red-coloured bars are the results

achieved using both static and dynamic features

(vector size of 18). In addition, the green-coloured

bars are the speech-recognition performance

achieved using audio data (vector size of 39). We

will refer to these three conditions as Sensor,

SensorD and Audio features, respectively.
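The paper does not define the two metrics explicitly. A common convention, assumed here, is that word accuracy is (N - S - D - I) / N, where N is the number of reference words and S, D and I are substitutions, deletions and insertions, and that sequence accuracy is the proportion of utterances recognised with no errors at all. The sketch below uses a plain Levenshtein alignment, which approximates but does not replicate the weighted alignment used by HTK's scoring tool.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions needed
    to turn the reference word list into the hypothesis word list."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def word_accuracy(refs, hyps):
    """Percentage word accuracy over a set of utterances."""
    total = sum(len(r) for r in refs)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * (total - errors) / total


def sequence_accuracy(refs, hyps):
    """Percentage of utterances whose word sequence is entirely correct."""
    correct = sum(r == h for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs)


if __name__ == "__main__":
    # Toy example, not data from the experiments.
    refs = [["one", "two", "three"], ["oh", "nine"]]
    hyps = [["one", "two", "three"], ["zero", "nine"]]
    print(word_accuracy(refs, hyps), sequence_accuracy(refs, hyps))
```

Sequence accuracy is the stricter measure, since a single wrong digit makes the whole utterance count as incorrect.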

The results reflect the mean of the data collected

across the two sessions, but were initially analysed

independently session-by-session. Because magnet

placement was not consistent between

individual sessions, data were not merged

across different sessions. This issue should be

resolved once the magnets are surgically implanted for

long-term usage. Alternatively, session-independent

approaches, such as those presented for other SSI

methods in (Maier-Hein et al., 2005) and (Wand and

Schultz, 2011) could be investigated.

As shown in both figure 6 and figure 7,

SensorD clearly produced better recognition

performance in both cases than Sensor

alone. Similar trends were also reported in (Hofe et

al., 2013b; Cheah et al., 2015). As expected, for this

simple task, recognition using Audio performed very

well (i.e. 99%). Preliminary evaluations indicated

some degradation (i.e. 15% for word accuracy and

17% for sequence accuracy) in recognition

performance for the intraoral device as compared to

the previous external version, as illustrated in figure

6 and figure 7. There are a number of possible

explanations for this degradation: 1) the presence of

the intraoral prototype affects articulation and, in

particular, limits tongue movement, which may

lead to inconsistent articulation; 2) the subject

was new to the intraoral version, but had prior

experience with the external PMA version; 3) non-

articulatory features arising from unintentional

movements (e.g. swallowing, licking the lips, head

movements) could have corrupted the data or

been confused with utterances; and 4) the magnets

are able to come much closer to the sensors in the

intraoral device than in the external device, resulting


in a more significant non-linear effect (since the

field strength decreases with the cube of the distance).

This means that small unintentional articulator

movements can generate very large signals in some

instances. Further work is required to understand the

significance of each of these possible causes.

Figure 6: Comparison of word accuracy in the connected

digits.

Figure 7: Comparison of sequence accuracy in the

connected digits.

4.5 Hardware Evaluations

As discussed in section 2, one major obstacle to the

acceptability of an AT (e.g. SSI) is its appearance if

it is considered unattractive. Similar views emerged

from discussions with potential users

who have undergone a laryngectomy and from an opinion

survey of 50 laryngectomees and their families/

friends, in which the appearance of the PMA-based device

was considered to be a very high priority (Cheah

et al., 2015). To enhance the device’s appeal to users,

influential factors such as appearance need to be

accounted for during device development. The

challenge here is to satisfy the design objective and

continue improving on the PMA device’s

appearance but without compromising its speech

reconstruction performance. The latest intraoral

prototype employs the same functional principles as

the previous design reported in (Gonzalez et al.,

2014; Cheah et al., 2015), but is implemented in a

different form. A summary of the hardware features

of the new intraoral PMA system compared to its

predecessor is presented in table 2.

Despite the improved appearance of the second

generation PMA system in the form of a wearable

headset, it might not yet be appealing to all. To

address this shortcoming, the latest intraoral

circuitry was implemented in the form of a dental

retainer. To achieve this, the circuit was re-designed

to use fewer and smaller components. In addition,

the power consumption of the circuit was carefully

managed to allow it to operate from a small battery

suitable for inclusion within the dental retainer.

This led to a much smaller and lighter (i.e.

about one tenth of the previous weight) prototype as compared

to its predecessor. In addition, the intraoral prototype

is highly portable: it operates and can be controlled

wirelessly via Bluetooth using a computer/tablet PC.

Also, a higher signal-to-noise ratio (SNR) was

obtained with smaller magnetic tracers, due to their

proximity to the magnetic sensors. The tongue

magnets used with the intraoral sensor system were

16 to 25 times smaller in volume than those used for

the external headset, potentially making them less

invasive when implanted.
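The quoted 16 to 25 times reduction follows directly from the cylinder dimensions listed for the two prototypes. The short check below recomputes the volume ratios; the helper name is illustrative.

```python
import math


def cylinder_volume(diameter_mm: float, length_mm: float) -> float:
    """Volume in cubic millimetres of a cylindrical magnet."""
    return math.pi * (diameter_mm / 2.0) ** 2 * length_mm


# (diameter, length) in mm for the external headset and intraoral device.
EXTERNAL = {"tongue tip": (2, 4), "tongue blade": (5, 1), "lips": (1, 5)}
INTRAORAL = {"tongue tip": (1, 1), "tongue blade": (1, 1), "lips": (1, 4)}


def volume_ratio(site: str) -> float:
    """How many times larger the external magnet is than the intraoral one."""
    return cylinder_volume(*EXTERNAL[site]) / cylinder_volume(*INTRAORAL[site])


if __name__ == "__main__":
    for site in EXTERNAL:
        print(site, round(volume_ratio(site), 2))
```

The tongue tip and tongue blade magnets come out 16 and 25 times smaller respectively, while the lip magnets shrink only by a factor of 1.25.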

A significant drawback with the intraoral device

is the limited battery size and capacity (i.e. 40mAh).

In contrast, the external version is less restricted in

terms of the size and weight of the battery. The small battery

significantly reduces the operational time of the

intraoral device per charge. A number of steps

have been taken to reduce its power

consumption: a lower operating voltage was

selected, and power-efficient components and lower data

sampling and transmission rates were chosen. In

addition, software was developed to switch from an

active mode to sleep mode when not in use. Using

these measures, it is estimated that the battery life

could be extended from one hour to about 16.5

hours including 30 minutes of speech.

5 CONCLUSIONS

In this paper we have described a new intraoral

PMA prototype using commercial off-the-shelf

(COTS) components and embedded inside a dental

retainer constructed using the subject’s dental


Table 2: Summary of the PMA devices’ specifications and comparison [*Note that although the external sensing system has 12 channels, only 9 are used for speech recognition and 3 are used for cancellation of background magnetic fields].

Group               Specification          Intraoral sensing               External sensing
                    Appearance             Dental retainer                 Wearable headset
                    Operating voltage      2.1 V                           5 V
Magnets             Tongue blade           ø1 mm × 1 mm                    ø5 mm × 1 mm
                    Tongue tip             ø1 mm × 1 mm                    ø2 mm × 4 mm
                    Lips                   ø1 mm × 4 mm                    ø1 mm × 5 mm
Magnetic sensing    Dimension              12 × 12 × 3 mm                  12 × 12 × 3 mm
                    Sensitivity            230 LSb/gauss                   440 LSb/gauss
                    Sampling rate          80 Hz                           100 Hz
                    Channels               9                               12*
Data transmission   Type                   Bluetooth 2.0                   Bluetooth 2.0 / USB
                    Frequency              2.4 GHz                         2.4 GHz
                    Data rate              57.6 kbps                       500 kbps
Power               Supply                 Rechargeable battery            Rechargeable battery / USB
                    Battery                Li-Ion 40 mAh                   Li-Ion 1080 mAh
                    Current consumption    30.5 mA                         93.5 mA (wireless) / 67.1 mA (wired)
                    Lifetime               1 hour                          10 hours
Prototype           Dimension              70 × 55 × 25 mm                 160 × 160 × 150 mm
                    Weight                 15 g                            160 g
                    Material               Acrylic resin / polypropylene   VeroBlue / VeroWhitePlus resin

impression. Preliminary evaluation of the intraoral

prototype indicated recognition performance

slightly lower than that of the previous external PMA

device. However, there are a number of avenues for

further investigation to improve its performance.

Nonetheless, the intraoral device has several advantages over

its predecessor. Firstly, it is considered to be more stable

and robust against unintentional movement as it is

implemented in the form of a dental retainer, which

sits securely in the palatal cavity and clasps

firmly onto the upper teeth. Secondly, significantly

smaller magnets may be used for the intraoral

version (because of their proximity to the magnetic

sensors) while also giving a higher SNR. In addition,

the dental retainer can be completely hidden inside

the user’s mouth and out of sight. This would

eliminate the concern that the device is a visible sign of disability.

However, a downside of the intraoral design

would be the possibility of limiting the natural

movement of the tongue, because the device

occupies part of the user’s oral cavity. Further work

is required to assess whether users become

accustomed to the presence of the device and are

able to achieve more consistent articulation.

Although the results so far are encouraging, extensive work

is needed to: 1) further reduce the size of future

intraoral prototypes, 2) improve the circuitry power

efficiency, 3) incorporate inductive charging for the

battery, and 4) introduce a background cancellation

mechanism for movement-induced interference.

Though there are still limitations, the present work

demonstrates a major step towards creating a viable

SSI that would appeal to speech impaired users. For

further information on the PMA-based SSI and its

speech restoration technique, please visit

www.hull.ac.uk/speech/disarm/demos.

ACKNOWLEDGEMENTS

The authors would like to thank Helen Dehkordy

from Hull and East Yorkshire Hospitals NHS Trust

for prototyping the dental retainers. This study is

independent research funded by the National

Institute for Health Research (NIHR)’s Invention for

Innovation Programme. The views stated in this

report are those of the authors and should not be

interpreted as representing the official thoughts of

the sponsor.

REFERENCES

Bai, J., Cheah, L. A., Ell, S. R., and Gilbert, J. M. (2015).

Design of an intraoral device based on permanent

magnetic articulography. In Proceedings of Macau

Conference on Engineering, Technology and Applied

Science, pages 1-12, Macau, China.

Braz, D. S. A., Ribas, M. M., Dedivitis, R. A., Nishimoto,

I. N., and Barros, A. P. B. (2005). Quality of life and

depression in patients undergoing total and partial

laryngectomy. Clinics, 60(2):135-142.

Bright, A. K., and Coventry, L. (2013). Assistive

technology for older adults: psychological and

socio-emotional design requirements. In Proceedings of 6th

International Conference on PErvasive Technologies

Related to Assistive Environments, pages 1-4, Rhodes,

Greece.

Cheah, L. A., Bai, J., Gonzalez, J. A., Ell, S. R., Gilbert, J.

M., Moore, R. K., and Green, P. D. (2015). A user-

centric design of permanent magnetic articulography

based assistive speech technology. In Proceedings of

BIOSIGNALS, pages 109-116, Lisbon, Portugal.

Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J.

M., and Brumberg, J. S. (2010). Silent speech

interfaces. Speech Communication, 52(4):270-287.

Doi, H., Nakamura, K., Toda, T., Saruwatari, H., and

Shikano, K. (2010). Esophageal speech enhancement

based on statistical voice conversion with Gaussian

mixture model. IEICE Transactions on Information

and Systems, 93(9):2472-2482.

Fagan, M. J., Ell, S. R., Gilbert, J. M., Sarrazin, E., and

Chapman, P. M. (2008). Development of a (silent)

speech recognition system for patients following

laryngectomy. Medical Engineering & Physics,

30(4):419-425.

Gilbert, J. M., Rybchenko, S. I., Hofe, R., Ell, S. R.,

Fagan, M. J., Moore, R. K., and Green, P. D. (2010).

Isolated word recognition of silent speech using

magnetic implants and sensors. Medical Engineering

& Physics, 32(10):1189-1197.

Gonzalez, J. A., Cheah, L. A., Bai, J., Ell, S. R., Gilbert, J.

M., Moore, R. K., and Green, P. D. (2014). Analysis

of phonetic similarity in a silent speech interface based

on permanent magnetic articulography. In Proceedings

of 15th INTERSPEECH, pages 1018-1022, Singapore.

Hirsch, T., Forlizzi, J., Goetz, J., Stroback, J., and Kurtz, C.

(2000). The ELDer project: Social and emotional

factors in the design of eldercare technologies. In

Proceedings on the 2000 Conference on Universal

Usability, pages 72-79, Arlington, USA.

Hofe, R., Bai, J., Cheah, L. A., Ell, S. R., Gilbert, J. M.,

Moore, R. K., and Green, P. D. (2013a). Performance

of the MVOCA silent speech interface across multiple

speakers. In Proceedings of 14th INTERSPEECH,

pages 1140-1143, Lyon, France.

Hofe, R., Ell, S. R., Fagan, M. J., Gilbert, J. M., Green, P.

D., Moore, R. K., and Rybchenko, S. I. (2013b).

Small-vocabulary speech recognition using silent

speech interface based on magnetic sensing. Speech

Communication, 55(1):22-32.

Leonard, R. G. (1984). A database for speaker-

independent digit recognition. In Proceedings of 9th

ICASSP, pages 328-331, San Diego, USA.

Lontis, E. R., Lund, M. E., Christensen, H. V., Gaihede,

M., Caltenco, H. A., and Andreasen-Struijk, L. N.

(2010). Clinical evaluation of wireless inductive

tongue computer interface for control of computers

and assistive devices. In Proceedings of International

Conference on Engineering in Medicine and Biology

Society, pages 3365-3368, Buenos Aires, Argentina.

Maier-Hein, L., Metze, F., Schultz, T., and Waibel, A.

(2005). Session independent non-audible speech

recognition using surface electromyography. In

Automatic Speech Recognition and Understanding

Workshop, pages 331-336, Cancun, Mexico.

Martin, J. L., Murphy, E., Crowe, J. A., and Norris, B. J.

(2006). Capturing user requirements in medical

devices development: the role of ergonomics.

Physiological Measurement, 27(8):49-62.

Park, H., Kiani, M., Lee, H. M., Kim, J., Block, J.,

Gosselin, B., and Ghovanloo, M. (2012). A wireless

magnetoresistive sensing system for an intraoral

tongue-computer interface. IEEE Transactions on

Biomedical Circuits and Systems, 6(6):571-585.

Rabiner, L. R. (1989). A tutorial on Hidden Markov

Models and selected applications in speech

recognition. Proceedings of the IEEE, 77(2):257-286.

Tang, H., and Beebe, D. J. (2006). An oral interface for

blind navigation. IEEE Transactions on Neural

Systems and Rehabilitation Engineering, 14(1):116-123.

Toda, T., Black, A. W., and Tokuda, K. (2008). Statistical

mapping between articulatory movements and acoustic

spectrum using a Gaussian mixture model. Speech

Communication, 50(3):215-227.

Toda, T., Nakagiri, M., and Shikano, K. (2012). Statistical

voice conversion techniques for body-conducted

unvoiced speech enhancement. IEEE Transactions on

Audio, Speech and Language Processing, 20(9):2505-2517.

Toutios, A., and Margaritis, K. G. (2005). A support

vector approach to the acoustic-to-articulatory

mapping. In Proceedings of 6th INTERSPEECH, pages

3221-3224, Lisbon, Portugal.

Wand, M., and Schultz, T. (2011). Session-independent

EMG-based speech recognition. In Proceedings of 4th

BIOSIGNALS, pages 295-300, Rome, Italy.

Wang, J., Samal, A., Green, J. R., and Rudzicz, F. (2012).

Sentence recognition from articulatory movements for

silent speech interfaces. In Proceedings of 37th

ICASSP, pages 4985-4988, Kyoto, Japan.

Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.,

Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D.,

Valtchev, V., and Woodland, P. (2009). The HTK

Book (for HTK Version 3.4.1). Cambridge: Cambridge

University Press.

BIODEVICES 2016 - 9th International Conference on Biomedical Electronics and Devices
