Deep Learning in EMG-based Gesture Recognition

P. Tsinganos, B. Cornelis, J. Cornelis, B. Jansen and A. Skodras

University of Patras, Department of Electrical and Computer Engineering, 26504 Patras, Greece

Vrije Universiteit Brussel, Department of Electronics and Informatics, 1050 Brussels, Belgium

Keywords:

sEMG, Gesture Recognition, Deep Learning, CNN.

Abstract:

In recent years, Deep Learning methods have been successfully applied to a wide range of image and speech recognition problems, strongly influencing other research fields. As a result, new works in biomedical engineering are directed towards the application of these methods to electromyography-based gesture recognition. In this paper, we present a brief overview of Deep Learning methods for electromyography-based hand gesture recognition along with an analysis of a modified simple model based on Convolutional Neural Networks. The proposed network yields a 3% improvement in the classification accuracy of the basic model, whereas the analysis helps in understanding the limitations of the model and exploring new ways to improve the performance.

1 INTRODUCTION

Over the last decades there has been particular interest in gesture recognition for human-computer interaction (HCI). This combination finds many applications, including sign language recognition, robotic equipment control, virtual reality gaming, and prosthetics control (Cheok et al., 2017). Among the various sensor modalities that have been used to capture hand gesture information, electromyography (EMG) is considered more appropriate since it captures the muscle's electrical activity, the physical phenomenon that results in hand gestures. EMG data can be recorded with either invasive or non-invasive methods. Surface electromyography (sEMG) is a technique that measures the muscle's action potential from the surface of the skin, in contrast to invasive methods that penetrate the skin to reach the muscle.

A popular approach to sEMG-based gesture recognition consists of using pattern recognition methods derived from Machine Learning (ML) (Scheme and Englehart, 2011). Conventional ML pipelines include data acquisition, feature extraction, model definition and inference. Acquisition of sEMG signals involves one or more electrodes attached around the target muscle group. The features used for classification are usually hand-crafted by human experts and capture the temporal and frequency characteristics of the data. Typical features that have been used for sEMG pattern classification are shown in Table 1. These extracted features serve as the input to ML classifiers, such as k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Multi-Layered Perceptron (MLP), Linear Discriminant Analysis (LDA), and Random Forests (RF), where the classifier's parameters are adjusted towards accurate classification.
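To make the notion of hand-crafted features concrete, the following minimal sketch computes five of the time-domain features listed in Table 1 for one channel of one window. The function name, the `threshold` parameter and the exact zero-crossing and slope-sign-change conditions are illustrative assumptions; the cited works may define them slightly differently.

```python
import math

def time_domain_features(x, threshold=0.0):
    """Common time-domain features of one sEMG channel in one window.

    `threshold` is an assumed noise threshold for the zero-crossing (ZC)
    and slope-sign-change (SSC) counts; definitions vary across papers.
    """
    n = len(x)
    mav = sum(abs(v) for v in x) / n                      # Mean Absolute Value
    rms = math.sqrt(sum(v * v for v in x) / n)            # Root Mean Square
    wl = sum(abs(x[i] - x[i - 1]) for i in range(1, n))   # Waveform Length
    zc = sum(
        1 for i in range(1, n)
        if x[i] * x[i - 1] < 0 and abs(x[i] - x[i - 1]) > threshold
    )
    ssc = sum(
        1 for i in range(1, n - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0
        and (abs(x[i] - x[i - 1]) > threshold or abs(x[i] - x[i + 1]) > threshold)
    )
    return {"mav": mav, "rms": rms, "wl": wl, "zc": zc, "ssc": ssc}

# Toy window, not real EMG data.
features = time_domain_features([1.0, -1.0, 2.0, -2.0])
```

In a conventional pipeline, such a feature vector would be computed per channel and per window and then passed to one of the classifiers mentioned above.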

Deep Learning (DL) is a class of ML algorithms that has revolutionized many fields of data analysis (Goodfellow et al., 2016). For example, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) were successfully deployed for image classification and speech recognition tasks, respectively. DL methods differ from conventional ML approaches in that feature extraction is part of the model definition, thereby obviating the need for hand-crafted features. Although these methods are not new (Goodfellow et al., 2016), they recently gained more attention due to the increased availability of data and vast improvements in computing hardware, which allow these computationally demanding methods to be executed in less time.

Motivated by the progress of DL methods, we provide an overview of the application of these methods to sEMG pattern classification problems and propose modifications to a simple CNN model (Atzori et al., 2016). The comparison with the state of the art and the analysis of the results shed light on how the architecture performs and allow for improvements to be made.

The remainder of the paper is organized as follows. In Section 2, we provide an overview of the related gesture recognition approaches. Section 3 gives a detailed description of the proposed CNN architecture. The experiments performed for the evaluation of the model are presented in Section 4, while the results and a brief discussion are given in Section 5. Finally, in Section 6 we conclude the paper and outline our future work.

Tsinganos, P., Cornelis, B., Cornelis, J., Jansen, B. and Skodras, A.

Deep Learning in EMG-based Gesture Recognition.

DOI: 10.5220/0006960201070114

In Proceedings of the 5th International Conference on Physiological Computing Systems (PhyCS 2018), pages 107-114

ISBN: 978-989-758-329-2

2 RELATED WORK

There exists a great body of literature on the problem of sEMG-based hand gesture recognition. One can distinguish between approaches that use conventional ML techniques and studies based on deep learning methods.

The most significant study on sEMG classification with traditional ML techniques is the work described in (Hudgins et al., 1993). For every 200ms segment of 2-channel sEMG signals, 5 time-domain features are extracted and fed to an MLP classifier, achieving an accuracy of 91.2% on the classification of 4 hand gestures. Later approaches based on this work improve the classification performance by using more features or different classifiers. In (Englehart and Hudgins, 2003), the same set of features is extracted from 4-channel sEMG signals and fed to an LDA classifier. The average accuracy obtained is greater than 90% and is further improved by applying a majority vote window to the predictions of the classifier. The work presented in (Castellini et al., 2009) achieves a 97.14% accuracy on the task of classifying 3 types of grasp motions using the RMS value from 7 electrodes as the input to an SVM classifier. In (Kuzborskij et al., 2012), a set of time- and frequency-domain features is extracted from 8-channel myoelectric signals and evaluated with various classifiers. This experiment is considered the first successful approach for the classification of a large number of hand gestures, since it achieves an accuracy of 70-80% on a set of 52 hand gestures (Ninapro dataset (Atzori et al., 2015)) using any of the proposed features and an SVM classifier with RBF kernel. This work was further improved in (Atzori et al., 2014) by considering a linear combination of features and using an RF classifier, resulting in an average accuracy of 75.32%. In (Gijsberts et al., 2014), different kernel classifiers were evaluated jointly on EMG and acceleration signals, improving the classification accuracy by 5%.

Considering the advancements of DL methods in the fields of image processing and speech recognition, many works have investigated their application to EMG-based hand gesture recognition. In (Shim and Lee, 2015) and (Shim et al., 2016), the authors propose a Deep Belief Network (DBN) classifier as a more effective model compared to a shallow MLP network trained with back-propagation. Time-domain features are extracted from segments of 2-channel EMG signals and used to train the model in a layer-by-layer fashion, either with a greedy approach or using genetic algorithms, achieving an accuracy of 88.59% and 89.29%, respectively, on a set of 5 movements.

The first end-to-end DL architecture, however, was proposed by (Park and Lee, 2016). The authors built a CNN-based model for the classification of six common hand movements, resulting in a better classification accuracy compared to SVM. In (Atzori et al., 2016), a simple CNN architecture based on 5 blocks of convolutional and pooling layers is used to classify a large number of gestures. The classification accuracy is comparable to those obtained with classical methods, though not higher than the best performance achieved on the same problem using an RF classifier. The works of (Geng et al., 2016) and (Wei et al., 2017) improve their results across various datasets by incorporating dropout (Srivastava et al., 2014) and batch normalization (Sergey and Szegedy, 2015) techniques in their methodology. Apart from the choice of model architecture, these works differ from previous ones in that they use a high-density electrode array to capture EMG data. Using instantaneous EMG images, (Geng et al., 2016) achieves an 89.3% accuracy on a set of 8 movements, going up to 99.0% when using majority voting over 40ms windows. In (Wei et al., 2017), the observation is made that a small group of muscles plays a significant role in some movements. Therefore, a multi-stream CNN architecture is employed, where the input is divided into smaller images that are separately processed by convolutional layers before being merged with fully connected layers. With this model the reported accuracy on the Ninapro dataset is improved by 7.2% (from 77.8% to 85%).

Later works deal with the problem of inter-subject classification, i.e. where the train and test data come from different subjects, either with recalibration (Zhai et al., 2017) or model adaptation ((Du et al., 2017), (Côté-Allard et al., 2018)). The performance of the network proposed in (Zhai et al., 2017), which takes as input downsampled spectrograms of EMG segments, is improved by updating the network weights using the predictions of previous sessions corrected by majority voting. In (Du et al., 2017) it is assumed that the weights of each layer of the network contain information that allows for differentiation between gestures, while the mean and variance of the batch normalization layers capture the differences between sessions/subjects.


Table 1: Typical sEMG features.

Feature                              Domain          Reference
Root Mean Square                     time            (Castellini et al., 2009)
Variance                             time            (Kuzborskij et al., 2012)
Mean Absolute Value                  time            (Kuzborskij et al., 2012) (Atzori et al., 2014) (Hudgins et al., 1993) (Englehart and Hudgins, 2003)
Zero Crossings                       time            (Atzori et al., 2014) (Hudgins et al., 1993) (Englehart and Hudgins, 2003)
Slope Sign Changes                   time            (Atzori et al., 2014) (Hudgins et al., 1993) (Englehart and Hudgins, 2003)
Waveform Length                      time            (Kuzborskij et al., 2012) (Atzori et al., 2014) (Hudgins et al., 1993) (Englehart and Hudgins, 2003)
Histogram                            time            (Kuzborskij et al., 2012) (Atzori et al., 2014) (Hudgins et al., 1993) (Englehart and Hudgins, 2003)
Short Time Fourier Transform         frequency       (Kuzborskij et al., 2012) (Englehart et al., 1999)
Cepstral coefficients                frequency       (Kuzborskij et al., 2012)
Marginal Discrete Wavelet Transform  time-frequency  (Kuzborskij et al., 2012) (Atzori et al., 2014)

Therefore, they apply adaptive batch normalization (AdaBN) (Li et al., 2016), where only the normalization statistics are updated for each subject using a small amount of unlabeled data. The results show improved performance compared to a model without adaptation. The authors of (Côté-Allard et al., 2018) use transfer learning techniques to exploit inter-subject information learned by a pre-trained source network. In their architecture, for each subject a new network is instantiated with weighted connections to the source network. Through this technique, which achieves an accuracy of 98.31% on 7 movements, predictions for a new subject are based both on previously learned information and subject-specific data.
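The idea behind AdaBN described above can be illustrated with a single normalization layer. The sketch below is a simplified, hypothetical stand-in (the class `BatchNorm` and its `adapt` method are not taken from any cited implementation): the learned scale and shift stay fixed, while the stored mean and variance are recomputed from unlabeled activations of a new subject.

```python
import math

class BatchNorm:
    """Per-feature normalization with learned scale (gamma) and shift (beta)."""

    def __init__(self, gamma, beta, mean, var, eps=1e-5):
        self.gamma, self.beta = list(gamma), list(beta)
        self.mean, self.var = list(mean), list(var)
        self.eps = eps

    def adapt(self, activations):
        """Replace the stored statistics with those of unlabeled target data.

        gamma and beta (the learned parameters) are deliberately left untouched.
        """
        n = len(activations)
        dims = range(len(self.gamma))
        self.mean = [sum(row[d] for row in activations) / n for d in dims]
        self.var = [
            sum((row[d] - self.mean[d]) ** 2 for row in activations) / n
            for d in dims
        ]

    def __call__(self, row):
        return [
            g * (v - m) / math.sqrt(s + self.eps) + b
            for v, g, b, m, s in zip(row, self.gamma, self.beta, self.mean, self.var)
        ]

# Statistics learned on source subjects (toy values).
layer = BatchNorm(gamma=[1.0, 1.0], beta=[0.0, 0.0], mean=[0.0, 0.0], var=[1.0, 1.0])
# Unlabeled activations from a new subject (toy values).
layer.adapt([[1.0, 10.0], [3.0, 30.0]])
normalized = layer([3.0, 30.0])
```

In a full network, the same update would be applied to every batch normalization layer, with activations obtained by passing the new subject's data through the preceding layers.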

3 PROPOSED MODEL

The problem of sEMG-based hand gesture recognition can be formulated as an image classification problem using CNNs, where the input sEMG image has a size of H × W × 1 (height × width × depth). Various approaches have been employed to construct an sEMG image. For example, in the works of (Geng et al., 2016), (Wei et al., 2017), and (Du et al., 2017), the instantaneous sEMG signals from a high-density electrode array have been used, where the width and the height of the array match the dimensions of the image. In addition, sEMG images can be constructed from segments of sEMG signals using (overlapping) time-windows, in which case the width matches the number of electrodes and the height is equal to the window length (Atzori et al., 2016). Another approach is based on spectrograms using the STFT of sEMG segments, where for each channel of the EMG a spectrogram is created, resulting in an image of size frequency × time-bins × channels ((Zhai et al., 2017), (Côté-Allard et al., 2018)).

In this paper, we adhere to the approach of (Atzori et al., 2016) and generate sEMG images with sliding windows. These images are created using a window length of 150ms and an overlap of 60%, i.e. 90ms, in order to make fair comparisons with previous works in the literature that use similar time-windows. Therefore, the input EMG image has a size of 15×10 (height × width), where the height dimension corresponds to the window length (i.e. 150ms sampled at 100Hz gives 15 samples) and the width equals the number of electrodes.
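The window arithmetic above can be sketched as follows. The function name and the list-of-rows data layout are assumptions made for illustration; only the 150ms window, the 90ms overlap, the 100Hz sampling rate and the 10 electrodes come from the text.

```python
def sliding_windows(signal, fs_hz=100, window_ms=150, overlap_ms=90):
    """Split a recording (list of per-sample rows, one value per electrode)
    into overlapping sEMG images of shape (window samples) x (electrodes)."""
    window = window_ms * fs_hz // 1000    # 150 ms at 100 Hz -> 15 samples
    overlap = overlap_ms * fs_hz // 1000  # 90 ms at 100 Hz  -> 9 samples
    step = window - overlap               # consecutive windows start 6 samples apart
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, step)
    ]

# Toy recording: 1 s of data from 10 electrodes (values are placeholders).
recording = [[float(t)] * 10 for t in range(100)]
images = sliding_windows(recording)
```

Each element of `images` is a 15×10 image; with a step of 6 samples, a 100-sample recording yields 15 such images.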

The proposed CNN (depicted in Fig. 1) is based on the architecture proposed in (Atzori et al., 2016) with modifications to increase the model's classification accuracy. The main adjustments in the architecture are the introduction of dropout (Srivastava et al., 2014) layers and the use of max pooling instead of average pooling, while the number of trainable parameters remains the same.

The CNN architecture has 4 hidden convolutional layers and 1 output layer. The first two hidden layers consist of 32 filters of size 1×10 and 3×3, respectively. The third consists of 64 filters of size 5×5. The fourth layer contains 64 filters of size 5×5, whereas the last one is a G-way convolutional layer with 1×1 filters, where G is the number of gestures to be classified. Zero padding is applied before the convolutions of the hidden layers, which are followed by rectified linear unit (ReLU) non-linearities and a dropout layer with a probability of 0.15 for zeroing the output of a hidden unit. In addition, a subsampling layer performs max pooling over a 3×3 window after the dropout of the second and third layers. Finally, the last convolutional layer is followed by a softmax activation function.

Figure 1: The proposed model architecture is based on the work of (Atzori et al., 2016) with modifications that were found to improve the classification accuracy.
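As a rough check of the statement that dropout and pooling leave the number of trainable parameters unchanged, the sketch below counts the parameters of the five convolutional layers from the filter sizes given above. It assumes one bias term per filter and G = 53 gestures; the paper does not report the parameter count, so the result is only an illustration under these assumptions.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Trainable parameters of one 2D convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def count_parameters(n_gestures=53):
    # (kernel height, kernel width, input channels, number of filters)
    layers = [
        (1, 10, 1, 32),          # first hidden layer
        (3, 3, 32, 32),          # second hidden layer
        (5, 5, 32, 64),          # third hidden layer
        (5, 5, 64, 64),          # fourth hidden layer
        (1, 1, 64, n_gestures),  # G-way output layer
    ]
    # Dropout, ReLU, max pooling and softmax add no trainable parameters.
    return sum(conv_params(*layer) for layer in layers)

total = count_parameters()
```

Under these assumptions the count is about 1.7e5, dominated by the two 5×5 layers.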

The weights were initialized with the Xavier initializer (Glorot and Bengio, 2010) and a weight decay (L2 regularization) of 0.0002 was applied during training. Network parameters were identified via cross-validated random search and manual hyper-parameter tuning on a validation set composed of three subjects randomly selected from the first dataset (DB-1) of the Ninapro database (Atzori et al., 2014). This dataset contains 10 repetitions for each gesture; approximately 2/3 of the repetitions were used as the training set and the remaining repetitions constituted the test set. In each fold of the cross-validation, EMG data from one repetition of the training set were used as test data and the remaining repetitions for training. The hyper-parameter search space included weight decay, dropout rate, pooling method and kernel initializer, whereas stride and padding values were computed such that the size of the output tensor is correct. The search space along with the selected values are listed in Table 2. In addition, the proper optimizer parameters were found in the same fashion for each evaluation method.
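The repetition-wise cross-validation described above can be sketched as follows. The test repetitions (2, 5 and 7) are those stated in Section 4; the function name and the fold representation are illustrative assumptions.

```python
ALL_REPETITIONS = list(range(1, 11))  # 10 repetitions per gesture in DB-1
TEST_REPETITIONS = [2, 5, 7]          # held out for the final test (Section 4)
TRAIN_REPETITIONS = [r for r in ALL_REPETITIONS if r not in TEST_REPETITIONS]

def leave_one_repetition_out(repetitions):
    """Yield (training repetitions, validation repetition) pairs, one per fold."""
    for held_out in repetitions:
        yield [r for r in repetitions if r != held_out], held_out

folds = list(leave_one_repetition_out(TRAIN_REPETITIONS))
```

Each candidate hyper-parameter setting would then be scored by its average validation accuracy across the seven folds, without touching the test repetitions.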

The EMG signals were preprocessed as follows. Firstly, a 1st order 1 Hz low-pass Butterworth filter was applied, as in previous studies on the Ninapro database ((Atzori et al., 2016), (Geng et al., 2016)). Then, EMG data were segmented into overlapping windows of 150ms length and 90ms overlap, which can be considered as a form of data augmentation similar to image shifting. Additionally, data were augmented during training by adding Gaussian noise to each image with a signal-to-noise ratio (SNR) equal to 25dB.
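The noise augmentation step can be sketched as below. The noise variance follows from the definition SNR(dB) = 10 log10(P_signal / P_noise); estimating the signal power per image and the function name are assumptions, since the text does not specify these details.

```python
import math
import random

def add_gaussian_noise(image, snr_db=25.0, rng=None):
    """Return a copy of `image` (list of rows) with additive Gaussian noise
    scaled so that the signal-to-noise ratio is approximately `snr_db`."""
    if rng is None:
        rng = random.Random()
    values = [v for row in image for v in row]
    signal_power = sum(v * v for v in values) / len(values)
    noise_power = signal_power / (10 ** (snr_db / 10))  # P_noise = P_signal / 10^(SNR/10)
    sigma = math.sqrt(noise_power)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]

# Toy 15x10 image with constant amplitude (placeholder values).
image = [[1.0] * 10 for _ in range(15)]
noisy = add_gaussian_noise(image, snr_db=25.0, rng=random.Random(0))
```

At 25dB the noise power is roughly 0.3% of the signal power, so the augmented images remain close to the originals.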

Table 2: Hyperparameter tuning.

Parameter           Search space                                   Selected value
Weight decay        [0.0001, 0.001]                                0.0002
Dropout             [0, 0.333]                                     0.15
Pool method         ‘max’, ‘average’                               ‘max’
Kernel initializer  ‘glorot’, ‘he’, ‘normal’, ‘uniform’            ‘glorot’
Optimizer           ‘SGD’, ‘Adam’                                  ‘SGD’
Learning rate       [0.001, 0.1]                                   0.05
Learning schedule   ‘constant’, ‘step decay’, ‘exponential decay’  ‘step decay’
Epochs              [30, 150]                                      100
Batch size          32, 64, 128, 256, 512, 1024                    512

Due to the recording process followed in the Ninapro database, each gesture repetition is followed by a rest phase, meaning that the majority of the images correspond to the ‘rest’ gesture. In addition, there are variations in the duration of the gesture repetitions, which affect the number of generated images. Therefore, to account for the fact that gestures are not equally represented in the dataset, two steps are taken to deal with the imbalance problem. First, the EMG data of the ‘rest’ gesture are subsampled, such that the same number of repetitions is shared between all gestures. Secondly, during training the loss function is weighted such that the network pays more attention to under-represented gestures.
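The second step, weighting the loss, can be sketched as follows. The text does not state which weighting scheme was used; inverse class frequency is assumed here as one common choice, and both function names are hypothetical.

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * class count).

    This scheme is an assumption; the paper only says the loss is weighted.
    """
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}

def weighted_cross_entropy(probabilities, label, weights):
    """Cross-entropy of one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probabilities[label])

# Toy label distribution: class 0 is over-represented, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = class_weights(labels)
loss = weighted_cross_entropy([0.5, 0.3, 0.2], label=2, weights=weights)
```

With these weights, an error on the rare class contributes six times more to the loss than the same error on the most frequent class.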

4 EXPERIMENTS

The proposed CNN architecture is evaluated on data from the Ninapro database, which includes EMG data related to 53 hand movements of 78 subjects (11 transradial amputees, 67 intact subjects) divided into three datasets. The Ninapro DB-1 includes data acquisitions of 27 intact subjects (7 females, 20 males; 2 left handed, 25 right handed; age 28±3.4 years). The second dataset includes data acquisitions of 40 intact subjects (12 females, 28 males; 6 left handed, 34 right handed; age 29.9±3.9 years). The third dataset includes data acquisitions of 11 transradial amputees (11 males; 1 left handed, 10 right handed; age 42.36±11.96 years). More details about the database and the acquisition procedure can be found in (Atzori et al., 2016) and (Atzori et al., 2014). Table 3 and Table 4 summarize the information about the Ninapro database.


All the evaluations of the model were carried out on the Ninapro DB-1 using all the data available. This dataset comprises sEMG signals captured from 27 subjects using 10 electrodes, of which 8 are placed around the forearm and the other two are placed on the main activity spots of the large flexor and extensor muscles of the forearm (Atzori et al., 2014). To allow for a comparison with current literature, the data were split into train and test datasets following the approach described in (Atzori et al., 2016), i.e. repetitions 2, 5, and 7 were used for testing and the rest for training. Hyperparameter tuning was performed using cross-validation on the training set. The model was evaluated by means of two experiments. The first one used the evaluation procedure described in (Atzori et al., 2016), while the second used the setting of (Geng et al., 2016). The assessment of the results, reported in Table 5, consists of the average accuracies on the train and test sets, the average of the top-3 test accuracies (i.e. the accuracy when any of the model's 3 highest output probabilities matches the expected gesture) and the test accuracy after majority voting on each gesture repetition (i.e. the repetition segment of a specific gesture is assigned the majority gesture label of the EMG images that correspond to that repetition). Additionally, the model performance is further evaluated by analyzing misclassifications per class, provided by the confusion matrix, and the accuracy over the time-normalized gesture duration as in (Atzori et al., 2015).
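The two less common metrics defined above, top-3 accuracy and accuracy after majority voting per repetition, can be sketched as follows. The function names and the use of a `repetition_ids` list to group images are illustrative assumptions.

```python
from collections import Counter

def top_k_accuracy(probabilities, labels, k=3):
    """Fraction of images whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

def majority_vote_accuracy(predictions, labels, repetition_ids):
    """Assign each repetition the most frequent predicted label of its images
    and compare it with the repetition's true label."""
    votes, truth = {}, {}
    for pred, label, rep in zip(predictions, labels, repetition_ids):
        votes.setdefault(rep, Counter())[pred] += 1
        truth[rep] = label
    correct = sum(votes[rep].most_common(1)[0][0] == truth[rep] for rep in votes)
    return correct / len(votes)

# Toy outputs for 3 images and 4 classes.
probabilities = [[0.15, 0.6, 0.2, 0.05], [0.4, 0.3, 0.2, 0.1], [0.05, 0.05, 0.1, 0.8]]
top3 = top_k_accuracy(probabilities, labels=[2, 3, 3], k=3)

# Toy per-image predictions for two repetitions ("a" and "b") of 3 images each.
vote_acc = majority_vote_accuracy(
    predictions=[1, 1, 2, 3, 3, 0],
    labels=[1, 1, 1, 3, 3, 3],
    repetition_ids=["a", "a", "a", "b", "b", "b"],
)
```

In the toy example, only 4 of 6 images are classified correctly, yet both repetitions receive the right label after voting, which mirrors the gap between the test and vote accuracies in Table 5.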

In accordance with (Atzori et al., 2016), a model was trained using 7 repetitions and tested with the remaining 3 for each of the 27 subjects in the dataset. Each model is initialized with randomized weights and trained using stochastic gradient descent (SGD) for 100 epochs with an initial learning rate of 0.05 and a batch size of 512. The learning rate was halved every 15 epochs.

The second experiment follows the setting of (Geng et al., 2016), which differs from the procedure of (Atzori et al., 2016) in that a pre-trained network is created using all the training data of all subjects and then a fine-tuned model is generated for each subject. The first model is initialized with randomized weights and trained using SGD for 100 epochs with an initial learning rate of 0.05 and a batch size of 512. The learning rate was halved every 15 epochs. The subject-specific models were initialized with the pre-trained network and the last two convolutional layers were fine-tuned using the SGD optimizer for 30 epochs with a learning rate of 0.01, halved every 10 epochs, and a batch size of 128.
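The step decay schedules used in both experiments can be written as a single function. Counting epochs from zero is an assumption; the text only states the initial rate, the halving factor and the interval.

```python
def step_decay(epoch, initial_lr, drop_every, factor=0.5):
    """Learning rate at a given epoch (0-indexed, an assumption): the rate is
    multiplied by `factor` once every `drop_every` epochs."""
    return initial_lr * factor ** (epoch // drop_every)

# Training from scratch: 100 epochs, 0.05 halved every 15 epochs.
pretrain_schedule = [step_decay(e, 0.05, 15) for e in range(100)]
# Subject-specific fine-tuning: 30 epochs, 0.01 halved every 10 epochs.
finetune_schedule = [step_decay(e, 0.01, 10) for e in range(30)]
```

Under this reading, the rate used in the last of the 100 epochs is 0.05 / 64, i.e. slightly below 0.001.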

5 RESULTS AND DISCUSSION

For the problem of hand gesture recognition based on EMG, a DL approach is presented in this paper, which utilizes convolutional layers and learning methods that have been successfully applied to other domains. Compared to similar works evaluated on the same dataset, the proposed model outperforms the original network of (Atzori et al., 2016), while it is inferior to the more complex approaches of (Geng et al., 2016) and (Wei et al., 2017). Table 6 shows the comparison between these works, each under the evaluation setting used in the respective paper. The model of (Geng et al., 2016) uses as input the instantaneous EMG images, i.e. 1×10 for the Ninapro DB-1, so the accuracy after majority voting over 200ms is shown in parentheses, whereas the input image in the network of (Wei et al., 2017) is 20×10 pixels.

Apart from differences in the input, there are further dissimilarities in the model architectures. Both (Geng et al., 2016) and (Wei et al., 2017) incorporate batch normalization (Sergey and Szegedy, 2015), which allows for faster convergence, and fully connected layers, which offer increased network capacity due to more trainable weights. In addition, the approach of (Wei et al., 2017) adopts a multi-stream pipeline where groups of EMG electrodes are processed separately and are then merged with fully connected layers. This split-and-merge approach enables learning the correlation between individual muscles and specific gestures, leading to a state-of-the-art accuracy of 85% on the Ninapro DB-1. However, we do not follow similar approaches in this paper in order to better understand how DL methods can be applied to sEMG data through a simpler network.

The proposed network is further evaluated through the loss graphs and an error analysis. Fig. 2 shows the loss graphs during training on the train and test sets, with coloring that corresponds to different subjects. It can be seen that decaying the learning rate helps the network parameters converge to a better optimum. When comparing the loss between the train and test sets, it is evident that there is some degree of overfitting. However, applying more regularization (e.g. dropout, weight decay) does not decrease the test loss. Therefore, a different pipeline (e.g. preprocessing steps, data augmentation, different filter sizes) may be needed to reduce the generalization error of the network.

Table 3: The Ninapro dataset.

Dataset           Subjects  Movements  Electrodes  Sampling (Hz)
Dataset 1 (DB-1)  27        53         10          100
Dataset 2 (DB-2)  40        53         12          2000
Dataset 3 (DB-3)  11        53         12          2000

Table 4: Gestures label/number as in (Atzori et al., 2014).

Label  Gesture
0      Rest
1-12   Individual finger extension/flexion
13-20  Isometric/isotonic configurations
21-29  Wrist movements
30-52  Grasps and functional movements

Figure 2: Loss value after each training epoch calculated on train set (top) and test set (bottom). Colors correspond to different subjects.

Figure 3: Confusion matrices based on the per image predictions (top) and majority voting predictions (bottom).

An error analysis was performed to better understand the performance of our model. The confusion matrix is calculated for each subject evaluation and the average is shown in Fig. 3. Most misclassifications occur around the main diagonal and, according to the class labels (Table 4), similar movements are falsely categorized. That is expected considering the location of the EMG electrodes and the muscles that participate in each movement. For example, gesture labels ‘9’ and ‘11’ represent the adduction and flexion of the thumb, which are coordinated by the same forearm muscles. In addition, there is a concentration of errors in the lower-right corner that corresponds to grasps and functional hand gestures that involve more muscles. Taking into account that each EMG image is a 150ms segment and the gesture repetition lasts 5s, we may conclude that for a given misclassification a proportion of the images will be similar between the two gestures. A possible explanation is that some groups of movements can be broken down into the same smaller movements. It is only when the full sequence of images is available that the network can decide which gesture is performed. Comparing the confusion matrices before and after the majority voting, we see that most errors around the diagonal are reduced.
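The error analysis can be reproduced in outline with a confusion matrix and a simple measure of how many errors fall near its main diagonal. The `near_diagonal_error_fraction` helper and its `band` parameter are hypothetical additions for illustration and are not a metric reported in the paper.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def near_diagonal_error_fraction(matrix, band=1):
    """Fraction of misclassifications whose predicted label is within
    `band` labels of the true label (i.e. close to the main diagonal)."""
    errors = near = 0
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p:
                errors += count
                if abs(t - p) <= band:
                    near += count
    return near / errors if errors else 0.0

# Toy predictions over 4 classes.
matrix = confusion_matrix([0, 1, 2, 3, 3], [0, 2, 2, 3, 0], n_classes=4)
fraction = near_diagonal_error_fraction(matrix, band=1)
```

Because neighboring labels in Table 4 tend to denote related movements, a high near-diagonal fraction would be consistent with the observation that similar gestures are confused with each other.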

Table 5: Experimental results.

Setting                Train accuracy  Test accuracy  Top-3 accuracy  Vote accuracy
(Atzori et al., 2016)  83.03%          70.48%         87.06%          92.31%
(Geng et al., 2016)    81.21%          72.06%         88.06%          93.06%

Table 6: Comparison with other works.

Setting                This work  (Atzori et al., 2016)  (Geng et al., 2016)  (Wei et al., 2017)
(Atzori et al., 2016)  70.48%     66.59%                 -                    -
(Geng et al., 2016)    72.06%     -                      76.10% (77.80%)      85%

Figure 4: Plot of prediction accuracy against normalized time duration. It can be seen that at the start and completion of the gesture repetition the accuracy is lower.

Another reason for the low accuracy is the fact that the errors are not evenly distributed over the duration of the gesture repetition. Fig. 4, which relates classification errors to the time-normalized movement duration, demonstrates that misclassifications are primarily concentrated at the beginning and at the completion of the movement. The reason is that during the recording session there is a gradual transition between rest, gesture and rest, in contrast to the discrete changes of the gesture labels. Consequently, accuracy is lower during these transition periods, where the change in movement is not yet clearly evident from the input EMG signal (Atzori et al., 2015).
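A curve like the one in Fig. 4 can be obtained by mapping each image to its relative position within its repetition and averaging correctness per time bin. The binning scheme and function name below are assumptions; the paper follows the procedure of (Atzori et al., 2015), whose details are not restated here.

```python
def accuracy_over_normalized_time(correct_flags, n_bins=10):
    """Per-bin accuracy over time-normalized repetitions.

    `correct_flags` holds one list per repetition, with one boolean per
    image in temporal order (True if the image was classified correctly).
    """
    hits, counts = [0] * n_bins, [0] * n_bins
    for repetition in correct_flags:
        n = len(repetition)
        for i, ok in enumerate(repetition):
            position = (i + 0.5) / n  # normalized time in (0, 1)
            b = min(int(position * n_bins), n_bins - 1)
            hits[b] += ok
            counts[b] += 1
    return [h / c if c else None for h, c in zip(hits, counts)]

# Toy data: two repetitions of different length, with errors at both ends.
curve = accuracy_over_normalized_time(
    [[False, True, True, False], [False, False, True, True, True, False]],
    n_bins=3,
)
```

Normalizing by repetition length lets repetitions of different durations contribute to the same curve.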

Overall, it is shown that a simple CNN architecture can be successful at the task of sEMG hand gesture recognition, taking into account the chance level when classifying 53 gestures. Small modifications to the model parameters and the training process can boost the performance, whereas deeper and more complex networks yield the best performance. The inability of the proposed model to generalize well to unseen data needs to be addressed to facilitate further improvement. Finally, the use of small EMG segments accounts for much of the classification error, assuming that a great amount of overlap occurs between the EMG signals of gesture groups, especially during their transition periods. Therefore, majority voting over these small EMG segments provides a better evaluation metric.

6 CONCLUSIONS

This paper presented an overview of recent advances in the use of DL methods for EMG hand gesture classification, while improvements to existing architectures were discussed. The proposed model follows the work of (Atzori et al., 2016) and is compared to the state of the art. It improves on the basic model by 3%, yet the works of (Geng et al., 2016) and (Wei et al., 2017) outperform it under the same evaluation settings. As future work, we plan to investigate the utilization of time-frequency representations (e.g. Wavelet and Fourier transforms) as a preprocessing step, as well as more complex architectures based on RNNs to benefit from the temporal information in the data.

The implementation code is available at the

following link https://github.com/DSIP-UPatras/

PhyCS2018 paper.

ACKNOWLEDGEMENTS

This work was supported by the VUB-UPatras International Joint Research Group (IJRG) on ICT.

REFERENCES

Atzori, M., Cognolato, M., and Müller, H. (2016). Deep Learning with Convolutional Neural Networks applied to electromyography data: A resource for the classification of movements for prosthetic hands. Frontiers in Neurorobotics, 10.

Atzori, M., Gijsberts, A., Castellini, C., Caputo, B., Hager, A., Elsig, S., Giatsidis, G., Bassetto, F., and Müller, H. (2014). Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 1(140053).

Atzori, M., Gijsberts, A., Kuzborskij, I., Elsig, S., Hager, A., Deriaz, O., Castellini, C., Müller, H., and Caputo, B. (2015). Characterization of a benchmark database for myoelectric movement classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 23(1):73–83.

Castellini, C., Fiorilla, A., and Sandini, G. (2009). Multi-subject/daily-life activity EMG-based control of mechanical hands. Journal of neuroengineering and rehabilitation, 6:41.

Cheok, M., Omar, Z., and Jaward, M. (2017). A review of hand gesture and sign language recognition techniques. International Journal of Machine Learning and Cybernetics.

Côté-Allard, U. et al. (2018). Deep Learning for electromyographic hand gesture signal classification by leveraging transfer learning. ArXiv e-prints.

Du, Y., Jin, W., Wei, W., Hu, Y., and Geng, W. (2017). Surface EMG-based inter-session gesture recognition enhanced by deep domain adaptation. Sensors, 17(3):458.

Englehart, K. and Hudgins, B. (2003). A robust, real-time control scheme for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering, 50(7):848–854.

Englehart, K., Hudgins, B., Parker, P., and Stevenson, M. (1999). Classification of the myoelectric signal using time-frequency based representations. Medical Engineering and Physics, 21(6):431–438.

Geng, W., Du, Y., Jin, W., Wei, W., Hu, Y., and Li, J. (2016). Gesture recognition by instantaneous surface EMG images. Scientific Reports, 6(36571).

Gijsberts, A., Atzori, M., Castellini, C., Müller, H., and Caputo, B. (2014). Movement error rate for evaluation of Machine Learning methods for sEMG-based hand movement classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(4):735–744.

Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward Neural Networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press, Cambridge, MA.

Hudgins, B., Parker, P., and Scott, R. (1993). A new strategy for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering, 40(1):82–94.

Kuzborskij, I., Gijsberts, A., and Caputo, B. (2012). On the challenge of classifying 52 hand movements from surface electromyography. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 4931–4937. IEEE.

Li, Y., Wang, N., Shi, J., Liu, J., and Hou, X. (2016). Revisiting Batch Normalization for practical domain adaptation. ArXiv e-prints.

Park, K. and Lee, S. (2016). Movement intention decoding based on Deep Learning for multiuser myoelectric interfaces. In 2016 4th International Winter Conference on Brain-Computer Interface (BCI), pages 1–2. IEEE.

Scheme, E. and Englehart, K. (2011). Electromyogram pattern recognition for control of powered upper-limb prostheses: State of the art and challenges for clinical use. The Journal of Rehabilitation Research and Development, 48(6):643–659.

Sergey, I. and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network training by reducing internal covariate shift. ArXiv e-prints.

Shim, H., An, H., Lee, S., Lee, E., Min, H., and Lee, S. (2016). EMG pattern classification by split and merge deep belief network. Symmetry, 8(12):148.

Shim, H. and Lee, S. (2015). Multi-channel electromyography pattern classification using deep belief networks for enhanced user experience. Journal of Central South University, 22(5):1801–1808.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent Neural Networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.

Wei, W., Wong, Y., Du, Y., Hu, Y., Kankanhalli, M., and Geng, W. (2017). A multi-stream Convolutional Neural Network for sEMG-based gesture recognition in muscle-computer interface. Pattern Recognition Letters.

Zhai, X., Jelfs, B., Chan, R., and Tin, C. (2017). Self-recalibrating surface EMG pattern recognition for neuroprosthesis control based on Convolutional Neural Network. Frontiers in Neuroscience, 11:379–390.
