Distance-based Algorithm for Biometric Applications in Meanwaves of Subject's Heartbeats

Tiago Araújo (1,2), Neuza Nunes, Hugo Gamboa and Ana Fred (3,4)

(1) CEFITEC, New University of Lisbon, Caparica, Portugal

(2) Plux Wireless Biosignals, Lisbon, Portugal

(3) Instituto de Telecomunicações, Scientific Area of Networks and Multimedia, Lisbon, Portugal

(4) Department of Electrical and Computer Engineering, Instituto Superior Técnico, Lisbon, Portugal

Keywords: Biometry, Classification, Electrocardiography, Meanwave, Signal Processing.

Abstract:

The authors present a new biometric classification procedure based on distances between meanwaves of electrocardiogram (ECG) heartbeats. The ECG data were collected from 63 subjects during two recording sessions separated by six months (Time Instance 1, T1, and Time Instance 2, T2). Two classification tests were performed with the goal of subject identification, using a distance-based method on the heartbeat waves. In both tests, the enrollment template was composed by averaging the T1 waves of each subject. For the first test, we composed five meanwaves from different groups of T1 waves; for the second test, five meanwaves were composed from different groups of T2 waves. Classification was performed with a kNN classifier, using the Euclidean distances between meanwaves as features for subject identification. In the first test, with only T1 waves, an accuracy of 95.2% was achieved. In the second test, using T2 waves to compose the test dataset, the accuracy was 90.5%. The T2 waves belonged to the same subjects but were acquired at a different time instance, simulating a real biometric identification problem. We therefore conclude that a distance-based method using meanwaves of ECG heartbeats for each subject is a valid approach to classification in biometric applications.

1 INTRODUCTION

Large amounts of confidential data are stored and transferred through the web every day. In access control, fast and efficient detection of intruders is crucial. This new era brings new concerns about security and authentication. Biometric recognition offers a very promising way of addressing this problem. The human voice, fingerprint, face, and iris are examples of individual characteristics currently used in biometric recognition systems (Jain et al., 2000). Recently, several works have studied the electrocardiography (ECG) signal as an intrinsic subject parameter, exploring its potential as a human identification tool (Silva et al., 2007; Coutinho et al., 2010; Li and Narayanan, 2010).

ECG-based biometry is mostly performed by detecting fiducial points and then extracting features from them (Lourenco et al., 2011). Nevertheless, some works use a classification approach without fiducial point detection (Plataniotis et al., 2006), reporting computational advantages, better identification performance and independence from peak synchronization.

Since 2007, the Institute of Telecommunications (IT) research group has explored this theme in essentially two ways: i) analysis of the time-persistent information in the ECG, with possible applicability in biometrics over time; and ii) development of acquisition methods that enable ECG signal acquisition with less obtrusive setups, particularly using the hands as the signal acquisition point. Following these goals, a recent work proposed a finger-based ECG biometric system that uses signals collected at the fingers through a minimally intrusive 1-lead ECG setup with Ag/AgCl electrodes without gel. In the same work, an algorithm was developed to compare the R peak amplitude of the heartbeats in the test patterns with the R peak amplitude in the enrollment template database. The results showed that this could be a promising technique.

In this work we used the IT ECG database and followed the same methodology as described above, but with a new biometric classification algorithm based on the Euclidean distances between heartbeat meanwaves.

Araújo T., Nunes N., Gamboa H. and Fred A.

Distance-based Algorithm for Biometric Applications in Meanwaves of Subject's Heartbeats.

DOI: 10.5220/0004358106300634

In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (BTSA-2013), pages 630-634

ISBN: 978-989-8565-41-9

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

In the following section we describe the procedure for ECG data acquisition and pre-processing. We also explain the methodology followed in this study to assign each heartbeat wave to the respective subject. Section 3 contains the results obtained in the study. Those results are discussed and conclusions are drawn in Section 4 of this paper.

2 PROCEDURE

2.1 Data Collection

ECG data were collected from 63 subjects (height 166.55±8.26 cm, weight 61.82±11.7 kg, age 21±4.46 years) during two recording sessions six months apart. The acquisitions were divided into two groups, T1 and T2, referring respectively to the first recording instance and to the second recording six months later. The subjects were asked to remain seated and relaxed during both recordings.

2.2 Signal Acquisition and Conditioning

The signals were acquired with two dry electrodes assembled in a differential configuration (Lourenco et al., 2011). The sensor uses a virtual ground, an input impedance over 1 MΩ, a CMRR of 110 dB and a gain of 10 in the first stage. The conditioning circuit consists of two filtering levels: i) a band-pass filter between 0.05 Hz and 1000 Hz and ii) a notch filter centered at 50 Hz to remove power-line interference. The final amplification stage has a gain of 100 to improve the resolution of the acquired signal. The system thus amplifies the signal after the undesired frequencies have been filtered out at each conditioning stage. The signal is then digitized for further digital processing. This processing consists of: a) a digital band-pass FIR filter of order 301 with a passband from 5 Hz to 20 Hz, designed using a Hamming window, b) detection of QRS complexes, c) segmentation of the ECG and determination of RR intervals, d) outlier removal, e) meanwave computation and feature extraction, and finally f) data classification. The signal acquisition and the processing steps a), b) and c) followed the methodology developed at IT (Lourenco et al., 2011).
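Step a) can be illustrated with a windowed-sinc design. The following is only a minimal sketch, not the IT implementation: the sampling rate of 1000 Hz is an assumption (the text does not state it), "order 301" is taken to mean 302 taps, and the names hamming_bandpass_fir and gain_at are hypothetical.

```python
import numpy as np

def hamming_bandpass_fir(order, low_hz, high_hz, fs):
    """Windowed-sinc band-pass FIR design using a Hamming window."""
    n = np.arange(order + 1) - order / 2.0   # tap positions centred on zero
    f1, f2 = low_hz / fs, high_hz / fs       # cut-offs in cycles per sample
    # An ideal band-pass is the difference of two ideal low-pass responses.
    ideal = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return ideal * np.hamming(order + 1)

def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at a single frequency."""
    n = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * n / fs)))

FS = 1000.0  # ASSUMED sampling rate in Hz; not given in the text
taps = hamming_bandpass_fir(301, 5.0, 20.0, FS)

# A raw ECG array `ecg` could then be filtered with:
#   filtered = np.convolve(ecg, taps, mode="same")
```

With these assumptions the gain is close to one in the middle of the 5 Hz to 20 Hz band and small at DC and at higher frequencies, which is the behaviour the text asks of step a).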

In the following section the methodology designed for the implementation of the remaining steps (d), e) and f)) is described.

2.3 Methodology

Our goal was to use the patterns of ECG heartbeats to identify the corresponding subjects in different time periods by means of a classification method. Classification is a machine learning technique used to predict group membership for data instances.

Figure 1 depicts the usual process that is followed

to classify a set of data.

Figure 1: The process of data classiﬁcation.

This process comprises a first stage of feature extraction, in which data transformations generate useful and novel features from a set of candidates. Data classification is then a supervised learning process: the classifier is given a first set of data, called the training set, from which it learns the features and the corresponding classes. For new data, called the test set, the classifier matches the features against the training set and assigns each sample to the corresponding class.

Figure 2 provides a schematic example of the methodology followed in our work.

Figure 2: Template and Tests of the classiﬁcation process.

The data used in this study were divided into two groups: the T1 and T2 acquisitions. In the first test we work only with T1 waves, and in the second test we compare the T2 waves with the T1 template. We can therefore check how the classification accuracy changes when working with acquisitions from the same subject that are separated in time, simulating a real biometric identification problem.

Distance-basedAlgorithmforBiometricApplicationsinMeanwavesofSubject'sHeartbeats

631

The dataset defined as the template is composed of the subjects' T1 meanwaves. The features for the classification process are the distances between the template meanwaves and the meanwaves of later acquisitions (tests).

To compose the template, the first step was to compute a meanwave (Nunes et al., 2012) for each subject by averaging all of that subject's T1 waves (which had already been segmented into RR-aligned heartbeats). An outlier removal procedure followed, in which the mean square error distance of each wave to the resulting meanwave was computed. Equation 1 gives this distance for a single heartbeat, where l is the length, in samples, of the normalized cycle and of the meanwave. After computing the distance of each wave to the meanwave, the 10% of waves with the highest distance values were removed from the template. A new meanwave was then computed for each subject without the outliers. Each subject's meanwave was composed of over 100 heartbeat waves.

\[
\text{distance} = \frac{\sqrt{\sum_{i=1}^{l} \left(\text{cycle}_i - \text{meanwave}_i\right)^2}}{l} \qquad (1)
\]
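The template construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: Equation 1 is read as the Euclidean distance between the two waves divided by the cycle length l, the waves of one subject are assumed to be stored as rows of equal length, and the names wave_distance and template_meanwave are hypothetical.

```python
import numpy as np

def wave_distance(cycle, meanwave):
    """Euclidean distance between two equal-length waves, divided by their length.

    ASSUMPTION: this is one reading of Equation 1 in the text.
    """
    cycle = np.asarray(cycle, dtype=float)
    meanwave = np.asarray(meanwave, dtype=float)
    return np.sqrt(np.sum((cycle - meanwave) ** 2)) / len(meanwave)

def template_meanwave(waves, outlier_fraction=0.10):
    """Average the waves, drop the most distant fraction, then average again."""
    waves = np.asarray(waves, dtype=float)            # shape: (n_waves, l)
    first_mean = waves.mean(axis=0)
    dists = np.array([wave_distance(w, first_mean) for w in waves])
    n_drop = int(round(outlier_fraction * len(waves)))
    keep = np.argsort(dists)[: len(waves) - n_drop]   # indices of the closest waves
    return waves[keep].mean(axis=0), np.sort(keep)
```

Calling template_meanwave once per subject on that subject's T1 waves would yield one row of the template; the second return value lists which waves survived the outlier removal.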

The 63 meanwaves gathered, one for each subject,

completed the template for the classiﬁer.

For the first test dataset we also used the T1 waves, but divided them randomly into 5 groups and computed one meanwave for each group. Each meanwave was composed of 10 heartbeat waves. For each subject, those five test meanwaves were compared with the T1 template using a distance metric. The distance metric was the one presented in Equation 1, with the meanwave computed from each group taking the place of the subject's cycle.
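One way to form the five test meanwaves is sketched below. The text does not say how the random division was done, so drawing 50 waves without repetition and splitting them into 5 disjoint groups of 10 is an assumption, as are the function name group_meanwaves and the fixed seed.

```python
import numpy as np

def group_meanwaves(waves, n_groups=5, group_size=10, seed=0):
    """Draw disjoint random groups of waves and return one meanwave per group."""
    waves = np.asarray(waves, dtype=float)             # shape: (n_waves, l)
    needed = n_groups * group_size
    if len(waves) < needed:
        raise ValueError("not enough waves for the requested groups")
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(len(waves))[:needed]      # random indices, no repeats
    groups = chosen.reshape(n_groups, group_size)
    return waves[groups].mean(axis=1)                  # shape: (n_groups, l)
```

The same function could be applied to the T2 waves of a subject to obtain the test meanwaves of the second test.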

For the second test we followed the same procedure, but computed the distances between the T1 template meanwaves and the 5 meanwaves from T2 for each subject.

With the distance values computed for both tests we composed two distance matrices, each with 63 columns (features), representing the distance of each sample (a test meanwave) to each subject's meanwave in the T1 template, and 315 (5×63) rows (samples), representing the 5 meanwaves gathered for each subject in each test.
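The layout of such a distance matrix can be made concrete with a short sketch. It assumes that the test meanwaves and the template are stored as row-wise arrays of equal wave length, and it again reads Equation 1 as the Euclidean distance divided by the wave length; distance_matrix is a hypothetical name.

```python
import numpy as np

def distance_matrix(test_meanwaves, template):
    """Rows: test meanwaves (samples); columns: template subjects (features)."""
    tests = np.asarray(test_meanwaves, dtype=float)    # shape: (n_samples, l)
    templ = np.asarray(template, dtype=float)          # shape: (n_subjects, l)
    diff = tests[:, None, :] - templ[None, :, :]       # (n_samples, n_subjects, l)
    # ASSUMED reading of Equation 1: Euclidean distance divided by wave length.
    return np.sqrt((diff ** 2).sum(axis=2)) / templ.shape[1]
```

With 5 test meanwaves for each of 63 subjects and a 63-row template, the result has the 315×63 shape described in the text.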

We used a user-friendly toolbox (Orange, 2012) to classify the data, giving the distance matrices as input and using a k-Nearest Neighbor (kNN) classifier with a leave-one-out criterion. Figure 3 shows the Orange schematics that we used to classify our data and gather the results.

Figure 3: Schematics used in Orange for classification.

The File icon represents the data to be classified; in our case, the distance matrices given as input. The k Nearest Neighbor icon classifies each sample according to the most common class among its k nearest neighbors (we used k=1). The test learner represents the stage where the given data are processed by the classification algorithm and the classifier learns the samples and the corresponding classes. The confusion matrix compares the predictions with the expected results to return the detailed results of the specified classifier.
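Outside Orange, the leave-one-out 1-NN evaluation can be approximated in a few lines. This is a sketch and not a reproduction of the Orange schema: the Euclidean metric between feature rows is an assumption, since the toolbox settings are not reported, and loo_1nn_predict is a hypothetical name.

```python
import numpy as np

def loo_1nn_predict(features, labels):
    """Leave-one-out 1-NN: each sample takes the label of its nearest other sample."""
    X = np.asarray(features, dtype=float)   # shape: (n_samples, n_features)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances between feature rows (ASSUMED metric).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)            # a sample may not vote for itself
    return y[sq.argmin(axis=1)]
```

Here each row of a distance matrix is one sample, and the label is the index of the subject from whom the test meanwave was computed.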

3 RESULTS

Figure 4 presents the distance matrices for test 1 and test 2 as images.

Figure 4: Distance matrices for test 1 and test 2 given as input to the classifier.

The darker colors represent the smallest distance values, which we associate with intra-subject heartbeat distances. For both tests, as we had 5 samples for each subject to compare with the meanwave template, the ideal result would be a diagonal composed of 5 dark cells per subject, with all other cells in lighter colors (ideally completely white). As Figure 4 shows, test 1 is closer to the ideal result, as this test uses waves from the same acquisition in both the template and the test sets. In the second test the subjects are not as easily identified visually from the distance metric, and a decrease in accuracy is therefore expected for the second test.

After the learning process in Orange, a confusion matrix returned the detailed results of the classifier. An excerpt of that matrix is shown in Figure 5.

This matrix gathers the classification results for each class (each subject). In the ideal case every diagonal cell contains 5 samples, meaning that all samples were correctly classified, as we had only 5 samples per subject. A diagonal cell with a lower value means that at least one misclassification was made, assigning a sample to another class (at least one heartbeat meanwave was classified as belonging to a different subject).
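How such a matrix relates to the overall accuracy can be sketched as below. The function names are hypothetical, class labels are assumed to be integers from 0 to n_classes - 1, and the orientation (rows as true subjects, columns as predicted subjects) is a choice made for this example.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows: true subject; columns: predicted subject; entries: sample counts."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Fraction of samples lying on the diagonal (correctly identified)."""
    return np.trace(cm) / cm.sum()
```

With 63 subjects and 5 samples each, a perfect classifier would give a diagonal of fives summing to 315, and every misclassified meanwave moves one count off the diagonal.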

The final classification results for tests 1 and 2, over all subjects, are given in Table 1.

Table 1: Classification accuracy results for test 1 and test 2.

              Test 1    Test 2
Accuracy      95.2%     90.5%

Figure 5: Part of the confusion matrix returned from the classifier.

4 CONCLUSIONS

In this work we implemented a new biometric classification procedure based on the distances between meanwaves of electrocardiogram (ECG) heartbeats. Our goal was to use the patterns of ECG heartbeats to identify subjects. To validate the developed solution, the method was tested on a real ECG database, composed of two finger-based ECG acquisitions from each of 63 subjects. The two acquisitions from each subject were six months apart. This enabled the evaluation of the algorithm's accuracy in a test scenario, where both the test set and the enrollment template came from the first acquisition, and in a real-case scenario, where we used the first acquisition as the enrollment template and the second as the test set. With our approach it was possible to obtain accuracy rates of 95.2% for the test scenario (test 1) and 90.5% for the real-case scenario (test 2). These results compare favorably with recent state-of-the-art studies on finger-ECG-based identification, which report accuracies of 89% (Chan et al., 2008) and 94.4% (Lourenco et al., 2011).

Future work will focus on improving the feature extraction process and adding features to the classifier, such as the correlation between waves or the intra-subject variability, as we noticed that some subjects had a higher variability in their meanwaves, and therefore the computed distance is not the best feature per se.

ACKNOWLEDGEMENTS

The authors would like to thank the Escola Superior de Saúde-Cruz Vermelha Portuguesa (ESSCVP) for the data collection infrastructure and for providing the subjects.

REFERENCES

Chan, A., Hamdy, M., Badre, A., and Badee, V. (2008). Wavelet distance measure for person identification using electrocardiograms. In IEEE Transactions on Instrumentation and Measurement.

Coutinho, D., Fred, A., and Figueiredo, M. (2010). Personal identification and authentication based on one-lead ECG using Ziv-Merhav cross parsing. In 10th International Workshop on Pattern Recognition in Information Systems.

Jain, A., Hong, L., and Pankanti, S. (2000). Biometric Identification. Communications of the ACM.

Li, M. and Narayanan, S. (2010). Robust ECG biometrics by fusing temporal and cepstral information. In 20th International Conference on Pattern Recognition.

Lourenco, A., Silva, H., and Fred, A. (2011). Unveiling the biometric potential of finger-based ECG signals. In Computational Intelligence and Neuroscience.

Nunes, N., Araujo, T., and Gamboa, H. (2012). Time Series Clustering Algorithm for Two-Modes Cyclic Biosignals. A. Fred, J. Filipe, and H. Gamboa (Eds.): BIOSTEC 2011, CCIS 273, pp. 233–245. Springer, Heidelberg.

Orange (2012). http://orange.biolab.si/.

Plataniotis, K., Hatzinakos, D., and Lee, J. (2006). ECG biometric recognition without fiducial detection. In Biometric Consortium Conference, 2006 Biometrics Symposium.

Silva, H., Gamboa, H., and Fred, A. (2007). Applicability of lead V2 ECG measurements in biometrics. In Proceedings of Med-e-Tel.

ICPRAM2013-InternationalConferenceonPatternRecognitionApplicationsandMethods

634