Kinect based People Identification System using Fusion of Clustering and Classification
Aniruddha Sinha¹, Diptesh Das¹, Kingshuk Chakravarty¹, Amit Konar² and Sudeepto Dutta³
¹ Innovation Lab, Tata Consultancy Services Ltd., Kolkata, India
² Dept. of Electronics and Tele-communication Engineering, Jadavpur University, Kolkata, India
³ Sikkim Manipal Institute of Technology, Sikkim, India
Keywords:
Kinect Sensor, Human Identification, Gait Detection, Clustering, Classification, Fusion, Dempster-Shafer
Theory, Human Skeleton.
Abstract:
The demand for non-intrusive human identification has risen steadily in recent years. Several works have addressed this problem using gait-cycle detection from human skeleton data captured with a Microsoft Kinect sensor. In this paper we propose a novel method for automatic, real-time human identification that efficiently fuses supervised and unsupervised learning on gait-based features using Dempster-Shafer (DS) theory. The performance of the proposed fusion-based algorithm is compared with that of standard supervised or unsupervised algorithms, and the proposed algorithm achieves 71% recognition accuracy.
1 INTRODUCTION
Several modalities of human identification exist in the current literature on machine intelligence and human computer interaction (HCI). Common techniques include voice, facial expression, gesture, iris and fingerprint. Unfortunately, all of these modalities demand direct human interaction, so they do not allow non-intrusive identification. Moreover, extracting fingerprint, iris or audio biometric information in a recognizable form from a large distance is challenging. This paper aims at developing a novel scheme for human identification from the movement pattern (gait) of the subject. The main advantage of gait-based person identification is that it is unobtrusive and can be applied from a relatively large distance. Gait signatures are also considered secure because they are hard to hide or imitate. One approach to extracting movement data is to determine the joint coordinates and their velocities and to classify subjects from the resulting movement patterns. This, however, requires multiple positional sensors or cameras to determine the coordinates and depth of the joints from a given distance. Fortunately, Microsoft Kinect provides a convenient platform to capture joint information using an RGB camera and a depth sensor in an indoor environment. Moreover, because Kinect directly provides skeleton (joint) information, the movement pattern can be stored instead of video, which helps address privacy and security concerns for the individual.
Usually, a classification (supervised learning) or clustering (unsupervised learning) algorithm is required to map or group the sensor data into classes or clusters. In supervised learning, the set of features extracted from the sensor data is used to train a classifier that subsequently classifies an unknown movement pattern. In unsupervised learning, on the other hand, the extracted features are used to recognize a new person without any training or manual labeling, by grouping the data based on similarity. Supervised and unsupervised approaches for human identification have already been proposed by Ball et al. (Ball et al., 2012) and Preis et al. (Preis et al., 2012), with different feature sets. While Ball et al. (Ball et al., 2012) considered only dynamic angular information extracted from the movement pattern, Preis et al. (Preis et al., 2012) used both static information, such as height and torso length, and dynamic information, such as step length and velocity. Hybrid features related to the area of the upper and lower body, and to the distances between the upper-body centroid and the centroids derived from different body joints, are proposed in (Sinha et al., 2013). An Artificial Neural Network (ANN) based connectionist framework for classification using a 46-dimensional feature vector was demonstrated by Pal et al. (Pal and Chintalapudi, 1997). However, because the skeleton data provided by Kinect are very noisy, a robust person identification algorithm is required to improve recognition accuracy.

Sinha A., Das D., Chakravarty K., Konar A. and Dutta S.
Kinect based People Identification System using Fusion of Clustering and Classification.
DOI: 10.5220/0004690201710179
In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 171-179
ISBN: 978-989-758-009-3
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)
A supervised learning algorithm can be negatively influenced by the limited quality of the training data (Karem et al., 2012) as well as by the parameters of the learning algorithm. Moreover, an imbalanced distribution of the real classes in the training dataset often leads to misclassification. To overcome these shortcomings, this paper designs a novel approach to human identification that fuses the scores of a supervised and an unsupervised algorithm. In addition, the proposed approach is designed to be less sensitive to small variations in the measurements caused by spurious noise, which improves recognition accuracy.
The person identification method introduced above comprises three main steps: i) acquisition of movement data of the subjects using the Kinect sensor, ii) feature extraction from the recorded skeleton data, and iii) decision making from the extracted features. The feature extraction module automatically extracts half gait cycles from the recorded movement data and computes all the features $F$ described in (Sinha et al., 2013). The decision-making module has three basic components in its functional architecture: i) a classifier $C$, ii) a clustering algorithm $A$ and iii) a fusion algorithm $D$. The classifier $C$ is trained with the feature set $F$ extracted from the training dataset and tested with an unknown input (movement) pattern. A classification measure in terms of probability, $M_C$, is calculated from the classification output for the test subject. The unsupervised clustering algorithm $A$ groups the extracted features into three groups, and a probabilistic measure $M_A$ is computed from the clustering result for the unknown test data. A fusion algorithm $D$ then fuses the two measures $M_C$ and $M_A$ in order to improve the recognition accuracy for the unknown input. After careful study, we realized $C$ by a Support Vector Machine (SVM) (Cortes and Vapnik, 1995) with a radial basis function (RBF) kernel and $A$ by Fuzzy C-Means (FCM) clustering (Bezdek, 1981) (Bezdek et al., 1984). We found experimentally that the combined SVM-FCM approach improves the recognition accuracy compared with the existing single classification or clustering algorithms reported in (Ball et al., 2012) (Preis et al., 2012) (Sinha et al., 2013). In principle, any fusion technique, such as Bayesian networks (Dempster, 1968) (Jeffreys, 1973), the Kalman filter (Kalman, 1960) or Dempster-Shafer theory (Dempster, 1967) (Fine, 1977), could be used in the present scheme. We used the Dempster-Shafer algorithm to realize $D$ for the following reasons.
• It is simple and generalizes probabilistic modeling and inference;
• It properly distinguishes between reasoning and decision making;
• It properly quantifies
  1. the presence of confirming or contradicting information sources (termed conflict);
  2. low-confidence and high-confidence results (termed ignorance), depending on the availability of information sources.
This novel approach of fusing the SVM and FCM outputs using the Dempster-Shafer algorithm improves human identification performance on raw Kinect data compared with existing research in this area (Ball et al., 2012) (Preis et al., 2012) (Sinha et al., 2013).
The rest of the paper is organized as follows. Section 2 reviews existing state-of-the-art methods in this context. Section 3 details the proposed fusion-based algorithm for person identification. The performance evaluation, supported by experimental results, is presented in Section 4, and Section 5 concludes the paper.
2 RELATED WORK
People identification using gait biometrics has attracted great interest in the scientific community because of its non-intrusive nature (Cheng et al., 2012). Gait recognition can be broadly classified into two categories: 1) model-based approaches and 2) model-free approaches (Carlsson, 2000) (Huang et al., 1999). In the model-based approach, gait signatures are generated by modeling and tracking different body parts (such as limbs, arms and thighs) over time (Wang et al., 2010), (BenAbdelkader et al., 2002), whereas in the model-free approach, features are generated from the change in shape of human silhouettes over time, depending on the motion dynamics (Sarkar et al., 2005). Each approach resolves some issues of the other but suffers from its own limitations. In the model-based approach, feature generation is view-invariant and scale-independent, but it is sensitive to the quality of the gait sequences and computationally expensive. Conversely, in the model-free approach, feature generation is insensitive to the quality of the silhouettes and has low computational overhead, but it depends on the viewpoint and scale. Direct gait classification using standard techniques such as SVM (Support Vector Machine) (Cortes and Vapnik, 1995), K-NN (K-Nearest Neighbors) (Bouchrika and Nixon, 2008), TSVM (Transductive Support Vector Machine) (Dadashi et al., 2009), Dynamic Time Warping (DTW) (Kale et al., 2003) and HMM (Hidden Markov Model) (Cheng et al., 2008) (Meyer et al., 1998) has been adopted by researchers, as mentioned in (Ball et al., 2012) (Preis et al., 2012) (Sinha et al., 2013). Xue et al. (Xue et al., 2010) proposed a two-fold fusion mechanism, namely multiple-feature fusion and multiple-view fusion, for human gait identification using Independent Component Analysis (ICA) and Genetic Fuzzy Support Vector Machine (GFSVM) (Xue et al., 2010) classifiers. Moreover, the Dempster-Shafer algorithm has been used in (Le Hegarat-Mascle et al., 1997) for unsupervised classification in multi-source remote sensing, where data fusion is done at the pixel level to identify land-cover types. In this paper we propose a probabilistic approach that fuses the SVM and FCM outputs using the Dempster-Shafer algorithm. To the best of the authors' knowledge, this approach of fusing supervised and unsupervised learning outputs for real-time human identification is novel, and it helps improve the overall recognition accuracy.
3 ALGORITHMIC APPROACH
The proposed algorithm for people identification is based on the fusion of supervised and unsupervised algorithms. In this paper, a Support Vector Machine (SVM) (Begg et al., 2005) (Cortes and Vapnik, 1995) with a radial basis function (RBF) kernel is used as the supervised learning algorithm, and the Fuzzy C-Means clustering algorithm (Bezdek, 1981) (Bezdek et al., 1984) is used for unsupervised learning. The motivation is to partition the training data into three clusters, with the aim of grouping the subjects by height (small, medium, tall), by speed (slow, medium, fast), by the area encompassed (less, medium, large) or by the angles created by different body segments (low, medium, high). The centroids of these clusters are stored during the training phase. Later, during testing, the distances of the new set of features from the cluster centroids are computed to derive the probability that the test subject belongs to a particular cluster. This probability is then fused using Dempster-Shafer (DS) theory (Dempster, 1967) (Fine, 1977) with the probability obtained from the SVM during the testing phase. The uncertainty in assigning subjects to clusters provides additional information to the fusion algorithm, which we expect to improve the overall accuracy of identifying an individual.
3.1 Feature Extraction
Feature extraction is an important step in any machine-learning-based algorithm. In our person identification problem, feature extraction is performed on every half gait cycle. A gait cycle starts with one foot forward and ends with the same foot forward, so a half gait cycle spans from one foot forward to the other foot forward. As features, we use a set of areas (Sinha et al., 2013), static (Preis et al., 2012) and dynamic (Sinha et al., 2013) distances, certain angles (Ball et al., 2012) and the speed (Preis et al., 2012), all derived from the 20 skeleton points. These features are summarized below.
Area Features ($f_A$). The mean areas occupied by the upper ($f_{au}$) and lower ($f_{al}$) parts of the body in a half gait cycle form $f_A = \{f_{au}, f_{al}\}$. The joints considered are given below:
Upper body - Shoulder centre, shoulder left, hip left,
hip centre, hip right and shoulder right.
Lower body - Hip centre, hip right, knee right, ankle
right, ankle left, knee left and hip left.
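The area features can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes each frame is a dictionary mapping hypothetical joint names to Kinect camera-space (x, y, z) coordinates, that the joints listed above are visited in boundary order, and that each area is taken on the (x, y) projection with the shoelace formula before averaging over the frames of one half gait cycle.

```python
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Frame = Dict[str, Point3]  # hypothetical joint name -> (x, y, z) in metres

# Joint lists from Section 3.1, in boundary order (names are hypothetical).
UPPER_BODY = ["shoulder_center", "shoulder_left", "hip_left",
              "hip_center", "hip_right", "shoulder_right"]
LOWER_BODY = ["hip_center", "hip_right", "knee_right", "ankle_right",
              "ankle_left", "knee_left", "hip_left"]


def polygon_area_xy(points: Sequence[Point3]) -> float:
    """Shoelace formula on the (x, y) projection; the depth coordinate z is ignored."""
    total = 0.0
    n = len(points)
    for k in range(n):
        x1, y1, _ = points[k]
        x2, y2, _ = points[(k + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mean_area(frames: List[Frame], joints: Sequence[str]) -> float:
    """Mean polygon area over all frames of one half gait cycle."""
    areas = [polygon_area_xy([frame[j] for j in joints]) for frame in frames]
    return sum(areas) / len(areas)


def area_features(frames: List[Frame]) -> Tuple[float, float]:
    """Return f_A = (f_au, f_al) for one half gait cycle."""
    return mean_area(frames, UPPER_BODY), mean_area(frames, LOWER_BODY)
```

Projecting onto (x, y) is a design choice for a subject walking sideways past the sensor; another projection plane could be substituted without changing the rest of the sketch.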
Distance Features ($f_D$). The Euclidean distances between adjacent joints are the static distances. The Euclidean distances of the centroids of the upper and lower limbs from the upper-body centroid are the dynamic distances, as they change while a person is walking. The joints considered for computing the centroids are given below:
Upper body - shoulder centre, shoulder left, hip left,
hip right and shoulder right.
Right hand - shoulder right, elbow right and wrist
right.
Left hand - shoulder left, elbow left and wrist left.
Right leg - hip right, knee right and ankle right.
Left leg - hip left, knee left and ankle left.
Figure 1 shows the Euclidean distance between the upper-body centroid and the right-hand centroid.
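The static and dynamic distances for a single frame can be sketched as follows, using the same hypothetical joint names as before (not Kinect SDK identifiers). How per-frame values are aggregated over a half gait cycle is not specified in the text, so the sketch stops at per-frame values; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]
Frame = Dict[str, Point3]  # hypothetical joint name -> (x, y, z)

# Centroid joint sets from Section 3.1 (joint names are hypothetical).
UPPER_BODY = ["shoulder_center", "shoulder_left", "hip_left",
              "hip_right", "shoulder_right"]
LIMBS = {
    "right_hand": ["shoulder_right", "elbow_right", "wrist_right"],
    "left_hand": ["shoulder_left", "elbow_left", "wrist_left"],
    "right_leg": ["hip_right", "knee_right", "ankle_right"],
    "left_leg": ["hip_left", "knee_left", "ankle_left"],
}


def centroid(frame: Frame, joints: Sequence[str]) -> Point3:
    """Arithmetic mean of the given joints' coordinates."""
    pts = [frame[j] for j in joints]
    x, y, z = (sum(p[d] for p in pts) / len(pts) for d in range(3))
    return (x, y, z)


def static_distance(frame: Frame, joint_a: str, joint_b: str) -> float:
    """Static feature: Euclidean distance between two adjacent joints."""
    return math.dist(frame[joint_a], frame[joint_b])


def dynamic_distances(frame: Frame) -> Dict[str, float]:
    """Dynamic features: distance of each limb centroid from the upper-body centroid."""
    c_upper = centroid(frame, UPPER_BODY)
    return {limb: math.dist(c_upper, centroid(frame, joints))
            for limb, joints in LIMBS.items()}
```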
Angle Features ($f_G$). These are the angles that various segments of the two legs make with the horizontal and vertical planes (Ball et al., 2012).
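One possible way to compute such angles for a leg segment (for example, hip to knee) is sketched below. It assumes Kinect camera space with y pointing up and interprets the two reference directions as the vertical y axis and the horizontal x-z plane; (Ball et al., 2012) may define the angles differently.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def segment_angles(p_from: Point3, p_to: Point3) -> Tuple[float, float]:
    """Angles (degrees) of the segment p_from->p_to with the vertical axis and the horizontal plane.

    Assumes y is the vertical axis, so the horizontal plane is the x-z plane.
    """
    dx, dy, dz = (b - a for a, b in zip(p_from, p_to))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("zero-length segment")
    cos_v = min(1.0, abs(dy) / length)  # clamp against rounding error
    vertical = math.degrees(math.acos(cos_v))
    horizontal = 90.0 - vertical  # the two angles are complementary
    return vertical, horizontal
```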
Other Features ($f_S$). Apart from the features mentioned above, we also consider all the static and dynamic features described in (Preis et al., 2012).
The combined gait feature vector used for people identification is

$$\vec{F} = \{f_A, f_D, f_G, f_S\} \in \mathbb{R}^{46} \qquad (1)$$
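Each feature vector in (1) is computed over one half gait cycle. The paper delegates half-cycle extraction to (Sinha et al., 2013); the sketch below shows only one plausible rule. It assumes a precomputed signal `separation[t]`, a hypothetical signed distance between the left and right ankles along the walking direction, and takes consecutive local extrema (one foot furthest forward, then the other) as half-cycle boundaries. Noisy Kinect data would normally need smoothing before this step.

```python
from typing import List, Sequence, Tuple


def half_cycle_boundaries(separation: Sequence[float]) -> List[int]:
    """Frame indices where the signed ankle separation reaches a local extremum.

    A local maximum means one foot is furthest forward; a local minimum, the other foot.
    """
    idx = []
    for t in range(1, len(separation) - 1):
        prev, cur, nxt = separation[t - 1], separation[t], separation[t + 1]
        if (cur > prev and cur >= nxt) or (cur < prev and cur <= nxt):
            idx.append(t)
    return idx


def half_cycles(separation: Sequence[float]) -> List[Tuple[int, int]]:
    """(start, end) frame indices of consecutive half gait cycles."""
    b = half_cycle_boundaries(separation)
    return [(b[k], b[k + 1]) for k in range(len(b) - 1)]
```

For example, `half_cycles([0, 1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1])` returns `[(2, 6), (6, 10)]`, i.e. two half cycles between alternating foot-forward extrema.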
KinectbasedPeopleIdentificationSystemusingFusionofClusteringandClassification
173
Figure 1: Euclidean distance between upper body centroid
and right hand centroid.
3.2 Dempster-Shafer Theory (DST)
Dempster-Shafer Theory (Dempster, 1967) (Fine, 1977), based on the mathematical theory of evidence (Shafer, 1976), is a promising framework for combining evidence from different sources to arrive at a degree of belief (defined by a belief function) supported by all the available evidence. The theory was first proposed by Arthur P. Dempster (Dempster, 1967) and Glenn Shafer (Fine, 1977) (Shafer, 1976). It is a generalization of the Bayesian theory of subjective probability (Dempster, 1968), so we first recall the basics of the Bayesian theory (Jeffreys, 1973) of subjective probability before applying DST to our problem.
If the unconditional probability of an event $A$ is denoted by $P(A)$ and $A$ has a domain of possible values $\{x_1, x_2, x_3, \ldots, x_n\}$, then the probabilities of $A = x_1, A = x_2, \ldots, A = x_n$ always sum to 1. In mathematical notation,

$$\sum_{i=1}^{n} P(A = x_i) = 1 \qquad (2)$$
The conditional probability of $A$, given that event $B$ has already occurred, is denoted by $P(A|B)$ and defined by

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (3)$$
Given $P(A)$, $P(B)$ and the conditional probability $P(A|B)$, Bayes' law (Jeffreys, 1973) of conditional probability states

$$P(B|A) = \frac{P(A|B)\, P(B)}{P(A)} \qquad (4)$$
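If $A$ has $n$ subsets $A_1, A_2, A_3, \ldots, A_n$ such that $A = \bigcup_{i} A_i$, where $i \in [1,n]$, and $B$ has $m$ partitions $B_1, B_2, B_3, \ldots, B_m$ such that $B = \bigcup_{j} B_j$, where $j \in [1,m]$, then by Bayes' law $P(B_j|A_i)$ can be expressed as

$$P(B_j|A_i) = \frac{P(A_i|B_j)\, P(B_j)}{\sum_{k=1}^{m} P(A_i|B_k)\, P(B_k)} \qquad (5)$$

The denominator equals $P(A_i)$ by the law of total probability, which requires the $B_k$ to be mutually exclusive and to cover the sample space.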
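Equation (5) can be checked numerically with a short sketch; the function name is illustrative. Given the likelihoods $P(A_i|B_k)$ and priors $P(B_k)$ over a partition, it normalizes the products by their sum, which is $P(A_i)$.

```python
from typing import List, Sequence


def posterior(likelihoods: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Eq. (5): P(B_j | A_i) for j = 1..m.

    likelihoods[k] = P(A_i | B_k), priors[k] = P(B_k) over a partition B_1..B_m.
    """
    # Denominator of Eq. (5): P(A_i) by the law of total probability.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if evidence == 0.0:
        raise ValueError("P(A_i) is zero; the posterior is undefined")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]
```

For instance, with priors $[0.25, 0.75]$ and likelihoods $[0.6, 0.2]$, both products are 0.15, so the posterior is $[0.5, 0.5]$ (up to floating-point rounding).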
Bayesian theory assigns a positive belief to each proposition but does not consider disbelief in the proposition. DST addresses this problem by fusing information while considering both belief and disbelief. In DST, the set of all possible outcomes of a random experiment is referred to as the frame of discernment (FOD). If a random experiment has outcomes $\{x_1, x_2, \ldots, x_n\}$, the FOD is defined as the universal set
$$\theta = \{x_1, x_2, \ldots, x_n\}$$

where the cardinality of $\theta$ is $n = |\theta|$, and the $2^n$ subsets of $\theta$ are called propositions. In DS theory, probability masses are assigned to subsets of $\theta$, unlike Bayesian theory, where each element is treated as a singleton. When a source of evidence assigns probability masses to the propositions, the resulting function is termed a basic probability assignment (BPA). Formally, a BPA is a function $m: 2^{\theta} \rightarrow [0,1]$ such that

$$0 \le m(X) \le 1, \qquad m(\phi) = 0, \qquad \sum_{X \subseteq \theta} m(X) = 1.0 \qquad (6)$$

where $\phi$ is the empty subset of $\theta$.
The belief function for DST over the FOD is expressed as

$$Bel(X) = \sum_{Y \subseteq X} m(Y) \qquad (7)$$
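Similarly, the uncertainty $U(X)$ in DST measures the extent to which the evidence says nothing one way or the other about the proposition $X$. The plausibility in DST is defined as

$$Pl(X) = Bel(X) + U(X)$$

which equals the total mass of all subsets $Y$ with $Y \cap X \neq \phi$.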
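Equations (6) and (7) and the plausibility can be made concrete with a small sketch that represents a BPA as a dictionary from frozensets (focal elements) to masses. The names are illustrative; plausibility is computed as the total mass of the focal elements that intersect $X$, and $U(X)$ as $Pl(X) - Bel(X)$.

```python
from typing import Dict, FrozenSet, Hashable

Mass = Dict[FrozenSet[Hashable], float]  # focal element -> mass


def is_valid_bpa(m: Mass, tol: float = 1e-9) -> bool:
    """Check the BPA conditions of Eq. (6)."""
    in_range = all(0.0 <= v <= 1.0 for v in m.values())
    empty_ok = m.get(frozenset(), 0.0) == 0.0
    return in_range and empty_ok and abs(sum(m.values()) - 1.0) <= tol


def belief(m: Mass, x: FrozenSet[Hashable]) -> float:
    """Eq. (7): total mass of all subsets Y of X."""
    return sum(v for y, v in m.items() if y <= x)


def plausibility(m: Mass, x: FrozenSet[Hashable]) -> float:
    """Total mass of all Y that intersect X (i.e. do not contradict X)."""
    return sum(v for y, v in m.items() if y & x)


def uncertainty(m: Mass, x: FrozenSet[Hashable]) -> float:
    """U(X) = Pl(X) - Bel(X)."""
    return plausibility(m, x) - belief(m, x)
```

With $\theta = \{1,2,3\}$ and masses $m(\{1\}) = 0.5$, $m(\{1,2\}) = 0.3$, $m(\theta) = 0.2$, the proposition $\{2\}$ has $Bel = 0$ but $Pl = 0.5$, so $U(\{2\}) = 0.5$: the evidence neither supports nor rules out person 2.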
In this theory, the probability of a set $A \in 2^{\theta}$ is bounded by $Bel(A) \le P(A) \le Pl(A)$. Dempster's rule of combination (Jøsang and Pope, 2012) is used to combine two independent sets of probability masses. If two sources of information $K_1$ and $K_2$ provide FODs $\theta_1$ and $\theta_2$ with BPAs $m_1(\cdot)$ and $m_2(\cdot)$, respectively, then the combined probability mass (called the joint mass) is computed as

$$m_{1,2}(X) = K^{-1} \sum_{X = X_i \cap X_j \neq \phi} m_1(X_i)\, m_2(X_j), \qquad K = 1 - \sum_{X_i \cap X_j = \phi} m_1(X_i)\, m_2(X_j) \qquad (8)$$

where $X_i$ and $X_j$ are focal elements of $\theta_1$ and $\theta_2$, respectively. Equation (8) is also called the orthogonal sum of belief functions.
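Dempster's rule (8) can be sketched directly with frozensets as focal elements. The function name is illustrative; it returns the normalized joint mass together with $K$, i.e. one minus the mass that falls on the empty set.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Mass = Dict[FrozenSet[Hashable], float]


def dempster_combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Dempster's rule, Eq. (8). Returns (joint mass m_{1,2}, K)."""
    joint: Mass = {}
    conflict = 0.0
    for xi, a in m1.items():
        for xj, b in m2.items():
            inter = xi & xj
            if inter:
                joint[inter] = joint.get(inter, 0.0) + a * b
            else:
                conflict += a * b  # mass falling on the empty set
    k = 1.0 - conflict  # K in Eq. (8)
    if k <= 0.0:
        raise ValueError("sources are in total conflict; combination undefined")
    return {x: v / k for x, v in joint.items()}, k
```

For example, combining $m_1 = \{\{a\}: 0.6, \theta: 0.4\}$ with $m_2 = \{\{b\}: 0.5, \theta: 0.5\}$ over $\theta = \{a, b\}$ puts a conflict of 0.3 on $\{a\} \cap \{b\} = \phi$, so $K = 0.7$ and the remaining products are rescaled by $1/0.7$.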
3.3 Application of Dempster-Shafer Theory (DST) in Person Identification
In our proposed approach, we combine the decisions of the clustering and classification algorithms using DST to improve recognition accuracy.
First, we use a supervised learning approach, realized in our experiments by a Support Vector Machine (SVM). Consider a case where skeleton data for $N$ persons are available for learning and an unknown input pattern needs to be classified. We extract a $P$-dimensional feature vector for each observation of each of the $N$ persons and form a training dataset $D$. Each person is then represented by an $M \times P$ matrix $X_i$ ($i \in [1,N]$), where each row of $X_i$ is an observation and each column a feature; $M$ is the total number of rows, i.e. observations, for the $i$-th person. The dataset $D$ can be expressed in terms of the $X_i$ as $D = \{X_1, X_2, \ldots, X_i, \ldots, X_N\}$, of dimension $D_{dim} \times P$, where $D_{dim}$ is the total number of observations. The SVM is used to generate a training model from the dataset $D$. For testing, given an unknown input of dimension $K \times P$, the SVM uses the trained model to label each of its $K$ observations. Once every observation is labeled, we generate a probability score from the distribution of the labels of the unknown input among the $N$ classes. If $n_i$ ($n_i \le K$) is the number of observations detected as class $i$, then the probability score of the unknown input for class $i$ is defined as

$$C_i = P^{Class}_{SVM}(i) = \frac{n_i}{K}, \quad i \in [1, N] \qquad (9)$$
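Equation (9) turns per-observation labels into class scores. A minimal sketch follows; it assumes the labels have already been produced by a trained SVM (for example via LIBSVM prediction), which is not shown, and that classes are numbered 1 to $N$. The function name is illustrative.

```python
from collections import Counter
from typing import List, Sequence


def label_distribution(labels: Sequence[int], n_classes: int) -> List[float]:
    """Eq. (9): fraction n_i / K of the K test observations assigned to each class 1..N."""
    k = len(labels)
    if k == 0:
        raise ValueError("no observations")
    counts = Counter(labels)
    return [counts.get(i, 0) / k for i in range(1, n_classes + 1)]
```

For example, if four observations are labeled 8, 8, 2, 8 with $N = 8$, the scores are 0.25 for class 2, 0.75 for class 8 and 0 elsewhere.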
We store $P_{SVM} = \{P^{Class}_{SVM}(1), P^{Class}_{SVM}(2), \ldots, P^{Class}_{SVM}(N)\}$ for the fusion. In the second phase of our algorithm, we use an unsupervised learning algorithm, which requires no training on dataset $D$. When an unknown input of dimension $K \times P$ arrives, the system employs the Fuzzy C-Means (FCM) clustering algorithm to group all the observations (dataset $D$ plus the $K \times P$ unknown input) into $C$ clusters, where $2 \le C \le (D_{dim} - 1)$. FCM provides $C$ cluster centers and a set of membership values. We then calculate the Euclidean distance of each of the $K$ observations of the unknown input from these cluster centers and label each observation with its nearest cluster $i$, $i \in [1,C]$; e.g., if observation $O_r$, $r \in [1,K]$, is nearest to cluster 3, it is labeled 3. Once all $K$ observations are labeled, we generate a probability score from the distribution of the labels of the unknown input among the $C$ clusters. If $n_i$ ($n_i \le K$) is the number of observations detected as cluster $i$, then the probability score of the unknown input for cluster $i$ is defined as

$$P^{Cluster}_{FCM}(i) = \frac{n_i}{K}, \quad i \in [1, C] \qquad (10)$$
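The clustering phase can be sketched with a plain implementation of standard Fuzzy C-Means (Bezdek) followed by nearest-center labeling and Eq. (10). This is a minimal sketch, not the authors' MATLAB code: the fuzzifier $m = 2$, the tolerance, the iteration cap and the random initialization are assumptions, and clusters are indexed from 0 rather than 1.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(data: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard FCM. Returns (centers[c][p], memberships u[c][n])."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    # Random initial memberships, normalised so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[j][k] for j in range(c))
        for j in range(c):
            u[j][k] /= s
    centers: Matrix = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u^m.
        centers = []
        for j in range(c):
            w = [u[j][k] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * data[k][d] for k in range(n)) / sw
                            for d in range(p)])
        # Membership update: u_jk = 1 / sum_l (d_jk / d_lk)^(2 / (m - 1)).
        change = 0.0
        for k in range(n):
            dist = [math.dist(data[k], centers[j]) for j in range(c)]
            if min(dist) == 0.0:
                # The point sits on a center: give it crisp membership there.
                new = [1.0 if dj == 0.0 else 0.0 for dj in dist]
                total = sum(new)
                new = [v / total for v in new]
            else:
                new = [1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                                 for l in range(c)) for j in range(c)]
            for j in range(c):
                change = max(change, abs(new[j] - u[j][k]))
                u[j][k] = new[j]
        if change < tol:
            break
    return centers, u


def nearest_center_scores(test: Sequence[Sequence[float]],
                          centers: Matrix) -> List[float]:
    """Label each test observation by its nearest center, then apply Eq. (10)."""
    counts = [0] * len(centers)
    for obs in test:
        nearest = min(range(len(centers)), key=lambda j: math.dist(obs, centers[j]))
        counts[nearest] += 1
    return [cnt / len(test) for cnt in counts]
```

As in the text, FCM would be run on the training data and the test observations together (e.g. `fuzzy_c_means(train + test, c=3)`), and only the test observations are then scored with `nearest_center_scores`.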
In a similar fashion, we store $P_{FCM} = \{P^{Cluster}_{FCM}(1), P^{Cluster}_{FCM}(2), \ldots, P^{Cluster}_{FCM}(C)\}$ for the fusion process.
To apply Dempster-Shafer Theory (DST) for combining the stored probability masses $P_{FCM}$ and $P_{SVM}$, we define the frame of discernment (FOD) as

$$FOD = \theta_{person} = \{1, 2, 3, \ldots, N\} \qquad (11)$$

where $N$ is the total number of persons.
We also assign a basic probability mass $P_{noise}$ to $\theta_{person}$ based on the noise present in the skeleton data. In our case, $P_{noise}$ is the degree to which the system fails to identify a person properly while still knowing that the person belongs to the FOD; this mainly occurs because of noise in the dataset. In our experiment, we realize $P_{noise}$ by considering height noise (outliers) in the dataset. We assume that the heights of all persons lie between 4 ft 5 in and 6 ft 3 in, so any observation with a height outside this range is treated as height noise. We compute $P_{noise}$ as the ratio of the number of observations with height noise to the total number of observations. Moreover, we consider that $P_{noise}$ arises only in the clustering process: because the SVM labels each of the $K$ observations of the unknown input as one of the $N$ classes, we assume no uncertainty in its labeling. Formally, $P_{noise}$ is defined as

$$P_{noise} = \frac{n_o}{K + D_{dim}} \qquad (12)$$

where $n_o$ is the total number of observations with height noise.
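Equation (12) reduces to counting out-of-range heights. The sketch below assumes that a height in metres has already been estimated for each of the $K + D_{dim}$ observations; how height is derived from the skeleton is not specified here, and the names are illustrative.

```python
from typing import Sequence

# Plausible height range from the text: 4 ft 5 in to 6 ft 3 in, converted to metres.
INCH_TO_M = 0.0254
MIN_HEIGHT_M = (4 * 12 + 5) * INCH_TO_M  # 53 in, about 1.346 m
MAX_HEIGHT_M = (6 * 12 + 3) * INCH_TO_M  # 75 in, about 1.905 m


def p_noise(heights_m: Sequence[float]) -> float:
    """Eq. (12): fraction of the K + D_dim observations whose height is out of range."""
    if not heights_m:
        raise ValueError("no observations")
    n_o = sum(1 for h in heights_m if h < MIN_HEIGHT_M or h > MAX_HEIGHT_M)
    return n_o / len(heights_m)
```

For heights of 1.0, 1.5, 1.7 and 2.0 m, two observations fall outside the range, giving $P_{noise} = 0.5$.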
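In the last phase of our algorithm, we apply DST to fuse $P_{FCM}$, $P_{SVM}$ and $P_{noise}$ and generate the joint mass for the $i$-th person using Dempster's rule (8), with $P_{FCM}$ and $P_{noise}$ forming one BPA (the noise mass being assigned to the whole frame $\theta_{person}$) and $P_{SVM}$ forming the other:

$$P_{fusion}(i) = K^{-1} \sum_{(P_{FCM} \cup P_{noise}) \cap P_{SVM} = i} (P_{FCM} + P_{noise})\, P_{SVM}, \quad i \in [1, N]$$

$$\text{where } K = 1 - \sum_{(P_{FCM} \cup P_{noise}) \cap P_{SVM} = \phi} (P_{FCM} + P_{noise})\, P_{SVM} \qquad (13)$$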
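Because the SVM masses sit on singletons $\{i\}$, Dempster's rule in (13) reduces to a per-person product. The text does not state which persons each FCM cluster corresponds to, so the sketch below assumes a hypothetical `cluster_members` mapping from each cluster to a set of persons (for instance, the persons whose training observations fall mostly into that cluster); the noise mass is placed on the whole frame and therefore supports every person.

```python
from typing import AbstractSet, List, Sequence, Tuple


def fuse_scores(p_svm: Sequence[float], p_fcm: Sequence[float],
                cluster_members: Sequence[AbstractSet[int]],
                p_noise: float) -> Tuple[List[float], float]:
    """Eq. (13) under stated assumptions. Returns (P_fusion(1..N), K).

    p_svm[i-1]         : SVM mass on the singleton {i} (Eq. 9)
    p_fcm[c]           : FCM mass on the set of persons cluster_members[c] (Eq. 10)
    cluster_members[c] : hypothetical mapping from cluster c to a set of persons
    p_noise            : mass on the whole frame theta_person (Eq. 12)
    """
    raw = []
    for i in range(1, len(p_svm) + 1):
        # FCM focal elements containing person i intersect {i} non-emptily;
        # theta_person (carrying the noise mass) always does.
        support = sum(m for m, members in zip(p_fcm, cluster_members) if i in members)
        raw.append((support + p_noise) * p_svm[i - 1])
    k = sum(raw)  # non-conflicting mass; equals 1 - conflict when both BPAs sum to 1
    if k <= 0.0:
        raise ValueError("total conflict; fusion undefined")
    return [r / k for r in raw], k
```

With $P_{SVM} = [0.5, 0.3, 0.2]$, $P_{FCM} = [0.6, 0.3]$ for clusters $\{1,2\}$ and $\{3\}$, and $P_{noise} = 0.1$, the conflict is 0.36, so $K = 0.64$ and person 1's fused score rises from 0.5 to 0.546875.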
The entire flow of the algorithm is shown in Figure 2 and Figure 3.
Figure 2: Flowchart of the proposed algorithm.
Figure 3: Flowchart of the proposed algorithm.
4 EXPERIMENTAL RESULTS
We performed the experiments on a dataset of 8 subjects (7 male and 1 female), considering two scenarios: i) identification using a single supervised or unsupervised algorithm and ii) our proposed fusion-based (supervised + unsupervised) approach. Throughout our experiments, performance evaluation and comparison are done using the precision- and recall-based $F_{score}$ metric. All the experiments are performed in a MATLAB environment on an Intel Core 2 Duo platform running at 2.53 GHz. We used the Kinect for Windows SDK version 1.6 (Microsoft, 2013) for skeleton data extraction and the LIBSVM (Chang and Lin, 2011) package for classification.
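The text does not spell out how the per-subject $F_{score}$ values are averaged. One common choice, assumed in this sketch, is the macro average of per-class F1 computed from a confusion matrix whose rows are true subjects and whose columns are predicted subjects.

```python
from typing import List, Sequence, Tuple


def macro_f_score(matrix: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Per-class F1 and their mean from a square confusion matrix.

    matrix[t][p]: count (or fraction) of test observations of true subject t
    predicted as subject p.
    """
    n = len(matrix)
    per_class = []
    for j in range(n):
        tp = matrix[j][j]
        predicted = sum(matrix[i][j] for i in range(n))  # column sum
        actual = sum(matrix[j])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / n, per_class
```

For the two-class matrix `[[8, 2], [1, 9]]`, the per-class scores are 16/19 and 6/7, and their mean is 113/133 (about 0.85).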
First, a Kinect device is used to record the skeleton data from a distance of 6 feet. Feature extraction is then performed on each half gait cycle of the 8 subjects (A-H), resulting in a 46-dimensional feature vector. We store these feature vectors for the 8 persons as the dataset $D$, which is used for identification. As described in Section 3.3, we first employ the Support Vector Machine (SVM) classifier to generate a model using $D$ as input. New, unknown test data are then considered, which need to be identified from among the 8 known subjects. The feature vectors for the test data are computed, and SVM testing is performed to obtain a set of probabilities. For example, when new test data of dimension $O \times 46$ (where the number of observations is $O = 39$) for subject H are taken, SVM testing provides the set of probabilities (using equation (9)) given in Table 1.
Table 1: Confidence score using SVM classifier.
Class label          A      B      C      D      E      F      G      H
SVM score ($C_i$)    0.028  0.153  0.111  0.097  0.069  0.084  0.0    0.458
Decision                                                              Detected
We store the SVM scores $C_i$, $i \in [1, 8]$, in an array $P_{SVM}$ for the fusion process: $P_{SVM}$ = [0.028, 0.153, 0.111, 0.097, 0.069, 0.084, 0, 0.458].
In the next step, fuzzy c-means (FCM) clustering is performed with 3 clusters on dataset $D$ together with the test data. Each observation of the test data is then labeled according to its Euclidean distance from each of the cluster centers. In our example, each of the $O$ observations is labeled as cluster 1, cluster 2 or cluster 3, according to its Euclidean distance from the 3 cluster centers; in other words, an observation nearest to cluster 1 is labeled 1. Once the labeling is done, we calculate the probability of an observation belonging to a particular cluster as the fraction of all observations assigned to that cluster. For our example, the probabilities obtained for the three clusters (using equation (10)) are listed in Table 2. As before, the FCM scores are stored in an array $P_{FCM}$ for the fusion.
Table 2: Confidence score using FCM.
Cluster label                          Cluster 1   Cluster 2   Cluster 3
FCM score ($P^{Cluster}_{FCM}(i)$)     0.3073      0.3072      0.3473

$P_{FCM}$ = [0.3073, 0.3072, 0.3473].
For the fusion process using Dempster-Shafer theory, we also compute the uncertainty with equation (12): Uncertainty = $P_{noise}$ = 0.0382.
Next, the fusion score (joint mass) is computed by combining the individual scores $P_{SVM}$, $P_{FCM}$ and $P_{noise}$ using equation (13). The fusion scores for subject H are shown in Table 3.
Table 3: Fusion score for subject H.
Class label                        A      B      C      D      E      F      G      H
Fusion score ($P_{fusion}(i)$)     0.017  0.072  0.072  0.098  0.022  0.02   0.024  0.675
Decision                                                                            Detected
Figure 4: Fusion using Dempster-Shafer Theory (rows: $P^{Cluster}_{FCM}(1)$, $P^{Cluster}_{FCM}(2)$, $P^{Cluster}_{FCM}(3)$ and $P_{noise}$; columns: $C_1$ to $C_8$).
The value of $K$ in equation (13) is the total area of the shaded portion shown in Figure 4 and is calculated as $K = 0.3699$.
As Table 3 shows, the unknown input pattern is not only correctly identified as Person 8 (subject H), with fusion score $P_{fusion}(8) = 0.675$, but also receives a higher confidence score than with the SVM classifier alone ($C_8 = 0.458$ in Table 1). This example indicates that the proposed fusion algorithm using Dempster-Shafer theory identifies person H with an improved level of confidence.
Next, we analyze the overall improvement in recognition confidence using the confusion matrices of the two approaches, namely with fusion and without fusion (i.e., using only the SVM classifier).
The resulting recognition accuracy using the SVM, in terms of $F_{score}$, is shown in Table 4. We then applied our proposed algorithm under the same circumstances; the results are presented in Table 5.

Table 4: Confidence score using SVM for different test subjects (* marks the detected class; TS = Test Subject, GT = Ground Truth).
TS \ GT   A       B       C       D       E       F       G       H
A         0.590*  0.026   0.0     0.076   0.0     0.103   0.205   0.0
B         0.204   0.510*  0.082   0.102   0.0     0.020   0.0     0.082
C         0.116   0.256   0.419*  0.070   0.070   0.0     0.023   0.046
D         0.071   0.095   0.095   0.667*  0.024   0.048   0.0     0.0
E         0.0     0.121   0.030   0.030   0.769*  0.0     0.0     0.061
F         0.054   0.0     0.0     0.162   0.0     0.730*  0.054   0.0
G         0.037   0.0     0.0     0.0     0.0     0.0     0.963*  0.0
H         0.028   0.153   0.111   0.097   0.069   0.084   0.0     0.458*

Table 5: Fusion score for different test subjects (* marks the detected class; TS = Test Subject, GT = Ground Truth).
TS \ GT   A       B       C       D       E       F       G       H
A         0.617*  0.023   0.0     0.075   0.0     0.100   0.185   0.0
B         0.144   0.573*  0.092   0.083   0.0     0.016   0.0     0.092
C         0.119   0.293   0.427*  0.041   0.041   0.0     0.026   0.053
D         0.038   0.071   0.050   0.759*  0.028   0.054   0.0     0.0
E         0.0     0.111   0.028   0.031   0.769*  0.0     0.0     0.061
F         0.009   0.0     0.0     0.116   0.0     0.836*  0.039   0.0
G         0.009   0.0     0.0     0.0     0.0     0.0     0.991*  0.0
H         0.017   0.072   0.072   0.098   0.022   0.02    0.024   0.675*
The diagonals of Table 4 and Table 5 show that fusing the SVM and FCM scores increases the score of the correct subject for seven of the eight subjects and leaves it unchanged for subject E (0.769 in both tables). The recognition accuracy with and without fusion over all subjects is summarized in terms of the average $F_{score}$ in Table 6. On this dataset, the proposed fusion-based method therefore outperforms the singleton (SVM-only) approach for person identification. Note that we do not report person identification performance using only an unsupervised algorithm, because the recognition accuracy reported by Ball et al. (Ball et al., 2012) for only four persons using the K-Means algorithm is only 43%.
Table 6: Performance comparison between without fusion and with fusion.
Without fusion   With fusion
0.64             0.71
KinectbasedPeopleIdentificationSystemusingFusionofClusteringandClassification
177
5 CONCLUSIONS
In this paper we have proposed a novel approach to human identification that fuses supervised and unsupervised learning algorithms using Dempster-Shafer theory to define the final decision metric. Results on our 8-subject dataset indicate that the proposed fusion algorithm outperforms the existing person identification framework in real time. As future work, we will experiment with gait-independent features, which should further improve the robustness of the system by eliminating gait boundary detection and removing the side-walk constraint.
REFERENCES
Ball, A., Rye, D., Ramos, F., and Velonaki, M. (2012). Un-
supervised clustering of people from skeleton’ data.
In Proceedings of the seventh annual ACM/IEEE in-
ternational conference on Human-Robot Interaction
(HRI), pages 225–226.
Begg, R. K., Palaniswami, M., and Owen, B. (2005). Sup-
port vector machines for automated gait classifica-
tion. IEEE Transactions on Biomedical Engineering,
52(5):828–838.
BenAbdelkader, C., Cutler, R., and Davis, L. (2002). Stride
and cadence as a biometric in automatic person iden-
tification and verification. In Fifth IEEE International
Conference on Automatic Face and Gesture Recogni-
tion, pages 372–377.
Bezdek, J. C. (1981). Pattern Recognition with Fuzzy Ob-
jective Function Algoritms. Plenum Press, New York.
Bezdek, J. C., Ehrlich, R., and Full, W. (1984). Fcm:
The fuzzy cmeans clustering algorithm. Computers
& Geosciences, 10(2):191–203.
Bouchrika, I. and Nixon, M. S. (2008). Exploratory factor
analysis of gait recognition. In 8th IEEE International
Conference on Automatic Face & Gesture Recogni-
tion, 2008. FG’08, pages 1–6. IEEE.
Carlsson, S. (2000). Recognizing walking people. In Pro-
ceedings of the 6th European Conference on Com-
puter Vision (ECCV ) -Part I, pages 472–486.
Chang, C.-C. and Lin, C.-J. (2011). Libsvm: a library for
support vector machines. ACM Transactions on Intel-
ligent Systems and Technology (TIST), 2(3):27.
Cheng, L., Sun, Q., Su, H., Cong, Y., and Zhao, S. (2012).
Design and implementation of human-robot interac-
tive demonstration system based on kinect. In Control
and Decision Conference (CCDC), 2012 24th Chi-
nese, pages 971–975. IEEE.
Cheng, M.-H., Ho, M.-F., and Huang, C.-L. (2008). Gait analysis for human identification through manifold learning and HMM. Pattern Recognition, 41(8):2541–2553.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Mach. Learn., 20(3):273–297.
Dadashi, F., Araabi, B., and Soltanian-Zadeh, H. (2009). Gait recognition using wavelet packet silhouette representation and transductive support vector machines. In 2nd International Congress on Image and Signal Processing, 2009. CISP ’09, pages 1–5.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics, 38(2):325–339.
Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society. Series B (Methodological), pages 205–247.
Fine, T. L. (1977). Review: Glenn Shafer, A mathematical theory of evidence. Bulletin (New Series) of the American Mathematical Society, 83(4):667–672.
Huang, P., Harris, C., and Nixon, M. (1999). Human gait recognition in canonical space using temporal templates. IEE Proceedings - Vision, Image and Signal Processing, 146(2):93–100.
Jeffreys, H. (1973). Scientific Inference (3rd ed.). Cambridge University Press, p. 31.
Jøsang, A. and Pope, S. (2012). Dempster’s rule as seen by little colored balls. Computational Intelligence, 28(4):453–474.
Kale, A., Cuntoor, N., Yegnanarayana, B., Rajagopalan, A., and Chellappa, R. (2003). Gait analysis for human identification. In Audio- and Video-Based Biometric Person Authentication, pages 706–714. Springer.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35–45.
Karem, F., Dhibi, M., and Martin, A. (2012). Combination of supervised and unsupervised classification using the theory of belief functions. In Belief Functions: Theory and Applications, pages 85–92. Springer.
Le Hegarat-Mascle, S., Bloch, I., and Vidal-Madjar (1997). Application of Dempster-Shafer evidence theory to unsupervised classification in multisource remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 35(4):1018–1031.
Meyer, D., Pösl, J., and Niemann, H. (1998). Gait classification with HMMs for trajectories of body parts extracted by mixture densities. In British Machine Vision Conference, pages 459–468.
Microsoft (2013). Kinect for Windows. http://www.microsoft.com/en-us/kinectforwindows/develop/developer-downloads.aspx. [Online; accessed 25-July-2013].
Pal, N. R. and Chintalapudi, K. K. (1997). A connectionist system for feature selection. Neural Parallel Sci. Comput., 5(3):359–381.
Preis, J., Kessel, M., Werner, M., and Linnhoff-Popien, C. (2012). Gait recognition with Kinect. In 1st International Workshop on Kinect in Pervasive Computing.
Sarkar, S., Phillips, P., Liu, Z., Vega, I., Grother, P., and Ortiz, E. (2005). The HumanID gait challenge problem: Data sets, performance, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:162–177.
Shafer, G. (1976). A Mathematical Theory of Evidence, Vol. 1. Princeton University Press, Princeton.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
178
Sinha, A., Chakravarty, K., and Bhowmick, B. (2013). Person identification using skeleton information from Kinect. In The Sixth International Conference on Advances in Computer-Human Interactions (ACHI), pages 101–108.
Wang, J., She, M., Nahavandi, S., and Kouzani, A. (2010). A review of vision-based gait recognition methods for human identification. In International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 320–327.
Xue, Z., Ming, D., Song, W., Wan, B., and Jin, S. (2010). Infrared gait recognition based on wavelet transform and support vector machine. Pattern Recogn., 43(8):2904–2910.
KinectbasedPeopleIdentificationSystemusingFusionofClusteringandClassification
179