A Dynamic Hybrid Local-spatial Interest Point Matching Algorithm for

Articulated Human Body Tracking

Alireza Dehghani and Alistair Sutherland

School of Computing, Dublin City University, Dublin, Ireland

Keywords:

Interest Point Matching, Cyclic String Matching, Human Body Tracking.

Abstract:

Current interest point (IP) matching algorithms are either local-based or spatial-based. We propose a hybrid

local-spatial IP matching algorithm for articulated human body tracking. The ﬁrst stage is local-based and

ﬁnds matched pairs of IPs from two lists of reference and target IPs through a local-feature-descriptors-based

matching method. The second stage of the algorithm is spatial-based. It starts with the conﬁdently matched

pairs of the previous stage, and recovers more matched pairs from the remaining unmatched IPs through graph

matching and cyclic string matching. To compensate for the problem of Reference List Leakage (RLL), which

decreases the number of reference IPs throughout the frame sequence and causes failure of tracking, an IP

List Scoring and Reﬁnement (LSR) strategy is proposed to maintain the number of reference IPs around a

specific level. Experimental results show that the proposed algorithm increases the precision rate from 61.53% to 97.81% and the recall rate from 52.33% to 96.40%.

1 INTRODUCTION

The interest point (IP) representation is widely used

in image registration, pattern recognition, human mo-

tion tracking, etc. IP matching, which aims to ﬁnd a

reliable correspondence between reference and target

IPs (extracted from reference and target images) us-

ing some similarity criteria, is a crucial and challeng-

ing process and has been studied widely. IPs are sup-

posed to be persistent across successive frames and

robust to changes in illumination, pose and viewpoint

(Maji, 2006). Current IP matching algorithms mainly

use either local or spatial similarity to establish a cor-

respondence between IPs. The local-based methods

mainly use feature descriptors to measure local simi-

larity of points, while the spatial-based methods use

geometric distance and spatial structure among IPs

(Liu et al., 2012).

Local feature descriptors use image properties

such as pixel intensities, colour, texture, and edges

to measure the distance between IPs in the matching

process. Many remarkable local feature descriptors

such as the Scale Invariant Feature Transform (SIFT)

(Lowe, 2004), Speeded Up Robust Features (SURF)

(Bay et al., 2008), and Gradient Location and Orien-

tation Histogram (GLOH) (Mikolajczyk and Schmid,

2005) have been proposed in the literature. ORB (Oriented FAST and Rotated BRIEF) (Rublee et al., 2011), which is rotation-invariant and resistant to noise, performs as well as SIFT and better than SURF, while being twice as fast. These feature descriptors have been compared in the literature (Mikolajczyk and Schmid, 2005).

The above mentioned local descriptors are used

to match IPs in different applications. However, they

may collapse in some ambiguous situations such as

monotonous backgrounds, similar features, low res-

olution images, etc. In these cases, spatial-based

IP matching methods, which use information like

geometric distance or neighbourhood relations be-

tween points, can be used to compensate for these

drawbacks. The iterative Random Sample Consensus method (RANSAC) (Fischler and Bolles, 1981), which fits a mathematical model to a set of points that includes outliers, works well only when the proportion of outliers is moderate. To compensate for this, the spatial relation between points has been dealt with by

many authors. Consideration of local relations be-

tween IPs (Zheng and Doermann, 2006), graph es-

tablishment by Delaunay triangulation in a two-step

algorithm (Li et al., 2005), a Graph Transforma-

tion Matching (GTM) strategy for ﬁnding a consen-

sus nearest neighbour graph from candidate matches

(Aguilar et al., 2009), and using relative positions and


Dehghani A. and Sutherland A..

A Dynamic Hybrid Local-spatial Interest Point Matching Algorithm for Articulated Human Body Tracking.

DOI: 10.5220/0004786705360543

In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM-2014), pages 536-543

ISBN: 978-989-758-018-5

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

angles of points for reduction of false matching have

been introduced in this regard.

Although the spatial-based methods are more accurate and robust than the local-based ones, they are not as quick, particularly when there is a high number of IPs. Given the complementary strengths and weaknesses of the local and spatial IP matching strategies, combined approaches (Wen et al., 2008) have been proposed in which the two complement each other. The local feature similarity used in local-based IP matching approaches can be used to cut down the search space for the spatial-based methods. On the other hand, the spatial-based methods can compensate for the defects of local-based methods in ambiguous situations such as duplicated local feature patterns between the reference and target images.

In articulated object tracking, the Reference-list

IPs are dynamically matched to the Target-list IPs

over the frame sequence (Li et al., 2003; Zhou et al.,

2009; Ma et al., 2013). During this process, the IPs in

the Reference-list are replaced by the matched points

in the Target-list at each frame. Because of noise, changes in illumination, articulation of the tracked object, and even weaknesses of the background subtraction algorithm, the matched Target-list will always be shorter than the Reference-list. The total number of IPs is therefore reduced at each frame, which eventually leads to loss of tracking. We call this the Reference List Leakage (RLL) problem. To tackle it, our IP matching algorithm is equipped with a novel IP List Scoring and Refinement (LSR) strategy.

Summary: In this paper we propose a dynamic

hybrid local-spatial IP matching algorithm for hu-

man body tracking. In the ﬁrst stage, the conﬁ-

dently matched points are found using a local-based

IP matching strategy. Then, to compensate for mis-

matched and unmatched IPs, a new spatial-based

matching method based on graph matching and string

matching algorithms is applied. As a remedy for the

problem of RLL, a novel LSR strategy is applied.

The proposed approach beneﬁts from: local-based IP

matching to avoid the expense of the distance and

neighbourhood comparison of the spatial-based meth-

ods; spatial-based IP matching to compensate for the

drawback of the ﬁrst stage; and an IP List Scoring and

Reﬁnement strategy to reﬁne the IP lists and solve

the problem of RLL. The rest of this paper is organized as follows: Section 2 presents the proposed algorithm. Experimental results and conclusions are discussed in Sections 3 and 4, respectively.

2 INTEREST POINT MATCHING

ALGORITHM

The two stages of our proposed IP matching algorithm

as well as the IP List Scoring and Reﬁnement (LSR)

strategy will be described in Sections 2.1, 2.2, and

2.3, respectively.

2.1 Stage 1: Local-based IP Matching

In this stage, ﬁrst the local feature descriptors of

the IPs of Reference-list and Target-list are extracted.

Then, the IPs of these lists are matched to each other

in two directions, i.e. the Reference-list to the Target-list and vice versa. This is done because the results of matching in the two directions are generally not the same, no matter what type of matcher and distance measure is used.

Then, two filtering steps are applied to these matched lists: firstly, cross-checking removes any IPs which do not match both ways; secondly, displacement-checking removes any pair in which the distance between the reference and target IPs is greater than a threshold. The threshold is based on the smoothness or small inter-frame motion assumptions, which are valid in human body tracking applications (Herda et al., 2000). These two checks refine the result and deliver the "confidently" matched IPs $CR = \{cr_1, \dots, cr_n\}$ and $CT = \{ct_1, \dots, ct_n\}$ to the spatial matching stage. This stage of the algorithm is outlined in Algorithm 1.
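The two filtering steps of this stage can be illustrated with a minimal sketch. It assumes that descriptors are plain NumPy vectors compared by Euclidean distance and that a nearest-neighbour search stands in for the matcher; the function names and the `max_disp` parameter are hypothetical and not part of the original method description.

```python
import numpy as np

def nearest_neighbours(desc_a, desc_b):
    """For each row of desc_a, index of the closest row of desc_b (L2 distance)."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return d.argmin(axis=1)

def confident_matches(ref_pts, ref_desc, tgt_pts, tgt_desc, max_disp):
    """Return (confident pairs, unmatched reference indices, unmatched target indices)."""
    rt = nearest_neighbours(ref_desc, tgt_desc)  # Reference-list -> Target-list
    tr = nearest_neighbours(tgt_desc, ref_desc)  # Target-list -> Reference-list
    confident = []
    for i, j in enumerate(rt):
        cross_ok = tr[j] == i  # cross-check: the pair must match both ways
        # displacement-check: small inter-frame motion (max_disp is an assumed threshold)
        disp_ok = np.linalg.norm(ref_pts[i] - tgt_pts[j]) < max_disp
        if cross_ok and disp_ok:
            confident.append((i, int(j)))
    matched_r = {i for i, _ in confident}
    matched_t = {j for _, j in confident}
    unmatched_r = [i for i in range(len(ref_pts)) if i not in matched_r]
    unmatched_t = [j for j in range(len(tgt_pts)) if j not in matched_t]
    return confident, unmatched_r, unmatched_t
```

The returned pairs play the role of CR and CT, and the two index lists play the role of UR and UT.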

2.2 Stage 2: Spatial-based IP Matching

After finding the confidently matched sets CR and CT, the unmatched IPs of the Reference-list, i.e. $UR = \{ur_1, \dots, ur_p\}$, are dealt with one by one to find their possible corresponding matched IPs in the unmatched target set $UT = \{ut_1, \dots, ut_q\}$. Before that, the IPs of CR are clustered into K groups $\{CR_1, \dots, CR_K\}$ using the k-means clustering algorithm. The centroid of each cluster $C_k(x_k, y_k)$ is calculated by:

$$x_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i, \qquad y_k = \frac{1}{n_k}\sum_{i=1}^{n_k} y_i \tag{1}$$

where $(x_i, y_i)$ are the coordinates of the $n_k$ IPs in cluster $CR_k$.

Meanwhile, the corresponding K clusters of CT, i.e. $\{CT_1, \dots, CT_K\}$, are obtained from the correspondence between the confidently matched IPs of CR and CT. Then, for each unmatched point $ur_i$ of UR, the closest cluster $CR_k$ is found by comparing its Euclidean distance to the centre of each cluster. Now, $ur_i$ and the confidently matched IPs of the closest cluster compose a star-shaped graph $g_r$ (Figure 1).
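As a small illustration of the centroid computation and the closest-cluster lookup, the sketch below assumes that cluster labels have already been produced by some k-means implementation and that every cluster is non-empty. The function names are illustrative only.

```python
import numpy as np

def cluster_centroids(points, labels, k):
    """Mean x and mean y of the points assigned to each of the k clusters."""
    return np.array([points[labels == c].mean(axis=0) for c in range(k)])

def closest_cluster(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to point."""
    return int(np.linalg.norm(centroids - point, axis=1).argmin())
```

For an unmatched reference IP, `closest_cluster` returns the index of the cluster whose confidently matched IPs would form the star-shaped graph around it.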

ADynamicHybridLocal-spatialInterestPointMatchingAlgorithmforArticulatedHumanBodyTracking

537

Algorithm 1 Local-based IP Matching Algorithm.
1: Input: two IP sets, Reference-list $\{r_i\}_{i=1}^{N_R}$ and Target-list $\{t_j\}_{j=1}^{N_T}$
2: Output: confidently matched sets CR & CT; unmatched sets UR & UT
3: Extract feature descriptors for both IP lists.
4: Match Reference-list to Target-list: Matches_RT.
5: Match Target-list to Reference-list: Matches_TR.
6: Cross-Check:
7: for each matched pair $(i_1, j_1)$ in Matches_RT do
8:   Find matched pair $(j_1, i_2)$ in Matches_TR.
9:   if $i_1 = i_2$ then
10:    Keep pair $(i_1, j_1)$ in Matches_RT.
11:  else
12:    Push back $P_{i_1}$, $P_{j_1}$ to UR & UT respectively.
13:  end if
14: end for
15: Displacement-Check:
16: for each matched pair (i, j) in Matches_RT do
17:   if distance($P_i$, $P_j$) < threshold then
18:     Push back $P_i$, $P_j$ to CR & CT respectively.
19:   else
20:     Push back $P_i$, $P_j$ to UR & UT respectively.
21:   end if
22: end for

Figure 1: The graph $g_r$, confidently matched points $cr_1, cr_2, \dots, cr_n$, and unmatched point $ur_i$.

To find a possible matching IP for this point, a rectangular search area is defined around the position of $ur_i$ in the target image. All the unmatched points $ut_j$ in this search area are examined one by one to see whether any of them can be matched to the unmatched point $ur_i$. To do this, a similar graph $g_t$ is constituted for each unmatched point $ut_j$ (inside the search area) and the confidently matched IPs of the corresponding cluster $CT_k$. The point $ur_i$ is matched to one of the points $ut_j$ if graph $g_r$ is matched to one of the graphs $g_t$. Otherwise, this IP remains unmatched. Figure 2 shows the graph formation step of the spatial-based IP matching stage. Based on the size of the search area (Figure 2(c)), a few possible graphs (Figure 2(e)-2(i)) are composed for the unmatched points $ut_j$ to be matched to graph $g_r$ of the reference unmatched point $ur_i$ (Figure 2(d)).
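Selecting the candidate target IPs inside the rectangular search area might look like the following sketch. The half-width and half-height parameters are assumptions, since the text does not state the size of the search area.

```python
import numpy as np

def candidates_in_search_area(ur, unmatched_targets, half_width, half_height):
    """Indices of unmatched target IPs inside the rectangle centred on ur."""
    dx = np.abs(unmatched_targets[:, 0] - ur[0])
    dy = np.abs(unmatched_targets[:, 1] - ur[1])
    return np.flatnonzero((dx <= half_width) & (dy <= half_height))
```

Only the returned candidates need a target graph, which keeps the number of graph comparisons per reference IP small.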

Figure 2: Graph formation: (a) the clustered confidently matched IPs, (b) the unmatched target IPs, (c) search areas, (d) a reference graph, and (e-i) five target graphs for unmatched target IPs in the search area.

For the task of graph matching, the cyclic string matching algorithm (Maes, 1990) is used. To this end, a primitive feature vector is first extracted for each of the graphs $g_r$ and $g_t$. This feature should be as lightweight and compact as possible, while being robust to translation, rotation, and scale. The reciprocal of compactness (ROC) (Wu, 2011) is a good choice and satisfies these requirements. To extract this feature vector for a graph $g_r$, with central point $ur_i$ and confidently matched IPs $\{cr_1, cr_2, \dots, cr_n\}$, the triangle $\triangle\, ur_i\, cr_i\, cr_{i+1}$ is composed for each point $cr_i$, then the feature value $r_i$ is calculated as:

(2)

where $p_i = |ur_i\, cr_i| + |cr_i\, cr_{i+1}| + |cr_{i+1}\, ur_i|$ is the perimeter and $a_i$ is the area of the triangle.
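The following sketch shows how such a feature string could be built from the perimeter and area of each triangle. Two points are assumptions rather than statements of the original method: the neighbours are ordered by polar angle around the central point, and the ROC value is taken to be $4\pi a_i / p_i^2$, one common normalisation that lies in (0, 1]. The function name is illustrative.

```python
import numpy as np

def roc_string(ur, neighbours):
    """ROC-style feature value for each triangle (ur, cr_i, cr_{i+1})."""
    ur = np.asarray(ur, dtype=float)
    pts = np.asarray(neighbours, dtype=float)
    # Assumption: neighbours are visited in order of polar angle around ur.
    angles = np.arctan2(pts[:, 1] - ur[1], pts[:, 0] - ur[0])
    pts = pts[np.argsort(angles)]
    nxt = np.roll(pts, -1, axis=0)  # cr_{i+1}, wrapping around the cycle
    perimeter = (np.linalg.norm(pts - ur, axis=1)
                 + np.linalg.norm(nxt - pts, axis=1)
                 + np.linalg.norm(ur - nxt, axis=1))
    # Triangle area from the 2D cross product of the two edges leaving ur.
    u, v = pts - ur, nxt - ur
    area = 0.5 * np.abs(u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0])
    # Assumed normalisation: 4*pi*area / perimeter^2 (unchanged by translation,
    # rotation and uniform scaling).
    return 4.0 * np.pi * area / perimeter ** 2
```

Because the value depends only on the shape of each triangle, translating, rotating or uniformly scaling the whole graph leaves the set of feature values unchanged, which is the property the text asks for.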

Therefore, a vector $[r_1, r_2, \dots, r_n]$ of real numbers is created for each of the graphs $g_r$ and $g_t$. These vectors compose strings "s" and "t", which are passed to the string matching stage. To do this, an edit-weighted graph (Wu, 2001) is constructed for these strings. The string matching algorithm finds a minimum cost edit sequence from "s" to "t", which is the same as finding the shortest path in the edit-weighted graph (Wagner and Fischer, 1974). Since these strings are extracted from cyclic graphs (no matter which point $cr_i$ is considered as the first point), the cyclic string-to-string correction problem applies to our graph matching scenario. Algorithm 2 summarizes this stage. To find the shortest path in the edit-weighted graph during the cyclic string matching, the Dijkstra algorithm (Leiserson et al., 2001), a graph search algorithm for finding the shortest path in a graph, is used. Here, the edit cost function is:

$$\varepsilon(s_i \to t'_j) = |s_i - t'_j| \tag{3}$$
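To make the cyclic matching cost concrete, here is a deliberately simple sketch. It does not use the algorithm of Maes (1990) or a Dijkstra search; instead it tries every rotation of the target string and runs the classic Wagner-Fischer dynamic programme on each. The substitution cost is the absolute difference of Equation (3), while the insertion/deletion cost `gap` is a hypothetical parameter, since the text does not specify it.

```python
def edit_cost(s, t, gap):
    """Wagner-Fischer dynamic programme with substitution cost |s_i - t_j|."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, si in enumerate(s, start=1):
        cur = [i * gap]
        for j, tj in enumerate(t, start=1):
            cur.append(min(prev[j] + gap,                # delete s_i
                           cur[j - 1] + gap,             # insert t_j
                           prev[j - 1] + abs(si - tj)))  # substitute s_i -> t_j
        prev = cur
    return prev[-1]

def cyclic_edit_cost(s, t, gap=1.0):
    """Minimum edit cost from list s to any cyclic rotation of list t."""
    if not t:
        return len(s) * gap
    return min(edit_cost(s, t[k:] + t[:k], gap) for k in range(len(t)))
```

This brute-force variant costs one dynamic programme per rotation, so it is only meant to show what quantity is being minimised, not how to compute it efficiently.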

ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods

538

Algorithm 2 Spatial-based IP Matching Algorithm.
1: Input: IP sets CR & CT and UR & UT.
2: Output: Matched IP sets MR & MT.
3: Push back CR & CT into MR & MT.
4: Cluster CR into K clusters $\{CR_1, \dots, CR_K\}$.
5: Compose K clusters $\{CT_1, \dots, CT_K\}$.
6: for each IP $ur_i$ of UR do
7:   Find its closest cluster, compose graph $g_r$, and extract its ROC feature vector (string s).
8:   Define search area around $ur_i$ in target image.
9:   min_cost ⇐ ∞
10:  for each IP $ut_j$ inside search area do
11:    Compose graph $g_t$, extract its ROC feature vector (string t).
12:    Find minimum cost from string s to t using cyclic string matching and Dijkstra algorithms.
13:    if minimum_cost < min_cost then
14:      min_cost ⇐ minimum_cost; keep $ut_j$ as the best candidate.
15:    end if
16:  end for
17:  if min_cost < threshold then
18:    Push back $ur_i$ and the best candidate $ut_j$ to MR & MT, respectively.
19:  end if
20: end for

2.3 LSR Strategy

Using IP matching to track an articulated object through a long sequence of frames is much more complicated than simply matching the IPs of two static frames. As the object changes its pose and shape throughout the sequence, two main problems occur: IPs in the initial frame rapidly become obsolete, and new IPs, which were not in the previous frames, emerge. To keep track of the object throughout the frame sequence, we must find some way of removing obsolete IPs and replacing them with new IPs.

Two lists of IPs are involved in any round of matching: the Reference-list and the Target-list. The Reference-list contains those IPs of the previous frame in which we are reasonably confident and which represent the previous state of the object. We match these to the Target-list, which contains IPs from the current frame. Any IP of the Reference-list which finds a matching IP in the Target-list is replaced by that IP of the Target-list.

A naive approach would be to delete any un-

matched Reference-list IPs on the grounds that they

are now obsolete. But this would be too severe. An

IP may fail to ﬁnd a match in a particular round be-

cause of noise or occlusion and yet may ﬁnd a match

in subsequent rounds. Therefore, we should retain unmatched IPs in the Reference-list for a certain number of rounds and delete them only if they fail to find a match for several rounds in succession.

If we wish to replace deleted IPs, a naive ap-

proach would simply be to use unmatched IPs from

the Target-list on the grounds that these represent new

IPs generated by changes in the object. However, new

IPs may also be generated by noise or occlusions.

Therefore, we have to subject new IPs to a test be-

fore we admit them to the Reference-list. To do this,

we include unmatched IPs of the Target-list in a third

list, which we call the ”Reserved-list”. If an IP in this

list ﬁnds a match over a certain number of consecutive

frames then we promote it to the Reference-list.

The LSR strategy works based on two parameters: Score (S) and Matching-Index (MI). These parameters are assigned to each IP of the Reference-list and Reserved-list at each round. The S parameter reflects the success or failure of an IP through the previous rounds of matching. The MI parameter records the number of consecutive rounds in which the IP has been matched (positive values) or unmatched (negative values).

The LSR strategy comprises two stages:

• IP Scoring: the S and MI parameters of each IP in the Reference-list and the Reserved-list are updated based on the result of matching. Each time an IP is matched, its MI is increased by 1 (or reset to 1 if the IP was unmatched in the previous round); each time it is unmatched, its MI is decreased by 1 (or reset to −1 if the IP was matched in the previous round). The S parameter is increased by a reward score of 3 each time the IP is matched; otherwise it is decreased by a penalty equal to the magnitude of MI, i.e. the number of consecutive unmatched rounds. Algorithm 3 summarizes the IP scoring system after each round of matching.

• List Refinement: the S values of the IPs are compared with two empirical thresholds, namely the Eligibility (E) and Merit (M) thresholds, to find the obsolete IPs of the Reference-list and Reserved-list and the competent IPs of the Reserved-list. At each round, for each IP of the Reference-list, if S < E, then that IP is deleted. For each IP of the Reserved-list, if S > M, then that IP is promoted to the Reference-list. The IPs of the Reserved-list with S < E are also deleted to prevent this list from growing without bound. The details of this stage are described in Algorithm 4.

As an example, if an IP of the Reference-list is matched for the first time in round k, it receives an S value of 3. If it is matched in round k + 1, the S value will go up to 6. If it is matched in round k + 2, it will go up to 9. But, if it fails to match in round k + 3, S will go down to 8 (because MI = −1). If it fails to match in round k + 4, S will go down to 6 (because MI = −2). However, if it matches again in round k + 5, S will go up to 9.

Algorithm 3 IP Scoring.
1: k: round of matching (k ⇐ 1)
2: for each IP i in Reference-list do
3:   if IP i matched any IP j in Target-list then
4:     Substitute IP i with IP j, $S_i^k = 3$, $MI_i^k = 1$
5:   else
6:     $S_i^k = -1$ & $MI_i^k = -1$
7:   end if
8: end for
9: for each IP j in Target-list do
10:  if IP j not-matched then
11:    Move IP j to the Reserved-list
12:    $S_j^k = -0.5$ & $MI_j^k = 0$
13:  end if
14: end for
15: for rounds k > 1 do
16:  for each IP i in Combined-list = [Reference-list Reserved-list] do
17:    if IP i matched any IP j in Target-list then
18:      Substitute IP i with IP j
19:      if IP i matched in round k − 1 then
20:        $MI_i^k = MI_i^{k-1} + 1$
21:      else if IP i not-matched in round k − 1 then
22:        $MI_i^k = 1$
23:      end if
24:      $S_i^k = S_i^{k-1} + 3$
25:    else
26:      if IP i matched in round k − 1 then
27:        $MI_i^k = -1$
28:      else if IP i not-matched in round k − 1 then
29:        $MI_i^k = MI_i^{k-1} - 1$
30:      end if
31:      $S_i^k = S_i^{k-1} + MI_i^k$
32:    end if
33:  end for
34:  for each IP j in Target-list do
35:    if IP j not-matched then
36:      Move IP j to the Reserved-list
37:      $S_j^k = -0.5$ & $MI_j^k = 0$
38:    end if
39:  end for
40: end for
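The scoring rule can be traced in a few lines of code. The sketch below is only an illustration of the update described above; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass

REWARD = 3  # reward score for a matched round, as stated in the text

@dataclass
class ScoredIP:
    score: float = 0.0       # S
    matching_index: int = 0  # MI

    def update(self, matched: bool) -> None:
        """Apply one round of scoring to this IP."""
        if matched:
            # Consecutive matches count upwards; a match after a miss restarts at +1.
            self.matching_index = self.matching_index + 1 if self.matching_index > 0 else 1
            self.score += REWARD
        else:
            # Consecutive misses count downwards; a miss after a match restarts at -1.
            self.matching_index = self.matching_index - 1 if self.matching_index < 0 else -1
            self.score += self.matching_index  # MI is negative here, so this is a penalty
```

Feeding the sequence matched, matched, matched, unmatched, unmatched, matched into `update` gives the scores 3, 6, 9, 8, 6, 9, the same values as in the worked example.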

Figure 3 shows the different steps of LSR for the first two rounds of matching. Step 1 is where the Combined-list (the Reserved-list concatenated to the end of the Reference-list) and the Target-list are prepared to be fed into the matching algorithm. As can be seen, the Reserved-list is empty in the first round and the S and MI values of the IPs are zero. Step 2 displays the status of the IPs after matching. The red arrows show the matched pairs, while the purple ones show the unmatched IPs in the Target-list, which are moved to the Reserved-list. This leads into Step 3, where the matched Reference-list IPs are replaced with their corresponding IPs in the Target-list and the unmatched Target-list IPs are moved to the Reserved-list with a score of −0.5, which is a bias penalty for unmatched IPs of the Target-list. This step is the basis for Step 1 in the next round of matching, where: the IPs of the Reference-list and Reserved-list are relabelled with Ref and Rsvd labels; the list refinement procedure is applied to the Reference-list and Reserved-list; and the Target-list is loaded with the new IPs of the target image.

Algorithm 4 List Refinement.
1: k: round of matching
2: for rounds k > 1 do
3:   for each IP i in Reference-list do
4:     if $S_i^k < E$ (Eligibility threshold) then
5:       Remove IP i from Reference-list
6:     end if
7:   end for
8:   for each IP l in Reserved-list do
9:     if $S_l^k > M$ (Merit threshold) then
10:      Move IP l to the Reference-list
11:    else if $S_l^k < E$ (Eligibility threshold) then
12:      Remove IP l from Reserved-list
13:    end if
14:  end for
15: end for

Figure 3: The LSR for the first two rounds of matching.
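The list refinement step can be sketched as a pair of filters over scored IPs. The representation of each IP as an `(ip, score)` tuple and the threshold values used in the usage note are assumptions made for illustration; the text describes E and M only as empirical thresholds.

```python
def refine_lists(reference, reserved, eligibility, merit):
    """Apply the Eligibility (E) and Merit (M) thresholds to lists of (ip, score) pairs."""
    # Reference-list: drop IPs whose score fell below E.
    kept_reference = [(ip, s) for ip, s in reference if s >= eligibility]
    # Reserved-list: promote IPs whose score exceeds M ...
    promoted = [(ip, s) for ip, s in reserved if s > merit]
    # ... and keep only those that are neither promoted nor below E.
    kept_reserved = [(ip, s) for ip, s in reserved if eligibility <= s <= merit]
    return kept_reference + promoted, kept_reserved
```

With hypothetical thresholds E = −2 and M = 6, a Reserved-list IP with score 7 is promoted, one with score 0 stays in the Reserved-list, and one with score −4 is discarded.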

3 EXPERIMENTAL RESULTS

For the application in hand, human upper body tracking, FAST IPs extracted from RGB images with a resolution of 240×320 pixels are passed to an IP-based background subtraction algorithm (Figure 4(a)) proposed by the authors (this approach is under examination as a patent). The resultant foreground IPs (Figure 4(a)-right) of any two consecutive frames are fed to the local-based stage of the algorithm, where the SURF descriptor extractor and the BruteForce matcher of OpenCV are used to estimate the initial correspondence. Then the cross-checking and displacement-checking procedures are applied to reject the outliers while keeping as many inliers as possible.

Table 1: Performance comparison on the image pairs in Figure 4. The values in the columns are the TP (True Positive), FP (False Positive), FN (False Negative) (Benezeth et al., 2010), Precision (%), and Recall (%) (Olson and Delen, 2008), respectively.

Method                TP    FP    FN    P (%)    R (%)
BruteForce            56    35    51    61.53    52.33
Cross-checked         56    25    61    69.13    47.86
Confidently matched   56     5    81    91.80    40.87
Combined             134     3     5    97.81    96.40

The results of our algorithm on two frames of video are presented in Table 1. In this experiment, there are 142 points in the Reference-list, which are matched to the Target-list IPs. As can be seen from the first row of the table, traditional local matchers like BruteForce do not deliver good precision and recall rates. The cross-checking and displacement-checking procedures improve the accuracy of the local-based IP matching stage (increasing the precision rate from 61.53% for BruteForce to 91.80% for confidently matched IPs); meanwhile, they decrease the recall rate from 52.33% to 40.87%. Although they pull the recall rate down (to 40.87%), the improvement in precision (up to 91.80%) provides a reliable basis for the spatial-based matching stage and cuts down its search cost compared with using a spatial IP matching algorithm alone. Finally, the last row of Table 1 shows the improvement which the spatial-based stage creates in the precision and recall rates. Figure 4 shows the results graphically, where the left and right images (Figures 4(b)-4(e)) are the reference and target images, respectively. Comparing Figures 4(d) and 4(e) also shows the improvement delivered by the hybrid local-spatial algorithm over the local-only IP matching algorithm.
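The precision and recall columns of Table 1 follow from the TP, FP and FN counts through the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN). A minimal sketch that recomputes them (the function name is illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall in percent from true/false positive and false negative counts."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return precision, recall
```

For the Combined row, `precision_recall(134, 3, 5)` gives approximately 97.81 and 96.40; the other rows agree with the table to within the last printed digit.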

Figure 4: Results of the proposed IP matching algorithm: (a) left to right: the real image, FAST IPs, and the foreground IPs, (b) IP matching using the BruteForce matcher, (c) cross-checking, (d) the confidently matched IPs after displacement-checking, (e) final matched IPs after the second stage.

Figure 5 gives a visual comparison of our algorithm without and with the LSR approach over eight successive rounds of matching. As can be seen in Figure 5(a), the RLL problem causes loss of track after a few rounds, while the proposed LSR approach prevents it and holds the number of reference IPs at the same level as in the first round. Moreover, if the matching algorithm fails to find the matched pair for many IPs, the LSR approach compensates for that in the subsequent rounds. For instance, as the fifth round of matching shows (3rd row and 1st column of Figures 5(a) and 5(b)), about half of the IPs (those over the torso area) have not been matched. This is the starting point of the loss of track in Figure 5(a), whereas LSR has compensated for it in the next round (Figure 5(b)).

Besides, LSR refines the Reference-list by removing its obsolete IPs and replacing them with new competent IPs from the Reserved-list. This helps the matching algorithm to follow the dynamics of the tracked object. These advantages of LSR deliver a significant improvement to the IP matching algorithm, particularly in articulated object tracking applications.

Figure 5: Final result of the proposed IP matching algorithm for some consecutive frames without (a) and with (b) the scoring and refinement (LSR) strategy.

Figure 6 also shows the matched IPs of some consecutive and non-consecutive frames, both with and without the LSR approach. Figures 6(a) and 6(b) compare the effect of LSR over eight consecutive frames, while 6(c) shows the result for some random frames taken from 100 frames of video with different levels of articulation and deformation.

Figures 7 and 8 statistically compare different stages of the proposed algorithm over 100 frames with different levels of articulation and deformation. These figures show that the proposed hybrid algorithm delivers the best precision and recall values compared with the others. Although the precision curve of the Confidently-matched stage is very close to that of the hybrid method (Figure 7-right), its recall value is far below it (Figure 7-left). This confirms that the local-based matching stage on its own delivers high accuracy by filtering out the mismatched pairs, but leaves many IPs unmatched.

Figure 6: Matched IPs of some frames: (a) matching without LSR over 8 consecutive frames, (b) matching with LSR over 8 consecutive frames.

Figure 7: Precision and Recall curves of the algorithm.

Although the level of precision and recall delivered by the local-based stage of the algorithm is acceptable for rough tracking of objects, it is not acceptable in articulated object tracking applications with lots of detail, such as human body tracking. In these situations, as many reference IPs as possible should be accurately matched to the target IPs. In fact, Figure 7 shows the capability of the proposed hybrid IP matching algorithm to improve the recall value while preserving the precision rate. The efficiency of our approach in terms of Precision-Recall is shown in Figure 8. The outputs of the local-based stage of the algorithm perform roughly the same as the hybrid method for recall values less than 0.1. However, they are neither as steady nor as good at the higher recall values, which are essential for articulated object tracking.

Figure 8: Precision-Recall curve of the algorithm.

4 CONCLUSIONS

In this paper, we have proposed a new IP matching al-

gorithm for articulated object (human body) tracking

applications. The key characteristic of our approach is the increase of precision and recall rates in two sequential stages: firstly, a local-based IP matching algorithm is performed to find the confidently matched pairs between the reference and target sets of IPs (increasing the precision rate); secondly, a spatial-based matching algorithm starts from the confidently matched pairs and recovers more matched pairs from the remaining unmatched IPs through graph matching and cyclic string matching (enhancing the recall rate while the precision rate is kept at a high level). We applied our approach to a sequence of frames with different levels of articulation and deformation. Experimental results show that the proposed algorithm increases the precision rate from 61.53% for BruteForce to 97.81% and the recall rate from 52.33% for BruteForce to 96.40%.

ACKNOWLEDGEMENTS

The proposed work was supported by the Irish Re-

search Council (IRC) under their Enterprise Partner-

ship Program.

REFERENCES

Aguilar, W., Frauel, Y., Escolano, F., Martinez-Perez, M. E.,

Espinosa-Romero, A., and Lozano, M. A. (2009). A

robust Graph Transformation Matching for non-rigid

registration. Image and Vision Computing, 27(7):897–

910.

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008).

Speeded-up robust features (SURF). Computer vision

and image understanding, 110(3):346–359.

Benezeth, Y., Jodoin, P.-M., Emile, B., Laurent, H., and

Rosenberger, C. (2010). Comparative study of back-

ground subtraction algorithms. Journal of Electronic

Imaging, 19(3):33003.

Fischler, M. A. and Bolles, R. C. (1981). Random sample

consensus: a paradigm for model ﬁtting with appli-

cations to image analysis and automated cartography.

Communications of the ACM, 24(6):381–395.

Herda, L., Fua, P., Plankers, R., Boulic, R., and Thalmann,

D. (2000). Skeleton-based motion capture for robust

reconstruction of human motion. In Computer Anima-

tion 2000. Proceedings, pages 77–83. IEEE.

Leiserson, C. E., Rivest, R. L., Stein, C., and Cormen, T. H.

(2001). Introduction to algorithms. The MIT press.

Li, B., Meng, Q., and Holstein, H. (2003). Point pattern

matching and applications-a review. In Systems, Man

and Cybernetics, 2003. IEEE International Confer-

ence on, volume 1, pages 729–736. IEEE.

Li, Y., Tsin, Y., Genc, Y., and Kanade, T. (2005). Ob-

ject detection using 2d spatial ordering constraints.

In Computer Vision and Pattern Recognition, 2005.

CVPR 2005. IEEE Computer Society Conference on,

volume 2, pages 711–718. IEEE.

Liu, Z., An, J., and Jing, Y. (2012). A Simple and robust

feature point matching algorithm based on restricted

spatial order constraints for aerial image registration.

Geoscience and Remote Sensing, IEEE Transactions

on, 50(2):514–527.

Lowe, D. G. (2004). Distinctive image features from scale-

invariant keypoints. International journal of computer

vision, 60(2):91–110.

Ma, J., Zhao, J., Tian, J., Tu, Z., and Yuille, A. L. (2013).

Robust estimation of nonrigid transformation for point

set registration. In Proceedings of IEEE conference on

Computer Vision and Pattern Recognition. IEEE.

Maes, M. (1990). On a cyclic string-to-string correction

problem. Information Processing Letters, 35(2):73–

78.

Maji, S. (2006). A Comparison of Feature Descriptors. Uni-

versity of California, Berkeley.

Mikolajczyk, K. and Schmid, C. (2005). A perfor-

mance evaluation of local descriptors. Pattern Analy-

sis and Machine Intelligence, IEEE Transactions on,

27(10):1615–1630.

Olson, D. L. and Delen, D. (2008). Advanced data mining

techniques [electronic resource]. Springer.

Rublee, E., Rabaud, V., Konolige, K., and Bradski, G.

(2011). ORB: an efﬁcient alternative to SIFT or

SURF. In Computer Vision (ICCV), 2011 IEEE In-

ternational Conference on, pages 2564–2571. IEEE.

Wagner, R. A. and Fischer, M. J. (1974). The string-

to-string correction problem. Journal of the ACM

(JACM), 21(1):168–173.

Wen, G.-J., Lv, J.-j., and Yu, W.-x. (2008). A high-

performance feature-matching method for image reg-

istration by combining spatial and similarity informa-

tion. Geoscience and Remote Sensing, IEEE Transac-

tions on, 46(4):1266–1277.

Wu, W. Y. (2001). Two-dimensional object recognition

through string matching. Imaging science journal,

49(4):213–221.

Wu, W.-Y. (2011). A string matching method for hand

recognition. In Natural Computation (ICNC), 2011

Seventh International Conference on, volume 3, pages

1598–1601. IEEE.

Zheng, Y. and Doermann, D. (2006). Robust point matching

for nonrigid shapes by preserving local neighborhood

structures. Pattern Analysis and Machine Intelligence,

IEEE Transactions on, 28(4):643–649.

Zhou, H., Yuan, Y., and Shi, C. (2009). Object tracking

using sift features and mean shift. Computer Vision

and Image Understanding, 113(3):345–352.

ADynamicHybridLocal-spatialInterestPointMatchingAlgorithmforArticulatedHumanBodyTracking

543