A GRAPH-BASED SIGNATURE GENERATION FOR PEOPLE RE-IDENTIFICATION IN A MULTI-CAMERA SURVEILLANCE SYSTEM

T. D’Orazio

and C. Guaragnella

ISSIA CNR, via Amendola 122/D-I 70126 Bari, Italy

DEE Politecnico di Bari, Via Orabona 70126 Bari, Italy

Keywords:

Multi-view Video Analysis, People Re-identiﬁcation, Color Features, Graph-based Signature.

Abstract:

In this paper we investigate the problem of people re-identification across non-overlapping, non-calibrated cameras. We propose a novel method for signature generation that considers both color and spatial features along a video sequence, together with a distance measure to estimate the similarity between silhouettes. A graph-based representation is introduced to model different people, in which uniform regions represent nodes and contiguities among regions represent edges. The proposed methodology is evaluated by comparison with a standard approach based on histogram similarity.

1 INTRODUCTION

This paper addresses the problem of tracking people across the non-overlapping views of a multi-camera video surveillance system, commonly referred to as the people re-identification problem.

In the recent literature, different re-identification methods have been developed: some focus on matching the trajectories of people moving in the scene, others solely on the appearance of the body. The latter are referred to as appearance-based methods and can be grouped into two sets. The first group comprises the single-shot methods (Chai et al., 2010; Alahi et al., 2010; de Oliveira and de Souza Pio, 2009; Gheissari et al., 2006; Mazzeo et al., 2009; D’Orazio et al., 2009), which model a person by analyzing a single image and are applied when tracking information is absent. The second group comprises the multiple-shot approaches, which employ multiple images of a person (usually obtained via tracking) to build a signature (Bazzani et al., 2010; Cong et al., 2010; Hamdoun et al., 2008). Some approaches try to learn the camera network topology in order to simplify the people association problem, by predicting the relationship between events such as a person exiting the view of one camera at a given location and the time lag before that person reappears in the view of a camera placed at another location. Alternative approaches to people tracking across multiple uncalibrated cameras use gait analysis as a biometric technique.

In this paper we investigate the problem of people re-identification, and in particular we consider the more general case of non-overlapping and non-calibrated cameras. We impose neither constraints on the knowledge of the camera view topology nor predictions of the expected time or location at which people appear in neighboring cameras. We propose a novel method for signature generation that considers both color features and the spatial distribution of uniform regions along a video sequence, together with a distance measure to estimate the similarity of different sequences. Comparisons with other methodologies on a standard data set are provided to evaluate the proposed approach.

2 THE PROPOSED APPROACH

In order to generate a robust signature characterizing people moving in a camera field of view, it is necessary to extract the moving people and track them along the video sequence. In this paper a background subtraction algorithm is used that evolves the work proposed in (Spagnolo et al., 2005). At the end of this step, people silhouettes have been extracted in each frame of the sequence; we also assume that the image resolution is sufficient to distinguish the colors and textures of clothes. The proposed method


D’Orazio T. and Guaragnella C..

A GRAPH-BASED SIGNATURE GENERATION FOR PEOPLE RE-IDENTIFICATION IN A MULTI-CAMERA SURVEILLANCE SYSTEM.

DOI: 10.5220/0003842104140417

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 414-417

ISBN: 978-989-8565-03-7

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

is based on the evaluation of color similarity among uniform regions and on the extraction of robust relative geometric information that persists when people move in the scene. The signature can be estimated over a number of frames, and a distance measure is introduced to compare signatures extracted from different video sequences. The method consists of the following steps: 1) for each frame, a segmentation into uniform regions is carried out; 2) for each region, color and area information is evaluated; 3) the extracted data are evaluated over a number of consecutive frames; 4) a connected graph is generated (nodes contain the information of uniform regions, such as color histograms and area occupancy, while connections among nodes contain information on the contiguity of regions); 5) a similarity measure is introduced to compare graphs generated from different video sequences; it applies some relaxation rules to handle the different appearances of the same person when observed from different points of view.

In the remainder of this section we describe the methodologies for region information extraction, edge information extraction, graph generation, and the similarity measure used to compare graphs.

Moving Region Segmentation. The detection of moving objects in a complex background scene is an important initial step for many computer vision applications. In this paper we propose a modification of a statistical background subtraction algorithm (Chiu et al., 2010) that performs well for people detection in both indoor and outdoor contexts. In order to construct a reference background model, a video sequence during which moving objects can pass through the scene is observed. For each pixel, the values of the three RGB components are recorded and the clusters of the most probable values are maintained. The background values are those belonging to the most probable clusters, provided the corresponding probability is over a dynamic threshold. The background extraction phase continues until a background value has been extracted for every pixel. Moving objects in the foreground can then be segmented by comparing each pixel of the current frame with the corresponding one in the background model. The result of this segmentation step is highly dependent on lighting variations, the presence of multiple shadows (especially in indoor contexts) and the similarity between foreground and background values (camouflage effects). For this reason we have introduced new procedures for background model updating and shadow removal, both based on a cross-correlation approach.
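The pixel-wise comparison against the background model can be illustrated with a minimal sketch. It is a deliberate simplification and not the algorithm used in the paper: it assumes a single RGB background value per pixel and a fixed threshold, whereas the text describes probability clusters, a dynamic threshold and cross-correlation based updating. The function name `foreground_mask` and the default threshold are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB value
Image = List[List[Pixel]]      # image stored as rows of pixels


def foreground_mask(frame: Image, background: Image,
                    threshold: float = 30.0) -> List[List[bool]]:
    """Mark a pixel as foreground when it is far from the background model.

    Assumption: one background colour per pixel and a fixed threshold.
    """
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask_row = []
        for (r, g, b), (bg_r, bg_g, bg_b) in zip(frame_row, bg_row):
            # Euclidean distance in RGB space between current and model pixel.
            dist = ((r - bg_r) ** 2 + (g - bg_g) ** 2 + (b - bg_b) ** 2) ** 0.5
            mask_row.append(dist > threshold)
        mask.append(mask_row)
    return mask
```

With a uniform dark background and a single bright pixel in the current frame, only that pixel is marked as foreground. Shadows and camouflage, which the text identifies as the main failure modes, are not handled by this sketch.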

Extraction of Uniform Regions. After the extraction of moving blobs by the background subtraction and shadow removal algorithms, the resulting segmented areas are analyzed to evaluate color and shape information. A color-based segmentation is applied to detect uniform connected regions inside each blob, using the algorithm proposed in (Nock and Nielsen, 2004). To control the coarseness of the segmentation, the method uses a parameter Q: the smaller it is, the fewer regions appear in the final segmentation. The resulting segmentation is processed to extract inter-region and intra-region information. For each region a feature vector is built containing the color histogram, the mean color, the center of mass of the region, the area, the perimeter, and the region position with respect to the whole body figure (head, central or lower). The body silhouette is divided into three areas of different dimensions: the head part corresponds to the upper 1/6 of the silhouette, the central part falls between 1/6 and 1/3 of the silhouette, and the lower part corresponds to the remaining inferior area. All the other features are normalized with respect to the ratio between the region area and the total body area. The relations between each pair of regions are examined and an adjacency matrix is generated. This step is repeated for all the images of the same person in the sequence. Corresponding regions across the sequence are extracted and the feature vectors are updated to weight the measures coming from different frames.
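As an illustration of two of the steps above, the following sketch assigns a region to a body area from the vertical position of its center of mass, and builds an adjacency matrix from a label image. The 1/6 and 1/3 boundaries are the ones stated in the text; everything else (function names, the use of -1 for background pixels, 4-connectivity) is an assumption made for the example.

```python
from typing import List


def body_part(centroid_y: float, top: float, height: float) -> str:
    """Classify a region by the relative height of its center of mass.

    `top` is the first row of the silhouette and `height` its extent in rows.
    """
    rel = (centroid_y - top) / height
    if rel < 1 / 6:
        return "head"
    if rel < 1 / 3:
        return "central"
    return "lower"


def adjacency_matrix(labels: List[List[int]], n_regions: int) -> List[List[int]]:
    """Build a symmetric 0/1 matrix of region contiguity.

    `labels[y][x]` holds the region index of each pixel, or -1 for background
    (assumed convention). Two regions are adjacent when they share a
    horizontal or vertical pixel border.
    """
    adj = [[0] * n_regions for _ in range(n_regions)]
    rows, cols = len(labels), len(labels[0])
    for y in range(rows):
        for x in range(cols):
            a = labels[y][x]
            if a < 0:
                continue
            # Checking the right and bottom neighbours covers every border once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < rows and nx < cols:
                    b = labels[ny][nx]
                    if b >= 0 and b != a:
                        adj[a][b] = adj[b][a] = 1
    return adj
```

For three regions stacked vertically, the matrix links the first to the second and the second to the third, but not the first to the third.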

Connected Graph Signature. The proposed people signature uses a relational graph. Such a graph consists of a finite number of nodes and a finite number of edges. Each node has a number of associated attributes that correspond to the elements of the feature vector described above. In this way, detecting the similarity among people turns into determining the similarity among graphs, which is also referred to as graph matching. In this section we describe the processing steps that extract the information for graph generation, while in the following section we describe the proposed graph matching procedure. First of all, in order to reduce the number of regions considered when people wear clothes with varying texture, small regions that fall inside larger regions are merged with the enclosing ones and the corresponding feature vectors are updated. Small regions that are adjacent to larger ones are neglected (for example those corresponding to hands, shoes, hair or small shadow areas). Figure 1 shows an example of the region extraction process. The considered regions become the nodes of the graph, while the edges represent the adjacency relations among the nodes. For example, in Figure 1 the central node corresponding to


Figure 1: On the left the considered person, on the right the

resulting regions after the segmentation and the combina-

tion of internal regions.

the shirt is connected to all the other nodes since this

region is adjacent to all the other regions (head, arms,

and trousers).
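One possible in-memory form of such a signature is sketched below: nodes carry region attributes and edges record adjacency. The class and field names are hypothetical and the attribute set is reduced for brevity; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RegionNode:
    name: str                      # illustrative label, e.g. "shirt"
    part: str                      # "head", "central" or "lower"
    area: float                    # region area relative to the body area
    centroid: Tuple[float, float]  # center of mass (x, y)
    histogram: List[float]         # normalized color histogram


@dataclass
class SignatureGraph:
    nodes: Dict[str, RegionNode] = field(default_factory=dict)
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_node(self, node: RegionNode) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, set())

    def add_edge(self, a: str, b: str) -> None:
        # Adjacency is symmetric, so the edge is stored in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbours(self, name: str) -> List[str]:
        return sorted(self.edges[name])


def figure1_like_graph() -> SignatureGraph:
    """Build a graph shaped like the example: the shirt touches every region."""
    g = SignatureGraph()
    for name, part in (("head", "head"), ("shirt", "central"),
                       ("arms", "central"), ("trousers", "lower")):
        g.add_node(RegionNode(name, part, 0.25, (0.0, 0.0), [1.0]))
    for other in ("head", "arms", "trousers"):
        g.add_edge("shirt", other)
    return g
```

In the resulting graph the shirt node has three neighbours, while each of the other nodes is connected only to the shirt.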

Similarity Measure Among Graphs. Connected graphs have been widely used for object recognition: given a database of known objects and a query, the task is to retrieve from the database an object that is similar to the query. However, in the context of people re-identification, the same person observed from different points of view produces different signatures, both in terms of the resulting nodes in the graph and in terms of the attributes of each node. Therefore it is necessary to allow some degree of error tolerance in the graph matching process, considering both sub-graph matching and inexact matches between corresponding node attributes. Since we observed that the central nodes contain the most useful data for an initial discrimination among people, we start the graph matching procedure from these nodes. In particular, among all the nodes that belong to the central area, we select as the torso the node whose center of mass is closest to the center of mass of the whole silhouette and whose area is larger than a given percentage of the total area. Then, for each pair of graphs, starting from these central torso nodes, the connected nodes belonging to the upper and lower regions are considered in order to evaluate geometric relationships based on distances and color/texture characteristics. The total similarity is computed by summing the weighted differences among all the pairs of nodes as follows:

$$D_{tot}(n,m) = w_{t} \cdot D(n_{t}) + w_{c} \cdot D(n_{c}) + w_{u} \cdot D(n_{u}) + w_{l} \cdot D(n_{l})$$

where n, m are the two considered graphs, $n_{t}$ are torso nodes, $n_{c}$ are central nodes, $n_{u}$ are upper nodes, $n_{l}$ are lower nodes, while D() is the measure of similarity evaluated on the distance between the nodes’ attributes. In particular, we decided to weight differently the differences between central, upper, and lower nodes. The central nodes corresponding to the torso are considered the most significant for the re-identification problem, so the weight $w_{t}$ has been fixed to 0.4. The remaining weights have been set to a fixed value of 0.2. If the two considered graphs are not isomorphic (for example they have a different number of central, lower, and upper nodes), comparisons are carried out considering all the other possible combinations. The match providing the lowest distance among the different possible matches is considered the correct one.
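The weighted sum and the search over alternative node pairings can be sketched as follows. Only the weights (0.4 for the torso, 0.2 for the others) come from the text. The node distance (half the L1 distance between normalized histograms), the averaging within a group, the handling of empty groups and all names are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, List

Histogram = List[float]

# Weights stated in the text: torso 0.4, remaining groups 0.2 each.
WEIGHTS = {"torso": 0.4, "central": 0.2, "upper": 0.2, "lower": 0.2}


def node_distance(h1: Histogram, h2: Histogram) -> float:
    """Assumed node distance: half the L1 distance, in [0, 1] for normalized input."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def group_distance(nodes_n: List[Histogram], nodes_m: List[Histogram]) -> float:
    """Lowest mean distance over all pairings of the smaller group with the larger."""
    if not nodes_n and not nodes_m:
        return 0.0   # assumption: nothing to compare on either side
    if not nodes_n or not nodes_m:
        return 1.0   # assumption: a missing group counts as maximal distance
    small, large = sorted((nodes_n, nodes_m), key=len)
    best = float("inf")
    # Try every assignment of the smaller group's nodes to distinct nodes
    # of the larger group and keep the cheapest one.
    for perm in permutations(large, len(small)):
        cost = sum(node_distance(a, b) for a, b in zip(small, perm)) / len(small)
        best = min(best, cost)
    return best


def total_distance(graph_n: Dict[str, List[Histogram]],
                   graph_m: Dict[str, List[Histogram]]) -> float:
    """Weighted sum of the per-group distances between two graphs."""
    return sum(w * group_distance(graph_n.get(part, []), graph_m.get(part, []))
               for part, w in WEIGHTS.items())
```

Because the weights sum to 1 and each group distance lies in [0, 1], the total also lies in [0, 1]. Exhaustive enumeration of pairings is only practical because each group holds a handful of nodes after small regions have been merged or discarded.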

3 EXPERIMENTAL RESULTS

Tests have been carried out on a real data set acquired by two different cameras placed in two different corridors of an office. People observed in one camera have to be recognized when they pass through the field of view of the second camera. In this paper we present results for 35 target persons compared against 47 persons observed in the second camera. In order to evaluate the performance of the proposed method, we have compared its results with a color position histogram approach derived from the work presented in (Cong et al., 2010). In that approach, the silhouette signature is generated by dividing the silhouette into n equal parts and characterizing each part with a color histogram. Preliminary tests have been carried out to select the number of regions that provides the best performance.
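The baseline signature can be pictured with a short sketch that splits a silhouette into n horizontal stripes and computes one normalized color histogram per stripe. It is an illustration under assumptions, not the implementation of (Cong et al., 2010): pixels are taken to be already quantized to color indices, background pixels are marked with `None`, and the function name is hypothetical.

```python
from typing import List, Optional


def stripe_histograms(pixels: List[List[Optional[int]]],
                      n_stripes: int, n_colors: int) -> List[List[float]]:
    """Return one normalized color histogram per horizontal stripe.

    `pixels[y][x]` is a quantized color index, or None outside the silhouette.
    """
    rows = len(pixels)
    histograms = []
    for s in range(n_stripes):
        # Integer arithmetic keeps the stripes contiguous and near-equal in height.
        start = s * rows // n_stripes
        stop = (s + 1) * rows // n_stripes
        counts = [0] * n_colors
        for row in pixels[start:stop]:
            for color in row:
                if color is not None:
                    counts[color] += 1
        total = sum(counts)
        histograms.append([c / total if total else 0.0 for c in counts])
    return histograms
```

Each histogram records which colors occur in a stripe but not where they occur inside it, which is the limitation discussed later in this section.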

On the same test sets, the results obtained with the color position histogram signature and with the proposed graph-based signature have been compared. For each view, a set of key frames in which people are entirely visible has been considered in order to build the signature with the two methods. The re-identification is assigned to the pair of images that obtains the minimum value of the similarity distance. The metric considered is the standard histogram distance, both for the color position histogram signature and for the proposed graph-based approach. Several experiments were carried out in order to find the color space and the quantization values that give the best invariance to lighting variations. Table 1 reports the best results, obtained in the HSI space with a color map of 1050 colors.
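The minimum-distance assignment can be sketched as a ranking of the candidates seen by the second camera. The distance used here (half the L1 distance between normalized histograms) is an assumption, since the text only refers to a standard histogram distance; the function names and identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def histogram_distance(h1: List[float], h2: List[float]) -> float:
    """Assumed metric: half the L1 distance between normalized histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def rank_candidates(target: List[float],
                    gallery: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Order gallery identities by increasing distance from the target."""
    scored = [(pid, histogram_distance(target, sig)) for pid, sig in gallery.items()]
    return sorted(scored, key=lambda item: item[1])
```

The first element of the returned list is the re-identification result, and the full list corresponds to an ordering by growing similarity distance.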

Table 1: The results obtained with the color position his-

togram signature and the proposed graph based signature.

Method                True Positive   False Negative
Col. Pos. Hist. Sig.  82%             18%
Graph based Sig.      88%             12%

The proposed approach improves the detection performance because evaluating the similarity between uniform regions, instead of between body part regions with fixed dimensions, allows a more precise characterization of the color differences. Color histograms, even when applied to different stripes of the body, lose the spatial information about the color distribution. Instead, the initial selection of uniform regions, which characterize the nodes of the graph, and the comparisons between pairs of uniform regions preserve the spatial information about colors.

Figure 2: First row: the result of the color position histogram signature; second row: the result of the proposed graph based signature. In each row, the first image on the left is the target object and the remaining ones are the objects identified, ordered by growing similarity distance.

The first row of Figure 2 reports one case of wrong detection by the color position histogram signature method, while the second row shows the corresponding result obtained with the proposed method. Evaluating histograms in constant stripes of the body silhouette makes two different persons appear similar when their color components are the same but the spatial distribution of those colors is different. This is particularly evident in the two considered cases, in which people wear similar clothes except for an inscription on the chest.

The proposed graph-based signature, however, fails in a few cases of people re-identification. In particular, when people wear similar clothes with the same colors and differ only in small details, such as shoe color or hair color, the method is not able to disambiguate them, since the simplification obtained by merging small internal regions and eliminating small regions at the extremities causes the loss of information that is precious for discrimination. These are, of course, extreme situations in which many feature-based signature approaches fail. Future work will address this problem by introducing an adaptive similarity measure that considers all the possible uniform regions in the graph matching procedure only when the similarity among the main regions does not allow disambiguation.

REFERENCES

Alahi, A., Vandergheynst, P., Bierlaire, M., and Kunt, M.

(2010). Cascade of descriptors to detect and track ob-

jects across any network of cameras. Computer Vision

and Image Understanding, 114:624–640.

Bazzani, L., Cristani, M., Perina, A., Farenzena, M., and Murino, V. (2010). Multiple-shot person re-identification by HPE signature. In 20th International Conference on Pattern Recognition, pages 1413–1416, Istanbul, Turkey.

Chai, Y., Takala, V., and Pietikainen, M. (2010). Matching groups of people by covariance descriptor. In 20th International Conference on Pattern Recognition, pages 2744–2747, Istanbul, Turkey, August 23-26.

Chiu, C. C., Ku, M., and Liang, L. (2010). A robust object segmentation system using a probability based background extraction algorithm. IEEE Transactions on Circuits and Systems for Video Technology, 20(4):518–528.

Cong, D. N. T., Khoudour, L., Achard, C., Meurie, C., and

Lezoray, O. (2010). People re-identiﬁcation by spec-

tral classiﬁcation of silhouettes. Signal Processing,

90:2362–2374.

de Oliveira, I. and de Souza Pio, J. (2009). People reiden-

tiﬁcation in a camera network. In Eighth IEEE Inter-

national Conference on Dependable, Autonomic and

Secure Computing, pages 461–466.

D’Orazio, T., Mazzeo, P., and Spagnolo, P. (2009). Color

brightness transfer function evaluation for non over-

lapping multi camera tracking. In Third ACM/IEEE

International Conference on Distributed Smart Cam-

eras, Como Italy.

Gheissari, N., Sebastian, T. B., Tu, P. H., Rittscher, J., and

Hartley, R. (2006). Person reidentiﬁcation using spa-

tiotemporal appearance. In Proceedings of the 2006

IEEE Computer Society Conference on Computer Vi-

sion and Pattern Recognition, CVPR.

Hamdoun, O., Moutarde, F., Stanciulescu, B., and Steux, B. (2008). Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences. In Proceedings of the IEEE Conference on Distributed Smart Cameras, pages 1–6.

Mazzeo, P., Spagnolo, P., and D’Orazio, T. (2009). Ob-

ject tracking by non-overlapping distributed camera

networks. In Advanced Concepts for Intelligent Vi-

sion Systems September, ACIVS, Mercure Chateau

Chartrons, Bordeaux, France.

Nock, R. and Nielsen, F. (2004). Statistical region merging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11).

Spagnolo, P., D’Orazio, T., Leo, M., and Distante, A. (2005). Advances in background updating and shadow removing for motion detection algorithm. In Lecture Notes in Computer Science, pages 398–406.
