FOOTBALL PLAYER TRACKING FROM MULTIPLE VIEWS

Using a Novel Background Segmentation Algorithm

and Multiple Hypothesis Tracking

A. Koutsia, N. Grammalidis, K. Dimitropoulos

Informatics and Telematics Institute,Centre of Research and Technology Hellas

km Thermi-Panorama road, Thessaloniki, Greece

M. Karaman, L. Goldmann

Communication Systems Group, Technical University of Berlin, Einsteinufer 17, Berlin,Germany

Keywords: Multiple cameras, multi-target tracking, football game analysis, background segmentation.

Abstract: In this work, our aim is to develop an automated system which provides data useful for football game

analysis. Information from multiple cameras is used to perform player detection, classification and tracking.

A background segmentation approach, which operates with the invariant Gaussian colour model and uses

temporal information, is used to achieve more accurate results. Information derived and matched from all

cameras is then used to perform tracking, using an advanced Multiple Hypothesis Tracking algorithm.

1 INTRODUCTION

As football is one of the most popular sport

activities, many research efforts have been devoted

to analyse football games. Needham and Boyle

(Needham, 2003) track the movements of football

players using a single camera on an indoor court. A

more recent approach, proposed in (Figueroa et al.,

2004) performs player tracking using a graph

representation of the moving objects in the scene.

Another interesting approach using multiple

cameras, was presented in (Xu et al., 2005) and was

part of the INMOVE project.

The main contribution of this work is the use of

a novel background segmentation approach and

Multiple Hypothesis Tracking (MHT) to improve

football player tracking. The resulting information

can be used as input for an analysis tool for football

games as well as for low bit-rate transmission and

visualisation of the game.

The paper is organised as follows: the first

section briefly describes the camera calibration

technique used and is followed by a section on a

novel background segmentation method. Next, the

observation extraction and fusion steps are

described, followed by a section on the tracking unit.

Finally experimental results are presented and

conclusions are drawn.

2 SYSTEM DESCRIPTION

2.1 Camera Calibration

In the proposed approach, multiple static cameras

are used to capture images of a football game. To

obtain the exact position of the targets in the real

world, calibration of each camera is required, so that

any point can be converted from image coordinates

(measured in pixels from the top left corner of the

image) to ground coordinates (horizontal and

vertical distances in meters from the centre of the

football pitch) and vice versa. A calibration

technique based on point and line correspondences

as well as a 3x3 homographic transformation was

used (Dimitropoulos, 2005).

2.2 Background Subtraction-Based

Segmentation

Segmentation of moving objects in videos in case of

static or motion compensated camera is mostly

523

Koutsia A., Grammalidis N., Dimitropoulos K., Karaman M. and Goldmann L. (2007).

FOOTBALL PLAYER TRACKING FROM MULTIPLE VIEWS - Using a Novel Background Segmentation Algorithm and Multiple Hypothesis Tracking.

In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 523-526

 SciTePress

based on background subtraction. In (Karaman et al.,

2005), the authors compared some selected state of

the art background subtraction methods and proved

that most of the proposed systems in literature are

developed for specific environmental conditions and

overcome explicitly or implicitly some of problems

such as shadow, highlight, etc. Based on this

investigation, they propose a method (Karaman et

al., 2006) which combines the use of the Gaussian

colour model (GCM) and temporal information. This

method is extended and used for the purposes of this

work.

Instead of the well-known colour spaces (RGB,

YCbCr, HSV), the Gaussian colour model proposed

by (Geusebroek et al., 2001) is used. It is based on

the measurement of object reflectance in colour

images and focuses on the description of colour

invariants. Various invariant features are obtained

from reflectance properties by its differentiations.

The proposed system includes two parts: pre-

segmentation and post-classification. In the pre-

segmentation part, a background reference model is

generated using multiple frames without moving

objects, calculating mean and standard deviation

maps for each pixel. The absolute difference image

of the input frame and the reference mean frame is

binarised using unimodal thresholding (Rosin,

2001). Further, binary maps are obtained from image

differencing results and standard deviation maps.

Both binary results are combined with an AND

operation. The results of each channel is further

processed with an OR operation into a preliminary

foreground mask.

The second part of the system eliminates the falsely

detected regions. Therefore, an OR operation is

applied to the last obtained final foreground mask

and the mask from the successive frame (motion

mask). Finally, an AND operator is applied to this

motion segmented result and the preliminary

foreground mask in order to restrict the extension of

the moving objects to the motion region and

achieves elimination of large noisy regions. Some

morphological operations are also applied to reduce

residual noise. Moreover, to consider temporal

changes of the background, the background model is

selectively updated.

If there are no frames without moving objects

available, the background model estimation is more

difficult. A median image is calculated based on

multiple frames. Depending on the strength of

movements, a variable step length is used for

skipping frames considered for each intermediate

median image. A final median image is generated by

median filtering of these intermediate results and a

clean background image without any moving objects

can be obtained. Furthermore, for background

modelling, multiple frames are required in order to

derive statistical models for each pixel. For this, a

simple background subtraction method is used based

on the median image. For each current image, an

over-segmented foreground mask is obtained by

setting a very low threshold which after negation

leads to an under-segmented background mask. This

ensures that only very reliable background pixels are

used for generating the final background model.

This whole background subtraction step is applied to

each camera view independently.

2.3 Observation Extraction

Using the masks of the moving objects and the

calibration information it is possible to estimate the

position of the targets in world (ground) coordinates.

The uncertainty of each observation is also

calculated as well as a classification based on shirt

colour information.

Specifically, the following steps are used to estimate

the field position of all players visible from each

camera:

• A connected component labelling algorithm is

applied to the binary foreground mask.

• Blob filtering is performed to discard moving

blobs not corresponding to players.

• For each blob, a characteristic point is chosen.

Since the ground plane assumption is used, a

suitable point is the middle bottom of each blob.

• World (ground) coordinates of each player are

calculated based on the calibration information.

Furthermore, using a version of (Borg et al, 2005),

the observation uncertainty

is given by:

ccccww

yxyxyx ),(),(),( ΛJJR =

(1)

where

is the Jacobian matrix of the partial

derivatives of the mapping functions between the

image and the world (ground) coordinate systems

and

is the measurement covariance at location

),( yx

on the camera plane which is assumed to be a

constant diagonal matrix.

The colours of the shirts define five different groups

(two teams, two goal keepers and one for the

referees). Similar to (Xu et al., 2004) by comparing

a colour histogram for each observation and a model

for each group, five probabilities are obtained which

denote the classification of the observation.

Using a training set, a mean histogram is calculated

offline for each class based on the half upper part of

VISAPP 2007 - International Conference on Computer Vision Theory and Applications

524

each blob (shirt). Histograms of

)/(' bgrrr ++= and

)/(' bgrbb

are

used to achieve robustness against light intensity

variations.

At runtime, a 2D histogram is calculated for each

observation and is compared to each of the five class

models using the histogram intersection method

(Swain and Ballard, 1991). The resulting

probabilities for each class model are then

normalised to have a summation of one, forming a

5-value vector.

Finally, filtering is performed in order to avoid the

use of erroneous observations. Filtering occurs

when:

• The blob size or the height-to-width ratio cannot

correspond to humans.

• Classification is uncertain, i.e. none of the

probabilities is higher than a specific threshold.

• Real world position is out of the pitch.

• The observation is inaccurate, i.e. the maximum

eigenvalue of

is higher than a specific

threshold.

2.4 Observation Fusion

A target visible to multiple cameras may result in

multiple observations due to inaccuracies in the

choice of characteristic points or calibration errors.

To group together these observations, a grid that

separates the football pitch in cells is considered.

Observations which belong to the same cell or to

neighbouring cells are fused to a single observation.

Fused observations are produced by averaging the

parameters of the grouped observations.

The position

Z and uncertainty matrix

of the

fused observation are given by the equations (2):

)(

−

∑

RR ,

∑

−

ZRRZ

(2)

where

Z and

R are the position (in world

coordinates) and uncertainty matrix of the n-th

observation, respectively, in a group consisting of N

observations.

To calculate the average classification vector, the

uncertainty of each observation is used to specify a

weight

w that equals to 1, 2 or 3. The average

classification vector is then calculated by (3). These

parameters (

Z ,

) of each fused observation

comprise the input for the tracking unit.

∑

(3)

2.5 Tracking Unit

The next step is the generation and update of target

tracks. The tracking unit is based on the Multiple

Hypothesis Tracking (MHT) algorithm, which starts

tentative tracks on all observations and uses

subsequent data to determine which of these newly

initiated tracks are valid. Specifically, the tracking

unit was based on a fast implementation of the MHT

algorithm by Cox et al. (Cox and Hingorani, 1996).

A 2-D Kalman filter was used to track each target

and additional gating computations are performed to

discard observation – track pairs with incompatible

classifications. Since only position measurements

are available, a simple four-state (position and

velocity along each axes) CV (constant velocity)

target motion model in which the target acceleration

is modelled as white noise provides satisfactory

results.

3 RESULTS AND CONCLUSIONS

The proposed system was applied to 8 synchronised

image sequences of 2500 frames each, captured for

the INMOVE project. A sample of the background

masks produced using the novel background

segmentation algorithm is shown in Figure 1(a).

Figure 1(b) shows a sample frame where the

extracted observations are marked with bounding

boxes. The small circles at the bottom of each box

are the chosen characteristic points.

After processing the information obtained for each

observation, the system produces a representation of

the players’ positions on the pitch (Figure 1(c))

which gets updated for each frame. The squares

show the position of the fused observation along

with their classification. The crosses (tracks) have

different colours based on the track ID. The

experimental results showed that the position of each

player is quite precise and that the class information

is accurately extracted from the image sequences.

Furthermore, examination of the tracking sequences

showed that, in most cases, the track IDs are

successfully maintained through time.

The aim of this paper was to implement a multiple

camera system which performs football player

FOOTBALL PLAYER TRACKING FROM MULTIPLE VIEWS - Using a Novel Background Segmentation Algorithm

and Multiple Hypothesis Tracking

525