Camera Placement Optimization Conditioned on Human Behavior and
3D Geometry
Pranav Mantini and Shishir K. Shah
Department of Computer Science, University of Houston, 4800 Calhoun Road, Houston, Texas, U.S.A.
Keywords:
Camera Placement Optimization, Human Motion Forecasting, Occupancy Map Estimation, Face Detection,
Human Activity, Effective Surveillance.
Abstract:
This paper proposes an algorithm to optimize the placement of surveillance cameras in a 3D infrastructure.
The key differentiating feature in the algorithm design is the incorporation of human behavior within the in-
frastructure for optimization. Infrastructures, depending on their geometries, may exhibit regions with dominant
human activity. In the absence of observations, this paper presents a method to predict this human behavior
and identify such regions to deploy an effective surveillance scenario. Domain knowledge regarding the in-
frastructure was used to predict the possible human motion trajectories in the infrastructure. These trajectories
were used to identify areas with dominant human activity. Furthermore, a metric that quantifies the position
and orientation of a camera based on the observable space, activity in the space, pose of objects of interest
within the activity, and their image resolution in camera view was defined for optimization. This method was
compared with state-of-the-art algorithms, and the results are shown with respect to the amount of observable space, human activity, and face detection rate per camera in a configuration of cameras.
1 INTRODUCTION
Video surveillance is an integral part of many pub-
lic areas such as airports, banks and train stations.
The positioning and orientation of the cameras can
play a significant role in enabling effective surveil-
lance needs such as face detection, tracking, etc. The
geographic distribution of cameras to enable effective
surveillance can be scenario specific. For example, in
a movie theater, it might be sufficient to deploy cam-
eras at locations that exhibit dominant human activ-
ity, but at an airport, it may be imperative to deploy
cameras to obtain a maximum visibility of observable
space along with emphasis on areas with dominant
human activity. Some common factors that should be
taken into consideration while deploying cameras in-
clude visibility coverage and deployment costs.
Visibility Coverage: In high security scenarios,
the camera configuration should be optimized such
that a maximal coverage of the observable space in the
infrastructure can be obtained along with added em-
phasis on areas with dominant human activity. In low
security scenarios, the camera configuration should at
least guarantee the coverage of all the areas where
dominant human activity would take place. The con-
figuration can be made more effective by covering the
most frequently used entry and exit points in the in-
frastructure. Furthermore, a camera configuration that
maximizes the capture of specific pose of objects of
interest (e.g., frontal image of the humans) with suffi-
cient resolution is considered more effective.
Deployment Cost: The configuration should
guarantee the mentioned visibility coverage while de-
ploying the least required number of cameras. Furthermore, with HD cameras becoming more prevalent and requiring more storage, a minimal number of cameras has a significant impact on the required storage space.
Designing a camera deployment configuration
manually by taking into consideration the above fac-
tors can be extremely tedious and error prone. Auto-
mated camera network deployment optimization tech-
niques are essential for a cost effective and safe envi-
ronment. In this paper, we address the issue of ob-
taining effective surveillance by optimizing the de-
ployment of cameras. In doing so, the multi-factorial
issues of visibility coverage, deployment costs, pre-
ferred pose of objects of interest, and resolution are
considered.
In this work, a camera configuration is considered to provide effective surveillance if the views across deployed cameras maximize the following aspects while minimizing the total number of cameras:
the observable space,
the view of regions within the infrastructure where dominant activity is expected,
the ability to capture the preferred pose of objects of interest (e.g., frontal pose of humans), and
their image resolution (e.g., face).

Figure 1: Example of an image from a single surveillance camera illustrating the four aspects of a camera view.

Consider a view from a single camera as shown in Figure 1. In the following, we discuss the four relevant aspects considered for an optimal camera configuration.
Maximize Observable Space in View: The in-
formation regarding the 3D geometry (floor) of the
infrastructure can be used to maximize the observable
space. In doing so, only the space that would be ac-
cessible by humans is considered relevant, as depicted
by the red bounding box in Figure 1.
Maximize the View of Regions with Expected
Dominant Human Activity: Given the observable
space, there are regions within it where one can ex-
pect dominant human activity to occur. This is illus-
trated by the green bounding box in Figure 1. All
public infrastructures have entrances, exits and points
of interest. Any doorway can be considered as an en-
trance or an exit. For simplicity they are referred to
as nodes. In an infrastructure, different nodes are ac-
cessed with different frequencies. A node represent-
ing a common entrance or an exit has a high frequency
of access as opposed to an employee’s personal office.
Given these nodes and their probabilities, human mo-
tion can be estimated or measured between the nodes
to identify regions of high human activity.
Maximize the Ability to Capture the Preferred
Pose of Objects of Interest: In this surveillance sce-
nario, frontal pose of the humans can be considered to
be the preferred pose as illustrated by the blue bound-
ing boxes in Figure 1. The direction of motion of
humans can be used to maximize the view of their
frontal pose. Given the nodes, their probabilities, and
trajectories followed by humans, this direction of mo-
tion can be identified.
Resolution of the Imaged Objects: The resolu-
tion of the face (yellow bounding box in Figure 1)
could be considered as a feature of interest in the do-
main of human surveillance, and hence its captured
image resolution would be expected to be high. Sim-
ilar to the previous step, the trajectories provide the
direction of motion for the humans. A location of a
face can be assumed based on the estimate of the av-
erage human height. The number of rendered pixels
of the bounding box representing the face in the im-
age from the camera can be used for maximizing this
quantity.
In this paper, we provide a solution for optimal
placement of cameras while considering the above
factors. The main contributions of the paper are:
We propose a method to incorporate predicted hu-
man behavior for camera placement optimization.
We propose a method to estimate the human activ-
ity based on the 3D geometry of the infrastructure.
We propose a method to identify and cluster re-
gions of plausible high human activity.
We propose a metric to assess the quality of a
camera configuration based on observable space,
amount of activity in the view, preferred pose of
objects of interest, and their image resolution.
2 PREVIOUS WORK
Camera placement optimization is a crucial problem
in computer vision and has been explored by many
researchers. Most of the early work put emphasis on image resolution and was based on a single camera focused on a static object. The problem was
to find the best position for the camera that maxi-
mizes the quality of features on an object (Tarabanis
et al., 1995; Fleishman et al., 1999). Later, (Chen
and Davis, 2000) proposed a metric based on resolu-
tion and occlusion characteristics of the object that as-
sessed the quality of multiple camera configurations.
The configuration was optimized based on this metric
such that minimum occlusion would occur while en-
suring a certain resolution. (Mittal and Davis, 2004)
suggested a probabilistic approach for visibility anal-
ysis that captured the observable space aspect and cal-
culated the probability of visibility of an object from
at least one camera in the configuration. Then a cost
function was defined that mapped the sensor parame-
ters to the probability and the cost function was mini-
mized by simulated annealing.
(Erdem and Sclaroff, 2004) suggested a bi-
nary optimization approach for the camera placement
problem that captured both the observable space and
resolution aspect. The polygon representing the space
is fragmented into an occupancy grid and the algo-
rithm tries to minimize the cost of a camera configura-
tion while maintaining some specified spatial resolu-
tion. (Hörster and Lienhart, 2006a; Hörster and Lienhart, 2006b; Hörster and Lienhart, 2006c) proposed
a linear programming approach that determines the
calibration for each camera in the network that max-
imizes the coverage of the observable space with a
certain resolution. (Ram et al., 2006; Sivaram et al.,
2009) proposed a performance metric that evaluates
the probability of accomplishing a task as a function
of a set of camera configurations. This metric took into
consideration the objects of interest in the scenario
and was defined to realize only images obtained in a
certain direction (frontal image of the person). (Bodor
et al., 2007) proposed a method, where the goal is to
maximize the aggregate observable space across mul-
tiple cameras. An objective function that quantifies
the resolution of the image and the motion trajectories
of the object in the scene is defined. A variant of hill
climbing method was used to maximize this objective
function.
(Murray et al., 2007) applied coverage optimiza-
tion combined with visibility analysis to address this
problem. For each camera location, the coverage was
calculated using visibility analysis. Maximal cov-
ering location problem (MCLP) and backup cover-
age location problem (BCLP) were used to model the
optimum camera combinations and locations. (Ma-
lik and Bajcsy, 2008) suggested a method for opti-
mizing the placement of multiple stereo cameras for
3D reconstruction. An optimization framework was
defined using an error based objective function that
quantifies the stereo localization error along with res-
olution constraints. A genetic algorithm was used to
generate a preliminary solution and later refined us-
ing gradient descent. (Kim and Murray, 2008) also
employed BCLP to solve the camera coverage prob-
lem. They suggested an enhanced representation of
the coverage area by representing it as a continu-
ous variable in contrast to a commonly used discrete
variable. (Yabuta and Kitazawa, 2008) and (Debaque
et al., 2009) also employed a combination of MCLP
and BCLP for solving the optimum camera cover-
age problem. The former took into consideration the
3D geometry of the environment and supplemented
the MCLP/BCLP problem by including a minimal
localization error variable for both monoscopic and
stereoscopic cameras. The optimization problem was
solved using simulated annealing. In the latter, the
MCLP/BCLP problem was supplemented using visi-
bility analysis for optimization. (Huang et al., 2014) proposed a two-stage approximation algorithm: the first stage finds a solution to the minimum watchman tour problem and places cameras along the estimated tour; the second stage finds a solution to the art gallery problem and adds extra cameras to connect the guards. Most of the previous work emphasizes the importance of maximizing observable space and the resolution of this space. There is little work addressing the significance of activity in the observable space along with obtaining useful data. This work addresses this gap by assigning equal importance to all four aspects, which were ignored in previous work.
Considering the 3D geometry of the environment
is of significant value for the camera coverage opti-
mization problem. In this paper, we focus on indoor
scenarios and assume the availability of a complete
3D model of the environment where the camera net-
work is to be deployed. To the best of our knowledge,
this is the first work that takes into consideration the
human activity in the scenario for designing an opti-
mal camera network in the absence of any observa-
tions. Although (Bodor et al., 2007; Janoos et al.,
2007) proposed the use of observed human activity
for optimizing the camera placement, in the proposed
work the human trajectories are simulated and not ob-
served in order to identify regions with dominant hu-
man activity. Furthermore, (Ram et al., 2006; Sivaram
et al., 2009) proposed the use of frontal view from ob-
servations as a task for optimizing the camera position
unlike the proposed method that predicts frontal view
based on human behavior. Finally, the human behav-
ior in a given scenario is influenced by the 3D ge-
ometry of that environment (Mantini and Shah, 2014;
Kitani et al., 2012). To the best of our knowledge,
this is the first work that incorporates this information
to optimize the camera network locations for video
surveillance.
3 METHODOLOGY
3.1 Problem Formulation

Let $G$ be the geometry (floors, ceilings, walls, etc.) of an infrastructure. Let $\{C_1, C_2, \ldots, C_\nu\}$ be a set of cameras located in $G$ with configurations (position, orientation, zoom, etc.) represented by $\{\omega_1, \omega_2, \ldots, \omega_\nu\}$, $\omega_i \in \Omega$, where $\Omega$ is the set of all possible configurations within $G$. Let $g : \Omega \mapsto \mathbb{R}$ be an objective function. The problem is to find a set of optimal configurations $\{\omega_1^*, \omega_2^*, \ldots, \omega_\nu^*\}$ such that:

$$\{\omega_1^*, \omega_2^*, \ldots, \omega_\nu^*\} = \operatorname*{argmax}_{\{\omega_1, \omega_2, \ldots, \omega_\nu\} \in \Omega^\nu} \sum_{i=1}^{\nu} g(\omega_i) \quad (1)$$
3.2 Camera Coverage Quality Metric
The function g(.) quantifies the following aspects in
view of the camera:
amount of observable space,
amount of view of regions with expected domi-
nant activity,
amount of ability to capture the preferred pose of
objects, and
image resolution of these objects.
(Janoos et al., 2007) proposed a cell coverage quality metric to determine the coverage quality of a cell given a set of camera configurations by modeling realistic camera characteristics. A cell was defined as any unit of observable space, like a square in a grid or a triangle in a triangular mesh. Furthermore, they proposed a cost function that combines this metric with observed human occupancy for optimization. We extend this notion and define the Camera Coverage Quality Metric (CCQM) to quantify the amount of observable space ($A$), the amount of view of regions with expected dominant activity ($H$), the ability to capture the preferred pose ($F$), and the image resolution of these objects ($R$) for a camera configuration $\omega$. The CCQM is defined as:

$$CCQM(\omega) = g(A, H, F, R) = A(\omega) \, H(\omega) \, F(\omega) \, R(\omega) \quad (2)$$
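Viewed as code, Equation 2 is simply a product of four normalized scores. The sketch below is a minimal illustration (hypothetical names, not the authors' implementation); each component function is assumed to map a configuration to a score in [0, 1]:

```python
# Minimal sketch of Equation 2. A, H, F, R are assumed to be callables
# mapping a camera configuration omega to a normalized score in [0, 1].
def ccqm(omega, A, H, F, R):
    """Camera Coverage Quality Metric for one configuration."""
    return A(omega) * H(omega) * F(omega) * R(omega)
```

Note that the multiplicative form drives the score to zero for any configuration that fails completely on one aspect, favoring balanced configurations.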
The optimal configuration of the cameras in $G$ is defined as:

$$\{\omega_1^*, \omega_2^*, \ldots, \omega_\nu^*\} = \operatorname*{argmax}_{\{\omega_1, \omega_2, \ldots, \omega_\nu\} \in \Omega^\nu} \sum_{i=1}^{\nu} CCQM(\omega_i) \quad (3)$$
Given $\omega$, the functions $\{A, H, F, R\}$ are defined as follows. Without loss of generality, we assume that the geometry to be viewed is represented by a triangular mesh containing triangles $\{t_1, t_2, \ldots, t_n\}$ with centroids $\{c_1, c_2, \ldots, c_n\}$. Let $\{t_1^\omega, t_2^\omega, \ldots, t_m^\omega\}$ be the set of triangles in view of the camera with configuration $\omega$.
Amount of Observable Space: The geometric area in view of the camera is used to quantify this aspect. The area of coverage function $A(\omega)$ is defined as:

$$A(\omega) = \frac{\text{area in view}}{\text{total area}} = \frac{\sum_{i=1}^{m} \text{area}(t_i^\omega)}{\sum_{i=1}^{n} \text{area}(t_i)} \quad (4)$$
Amount of View of Regions with Expected Dominant Activity: An occupancy map of a space quantifies how often a point is accessed compared to other points in that space. Let us assume an occupancy map as defined in (Mantini and Shah, 2014), which gives the frequency with which a triangle is accessed by humans; the same methodology as in (Mantini and Shah, 2014) is used to compute the occupancy map. The amount of occupancy is used to define the activity in the area. If $O(t)$ is the occupancy of the triangle $t$, then the human occupancy volume function is defined as:

$$H(\omega) = \frac{\sum_{i=1}^{m} O(t_i^\omega)}{\sum_{i=1}^{n} O(t_i)} \quad (5)$$
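Both coverage fractions reduce to sums over the mesh. The following sketch computes Equations 4 and 5 for one configuration, assuming a hypothetical visibility predicate `in_view` (the paper leaves the visibility test to rendering against the 3D model):

```python
def area_coverage(areas, in_view):
    """Equation 4: fraction of accessible floor area in view.

    areas: dict mapping triangle id -> area; in_view(t): assumed
    visibility predicate (e.g., ray casting against the 3D model).
    """
    visible = sum(a for t, a in areas.items() if in_view(t))
    return visible / sum(areas.values())


def occupancy_coverage(occupancy, in_view):
    """Equation 5: fraction of total human occupancy O(t) in view."""
    visible = sum(o for t, o in occupancy.items() if in_view(t))
    total = sum(occupancy.values())
    return visible / total if total else 0.0
```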
Figure 2: Vector discretization of a triangle in a triangular mesh for creating a vector transition histogram from trajectories.
Amount of Ability to Capture the Preferred Pose of Objects: Humans are considered as objects of interest. Let $\tau = \{T_1, T_2, \ldots\}$ be a set of trajectories followed by humans in the geometry $G$. These trajectories are used to quantify the amount of frontal view that can be obtained from the configuration $\omega$. For every triangle $t_i$ in the floor triangular mesh, direction discretization is performed and eight direction vectors $\{v_1^i, v_2^i, \ldots, v_8^i\}$ are defined as by (Zhou et al., 2010) (Figure 2).
In the following step, a vector transition histogram is constructed from the set of these trajectories. Consecutive points in a trajectory are considered to create a direction vector. If $T = \{p_1, p_2, \ldots, p_l\}$ is a trajectory of length $l$, then for every pair of consecutive points $\{p_{i-1}, p_i\}$, the direction vector is defined as $(p_i - p_{i-1})$. The bin corresponding to the triangle $t$ in which the point $p_{i-1}$ is located and the discretized direction vector subtending the smallest angle with $(p_i - p_{i-1})$ is incremented. Let $\Psi(t, v) \mapsto \mathbb{R}$, where $t \in \{t_1, t_2, \ldots, t_n\}$ and $v \in \{v_1, v_2, \ldots, v_8\}$, be the histogram function; then the frontal pose function $F(\omega)$ for a camera with center $C$ is defined as:

$$F(\omega) = \frac{1}{m} \sum_{i=1}^{m} \Big( ((C - c_i) \cdot v_{k-1}) \, \Psi(t_i, v_{k-1}) + ((C - c_i) \cdot v_k) \, \Psi(t_i, v_k) + ((C - c_i) \cdot v_{k+1}) \, \Psi(t_i, v_{k+1}) \Big) \quad (6)$$

$$k = \operatorname*{argmax}_{k} \left( v_k \cdot (C - c_i) \right) \quad (7)$$

where $t_i$ is the triangle with centroid $c_i$ and $v_k$ is the direction vector that subtends the smallest angle with $(C - c_i)$.
Image Resolution of the Object: This component of CCQM quantifies the resolution of the face. If the imaged object is far from the camera, the obtained resolution is very low and the image might not add any value to the system. This component is application dependent; it can be customized to obtain a sufficient resolution of any object, which could be just the face or the entire body of a human. We follow the methodology described by (Janoos et al., 2007) and define the function $R(\omega)$ for a camera with center $C$ as:
$$R(\omega) = \frac{1}{m} \sum_{i=1}^{m} \frac{\rho_\omega(t_i)}{\rho_{min}} \quad (8)$$
$$\rho_\omega(t_i) = \big( 2\pi \, d(C, c_i)^2 \, (1 - \cos(\gamma/2)) \big)^{-1} \big( \sigma_{k-1} (C - c_i) \cdot v_{k-1} + \sigma_k (C - c_i) \cdot v_k + \sigma_{k+1} (C - c_i) \cdot v_{k+1} \big) \quad (9)$$

where $\gamma$ is the Y-field of view defined for the camera, $d(p_1, p_2)$ is the Euclidean distance between the points $p_1$ and $p_2$, $k$ is as defined in Equation 7, $\sigma$ is the number of pixels the object occupies in the image, and $\rho_{min}$ is a user-defined value that specifies the minimum required resolution of an object in pixels/inch.
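As a rough illustration, the sketch below evaluates Equations 8 and 9, reusing the `DIRS` direction set from the previous sketch and assuming a per-direction pixel count `sigma` (my reading of the subscripted σ terms; the paper does not spell out how σ is obtained):

```python
import numpy as np

def rho(cam, c, sigma, gamma):
    """Equation 9: resolution proxy at triangle centroid c.

    sigma: length-8 array of pixel counts, indexed by discretized
    direction (an assumption); gamma: vertical field of view (radians).
    """
    to_cam = np.asarray(cam) - np.asarray(c)          # (C - c_i)
    solid = 2 * np.pi * (to_cam @ to_cam) * (1 - np.cos(gamma / 2))
    k = int(np.argmax([to_cam @ v for v in DIRS]))
    s = sum(sigma[j % 8] * (to_cam @ DIRS[j % 8]) for j in (k - 1, k, k + 1))
    return s / solid

def resolution_score(cam, centroids, sigma, gamma, rho_min):
    """Equation 8: mean of rho over in-view triangles, scaled by rho_min."""
    return float(np.mean([rho(cam, c, sigma, gamma) / rho_min
                          for c in centroids]))
```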
3.3 Optimization

Now that a metric is defined to assess the quality of a camera configuration $\omega$, we perform a search in the geometry $G$ to find the optimum parameter $\omega^*$. Given the geometry and the domain knowledge, the search is performed to find two points: the first on the ceiling, at which to position the camera, and the second on the floor, towards which to point the camera. Hence the parameter $\omega$ contains a pair of 3D points $\{v_1, v_2\}$. A variation of the hill climbing algorithm called random-restart hill climbing (RRHC) is used for finding the optimum parameter $\omega^*$. Random-restart hill climbing is an optimization search that provides near-optimal performance (Zhang et al., 2014; Filho et al., 2010). The idea is to search a limited number of points randomly and choose the best start location for hill climbing optimization. Since the objective is to find two points, one on the floor and the second on the ceiling, this is done at two levels.

Optimal Pair: This algorithm takes as input a point on the ceiling ($v_1$) along with the list of points on the floor ($L$) and performs RRHC optimization to find the optimal pair $v_2$ (a point on the floor) for $v_1$ that maximizes CCQM (Algorithm 1).

Algorithm 1: Optimal Pair.
Require: v1 (ceiling point), L (floor points list)
Ensure: v2 (optimal floor point)
 1: procedure OptimalPair
 2:   // Random search
 3:   n ← number of points for random search
 4:   candidate_v2 ← RandomSolution(L)
 5:   current ← CCQM(v1, candidate_v2)
 6:   for (i = 1; i ≤ n; i++) do
 7:     current_v2 ← RandomSolution(L)
 8:     candidate ← CCQM(v1, current_v2)
 9:     if candidate > current then
10:       current ← candidate
11:       candidate_v2 ← current_v2
12:     end if
13:   end for
14:   // Hill climbing
15:   v2 ← candidate_v2; current ← CCQM(v1, candidate_v2)
16:   for k ∈ neighbors(candidate_v2) do
17:     current_v2 ← candidate_v2.neighbor[k]
18:     candidate ← CCQM(v1, current_v2)
19:     if candidate > current then
20:       current ← candidate
21:       v2 ← current_v2
22:     end if
23:   end for
24:   return v2
25: end procedure

RRHC Optimization: This algorithm takes as input a list of points representing the ceiling ($C$) and another list representing the points on the floor ($F$) and performs RRHC to find the optimal parameters $\{v_1, v_2\}$ that maximize CCQM for a camera, where $v_1$ is a point at which to position the camera and $v_2$ is a point towards which to orient it (Algorithm 2).

Algorithm 2: RRHC Optimization.
Require: C (ceiling points list), F (floor points list)
Ensure: v1, v2 (optimal pair)
 1: procedure RRHCOptimization
 2:   // Random search
 3:   n ← number of points for random search
 4:   max_v1 ← RandomSolution(C)
 5:   current_v2 ← OptimalPair(max_v1)
 6:   current ← CCQM(max_v1, current_v2)
 7:   for (i = 1; i ≤ n; i++) do
 8:     cand_v1 ← RandomSolution(C)
 9:     cand_v2 ← OptimalPair(cand_v1)
10:     candidate ← CCQM(cand_v1, cand_v2)
11:     if candidate > current then
12:       max_v1 ← cand_v1
13:       current ← candidate
14:     end if
15:   end for
16:   // Hill climbing
17:   current_v1 ← max_v1
18:   current_v2 ← OptimalPair(current_v1)
19:   current ← CCQM(current_v1, current_v2); v1 ← current_v1; v2 ← current_v2
20:   for k ∈ neighbors(current_v1) do
21:     cand_v1 ← current_v1.neighbor(k)
22:     cand_v2 ← OptimalPair(cand_v1)
23:     candidate ← CCQM(cand_v1, cand_v2)
24:     if candidate > current then
25:       current ← candidate
26:       v1 ← cand_v1
27:       v2 ← cand_v2
28:     end if
29:   end for
30:   return (v1, v2)
31: end procedure
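To make the two-level search concrete, here is a compact Python sketch of the same scheme (a generic illustration under assumed helpers, not the authors' code): a random-search phase picks the best of n random candidates, and a single greedy pass over that candidate's mesh neighbors refines it, mirroring Algorithms 1 and 2:

```python
import random

def rrhc(candidates, neighbors, score, n_random=50):
    """Random-restart hill climbing over a discrete point set.

    candidates: points to search; neighbors(p): adjacent points of p
    (e.g., adjacent mesh vertices); score(p): objective to maximize.
    """
    # Random-search phase: best of n_random random candidates.
    pool = random.sample(candidates, min(n_random, len(candidates)))
    best = max(pool, key=score)
    best_score = score(best)
    # Hill-climbing phase: one greedy pass over the neighbors,
    # as in the paper's pseudocode.
    for p in neighbors(best):
        s = score(p)
        if s > best_score:
            best, best_score = p, s
    return best

def optimize_camera(ceiling_pts, floor_pts, neighbors, ccqm):
    """Two-level search: for each ceiling point v1, the best floor
    point v2 is found by an inner RRHC; the outer RRHC searches v1."""
    def best_pair_score(v1):
        v2 = rrhc(floor_pts, neighbors, lambda p: ccqm(v1, p))
        return ccqm(v1, v2)
    v1 = rrhc(ceiling_pts, neighbors, best_pair_score)
    v2 = rrhc(floor_pts, neighbors, lambda p: ccqm(v1, p))
    return v1, v2
```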
3.4 Framework
The framework for obtaining the optimal parameters
{ω
1
,ω
2
,...,ω
ν
} given the geometry G is described in
this section. The framework design is shown in Fig-
ure 3, which contains three modules.
1. Model: In this module, the infrastructure is modeled. This requires domain knowledge regarding the infrastructure such as entrances, exits and doors (nodes). Furthermore, knowledge regarding the frequency of accessing these nodes is also required. The output is a list of transitions between nodes.

Figure 3: Framework with three modules (model, data generation, and RRHC optimizer) for obtaining the optimal parameters $\{\omega_1^*, \omega_2^*, \ldots, \omega_\nu^*\}$.
2. Data Generation: In this module, the data required for optimization is generated. The input is the list of node transitions from the previous module. First, a list of trajectories is generated using CTF for each pair of nodes from the list. These are the trajectories described in Section 3.2 for quantifying the amount of preferred pose of objects of interest. The trajectories are then given as input to a sub-module that accumulates them to create an occupancy map describing the frequency with which humans access the geometry. This occupancy map is the function O(t) described in Section 3.2 for quantifying the amount of view of regions with dominant activity. The occupancy map is also input to a clustering algorithm that clusters points based on their occupancy and spatial location in the geometry.
3. RRHC Optimizer: Each of the obtained clusters is given as input to an optimizer to find the optimized configuration $\{\omega_1^*, \omega_2^*, \ldots, \omega_\nu^*\}$, one configuration per cluster.
4 EXPERIMENTS
4.1 Implementation
4.1.1 Model
Given the geometry of an infrastructure, most humans
follow trajectories with a goal of reaching a destina-
tion like an entrance, exit or a doorway. There is a cer-
tain probability associated with accessing these nodes
based on the purpose they serve in the infrastructure.
For example, at an airport, passengers might access the
ticket counter with a higher probability than a coffee
shop or a restroom. The knowledge of this probability
can be used to sample nodes that humans can transi-
tion between. Let us consider the following test case
scenario. In Figure 4, the objective was to install a
network of cameras that provide effective surveillance
in the hallway.
Figure 4: Floor plan of the test case scenario where the cam-
eras are to be placed. The nodes are labeled with numbers.
Create Nodes and Probability Distribution: The identified nodes are labeled with numbers in Figure 4. Let $\{n_1, n_2, \ldots\}$ be the nodes in the geometry $G$. In the absence of any observations of human motion, the probability of accessing a node was assumed to be proportional to the accommodation capacity of the room, unless the node was an entrance or exit; that is, the higher the capacity of a room to hold/seat people, the higher the probability of accessing it. If $P_a(n_i)$ is a probability function that assigns a probability to a node $n_i$ and $A_c(n_i)$ is its accommodation capacity, then

$$P_a(n_i) \propto \begin{cases} 0 & \text{if } n_i \text{ is an entry/exit} \\ A_c(n_i) & \text{otherwise} \end{cases} \quad (10)$$
Sampling Algorithm: The sampling algorithm was designed based on a few assumptions: a human entering the geometry G would eventually exit, and a human would access a minimum of one node before exiting the geometry. Algorithm 3 describes the steps.
Algorithm 3: Nodes Sampling.
1: Choose a random entrance
2: Choose a node to access using P
a
as distribution
3: Choose randomly to either exit or access another
node
4: if access another node then
5: Choose another node excluding the current
node
6: Goto step 3
7: else
8: Choose a random exit
9: end if
In the example geometry in Figure 4, an en-
try (4, 7) was chosen with equal probability, then a
node was chosen that is not an exit based on the as-
signed probability (P
a
). Now assuming that the hu-
man had transitioned to the node, the human could ei-
ther choose to transition to another node or exit with
equal probability. If the human chose to exit, the clos-
est exit was chosen, else the human would choose to
go to another node based on a calculated probability.
The probability of choosing the second node changed
because the node that the human was currently in was
eliminated when calculating the probabilities. This
gave a list of nodes $\{n_1^s, n_2^s, \ldots\}$ that can be used as start and end nodes for simulating trajectories.
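A minimal sketch of this sampling procedure follows (hypothetical names; the continue-or-exit choice uses the equal probability stated above, and the paper's closest-exit rule is simplified to a random exit here):

```python
import random

def sample_visit_sequence(entries, exits, nodes, p_access):
    """Sketch of Algorithm 3: sample one entry-to-exit node sequence.

    nodes: interior nodes; p_access[n]: access probability from
    Equation 10; entries/exits: entry and exit nodes of the geometry.
    """
    seq = [random.choice(entries)]                    # step 1
    current = random.choices(nodes, weights=[p_access[n] for n in nodes])[0]
    seq.append(current)                               # step 2
    while random.random() < 0.5:                      # step 3: continue?
        rest = [n for n in nodes if n != current]     # exclude current node
        current = random.choices(rest, weights=[p_access[n] for n in rest])[0]
        seq.append(current)
    seq.append(random.choice(exits))                  # paper: closest exit
    return seq
```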
4.1.2 Data Generation
Given the geometry of the environment along with the nodes and their assigned probabilities, the likely human motion in the infrastructure was simulated to identify regions of dominant human activity.

Figure 5: Occupancy map (O(t)) of the hallway obtained by mapping multiple simulated trajectories. Red indicates regions of dominant activity, blue regions of minor activity.
Contextual Trajectory Forecasting (CTF):
CTF (Mantini and Shah, 2014) was used to simulate
trajectories from the start node to the end node.
Given the 3D geometry of the environment and the
starting point and destination of a human, CTF is
assembled on two assumptions. First, the human
would follow a path that requires the shortest time
to reach the destination, and second, the human
would adhere to certain behavioral norms that are
observed when walking in hallways. CTF uses a
Markov model and assigns probabilities to points on
the floor such that consecutive points are sampled
from start to destination to form a trajectory that
represents the shortest path while conforming to
observed behavioral norms. CTF can take any pair of
nodes $\{n_i^s, n_j^s\}$ from the previous step and produce a trajectory $T_{ij}^s = \{n_i^s, p_1^s, p_2^s, \ldots, n_j^s\}$.
Create Occupancy Map (O(t)): In this step, multiple pairs of nodes were generated as described in the previous step. These generated nodes were input to CTF to obtain a set of trajectories $\tau = \{T_1, T_2, \ldots\}$. These are the trajectories used for quantifying the preferred pose of objects of interest as described in Section 3.2. The trajectories were mapped to the floor of the geometry to create an occupancy map $O(t_i)$, which quantifies the number of times a trajectory passes through a triangle $t_i$, as used in quantifying the amount of view of regions with dominant activity in Section 3.2. A snapshot of the occupancy map from the simulated trajectories $\tau$ in $G$ is shown in Figure 5.
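Accumulating the trajectories into O(t) is a counting pass over the mesh; the sketch below reuses the hypothetical `triangle_of` lookup from the earlier histogram sketch and credits each triangle once per trajectory that crosses it (one plausible reading of "passes through"):

```python
from collections import Counter

def occupancy_map(trajectories, triangle_of):
    """Accumulate simulated trajectories into per-triangle counts O(t)."""
    counts = Counter()
    for traj in trajectories:
        # Each triangle is credited once per trajectory that crosses it.
        for t in {triangle_of(p) for p in traj}:
            counts[t] += 1
    return counts
```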
Clustering Algorithm: The regions that belong to the same cluster should have a similar occupancy value and also be located in the same spatial region. A point's spatial coordinates and its occupancy, $(c_i, O(t_i))$, were used as features, where $c_i = \{x_i, y_i, z_i\}$ are the 3D coordinates of the centroid of triangle $t_i$ and $O(t_i)$ is its occupancy. The clusters obtained using Expectation Maximization (EM) (Dempster et al., 1977) are shown in Figure 6. In this scenario, the red cluster was identified to have the highest average human occupancy, followed by blue and then pink, as shown in Table 1.

Figure 6: Clusters of regions with dominant activity in the geometry obtained by the EM algorithm.

Table 1: Identified clusters and their mean occupancies.
No.  Cluster      Occupancy
1    Blue         0.23
2    Red          0.42
3    Green        0.13
4    Aqua         0.11
5    Light Pink   0
6    Pink         0.11
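A minimal sketch of this clustering step follows, assuming scikit-learn's GaussianMixture as a stand-in EM implementation (the paper cites the EM algorithm but not a library) and six components to match Table 1:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # EM for Gaussian mixtures

def cluster_activity(centroids, occupancy, n_clusters=6):
    """Cluster floor triangles on (x, y, z, O(t)) features via EM."""
    X = np.column_stack([np.asarray(centroids, dtype=float),
                         np.asarray(occupancy, dtype=float)[:, None]])
    # Standardize each feature so position and occupancy are comparable.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
    return gmm.predict(X)  # one cluster label per triangle
```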
4.1.3 RRHC Optimization
Once the clusters are identified, the optimization is applied to each cluster separately. Given a cluster, first the points on the ceiling that have a view of the centroid of the cluster are identified; these points are considered as the possible locations of the camera. The only possible orientations for the camera are towards points on the floor within the cluster. This simplifies the problem to finding two points: one on the ceiling at which to position the camera, and a second on the floor towards which to point it. As described in Section 3.3, random-restart hill climbing optimization was performed to find the two optimal points.
4.2 Results
The motivation for this work was to optimize the camera placement in the geometry to provide effective surveillance as defined in Section 1. A configuration of cameras in a geometry is considered to provide effective surveillance if it maximizes the quantities below while minimizing the number of cameras. Such a system is effective both in terms of surveillance and cost; hence, all the quantities used for comparison are normalized by the number of cameras in the configuration.
Area of Observable Space in View: The total
area accessible by humans in view of the camera
is calculated for all the cameras and normalized.
Amount of Activity in View: To quantify the occupancy of a location that is in view, the activity produced in that location is considered. The number of frames that contain motion is used to define the activity of the location viewed by the camera. The normalized value is used as a metric.
Pose of Objects of Interest and their Resolution: Assuming that a certain number of pixels is required for face detection, face detection is used to quantify the pose of objects of interest along with their resolution. The number of faces detected is counted for every camera in the configuration and normalized.
The above metrics are defined to assess these qualities in a configuration of cameras. The configuration generated by the proposed method is compared to the following methods.
3 Coloring Solution (Fisk, 1978): A solution to the Art Gallery Problem (AGP) was obtained using the 3 coloring solution and the cameras were placed at these locations. This configuration was used as the baseline. The polygon representing the environment's geometry contains holes; it was modified to remove the holes, and the 3 coloring solution was then computed for the modified polygon. The cameras were manually placed to maximize the area in view. The solution is shown in Figure 7.
Figure 7: Configuration of cameras obtained by computing the 3 coloring solution to the AGP.
(Janoos et al., 2007): Janoos et al. defined a cell coverage quality metric by taking observed human occupancy and resolution into account. This metric was used to optimize the camera location for each cluster; the resulting configuration is shown in Figure 8.
Figure 8: Configuration of cameras obtained by optimizing the cell coverage quality metric proposed by Janoos et al. (2007) for each cluster.
(Huang et al., 2014): Huang et al. proposed a shortest watchman route solution and positioned wireless cameras along the route to maximize the viewed area of the polygon. Their solution was proposed only for simple polygons without holes, and hence the modified polygon was used in this case as well. The obtained configuration is shown in Figure 9.
Figure 9: Configuration of cameras obtained by finding the shortest watchman route in the geometry, as proposed by Huang et al. (2014).
Figure 10: Configuration of cameras obtained from the pro-
posed method.
Proposed Method: The configuration obtained from the proposed method is shown in Figure 10, and the views from the cameras are shown in Figure 11.
Figure 11: Camera views (Cam. 1–Cam. 5) from the cameras deployed in the test case scenario as calculated by the proposed method.
Table 2 shows the area under view per camera. Although the 3 coloring solution and Huang et al. achieve higher total area coverage, they use more cameras than the proposed method, and the area in view per camera is highest for the proposed method.
Table 2: Comparison of area and activity in view per camera.
Method      No. of cams.  Area/cam.  Activity/cam. (frames)
3 Coloring  8             0.057      28048.6
Janoos      5             0.01       44366.2
Huang       10            0.064      40092.5
Proposed    5             0.109      69933.8
All cameras used for the experiments had a frame rate of 30 fps. For each camera, the number of frames in which there is activity was counted using background subtraction. The average number of such frames per camera is shown in Table 2; the most activity per camera was observed with the proposed method.
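The activity count can be reproduced with any standard background subtractor; below is a minimal sketch using OpenCV's MOG2 (an assumed stand-in, since the paper does not name its background-subtraction method; `min_fg_pixels` is a hypothetical threshold on foreground size):

```python
import cv2

def count_activity_frames(video_path, min_fg_pixels=500):
    """Count frames that contain motion via background subtraction."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2()
    active = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        # Treat a frame as active if enough foreground pixels survive.
        if cv2.countNonZero(mask) > min_fg_pixels:
            active += 1
    cap.release()
    return active
```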
For each of these methods, a day's worth of data (10 hours) was collected. We ran face detection (Viola and Jones, 2001; Lienhart and Maydt, 2002) on these videos to count the number of faces captured. The counts for each camera in the proposed configuration are shown in Table 3.

Table 3: Faces counted from individual cameras in the proposed method.
Camera  Faces
Cam. 1  622
Cam. 2  3430
Cam. 3  5929
Cam. 4  915
Cam. 5  1930

It can be noticed that Cam. 3 has the highest number of faces detected, followed by Cam. 2. Cam. 3 overlooks the common hallway represented by the red cluster (see Table 1), which has the highest simulated occupancy value. The average number of faces detected per camera for each method is shown in Table 4. The 3 coloring solution and the proposed method detected approximately the same total number of faces, except that the 3 coloring solution uses 8 cameras while the proposed method uses only 5. The configuration of Huang et al. detected more than twice as many faces in total as the proposed method, but it also uses twice as many cameras. More than a quarter of the faces detected by the Huang et al. configuration came from a single camera of the 10, which coincidentally happened to be focused on an elevator where people tend to stand and wait. The method proposed by Janoos et al. focuses on areas with high human occupancy and takes the resolution of the triangle into account, as opposed to the proposed method, which uses the resolution of the approximate location of the face; hence its cameras are located directly above the regions of dominant human occupancy and fail to capture faces.
Table 4: Comparison of faces detected per camera.
Method      No. of cams.  Faces/cam.
3 Coloring  8             1264
Janoos      5             1111.8
Huang       10            2040.5
Proposed    5             2183.6
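For reference, a minimal sketch of the face-counting step, using OpenCV's bundled Haar cascade in the spirit of the cited Viola-Jones/Lienhart-Maydt detector (the exact detector parameters used in the experiments are not stated in the paper):

```python
import cv2

def count_faces(video_path):
    """Count raw frontal-face detections over a video."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    n_faces = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors are typical default choices.
        n_faces += len(cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                minNeighbors=5))
    cap.release()
    return n_faces
```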
Although the proposed system performs better than the state-of-the-art systems, some necessary improvements remain to be considered. As noticed in the Huang et al. configuration, a significant number of faces were captured by focusing a camera at the elevator. This can be considered a drawback of the proposed system, and of all the others compared against, as none of the systems takes the entrances and exits into consideration, which could be valuable for surveillance. It would be interesting to incorporate a method to include the entrances and exits in the analysis. A method to estimate the number of cameras required for each cluster depending on the size of the cluster could also be useful: if a cluster is large, it might be interesting to assign multiple cameras and incorporate an MCLP/BCLP problem formulation in the optimization to ensure maximal coverage.
5 CONCLUSION
We have proposed an algorithm to optimize the place-
ment of surveillance cameras in a 3D infrastructure
by predicting the possible human behavior within the
infrastructure. We have proposed a method to iden-
tify regions with dominant human activity. We have
also proposed a metric that quantifies the position of a camera based on the observable space, the activity in this space, the pose of objects of interest within the activity, and their image resolution in camera view, for optimization. This method was compared with state-of-the-art algorithms, and the obtained results show an improvement in the amount of area under view, observed activity, and face detection rate per camera.
ACKNOWLEDGEMENT
This work was supported in part by the US Depart-
ment of Justice 2009-MU-MU-K004. Any opinions,
findings, conclusions or recommendations expressed
in this paper are those of the authors and do not nec-
essarily reflect the views of our sponsors.
REFERENCES
Bodor, R., Drenner, A., Schrater, P., and Papanikolopoulos,
N. (2007). Optimal camera placement for automated
surveillance tasks. Journal of Intelligent and Robotic
Systems, 50(3):257–295.
Chen, X. and Davis, J. (2000). Camera placement consid-
ering occlusion for robust motion capture. Technical
report.
Debaque, B., Jedidi, R., and Prevost, D. (2009). Optimal
video camera network deployment to support security
monitoring. In Information Fusion, 2009. FUSION
’09. 12th International Conference on, pages 1730–
1736.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Erdem, U. M. and Sclaroff, S. (2004). Optimal placement of
cameras in floorplans to satisfy task requirements and
cost constraints. In In Proc. of OMNIVIS Workshop.
Filho, C., de Oliveira, A., and Costa, M. (2010). Us-
ing random restart hill climbing algorithm for mini-
mization of component assembly time printed circuit
boards. Latin America Transactions, IEEE (Revista
IEEE America Latina), 8(1):23–29.
Fisk, S. (1978). A short proof of Chvátal's watchman theorem. Journal of Combinatorial Theory, 24.
Fleishman, S., Cohen-Or, D., and Lischinski, D. (1999).
Automatic camera placement for image-based model-
ing. In Computer Graphics and Applications, 1999.
Proceedings. Seventh Pacific Conference on, pages
12–20, 315.
Hörster, E. and Lienhart, R. (2006a). Approximating optimal visual sensor placement. In Multimedia and Expo, 2006 IEEE International Conference on, pages 1257–1260.
Hörster, E. and Lienhart, R. (2006b). Calibrating and optimizing poses of visual sensors in distributed platforms. Multimedia Systems, 12(3):195–210.
Hörster, E. and Lienhart, R. (2006c). On the optimal placement of multiple visual sensors. In Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, VSSN '06, pages 111–120, New York, NY, USA. ACM.
Huang, H., Ni, C.-C., Ban, X., Gao, J., Schneider, A., and
Lin, S. (2014). Connected wireless camera network
deployment with visibility coverage. In INFOCOM,
2014 Proceedings IEEE.
Janoos, F., Machiraju, R., Parent, R., Davis, J. W., and Mur-
ray, A. (2007). Sensor configuration for coverage opti-
mization for surveillance applications. In SPIE, 2007
Proceedings.
Kim, K. and Murray, A. T. (2008). Enhancing spatial representation in primary and secondary coverage location modeling. Journal of Regional Science, 48(4):745–768.
Kitani, K., Ziebart, B., Bagnell, J., and Hebert, M. (2012).
Activity forecasting. Lecture Notes in Computer Sci-
ence, pages 201–214.
Lienhart, R. and Maydt, J. (2002). An extended set of haar-
like features for rapid object detection. In Image Pro-
cessing. 2002. Proceedings. 2002 International Con-
ference on, volume 1, pages I–900–I–903 vol.1.
Malik, R. and Bajcsy, P. (2008). Automated placement of multiple stereo cameras. In Swaminathan, R., Caglioti, V., and Argyros, A., editors, The 8th Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras - OMNIVIS, Marseille, France.
Mantini, P. and Shah, S. (2014). Human trajectory fore-
casting in indoor environments using geometric con-
text. In Proceedings of the Ninth Indian Conference
on Computer Vision, Graphics and Image Processing,
ICVGIP ’14. ACM.
Mittal, A. and Davis, L. (2004). Visibility analysis and sen-
sor planning in dynamic environments. In Pajdla, T.
and Matas, J., editors, Computer Vision - ECCV 2004,
volume 3021 of Lecture Notes in Computer Science,
pages 175–189. Springer Berlin Heidelberg.
Murray, A. T., Kim, K., Davis, J. W., Machiraju, R., and
Parent, R. (2007). Coverage optimization to support
security monitoring. Computers, Environment and
Urban Systems, 31(2):133 – 147.
Ram, S., Ramakrishnan, K. R., Atrey, P. K., Singh, V. K.,
and Kankanhalli, M. S. (2006). A design methodol-
ogy for selection and placement of sensors in mul-
timedia surveillance systems. In Proceedings of the
4th ACM International Workshop on Video Surveil-
lance and Sensor Networks, VSSN ’06, pages 121–
130, New York, NY, USA. ACM.
Sivaram, G. S. V. S., Kankanhalli, M. S., and Ramakrish-
nan, K. R. (2009). Design of multimedia surveillance
systems. ACM Trans. Multimedia Comput. Commun.
Appl., 5(3):23:1–23:25.
Tarabanis, K., Allen, P., and Tsai, R. (1995). A survey of
sensor planning in computer vision. Robotics and Au-
tomation, IEEE Transactions on, 11(1):86–104.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In Computer Vi-
sion and Pattern Recognition, 2001. CVPR 2001. Pro-
ceedings of the 2001 IEEE Computer Society Confer-
ence on, volume 1, pages I–511–I–518 vol.1.
Yabuta, K. and Kitazawa, H. (2008). Optimum camera
placement considering camera specification for secu-
rity monitoring. In Circuits and Systems, 2008. ISCAS
2008. IEEE International Symposium on, pages 2114–
2117.
Zhang, Y., Lei, T., Barzilay, R., and Jaakkola, T. (2014).
Greed is Good if Randomized: New Inference for De-
pendency Parsing. In Proceedings of the 2014 Con-
ference on Empirical Methods in Natural Language
Processing (EMNLP). Association for Computational
Linguistics.
Zhou, W., Xiong, H., Ge, Y., Yu, J., Ozdemir, H., and
Lee, K. (2010). Direction clustering for characteriz-
ing movement patterns. In Information Reuse and In-
tegration (IRI), 2010 IEEE International Conference
on, pages 165–170.