EFFICIENT DYNAMICAL COMPUTATION
OF PRINCIPAL COMPONENTS
Darko Dimitrov
1
, Mathias Holst
2
, Christian Knauer
3
and Klaus Kriegel
1
1
Freie Universit
¨
at Berlin, Institute of Computer Science, Takustraße 9, D-14195 Berlin, Germany
{ }
2
Universit
¨
at Rostock, Institute of Computer Science, Albert Einstein Straße 21, D-18059 Rostock, Germany
3
Universit
¨
at Bayreuth, Institute of Computer Science, Universit
¨
atsstraße 30, D-95447 Bayreuth, Germany
Keywords:
Principal Component Analysis, Dynamic Computation, Bounding Box Algorithms, Approximation Algo-
rithms.
Abstract:
In this paper we consider the problem of updating principal components of a point set in R
d
when points
are added or deleted from the point set. A recent result of (P
´
ebay, 2008) implies an efficient solution for
that problem when points are added to a discrete point set. Here, we extend that result for deletions in the
discrete case, and for both additions and deletions for continuous point sets in R
2
and R
3
. In both cases,
discrete and continuous, no additional data structures or storage are needed for computing the new principal
components. An important application of the above results is the dynamical computation of bounding boxes
based on principal component analysis. PCA bounding boxes are very often used in many fields, among others
in computer graphics, for example, for ray tracing, fast rendering, collision detection, or video compression
algorithms. Since some version of PCA bounding boxes have guaranties on their size (volume), they are also
of interest in applications where the guaranty of the approximation quality is required. We have designed and
implemented algorithms for computing dynamically PCA bounding boxes in R
3
.
1 INTRODUCTION
Principal component analysis (PCA) (Jolliffe, 2002)
is probably the oldest and best known of the tech-
niques of multivariate analysis. The central idea and
motivation of PCA is to reduce the dimensionality of a
point set by identifying the most significant directions
(principal components). Let P = {~p
1
,~p
2
,...,~p
n
} be a
set of vectors (points) in R
d
, and~µ = (µ
1
,µ
2
,...,µ
d
)
R
d
be the center of gravity of P. For 1 k d, we
use p
i,k
to denote the k-th coordinate of the vector p
i
.
Given two vectors ~u and ~v, we use h~u,~vi to denote
their inner product. For any unit vector ~v R
d
, the
variance of P in direction ~v is
var(P,~v) =
1
n
n
i=1
h~p
i
~µ ,~vi
2
. (1)
The most significant direction corresponds to the
unit vector ~v
1
such that var(P,~v
1
) is maximum. In
1
This research was supported by the German Research
Foundation (DFG), grant No. AL 253/6-2, Project “Entwurf
und Analyse anwendungsbezogener geometrischer Algo-
rithmen”.
general, after identifying the j most significant di-
rections ~v
1
,...,~v
j
, the ( j + 1)-th most significant di-
rection corresponds to the unit vector ~v
j+1
such that
var(P,~v
j+1
) is maximum among all unit vectors per-
pendicular to ~v
1
,~v
2
,...,~v
j
.
It can be verified that for any unit vector~v R
d
,
var(P,~v) = hΣ~v,~vi, (2)
where Σ is the covariance matrix of P. Σ is a sym-
metric d × d matrix where the (i, j)-th component,
σ
i j
,1 i, j d, is defined as
σ
i j
=
1
n
n
k=1
(p
k,i
µ
i
)(p
k, j
µ
j
). (3)
The procedure of finding the most significant di-
rections, in the sense mentioned above, can be formu-
lated as an eigenvalue problem. If λ
1
λ
2
··· λ
d
are the eigenvalues of Σ, then the unit eigenvector ~v
j
for λ
j
is the j-th most significant direction. Since the
matrix Σ is symmetric positive semidefinite, its eigen-
vectors are orthogonal, all λ
j
s are non-negative and
λ
j
= var(P,~v
j
).
85
Dimitrov D., Holst M., Knauer C. and Kriegel K..
EFFICIENT DYNAMICAL COMPUTATION OF PRINCIPAL COMPONENTS.
DOI: 10.5220/0003324800850093
In Proceedings of the International Conference on Computer Graphics Theory and Applications (GRAPP-2011), pages 85-93
ISBN: 978-989-8425-45-4
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
Computation of the covariance matrix can be done
in O(d
2
n) time, while computation of the eigenval-
ues, when d is not very large, can be done in O(d
3
)
time, for example with the Jacobi or the QR method
(Press et al., 1995). Thus, the time complexity of
computing principal components of n points in R
d
is
O(d
2
n + d
3
). The multiplicative factor of O(d
2
) and
the additive factor of O(d
3
) throughout the paper will
be omitted, since we will assume that d is fixed. For
very large d, the problem of computing eigenvalues
is non-trivial. In practice, the above mentioned meth-
ods for computing eigenvalues converge rapidly. In
theory, it is unclear how to bound the running time
combinatorially and how to compute the eigenvalues
in decreasing order. In (Cheng and Y. Wang, 2008)
a modification of the Power method (Parlett, 1998) is
presented, which can give a guaranteed approxima-
tion of the eigenvalues with high probability.
Examples of applications of PCA include data
compression, exploratory data analysis, visualization,
image processing, pattern and image recognition,
time series prediction, detecting perfect and reflec-
tive symmetry, and dimension detection. The thor-
ough overview of PCAs applications can be found
for example in the textbooks (Duda et al., 2001) and
(Jolliffe, 2002). Most of the applications of PCA are
non-geometric in their nature. However, there are
also few purely geometric applications that are quite
widespread in computer graphics. Example are the
estimation of the undirected normals of the point sets
or computing PCA bounding boxes (bounding boxes
determined by the principal components of the point
set).
Contributions and Organization of the Paper. Dy-
namic versions of the above applications, i.e., when
the point set (population) changes, are of big impor-
tance and interest. Efficient solutions of those prob-
lems depend heavily on an efficient dynamic compu-
tation of the principal components (eigenvectors of
the covarince matrix). Dynamic updates of variances
in different settings have been studied since the six-
ties (Chan et al., 1979), (Knuth, 1998), (P
´
ebay, 2008),
(Welford, 1962), (West, 1979). Recently, in a techni-
cal report (P
´
ebay, 2008) also investigated the dynamic
maintenance of covariance matrices. Our contribution
extends these results in the following directions:
1. We also take into account the operation of point
deletions.
2. We study the dynamic computation of principal
components in the continuous version.
3. We combine the dynamic PCA versions with ef-
ficient methods for computing PCA bounding
boxes.
We consider the computation of the dynamic PCA
bounding boxes, since it has very important appli-
cations in many fields including computer graphics,
where the PCA boxes are used, for example, for ray
tracing, fast rendering, video compression algorithms,
or collision detection. Two distinguished hierarchi-
cal data structures from computer graphics used for
representation of 3D surfaces and for rapid interfer-
ence detection, based on PCA bounding boxes, are
the Boxtree (Barequet et al., 1996) and the OBBTree
(Gottschalk et al., 1996). We would like to stress, that
PCA bounding boxes are also of interest in applica-
tions where the guaranty of the approximation qual-
ity is required, since some version of PCA bounding
boxes have guaranties on their size (Dimitrov et al.,
2009b). Based on the theoretical results in this paper,
we have implemented several algorithms for comput-
ing PCA bounding boxes dynamically.
The organization and the main results of the paper
are as follows: In Section 2 we consider the prob-
lem of updating the principal components of a set of
n points, when m points are added or deleted from
the point set. For both operations performed on a dis-
crete point set in R
d
, we can compute the new prin-
cipal components in O(m) time for fixed d. This is a
significant improvement over the commonly used ap-
proach of recomputing the principal components from
scratch, which takes O(n + m) time. We also consider
the computation of the principal components of a dy-
namic continuous point set. We give closed form so-
lutions when the point set is a convex polytope R
3
.
Due to the space limitation, the cases when the point
set is the boundary of a convex polytope in R
2
or R
3
,
or a convex polygon in R
2
, are left for an extended
version of this paper. In Section 3 we present and
verify the correctness of some theoretical results pre-
sented in the Section 2. We have implemented several
dynamic PCA bounding box algorithms and evaluated
their performances. Conclusion and open problems
are presented in Section 4.
2 UPDATING THE PRINCIPAL
COMPONENTS EFFICIENTLY
2.1 Discrete Case in R
d
Here, we consider the problem of updating the
covariance matrix Σ of a discrete point set P =
{~p
1
,~p
2
,...,~p
n
} in R
d
, when m points are added or
deleted from P. The recent result of (P
´
ebay, 2008)
implies the same solution that we have obtained for
additions. Since the derivation of the closed form so-
GRAPP 2011 - International Conference on Computer Graphics Theory and Applications
86
lutions for deletions is similar to that for additions,
and due to the space limitation here, we just state the
main result related to the discrete points sets without
proof.
Theorem 2.1 Let P be a set of n points in R
d
with
known covariance matrix Σ. Let P
0
be a point set in
R
d
, obtained by adding or deleting m points from P.
The principal components of P
0
can be computed in
O(m) time for fixed d.
The principal components of discrete point sets can
be strongly influenced by point clusters (Dimitrov
et al., 2009b). To avoid the influence of the distribu-
tion of the point set, often continuous sets, especially
the convex hull of a point set is considered, which
lead to so-called continuous PCA. Computing PCA
bounding boxes (Gottschalk et al., 1996), (Dimitrov
et al., 2009a), or retrieval of 3D-objects (Vrani
´
c et al.,
2001), are typical applications where continuous PCA
are of interest.
2.2 Continuous Case in R
3
Here, we consider the computation of the principal
components of a dynamic continuous point set. We
present a closed form-solutions when the point set is
a convex polytope or the boundary of a convex poly-
tope in R
2
or R
3
. When the point set is the boundary
of a convex polytope, we can update the new principal
components in O(k) time, for both deletion and addi-
tion, under the assumption that we know the k facets
in which the polytope changes. Under the same as-
sumption, when the point set is a convex polytope in
R
2
or R
3
, we can update the principal components in
O(k) time after adding points. But, to update the prin-
cipal components after deleting points from a convex
polytope in R
2
or R
3
we need O(n) time. This is due
to the fact that, after a deletion the center of gravity of
the old convex hull (polyhedron) could lie outside the
new convex hull, and therefore, a retetrahedralization
is needed (see Subsection 2.2.1).
Due to the space limitation, we present in this sec-
tion only the closed-form solutions for a convex poly-
tope in R
3
, and leave the cases when the point set is
the boundary of a convex polytope in R
2
or R
3
, or a
convex polygon in R
2
, for an extended version of this
paper.
2.2.1 Continuous PCA over a Convex
Polyhedron in R
3
Let P be a point set in R
3
, and let X be its convex
hull. We assume that the boundary of X is triangula-
ted (if it is not, we can triangulate it in a preprocessing
step). We choose an arbitrary point ~o in the interior of
X, for example, we can choose ~o to be the center of
gravity of the boundary of X. Each triangle from the
boundary together with ~o forms a tetrahedron. Let
the number of such formed tetrahedra be n. The k-th
tetrahedron, with vertices ~x
1,k
,~x
2,k
,~x
3,k
,~x
4,k
= ~o, can
be represented in a parametric form by
~
Q
k
(s,t,u) =
~x
4,k
+ s(~x
1,i
~x
4,k
) +t (~x
2,i
~x
4,k
) +u (~x
3,i
~x
4,k
), for
0 s,t, u 1, and s +t + u 1. For 1 i 3, we use
x
i, j,k
to denote the i-th coordinate of the vertex ~x
j
of
the tetrahedron
~
Q
k
.
The center of gravity of the k-th tetrahedron is
~µ
k
=
R
1
0
R
1s
0
R
1st
0
ρ(
~
Q
k
(s,t,u))
~
Q
i
(s,t,u)du dt ds
R
1
0
R
1s
0
R
1st
0
ρ(
~
Q
k
(s,t,u))du dt ds
,
where ρ(
~
Q
k
(s,t,u)) is a mass density at a point
~
Q
k
(s,t,u). Since we can assume ρ(
~
Q
k
(s,t,u)) = 1,
we have
~µ
k
=
R
1
0
R
1s
0
R
1st
0
~
Q
k
(s,t,u)du dt ds
R
1
0
R
1s
0
R
1st
0
du dt ds
=
~x
1,k
+~x
2,k
+~x
3,k
+~x
4,k
4
.
The contribution of each tetrahedron to the center of
gravity of X is proportional to its volume. If M
k
is the
3×3 matrix whose l-th row is~x
l,k
~x
4,k
, for l = 1...3,
then the volume of the k-th tetrahedron is
v
k
= volume(Q
k
) =
|det(M
k
)|
3!
.
We introduce a weight to each tetrahedron that is pro-
portional to its volume, define as
w
k
=
v
k
n
k=1
v
k
=
v
k
v
,
where v is the volume of X. Then, the center of grav-
ity of X is
~µ =
n
k=1
w
k
~µ
k
.
The covariance matrix of the k-th tetrahedron is
Σ
k
=
R
1
0
R
1s
0
R
1st
0
(
~
Q
k
(s,t,u)~µ)(
~
Q
k
(s,t,u)~µ)
T
du dt ds
R
1
0
R
1s
0
R
1st
0
du dt ds
=
1
20
4
j=1
4
h=1
(~x
j,k
~µ)(~x
h,k
~µ)
T
+
4
j=1
(~x
j,k
~µ)(~x
j,k
~µ)
T
.
The (i, j)-th element of Σ
k
, i, j {1, 2, 3}, is
σ
i j,k
=
1
20
4
l=1
4
h=1
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)+
4
l=1
(x
i,l,k
µ
i
)(x
j,l,k
µ
j
)
,
with ~µ = (µ
1
,µ
2
,µ
3
). Finally, the covariance ma-
trix of X is
Σ =
n
i=1
w
i
Σ
i
,
EFFICIENT DYNAMICAL COMPUTATION OF PRINCIPAL COMPONENTS
87
with (i, j)-th element
σ
i j
=
1
20
n
k=1
4
l=1
4
h=1
w
i
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)+
n
k=1
4
l=1
w
i
(x
i,l,k
µ
i
)(x
j,l,k
µ
j
)
.
We would like to note that the above expressions
hold also for any non-convex polyhedron that can be
tetrahedralized. A star-shaped object, where ~o is the
kernel of the object, is such example.
Adding Points
We add points to P, obtaining a new point set P
0
. Let
X
0
be the convex hull of P
0
. We consider that X
0
is
obtained from X by deleting n
d
, and adding n
a
tetra-
hedra. Let
v
0
=
n
k=1
v
k
+
n+n
a
k=n+1
v
k
n+n
a
+n
d
k=n+n
a
+1
v
k
= v +
n+n
a
k=n+1
v
k
n+n
a
+n
d
k=n+n
a
+1
v
k
.
The weight of a tetrahedron Q
k
is now
w
0
k
=
v
k
v
0
.
The center of gravity of X
0
is
~µ
0
=
n
k=1
w
0
k
~µ
k
+
n+n
a
k=n+1
w
0
k
~µ
k
n+n
a
+n
d
k=n+n
a
+1
w
0
k
~µ
k
=
1
v
0
n
k=1
v
k
~µ
k
+
n+n
a
k=n+1
v
k
~µ
k
n+n
a
+n
d
k=n+n
a
+1
v
k
~µ
k
!
=
1
v
0
v~µ +
n+n
a
k=n+1
v
k
~µ
k
n+n
a
+n
d
k=n+n
a
+1
v
k
~µ
k
!
(4)
Let
~µ
a
=
1
v
0
n+n
a
k=n+1
v
k
~µ
k
, and ~µ
d
=
1
v
0
n+n
a
+n
d
k=n+n
a
+1
v
k
~µ
k
.
Then, we can rewrite (4) as
~µ
0
=
v
v
0
~µ +~µ
a
~µ
d
. (5)
The i-th component of~µ
a
and~µ
d
, 1 i 3, is denoted
by µ
i,a
and µ
i,d
, respectively. The (i, j)-th component,
σ
0
i j
, 1 i, j 3, of the covariance matrix Σ
0
of X
0
is
σ
0
i j
=
1
20
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
)+
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
)+
n+n
a
k=n+1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
)+
n+n
a
k=n+1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
)
n+n
a
+n
d
k=n+n
a
+1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
)
n+n
a
+n
d
k=n+n
a
+1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
)
.
Let
σ
0
i j
=
1
20
(σ
0
i j,11
+ σ
0
i j,12
+ σ
0
i j,21
+ σ
0
i j,22
σ
0
i j,31
σ
0
i j,32
),
where
σ
0
i j,11
=
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
), (6)
σ
0
i j,12
=
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
), (7)
σ
0
i j,21
=
n+n
a
k=n+1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
), (8)
σ
0
i j,22
=
n+n
a
k=n+1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
), (9)
σ
0
i j,31
=
n+n
a
+n
d
k=n+n
a
+1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,h,k
µ
0
j
),
(10)
σ
0
i j,32
=
n+n
a
+n
d
k=n+n
a
+1
4
l=1
w
0
k
(x
i,l,k
µ
0
i
)(x
j,l,k
µ
0
j
). (11)
GRAPP 2011 - International Conference on Computer Graphics Theory and Applications
88
Plugging-in the values of µ
0
i
and µ
0
j
in (6), we obtain:
σ
0
i j,11
=
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
v
v
0
µ
i
µ
i,a
+ µ
i,d
)·
(x
j,h,k
v
v
0
µ
j
µ
j,a
+ µ
j,d
)
=
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
i
+ µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)
(x
j,h,k
µ
j
+ µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
=
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+
n
k=1
4
l=1
4
h=1
w
0
k
(x
i,l,k
µ
i
)(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
+
n
k=1
4
l=1
4
h=1
w
0
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)(x
j,h,k
µ
j
)
+
n
k=1
4
l=1
4
h=1
w
0
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
).
(12)
Since
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
i
) = 0, 1 i 3, we have
σ
0
i j,11
=
1
v
0
n
k=1
4
l=1
4
h=1
v
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+
1
v
0
n
k=1
4
l=1
4
h=1
v
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
=
1
v
0
n
k=1
4
l=1
4
h=1
v
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+ 16
v
v
0
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
).
(13)
Plugging-in the values of µ
0
i
and µ
0
j
in (7), we obtain:
σ
0
i j,12
=
n
k=1
4
l=1
w
0
k
(x
i,l,k
v
v
0
µ
i
µ
i,a
+ µ
i,d
)·
(x
j,h,k
v
v
0
µ
j
µ
j,a
+ µ
j,d
)
=
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
i
+ µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(x
j,h,k
µ
j
+ µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
=
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
i
)(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
+
n
k=1
4
l=1
w
0
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)(x
j,h,k
µ
j
)
+
n
k=1
4
l=1
w
0
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
).
(14)
Since
n
k=1
4
l=1
w
0
k
(x
i,l,k
µ
i
) = 0, 1 i 3, we have
σ
0
i j,12
=
1
v
0
n
k=1
4
l=1
v
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+
1
v
0
n
k=1
4
l=1
v
k
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
=
1
v
0
n
k=1
4
l=1
v
k
(x
i,l,k
µ
i
)(x
j,h,k
µ
j
)
+ 4
v
v
0
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
).
(15)
From (14) and (15), we obtain
σ
0
i j,1
= σ
0
i j,11
+ σ
0
i j,12
= σ
i j
+ 20
v
v
0
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)·
(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
).
(16)
Note that σ
0
i j,1
can be computed in O(1) time. The
components σ
0
i j,21
and σ
0
i j,22
can be computed in
O(n
a
) time, while O(n
d
) time is needed to compute
σ
0
i j,31
and σ
0
i j,32
. Thus,
~
µ
0
and
σ
0
i j
=
1
20
(σ
0
i j,11
+ σ
0
i j,12
+ σ
0
i j,21
+ σ
0
i j,22
+ σ
0
i j,31
+ σ
0
i j,32
)
=
1
20
(σ
i j
+ σ
0
i j,21
+ σ
0
i j,22
+ σ
0
i j,31
+ σ
i j,32
)
+
v
v
0
(µ
i
(1
v
v
0
) µ
i,a
+ µ
i,d
)(µ
j
(1
v
v
0
) µ
j,a
+ µ
j,d
)
(17)
can be computed in O(n
a
+ n
d
) time.
EFFICIENT DYNAMICAL COMPUTATION OF PRINCIPAL COMPONENTS
89
Deleting Points
Let the new convex hull be obtained by deleting n
d
tetrahedra from and adding n
a
tetrahedra to the old
convex hull. If the interior point ~o (needed for a
tetrahedronization of a convex polytope), after sev-
eral deletions, lies inside the new convex hull, then
the same formulas and time complexity, as by adding
points, follow. If ~o lie outside the new convex hull,
then, we need to choose a new interior point~o
0
, and re-
compute the new tetrahedra associated with it. Thus,
we need in total O(n) time to update the principal
components.
3 PRACTICAL VARIANTS OF
DYNAMICAL PCA BOUNDING
BOXES AND EXPERIMENTAL
RESULTS
The main focus in this section is to show the advan-
tages of the theoretical results presented in this pa-
per in the context of computing dynamic PCA bound-
ing boxes. We present three practical simple algo-
rithms, and compare their performances. The algo-
rithms were implemented in C#, C++ and OpenGL,
and tested on a Core Duo 2.33GHz with 2GB mem-
ory. All algorithms use the result mentioned in Sec-
tion 2.1 to compute the principal components. They
differ only in how the extremal points along the prin-
cipal components are found. The implemented algo-
rithms are the following:
PCA-AP (PCA-all-points) - finds the extremal
points by going through all points.
PCA-AGP (PCA-all-grid-points) - the space is
discretized by a regular three dimensional axis-
aligned grid, with cells of size ε ×ε ×ε. See Fig-
ure 1 for an illustration. The grid size is chosen
relatively to the size of the object. Each object
is scaled such that its diameter is 1. The val-
ues of ε are between 0.001 and 1. The corners
of non-empty cells are candidates for extremal
points along the principal directions.
PCA-EGP (PCA-extremal-grid-points) - this is
an improvement of the PCA-AGP algorithm. To
each vertical grid line, i.e., orthogonal to the XY
plane, two extremal corners of the non-empty cell
are computed. Thus, we reduced the number
of candidates for extremal points from O(
1
ε
3
) to
O(
1
ε
2
).
We further reduce the number of points consid-
ered in the PCA-AGP and PCA-EGP algorithms by
replacing the cell corners with the centers of gravity
of the cells. Afterwords, we expand the resulting box
by
3ε/2 to ensure that the box contains all origi-
nal points. We have implemented also these variants,
but, since for a reasonable big grid size (ε 0.01) the
running time improvements are negligible, we report
here only the results of the base variants of the algo-
rithms PCA-AGP and PCA-EGP. For very dense grid
the improved version of the both algorithms give bet-
ter results.
In the following experiments, we add (delete) ran-
dom points from the point set, and compare the re-
sults of a dynamical versions of PCA bounding boxes
with their corresponding static versions (when the co-
variance matrix of the point set is computed from
scratch). The time of computing, the volume of a
bounding box, and the grid density are parameters of
interest in this evaluation study. The tests were per-
formed on a large number of real graphics models
taken from various publicly available sources (Stan-
ford 3D scanning repository, 3D Cafe). Typical sam-
ples of the results are given in Table 1, Table 2, and
Table 3.
The main conclusions of the experiments are as fol-
lows:
As expected from the theoretical results, the dy-
namic versions of the algorithms are significantly
faster than their static counterparts. Typically, the
dynamic versions are about an order of magnitude
faster (see Table 1).
The dynamic PCA-AP algorithm is not only sig-
nificantly faster than its static version, it is also
faster than the static version of the PCA-AGP and
PCA-EGP algorithms. This is due to the fact that
the brute force manner of finding the extremal
points is faster than computing the covariance ma-
trix of the new point set from scratch, although
both algorithms require O(n) time in the asymp-
totic analysis.
Clearly, the PCA-AGP and PCA-EGP algorithms,
that exploit the grid subdivision structure, are
faster than the PCA-AP algorithm. The price that
must be paid for this is twofold. First, an extra
preprocessing time for building the grid is needed.
For the example considered in Table 1, computing
the grid takes about 0.4 seconds for the PCA-AGP
algorithm, and about 0.43 for the PCA-EGP algo-
rithm. Second, the resulting bounding boxes are
less precise (see Table 2).
As it is shown in Table 3, for grids that are not
very sparse (ε 0.03), the approximated PCA
bounding boxes computed by the PCA-AGP and
PCA-EGP algorithms are quite close to the exact
PCA bounding boxes.
GRAPP 2011 - International Conference on Computer Graphics Theory and Applications
90
(a)
(b)
Figure 1: (a) A real world object and its corresponding grid for ε = 0.03. Only the non-empty cells are visualized. (b) The
bounding box of the object obtained by the PCA-AGP algorithm.
Table 1: Time needed by the PCA bounding box algorithms for the lion model (183408 points). The values in the table are
the average of results of 100 runs of the algorithms, each time adding/deleting the corresponding number of points.
Adding/deleting points, ε = 0.005
1pnt 1pnt 100 pnts 100 pnts 1000 pnts 1000 pnts
algorithm static dynamic static dynamic static dynamic
PCA-AP 0.166 s 0.014 s 0.171 s 0.015 s 0.172 s 0.016 s
PCA-AGP 0.092 s 0.010 s 0.093 s 0.009 s 0.990 s 0.017 s
PCA-EGP 0.081 s 0.006 s 0.082 s 0.006 s 0.092 s 0.014 s
Tighter bounding boxes for the PCA-AGP and PCA-
EGP algorithms can be obtained by the following ap-
proach. Let P
1
be the supporting plane at the extremal
grid point along one principal direction, and let P
2
be
the plane parallel to P
1
, such that the distance between
P
1
and P
2
is
3ε/2, and P
2
intersect or is tangent to
the grid. We denote by S the subspace between P
1
and P
2
. Then, the candidates points for the chosen
principal direction, that determine the tight bounding
box, are all original points that belong to cells that
have intersection with S. See Fig. 2 for an illustration.
However, in the worst case all original points have to
be checked.
Further (theoretical) improvement of the algo-
rithms presented here could be obtained if, instead of
the point set, we consider its convex hull when we
look for extremal points. This only makes sense if
the convex hull is computed dynamically. Otherwise,
computing the static convex hull of the points will be
more expensive than finding the exact extremal points
by scanning all points.
S
P
2
P
1
P C
1
g
1
x
Figure 2: For the principal direction PC
1
, the algorithms
PCA-AGP and PCA-EGP detect the point g
1
as extremal
grid point, and the point x as extremal point of the original
point set. However, there are other points (the violet colored
circles) that are further than x along PC
1
.
3.1 Computing efficiently a Bounding
Box of Several Objects
An interesting application of the closed-form solu-
tions from Section 2 is to compute the principal com-
ponents of two or more objects with already known
covariance matrices. Thus, for fixed d the new co-
variance matrix Σ and the new principal components
EFFICIENT DYNAMICAL COMPUTATION OF PRINCIPAL COMPONENTS
91
Table 2: Volume of the PCA bounding box algorithms for the lion model. The values in the table are the average of results of
100 runs of the algorithms, each time adding the corresponding number of points.
Adding points, dynamic version, ε = 0.005
algorithm 1pnt 10pnt 100 pnts 1000 pnts 10000 pnts
PCA-AP 285.5 644.6 856.3 1149.1 1236.4
PCA-AGP, PCA-EGP 295.5 662.7 880.3 1221.8 1263.2
Table 3: Volumes of the PCA bounding boxes algorithms for lion model for different grid density. The values in the table are
the average of results of 100 runs of the algorithms, each time adding the corresponding number of points.
Adding 100 points, dynamic version
algorithm ε = 0.005 ε = 0.01 ε = 0.03 ε = 0.05 ε = 0.1 ε = 0.2
PCA-AP 856.3 856.3 856.3 856.3 856.3 856.3
PCA-AGP, PCA-EGP 880.3 904.3 942.3 1080.1 1292.7 2324.8
(b)(a)
Figure 3: (a) Two objects with their PCA bounding boxes. (b) The common PCA bounding box. Computing the common
PCA bounding box dynamically takes 0.004 seconds, while the static version takes 0.02 seconds.
can be computed also in O(1) time.
This is a significant improvement over the com-
monly used approach of computing the principal com-
ponents from scratch, which takes time linear in the
number of points. Efficient computation of the com-
mon PCA bounding box of several objects is straight-
forward. See Fig. 3 for an illustration in R
3
.
4 CONCLUSIONS AND FUTURE
WORK
The main contributions of this paper are the closed-
form solutions for updating the principal components
of a dynamic point set. The advantages of the theo-
retical results were verified and presented in the con-
text of computing dynamic PCA bounding boxes, a
very important application in many fields including
computer graphics, where the PCA boxes are used to
maintain hierarchical data structures for fast render-
ing of a scene or for collision detection. We have pre-
sented three practical simple algorithms and compare
their performances.
An interesting open problem is to find a closed-
form solution for dynamical point sets different from
convex polyhedra, for example, implicit surfaces or
B-splines. An implementation of computing princi-
pal components in a dynamic and continuous setting
is planned for future work. Applications of the results
presented here in other fields, like computer vision or
visualization, are of high interest.
There are several further improvements and open
problems regarding computing dynamic PCA bound-
GRAPP 2011 - International Conference on Computer Graphics Theory and Applications
92
ing boxes. Instead of subdividing the space by a sim-
ple regular grid, one can use more sophisticated data
structures, like octrees or binary space partition-trees
to speed up the time needed to find the extremal points
along the principal directions. A practical, imple-
mentable algorithm for computing the dynamic con-
vex hull of the point set (computing extremal points
dynamically) would also improve the dynamic PCA
bounding box algorithms. Finding coresets for dy-
namic PCA bounding boxes will lead to efficient ap-
proximation algorithms for PCA bounding boxes. We
are also not aware of data structures for efficient com-
putation of extremal points both approximately and
dynamically. Such data structures are also of interest.
ACKNOWLEDGEMENTS
The authors would like to thank the anonymous re-
viewers for their helpful comments and suggestions
that improved the quality of the paper.
REFERENCES
Barequet, G., Chazelle, B., Guibas, L. J., Mitchell, J. S. B.,
and Tal, A. (1996). Boxtree: A hierarchical represen-
tation for surfaces in 3D. Computer Graphics Forum,
15:387–396.
Chan, T. F., Golub, G. H., and LeVeque, R. J. (1979). Up-
dating formulae and a pairwise algorithm for comput-
ing sample variances. Technical Report STAN-CS-
79-773, Department of Computer Science, Stanford
University.
Cheng, S.-W. and Y. Wang, Z. W. (2008). Provable di-
mension detection using principal component analy-
sis. Int. J. Comput. Geometry Appl., 18:415–440.
Dimitrov, D., Holst, M., Knauer, C., and Kriegel, K.
(2009a). Closed-form solutions for continuous PCA
and bounding box algorithms. A. Ranchordas et al.
(Eds.): VISIGRAPP 2008, CCIS, Springer, 24:26–40.
Dimitrov, D., Knauer, C., Kriegel, K., and Rote, G. (2009b).
Bounds on the quality of the PCA bounding boxes.
Computational Geometry, 42:772–789.
Duda, R., Hart, P., and Stork, D. (2001). Pattern classifica-
tion. John Wiley & Sons, Inc., 2nd ed.
Gottschalk, S., Lin, M. C., and Manocha, D. (1996). OBB-
Tree: A hierarchical structure for rapid interference
detection. Computer Graphics, 30:171–180.
Jolliffe, I. (2002). Principal Component Analysis. Springer-
Verlag, New York, 2nd ed.
Knuth, D. E. (1998). The art of computer program-
ming, volume 2: seminumerical algorithms. Addison-
Wesley, Boston, 3rd ed.
Parlett, B. N. (1998). The symmetric eigenvalue prob-
lem. Society of Industrial and Applied Mathematics
(SIAM), Philadelphia, PA.
P
´
ebay, P. P. (2008). Formulas for robust, one-pass parallel
computation of covariances and arbitrary-order statis-
tical moments. Technical Report SAND2008-6212,
Sandia National Laboratories.
Press, W. H., Teukolsky, S. A., Veterling, W. T., and Flan-
nery, B. P. (1995). Numerical recipes in C: the art
of scientific computing. Cambridge University Press,
New York, USA, 2nd ed.
Vrani
´
c, D. V., Saupe, D., and Richter, J. (2001). Tools for
3D-object retrieval: Karhunen-Loeve transform and
spherical harmonics. In IEEE 2001 Workshop Mul-
timedia Signal Processing, pages 293–298.
Welford, B. P. (1962). Note on a method for calculating cor-
rected sums of squares and products. Technometrics,
4:419–420.
West, D. H. D. (1979). Updating mean and variance esti-
mates: an improved method. Communications of the
ACM, 22:532–535.
EFFICIENT DYNAMICAL COMPUTATION OF PRINCIPAL COMPONENTS
93