ONLINE LEARNING OF GAUSSIAN MIXTURE MODELS
A Two-Level Approach
Arnaud Declercq and Justus H. Piater
Montefiore Institute, University of Liège, B-4000 Liège, Belgium
Keywords:
Online learning, Gaussian mixture model, Uncertain model.
Abstract:
We present a method for incrementally learning mixture models that avoids the necessity to keep all data points
around. It contains a single user-settable parameter that controls, via a novel statistical criterion, the trade-off
between the number of mixture components and the accuracy of representing the data. A key idea is that each
component of the (non-overfitting) mixture is in turn represented by an underlying mixture that represents
the data very precisely (without regard to overfitting); this allows the model to be refined without sacrificing
accuracy.
1 INTRODUCTION
Mixture models are used for many purposes in com-
puter vision, e.g. to represent feature distributions or
spatial relations. Given a fixed data sample, one can
fit a mixture model to it using one of a variety of meth-
ods. However, in many applications, it is not possible
or convenient to fix a model at the outset; one would
rather learn it over time. For example, this would allow generic recognition or tracking systems to be deployed with minimal set-up effort and trained over time on the task at hand.
However, learning and refining a mixture model
incrementally is not an easy task. How is a given
model to be updated when new data points arrive?
If the data points underlying the current model have
been discarded, then there is no general answer to this
question. On the other hand, keeping all data around
defeats the purpose of learning parametric models in-
crementally. Thus, a compromise needs to be found.
We need to keep around enough information to be
able to refine a model without sacrificing model accu-
racy, but the quantity of this information should grow
much more slowly than the number of raw data points.
We address this problem by seeking to represent
the data points with (1) sufficient fidelity that we can
safely discard them, while at the same time (2) com-
mitting to no more predictive precision than the original
data support.
These two objectives are mutually exclusive, as
the former tends to overfit and the latter to underfit
the data. We therefore propose a two-level representa-
tion. The first level seeks to summarize the data with
high precision, allowing us to discard underlying data
without significantly impairing our ability to refine
the model. We therefore call it the precise model. The
second level provides a model that represents no more
detail than is supported by the underlying data and
thus avoids counterproductive bias in future predic-
tions; we call it the uncertain model. Each uncertain
component is then represented by a precise mixture
model that allows it to be split appropriately when it
turns out that it oversimplifies the underlying data. In
the following development, we use Gaussian mixture
models, but most of the principles are applicable to
other types of mixture models.
2 LEVEL 1: THE PRECISE
MIXTURE MODEL
When a GMM is learned from a data set of n ob-
servations, the main difficulty lies in the choice of
the mixture complexity (i.e. the number of Gaus-
sian components in the mixture). The most popular
offline method is Expectation Maximization (Demp-
ster et al., 1977) for fitting a sequence of GMMs,
each with a specified number of components. The
optimal model is then selected using a penalty func-
tion (Akaike, 1973; Rissanen, 1978; Schwarz, 1978).
Online fitting is even more difficult; since the data
points have been discarded, they cannot be used to
evaluate the fitted models. The problem is then ad-
dressed through a split and merge criterion. However,
these methods are either too slow for online learning
(Hall and Hicks, 2005), assume that data arrives in
chunks (Song and Wang, 2005), or do not guarantee
the fidelity of the resulting model (Arandjelovic and
Cipolla, 2005). Here we propose a new efficient on-
line method that explicitly guarantees the accuracy of
the model through a fidelity criterion.
2.1 Update of the Gaussian Mixture
Model
Suppose we have already learned a precise GMM
from the observations up to time t:
$$ p^t(x) = \frac{\sum_{i=1}^{N} \pi_i^t \, g(x;\mu_i^t,C_i^t)}{\sum_{i=1}^{N} \pi_i^t} \qquad (1) $$
where each Gaussian is represented by its weight $\pi_i^t$, its mean $\mu_i^t$ and its covariance $C_i^t$. We then receive a new data point represented by its distribution $g^t(x;\mu^t,C^t)$ and its weight $\pi^t$. $C^t$ here represents the
observation noise. As suggested by Hall and Hicks (2005), the new resulting GMM is
computed in two steps:
1. Concatenate – produce a model with N + 1 com-
ponents by trivially combining the GMM and the
new data into a single model.
2. Simplify – if possible, merge some of the Gaus-
sians to reduce the complexity of the GMM.
The GMM resulting from the first step is simply
$$ p^t(x) = \frac{\sum_{i=1}^{N} \pi_i^{t-1}\, g(x;\mu_i^{t-1},C_i^{t-1}) + \pi^t\, g^t(x;\mu^t,C^t)}{\sum_{i=1}^{N} \pi_i^{t-1} + \pi^t} \qquad (2) $$
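As an illustration (our sketch, not part of the original paper), the concatenation step of Eq. (2) amounts to appending the new observation as an extra weighted component; the mixture of Eq. (1) is then normalised by the total weight whenever it is evaluated. The GaussianComponent record and the function names below are hypothetical.

```python
from dataclasses import dataclass
import numpy as np
from scipy.stats import multivariate_normal

@dataclass
class GaussianComponent:
    weight: float        # unnormalised weight pi_i
    mean: np.ndarray     # mu_i
    cov: np.ndarray      # C_i

def concatenate(components, observation):
    """Step 1 (Eq. 2): append the new observation, itself a weighted Gaussian
    whose covariance is the observation noise, as an (N+1)-th component."""
    return components + [observation]

def density(components, x):
    """Mixture density of Eq. (1): weighted sum of Gaussians, normalised by
    the total (unnormalised) weight."""
    total = sum(c.weight for c in components)
    return sum(c.weight * multivariate_normal.pdf(x, c.mean, c.cov)
               for c in components) / total
```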
The goal of the second step is to reduce the complex-
ity of the model while still giving a precise descrip-
tion of the observations. Hall and Hicks (2005) propose to group the Gaussians using
the Chernoff bound to detect overlapping Gaussians.
Different thresholds on this bound are then tested and
the most likely result is kept as the simplified GMM.
Since this method is too slow for an on-line process,
we use a different criterion proposed by Declercq and Piater (2007) for their uncertain
Gaussian model. This model provides a quantitative
estimate λ of its ability to describe the associated data
that takes on a value close to 1 if the data distribu-
tion is Gaussian and near zero if it is not. This value,
called the fidelity in the sequel, is useful to decide if
we can merge two given Gaussians without drifting
from the real data distribution.
2.2 Estimating the Fidelity of a
Gaussian Model
To estimate the fidelity λ of a Gaussian model, we first
need to compute the distance between this model and
its corresponding data set. This is done with a method
inspired by the Kolmogorov-Smirnov test,
$$ D = \frac{1}{|I|} \int_I \left| \hat{F}(x) - F_n(x) \right| dx, \qquad (3) $$
where $F_n(x)$ is the empirical cumulative distribution function of the $n$ observations, $\hat{F}(x)$ is the corresponding cumulative Gaussian distribution, and $I$ is the in-
terval within which the two functions are compared.
To simplify matters, the distance D is assumed to have
a Gaussian distribution, which leads to the pseudo-
probabilistic weighting function
$$ \lambda = e^{-D^2 / T_D^2}, \qquad (4) $$

where $T_D$ is a user-settable parameter that represents
the allowed deviation of observed data from Gaus-
sianity. Whereas the sensitivity of the Kolmogorov-
Smirnov test grows without bounds with n, λ provides
a bounded quantification of the correspondence be-
tween the model and the data. Therefore, this crite-
rion is more appropriate for our case since we need
to estimate the correspondence of the data with the
model and not their possible convergence to a Gaus-
sian distribution.
Thus, the original data are not required anymore
if we keep in memory an approximation of their cu-
mulative distribution within a given interval. Since
the number of dimensions of the data space can be
large, we compute the distance D for each dimension
separately to keep the computational cost linear in the
number of dimensions. The total distance is then sim-
ply the sum of these individual distances.
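A minimal sketch of Eqs. (3) and (4) follows, assuming a simple grid-based evaluation of the two cumulative distribution functions over the data interval (the grid resolution and interval choice are ours); in the online setting, the empirical CDF would be replaced by the stored approximation mentioned above.

```python
import numpy as np
from scipy.stats import norm

def fidelity(data, mu, cov, T_D=0.1, n_grid=200):
    """Fidelity lambda of Eqs. (3)-(4): per-dimension distance between the
    empirical CDF and the fitted Gaussian CDF, summed over dimensions and
    mapped through exp(-D^2 / T_D^2)."""
    data = np.atleast_2d(data)                      # shape (n, d)
    n, d = data.shape
    D = 0.0
    for k in range(d):
        x = np.sort(data[:, k])
        grid = np.linspace(x[0], x[-1], n_grid)     # comparison interval I
        F_gauss = norm.cdf(grid, loc=mu[k], scale=np.sqrt(cov[k, k]))
        F_emp = np.searchsorted(x, grid, side="right") / n
        # Eq. (3): average absolute difference of the two CDFs over I
        D += float(np.mean(np.abs(F_gauss - F_emp)))
    return float(np.exp(-(D ** 2) / (T_D ** 2)))    # Eq. (4)
```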
2.3 Simplification of the Gaussian
Mixture Model
To decide whether two Gaussians $G_i$ and $G_j$ can be simplified into one, we merge them together and check whether the resulting Gaussian has a fidelity $\lambda$ close to one, say, exceeding a given threshold $\lambda^+_{\min} = 0.95$. The resulting Gaussian is computed using the usual equations, supplemented by the combination of the cumulative distributions stored for the two Gaussians.
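Assuming the "usual equations" refer to the standard moment-matching merge of two weighted Gaussians, a minimal sketch is:

```python
import numpy as np

def merge(w1, mu1, C1, w2, mu2, C2):
    """Moment-matched merge of two weighted Gaussians: the result preserves
    the pair's total weight, mean and covariance (including the spread
    between the two means)."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    d1, d2 = mu1 - mu, mu2 - mu
    C = (w1 * (C1 + np.outer(d1, d1)) + w2 * (C2 + np.outer(d2, d2))) / w
    return w, mu, C
```

The merge would then be kept only if the fidelity of the merged Gaussian, computed from the combined cumulative-distribution information, exceeds $\lambda^+_{\min} = 0.95$.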
3.1 The Uncertain Gaussian Model
The uncertain Gaussian model represents a distribu-
tion with an appropriately weighted sum of informa-
tive (Gaussian) and uninformative (uniform) compo-
nents
$$ q(x) = \lambda \exp\!\left( -\tfrac{1}{2}(x-\mu)^T \tilde{C}^{-1} (x-\mu) \right) + (1-\lambda) \qquad (14) $$
where $\tilde{C}$ is an augmented covariance that bounds the risk of underestimating the true covariance, i.e., $P(\tilde{C} < C) = \alpha$, where conventionally $\alpha = 0.05$. Since empirical estimates of variance follow a $\chi^2$ distribution,

$$ \tilde{C} = \frac{n}{\chi^2_{n-1}(\alpha)} \, \hat{C}, \qquad (15) $$
where $n$ is the number of observations used to learn the model and $\hat{C}$ is its maximum-likelihood covariance matrix. Thanks to the new threshold $\lambda_{\min}$ and the
uncertain Gaussian model, we are now able to learn a
GMM that is kept as general as possible until there is
sufficient evidence that the model can be made more
specific.
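A minimal sketch of Eqs. (14) and (15) follows (our illustration; reading $\chi^2_{n-1}(\alpha)$ as the lower $\alpha$-quantile of the $\chi^2$ distribution with $n-1$ degrees of freedom is an assumption):

```python
import numpy as np
from scipy.stats import chi2

def augmented_covariance(C_hat, n, alpha=0.05):
    """Eq. (15): inflate the maximum-likelihood covariance so that the risk
    of underestimating the true covariance is bounded by alpha."""
    return (n / chi2.ppf(alpha, df=n - 1)) * C_hat

def uncertain_density(x, mu, C_tilde, lam):
    """Eq. (14): fidelity-weighted sum of an (unnormalised) Gaussian kernel
    and a uniform, uninformative term."""
    d = x - mu
    maha = float(d @ np.linalg.solve(C_tilde, d))   # (x-mu)^T C~^{-1} (x-mu)
    return lam * np.exp(-0.5 * maha) + (1.0 - lam)
```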
The drawback of this solution is that it is now im-
possible to recover the data from it. For example, the
data in figure 3(a) suggest that the underlying distribu-
tion is poorly represented by two Gaussians. Unfortu-
nately, when this fact is detected, it is already too late:
The observations are not in memory anymore, leaving
us with a poor model that can no longer be refined.
This motivates our two-level mixture model where the
data are represented by the uncertain mixture model,
and where each uncertain Gaussian contains a precise
mixture model to describe itself. Thus, when we want
to refine an uncertain Gaussian, we can split it accord-
ing to its underlying mixture components.
3.2 Updating a Two-Level Gaussian
Mixture Model
The algorithm used to update the GMM proceeds
along the following steps:
1. Merge the new data point with the nearest uncer-
tain Gaussian,
2. if the resulting Gaussian has a value of λ below
the corresponding $\lambda_{\min}$, replace it with two Gaus-
sians learned from its underlying GMM with EM
(Dempster et al., 1977),
3. else continue to merge the current uncertain Gaus-
sian with its nearest neighbour until the resulting
Gaussian has a value of λ lower than the corre-
sponding $\lambda_{\min}$.
Merging two uncertain Gaussians also involves
merging their respective underlying mixture models.
This can be done by simply summing the components
from both mixtures, and using the simplification step
only on the precise Gaussian that contains the new ob-
servation. Even if other precise Gaussians could possibly be merged together, we leave that for later, when they merge with the current observation. This way, we distribute the computational cost over different time instants.
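A minimal sketch of the two-level data structure and of step 2 above (our illustration, not the authors' code): the split fits a two-component GMM with EM, here scikit-learn's GaussianMixture run on samples drawn from the underlying precise mixture, which is a simplification we choose for clarity; the re-assignment of the precise components to the two new uncertain Gaussians is omitted.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np
from sklearn.mixture import GaussianMixture

@dataclass
class Gaussian:
    weight: float
    mean: np.ndarray
    cov: np.ndarray

@dataclass
class UncertainGaussian:
    coarse: Gaussian                                       # level-2 (uncertain) Gaussian
    precise: List[Gaussian] = field(default_factory=list)  # its level-1 (precise) GMM

def split_with_em(u: UncertainGaussian, n_samples=500, seed=0):
    """Step 2 of Sec. 3.2: replace a poorly fitting uncertain Gaussian by two
    Gaussians learned with EM from its underlying precise mixture (approximated
    here by sampling from that mixture)."""
    rng = np.random.default_rng(seed)
    w = np.array([g.weight for g in u.precise])
    idx = rng.choice(len(u.precise), size=n_samples, p=w / w.sum())
    samples = np.array([rng.multivariate_normal(u.precise[i].mean, u.precise[i].cov)
                        for i in idx])
    em = GaussianMixture(n_components=2, random_state=seed).fit(samples)
    return [Gaussian(u.coarse.weight * wi, mi, Ci)
            for wi, mi, Ci in zip(em.weights_, em.means_, em.covariances_)]
```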
3.3 Discussion
Figure 3 shows an example of the evolution of the
GMM with data points generated from an arc-shaped
distribution. This time the complexity of the GMM
only increases when there is enough evidence that
the observed distribution is too complex for the cur-
rent model. If we compare figure 3 with figure 1,
we see that the two-level GMM and the precise mix-
ture model converge to the same distribution. The
two-level approach thus provides a more stable non-
overfitted model that can still become more accurate
thanks to the precise model level.
4 EXPERIMENTS
4.1 Empirical Analysis of the Behaviour
of the 2-Level Model
To analyze the relation between the model complexity and the only parameter $T_D$, we generated data from a circular distribution for different values of $T_D$ from 0.01 to 0.25. We ran 30 tests per value of $T_D$ and stopped each test after 500 observations. As we can see in figure 4(a), $T_D$ provides us with a simple way to specify the desired trade-off between the model complexity and its accuracy.
Since the learning is incremental, we may won-
der if the model will always converge to qualitatively
the same result. We therefore performed the same ex-
periment with $T_D = 0.04$ and with angular velocities
between 0.01 and 2 rad/frame for the process that gen-
erates the observations. As shown in figure 4(b), the
model complexity is nearly independent of the order
of the observations.
4.2 A Vision Application
Our method provides an underfitted probability density estimate of the partially observed distribution.
5 CONCLUSIONS
We presented a method for incrementally learning a
Gaussian mixture model based on a new criterion for
splitting and merging mixture components. This cri-
terion depends on a single user-settable parameter that
allows easy tuning of the trade-off between the com-
plexity and the accuracy of the mixture model. Our
two-level approach provides a solution to the over-
fitting problem of small data sets without any com-
promise on the model accuracy. As more data arrive,
the mixture complexity can be increased without any
propagation of errors due to a previously underfitted
model. As we have demonstrated empirically, this
method is nearly independent of the order in which
the data are observed.
ACKNOWLEDGEMENTS
This work is supported by a grant from the Belgian
National Fund for Research in Industry and Agricul-
ture (FRIA) to A. Declercq and by the EU Cognitive
Systems project PACO-PLUS (IST-FP6-IP-027657).
REFERENCES
Akaike, H. (1973). Information theory and an extension of
the maximum likelihood principle. Second Interna-
tional Symposium on Information Theory.
Arandjelovic, O. and Cipolla, R. (2005). Incremental learn-
ing of temporally-coherent Gaussian mixture models.
BMVC.
Declercq, A. and Piater, J. H. (2007). On-line simultaneous
learning and tracking of visual feature graphs. Online
Learning for Classification Workshop, CVPR’07.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society,
39:1–38.
Hall, P. and Hicks, Y. A. (2005). A method to add Gaussian
mixture models. Tech. Report, University of Bath.
Rissanen, J. (1978). Modeling by shortest data description.
Automatica, 14:465–471.
Schwarz, G. (1978). Estimating the dimension of a model.
Annals of Statistics, 6:461–464.
Song, M. and Wang, H. (2005). Highly efficient incremental
estimation of Gaussian mixture models for online data
stream clustering. Intelligent Computing: Theory and
Application.