From Theory to Implementation
Nikzad Babaii Rizvandi, Aleksandra Piˇzurica and Wilfried Philips
Image Processing and Interpretation Group (IPI), Department of Telecommunications and Information Processing (TELIN)
Gent University, St-Pietersnieuwstraat 41, B-9000 Gent, Belgium
Shape Model, Texture Model, Active Appearance Model, Active Shape model, advantages and disadvantages.
Active Appearance Model (AAM) is a kind of deformable shape descriptors which is widely used in computer
vision and computer graphics. This approach utilizes statistical model obtained from some images in training
set and gray-value information of the texture to fit on the boundaries of a new image. In this paper, we describe
a brief implementation, apply the method on hand object and nally discuss its performance in compare to
Active Shape Model(ASM). Our experiments shows this method is more sensitive to the initialization and
slower than ASM.
Model-based approaches analyze different variations
of an object using some samples of the object in a
training set and finally calculate a model based on the
object variations. Active Appearance Model (AAM)
of Cootes et al. in (T.F.Cootes and C.J.Taylor, 2001)
is one of the well-known model-based methods.
To build an AAM model, a training set of images
is assumed in which corresponding landmark points
have been marked on every image. A statistical model
of the shape variation by using these landmarks and
Principle Component Analysis (PCA), a model of the
texture variation (sampling of the gray values of im-
ages) using mean shape, delaunay triangles and an-
other PCA and a model of the correlations between
shape and texture, are computed. The final shape-
texture model and the images in the training set are a
basis to learn a multi-variate regression matrix. With
enough training examples this model should be able
to synthesize any image of normal anatomy. By find-
ing the parameters which optimize the match between
a synthesized model image and a target image all the
structures represented by the model can be located.
Obtaining a model by AAM includes two main stages
which are:
Offline stage:
- Manual Labeling: Placing landmarks surround-
ing objects in images inside training set manually.
(T.F.Cootes and J.Graham, 1995) explains differ-
ent kinds of landmarks.
- Shape Alignment: The differences among
shapes, which are rotation, x-y translations and
scaling, are omitted and mean shape is created.
- Statistical Shape Model: using PCA the aligned
shape variations are modeled as an eigenvector
matrix and a few parameters.
- Texture Sampling: The gray-value of each pixel
under each shape in the training set is obtained.
- Texture Alignment: Texture alignment is used in
order to omit the illumination differences among
the images in the training set.
- Statistical Texture Model:After texture align-
ment another PCA is utilized to model the typical
texture variations.
- Joint Shape-Texture Statistical Model: Both
shape and texture models are merged together to
construct an unit model.
- Training Regression Matrix: A multivariate lin-
ear regression matrix is trained using the joint
model and some images in the training set.
Online stage:
- Search in a New Image: The obtained AAM
model along with the regression matrix are used
to find the modeled object in a new image.
Babaii Rizvandi N., Pižurica A. and Philips W. (2008).
ACTIVE APPEARANCE MODEL (AAM) - From Theory to Implementation.
In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 539-542
DOI: 10.5220/0001081705390542
In this paper, we describe a brief implementation
of AAM, then we examine AAM on hand object
and finally compare AAM performance with another
model-based method named Active Shape Model
This stage utilizes some statistical analysis by using
Principle Component Analysis (PCA) on the shape
variations and also texture variations of some gath-
ered images in a training set.
2.1 Manual Labeling
In the analyzed model, the shape is represented by
a set of points (or landmarks). These landmarks
are placed manually for each shape in the training
set. The corresponding landmarks in the shapes must
be approximately in the same location because each
point represents a particular part of the object or
its boundary. To increase accuracy some additional
points are added between two points when the dis-
tance is more than a threshold.
2.2 Shape Alignment
All objects in the training set has different scaling,
rotation and x-y position(or translation), named pose
parameters, compare to the others. In order to re-
move the pose differences and only remain the object
shape variations the alignment procedure is used. The
center of mass of the shape is calculated and moved
to the coordinate origin for removing X-Y transla-
tion. After removing the X-Y translation the obtained
shape becomes unity scale by dividing the shape on
its L
norm. To remove the rotation, another shape
is needed as a reference. It can be proved mathe-
matically that Singular Value Decomposition (SVD)
calculates the rotation matrix between the shape and
the reference shape. A comprehensive explanation of
shape alignment with its procedure can be found in
(Babaii Rizvandi et al., 2007). The mentioned proce-
dure is only for one shape. For the shapes in the train-
ing set, we align all shapes to the first shape and cal-
culate the mean shape. Then we align all shapes to the
mean shape and we recalculate the mean shape. This
procedure, which aligns to the mean shape and recal-
culates the mean shape, is continued until the mean
shape does not change significantly in two iterations.
2.3 Statistical Shape Model
The 2N elements are highly correlated, so it is possi-
ble to represent them much more compactly. One ap-
proach is Principle Component Analysis (PCA) that
is widely used in pattern recognition to reduce the di-
mension. Using PCA the number of elements reduces
from 2N to M while M << 2N. The final shape model
X = X + Φ
where X is the mean shape, Φ
contains the shape
eigenvectors and b
includes the shape parame-
2.4 Texture Sampling
The question to make a texture model is which gray
values must be used in the model and how the model
should be defined. The answer to the first question
is that only pixels including the object are necessary.
Dividing the shape into a combination of triangles by
delaunay triangulation is the common solution for the
second question.
The problem with delaunay triangulation is that
these triangles cover all regions including background
of the convex hull (Stegmann, 2000), (T.F.Cootes and
J.Graham, 1995). So in order to form a suitable tex-
ture model, a convex hull algorithm must be used.
After removing the background pixels, the next
step is to find the corresponding pixels in the object
textures and warp these pixels positions. To do this
task, the pixels inside the mean shape are sampled
and the related pixels in the other images textures in
the training set are obtained by using the correspond-
ing triangles. (Stegmann, 2000) and (Babaii Rizvandi
et al., 2007) explain the complete algorithm.
2.5 Texture Alignment
Within the object there are usually some variations
in gray values because of different illumination in-
tensities. Since the goal is to build a stable model
without these unwanted effects, these variations must
be eliminated. The common method is to align all
textures to the standardized mean texture, with zero
mean and unit variance, and continue this procedure
till the difference between the standardized mean tex-
ture in two following iterations is less than a threshold
[(T.F.Cootes and C.J.Taylor, 2001) ,(Stegmann, 2000)
and (Babaii Rizvandi et al., 2007)].
VISAPP 2008 - International Conference on Computer Vision Theory and Applications
2.6 Statistical Texture Model
The same as section 2.3 another PCA is used to rep-
resent the obtained texture information much more
compactly. Because the number of elements in the
texture is much higher, using the traditional PCA
takes a lot of time. If N
and N are the number of
texture pixels and the number of shapes in the train-
ing set, so the covariance matrix will have N
dimensions. When N
N, calculating the covari-
ance matrix, and therefore the texture eigenvectors
and eigenvalues, is computationally expensive. The
idea is to calculate the covariance between the tex-
tures and then convert it to the covariance between
the pixels (Stegmann, 2000). The final texture model
T = T +Φ
where T is the mean texture, Φ
contains the texture
eigenvectors and b
includes the texture parameters.
2.7 Joint Shape-Texture Model
Both b
and b
models should be merged to form
a unit model including both texture and shape vari-
abilities and keeping the correlation between them.
Since the nature of shape and texture are different,
some weighting is necessary. In the absence of these
weighting, spread of the points in the space will be
undesirable (Stegmann, 2000). A simple weighting
matrix is a diagonal matrix:
W = wI =
w · ·· 0
0 · · · w
where w =
and λ
and λ
are eigenval-
ues of b
and b
, respectively. The merged model
is a simple column vector:
W b
To eliminate correlation between shape and tex-
ture parameters, another PCA should be performed
on the combined data (b
= φ
c (5)
where φ
is the eigenvector matrix of joint shape-
texture parameters and c is the final combined model
parameters. The same as the section 2.3, the order of
model is reduced after calculating the PCA.
The final shape and texture models are calcu-
lated with the following equations (T.F.Cootes and
C.J.Taylor, 2001), (Zambal, 2005):
X = X + φ
T = T + φ
c (6)
These two equations are the basis to calculate the re-
gression matrix in the next level.
2.8 Training a Regression Matrix
The search procedure in AAM is considered as an
optimization problem in which the gray value differ-
ences between the artificial object obtained by AAM
and an actual image is to be minimized:
δI = I
In this case the optimization can be enhanced by ad-
justing the model and pose parameters in order to fit
the artificial object with the image. So δI can be re-
placed by δT because this procedure is based on the
normalized texture vectors (Stegmann, 2000). One
possibility is that to consider the relation between δT
and the model-pose parameters changes, δc, as lin-
ear and use the information obtained from the joint
shape-texture model and the texture of some images
in the training set in a linear regression matrix (R) as
= Rδ
where ´c = [c,t
, θ, S]. The idea of the standard
AAM approach is to estimate R in a precalculation
step. The parameters of a model instance are changed
and the according differences in texture are measured.
If the parameter differences are the column vectors of
a second matrix
and each of
represents the tex-
ture differences corresponding to the parameters dif-
ferences, the last equation becomes
= R
The final R can be calculated as (Zambal, 2005)
R =
where Λ and Φ are eigenvalues and eigenvectors for
, respectively.
In the online stage, we use the constructed AAM
model in order to fit the model on a target object in
a new image. The following is the standard AAM
search algorithm (T.F.Cootes and C.J.Taylor, 2001)
and (Zambal, 2005).
- Place an initial shape near the desired object in the
new image.
ACTIVE APPEARANCE MODEL(AAM) - From Theory to Implementation
- Calculate the texture differences δ
- Calculate the parameter differences by using
= Rδ
- ´c ´c+ δ
- Calculate the differences between artificial
model texture and image texture belong the arti-
ficial model shape(E).
until E Threshold
Hand Gesture Extraction is one of the common appli-
cations of Active Appearance Model (AAM) and Ac-
tive Shape Model (ASM). In this section, we applied
our implementation of Active Appearance Model
(AAM) and Active Shape Model (ASM) on images
of hand in order to compare these method efficien-
cies. At first the images of hand must be labeled with
some landmarks. In our implementation both AAM
and ASM iterate 40 times. Figure.1 shows the result
of both AAM and ASM for a suitable initialization.
Our experiment shows the efficiency of both methods
has an extreme dependence on two factors: (a) com-
prehensive object variations in the training set that
means all changes outside of the training set are not
included by the model and (b) a suitable initialization.
ASM searches around the current location so it has
a larger capture range than the AAM which only con-
siders the image directly under its current area. ASM
only uses data around the model landmarks and does
not involve all the grey-level information available
across an object as the AAM does. Thus it may be
less reliable. In compare to AAM, ASM is faster and
achieves more accurate feature point location than the
AAM, but tends to be less reliable.
In this paper, we examined the AAM model perfor-
mance for finding the boundary of hand. We also
compared its efficiency with another deformable
method named Active shape model (ASM). The re-
sults show that because this method uses gray-values
information of images, it is slower than the ASM.
Moreover, due to using the local information its
capture range is less than ASM and so more sensitive
to the initialization than ASM.
Figure 1: Experimental results for AAM in compare to
ASM: (a) initial shape for ASM (b) Final Shape for ASM
(c) Initial for AAM (d) Final shape for AAM.
The author N.Babaii Rizvandi is supported as a Re-
search Assistant by Gent University under doctoral
grant. A.Piˇzurica is a postdoctoral research fellow of
FWO, Flanders.
Babaii Rizvandi, N., Philips, W., and Pizurica, A. (2007).
Active appearance model construction: Implementa-
tion notes. In Proc. 10th Joint Conference on Infor-
mation Sciences, Salt Lake City, USA. 7 pages.
Stegmann, M. B. (2000). Master thesis. In Active Appear-
ance Models: Theory, Extensions and Cases. Techni-
cal university of Denmark.
T.F.Cootes, C.J.Taylor, D. and J.Graham (1995). In Active
Shape Models: Their Training and Application. Com-
puter Vision and Image Understanding.
T.F.Cootes, G. and C.J.Taylor (June 2001). In Active ap-
pearance Models. IEEE Transactions on Pattern Anal-
ysis and Machine Intelligence.
Zambal, S. (August 2005). Master thesis. In 3D Active
Appearance Models for segmentation of Cardiac MRI
data. Technische Universitat Wien, Austria.
VISAPP 2008 - International Conference on Computer Vision Theory and Applications