A Statistical Model for Coupled Human Shape and Motion Synthesis
Alina Kuznetsova¹, Nikolaus F. Troje² and Bodo Rosenhahn¹
¹Institute for Information Processing, Leibniz University Hanover, Hannover, Germany
²Bio Motion Lab, Queen's University, Kingston, Canada
Keywords:
Animation, Shape and Motion Synthesis, Statistical Modeling.
Abstract:
Due to the rapid development of the virtual reality industry, realistic modeling and animation are becoming more and
more important. In this paper, we propose a method to synthesize both human appearance and motion given
semantic parameters, as well as to create realistic animations of still meshes and to synthesize an appearance based
on a given motion. Our approach is data-driven and allows us to correlate two databases containing shape and
motion data. The synthetic output of the model is evaluated quantitatively and in terms of visual plausibility.
1 INTRODUCTION
Emerging interest in 3D technologies and virtual reality has introduced a need for realistic computer modeling
and animation. It is a rapidly developing area, and yet many problems remain unresolved, such as fast and
realistic character creation. In this paper, we propose a statistical model to address the latter problem.
In general, character creation is a challenging task. The character generation problem is addressed
either by manual editing and 3D modeling or by data-driven approaches; a well-known example of appearance
creation is the SCAPE model (Anguelov et al., 2005). Motion simulation is usually considered a separate topic,
where three types of methods exist: manual motion editing, physics-based approaches and data-driven approaches.
The oldest approach is manual motion editing, such as key-frame animation. However, manual editing cannot
provide a sufficient level of motion detail and is very time-consuming; therefore, data-driven approaches have
emerged recently, as well as physics- and control-based methods. Unfortunately, data-driven approaches usually
produce physically incorrect results, and the question of fitting a generated motion to a concrete character
(motion retargeting) is not completely solved; physics-based approaches are usually very computationally
intensive and complex, do not provide enough variability and, as experiments have shown, physical fidelity of
motion does not imply visual plausibility. Another way to animate a new character is to transfer motion from
another character, but when the characters have different parameters (such as proportions, height, etc.), such a
transfer can produce unrealistic results (the so-called retargeting problem), as shown in Fig. 1. Here, the same
shape was animated using two different motions, one of which comes from a person with similar biometric
parameters, while the other comes from a person with completely different biometric parameters. Even from a
sequence of images it is possible to see the mismatch between the shape and the motion.
Another disadvantage of the approaches described above is that they are difficult to apply when the
task is to animate many characters at once.
In this work, we propose a model for character appearance creation and animation that combines a statistical
model of human motion with one for character appearance. In this way, we address the problems described
above. We are able to generate both the character's appearance and motion simultaneously, thereby avoiding
the retargeting problem and extensive computations, while still producing visually plausible animations.
Semantic parameters, such as the weight, height and proportions of the character, serve as a link between
shape and motion and are integrated into our model, allowing excellent control over the generation process.
Since our model is stochastic, it can also be used for random character generation.
Our paper is organized as follows. In Section 2, we give a short overview of existing methods for character
generation and animation. In Section 3, we give technical details about data collection and processing. In
Section 4, we explain the model we use for generation. In Section 5, we provide an evaluation of our approach
in terms of the visual perception
of the generated characters and their motions, as well as a quantitative assessment of the appearance-to-motion
fit.

Figure 1: Example of the motion retargeting problem: (a) the red figure unrealistically bends backwards, while the green figure is correctly balanced; (b) the red figure has its shoulders set backwards, as is typical for a slimmer person.
2 RELATED WORK
In this section, we review existing work on motion and shape modeling. Since standard approaches to this
problem separate the creation of a character from the creation of its motion, we first provide a short review
of the methods proposed for each problem.
Motion Simulation Techniques. Many techniques address the motion simulation problem. A purely
data-driven approach to motion simulation was introduced by (Wang et al., 2008) and is based on Gaussian
process models for motion generation; (Li et al., 2002) proposed to use linear dynamic systems with a
distribution over the dynamic system parameters to control the generation; (Brand and Hertzmann, 2000)
used style machines to create variations within one type of motion. Finally, (Troje, 2002) used Principal
Component Analysis (PCA) to build a manifold of cyclic motions.
Another approach to motion synthesis combines existing motions without using any statistical models;
for example, in (Sidenbladh et al., 2002), no motion is synthesized, but the closest real motion in the
database of motions is found based on predefined metrics.
Physics-based models of human motion, such as the one proposed in (Liu et al., 2005), are used to make
motion physically plausible, which is often required for the animation of interactions. More frequently, however,
different types of controllers are used to refine already generated motion, as proposed, for example, in
(da Silva et al., 2008; Lee et al., 2009; Popović and Witkin, 1999; Sok et al., 2007).
Mesh Simulation Techniques. The traditional way of creating character appearance is manual modeling.
Recently, automated approaches have been presented, such as the SCAPE model (Anguelov et al., 2005),
which is now a state-of-the-art technique. A slightly different statistical model is presented in (Hasler et al., 2009);
there, the use of semantic "handles" is also proposed to achieve variability of shapes.
Our Contribution. In contrast to the above-mentioned works, where mesh and motion are clearly separated,
we propose to combine appearance creation with motion simulation, thereby avoiding many problems of
motion retargeting. To our knowledge, there is no previous work bringing motion simulation and shape creation
together. We apply the model to jointly generate realistic human shape and motion.
We also see our contribution in developing a method to find dependencies between two separate
databases and in building a model that allows uniting the data from both databases using the dependencies
found.
In general, we can imagine many applications of the model, for example, crowd simulation, automatic
animation of already existing characters, or appearance-from-motion reconstruction. In a mathematical
sense, our model is inspired by the models proposed in (Cootes et al., 2001). However, we include a random
component and therefore allow for a looser coupling of motion and shape, as well as for variability in the
sampled characters.
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
228
Figure 2: Distribution of the semantic parameters (weight, height, age) of the females in the two databases. The upper row shows the parameter distributions for the shape dataset; the lower row shows the parameter distributions for the motion dataset.
3 DATA PREPARATION
3.1 Data Preprocessing
Since our model is purely data-driven, we first give some insight into the data preprocessing.
To train the model, we need two sources of data: body scans, to learn the human appearance, or shape,
and motion data, collected during recording sessions. These two databases contain shapes and motions
recorded from different people, although the distributions of their semantic parameters are approximately
the same.
The shape database (Hasler et al., 2009) contains body scans of 114 different subjects. The body scans were
previously registered, such that aligned meshes containing N = 1002 vertices were produced.
Into each mesh we embed a skeleton (see Fig. 4) consisting of K = 15 joints, and we morph the meshes in
such a way that their skeletons are in exactly the same pose. This is done to exclude variability in shape
due to slight differences in pose. We compute the positions of the skeleton joints as linear combinations of
nearby vertices:

s_k = \sum_{i=1}^{N} \omega_{ik} v_i,    (1)

where the weights \omega_{ik} were derived manually, based on the procedure that was used to place markers on the body
during the creation of the motion database.
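As an illustration, the joint computation of Eq. (1) reduces to a single matrix product. A minimal sketch in Python/NumPy, where the weight matrix W is a hypothetical stand-in for the manually derived weights:

```python
import numpy as np

def joint_positions(vertices, W):
    """Compute skeleton joint positions from mesh vertices (Eq. 1).

    vertices : (N, 3) array of mesh vertex coordinates.
    W        : (K, N) array of manually derived weights; row k holds
               the weights omega_{ik} of all vertices for joint k.
    Returns a (K, 3) array of joint positions s_k = sum_i omega_{ik} v_i.
    """
    return W @ vertices

# Toy usage with random data (N = 1002 vertices, K = 15 joints).
rng = np.random.default_rng(0)
V = rng.normal(size=(1002, 3))
W = rng.random(size=(15, 1002))
W /= W.sum(axis=1, keepdims=True)  # normalize each joint's weights
joints = joint_positions(V, W)     # (15, 3)
```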
As a source of motion data, we used the motion database described in (Troje, 2002); the database
contains recordings of the gait motion of 100 individuals. First, the motion was acquired using a MoCap
system with 45 markers. From the 45 physical markers, K = 15 virtual markers were derived; these markers
correspond to the placement of the skeleton joints inside the mesh. For each motion, 4 so-called eigenpostures
are extracted. Each motion is then represented as a linear combination of the eigenpostures, where the
coefficients are periodic functions of time. The walk cycle parameter is individual for each person and is
therefore stored separately. This means that, all together, each motion is represented
with K · 3 · (4 + 1) + 1 coordinates.
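A sketch of how such a representation can be decoded back into poses, assuming, as in (Troje, 2002), that the periodic coefficients are sine and cosine functions of the walk cycle frequency at the fundamental and second harmonic; this functional form is our assumption here:

```python
import numpy as np

def decode_pose(mean_posture, eigenpostures, omega, t):
    """Reconstruct a pose at time t from the compact motion representation.

    mean_posture  : (K*3,) mean posture (K joints, 3D each).
    eigenpostures : (4, K*3) the four eigenpostures.
    omega         : walk cycle frequency (the extra '+1' coordinate).
    t             : time in seconds.
    Assumed periodic coefficients: sin/cos of the fundamental and
    the second harmonic, following (Troje, 2002).
    """
    c = np.array([np.sin(omega * t), np.cos(omega * t),
                  np.sin(2 * omega * t), np.cos(2 * omega * t)])
    return mean_posture + c @ eigenpostures  # joint positions at time t
```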
Moreover, for each person, semantic data such as gender, weight, height and age were stored. We denote
the semantic attributes connected to each recorded shape s or motion m as v(s) or v(m). First, we separated
male from female subjects in both databases, since the variation between male and female meshes is much
greater than the variation within the female (or male) part of the database itself, and such a separation allows
our model to better capture the variation within each group. The groups are of approximately equal size,
i.e. 55 females and 59 males in the mesh database and 50 females and 50 males in the motion database.
The distributions of the semantic parameters of the females are presented in Fig. 2. For our experiments,
we took weight and height as the parameters that vary most across our data sets, but in general any set
of parameters could have been taken, as long as it has a significant influence on the shape appearance and on
the motion.
3.2 Binding a Skeleton to a Mesh
Since the motion data we obtained is represented as joint positions over time, we chose a skeleton-based
approach to animate the model. Several methods have been proposed to solve the problem of skeleton-based
character animation. As mentioned in the previous subsection, we embedded a skeleton into each mesh. To
morph a mesh in accordance with a given skeleton pose, we chose linear skinning (Jacka et al., 2007), since
the technique is both fast and robust, while providing sufficiently good results. To summarize the
implementation, we first assign weights w_{ij} to each vertex of the mesh, which characterize how much bone
b_j influences vertex v_i.
AStatisticalModelforCoupledHumanShapeandMotionSynthesis
229
Figure 3: Block diagram of the algorithm (blocks: matching, scale, PCA, regression, Gaussian model); starting from the initial sets of motions (M) and shapes (S), the Gaussian model is built.
Let a coordinate system be attached to each bone; then the transformation matrix T_j transforms vertex
coordinates from the bone coordinate system to the world coordinate system. For the new position of the
bone, \hat{T}_j denotes the new transformation matrix. The new position of a vertex is then given by the formula:

\hat{v}_i = \sum_{j=1}^{K'} w_{ij} \hat{T}_j T_j^{-1} v_i, \qquad \sum_{j=1}^{K'} w_{ij} = 1,    (2)

where K' = K - 1 is the number of bones (see Fig. 4). The weights are generated using a solver for the
heat equation, where heat is propagated from each of the bones (Baran and Popović, 2007) until heat
equilibrium is reached; the weights are then set equal to the normalized temperatures.
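A minimal sketch of the linear skinning step of Eq. (2) in NumPy, using 4×4 homogeneous transforms; the matrices T_rest, T_new and the weights W are assumed to come from the rigging stage described above:

```python
import numpy as np

def linear_blend_skinning(vertices, W, T_rest, T_new):
    """Deform mesh vertices by linear blend skinning (Eq. 2).

    vertices : (N, 3) rest-pose vertex positions.
    W        : (N, K') skinning weights; each row sums to 1.
    T_rest   : (K', 4, 4) rest-pose bone-to-world transforms T_j.
    T_new    : (K', 4, 4) new bone-to-world transforms T^_j.
    """
    N = vertices.shape[0]
    v_h = np.hstack([vertices, np.ones((N, 1))])      # homogeneous coords
    # Per-bone deformation matrices T^_j T_j^{-1}.
    M = np.einsum('kab,kbc->kac', T_new, np.linalg.inv(T_rest))
    # Transform every vertex by every bone: (K', N, 4).
    v_per_bone = np.einsum('kab,nb->kna', M, v_h)
    # Blend the per-bone results with the skinning weights.
    v_new = np.einsum('nk,kna->na', W, v_per_bone)
    return v_new[:, :3]
```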
Figure 4: Alignment of a skeleton and a mesh.
4 APPEARANCE MORPHABLE MODEL
In this section, we explain our shape-motion model and its constraints and limitations.
The algorithm for model preparation and training is shown in Fig. 3. The main idea of our model
is to correlate shape and motion based on semantic parameters. To achieve a better alignment and to avoid
the retargeting problem, we first match and align the meshes with the motions. A rough match is done
based on the semantic parameters, i.e. we create a set of shape-motion pairs (s_p, m_q) such that the values
of the semantic parameters within each pair differ by less than a given threshold \varepsilon:

P = \{ (s_p, m_q) : \| v(s_p) - v(m_q) \| < \varepsilon \}

(this procedure corresponds to the first block in Fig. 3). In the next step, we morph each shape s_p in
such a way that its skeleton corresponds exactly to the skeleton of the motion m_q. To implement the
scaling, we modify Eq. (2) by adding a transformation matrix that represents relative scaling:

\hat{v}_i = \sum_{j=1}^{K'} w_{ij} \hat{T}_j S_j T_j^{-1} v_i, \qquad \sum_{j=1}^{K'} w_{ij} = 1,    (3)

S_j = \begin{pmatrix} s_j & 0 & 0 \\ 0 & s_j & 0 \\ 0 & 0 & s_j \end{pmatrix},    (4)

where s_j is a scaling factor. After scaling, we fit the skeleton again using Eq. (1). We iterate between
these two steps several times until the bone lengths of the newly fitted skeleton converge to the desired
values. Since the scaling coefficients, i.e. the ratios between corresponding bone lengths, lie in the
interval [0.85, 1.15], the scaling does not affect the realistic look of the shapes. After scaling, the bone
lengths of the mesh skeleton correspond exactly to those of the motion skeleton.
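A sketch of the two preprocessing steps, semantic matching and iterative scale fitting; fit_skeleton (Eq. 1), bone_lengths and apply_scaling (Eq. 3) are hypothetical helpers, and the loop structure is our reading of the procedure:

```python
import numpy as np

def match_pairs(shapes, motions, eps):
    """Pair shapes and motions whose semantic parameters differ by < eps."""
    return [(s, m) for s in shapes for m in motions
            if np.linalg.norm(s['v'] - m['v']) < eps]

def fit_scales(mesh, target_bone_lengths, n_iter=10):
    """Alternate scaling (Eq. 3) and skeleton refitting (Eq. 1) until the
    bone lengths of the fitted skeleton converge to the target values."""
    for _ in range(n_iter):
        skel = fit_skeleton(mesh)                # Eq. (1): joints from vertices
        scales = target_bone_lengths / bone_lengths(skel)   # per-bone s_j
        if np.allclose(scales, 1.0, atol=1e-3):
            break
        mesh = apply_scaling(mesh, skel, scales)  # Eq. (3): scaled skinning
    return mesh
```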
Since the dimensionality of the data is still high after this preprocessing step, we first reduce it by
performing Principal Component Analysis (PCA) on the shape and motion coefficients separately and
converting our data into PCA spaces of smaller dimension. The PCA transformation consists of finding
orthogonal directions in space (called the basis) that correspond to the maximal variance of the sample.
Then, given a set of vectors x_j \in X \subset R^l, the size N_x of the set X, and the found basis
U_x = [u_x^1, \ldots, u_x^k], the original vectors can be represented as:

x_j = U_x \hat{x}_j + \mu(X), \qquad j = 1, \ldots, N_x,    (5)

where \mu(X) = \frac{1}{N_x} \sum_{i=1}^{N_x} X_i is the mean of the set X and \hat{x}_j is the vector
of coordinates of x_j in PCA space; the dimensionality of \hat{x}_j is smaller than the dimensionality
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
230
of x_j. Then,

\hat{x}_j = U_x^T (x_j - \mu(X))    (6)

is the reverse transformation, mapping x_j to its PCA coordinates \hat{x}_j.
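The two mappings of Eqs. (5) and (6), sketched in NumPy, retaining enough components to explain 95% of the variance as described in the next paragraph:

```python
import numpy as np

def fit_pca(X, var_ratio=0.95):
    """Fit PCA on the rows of X, keeping enough components to retain
    var_ratio of the sample variance (95% in the paper)."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, var_ratio)) + 1
    U_x = Vt[:k].T                  # basis U_x = [u_x^1, ..., u_x^k]
    return U_x, mu

def to_pca(x, U_x, mu):             # Eq. (6): x_hat = U_x^T (x - mu)
    return U_x.T @ (x - mu)

def from_pca(x_hat, U_x, mu):       # Eq. (5): x = U_x x_hat + mu
    return U_x @ x_hat + mu
```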
To apply PCA to our data, we first flatten the matrices representing shapes and motions into vectors;
we thus produce two sets: the set of vectors representing shapes, S \subset R^{3N}, and the set of
vectors representing motions, M \subset R^{15K+1}. Then we perform PCA as described above (see the third
block in Fig. 3). We denote the PCA coordinates of smaller dimensionality as \hat{m} \in R^{\hat{N}} and
\hat{s} \in R^{\hat{K}}, respectively. The dimensionality of each space is chosen so as to retain 95% of the
variance of the sample in both cases. Now we bind the mesh and motion coordinates to learn a joint
Gaussian distribution over the set [\hat{s}, \hat{m}], depending on the semantic parameters v. In this
sense, our model is close to Active Appearance Models (AAM, (Cootes et al., 2001)), although in contrast
to AAM, the relation between the two sets of PCA coordinates in our model is probabilistic, i.e. they are
not firmly coupled together.
Since each pair has common semantic parameters, i.e. to each pair of PCA coordinates
\hat{y} = (\hat{s}, \hat{m}) \in \hat{Y} \subset R^{\hat{N}+\hat{K}} a vector of semantic parameters v is
assigned, we can now derive control 'handles' over our model. For that, we use linear regression in the
space of the joint PCA coordinates. Since not all coordinates are correlated with the semantic parameters,
we first perform a standard correlation significance analysis and find the significant coordinates:

I(\hat{Y})_r = \{ i \in [1, \ldots, \hat{N} + \hat{K}] : P(\rho(\hat{y}_i, v) = 0) < \gamma \},

where \rho(\hat{y}_i, v) is the correlation between the i-th coordinate of the vector \hat{y} and the
semantic values, and \gamma = 0.05 is the p-value threshold for testing the hypothesis that no correlation
exists (Kendall and Stuart, 1973). We then use the coordinates from I(\hat{Y})_r to build the joint
regression model:

\hat{y}_{I(\hat{Y})_r} = \Theta v + \varepsilon,    (7)

where \Theta is a matrix of regression coefficients and \varepsilon \sim N(0, \Sigma_I) is a normally
distributed random variable with covariance matrix \Sigma_I.
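A sketch of the coordinate selection and the regression fit of Eq. (7), using Pearson correlation p-values from SciPy; the least-squares estimator of \Theta is our assumption:

```python
import numpy as np
from scipy.stats import pearsonr

def select_correlated(Y_hat, V, gamma=0.05):
    """Indices of joint PCA coordinates significantly correlated with
    at least one semantic parameter (the set I(Y)_r)."""
    idx = []
    for i in range(Y_hat.shape[1]):
        pvals = [pearsonr(Y_hat[:, i], V[:, j])[1] for j in range(V.shape[1])]
        if min(pvals) < gamma:
            idx.append(i)
    return np.array(idx)

def fit_regression(Y_hat, V, idx):
    """Least-squares estimate of Theta in y_hat[idx] = Theta v + eps (Eq. 7)."""
    Theta, *_ = np.linalg.lstsq(V, Y_hat[:, idx], rcond=None)
    residuals = Y_hat[:, idx] - V @ Theta
    Sigma_I = np.cov(residuals, rowvar=False)   # covariance of eps
    return Theta.T, Sigma_I                     # Theta maps v -> coordinates
```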
For the rest of the coordinates, we assume a joint Gaussian distribution N(0, \Sigma_f), where \Sigma_f
is their joint covariance, which can easily be estimated from the data. The full joint model (its
construction corresponds to the last two blocks in Fig. 3) is described by the following equations:

\hat{y} = \mu_{\hat{Y}} + \sqrt{\Sigma}\, \xi, \qquad \xi \sim N(0, I),    (8)

(\mu_{\hat{Y}})_i = (\Theta v)_k \text{ if } i \in I(\hat{Y})_r, \text{ and } 0 \text{ otherwise}.    (9)
Figure 5: Two examples of conditional sampling of motion: (a) motion sampled with correlation analysis; (b) motion sampled without correlation analysis.
Here, (\Theta v)_k denotes the corresponding coordinate obtained from Eq. (7). Without loss of generality,
we can reorder the indices in such a way that the indices in the set I(\hat{Y})_r come first, followed by
the indices of the elements that are independent of the semantic parameters. Then, the full covariance
matrix can be written as:

\Sigma = \begin{pmatrix} \Sigma_I & 0 \\ 0 & \Sigma_f \end{pmatrix}.    (10)
Using the proposed model to generate a character based on semantic parameters is straightforward:
first, the semantic parameters are chosen and determine the parameters of the Gaussian distribution via
Eqs. (7), (8) and (10); then a random point in PCA space is sampled from the normal distribution with the
determined parameters. Finally, the coordinates of the point are converted into shape-motion space using
Eq. (5).
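The full generation step, Eqs. (8)-(10), sketched under the assumption that the quantities from the earlier sketches (Theta, Sigma, idx) are available and that the coordinates have been reordered so the correlated ones come first:

```python
import numpy as np

def sample_character(v, Theta, Sigma, idx, dim_s, dim_m, rng=np.random.default_rng()):
    """Sample joint PCA coordinates (s_hat, m_hat) for semantic parameters v.

    Theta : regression matrix of Eq. (7) (rows correspond to idx).
    Sigma : full block-diagonal covariance of Eq. (10).
    idx   : indices of the correlated coordinates I(Y)_r.
    """
    dim = dim_s + dim_m
    mu = np.zeros(dim)
    mu[idx] = Theta @ v                          # Eq. (9): regression mean
    L = np.linalg.cholesky(Sigma)                # Cholesky factor as sqrt(Sigma)
    y_hat = mu + L @ rng.standard_normal(dim)    # Eq. (8): draw one sample
    return y_hat[:dim_s], y_hat[dim_s:]          # split shape / motion parts
```

The shape and motion parts are then mapped back into their original spaces with Eq. (5).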
Another useful property of the model is the ability to animate a given mesh, i.e. to create a
corresponding motion for it, as well as to create a well-fitting appearance, i.e. a mesh, based on a given
motion. Let an input shape s be given. First, the mesh is transformed into PCA space. Second, we derive a
sampling distribution over \hat{m} given \hat{s} by applying Bayes' theorem. We assume normal
distributions and are therefore able to sample from the conditional distribution to derive the
AStatisticalModelforCoupledHumanShapeandMotionSynthesis
231
motion coordinates corresponding to the mesh, and vice versa:

P(\hat{m} \mid \hat{s}) = \frac{P(\hat{m}, \hat{s})}{P(\hat{s})}.    (11)

Since we assume parametrized normal distributions, we can write:

\mu_{\hat{m}|\hat{s}} = \mu_{\hat{m}} + \Sigma_{\hat{m}\hat{s}} \Sigma_{\hat{s}\hat{s}}^{-1} (\hat{s} - \mu_{\hat{s}}),    (12)

\Sigma_{\hat{m}|\hat{s}} = \Sigma_{\hat{m}\hat{m}} - \Sigma_{\hat{m}\hat{s}} \Sigma_{\hat{s}\hat{s}}^{-1} \Sigma_{\hat{s}\hat{m}},    (13)

\hat{m} = \mu_{\hat{m}|\hat{s}} + \sqrt{\Sigma_{\hat{m}|\hat{s}}}\, \xi, \qquad \xi \sim N(0, I).    (14)

Afterwards, we convert the PCA coordinates of the generated motion back to the original motion space. A
similar procedure can be applied to derive the mesh coordinates from a given motion.
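The standard conditional-Gaussian computation of Eqs. (12)-(14) in NumPy; a sketch, with the matrix square root taken via a Cholesky factorization:

```python
import numpy as np

def sample_motion_given_shape(s_hat, mu, Sigma, dim_s, rng=np.random.default_rng()):
    """Sample motion PCA coordinates conditioned on shape PCA coordinates.

    mu, Sigma : mean and covariance of the joint vector y_hat = (s_hat, m_hat).
    dim_s     : dimensionality of the shape part.
    """
    mu_s, mu_m = mu[:dim_s], mu[dim_s:]
    S_ss = Sigma[:dim_s, :dim_s]
    S_ms = Sigma[dim_s:, :dim_s]
    S_mm = Sigma[dim_s:, dim_s:]
    # Eq. (12): conditional mean.
    mu_cond = mu_m + S_ms @ np.linalg.solve(S_ss, s_hat - mu_s)
    # Eq. (13): conditional covariance.
    Sigma_cond = S_mm - S_ms @ np.linalg.solve(S_ss, S_ms.T)
    # Eq. (14): draw one sample.
    L = np.linalg.cholesky(Sigma_cond)
    return mu_cond + L @ rng.standard_normal(len(mu_cond))
```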
As mentioned above, we separate the coordinates correlated with the semantic parameters. We do this to
avoid false dependencies between coordinates and to be able to use the free coordinates of our model to
introduce more variability into the generation process. To show the importance of separating the
correlated coordinates, we performed the following experiment: we divided the databases into two disjoint
parts, a training part and a test part, trained our model on the training parts of both databases, and then
performed motion sampling conditioned on a mesh from the test part of the mesh database, using the model
with and without correlation analysis. As can be seen in Fig. 5, the model that uses all coordinates for the
regression (i.e. the model trained without correlation analysis) fails to generate suitable motion, while
the model with correlation analysis produces realistic motion. We explain this by the fact that the
coordinates of a mesh from the test set may lie far from the coordinates of the meshes in the training
set; the false dependencies introduced by regressing on all coordinates then have a strong influence and
produce implausible results.
5 EVALUATION
As stated above, our model allows us to generate highly realistic, well-matched shape and motion with
relatively little effort. In this section, we evaluate the simulated sequences in terms of motion fidelity,
and also provide a quantitative evaluation to show how our approach allows us to avoid the motion
retargeting problem.
A number of methods for visual fidelity evaluation have been proposed in the literature. They can be
divided into interrogation-based approaches and automatic approaches.
Interrogation-based approaches can be applied to very different problems. In (Reitsma and Pollard,
2003), experiments with human observers were conducted to determine the sensitivity of human perception
to physical errors in motions, and (Hodgins et al., 1998) investigated which types of anomalies added
to a motion disturb human perception the most. In (Pražák and O'Sullivan, 2011) and (McDonnell et al.,
2008), human crowd variety was studied, and the influence of either motion or shape clones on the
perception of the whole crowd was investigated.
As an example of the automated motion evaluation approaches, (Ren et al., 2005) proposed and evaluated
an automated data-driven method for assessing the unnaturalness of human motion; in (Ahmed, 2004),
motion is evaluated in terms of foot sliding and general smoothness, while in (Jang et al., 2008), the
physical plausibility of motion was evaluated. Unfortunately, as also mentioned by (Ren et al., 2005), these
approaches depend heavily on the type of distortion applied to the motion, and are therefore aimed at the
evaluation of a specific type of motion generation.
Interrogation-based approaches provide more general results and are more reliable and more suitable
for our goal.
Therefore, we performed two experiments designed as questionnaires and one additional experiment
for quantitative evaluation. 13 male and 10 female subjects took part in our experiments. The
experimental setup is described below.
5.1 Interrogation-based Evaluation
Experiment 1. In the first experiment, we asked the participants to compare two animations, one of
which was generated using the model, while the second one was generated by choosing a motion and a mesh
randomly from our databases. We prepared 3 pairs of videos of 10 s length. The parameters of the
generated (or selected) characters are given in Table 1. To diminish bias due to the greater attractiveness
of one figure in comparison to another, we chose appearances that have approximately the same semantic
parameters, e.g. we compared two heavy humans, or two thin humans, etc.
The perspective and surroundings for the animated figures, as well as the textures of the figures, were
the same. The participants were asked to indicate which animation of each pair looked more realistic;
they also had the opportunity to state that both animations looked equally bad or equally good. The
results of the experiment are summarized in Table 2.
As the results show, the mesh-to-motion fit in general delivers better results than simply combining a
mesh and a motion without taking the parameters of the combined mesh and motion into account.
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
232
Table 1: Semantic parameters (weight in kg, height in cm) of the samples used in the experiments.

              our model    selected mesh  selected motion
Experiment 1
pair 1        (40, 169)    (53, 173)      (86, 176)
pair 2        (70, 155)    (44.7, 150)    (75.2, 186)
pair 3        (110, 170)   (120, 176)     (49, 168)
Experiment 2
question 1    (90, 176)    (102, 176)     (86.3, 176)
question 2    (45, 169)    (49, 169)      (49.5, 168)
question 3    (77.9, 179)  (77.9, 176)    (77.9, 181)
question 4    (75, 178)    (75.6, 178)    (74.5, 179)
Table 2: Percentage of participants who rated the sequence generated with our model as more realistic (positive) or less realistic (negative), and percentage of participants who found it difficult to rate one of the motions as more natural (neutral).

              pair 1  pair 2  pair 3  average
positive (%)  65.00   30.00   80.00   58.33
neutral (%)   15.00   15.00   10.00   13.33
negative (%)  20.00   55.00   10.00   28.33
We attribute the failure in the second pair of sequences to the generally lower attractiveness of the
appearance sampled from the model (see Fig. 6). In general, we observed that a mismatch between mesh and
motion due to a difference in weight is much easier to detect than a mismatch due to a difference in
height, because in the latter case the proportions of the person are the same, unless some extreme cases
are considered (e.g. a six-year-old child and a tall adult), and therefore the motion can fit quite well.
Some more examples of subjects sampled using our model are given in Fig. 7.
Taking into account that in artificial character animation it is usually difficult to find real motion
data that fits the animated shape exactly, our approach is beneficial in terms of the delivered result.
Experiment 2. In the second experiment, we asked our participants to evaluate several sets of videos. In
each set, four videos were presented. The videos in each set were produced as follows:
A1: an animation generated by matching a mesh and a motion based on semantic parameters;
A2: an animation generated by sampling a mesh and a motion from the model with the same semantic parameters;
A3: a mesh taken from the database of meshes and a motion sampled from our model given the mesh;
A4: a motion taken from the database of motions and a mesh sampled from our model given the motion.

Figure 6: Shapes used in the second pair of sequences in Experiment 1 for the comparison of visual fidelity: (a) mesh sampled from the model; (b) mesh from the original database.

Figure 7: Examples of simulated characters: (a) weight 40 kg, height 169 cm; (b) weight 61 kg, height 171 cm; (c) weight 77 kg, height 178 cm; (d) weight 110 kg, height 170 cm.
The parameters of the samples are given in Table 1. The participants were asked to grade each video
with a mark from 1 to 4, where 1 means very visually plausible and 4 means completely unrealistic. In
Table 3, the results are given as the percentage of participants that rated the sequences with 1, 2, 3 or 4,
respectively.
As the results show, our model produces slightly better results than the matching of semantic
parameters, which is a good result given that the two databases were only matched via semantic parameters
during training. The second most realistic animation is delivered by motion sampling conditioned on the
mesh, while appearance sampling (i.e. mesh sampling) based on motion produces the same or even slightly
worse results than semantic matching.
AStatisticalModelforCoupledHumanShapeandMotionSynthesis
233
Table 3: Percentage of participants giving marks from 1 to 4, and the mean mark for each algorithm. The marks vary in the interval [1, 4], where 1 means very realistic and 4 means not visually plausible.

alg.  1      2      3      4      average mark
A1    15.00  30.00  42.50  12.50  2.52
A2    33.75  51.25  13.75  1.25   1.82
A3    20.00  40.00  28.75  11.25  2.31
A4    18.75  25.00  33.75  22.50  2.60
We attribute this to the fact that motion alone does not provide enough information to sample an
appearance accurately.
We explain the superiority of our sampling methods with several arguments: first, a mesh and a motion
sampled with our model are better aligned to each other; second, due to the smoothing at the PCA stage,
small artifacts that occasionally appear in matched sequences are smoothed away, and therefore a visually
plausible result can be produced.
5.2 Quantitative Evaluation
There exist several approaches to solve the motion retargeting problem, and none of them is perfect.
However, we can avoid this problem by generating mesh and motion simultaneously. We evaluate the
mesh-motion fit by comparing the bone lengths b_j^s of the mesh skeleton with the corresponding bone
lengths b_j^m of the motion skeleton, in terms of the amount of scaling needed:

s_j = 1 - \frac{b_j^s}{b_j^m}.    (15)

Here, j corresponds to a body part and s_j denotes the amount of scaling required to fit the mesh to the
motion. For a motion and a mesh from the same person, i.e. when no scaling is required, the amount of
scaling equals zero. As shown in Fig. 8, the model captures the dependencies between the bone lengths of
the mesh and the bone lengths of the motion, and allows us to generate pairs of mesh and motion that are
already scaled properly, so that no scaling is required afterwards. We also evaluate the mesh-motion fit in
terms of conditional mesh sampling, when the motion is given, and conditional motion sampling, when the
mesh is given. We create pairs from the real data using semantic matching, and for the same range of
semantic parameters we sample pairs from the model. The results (Fig. 8) confirm that the mesh and motion
in the pairs sampled from the model are already properly aligned to each other.
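A small sketch of this metric, Eq. (15), computed per body part; the bone-length arrays are assumed given:

```python
import numpy as np

BODY_PARTS = ['hand', 'forearm', 'shoulder', 'torso', 'crus', 'thigh', 'pelvis']

def scaling_metric(bones_mesh, bones_motion):
    """Eq. (15): relative scaling needed to fit mesh bones to motion bones.

    bones_mesh, bones_motion : arrays of bone lengths b_j^s and b_j^m.
    Returns s_j = 1 - b_j^s / b_j^m, which is zero when no scaling is needed.
    """
    return 1.0 - np.asarray(bones_mesh) / np.asarray(bones_motion)
```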
Figure 8: The mean amount of scaling required for each body part (hand, forearm, shoulder, torso, crus, thigh, pelvis), shown for model-generated pairs (sampled on semantic parameters, conditioned on the mesh, and conditioned on the motion) and for semantically matched real-data pairs; green corresponds to the pairs sampled from the model, red to the matched pairs. When no scaling is required, the amount of scaling equals 0.
5.3 Using the Model in Crowd Simulation

As mentioned above, we also propose to use the model for automatic character appearance and motion
simulation in crowds. We leave the question of crowd behavior simulation outside the scope of this paper,
since it is an active area of research and numerous approaches exist (see, for example, (Guy et al., 2010),
(Narain et al., 2009)). By controlling the distribution of the semantic parameters, it is possible to
generate a set of character meshes that is specific to the scene. In our example, we generate the crowd by
randomly placing sampled characters in space, as sketched below. For creating the crowds in Figs. 9 and 10,
we used uniform distributions for the semantic parameters weight and height. However, one can consider more
advanced ways of setting the semantic parameters if some specific distribution of shapes and motions is
required.
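A sketch of this crowd-generation loop, reusing the hypothetical sample_character model wrapper from above and drawing the semantic parameters uniformly, as in Figs. 9 and 10:

```python
import numpy as np

def generate_crowd(n, weight_range, height_range, model, rng=np.random.default_rng()):
    """Sample n characters with uniformly distributed semantic parameters
    and place them at random positions on the ground plane."""
    crowd = []
    for _ in range(n):
        v = np.array([rng.uniform(*weight_range), rng.uniform(*height_range)])
        s_hat, m_hat = model.sample(v)           # shape/motion PCA coordinates
        position = rng.uniform(-10.0, 10.0, 2)   # random (x, y) placement
        crowd.append((s_hat, m_hat, position))
    return crowd

# Crowd as in Fig. 9: weight in [40, 70] kg, height in [150, 180] cm.
# crowd = generate_crowd(50, (40, 70), (150, 180), model)
```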
More demonstrations are provided in the supplementary material.
6 FUTURE WORK
As our experiments have shown, our model has several limitations. First, it was applied to a cyclic
motion, so its extension to arbitrary types of motion may present some difficulties. Second, the model can
be used to generate samples based on semantic parameters when those parameters are not too far from the
range of the parameters used for training. When this is not the case, the produced results can look
unrealistic. We hope that increasing the size and variability of the databases will help to generate
characters with a wider range of semantic parameters.
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
234
Figure 9: Example of crowd generation with the semantic parameters weight and height uniformly distributed on [40, 70] and [150, 180], respectively. The animation itself can be viewed in the additional materials.

Figure 10: Example of crowd generation with the semantic parameters weight and height uniformly distributed on [50, 100] and [150, 200], respectively. The animation itself can be viewed in the additional materials.
Therefore, in the future we plan to extend our work to more complex motions, as well as to achieve a
better mesh-motion fit in the model training phase. More complex motions should first be aligned using
techniques such as time warping (Hsu et al., 2007); then, procedures similar to those in (Troje, 2002)
should be applied to produce a vector-based representation of the motions suitable for analysis.
Another interesting topic is the use of more advanced stochastic models for coupling the mesh and
motion coordinates, in order to capture non-linear dependencies.
7 CONCLUSIONS
In our work, we proposed a parametrized model combining shape and motion that can be applied to
generate realistic characters together with appropriate motions. While providing enough variability, the
model allows tight control over the generation process with the help of semantic parameters, and has a
probabilistic formulation.
Although the usage of two independent databases can be seen as a weakness of our approach, we want to
stress that the method was designed to find dependencies between initially unrelated databases and to use
them to bridge these databases. Such an approach can be advantageous when it is difficult or impossible
to collect motion and shape data from the same subjects.
Our model can be used to generate completely new meshes and motions, as well as to generate a specific
motion for a given mesh or a specific shape for an existing motion.
We also evaluated our model in three experiments, which showed the superiority of the proposed model in
terms of the visual plausibility of the created samples, in comparison to simple matching based on
semantic parameters. We furthermore avoid the retargeting problem, since our model already contains the
necessary dependencies.
The model can easily be extended to bigger datasets, since all the algorithms used here are
computationally inexpensive.
Possible applications of our model include animating existing characters, creating an appearance based
on a motion, as well as creating a visually plausible crowd.
ACKNOWLEDGEMENTS
We would like to thank Prof. D. Fleet for the very helpful discussions and good advice.
AStatisticalModelforCoupledHumanShapeandMotionSynthesis
235
REFERENCES
Ahmed, A. A. H. (2004). Parametric synthesis of human
animation.
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers,
J., and Davis, J. (2005). Scape: shape completion and
animation of people. ACM Trans. Graph, 24:408–416.
Baran, I. and Popović, J. (2007). Automatic rigging and animation of 3D characters. In ACM SIGGRAPH 2007 Papers, SIGGRAPH '07, New York, NY, USA. ACM.
Brand, M. and Hertzmann, A. (2000). Style machines.
In Proceedings of the 27th annual conference on
Computer graphics and interactive techniques, SIG-
GRAPH ’00, pages 183–192, New York, NY, USA.
ACM Press/Addison-Wesley Publishing Co.
Cootes, T. F., Edwards, G. J., and Taylor, C. J. (2001). Ac-
tive appearance models. IEEE Trans. Pattern Anal.
Mach. Intell., 23(6):681–685.
da Silva, M., Abe, Y., and Popović, J. (2008). Interactive simulation of stylized human locomotion. In ACM SIGGRAPH 2008 Papers, SIGGRAPH '08, pages 82:1–82:10, New York, NY, USA. ACM.
Guy, S. J., Chhugani, J., Curtis, S., Dubey, P., Lin, M., and Manocha, D. (2010). PLEdestrians: a least-effort approach to crowd simulation. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '10, pages 119–128, Aire-la-Ville, Switzerland. Eurographics Association.
Hasler, N., Stoll, C., Sunkel, M., Rosenhahn, B., and Seidel, H.-P. (2009). A statistical model of human pose and body shape. Computer Graphics Forum, 28.
Hodgins, J. K., O’Brien, J. F., and Tumblin, J. (1998).
Perception of human motion with different geomet-
ric models. IEEE Transactions on Visualization and
Computer Graphics, 4(4):101–113.
Hsu, E., da Silva, M., and Popović, J. (2007). Guided time warping for motion editing. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '07, pages 45–52, Aire-la-Ville, Switzerland. Eurographics Association.
Jacka, D., Merry, B., and Reid, A. (2007). A comparison of
linear skinning techniques for character animation. In
In Afrigraph, pages 177–186. ACM.
Jang, W.-S., Lee, W.-K., Lee, I.-K., and Lee, J. (2008). En-
riching a motion database by analogous combination
of partial human motions. Vis. Comput., 24(4):271–
280.
Kendall, G. and Stuart, A. (1973). The Advanced Theory of Statistics. Vol. 2: Inference and Relationship. Griffin.
Lee, Y., Lee, S. J., and Popović, Z. (2009). Compact character controllers. In ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia '09, pages 169:1–169:8, New York, NY, USA. ACM.
Li, Y., Wang, T., and Shum, H.-Y. (2002). Motion texture: A two-level statistical model for character motion synthesis. In ACM Transactions on Graphics, pages 465–472.
Liu, C. K., Hertzmann, A., and Popovic, Z. (2005). Learn-
ing physics-based motion style with nonlinear inverse
optimization. ACM Trans. Graph, 24:1071–1081.
McDonnell, R., Larkin, M., Dobbyn, S., Collins, S., and
O’Sullivan, C. (2008). Clone attack! perception of
crowd variety. In ACM SIGGRAPH 2008 papers, SIG-
GRAPH ’08, pages 26:1–26:8, New York, NY, USA.
ACM.
Narain, R., Golas, A., Curtis, S., and Lin, M. C. (2009). Ag-
gregate dynamics for dense crowd simulation. In ACM
SIGGRAPH Asia 2009 papers, SIGGRAPH Asia ’09,
pages 122:1–122:8, New York, NY, USA. ACM.
Popović, Z. and Witkin, A. P. (1999). Physically based motion transformation. In SIGGRAPH, pages 11–20.
Pražák, M. and O'Sullivan, C. (2011). Perceiving human motion variety. In Proceedings of the ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization, APGV '11, pages 87–92, New York, NY, USA. ACM.
Reitsma, P. S. A. and Pollard, N. S. (2003). Per-
ceptual metrics for character animation: sen-
sitivity to errors in ballistic motion. In ACM
SIGGRAPH 2003 Papers, pages 537–542.
http://www.odysci.com/article/1010112995491572.
Ren, L., Patrick, A., Efros, A. A., Hodgins, J. K., and Rehg,
J. M. (2005). A data-driven approach to quantifying
natural human motion. ACM Trans. Graph, 24:1090–
1097.
Sidenbladh, H., Black, M. J., and Sigal, L. (2002). Implicit
probabilistic models of human motion for synthesis
and tracking. In In European Conference on Computer
Vision, pages 784–800.
Sok, K. W., Kim, M., and Lee, J. (2007). Simulating
biped behaviors from human motion data. ACM Trans.
Graph., 26(3).
Troje, N. F. (2002). Decomposing biological motion: A
framework for analysis and synthesis of human gait
patterns. In Journal of Vision, volume 2, pages 371–
387.
Wang, J. M., Fleet, D. J., and Hertzmann, A. (2008). Gaus-
sian process dynamical models for human motion.
IEEE Trans. Pattern Anal. Mach. Intell., 30(2):283–
298.
GRAPP2013-InternationalConferenceonComputerGraphicsTheoryandApplications
236